perf-tools-unstable-0.0.1~20150130+git85414b0/LICENSE

GNU GENERAL PUBLIC LICENSE
Version 2, June 1991
Copyright (C) 1989, 1991 Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.
Preamble
The licenses for most software are designed to take away your
freedom to share and change it. By contrast, the GNU General Public
License is intended to guarantee your freedom to share and change free
software--to make sure the software is free for all its users. This
General Public License applies to most of the Free Software
Foundation's software and to any other program whose authors commit to
using it. (Some other Free Software Foundation software is covered by
the GNU Lesser General Public License instead.) You can apply it to
your programs, too.
When we speak of free software, we are referring to freedom, not
price. Our General Public Licenses are designed to make sure that you
have the freedom to distribute copies of free software (and charge for
this service if you wish), that you receive source code or can get it
if you want it, that you can change the software or use pieces of it
in new free programs; and that you know you can do these things.
To protect your rights, we need to make restrictions that forbid
anyone to deny you these rights or to ask you to surrender the rights.
These restrictions translate to certain responsibilities for you if you
distribute copies of the software, or if you modify it.
For example, if you distribute copies of such a program, whether
gratis or for a fee, you must give the recipients all the rights that
you have. You must make sure that they, too, receive or can get the
source code. And you must show them these terms so they know their
rights.
We protect your rights with two steps: (1) copyright the software, and
(2) offer you this license which gives you legal permission to copy,
distribute and/or modify the software.
Also, for each author's protection and ours, we want to make certain
that everyone understands that there is no warranty for this free
software. If the software is modified by someone else and passed on, we
want its recipients to know that what they have is not the original, so
that any problems introduced by others will not reflect on the original
authors' reputations.
Finally, any free program is threatened constantly by software
patents. We wish to avoid the danger that redistributors of a free
program will individually obtain patent licenses, in effect making the
program proprietary. To prevent this, we have made it clear that any
patent must be licensed for everyone's free use or not licensed at all.
The precise terms and conditions for copying, distribution and
modification follow.
GNU GENERAL PUBLIC LICENSE
TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
0. This License applies to any program or other work which contains
a notice placed by the copyright holder saying it may be distributed
under the terms of this General Public License. The "Program", below,
refers to any such program or work, and a "work based on the Program"
means either the Program or any derivative work under copyright law:
that is to say, a work containing the Program or a portion of it,
either verbatim or with modifications and/or translated into another
language. (Hereinafter, translation is included without limitation in
the term "modification".) Each licensee is addressed as "you".
Activities other than copying, distribution and modification are not
covered by this License; they are outside its scope. The act of
running the Program is not restricted, and the output from the Program
is covered only if its contents constitute a work based on the
Program (independent of having been made by running the Program).
Whether that is true depends on what the Program does.
1. You may copy and distribute verbatim copies of the Program's
source code as you receive it, in any medium, provided that you
conspicuously and appropriately publish on each copy an appropriate
copyright notice and disclaimer of warranty; keep intact all the
notices that refer to this License and to the absence of any warranty;
and give any other recipients of the Program a copy of this License
along with the Program.
You may charge a fee for the physical act of transferring a copy, and
you may at your option offer warranty protection in exchange for a fee.
2. You may modify your copy or copies of the Program or any portion
of it, thus forming a work based on the Program, and copy and
distribute such modifications or work under the terms of Section 1
above, provided that you also meet all of these conditions:
a) You must cause the modified files to carry prominent notices
stating that you changed the files and the date of any change.
b) You must cause any work that you distribute or publish, that in
whole or in part contains or is derived from the Program or any
part thereof, to be licensed as a whole at no charge to all third
parties under the terms of this License.
c) If the modified program normally reads commands interactively
when run, you must cause it, when started running for such
interactive use in the most ordinary way, to print or display an
announcement including an appropriate copyright notice and a
notice that there is no warranty (or else, saying that you provide
a warranty) and that users may redistribute the program under
these conditions, and telling the user how to view a copy of this
License. (Exception: if the Program itself is interactive but
does not normally print such an announcement, your work based on
the Program is not required to print an announcement.)
These requirements apply to the modified work as a whole. If
identifiable sections of that work are not derived from the Program,
and can be reasonably considered independent and separate works in
themselves, then this License, and its terms, do not apply to those
sections when you distribute them as separate works. But when you
distribute the same sections as part of a whole which is a work based
on the Program, the distribution of the whole must be on the terms of
this License, whose permissions for other licensees extend to the
entire whole, and thus to each and every part regardless of who wrote it.
Thus, it is not the intent of this section to claim rights or contest
your rights to work written entirely by you; rather, the intent is to
exercise the right to control the distribution of derivative or
collective works based on the Program.
In addition, mere aggregation of another work not based on the Program
with the Program (or with a work based on the Program) on a volume of
a storage or distribution medium does not bring the other work under
the scope of this License.
3. You may copy and distribute the Program (or a work based on it,
under Section 2) in object code or executable form under the terms of
Sections 1 and 2 above provided that you also do one of the following:
a) Accompany it with the complete corresponding machine-readable
source code, which must be distributed under the terms of Sections
1 and 2 above on a medium customarily used for software interchange; or,
b) Accompany it with a written offer, valid for at least three
years, to give any third party, for a charge no more than your
cost of physically performing source distribution, a complete
machine-readable copy of the corresponding source code, to be
distributed under the terms of Sections 1 and 2 above on a medium
customarily used for software interchange; or,
c) Accompany it with the information you received as to the offer
to distribute corresponding source code. (This alternative is
allowed only for noncommercial distribution and only if you
received the program in object code or executable form with such
an offer, in accord with Subsection b above.)
The source code for a work means the preferred form of the work for
making modifications to it. For an executable work, complete source
code means all the source code for all modules it contains, plus any
associated interface definition files, plus the scripts used to
control compilation and installation of the executable. However, as a
special exception, the source code distributed need not include
anything that is normally distributed (in either source or binary
form) with the major components (compiler, kernel, and so on) of the
operating system on which the executable runs, unless that component
itself accompanies the executable.
If distribution of executable or object code is made by offering
access to copy from a designated place, then offering equivalent
access to copy the source code from the same place counts as
distribution of the source code, even though third parties are not
compelled to copy the source along with the object code.
4. You may not copy, modify, sublicense, or distribute the Program
except as expressly provided under this License. Any attempt
otherwise to copy, modify, sublicense or distribute the Program is
void, and will automatically terminate your rights under this License.
However, parties who have received copies, or rights, from you under
this License will not have their licenses terminated so long as such
parties remain in full compliance.
5. You are not required to accept this License, since you have not
signed it. However, nothing else grants you permission to modify or
distribute the Program or its derivative works. These actions are
prohibited by law if you do not accept this License. Therefore, by
modifying or distributing the Program (or any work based on the
Program), you indicate your acceptance of this License to do so, and
all its terms and conditions for copying, distributing or modifying
the Program or works based on it.
6. Each time you redistribute the Program (or any work based on the
Program), the recipient automatically receives a license from the
original licensor to copy, distribute or modify the Program subject to
these terms and conditions. You may not impose any further
restrictions on the recipients' exercise of the rights granted herein.
You are not responsible for enforcing compliance by third parties to
this License.
7. If, as a consequence of a court judgment or allegation of patent
infringement or for any other reason (not limited to patent issues),
conditions are imposed on you (whether by court order, agreement or
otherwise) that contradict the conditions of this License, they do not
excuse you from the conditions of this License. If you cannot
distribute so as to satisfy simultaneously your obligations under this
License and any other pertinent obligations, then as a consequence you
may not distribute the Program at all. For example, if a patent
license would not permit royalty-free redistribution of the Program by
all those who receive copies directly or indirectly through you, then
the only way you could satisfy both it and this License would be to
refrain entirely from distribution of the Program.
If any portion of this section is held invalid or unenforceable under
any particular circumstance, the balance of the section is intended to
apply and the section as a whole is intended to apply in other
circumstances.
It is not the purpose of this section to induce you to infringe any
patents or other property right claims or to contest validity of any
such claims; this section has the sole purpose of protecting the
integrity of the free software distribution system, which is
implemented by public license practices. Many people have made
generous contributions to the wide range of software distributed
through that system in reliance on consistent application of that
system; it is up to the author/donor to decide if he or she is willing
to distribute software through any other system and a licensee cannot
impose that choice.
This section is intended to make thoroughly clear what is believed to
be a consequence of the rest of this License.
8. If the distribution and/or use of the Program is restricted in
certain countries either by patents or by copyrighted interfaces, the
original copyright holder who places the Program under this License
may add an explicit geographical distribution limitation excluding
those countries, so that distribution is permitted only in or among
countries not thus excluded. In such case, this License incorporates
the limitation as if written in the body of this License.
9. The Free Software Foundation may publish revised and/or new versions
of the General Public License from time to time. Such new versions will
be similar in spirit to the present version, but may differ in detail to
address new problems or concerns.
Each version is given a distinguishing version number. If the Program
specifies a version number of this License which applies to it and "any
later version", you have the option of following the terms and conditions
either of that version or of any later version published by the Free
Software Foundation. If the Program does not specify a version number of
this License, you may choose any version ever published by the Free Software
Foundation.
10. If you wish to incorporate parts of the Program into other free
programs whose distribution conditions are different, write to the author
to ask for permission. For software which is copyrighted by the Free
Software Foundation, write to the Free Software Foundation; we sometimes
make exceptions for this. Our decision will be guided by the two goals
of preserving the free status of all derivatives of our free software and
of promoting the sharing and reuse of software generally.
NO WARRANTY
11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN
OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS
TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE
PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
REPAIR OR CORRECTION.
12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
POSSIBILITY OF SUCH DAMAGES.
END OF TERMS AND CONDITIONS
How to Apply These Terms to Your New Programs
If you develop a new program, and you want it to be of the greatest
possible use to the public, the best way to achieve this is to make it
free software which everyone can redistribute and change under these terms.
To do so, attach the following notices to the program. It is safest
to attach them to the start of each source file to most effectively
convey the exclusion of warranty; and each file should have at least
the "copyright" line and a pointer to where the full notice is found.
{description}
Copyright (C) {year} {fullname}
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
Also add information on how to contact you by electronic and paper mail.
If the program is interactive, make it output a short notice like this
when it starts in an interactive mode:
Gnomovision version 69, Copyright (C) year name of author
Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
This is free software, and you are welcome to redistribute it
under certain conditions; type `show c' for details.
The hypothetical commands `show w' and `show c' should show the appropriate
parts of the General Public License. Of course, the commands you use may
be called something other than `show w' and `show c'; they could even be
mouse-clicks or menu items--whatever suits your program.
You should also get your employer (if you work as a programmer) or your
school, if any, to sign a "copyright disclaimer" for the program, if
necessary. Here is a sample; alter the names:
Yoyodyne, Inc., hereby disclaims all copyright interest in the program
`Gnomovision' (which makes passes at compilers) written by James Hacker.
{signature of Ty Coon}, 1 April 1989
Ty Coon, President of Vice
This General Public License does not permit incorporating your program into
proprietary programs. If your program is a subroutine library, you may
consider it more useful to permit linking proprietary applications with the
library. If this is what you want to do, use the GNU Lesser General
Public License instead of this License.

perf-tools-unstable-0.0.1~20150130+git85414b0/README.md

perf-tools
==========
A miscellaneous collection of in-development and unsupported performance analysis tools for Linux ftrace and perf_events (aka the "perf" command). Both ftrace and perf are core Linux tracing tools, included in the kernel source. Your system probably has ftrace already, and perf is often just a package add (see Prerequisites).
These tools are designed to be easy to install (fewest dependencies), provide advanced performance observability, and be simple to use: do one thing and do it well. This collection was created by Brendan Gregg (author of the DTraceToolkit).
Many of these tools employ workarounds so that functionality is possible on existing Linux kernels. Because of this, many tools have caveats (see man pages), and their implementation should be considered a placeholder until future kernel features, or new tracing subsystems, are added.
These are intended for Linux 3.2 and newer kernels. For Linux 2.6.x, see Warnings.
## Presentation
These tools were introduced in the USENIX LISA 2014 presentation: Linux Performance Analysis: New Tools and Old Secrets
- slides: http://www.slideshare.net/brendangregg/linux-performance-analysis-new-tools-and-old-secrets
- video: https://www.usenix.org/conference/lisa14/conference-program/presentation/gregg
## Contents
Using ftrace:
- [iosnoop](iosnoop): trace disk I/O with details including latency. [Examples](examples/iosnoop_example.txt).
- [iolatency](iolatency): summarize disk I/O latency as a histogram. [Examples](examples/iolatency_example.txt).
- [execsnoop](execsnoop): trace process exec() with command line argument details. [Examples](examples/execsnoop_example.txt).
- [opensnoop](opensnoop): trace open() syscalls showing filenames. [Examples](examples/opensnoop_example.txt).
- [killsnoop](killsnoop): trace kill() signals showing process and signal details. [Examples](examples/killsnoop_example.txt).
- fs/[cachestat](fs/cachestat): basic cache hit/miss statistics for the Linux page cache. [Examples](examples/cachestat_example.txt).
- net/[tcpretrans](net/tcpretrans): show TCP retransmits, with address and other details. [Examples](examples/tcpretrans_example.txt).
- system/[tpoint](system/tpoint): trace a given tracepoint. [Examples](examples/tpoint_example.txt).
- kernel/[funccount](kernel/funccount): count kernel function calls, matching a string with wildcards. [Examples](examples/funccount_example.txt).
- kernel/[functrace](kernel/functrace): trace kernel function calls, matching a string with wildcards. [Examples](examples/functrace_example.txt).
- kernel/[funcslower](kernel/funcslower): trace kernel functions slower than a threshold. [Examples](examples/funcslower_example.txt).
- kernel/[funcgraph](kernel/funcgraph): trace a graph of kernel function calls, showing children and times. [Examples](examples/funcgraph_example.txt).
- kernel/[kprobe](kernel/kprobe): dynamically trace a kernel function call or its return, with variables. [Examples](examples/kprobe_example.txt).
- tools/[reset-ftrace](tools/reset-ftrace): reset ftrace state if needed. [Examples](examples/reset-ftrace_example.txt).
Using perf_events:
- misc/[perf-stat-hist](misc/perf-stat-hist): power-of aggregations for tracepoint variables. [Examples](examples/perf-stat-hist_example.txt).
- [syscount](syscount): count syscalls by syscall or process. [Examples](examples/syscount_example.txt).
- disk/[bitesize](disk/bitesize): histogram summary of disk I/O size. [Examples](examples/bitesize_example.txt).
## Screenshots
Showing new processes and arguments:
# ./execsnoop
Tracing exec()s. Ctrl-C to end.
PID PPID ARGS
22898 22004 man ls
22905 22898 preconv -e UTF-8
22908 22898 pager -s
22907 22898 nroff -mandoc -rLL=164n -rLT=164n -Tutf8
22906 22898 tbl
22911 22910 locale charmap
22912 22907 groff -mtty-char -Tutf8 -mandoc -rLL=164n -rLT=164n
22913 22912 troff -mtty-char -mandoc -rLL=164n -rLT=164n -Tutf8
22914 22912 grotty
Measuring block device I/O latency from queue insert to completion:
# ./iolatency -Q
Tracing block I/O. Output every 1 seconds. Ctrl-C to end.
>=(ms) .. <(ms) : I/O |Distribution |
0 -> 1 : 1913 |######################################|
1 -> 2 : 438 |######### |
2 -> 4 : 100 |## |
4 -> 8 : 145 |### |
8 -> 16 : 43 |# |
16 -> 32 : 43 |# |
32 -> 64 : 1 |# |
[...]
Tracing the block:block_rq_insert tracepoint, with kernel stack traces, and only for reads:
# ./tpoint -s block:block_rq_insert 'rwbs ~ "*R*"'
cksum-11908 [000] d... 7269839.919098: block_rq_insert: 202,1 R 0 () 736560 + 136 [cksum]
cksum-11908 [000] d... 7269839.919107:
=> __elv_add_request
=> blk_flush_plug_list
=> blk_finish_plug
=> __do_page_cache_readahead
=> ondemand_readahead
=> page_cache_async_readahead
=> generic_file_read_iter
=> new_sync_read
=> vfs_read
=> SyS_read
=> system_call_fastpath
[...]
Count kernel function calls beginning with "bio_", summarize every second:
# ./funccount -i 1 'bio_*'
Tracing "bio_*"... Ctrl-C to end.
FUNC COUNT
bio_attempt_back_merge 26
bio_get_nr_vecs 361
bio_alloc 536
bio_alloc_bioset 536
bio_endio 536
bio_free 536
bio_fs_destructor 536
bio_init 536
bio_integrity_enabled 536
bio_put 729
bio_add_page 1004
[...]
There are many more examples in the [examples](examples) directory. Also see the [man pages](man/man8).
## Prerequisites
The intent is as few as possible. Eg, a Linux 3.2 server without debuginfo. See the tool man page for specifics.
### ftrace
FTRACE configured in the kernel. You may already have this configured and available in your kernel version, as FTRACE was first added in 2.6.27. This requires CONFIG_FTRACE and other FTRACE options depending on the tool. Some tools (eg, funccount) require CONFIG_FUNCTION_PROFILER.
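To check whether these options are enabled, you can usually grep the kernel config. The config file location varies by distribution, and /proc/config.gz only exists if the kernel was built with CONFIG_IKCONFIG_PROC:
```
grep -E 'CONFIG_FTRACE=|CONFIG_FUNCTION_PROFILER=' /boot/config-$(uname -r)
# or, if the kernel exports its config:
zgrep -E 'CONFIG_FTRACE=|CONFIG_FUNCTION_PROFILER=' /proc/config.gz
```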
### perf_events
Requires the "perf" command to be installed. This is in the linux-tools-common package. After installing that, perf may tell you to install an additional linux-tools package (linux-tools-_kernel_version_). perf can also be built under tools/perf in the kernel source. See [perf_events Prerequisites](http://www.brendangregg.com/perf.html#Prerequisites) for more details about getting perf_events to work fully.
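For example, on a Debian/Ubuntu-style system this might look like the following (package names differ on other distributions):
```
sudo apt-get install linux-tools-common
sudo apt-get install linux-tools-$(uname -r)   # if perf asks for a kernel-specific package
perf --version                                 # confirm the command runs
```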
### debugfs
Requires a kernel with CONFIG_DEBUG_FS option enabled. As with FTRACE, this may already be enabled (debugfs was added in 2.6.10-rc3). The debugfs also needs to be mounted:
```
# mount -t debugfs none /sys/kernel/debug
```
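To confirm that debugfs is mounted and the tracing files are visible:
```
# grep -w debugfs /proc/mounts
# ls /sys/kernel/debug/tracing
```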
### awk
Many of these scripts use awk, and will try to use either mawk or gawk depending on the desired behavior: mawk for buffered output (because of its speed), and gawk for synchronous output (as fflush() works, allowing more efficient grouping of writes).
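As a rough, illustrative sketch of that selection (the flag name below is hypothetical; each tool implements its own variant):
```
#!/bin/bash
# Illustrative sketch only -- not the exact logic used by any one tool.
buffered=1                                   # hypothetical flag: 1 = buffered/summary output
if (( buffered )) && [ -x /usr/bin/mawk ]; then
	awk=mawk                             # mawk: fast, buffered output
else
	awk=gawk                             # gawk: fflush() works, allowing synchronous output
fi
$awk 'BEGIN { print "awk variant selected" }'
```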
## Install
These are just scripts. Either grab everything:
```
git clone --depth 1 https://github.com/brendangregg/perf-tools
```
Or use the raw links on github to download individual scripts. Eg:
```
wget https://raw.githubusercontent.com/brendangregg/perf-tools/master/iosnoop
```
This preserves tabs (which copy-n-paste can mess up).
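A script fetched this way won't have the execute bit set, so you may also need to:
```
chmod +x iosnoop
./iosnoop -h         # print usage; actual tracing requires root
```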
## Warnings
Ftrace was first added to Linux 2.6.27, and perf_events to Linux 2.6.31. These early versions had kernel bugs, and lockups and panics have been reported on 2.6.32 series kernels. This includes CentOS 6.x. If you must analyze older kernels, these tools may only be useful in a fault-tolerant environment, such as a lab with simulated issues. These tools have been primarily developed on Linux 3.2 and later kernels.
Depending on the tool, there may also be overhead incurred. See the next section.
## Internals and Overhead
perf_events is evolving. This collection began development circa Linux 3.16, with Linux 3.2 servers as the main target, at a time when perf_events lacked certain programmatic capabilities (eg, custom in-kernel aggregations). It's possible these will be added in a forthcoming kernel release. Until then, many of these tools employ workarounds, tricks, and hacks in order to work. Some of these tools pass event data to user space for post-processing, which incurs much higher overhead than in-kernel aggregations. The overhead of each tool is described in its man page.
__WARNING__: In _extreme_ cases, your target application may run 5x slower when using these tools. Depending on the tool and kernel version, there may also be the risk of kernel panics. Read the program header for warnings, and test before use.
If the overhead is a problem, these tools can be improved. If a tool doesn't do this already, it could be rewritten in C to use perf_event_open() and mmap() for the trace buffer. It could also implement frequency counts in C, and operate on the mmap() buffer directly, rather than using awk/Perl/Python. Additional improvements are possible for ftrace-based tools, such as use of snapshots and per-instance buffers.
Some of these tools are intended as short-term workarounds until more kernel capabilities exist, at which point they can be substantially rewritten. Older versions of these tools will be kept in this repository, for older kernel versions.
As my main target is a fleet of Linux 3.2 servers that do not have debuginfo, these tools try not to require it. At times, this makes the tool more brittle than it needs to be, as I'm employing workarounds (that may be kernel version and platform specific) instead of using debuginfo information (which can be generic). See the man page for detailed prerequisites for each tool.
I've tried to use perf_events ("perf") where possible, since that interface has been developed for multi-user use. For various reasons I've often needed to use ftrace instead. ftrace is surprisingly powerful (thanks Steven Rostedt!), and not all of its features are exposed via perf, or in common usage. This tool collection is in some ways a demonstration of hidden Linux features using ftrace.
Since things are changing, it's very possible you may find some tools don't work on your Linux kernel version. Some expertise and assembly will be required to fix them.
## Links
A case study and summary:
- 13 Aug 2014: http://lwn.net/Articles/608497 Ftrace: The hidden light switch
Related articles:
- 06 Sep 2014: http://www.brendangregg.com/blog/2014-09-06/linux-ftrace-tcp-retransmit-tracing.html
- 28 Jul 2014: http://www.brendangregg.com/blog/2014-07-28/execsnoop-for-linux.html
- 25 Jul 2014: http://www.brendangregg.com/blog/2014-07-25/opensnoop-for-linux.html
- 23 Jul 2014: http://www.brendangregg.com/blog/2014-07-23/linux-iosnoop-latency-heat-maps.html
- 16 Jul 2014: http://www.brendangregg.com/blog/2014-07-16/iosnoop-for-linux.html
- 10 Jul 2014: http://www.brendangregg.com/blog/2014-07-10/perf-hacktogram.html
perf-tools-unstable-0.0.1~20150130+git85414b0/bin/

Symlinks to the tools: bitesize -> ../disk/bitesize, cachestat -> ../fs/cachestat,
execsnoop -> ../execsnoop, funccount -> ../kernel/funccount,
funcgraph -> ../kernel/funcgraph, funcslower -> ../kernel/funcslower,
functrace -> ../kernel/functrace, iolatency -> ../iolatency, iosnoop -> ../iosnoop,
killsnoop -> ../killsnoop, kprobe -> ../kernel/kprobe, opensnoop -> ../opensnoop,
perf-stat-hist -> ../misc/perf-stat-hist, reset-ftrace -> ../tools/reset-ftrace,
syscount -> ../syscount, tcpretrans -> ../net/tcpretrans, tpoint -> ../system/tpoint.

perf-tools-unstable-0.0.1~20150130+git85414b0/deprecated/README.md

Deprecated versions of tools.
perf-tools-unstable-0.0.1~20150130+git85414b0/deprecated/execsnoop-proc

#!/usr/bin/perl
#
# execsnoop - trace process exec() with arguments. /proc version.
# Written using Linux ftrace.
#
# This shows the execution of new processes, especially short-lived ones that
# can be missed by sampling tools such as top(1).
#
# USAGE: ./execsnoop [-h] [-n name]
#
# REQUIREMENTS: FTRACE CONFIG, sched:sched_process_exec tracepoint (you may
# already have these on recent kernels), and Perl.
#
# This traces exec() from the fork()->exec() sequence, which means it won't
# catch new processes that only fork(), and, it will catch processes that
# re-exec. This instruments sched:sched_process_exec without buffering, and then
# in user-space (this program) reads PPID and process arguments asynchronously
# from /proc.
#
# If the process traced is very short-lived, this program may miss reading
# arguments and PPID details. In that case, ">" and "?" will be printed
# respectively. This program is best-effort, and should be improved in the
# future when other kernel capabilities are made available. If you need a
# more reliable tool now, then consider other tracing alternatives (eg,
# SystemTap). This tool is really a proof of concept to see what ftrace can
# currently do.
#
# From perf-tools: https://github.com/brendangregg/perf-tools
#
# See the execsnoop(8) man page (in perf-tools) for more info.
#
# COPYRIGHT: Copyright (c) 2014 Brendan Gregg.
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; either version 2
# of the License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software Foundation,
# Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
#
# (http://www.gnu.org/copyleft/gpl.html)
#
# 07-Jul-2014 Brendan Gregg Created this.
use strict;
use warnings;
use POSIX qw(strftime);
use Getopt::Long;
my $tracing = "/sys/kernel/debug/tracing";
my $flock = "/var/tmp/.ftrace-lock";
my $tpdir = "sched/sched_process_exec";
my $tptext = $tpdir; $tptext =~ s/\//:/;
local $SIG{INT} = \&cleanup;
local $SIG{QUIT} = \&cleanup;
local $SIG{TERM} = \&cleanup;
local $SIG{PIPE} = \&cleanup;
local $SIG{HUP} = \&cleanup;
$| = 1;
### options
my ($name, $help);
GetOptions("name=s" => \$name,
"help" => \$help)
or usage();
usage() if $help;
sub usage {
print STDERR "USAGE: execsnoop [-h] [-n name]\n";
print STDERR " eg,\n";
print STDERR " execsnoop -n ls # show \"ls\" cmds only.\n";
exit;
}
sub ldie {
unlink $flock;
die @_;
}
sub writeto {
my ($string, $file) = @_;
open FILE, ">$file" or return 0;
print FILE $string or return 0;
close FILE or return 0;
}
### check permissions
chdir "$tracing" or ldie "ERROR: accessing tracing. Root? Kernel has FTRACE?" .
"\ndebugfs mounted? (mount -t debugfs debugfs /sys/kernel/debug)";
### ftrace lock
if (-e $flock) {
open FLOCK, $flock; my $fpid = <FLOCK>; chomp $fpid; close FLOCK;
die "ERROR: ftrace may be in use by PID $fpid ($flock)";
}
writeto "$$", $flock or die "ERROR: unable to write $flock.";
### setup and begin tracing
writeto "nop", "current_tracer" or ldie "ERROR: disabling current_tracer.";
writeto "1", "events/$tpdir/enable" or ldie "ERROR: enabling tracepoint " .
"\"$tptext\" (tracepoint missing in this kernel version?)";
open TPIPE, "trace_pipe" or warn "ERROR: opening trace_pipe.";
printf "%-8s %6s %6s %s\n", "TIME", "PID", "PPID", "ARGS";
while (<TPIPE>) {
my ($taskpid, $rest) = split;
my ($task, $pid) = $taskpid =~ /(.*)-(\d+)/;
next if (defined $name and $name ne $task);
my $args = "$task >";
if (open CMDLINE, "/proc/$pid/cmdline") {
my $arglist = <CMDLINE>;
if (defined $arglist) {
$arglist =~ s/\000/ /g;
$args = $arglist;
}
close CMDLINE;
}
my $ppid = "?";
if (open STAT, "/proc/$pid/stat") {
my $fields = <STAT>;
if (defined $fields) {
$ppid = (split ' ', $fields)[3];
}
close STAT;
}
my $now = strftime "%H:%M:%S", localtime;
printf "%-8s %6s %6s %s\n", $now, $pid, $ppid, $args;
}
### end tracing
cleanup();
sub cleanup {
print "\nEnding tracing...\n";
close TPIPE;
writeto "0", "events/$tpdir/enable" or
ldie "ERROR: disabling \"$tptext\"";
writeto "", "trace";
unlink $flock;
exit;
}
perf-tools-unstable-0.0.1~20150130+git85414b0/deprecated/execsnoop-proc.8

.TH execsnoop\-proc 8 "2014-07-07" "USER COMMANDS"
.SH NAME
execsnoop\-proc \- trace process exec() with arguments. Uses Linux ftrace. /proc version.
.SH SYNOPSIS
.B execsnoop\-proc
[\-h] [\-n name]
.SH DESCRIPTION
execsnoop\-proc traces process execution, showing PID, PPID, and argument details
if possible.
This traces exec() from the fork()->exec() sequence, which means it won't
catch new processes that only fork(), and, it will catch processes that
re-exec. This instruments sched:sched_process_exec without buffering, and then
in user-space (this program) reads PPID and process arguments asynchronously
from /proc.
If the process traced is very short-lived, this program may miss reading
arguments and PPID details. In that case, ">" and "?" will be printed
respectively.
This program is best-effort (a hack), and should be improved in the future when
other kernel capabilities are made available. It may be useful in the meantime.
If you need a more reliable tool now, consider other tracing alternatives (eg,
SystemTap). This tool is really a proof of concept to see what ftrace can
currently do.
See execsnoop(8) for another version that reads arguments from registers
instead of /proc.
Since this uses ftrace, only the root user can use this tool.
.SH REQUIREMENTS
FTRACE CONFIG and the sched:sched_process_exec tracepoint, which you may already
have enabled and available on recent kernels, and Perl.
.SH OPTIONS
.TP
\-n name
Only show processes that match this name. This is filtered in user space.
.TP
\-h
Print usage message.
.SH EXAMPLES
.TP
Trace all new processes and arguments (if possible):
.B execsnoop\-proc
.TP
Trace all new processes with process name "sed":
.B execsnoop\-proc -n sed
.SH FIELDS
.TP
TIME
Time of process exec(): HH:MM:SS.
.TP
PID
Process ID.
.TP
PPID
Parent process ID, if this was able to be read (may be missed for short-lived
processes). If it is unable to be read, "?" is printed.
.TP
ARGS
Command line arguments, if these were able to be read in time (may be missed
for short-lived processes). If they are unable to be read, ">" is printed.
.SH OVERHEAD
This reads and processes exec() events in user space as they occur. Since the
rate of exec() is expected to be low (< 500/s), the overhead is expected to
be small or negligible.
.SH SOURCE
This is from the perf-tools collection.
.IP
https://github.com/brendangregg/perf-tools
.PP
Also look under the examples directory for a text file containing example
usage, output, and commentary for this tool.
.SH OS
Linux
.SH STABILITY
Unstable - in development.
.SH AUTHOR
Brendan Gregg
.SH SEE ALSO
execsnoop(8), top(1)
perf-tools-unstable-0.0.1~20150130+git85414b0/deprecated/execsnoop-proc_example.txt

Demonstrations of execsnoop-proc, the Linux ftrace version.
Here's execsnoop showing what's really executed by "man ls":
# ./execsnoop
TIME PID PPID ARGS
17:52:37 22406 25781 man ls
17:52:37 22413 22406 preconv -e UTF-8
17:52:37 22416 22406 pager -s
17:52:37 22415 22406 /bin/sh /usr/bin/nroff -mandoc -rLL=162n -rLT=162n -Tutf8
17:52:37 22414 22406 tbl
17:52:37 22419 22418 locale charmap
17:52:37 22420 22415 groff -mtty-char -Tutf8 -mandoc -rLL=162n -rLT=162n
17:52:37 22421 22420 troff -mtty-char -mandoc -rLL=162n -rLT=162n -Tutf8
17:52:37 22422 22420 grotty
These are short-lived processes, where the argument and PPID details are often
missed by execsnoop:
# ./execsnoop
TIME PID PPID ARGS
18:00:33 26750 1961 multilog >
18:00:33 26749 1972 multilog >
18:00:33 26749 1972 multilog >
18:00:33 26751 ? mkdir >
18:00:33 26749 1972 multilog >
18:00:33 26752 ? chown >
18:00:33 26750 1961 multilog >
18:00:33 26750 1961 multilog >
18:00:34 26753 1961 multilog >
18:00:34 26754 1972 multilog >
[...]
This will be fixed in a later version, but likely requires some kernel or
tracer changes first (fetching cmdline as the probe fires).
The previous examples were on Linux 3.14 and 3.16 kernels. Here's a 3.2 system
I'm running:
# ./execsnoop
ERROR: enabling tracepoint "sched:sched_process_exec" (tracepoint missing in this kernel version?) at ./execsnoop line 78.
This kernel version is missing the sched_process_exec probe, which is pretty
annoying.
perf-tools-unstable-0.0.1~20150130+git85414b0/disk/bitesize

#!/bin/bash
#
# bitesize - show disk I/O size as a histogram.
# Written using Linux perf_events (aka "perf").
#
# This can be used to characterize the distribution of block device I/O
# sizes. To study I/O in more detail, see iosnoop(8).
#
# USAGE: bitesize [-h] [-b buckets] [seconds]
# eg,
# ./bitesize 10
#
# Run "bitesize -h" for full usage.
#
# REQUIREMENTS: perf_events and block:block_rq_issue tracepoint, which you may
# already have on recent kernels.
#
# This uses multiple counting tracepoints with different filters, one for each
# histogram bucket. While this is summarized in-kernel, the use of multiple
# tracepoints does add additional overhead, which is more evident if you add
# more buckets. In the future this functionality will be available in an
# efficient way in the kernel, and this tool can be rewritten.
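#
# For example (illustrative only), with the default buckets the generated perf
# invocation resembles:
#
#     perf stat -e block:block_rq_issue --filter "nr_sector < 2" \
#               -e block:block_rq_issue --filter "nr_sector >= 2 && nr_sector < 16" \
#               ... -a sleep 10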
#
# From perf-tools: https://github.com/brendangregg/perf-tools
#
# COPYRIGHT: Copyright (c) 2014 Brendan Gregg.
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; either version 2
# of the License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software Foundation,
# Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
#
# (http://www.gnu.org/copyleft/gpl.html)
#
# 22-Jul-2014 Brendan Gregg Created this.
duration=0
buckets=(1 8 64 128)
secsz=512
trap ':' INT QUIT TERM PIPE HUP
function usage {
cat <<-END >&2
USAGE: bitesize [-h] [-b buckets] [seconds]
-b buckets # specify histogram buckets (Kbytes)
-h # this usage message
eg,
bitesize # trace I/O size until Ctrl-C
bitesize 10 # trace I/O size for 10 seconds
bitesize -b "8 16 32" # specify custom bucket points
END
exit
}
function die {
echo >&2 "$@"
exit 1
}
### process options
while getopts b:h opt
do
case $opt in
b) buckets=($OPTARG) ;;
h|?) usage ;;
esac
done
shift $(( $OPTIND - 1 ))
tpoint=block:block_rq_issue
var=nr_sector
duration=$1
### convert buckets (Kbytes) to disk sectors
i=0
sectors=(${buckets[*]})
((max_i = ${#buckets[*]} - 1))
while (( i <= max_i )); do
(( sectors[$i] = ${sectors[$i]} * 1024 / $secsz ))
if (( i && ${sectors[$i]} <= ${sectors[$i - 1]} )); then
die "ERROR: bucket list must increase in size."
fi
(( i++ ))
done
### build list of tracepoints and filters for each histogram bucket
max_b=${buckets[$max_i]}
max_s=${sectors[$max_i]}
tpoints="-e $tpoint --filter \"$var < ${sectors[0]}\""
awkarray=
i=0
while (( i < max_i )); do
tpoints="$tpoints -e $tpoint --filter \"$var >= ${sectors[$i]} && "
tpoints="$tpoints $var < ${sectors[$i + 1]}\""
awkarray="$awkarray buckets[$i]=${buckets[$i]};"
(( i++ ))
done
awkarray="$awkarray buckets[$max_i]=${buckets[$max_i]};"
tpoints="$tpoints -e $tpoint --filter \"$var >= ${sectors[$max_i]}\""
### prepare to run
if (( duration )); then
etext="for $duration seconds"
cmd="sleep $duration"
else
etext="until Ctrl-C"
cmd="sleep 999999"
fi
echo "Tracing block I/O size (bytes), $etext..."
### run perf
out="-o /dev/stdout" # a workaround needed in linux 3.2; not by 3.4.15
stat=$(eval perf stat $tpoints -a $out $cmd 2>&1)
if (( $? != 0 )); then
echo >&2 "ERROR running perf:"
echo >&2 "$stat"
exit
fi
### find max value for ASCII histogram
most=$(echo "$stat" | awk -v tpoint=$tpoint '
$2 == tpoint { gsub(/,/, ""); if ($1 > m) { m = $1 } }
END { print m }'
)
### process output
echo
echo "$stat" | awk -v tpoint=$tpoint -v max_i=$max_i -v most=$most '
function star(sval, smax, swidth) {
stars = ""
if (smax == 0) return ""
for (si = 0; si < (swidth * sval / smax); si++) {
stars = stars "#"
}
return stars
}
BEGIN {
'"$awkarray"'
printf(" %-15s: %-8s %s\n", "Kbytes", "I/O",
"Distribution")
}
/Performance counter stats/ { i = -1 }
# reverse order of rule set is important
{ ok = 0 }
$2 == tpoint { num = $1; gsub(/,/, "", num); ok = 1 }
ok && i >= max_i {
printf(" %10.1f -> %-10s: %-8s |%-38s|\n",
buckets[i], "", num, star(num, most, 38))
next
}
ok && i >= 0 && i < max_i {
printf(" %10.1f -> %-10.1f: %-8s |%-38s|\n",
buckets[i], buckets[i+1] - 0.1, num,
star(num, most, 38))
i++
next
}
ok && i == -1 {
printf(" %10s -> %-10.1f: %-8s |%-38s|\n", "",
buckets[0] - 0.1, num, star(num, most, 38))
i++
}
'
perf-tools-unstable-0.0.1~20150130+git85414b0/examples/bitesize_example.txt

Demonstrations of bitesize, the Linux perf_events version.
bitesize traces block I/O issued, and reports a histogram of I/O size. By
default five buckets are used to gather statistics on common I/O sizes:
# ./bitesize
Tracing block I/O size (bytes), until Ctrl-C...
^C
Kbytes : I/O Distribution
-> 0.9 : 0 | |
1.0 -> 7.9 : 38 |# |
8.0 -> 63.9 : 10108 |######################################|
64.0 -> 127.9 : 13 |# |
128.0 -> : 1 |# |
In this case, most of the I/O was between 8 and 63.9 Kbytes. The "63.9"
really means "less than 64".
Specifying custom buckets to examine the I/O size in more detail:
# ./bitesize -b "8 16 24 32"
Tracing block I/O size (bytes), until Ctrl-C...
^C
Kbytes : I/O Distribution
-> 7.9 : 89 |# |
8.0 -> 15.9 : 14665 |######################################|
16.0 -> 23.9 : 657 |## |
24.0 -> 31.9 : 661 |## |
32.0 -> : 376 |# |
The I/O is mostly between 8 and 15.9 Kbytes.
It's probably 8 Kbytes. Checking:
# ./bitesize -b "8 9"
Tracing block I/O size (bytes), until Ctrl-C...
^C
Kbytes : I/O Distribution
-> 7.9 : 62 |# |
8.0 -> 8.9 : 11719 |######################################|
9.0 -> : 1358 |##### |
It is.
The overhead of this tool is relative to the number of buckets used, so use only
as many buckets as necessary.
To study this I/O in more detail, I can use iosnoop(8) and capture it to a file
for post-processing.
Use -h to print the USAGE message:
# ./bitesize -h
USAGE: bitesize [-h] [-b buckets] [seconds]
-b buckets # specify histogram buckets (Kbytes)
-h # this usage message
eg,
bitesize # trace I/O size until Ctrl-C
bitesize 10 # trace I/O size for 10 seconds
bitesize -b "8 16 32" # specify custom bucket points
perf-tools-unstable-0.0.1~20150130+git85414b0/examples/cachestat_example.txt

Demonstrations of cachestat, the Linux ftrace version.
Here is some sample output showing file system cache statistics, followed by
the workload that caused it:
# ./cachestat -t
Counting cache functions... Output every 1 seconds.
TIME HITS MISSES DIRTIES RATIO BUFFERS_MB CACHE_MB
08:28:57 415 0 0 100.0% 1 191
08:28:58 411 0 0 100.0% 1 191
08:28:59 362 97 0 78.9% 0 8
08:29:00 411 0 0 100.0% 0 9
08:29:01 775 20489 0 3.6% 0 89
08:29:02 411 0 0 100.0% 0 89
08:29:03 6069 0 0 100.0% 0 89
08:29:04 15249 0 0 100.0% 0 89
08:29:05 411 0 0 100.0% 0 89
08:29:06 411 0 0 100.0% 0 89
08:29:07 411 0 3 100.0% 0 89
[...]
I used the -t option to include the TIME column, to make describing the output
easier.
The workload was:
# echo 1 > /proc/sys/vm/drop_caches; sleep 2; cksum 80m; sleep 2; cksum 80m
At 8:28:58, the page cache was dropped by the first command, which can be seen
by the drop in size for "CACHE_MB" (page cache size) from 191 Mbytes to 8.
After a 2 second sleep, a cksum command was issued at 8:29:01, for an 80 Mbyte
file (called "80m"), which caused a total of ~20,400 misses ("MISSES" column),
and the page cache size to grow by 80 Mbytes. The hit ratio during this dropped
to 3.6%. Finally, after another 2 second sleep, at 8:29:03 the cksum command
was run a second time, this time hitting entirely from cache.
Instrumenting all file system cache accesses does cost some overhead, and this
tool might slow your target system by 2% or so. Test before use if this is a
concern.
This tool also uses dynamic tracing, and is tied to Linux kernel implementation
details. If it doesn't work for you, it probably needs fixing.
Use -h to print the USAGE message:
# ./cachestat -h
USAGE: cachestat [-Dht] [interval]
-D # print debug counters
-h # this usage message
-t # include timestamp
interval # output interval in secs (default 1)
eg,
cachestat # show stats every second
cachestat 5 # show stats every 5 seconds
See the man page and example file for more info.
perf-tools-unstable-0.0.1~20150130+git85414b0/examples/execsnoop_example.txt

Demonstrations of execsnoop, the Linux ftrace version.
Here's execsnoop showing what's really executed by "man ls":
# ./execsnoop
Tracing exec()s. Ctrl-C to end.
PID PPID ARGS
22898 22004 man ls
22905 22898 preconv -e UTF-8
22908 22898 pager -s
22907 22898 nroff -mandoc -rLL=164n -rLT=164n -Tutf8
22906 22898 tbl
22911 22910 locale charmap
22912 22907 groff -mtty-char -Tutf8 -mandoc -rLL=164n -rLT=164n
22913 22912 troff -mtty-char -mandoc -rLL=164n -rLT=164n -Tutf8
22914 22912 grotty
Many commands. This is particularly useful for understanding application
startup.
Another use for execsnoop is identifying short-lived processes. Eg, with the -t
option to see timestamps:
# ./execsnoop -t
Tracing exec()s. Ctrl-C to end.
TIMEs PID PPID ARGS
7419756.154031 8185 8181 mawk -W interactive -v o=1 -v opt_name=0 -v name= [...]
7419756.154131 8186 8184 cat -v trace_pipe
7419756.245264 8188 1698 ./run
7419756.245691 8189 1696 ./run
7419756.246212 8187 1689 ./run
7419756.278993 8190 1693 ./run
7419756.278996 8191 1692 ./run
7419756.288430 8192 1695 ./run
7419756.290115 8193 1691 ./run
7419756.292406 8194 1699 ./run
7419756.293986 8195 1690 ./run
7419756.294149 8196 1686 ./run
7419756.296527 8197 1687 ./run
7419756.296973 8198 1697 ./run
7419756.298356 8200 1685 ./run
7419756.298683 8199 1688 ./run
7419757.269883 8201 1696 ./run
[...]
So we're running many "run" commands every second. The PPID is included, so I
can debug this further (they are "supervise" processes).
Short-lived processes can consume CPU and not be visible from top(1), and can
be the source of hidden performance issues.
Here's another example: I noticed CPU usage was high in top(1), but couldn't
see the responsible process:
$ top
top - 00:04:32 up 78 days, 15:41, 3 users, load average: 0.85, 0.29, 0.14
Tasks: 123 total, 1 running, 121 sleeping, 0 stopped, 1 zombie
Cpu(s): 15.7%us, 34.9%sy, 0.0%ni, 49.2%id, 0.0%wa, 0.0%hi, 0.0%si, 0.2%st
Mem: 7629464k total, 7537216k used, 92248k free, 1376492k buffers
Swap: 0k total, 0k used, 0k free, 5432356k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
7225 bgregg-t 20 0 29480 6196 2128 S 3 0.1 0:02.64 ec2rotatelogs
1 root 20 0 24320 2256 1340 S 0 0.0 0:01.23 init
2 root 20 0 0 0 0 S 0 0.0 0:00.00 kthreadd
3 root 20 0 0 0 0 S 0 0.0 1:19.61 ksoftirqd/0
4 root 20 0 0 0 0 S 0 0.0 0:00.00 kworker/0:0
5 root 20 0 0 0 0 S 0 0.0 0:00.01 kworker/u:0
6 root RT 0 0 0 0 S 0 0.0 0:16.00 migration/0
7 root RT 0 0 0 0 S 0 0.0 0:17.29 watchdog/0
8 root RT 0 0 0 0 S 0 0.0 0:15.85 migration/1
9 root 20 0 0 0 0 S 0 0.0 0:00.00 kworker/1:0
[...]
See the line starting with "Cpu(s):". So there's about 50% CPU utilized (this
is a two CPU server, so that's equivalent to one full CPU), but this CPU usage
isn't visible from the process listing.
vmstat agreed, showing the same average CPU usage statistics:
# vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
2 0 0 92816 1376476 5432188 0 0 0 3 2 1 0 1 99 0
1 0 0 92676 1376484 5432264 0 0 0 24 6573 6130 12 38 49 0
1 0 0 91964 1376484 5432272 0 0 0 0 6529 6097 16 35 49 0
1 0 0 92692 1376484 5432272 0 0 0 0 6192 5775 17 35 49 0
1 0 0 92692 1376484 5432272 0 0 0 0 6554 6121 14 36 50 0
1 0 0 91940 1376484 5432272 0 0 0 12 6546 6101 13 38 49 0
1 0 0 92560 1376484 5432272 0 0 0 0 6201 5769 15 35 49 0
1 0 0 92676 1376484 5432272 0 0 0 0 6524 6123 17 34 49 0
1 0 0 91932 1376484 5432272 0 0 0 0 6546 6107 10 40 49 0
1 0 0 92832 1376484 5432272 0 0 0 0 6057 5710 13 38 49 0
1 0 0 92248 1376484 5432272 0 0 84 28 6592 6183 16 36 48 1
1 0 0 91504 1376492 5432348 0 0 0 12 6540 6098 18 33 49 1
[...]
So this could be caused by short-lived processes, which vanish before they are
seen by top(1). Do I have my execsnoop handy? Yes:
# ~/perf-tools/bin/execsnoop
Tracing exec()s. Ctrl-C to end.
PID PPID ARGS
10239 10229 gawk -v o=0 -v opt_name=0 -v name= -v opt_duration=0 [...]
10240 10238 cat -v trace_pipe
10242 7225 sh [?]
10243 10242 /usr/sbin/lsof -X /logs/tomcat/cores/threaddump.20141215.201201.3122.txt
10245 7225 sh [?]
10246 10245 /usr/sbin/lsof -X /logs/tomcat/cores/threaddump.20141215.202201.3122.txt
10248 7225 sh [?]
10249 10248 /usr/sbin/lsof -X /logs/tomcat/cores/threaddump.20141215.203201.3122.txt
10251 7225 sh [?]
10252 10251 /usr/sbin/lsof -X /logs/tomcat/cores/threaddump.20141215.204201.3122.txt
10254 7225 sh [?]
10255 10254 /usr/sbin/lsof -X /logs/tomcat/cores/threaddump.20141215.205201.3122.txt
10257 7225 sh [?]
10258 10257 /usr/sbin/lsof -X /logs/tomcat/cores/threaddump.20141215.210201.3122.txt
10260 7225 sh [?]
10261 10260 /usr/sbin/lsof -X /logs/tomcat/cores/threaddump.20141215.211201.3122.txt
10263 7225 sh [?]
10264 10263 /usr/sbin/lsof -X /logs/tomcat/cores/threaddump.20141215.212201.3122.txt
10266 7225 sh [?]
10267 10266 /usr/sbin/lsof -X /logs/tomcat/cores/threaddump.20141215.213201.3122.txt
[...]
The output scrolled quickly, showing that many shell and lsof processes were
being launched. If you check the PID and PPID columns carefully, you can see that
these are ultimately all from PID 7225. We saw that earlier in the top output:
ec2rotatelogs, at 3% CPU. I now know the culprit.
I should have used "-t" to show the timestamps with this example.
Run -h to print the USAGE message:
# ./execsnoop -h
USAGE: execsnoop [-hrt] [-a argc] [-d secs] [name]
-d seconds # trace duration, and use buffers
-a argc # max args to show (default 8)
-r # include re-execs
-t # include time (seconds)
-h # this usage message
name # process name to match (REs allowed)
eg,
execsnoop # watch exec()s live (unbuffered)
execsnoop -d 1 # trace 1 sec (buffered)
execsnoop grep # trace process names containing grep
execsnoop 'log$' # filenames ending in "log"
See the man page and example file for more info.
perf-tools-unstable-0.0.1~20150130+git85414b0/examples/funccount_example.txt

Demonstrations of funccount, the Linux ftrace version.
Tracing all kernel functions that start with "bio_" (which would be block
interface functions), and counting how many times they were executed until
Ctrl-C is hit:
# ./funccount 'bio_*'
Tracing "bio_*"... Ctrl-C to end.
^C
FUNC COUNT
bio_attempt_back_merge 26
bio_get_nr_vecs 361
bio_alloc 536
bio_alloc_bioset 536
bio_endio 536
bio_free 536
bio_fs_destructor 536
bio_init 536
bio_integrity_enabled 536
bio_put 729
bio_add_page 1004
Note that these counts are performed in kernel context, using the ftrace
function profiler, which means this is a (relatively) low overhead technique.
Test yourself to quantify overhead.
As was demonstrated here, wildcards can be used. Individual functions can also
be specified. For example, all of the following are valid arguments:
bio_init
bio_*
*init
*bio*
A "*" within a string (eg, "bio*init") is not supported.
The full list of what can be traced is in:
/sys/kernel/debug/tracing/available_filter_functions, which can be grep'd to
check what is there. Note that grep uses regular expressions, whereas
funccount uses globbing for wildcards.
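For example, to check which "bio_" functions exist on the current kernel (the
exact list varies by kernel version and config):
# grep '^bio_' /sys/kernel/debug/tracing/available_filter_functions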
Counting all "tcp_" kernel functions, and printing a summary of the top 5
every second:
# ./funccount -i 1 -t 5 'tcp_*'
Tracing "tcp_*". Top 5 only... Ctrl-C to end.
FUNC COUNT
tcp_cleanup_rbuf 386
tcp_service_net_dma 386
tcp_established_options 549
tcp_v4_md5_lookup 560
tcp_v4_md5_do_lookup 890
FUNC COUNT
tcp_service_net_dma 498
tcp_cleanup_rbuf 499
tcp_established_options 664
tcp_v4_md5_lookup 672
tcp_v4_md5_do_lookup 1071
[...]
Neat.
Tracing all "ext4*" kernel functions for 10 seconds, and printing the top 25:
# ./funccount -t 25 -d 10 'ext4*'
Tracing "ext4*" for 10 seconds. Top 25 only...
FUNC COUNT
ext4_inode_bitmap 840
ext4_meta_trans_blocks 840
ext4_ext_drop_refs 843
ext4_find_entry 845
ext4_discard_preallocations 1008
ext4_free_inodes_count 1120
ext4_group_desc_csum 1120
ext4_group_desc_csum_set 1120
ext4_getblk 1128
ext4_es_free_extent 1328
ext4_map_blocks 1471
ext4_es_lookup_extent 1751
ext4_mb_check_limits 1873
ext4_es_lru_add 2031
ext4_data_block_valid 2312
ext4_journal_check_start 3080
ext4_mark_inode_dirty 5320
ext4_get_inode_flags 5955
ext4_get_inode_loc 5955
ext4_mark_iloc_dirty 5955
ext4_reserve_inode_write 5955
ext4_inode_table 7076
ext4_get_group_desc 8476
ext4_has_inline_data 9492
ext4_inode_touch_time_cmp 38980
Ending tracing...
So ext4_inode_touch_time_cmp() was called the most frequently, at 38,980 times.
This may be normal, this may not. The purpose of this tool is to give you one
view of how one or many kernel functions are executed. Previously I had little
idea what ext4 was doing internally. Now I know the top 25 functions, and their
rate, and can begin researching them from the source code.
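For example, one quick (illustrative) way to find where that function lives,
assuming a kernel source tree is checked out in the current directory:
# grep -rn ext4_inode_touch_time_cmp fs/ext4/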
Use -h to print the USAGE message:
# ./funccount -h
USAGE: funccount [-hT] [-i secs] [-d secs] [-t top] funcstring
-d seconds # total duration of trace
-h # this usage message
-i seconds # interval summary
-t top # show top num entries only
-T # include timestamp (for -i)
eg,
funccount 'vfs*' # trace all funcs that match "vfs*"
funccount -d 5 'tcp*' # trace "tcp*" funcs for 5 seconds
funccount -t 10 'ext3*' # show top 10 "ext3*" funcs
funccount -i 1 'ext3*' # summary every 1 second
funccount -i 1 -d 5 'ext3*' # 5 x 1 second summaries
See the man page and example file for more info.
perf-tools-unstable-0.0.1~20150130+git85414b0/examples/funcgraph_example.txt
Demonstrations of funcgraph, the Linux ftrace version.
I'll start by showing do_nanosleep(), since it's usually a low frequency
function that can be easily triggered (run "vmstat 1"):
# ./funcgraph do_nanosleep
Tracing "do_nanosleep"... Ctrl-C to end.
0) | do_nanosleep() {
0) | hrtimer_start_range_ns() {
0) | __hrtimer_start_range_ns() {
0) | lock_hrtimer_base.isra.24() {
0) 0.198 us | _raw_spin_lock_irqsave();
0) 0.908 us | }
0) 0.061 us | idle_cpu();
0) 0.117 us | ktime_get();
0) 0.371 us | enqueue_hrtimer();
0) 0.075 us | _raw_spin_unlock_irqrestore();
0) 3.447 us | }
0) 3.998 us | }
0) | schedule() {
0) | __schedule() {
0) 0.050 us | rcu_note_context_switch();
0) 0.055 us | _raw_spin_lock_irq();
0) | deactivate_task() {
0) | dequeue_task() {
0) 0.142 us | update_rq_clock();
0) | dequeue_task_fair() {
0) | dequeue_entity() {
0) | update_curr() {
0) 0.086 us | cpuacct_charge();
0) 0.757 us | }
0) 0.052 us | clear_buddies();
0) 0.103 us | update_cfs_load();
0) | update_cfs_shares() {
0) | reweight_entity() {
0) 0.077 us | update_curr();
0) 0.438 us | }
0) 0.794 us | }
0) 3.067 us | }
0) 0.064 us | set_next_buddy();
0) 0.066 us | update_cfs_load();
0) 0.085 us | update_cfs_shares();
0) | hrtick_update() {
0) 0.063 us | hrtick_start_fair();
0) 0.367 us | }
0) 5.188 us | }
0) 5.923 us | }
0) 6.228 us | }
0) | put_prev_task_fair() {
0) 0.078 us | put_prev_entity();
0) | put_prev_entity() {
0) 0.070 us | update_curr();
0) 0.074 us | __enqueue_entity();
0) 0.737 us | }
0) 1.367 us | }
0) | pick_next_task_fair() {
0) | pick_next_entity() {
0) 0.052 us | wakeup_preempt_entity.isra.95();
0) 0.070 us | clear_buddies();
0) 0.676 us | }
0) | set_next_entity() {
0) 0.052 us | update_stats_wait_end();
0) 0.435 us | }
0) | pick_next_entity() {
0) 0.065 us | clear_buddies();
0) 0.376 us | }
0) | set_next_entity() {
0) 0.067 us | update_stats_wait_end();
0) 0.374 us | }
0) 0.051 us | hrtick_start_fair();
0) 3.879 us | }
0) 0.057 us | paravirt_start_context_switch();
0) | xen_load_sp0() {
0) 0.050 us | paravirt_get_lazy_mode();
0) 0.057 us | __xen_mc_entry();
0) 0.056 us | paravirt_get_lazy_mode();
0) 1.441 us | }
0) | xen_load_tls() {
0) 0.049 us | paravirt_get_lazy_mode();
0) 0.051 us | paravirt_get_lazy_mode();
0) | load_TLS_descriptor() {
0) | arbitrary_virt_to_machine() {
0) 0.081 us | __virt_addr_valid();
0) 0.052 us | __phys_addr();
0) 0.084 us | get_phys_to_machine();
0) 1.115 us | }
0) 0.053 us | __xen_mc_entry();
0) 1.744 us | }
0) | load_TLS_descriptor() {
0) | arbitrary_virt_to_machine() {
0) 0.053 us | __virt_addr_valid();
0) 0.056 us | __phys_addr();
0) 0.057 us | get_phys_to_machine();
0) 0.990 us | }
0) 0.053 us | __xen_mc_entry();
0) 1.583 us | } /* load_TLS_descriptor */
0) | load_TLS_descriptor() {
0) | arbitrary_virt_to_machine() {
0) 0.057 us | __virt_addr_valid();
0) 0.051 us | __phys_addr();
0) 0.053 us | get_phys_to_machine();
0) 0.978 us | }
0) 0.052 us | __xen_mc_entry();
0) 1.586 us | }
0) 0.052 us | paravirt_get_lazy_mode();
0) 6.630 us | }
0) | xen_end_context_switch() {
0) 0.666 us | xen_mc_flush();
0) 0.050 us | paravirt_end_context_switch();
0) 1.286 us | }
0) 0.172 us | xen_write_msr_safe();
------------------------------------------
0) platfor-3210 => vmstat-2854
------------------------------------------
0) | do_nanosleep() {
0) | hrtimer_start_range_ns() {
0) | __hrtimer_start_range_ns() {
0) | lock_hrtimer_base.isra.24() {
0) 0.217 us | _raw_spin_lock_irqsave();
0) 0.831 us | }
0) 0.066 us | idle_cpu();
0) 0.123 us | ktime_get();
0) 1.172 us | enqueue_hrtimer();
0) 0.089 us | _raw_spin_unlock_irqrestore();
0) 4.050 us | }
0) 4.523 us | }
[...]
The default output shows the function call graph, including all child kernel
functions, along with the function duration times. These times are printed on
either the return line for the function ("}"), or for leaf functions, on the
same line.
The format of this output is documented in the function graph section of the
kernel source file Documentation/trace/ftrace.txt.
This particular example shows the workings of do_nanosleep() in the first
dozen lines, after which schedule() is called to sleep this thread and run
another. The inner workings of schedule() are included in the output.
This output is great for determining the behavior of a certain kernel
function, and for identifying functions that can be studied in more detail
using other, lower-overhead tools (eg, funccount(8), functrace(8), kprobe(8)).
The overhead of funcgraph is moderate, since all kernel functions are traced
in case they are executed, and then included in the output if they are.
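For example, once a function of interest has been identified here, simply
counting its calls with funccount (shown earlier) is a much lighter-weight
follow-up. A sketch:
# ./funccount -d 10 do_nanosleep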
Now, if you want to start understanding the general behavior of the kernel,
without a certain kernel function in mind, you may be better off beginning
with CPU stack profiling using perf and generating a flame graph. Such an
approach has low overhead, as you are in control of the frequency of event
collection (eg, gathering CPU stacks at 99 Hertz). For instructions, see:
http://www.brendangregg.com/perf.html#FlameGraphs
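As a sketch of that approach (the 30 second duration is arbitrary, and the
stackcollapse-perf.pl and flamegraph.pl scripts come from the separate
FlameGraph repository):
# perf record -F 99 -a -g -- sleep 30
# perf script | ./stackcollapse-perf.pl | ./flamegraph.pl > out.svg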
On newer Linux kernels, you can use the -m option to limit the function
depth. Eg, 3 levels only:
# ./funcgraph -m 3 do_nanosleep
Tracing "do_nanosleep"... Ctrl-C to end.
1) | do_nanosleep() {
1) | hrtimer_start_range_ns() {
1) 1.115 us | __hrtimer_start_range_ns();
1) 1.919 us | }
1) | schedule() {
1) | __schedule() {
1) 1000131 us | }
1) 11.006 us | xen_evtchn_do_upcall();
1) 1000149 us | }
1) | hrtimer_cancel() {
1) 0.212 us | hrtimer_try_to_cancel();
1) 0.699 us | }
1) 1000154 us | }
Neat.
Now do_sys_open() to 3 levels:
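The invocation, following the previous example, would be along the lines of:
# ./funcgraph -m 3 do_sys_open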
0) | do_sys_open() {
0) | getname() {
0) 0.296 us | getname_flags();
0) 0.768 us | }
0) | get_unused_fd_flags() {
0) 0.397 us | __alloc_fd();
0) 0.827 us | }
0) | do_filp_open() {
0) 4.166 us | path_openat();
0) 4.617 us | }
0) | __fsnotify_parent() {
0) 0.083 us | dget_parent();
0) 0.063 us | dput();
0) 0.883 us | }
0) 0.058 us | fsnotify();
0) | fd_install() {
0) 0.133 us | __fd_install();
0) 0.525 us | }
0) | putname() {
0) 0.198 us | final_putname();
0) 0.512 us | }
0) 10.777 us | }
[...]
I can then pick the highest-latency child function, and run funcgraph again
with it as the target.
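For example, path_openat() had the largest duration in the output above, so a
next step (a sketch) could be:
# ./funcgraph -m 3 path_openat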
Without durations (-D elides them):
# ./funcgraph -Dm 3 do_sys_open
Tracing "do_sys_open"... Ctrl-C to end.
1) do_sys_open() {
1) getname() {
1) getname_flags();
1) }
1) get_unused_fd_flags() {
1) __alloc_fd();
1) }
1) do_filp_open() {
1) path_openat();
1) }
1) __fsnotify_parent();
1) fsnotify();
1) fd_install() {
1) __fd_install();
1) }
1) putname() {
1) final_putname();
1) }
1) }
Beautiful.
I could elide the CPU column as well, but I want to leave it: if it changes
half-way through some output, you know the CPU buffer has switched, and the
output may be shuffled.
For this example, I trace vfs_read() calls by process ID 5363, which is a
bash shell. I also include headers (-H) and absolute timestamps (-t). While
tracing, in that bash shell, I typed the word "hello":
# ./funcgraph -Htp 5363 vfs_read
Tracing "vfs_read" for PID 5363... Ctrl-C to end.
# tracer: function_graph
#
# TIME CPU DURATION FUNCTION CALLS
# | | | | | | | |
7238523.638008 | 0) | finish_task_switch() {
7238523.638012 | 0) | xen_evtchn_do_upcall() {
7238523.638012 | 0) | irq_enter() {
7238523.638013 | 0) 0.153 us | rcu_irq_enter();
7238523.638014 | 0) 1.144 us | }
7238523.638014 | 0) 0.056 us | exit_idle();
7238523.638014 | 0) | __xen_evtchn_do_upcall() {
7238523.638015 | 0) | evtchn_2l_handle_events() {
7238523.638015 | 0) 0.057 us | irq_from_virq();
7238523.638015 | 0) | evtchn_from_irq() {
7238523.638015 | 0) | irq_get_irq_data() {
7238523.638016 | 0) 0.058 us | irq_to_desc();
7238523.638016 | 0) 0.565 us | }
7238523.638016 | 0) 0.966 us | }
7238523.638016 | 0) | get_evtchn_to_irq() {
7238523.638017 | 0) 0.050 us | evtchn_2l_max_channels();
7238523.638017 | 0) 0.386 us | }
7238523.638017 | 0) | generic_handle_irq() {
7238523.638017 | 0) 0.058 us | irq_to_desc();
7238523.638018 | 0) | handle_percpu_irq() {
7238523.638018 | 0) | ack_dynirq() {
7238523.638018 | 0) | evtchn_from_irq() {
7238523.638018 | 0) | irq_get_irq_data() {
7238523.638019 | 0) 0.049 us | irq_to_desc();
7238523.638019 | 0) 0.441 us | }
7238523.638019 | 0) 0.772 us | }
7238523.638019 | 0) 0.049 us | irq_move_irq();
7238523.638020 | 0) 0.060 us | evtchn_2l_clear_pending();
7238523.638020 | 0) 1.810 us | }
7238523.638020 | 0) | handle_irq_event_percpu() {
7238523.638020 | 0) | xen_irq_work_interrupt() {
7238523.638021 | 0) | irq_enter() {
7238523.638021 | 0) 0.056 us | rcu_irq_enter();
7238523.638021 | 0) 0.384 us | }
7238523.638021 | 0) | __wake_up() {
7238523.638022 | 0) 0.059 us | _raw_spin_lock_irqsave();
7238523.638022 | 0) | __wake_up_common() {
7238523.638022 | 0) | autoremove_wake_function() {
7238523.638023 | 0) | default_wake_function() {
7238523.638023 | 0) | try_to_wake_up() {
7238523.638023 | 0) 0.220 us | _raw_spin_lock_irqsave();
7238523.638024 | 0) 0.270 us | task_waking_fair();
7238523.638024 | 0) | select_task_rq_fair() {
7238523.638025 | 0) 0.055 us | source_load();
7238523.638025 | 0) 0.056 us | target_load();
7238523.638025 | 0) 0.060 us | idle_cpu();
7238523.638026 | 0) 0.054 us | cpus_share_cache();
7238523.638026 | 0) 0.083 us | idle_cpu();
7238523.638026 | 0) 2.060 us | }
7238523.638027 | 0) 0.051 us | _raw_spin_lock();
7238523.638027 | 0) | ttwu_do_activate.constprop.124() {
7238523.638027 | 0) | activate_task() {
7238523.638027 | 0) | enqueue_task() {
7238523.638028 | 0) 0.120 us | update_rq_clock();
7238523.638028 | 0) | enqueue_task_fair() {
7238523.638028 | 0) | enqueue_entity() {
7238523.638028 | 0) 0.147 us | update_curr();
7238523.638029 | 0) 0.055 us | __compute_runnable_contrib.part.51();
7238523.638029 | 0) 0.066 us | __update_entity_load_avg_contrib();
7238523.638029 | 0) 0.141 us | update_cfs_rq_blocked_load();
7238523.638030 | 0) 0.068 us | account_entity_enqueue();
7238523.638030 | 0) 0.351 us | update_cfs_shares();
7238523.638031 | 0) 0.053 us | place_entity();
7238523.638031 | 0) 0.082 us | __enqueue_entity();
7238523.638032 | 0) 0.050 us | update_cfs_rq_blocked_load();
7238523.638032 | 0) 3.922 us | }
7238523.638032 | 0) | enqueue_entity() {
7238523.638033 | 0) 0.058 us | update_curr();
7238523.638033 | 0) 0.056 us | __compute_runnable_contrib.part.51();
7238523.638033 | 0) 0.078 us | __update_entity_load_avg_contrib();
7238523.638034 | 0) 0.055 us | update_cfs_rq_blocked_load();
7238523.638034 | 0) 0.064 us | account_entity_enqueue();
7238523.638034 | 0) 0.059 us | update_cfs_shares();
7238523.638035 | 0) 0.050 us | place_entity();
7238523.638036 | 0) 0.057 us | __enqueue_entity();
7238523.638036 | 0) 3.829 us | }
7238523.638037 | 0) 0.057 us | hrtick_update();
7238523.638037 | 0) 8.876 us | }
7238523.638037 | 0) 9.698 us | }
7238523.638037 | 0) 10.113 us | }
7238523.638038 | 0) | ttwu_do_wakeup() {
7238523.638038 | 0) | check_preempt_curr() {
7238523.638038 | 0) | resched_task() {
7238523.638038 | 0) | xen_smp_send_reschedule() {
7238523.638038 | 0) | xen_send_IPI_one() {
7238523.638039 | 0) | notify_remote_via_irq() {
7238523.638039 | 0) | evtchn_from_irq() {
7238523.638039 | 0) | irq_get_irq_data() {
7238523.638039 | 0) 0.051 us | irq_to_desc();
7238523.638039 | 0) 0.518 us | }
7238523.638040 | 0) 0.955 us | }
7238523.638041 | 0) 2.001 us | }
7238523.638041 | 0) 2.391 us | }
7238523.638041 | 0) 2.745 us | }
7238523.638041 | 0) 3.183 us | }
7238523.638042 | 0) 3.663 us | }
7238523.638042 | 0) 4.621 us | }
7238523.638043 | 0) 15.443 us | }
7238523.638043 | 0) 0.067 us | _raw_spin_unlock();
7238523.638043 | 0) 0.167 us | ttwu_stat();
7238523.638044 | 0) 0.087 us | _raw_spin_unlock_irqrestore();
7238523.638044 | 0) 21.447 us | }
7238523.638045 | 0) 21.940 us | }
7238523.638045 | 0) 22.406 us | }
7238523.638045 | 0) 23.071 us | }
7238523.638045 | 0) 0.073 us | _raw_spin_unlock_irqrestore();
7238523.638046 | 0) 24.382 us | }
7238523.638046 | 0) | irq_exit() {
7238523.638047 | 0) 0.085 us | idle_cpu();
7238523.638047 | 0) 0.093 us | rcu_irq_exit();
7238523.638048 | 0) 1.242 us | }
7238523.638048 | 0) 27.410 us | }
7238523.638049 | 0) 0.139 us | add_interrupt_randomness();
7238523.638049 | 0) 0.089 us | note_interrupt();
7238523.638050 | 0) 29.582 us | }
7238523.638050 | 0) 32.112 us | }
7238523.638050 | 0) 32.951 us | }
7238523.638051 | 0) 35.765 us | }
7238523.638051 | 0) 36.170 us | }
7238523.638051 | 0) | irq_exit() {
7238523.638051 | 0) 0.082 us | idle_cpu();
7238523.638052 | 0) 0.071 us | rcu_irq_exit();
7238523.638053 | 0) 1.328 us | }
7238523.638053 | 0) 40.563 us | }
7238523.638054 | 0) | __mmdrop() {
7238523.638054 | 0) | pgd_free() {
7238523.638055 | 0) 0.151 us | _raw_spin_lock();
7238523.638055 | 0) 0.069 us | _raw_spin_unlock();
7238523.638056 | 0) | xen_pgd_free() {
7238523.638056 | 0) 0.067 us | xen_get_user_pgd();
7238523.638057 | 0) | free_pages() {
7238523.638057 | 0) | __free_pages() {
7238523.638057 | 0) | free_hot_cold_page() {
7238523.638058 | 0) 0.080 us | free_pages_prepare();
7238523.638058 | 0) 0.363 us | get_pfnblock_flags_mask();
7238523.638059 | 0) 1.626 us | }
7238523.638059 | 0) 2.317 us | }
7238523.638060 | 0) 2.847 us | }
7238523.638060 | 0) 3.908 us | }
7238523.638060 | 0) | free_pages() {
7238523.638060 | 0) | __free_pages() {
7238523.638061 | 0) | free_hot_cold_page() {
7238523.638061 | 0) 0.083 us | free_pages_prepare();
7238523.638061 | 0) 0.139 us | get_pfnblock_flags_mask();
7238523.638062 | 0) 1.062 us | }
7238523.638062 | 0) 1.534 us | }
7238523.638062 | 0) 2.038 us | }
7238523.638063 | 0) 8.268 us | }
7238523.638064 | 0) 0.160 us | destroy_context();
7238523.638065 | 0) 0.384 us | kmem_cache_free();
7238523.638066 | 0) 11.433 us | }
7238523.638066 | 0) 54.448 us | }
7238523.638066 | 0) 19354026 us | } /* __schedule */
7238523.638067 | 0) 19354026 us | } /* schedule */
7238523.638067 | 0) 19354027 us | } /* schedule_timeout */
7238523.638067 | 0) 0.121 us | down_read();
7238523.638068 | 0) | copy_from_read_buf() {
7238523.638069 | 0) | tty_audit_add_data() {
7238523.638070 | 0) 0.220 us | _raw_spin_lock_irqsave();
7238523.638071 | 0) 0.097 us | _raw_spin_unlock_irqrestore();
7238523.638071 | 0) 0.078 us | _raw_spin_lock_irqsave();
7238523.638072 | 0) 0.077 us | _raw_spin_unlock_irqrestore();
7238523.638072 | 0) 2.795 us | }
7238523.638073 | 0) 4.183 us | }
7238523.638073 | 0) 0.084 us | copy_from_read_buf();
7238523.638074 | 0) 0.078 us | n_tty_set_room();
7238523.638074 | 0) 0.082 us | n_tty_write_wakeup();
7238523.638075 | 0) | __wake_up() {
7238523.638075 | 0) 0.084 us | _raw_spin_lock_irqsave();
7238523.638076 | 0) | __wake_up_common() {
7238523.638076 | 0) 0.095 us | pollwake();
7238523.638077 | 0) 0.819 us | }
7238523.638077 | 0) 0.074 us | _raw_spin_unlock_irqrestore();
7238523.638078 | 0) 2.463 us | }
7238523.638078 | 0) 0.071 us | n_tty_set_room();
7238523.638078 | 0) 0.082 us | up_read();
7238523.638079 | 0) | remove_wait_queue() {
7238523.638079 | 0) 0.082 us | _raw_spin_lock_irqsave();
7238523.638080 | 0) 0.086 us | _raw_spin_unlock_irqrestore();
7238523.638080 | 0) 1.239 us | }
7238523.638081 | 0) 0.142 us | mutex_unlock();
7238523.638081 | 0) 19354047 us | } /* n_tty_read */
7238523.638082 | 0) | tty_ldisc_deref() {
7238523.638082 | 0) 0.064 us | ldsem_up_read();
7238523.638082 | 0) 0.554 us | }
7238523.638083 | 0) 0.074 us | get_seconds();
7238523.638083 | 0) 19354052 us | } /* tty_read */
7238523.638084 | 0) 0.352 us | __fsnotify_parent();
7238523.638085 | 0) 0.178 us | fsnotify();
7238523.638085 | 0) 19354058 us | } /* vfs_read */
7238523.638156 | 0) | vfs_read() {
7238523.638157 | 0) | rw_verify_area() {
7238523.638157 | 0) | security_file_permission() {
7238523.638158 | 0) | apparmor_file_permission() {
7238523.638158 | 0) 0.183 us | common_file_perm();
7238523.638159 | 0) 0.778 us | }
7238523.638159 | 0) 0.081 us | __fsnotify_parent();
7238523.638160 | 0) 0.104 us | fsnotify();
7238523.638160 | 0) 2.662 us | }
7238523.638161 | 0) 3.337 us | }
7238523.638161 | 0) | tty_read() {
7238523.638161 | 0) 0.067 us | tty_paranoia_check();
7238523.638162 | 0) | tty_ldisc_ref_wait() {
7238523.638162 | 0) 0.080 us | } /* ldsem_down_read */
7238523.638163 | 0) 0.637 us | }
7238523.638163 | 0) | n_tty_read() {
7238523.638164 | 0) 0.078 us | _raw_spin_lock_irq();
7238523.638164 | 0) 0.090 us | mutex_lock_interruptible();
7238523.638165 | 0) 0.078 us | down_read();
7238523.638165 | 0) | add_wait_queue() {
7238523.638166 | 0) 0.070 us | _raw_spin_lock_irqsave();
7238523.638166 | 0) 0.084 us | _raw_spin_unlock_irqrestore();
7238523.638167 | 0) 1.111 us | }
7238523.638167 | 0) 0.083 us | tty_hung_up_p();
7238523.638168 | 0) 0.080 us | n_tty_set_room();
7238523.638169 | 0) 0.068 us | up_read();
7238523.638169 | 0) | schedule_timeout() {
7238523.638170 | 0) | schedule() {
7238523.638170 | 0) | __schedule() {
7238523.638171 | 0) 0.078 us | rcu_note_context_switch();
7238523.638171 | 0) 0.081 us | _raw_spin_lock_irq();
7238523.638172 | 0) | deactivate_task() {
7238523.638172 | 0) | dequeue_task() {
7238523.638172 | 0) 0.181 us | update_rq_clock();
7238523.638173 | 0) | dequeue_task_fair() {
7238523.638174 | 0) | dequeue_entity() {
7238523.638174 | 0) | update_curr() {
7238523.638174 | 0) 0.257 us | cpuacct_charge();
7238523.638175 | 0) 0.982 us | }
7238523.638175 | 0) 0.079 us | update_cfs_rq_blocked_load();
7238523.638176 | 0) 0.080 us | clear_buddies();
7238523.638177 | 0) 0.096 us | account_entity_dequeue();
7238523.638177 | 0) | update_cfs_shares() {
7238523.638178 | 0) 0.113 us | update_curr();
7238523.638178 | 0) 0.087 us | account_entity_dequeue();
7238523.638179 | 0) 0.073 us | account_entity_enqueue();
7238523.638179 | 0) 1.948 us | }
7238523.638180 | 0) 5.913 us | }
7238523.638180 | 0) | dequeue_entity() {
7238523.638180 | 0) 0.086 us | update_curr();
7238523.638181 | 0) 0.079 us | update_cfs_rq_blocked_load();
7238523.638182 | 0) 0.076 us | clear_buddies();
7238523.638182 | 0) 0.076 us | account_entity_dequeue();
7238523.638183 | 0) 0.104 us | update_cfs_shares();
7238523.638183 | 0) 3.171 us | }
7238523.638184 | 0) 0.076 us | hrtick_update();
7238523.638184 | 0) 10.785 us | }
7238523.638184 | 0) 12.057 us | }
7238523.638185 | 0) 12.704 us | }
7238523.638185 | 0) | pick_next_task_fair() {
7238523.638185 | 0) 0.074 us | check_cfs_rq_runtime();
7238523.638186 | 0) | pick_next_entity() {
7238523.638186 | 0) 0.067 us | clear_buddies();
7238523.638187 | 0) 0.544 us | }
7238523.638187 | 0) | put_prev_entity() {
7238523.638187 | 0) 0.079 us | check_cfs_rq_runtime();
7238523.638188 | 0) 0.612 us | }
7238523.638188 | 0) | put_prev_entity() {
7238523.638188 | 0) 0.076 us | check_cfs_rq_runtime();
7238523.638189 | 0) 0.618 us | }
7238523.638189 | 0) | set_next_entity() {
7238523.638190 | 0) 0.078 us | update_stats_wait_end();
7238523.638190 | 0) 0.712 us | }
7238523.638190 | 0) 5.023 us | }
7238523.638191 | 0) 0.086 us | paravirt_start_context_switch();
7238523.638192 | 0) 0.070 us | xen_read_cr0();
7238523.638193 | 0) | xen_write_cr0() {
7238523.638193 | 0) 0.085 us | paravirt_get_lazy_mode();
7238523.638194 | 0) 0.085 us | __xen_mc_entry();
7238523.638194 | 0) 0.077 us | paravirt_get_lazy_mode();
7238523.638195 | 0) 1.822 us | }
7238523.638195 | 0) | xen_load_sp0() {
7238523.638195 | 0) 0.074 us | paravirt_get_lazy_mode();
7238523.638196 | 0) 0.085 us | __xen_mc_entry();
7238523.638196 | 0) 0.078 us | paravirt_get_lazy_mode();
7238523.638197 | 0) 1.754 us | }
7238523.638197 | 0) | xen_load_tls() {
7238523.638198 | 0) 0.069 us | paravirt_get_lazy_mode();
7238523.638198 | 0) 0.082 us | paravirt_get_lazy_mode();
7238523.638199 | 0) 0.127 us | load_TLS_descriptor();
7238523.638199 | 0) 0.080 us | load_TLS_descriptor();
7238523.638200 | 0) 0.094 us | load_TLS_descriptor();
7238523.638201 | 0) 0.081 us | paravirt_get_lazy_mode();
7238523.638202 | 0) 4.155 us | }
7238523.638202 | 0) | xen_end_context_switch() {
7238523.638202 | 0) 0.699 us | xen_mc_flush();
7238523.638204 | 0) 0.089 us | paravirt_end_context_switch();
7238523.638204 | 0) 1.915 us | }
7238523.797630 | 0) | finish_task_switch() {
7238523.797634 | 0) | xen_evtchn_do_upcall() {
7238523.797634 | 0) | irq_enter() {
7238523.797634 | 0) 0.134 us | rcu_irq_enter();
7238523.797635 | 0) 0.688 us | }
7238523.797635 | 0) 0.055 us | exit_idle();
7238523.797635 | 0) | __xen_evtchn_do_upcall() {
7238523.797636 | 0) | evtchn_2l_handle_events() {
7238523.797636 | 0) 0.048 us | irq_from_virq();
7238523.797636 | 0) | evtchn_from_irq() {
7238523.797636 | 0) | irq_get_irq_data() {
7238523.797637 | 0) 0.061 us | irq_to_desc();
7238523.797637 | 0) 0.564 us | }
7238523.797637 | 0) 0.954 us | }
7238523.797638 | 0) | get_evtchn_to_irq() {
7238523.797638 | 0) 0.057 us | evtchn_2l_max_channels();
7238523.797638 | 0) 0.409 us | }
7238523.797638 | 0) | generic_handle_irq() {
7238523.797638 | 0) 0.052 us | irq_to_desc();
7238523.797639 | 0) | handle_percpu_irq() {
7238523.797639 | 0) | ack_dynirq() {
7238523.797639 | 0) | evtchn_from_irq() {
7238523.797639 | 0) | irq_get_irq_data() {
7238523.797640 | 0) 0.057 us | irq_to_desc();
7238523.797640 | 0) 0.440 us | }
7238523.797640 | 0) 0.746 us | }
7238523.797640 | 0) 0.056 us | irq_move_irq();
7238523.797641 | 0) 0.058 us | evtchn_2l_clear_pending();
7238523.797641 | 0) 1.729 us | }
7238523.797641 | 0) | handle_irq_event_percpu() {
7238523.797641 | 0) | xen_irq_work_interrupt() {
7238523.797642 | 0) | irq_enter() {
7238523.797642 | 0) 0.053 us | rcu_irq_enter();
7238523.797642 | 0) 0.396 us | }
7238523.797642 | 0) | __wake_up() {
7238523.797643 | 0) 0.053 us | _raw_spin_lock_irqsave();
7238523.797643 | 0) | __wake_up_common() {
7238523.797643 | 0) | autoremove_wake_function() {
7238523.797644 | 0) | default_wake_function() {
7238523.797644 | 0) | try_to_wake_up() {
7238523.797644 | 0) 0.228 us | _raw_spin_lock_irqsave();
7238523.797645 | 0) 0.194 us | task_waking_fair();
7238523.797645 | 0) | select_task_rq_fair() {
7238523.797645 | 0) 0.051 us | source_load();
7238523.797646 | 0) 0.050 us | target_load();
7238523.797646 | 0) 0.067 us | idle_cpu();
7238523.797647 | 0) 0.050 us | cpus_share_cache();
7238523.797647 | 0) 0.068 us | idle_cpu();
7238523.797647 | 0) 1.983 us | }
7238523.797648 | 0) 0.051 us | _raw_spin_lock();
7238523.797648 | 0) | ttwu_do_activate.constprop.124() {
7238523.797648 | 0) | activate_task() {
7238523.797648 | 0) | enqueue_task() {
7238523.797648 | 0) 0.135 us | update_rq_clock();
7238523.797649 | 0) | enqueue_task_fair() {
7238523.797649 | 0) | enqueue_entity() {
7238523.797649 | 0) 0.059 us | update_curr();
7238523.797650 | 0) 0.073 us | __compute_runnable_contrib.part.51();
7238523.797650 | 0) 0.066 us | __update_entity_load_avg_contrib();
7238523.797650 | 0) 0.059 us | update_cfs_rq_blocked_load();
7238523.797651 | 0) 0.064 us | account_entity_enqueue();
7238523.797651 | 0) 0.137 us | update_cfs_shares();
7238523.797651 | 0) 0.054 us | place_entity();
7238523.797652 | 0) 0.074 us | __enqueue_entity();
7238523.797652 | 0) 3.085 us | }
7238523.797652 | 0) | enqueue_entity() {
7238523.797653 | 0) 0.058 us | update_curr();
7238523.797654 | 0) 0.049 us | update_cfs_rq_blocked_load();
7238523.797654 | 0) 0.057 us | account_entity_enqueue();
7238523.797655 | 0) 0.066 us | update_cfs_shares();
7238523.797655 | 0) 0.049 us | place_entity();
7238523.797655 | 0) 0.051 us | __enqueue_entity();
7238523.797656 | 0) 3.432 us | }
7238523.797656 | 0) 0.049 us | hrtick_update();
7238523.797657 | 0) 7.552 us | }
7238523.797657 | 0) 8.414 us | }
7238523.797657 | 0) 8.753 us | }
7238523.797657 | 0) | ttwu_do_wakeup() {
7238523.797657 | 0) | check_preempt_curr() {
7238523.797657 | 0) | resched_task() {
7238523.797658 | 0) | xen_smp_send_reschedule() {
7238523.797658 | 0) | xen_send_IPI_one() {
7238523.797658 | 0) | notify_remote_via_irq() {
7238523.797658 | 0) | evtchn_from_irq() {
7238523.797658 | 0) | irq_get_irq_data() {
7238523.797659 | 0) 0.069 us | irq_to_desc();
7238523.797659 | 0) 0.504 us | }
7238523.797659 | 0) 0.869 us | }
7238523.797660 | 0) 1.940 us | } /* notify_remote_via_irq */
7238523.797660 | 0) 2.319 us | }
7238523.797660 | 0) 2.712 us | }
7238523.797661 | 0) 3.147 us | }
7238523.797661 | 0) 3.625 us | }
7238523.797662 | 0) 4.525 us | }
7238523.797662 | 0) 13.961 us | }
7238523.797662 | 0) 0.069 us | _raw_spin_unlock();
7238523.797663 | 0) 0.168 us | ttwu_stat();
7238523.797663 | 0) 0.076 us | _raw_spin_unlock_irqrestore();
7238523.797664 | 0) 19.821 us | }
7238523.797664 | 0) 20.301 us | }
7238523.797664 | 0) 20.796 us | }
7238523.797664 | 0) 21.367 us | }
7238523.797665 | 0) 0.071 us | _raw_spin_unlock_irqrestore();
7238523.797665 | 0) 22.621 us | }
7238523.797666 | 0) | irq_exit() {
7238523.797666 | 0) 0.085 us | idle_cpu();
7238523.797666 | 0) 0.106 us | rcu_irq_exit();
7238523.797667 | 0) 1.220 us | }
7238523.797667 | 0) 25.712 us | }
7238523.797668 | 0) 0.138 us | add_interrupt_randomness();
7238523.797668 | 0) 0.092 us | note_interrupt();
7238523.797669 | 0) 27.713 us | }
7238523.797669 | 0) 30.163 us | }
7238523.797669 | 0) 31.017 us | }
7238523.797670 | 0) 33.953 us | }
7238523.797670 | 0) 34.384 us | }
7238523.797670 | 0) | irq_exit() {
7238523.797671 | 0) 0.079 us | idle_cpu();
7238523.797671 | 0) 0.072 us | rcu_irq_exit();
7238523.797672 | 0) 1.023 us | }
7238523.797672 | 0) 37.789 us | }
7238523.797672 | 0) 39.298 us | }
7238523.797673 | 0) 159502.1 us | }
7238523.797673 | 0) 159502.8 us | }
7238523.797673 | 0) 159503.5 us | }
7238523.797674 | 0) 0.112 us | down_read();
7238523.797675 | 0) | copy_from_read_buf() {
7238523.797676 | 0) | tty_audit_add_data() {
7238523.797676 | 0) 0.226 us | _raw_spin_lock_irqsave();
7238523.797677 | 0) 0.075 us | _raw_spin_unlock_irqrestore();
7238523.797677 | 0) 0.101 us | _raw_spin_lock_irqsave();
7238523.797678 | 0) 0.068 us | _raw_spin_unlock_irqrestore();
7238523.797679 | 0) 2.656 us | }
7238523.797679 | 0) 3.762 us | }
7238523.797679 | 0) 0.145 us | copy_from_read_buf();
7238523.797680 | 0) 0.068 us | n_tty_set_room();
7238523.797680 | 0) 0.058 us | n_tty_write_wakeup();
7238523.797681 | 0) | __wake_up() {
7238523.797682 | 0) 0.060 us | _raw_spin_lock_irqsave();
7238523.797682 | 0) | __wake_up_common() {
7238523.797683 | 0) 0.083 us | pollwake();
7238523.797683 | 0) 0.739 us | }
7238523.797683 | 0) 0.069 us | _raw_spin_unlock_irqrestore();
7238523.797684 | 0) 2.745 us | }
7238523.797684 | 0) 0.061 us | n_tty_set_room();
7238523.797685 | 0) 0.074 us | up_read();
7238523.797685 | 0) | remove_wait_queue() {
7238523.797685 | 0) 0.075 us | _raw_spin_lock_irqsave();
7238523.797686 | 0) 0.070 us | _raw_spin_unlock_irqrestore();
7238523.797686 | 0) 1.110 us | }
7238523.797687 | 0) 0.146 us | mutex_unlock();
7238523.797687 | 0) 159524.0 us | }
7238523.797688 | 0) | tty_ldisc_deref() {
7238523.797688 | 0) 0.070 us | ldsem_up_read();
7238523.797689 | 0) 0.739 us | }
7238523.797689 | 0) 0.066 us | get_seconds();
7238523.797690 | 0) 159528.3 us | }
7238523.797690 | 0) 0.298 us | __fsnotify_parent();
7238523.797691 | 0) 0.179 us | fsnotify();
7238523.797692 | 0) 159534.6 us | }
7238523.797762 | 0) | vfs_read() {
7238523.797763 | 0) | rw_verify_area() {
7238523.797763 | 0) | security_file_permission() {
7238523.797764 | 0) | apparmor_file_permission() {
7238523.797764 | 0) 0.165 us | common_file_perm();
7238523.797765 | 0) 0.732 us | }
7238523.797765 | 0) 0.081 us | __fsnotify_parent();
7238523.797766 | 0) 0.094 us | fsnotify();
7238523.797766 | 0) 2.711 us | }
7238523.797767 | 0) 3.386 us | }
7238523.797767 | 0) | tty_read() {
7238523.797767 | 0) 0.077 us | tty_paranoia_check();
7238523.797768 | 0) | tty_ldisc_ref_wait() {
7238523.797768 | 0) 0.083 us | ldsem_down_read();
7238523.797769 | 0) 0.686 us | }
7238523.797769 | 0) | n_tty_read() {
7238523.797770 | 0) 0.071 us | _raw_spin_lock_irq();
7238523.797770 | 0) 0.111 us | mutex_lock_interruptible();
7238523.797771 | 0) 0.072 us | down_read();
7238523.797771 | 0) | add_wait_queue() {
7238523.797772 | 0) 0.083 us | _raw_spin_lock_irqsave();
7238523.797772 | 0) 0.085 us | _raw_spin_unlock_irqrestore();
7238523.797773 | 0) 1.124 us | }
7238523.797773 | 0) 0.066 us | tty_hung_up_p();
7238523.797774 | 0) 0.090 us | n_tty_set_room();
7238523.797774 | 0) 0.064 us | up_read();
7238523.797775 | 0) | schedule_timeout() {
7238523.797775 | 0) | schedule() {
7238523.797775 | 0) | __schedule() {
7238523.797776 | 0) 0.083 us | rcu_note_context_switch();
7238523.797776 | 0) 0.078 us | _raw_spin_lock_irq();
7238523.797777 | 0) | deactivate_task() {
7238523.797777 | 0) | dequeue_task() {
7238523.797777 | 0) 0.191 us | update_rq_clock();
7238523.797778 | 0) | dequeue_task_fair() {
7238523.797778 | 0) | dequeue_entity() {
7238523.797779 | 0) | update_curr() {
7238523.797779 | 0) 0.179 us | cpuacct_charge();
7238523.797780 | 0) 0.902 us | }
7238523.797780 | 0) 0.070 us | __update_entity_load_avg_contrib();
7238523.797781 | 0) 0.152 us | update_cfs_rq_blocked_load();
7238523.797781 | 0) 0.073 us | clear_buddies();
7238523.797782 | 0) 0.074 us | account_entity_dequeue();
7238523.797783 | 0) | update_cfs_shares() {
7238523.797783 | 0) 0.111 us | update_curr();
7238523.797783 | 0) 0.082 us | account_entity_dequeue();
7238523.797784 | 0) 0.081 us | account_entity_enqueue();
7238523.797785 | 0) 2.330 us | }
7238523.797785 | 0) 6.633 us | } /* dequeue_entity */
7238523.797786 | 0) | dequeue_entity() {
7238523.797786 | 0) 0.078 us | update_curr();
7238523.797787 | 0) 0.086 us | update_cfs_rq_blocked_load();
7238523.797787 | 0) 0.076 us | clear_buddies();
7238523.797788 | 0) 0.079 us | account_entity_dequeue();
7238523.797789 | 0) 0.074 us | update_cfs_shares();
7238523.797789 | 0) 3.287 us | }
7238523.797789 | 0) 0.074 us | hrtick_update();
7238523.797790 | 0) 11.606 us | }
7238523.797790 | 0) 12.879 us | }
7238523.797790 | 0) 13.406 us | }
7238523.797791 | 0) | pick_next_task_fair() {
7238523.797791 | 0) 0.073 us | check_cfs_rq_runtime();
7238523.797792 | 0) | pick_next_entity() {
7238523.797792 | 0) 0.076 us | clear_buddies();
7238523.797793 | 0) 0.663 us | }
7238523.797793 | 0) | put_prev_entity() {
7238523.797793 | 0) 0.076 us | check_cfs_rq_runtime();
7238523.797794 | 0) 0.598 us | }
7238523.797794 | 0) | put_prev_entity() {
7238523.797794 | 0) 0.078 us | check_cfs_rq_runtime();
7238523.797795 | 0) 0.618 us | }
7238523.797795 | 0) | set_next_entity() {
7238523.797795 | 0) 0.096 us | update_stats_wait_end();
7238523.797796 | 0) 0.738 us | }
7238523.797796 | 0) 5.222 us | }
7238523.797797 | 0) 0.078 us | paravirt_start_context_switch();
7238523.797798 | 0) 0.071 us | xen_read_cr0();
7238523.797799 | 0) | xen_write_cr0() {
7238523.797799 | 0) 0.078 us | paravirt_get_lazy_mode();
7238523.797800 | 0) 0.084 us | __xen_mc_entry();
7238523.797800 | 0) 0.076 us | paravirt_get_lazy_mode();
7238523.797801 | 0) 1.798 us | }
7238523.797801 | 0) | xen_load_sp0() {
7238523.797801 | 0) 0.080 us | paravirt_get_lazy_mode();
7238523.797802 | 0) 0.076 us | __xen_mc_entry();
7238523.797802 | 0) 0.073 us | paravirt_get_lazy_mode();
7238523.797803 | 0) 1.623 us | }
7238523.797803 | 0) | xen_load_tls() {
7238523.797803 | 0) 0.082 us | paravirt_get_lazy_mode();
7238523.797804 | 0) 0.084 us | paravirt_get_lazy_mode();
7238523.797804 | 0) 0.136 us | load_TLS_descriptor();
7238523.797805 | 0) 0.072 us | load_TLS_descriptor();
7238523.797806 | 0) 0.080 us | load_TLS_descriptor();
7238523.797806 | 0) 0.088 us | paravirt_get_lazy_mode();
7238523.797807 | 0) 3.360 us | }
7238523.797807 | 0) | xen_end_context_switch() {
7238523.797807 | 0) 0.601 us | xen_mc_flush();
7238523.797808 | 0) 0.098 us | paravirt_end_context_switch();
7238523.797809 | 0) 1.902 us | }
7238524.005649 | 0) | finish_task_switch() {
7238524.005653 | 0) | xen_evtchn_do_upcall() {
7238524.005653 | 0) | irq_enter() {
7238524.005653 | 0) 0.138 us | rcu_irq_enter();
7238524.005654 | 0) 0.753 us | }
7238524.005654 | 0) 0.056 us | exit_idle();
7238524.005655 | 0) | __xen_evtchn_do_upcall() {
7238524.005655 | 0) | evtchn_2l_handle_events() {
7238524.005655 | 0) 0.057 us | irq_from_virq();
7238524.005656 | 0) | evtchn_from_irq() {
7238524.005656 | 0) | irq_get_irq_data() {
7238524.005656 | 0) 0.050 us | irq_to_desc();
7238524.005656 | 0) 0.499 us | }
7238524.005657 | 0) 0.958 us | }
7238524.005657 | 0) | get_evtchn_to_irq() {
7238524.005657 | 0) 0.057 us | evtchn_2l_max_channels();
7238524.005658 | 0) 0.400 us | }
7238524.005659 | 0) | generic_handle_irq() {
7238524.005659 | 0) 0.052 us | irq_to_desc();
7238524.005659 | 0) | handle_percpu_irq() {
7238524.005659 | 0) | ack_dynirq() {
7238524.005659 | 0) | evtchn_from_irq() {
7238524.005660 | 0) | irq_get_irq_data() {
7238524.005660 | 0) 0.056 us | irq_to_desc();
7238524.005660 | 0) 0.439 us | }
7238524.005660 | 0) 0.739 us | }
7238524.005661 | 0) 0.051 us | irq_move_irq();
7238524.005661 | 0) 0.051 us | evtchn_2l_clear_pending();
7238524.005661 | 0) 1.963 us | }
7238524.005662 | 0) | handle_irq_event_percpu() {
7238524.005662 | 0) | xen_irq_work_interrupt() {
7238524.005662 | 0) | irq_enter() {
7238524.005662 | 0) 0.053 us | rcu_irq_enter();
7238524.005663 | 0) 0.392 us | }
7238524.005663 | 0) | __wake_up() {
7238524.005663 | 0) 0.058 us | _raw_spin_lock_irqsave();
7238524.005664 | 0) | __wake_up_common() {
7238524.005664 | 0) | autoremove_wake_function() {
7238524.005664 | 0) | default_wake_function() {
7238524.005665 | 0) | try_to_wake_up() {
7238524.005665 | 0) 0.226 us | _raw_spin_lock_irqsave();
7238524.005665 | 0) 0.392 us | task_waking_fair();
7238524.005666 | 0) | select_task_rq_fair() {
7238524.005666 | 0) 0.067 us | source_load();
7238524.005667 | 0) 0.057 us | target_load();
7238524.005667 | 0) 0.065 us | idle_cpu();
7238524.005668 | 0) 0.050 us | cpus_share_cache();
7238524.005668 | 0) 0.080 us | idle_cpu();
7238524.005668 | 0) 2.053 us | }
7238524.005669 | 0) 0.051 us | _raw_spin_lock();
7238524.005669 | 0) | ttwu_do_activate.constprop.124() {
7238524.005669 | 0) | activate_task() {
7238524.005669 | 0) | enqueue_task() {
7238524.005669 | 0) 0.165 us | update_rq_clock();
7238524.005670 | 0) | enqueue_task_fair() {
7238524.005670 | 0) | enqueue_entity() {
7238524.005670 | 0) 0.065 us | update_curr();
7238524.005671 | 0) 0.078 us | __compute_runnable_contrib.part.51();
7238524.005671 | 0) 0.070 us | __update_entity_load_avg_contrib();
7238524.005671 | 0) 0.051 us | update_cfs_rq_blocked_load();
7238524.005672 | 0) 0.069 us | account_entity_enqueue();
7238524.005672 | 0) 0.132 us | update_cfs_shares();
7238524.005673 | 0) 0.054 us | place_entity();
7238524.005673 | 0) 0.081 us | __enqueue_entity();
7238524.005673 | 0) 3.111 us | }
7238524.005673 | 0) | enqueue_entity() {
7238524.005674 | 0) 0.059 us | update_curr();
7238524.005674 | 0) 0.057 us | update_cfs_rq_blocked_load();
7238524.005674 | 0) 0.067 us | account_entity_enqueue();
7238524.005675 | 0) 0.082 us | update_cfs_shares();
7238524.005675 | 0) 0.120 us | place_entity();
7238524.005675 | 0) 0.051 us | __enqueue_entity();
7238524.005676 | 0) 2.075 us | }
7238524.005676 | 0) 0.049 us | hrtick_update();
7238524.005676 | 0) 6.167 us | }
7238524.005676 | 0) 6.979 us | }
7238524.005676 | 0) 7.317 us | }
7238524.005677 | 0) | ttwu_do_wakeup() {
7238524.005677 | 0) | check_preempt_curr() {
7238524.005677 | 0) | resched_task() {
7238524.005677 | 0) | xen_smp_send_reschedule() {
7238524.005677 | 0) | xen_send_IPI_one() {
7238524.005678 | 0) | notify_remote_via_irq() {
7238524.005678 | 0) | evtchn_from_irq() {
7238524.005678 | 0) | irq_get_irq_data() {
7238524.005678 | 0) 0.051 us | irq_to_desc();
7238524.005679 | 0) 0.545 us | }
7238524.005679 | 0) 0.910 us | }
7238524.005680 | 0) 1.962 us | } /* notify_remote_via_irq */
7238524.005680 | 0) 2.332 us | }
7238524.005680 | 0) 2.684 us | }
7238524.005681 | 0) 3.606 us | }
7238524.005681 | 0) 4.064 us | }
7238524.005682 | 0) 5.129 us | }
7238524.005682 | 0) 13.194 us | }
7238524.005683 | 0) 0.066 us | _raw_spin_unlock();
7238524.005683 | 0) 0.165 us | ttwu_stat();
7238524.005684 | 0) 0.070 us | _raw_spin_unlock_irqrestore();
7238524.005684 | 0) 19.634 us | }
7238524.005685 | 0) 20.080 us | }
7238524.005685 | 0) 20.608 us | }
7238524.005685 | 0) 21.348 us | }
7238524.005685 | 0) 0.084 us | _raw_spin_unlock_irqrestore();
7238524.005686 | 0) 22.728 us | }
7238524.005686 | 0) | irq_exit() {
7238524.005687 | 0) 0.077 us | idle_cpu();
7238524.005687 | 0) 0.093 us | rcu_irq_exit();
7238524.005688 | 0) 1.101 us | }
7238524.005688 | 0) 25.644 us | }
7238524.005688 | 0) 0.138 us | add_interrupt_randomness();
7238524.005689 | 0) 0.083 us | note_interrupt();
7238524.005689 | 0) 27.672 us | }
7238524.005690 | 0) 30.410 us | }
7238524.005690 | 0) 31.458 us | }
7238524.005690 | 0) 35.276 us | }
7238524.005691 | 0) 35.797 us | }
7238524.005691 | 0) | irq_exit() {
7238524.005691 | 0) 0.066 us | idle_cpu();
7238524.005692 | 0) 0.080 us | rcu_irq_exit();
7238524.005692 | 0) 1.110 us | }
7238524.005693 | 0) 39.440 us | }
7238524.005693 | 0) 41.142 us | }
7238524.005694 | 0) 207918.1 us | }
7238524.005694 | 0) 207918.7 us | }
7238524.005694 | 0) 207919.4 us | }
7238524.005695 | 0) 0.068 us | down_read();
7238524.005696 | 0) | copy_from_read_buf() {
7238524.005697 | 0) | tty_audit_add_data() {
7238524.005697 | 0) 0.233 us | _raw_spin_lock_irqsave();
7238524.005698 | 0) 0.076 us | _raw_spin_unlock_irqrestore();
7238524.005699 | 0) 0.076 us | _raw_spin_lock_irqsave();
7238524.005699 | 0) 0.078 us | _raw_spin_unlock_irqrestore();
7238524.005700 | 0) 2.696 us | }
7238524.005700 | 0) 4.335 us | }
7238524.005701 | 0) 0.086 us | copy_from_read_buf();
7238524.005701 | 0) 0.074 us | n_tty_set_room();
7238524.005702 | 0) 0.085 us | n_tty_write_wakeup();
7238524.005702 | 0) | __wake_up() {
7238524.005703 | 0) 0.061 us | _raw_spin_lock_irqsave();
7238524.005703 | 0) | __wake_up_common() {
7238524.005703 | 0) 0.080 us | pollwake();
7238524.005704 | 0) 0.687 us | }
7238524.005704 | 0) 0.063 us | _raw_spin_unlock_irqrestore();
7238524.005705 | 0) 2.040 us | }
7238524.005705 | 0) 0.071 us | n_tty_set_room();
7238524.005706 | 0) 0.074 us | up_read();
7238524.005706 | 0) | remove_wait_queue() {
7238524.005706 | 0) 0.074 us | _raw_spin_lock_irqsave();
7238524.005707 | 0) 0.069 us | _raw_spin_unlock_irqrestore();
7238524.005707 | 0) 1.076 us | }
7238524.005708 | 0) 0.139 us | mutex_unlock();
7238524.005708 | 0) 207939.0 us | }
7238524.005709 | 0) | tty_ldisc_deref() {
7238524.005709 | 0) 0.077 us | ldsem_up_read();
7238524.005710 | 0) 0.702 us | }
7238524.005710 | 0) 0.068 us | get_seconds();
7238524.005711 | 0) 207943.4 us | }
7238524.005712 | 0) 0.301 us | __fsnotify_parent();
7238524.005713 | 0) 0.157 us | fsnotify();
7238524.005713 | 0) 207950.3 us | }
7238524.005783 | 0) | vfs_read() {
7238524.005784 | 0) | rw_verify_area() {
7238524.005784 | 0) | security_file_permission() {
7238524.005785 | 0) | apparmor_file_permission() {
7238524.005785 | 0) 0.164 us | common_file_perm();
7238524.005786 | 0) 0.790 us | }
7238524.005786 | 0) 0.080 us | __fsnotify_parent();
7238524.005787 | 0) 0.094 us | fsnotify();
7238524.005787 | 0) 2.683 us | }
7238524.005788 | 0) 3.313 us | }
7238524.005788 | 0) | tty_read() {
7238524.005788 | 0) 0.087 us | tty_paranoia_check();
7238524.005789 | 0) | tty_ldisc_ref_wait() {
7238524.005789 | 0) 0.080 us | ldsem_down_read();
7238524.005790 | 0) 0.683 us | }
7238524.005790 | 0) | n_tty_read() {
7238524.005791 | 0) 0.080 us | _raw_spin_lock_irq();
7238524.005791 | 0) 0.104 us | mutex_lock_interruptible();
7238524.005792 | 0) 0.070 us | down_read();
7238524.005792 | 0) | add_wait_queue() {
7238524.005793 | 0) 0.079 us | _raw_spin_lock_irqsave();
7238524.005793 | 0) 0.087 us | _raw_spin_unlock_irqrestore();
7238524.005794 | 0) 1.147 us | }
7238524.005794 | 0) 0.078 us | tty_hung_up_p();
7238524.005795 | 0) 0.071 us | n_tty_set_room();
7238524.005795 | 0) 0.077 us | up_read();
7238524.005796 | 0) | schedule_timeout() {
7238524.005796 | 0) | schedule() {
7238524.005796 | 0) | __schedule() {
7238524.005797 | 0) 0.087 us | rcu_note_context_switch();
7238524.005797 | 0) 0.075 us | _raw_spin_lock_irq();
7238524.005798 | 0) | deactivate_task() {
7238524.005798 | 0) | dequeue_task() {
7238524.005798 | 0) 0.177 us | update_rq_clock();
7238524.005799 | 0) | dequeue_task_fair() {
7238524.005799 | 0) | dequeue_entity() {
7238524.005800 | 0) | update_curr() {
7238524.005800 | 0) 0.334 us | cpuacct_charge();
7238524.005801 | 0) 1.199 us | }
7238524.005802 | 0) 0.081 us | __update_entity_load_avg_contrib();
7238524.005802 | 0) 0.064 us | update_cfs_rq_blocked_load();
7238524.005803 | 0) 0.076 us | clear_buddies();
7238524.005803 | 0) 0.079 us | account_entity_dequeue();
7238524.005804 | 0) | update_cfs_shares() {
7238524.005804 | 0) 0.108 us | update_curr();
7238524.005805 | 0) 0.083 us | account_entity_dequeue();
7238524.005805 | 0) 0.084 us | account_entity_enqueue();
7238524.005806 | 0) 1.869 us | }
7238524.005806 | 0) 6.530 us | } /* dequeue_entity */
7238524.005807 | 0) | dequeue_entity() {
7238524.005807 | 0) 0.104 us | update_curr();
7238524.005808 | 0) 0.115 us | __update_entity_load_avg_contrib();
7238524.005808 | 0) 0.069 us | update_cfs_rq_blocked_load();
7238524.005809 | 0) 0.066 us | clear_buddies();
7238524.005809 | 0) 0.086 us | account_entity_dequeue();
7238524.005810 | 0) 0.102 us | update_cfs_shares();
7238524.005811 | 0) 3.907 us | }
7238524.005811 | 0) 0.071 us | hrtick_update();
7238524.005812 | 0) 12.301 us | }
7238524.005812 | 0) 13.546 us | }
7238524.005812 | 0) 14.105 us | }
7238524.005812 | 0) | pick_next_task_fair() {
7238524.005813 | 0) 0.078 us | check_cfs_rq_runtime();
7238524.005813 | 0) | pick_next_entity() {
7238524.005814 | 0) 0.071 us | clear_buddies();
7238524.005815 | 0) 0.585 us | }
7238524.005815 | 0) | put_prev_entity() {
7238524.005815 | 0) 0.073 us | check_cfs_rq_runtime();
7238524.005816 | 0) 0.717 us | }
7238524.005816 | 0) | put_prev_entity() {
7238524.005817 | 0) 0.080 us | check_cfs_rq_runtime();
7238524.005817 | 0) 0.687 us | }
7238524.005817 | 0) | set_next_entity() {
7238524.005818 | 0) 0.091 us | update_stats_wait_end();
7238524.005818 | 0) 0.786 us | }
7238524.005819 | 0) 6.135 us | }
7238524.005820 | 0) 0.091 us | paravirt_start_context_switch();
7238524.005821 | 0) 0.089 us | xen_read_cr0();
7238524.005821 | 0) | xen_write_cr0() {
7238524.005821 | 0) 0.078 us | paravirt_get_lazy_mode();
7238524.005822 | 0) 0.083 us | __xen_mc_entry();
7238524.005823 | 0) 0.074 us | paravirt_get_lazy_mode();
7238524.005823 | 0) 1.657 us | }
7238524.005823 | 0) | xen_load_sp0() {
7238524.005824 | 0) 0.074 us | paravirt_get_lazy_mode();
7238524.005824 | 0) 0.083 us | __xen_mc_entry();
7238524.005825 | 0) 0.087 us | paravirt_get_lazy_mode();
7238524.005825 | 0) 1.764 us | }
7238524.005826 | 0) | xen_load_tls() {
7238524.005826 | 0) 0.077 us | paravirt_get_lazy_mode();
7238524.005826 | 0) 0.084 us | paravirt_get_lazy_mode();
7238524.005827 | 0) 0.150 us | load_TLS_descriptor();
7238524.005828 | 0) 0.082 us | load_TLS_descriptor();
7238524.005828 | 0) 0.084 us | load_TLS_descriptor();
7238524.005829 | 0) 0.076 us | paravirt_get_lazy_mode();
7238524.005829 | 0) 3.388 us | }
7238524.005829 | 0) | xen_end_context_switch() {
7238524.005830 | 0) 0.731 us | xen_mc_flush();
7238524.005831 | 0) 0.093 us | paravirt_end_context_switch();
7238524.005831 | 0) 1.836 us | }
7238524.141853 | 0) | finish_task_switch() {
7238524.141857 | 0) | xen_evtchn_do_upcall() {
7238524.141858 | 0) | irq_enter() {
7238524.141858 | 0) 0.133 us | rcu_irq_enter();
7238524.141859 | 0) 0.766 us | }
7238524.141859 | 0) 0.056 us | exit_idle();
7238524.141859 | 0) | __xen_evtchn_do_upcall() {
7238524.141859 | 0) | evtchn_2l_handle_events() {
7238524.141860 | 0) 0.049 us | irq_from_virq();
7238524.141860 | 0) | evtchn_from_irq() {
7238524.141860 | 0) | irq_get_irq_data() {
7238524.141860 | 0) 0.058 us | irq_to_desc();
7238524.141861 | 0) 0.498 us | }
7238524.141861 | 0) 0.897 us | }
7238524.141861 | 0) | get_evtchn_to_irq() {
7238524.141861 | 0) 0.049 us | evtchn_2l_max_channels();
7238524.141862 | 0) 0.392 us | }
7238524.141862 | 0) | generic_handle_irq() {
7238524.141862 | 0) 0.061 us | irq_to_desc();
7238524.141862 | 0) | handle_percpu_irq() {
7238524.141863 | 0) | ack_dynirq() {
7238524.141863 | 0) | evtchn_from_irq() {
7238524.141863 | 0) | irq_get_irq_data() {
7238524.141863 | 0) 0.051 us | irq_to_desc();
7238524.141863 | 0) 0.439 us | }
7238524.141864 | 0) 0.745 us | }
7238524.141864 | 0) 0.049 us | irq_move_irq();
7238524.141864 | 0) 0.060 us | evtchn_2l_clear_pending();
7238524.141864 | 0) 1.714 us | }
7238524.141865 | 0) | handle_irq_event_percpu() {
7238524.141865 | 0) | xen_irq_work_interrupt() {
7238524.141865 | 0) | irq_enter() {
7238524.141865 | 0) 0.053 us | rcu_irq_enter();
7238524.141866 | 0) 0.371 us | }
7238524.141866 | 0) | __wake_up() {
7238524.141866 | 0) 0.051 us | _raw_spin_lock_irqsave();
7238524.141867 | 0) | __wake_up_common() {
7238524.141867 | 0) | autoremove_wake_function() {
7238524.141867 | 0) | default_wake_function() {
7238524.141867 | 0) | try_to_wake_up() {
7238524.141868 | 0) 0.213 us | _raw_spin_lock_irqsave();
7238524.141869 | 0) 0.196 us | task_waking_fair();
7238524.141870 | 0) | select_task_rq_fair() {
7238524.141870 | 0) 0.051 us | source_load();
7238524.141870 | 0) 0.049 us | target_load();
7238524.141871 | 0) 0.065 us | idle_cpu();
7238524.141871 | 0) 0.051 us | cpus_share_cache();
7238524.141872 | 0) 0.078 us | idle_cpu();
7238524.141872 | 0) 2.427 us | }
7238524.141873 | 0) 0.050 us | _raw_spin_lock();
7238524.141873 | 0) | ttwu_do_activate.constprop.124() {
7238524.141873 | 0) | activate_task() {
7238524.141873 | 0) | enqueue_task() {
7238524.141873 | 0) 0.170 us | update_rq_clock();
7238524.141874 | 0) | enqueue_task_fair() {
7238524.141874 | 0) | enqueue_entity() {
7238524.141874 | 0) 0.058 us | update_curr();
7238524.141875 | 0) 0.076 us | __compute_runnable_contrib.part.51();
7238524.141875 | 0) 0.059 us | __update_entity_load_avg_contrib();
7238524.141875 | 0) 0.060 us | update_cfs_rq_blocked_load();
7238524.141876 | 0) 0.064 us | account_entity_enqueue();
7238524.141876 | 0) 0.123 us | update_cfs_shares();
7238524.141876 | 0) 0.055 us | place_entity();
7238524.141877 | 0) 0.078 us | __enqueue_entity();
7238524.141877 | 0) 3.039 us | }
7238524.141877 | 0) | enqueue_entity() {
7238524.141877 | 0) 0.065 us | update_curr();
7238524.141878 | 0) 0.050 us | update_cfs_rq_blocked_load();
7238524.141878 | 0) 0.049 us | account_entity_enqueue();
7238524.141878 | 0) 0.081 us | update_cfs_shares();
7238524.141879 | 0) 0.049 us | place_entity();
7238524.141879 | 0) 0.051 us | __enqueue_entity();
7238524.141879 | 0) 2.021 us | }
7238524.141880 | 0) 0.049 us | hrtick_update();
7238524.141880 | 0) 6.040 us | }
7238524.141880 | 0) 6.874 us | }
7238524.141880 | 0) 7.212 us | }
7238524.141880 | 0) | ttwu_do_wakeup() {
7238524.141881 | 0) | check_preempt_curr() {
7238524.141881 | 0) | resched_task() {
7238524.141881 | 0) | xen_smp_send_reschedule() {
7238524.141881 | 0) | xen_send_IPI_one() {
7238524.141881 | 0) | notify_remote_via_irq() {
7238524.141881 | 0) | evtchn_from_irq() {
7238524.141882 | 0) | irq_get_irq_data() {
7238524.141882 | 0) 0.049 us | irq_to_desc();
7238524.141882 | 0) 0.497 us | }
7238524.141882 | 0) 0.860 us | }
7238524.141883 | 0) 1.882 us | } /* notify_remote_via_irq */
7238524.141884 | 0) 2.257 us | }
7238524.141884 | 0) 2.619 us | }
7238524.141884 | 0) 3.079 us | }
7238524.141884 | 0) 3.526 us | }
7238524.141885 | 0) 4.485 us | }
7238524.141885 | 0) 12.333 us | }
7238524.141885 | 0) 0.062 us | _raw_spin_unlock();
7238524.141886 | 0) 0.169 us | ttwu_stat();
7238524.141887 | 0) 0.076 us | _raw_spin_unlock_irqrestore();
7238524.141887 | 0) 19.434 us | }
7238524.141887 | 0) 19.909 us | }
7238524.141888 | 0) 20.377 us | }
7238524.141888 | 0) 21.020 us | }
7238524.141888 | 0) 0.075 us | _raw_spin_unlock_irqrestore();
7238524.141888 | 0) 22.268 us | }
7238524.141889 | 0) | irq_exit() {
7238524.141889 | 0) 0.087 us | idle_cpu();
7238524.141889 | 0) 0.101 us | rcu_irq_exit();
7238524.141890 | 0) 1.127 us | }
7238524.141890 | 0) 25.163 us | }
7238524.141891 | 0) 0.133 us | add_interrupt_randomness();
7238524.141892 | 0) 0.083 us | note_interrupt();
7238524.141892 | 0) 27.453 us | }
7238524.141893 | 0) 30.024 us | }
7238524.141893 | 0) 30.898 us | }
7238524.141893 | 0) 33.683 us | }
7238524.141893 | 0) 34.097 us | }
7238524.141894 | 0) | irq_exit() {
7238524.141894 | 0) 0.065 us | idle_cpu();
7238524.141895 | 0) 0.076 us | rcu_irq_exit();
7238524.141895 | 0) 1.135 us | }
7238524.141895 | 0) 37.746 us | }
7238524.141896 | 0) 39.634 us | }
7238524.141897 | 0) 136100.0 us | }
7238524.141897 | 0) 136100.6 us | }
7238524.141897 | 0) 136101.4 us | }
7238524.141898 | 0) 0.093 us | down_read();
7238524.141899 | 0) | copy_from_read_buf() {
7238524.141900 | 0) | tty_audit_add_data() {
7238524.141900 | 0) 0.238 us | _raw_spin_lock_irqsave();
7238524.141901 | 0) 0.069 us | _raw_spin_unlock_irqrestore();
7238524.141901 | 0) 0.090 us | _raw_spin_lock_irqsave();
7238524.141902 | 0) 0.077 us | _raw_spin_unlock_irqrestore();
7238524.141902 | 0) 2.513 us | }
7238524.141903 | 0) 3.632 us | }
7238524.141903 | 0) 0.085 us | copy_from_read_buf();
7238524.141904 | 0) 0.066 us | n_tty_set_room();
7238524.141905 | 0) 0.067 us | n_tty_write_wakeup();
7238524.141905 | 0) | __wake_up() {
7238524.141906 | 0) 0.070 us | _raw_spin_lock_irqsave();
7238524.141906 | 0) | __wake_up_common() {
7238524.141906 | 0) 0.086 us | pollwake();
7238524.141907 | 0) 0.620 us | }
7238524.141907 | 0) 0.064 us | _raw_spin_unlock_irqrestore();
7238524.141907 | 0) 1.980 us | }
7238524.141908 | 0) 0.059 us | n_tty_set_room();
7238524.141909 | 0) 0.071 us | up_read();
7238524.141909 | 0) | remove_wait_queue() {
7238524.141909 | 0) 0.079 us | _raw_spin_lock_irqsave();
7238524.141910 | 0) 0.082 us | _raw_spin_unlock_irqrestore();
7238524.141910 | 0) 1.164 us | }
7238524.141910 | 0) 0.142 us | mutex_unlock();
7238524.141911 | 0) 136120.9 us | }
7238524.141911 | 0) | tty_ldisc_deref() {
7238524.141912 | 0) 0.062 us | ldsem_up_read();
7238524.141912 | 0) 0.593 us | }
7238524.141912 | 0) 0.079 us | get_seconds();
7238524.141913 | 0) 136125.1 us | }
7238524.141914 | 0) 0.280 us | __fsnotify_parent();
7238524.141915 | 0) 0.187 us | fsnotify();
7238524.141915 | 0) 136131.2 us | }
7238524.141988 | 0) | vfs_read() {
7238524.141989 | 0) | rw_verify_area() {
7238524.141989 | 0) | security_file_permission() {
7238524.141989 | 0) | apparmor_file_permission() {
7238524.141990 | 0) 0.149 us | common_file_perm();
7238524.141990 | 0) 0.774 us | }
7238524.141991 | 0) 0.079 us | __fsnotify_parent();
7238524.141991 | 0) 0.095 us | fsnotify();
7238524.141992 | 0) 2.558 us | }
7238524.141992 | 0) 3.300 us | }
7238524.141993 | 0) | tty_read() {
7238524.141993 | 0) 0.076 us | tty_paranoia_check();
7238524.141994 | 0) | tty_ldisc_ref_wait() {
7238524.141994 | 0) 0.081 us | ldsem_down_read();
7238524.141995 | 0) 0.689 us | }
7238524.141995 | 0) | n_tty_read() {
7238524.141995 | 0) 0.073 us | _raw_spin_lock_irq();
7238524.141996 | 0) 0.110 us | mutex_lock_interruptible();
7238524.141997 | 0) 0.069 us | down_read();
7238524.141998 | 0) | add_wait_queue() {
7238524.141998 | 0) 0.079 us | _raw_spin_lock_irqsave();
7238524.141999 | 0) 0.078 us | _raw_spin_unlock_irqrestore();
7238524.141999 | 0) 1.201 us | }
7238524.142000 | 0) 0.067 us | tty_hung_up_p();
7238524.142000 | 0) 0.078 us | n_tty_set_room();
7238524.142001 | 0) 0.079 us | up_read();
7238524.142001 | 0) | schedule_timeout() {
7238524.142002 | 0) | schedule() {
7238524.142002 | 0) | __schedule() {
7238524.142002 | 0) 0.076 us | rcu_note_context_switch();
7238524.142003 | 0) 0.080 us | _raw_spin_lock_irq();
7238524.142004 | 0) | deactivate_task() {
7238524.142004 | 0) | dequeue_task() {
7238524.142004 | 0) 0.178 us | update_rq_clock();
7238524.142005 | 0) | dequeue_task_fair() {
7238524.142005 | 0) | dequeue_entity() {
7238524.142005 | 0) | update_curr() {
7238524.142006 | 0) 0.263 us | cpuacct_charge();
7238524.142007 | 0) 0.965 us | }
7238524.142007 | 0) 0.075 us | update_cfs_rq_blocked_load();
7238524.142008 | 0) 0.065 us | clear_buddies();
7238524.142008 | 0) 0.084 us | account_entity_dequeue();
7238524.142009 | 0) | update_cfs_shares() {
7238524.142009 | 0) 0.115 us | update_curr();
7238524.142010 | 0) 0.084 us | account_entity_dequeue();
7238524.142010 | 0) 0.068 us | account_entity_enqueue();
7238524.142011 | 0) 1.754 us | }
7238524.142011 | 0) 5.580 us | }
7238524.142012 | 0) | dequeue_entity() {
7238524.142012 | 0) 0.089 us | update_curr();
7238524.142012 | 0) 0.101 us | update_cfs_rq_blocked_load();
7238524.142013 | 0) 0.076 us | clear_buddies();
7238524.142013 | 0) 0.078 us | account_entity_dequeue();
7238524.142014 | 0) 0.076 us | update_cfs_shares();
7238524.142015 | 0) 3.071 us | }
7238524.142015 | 0) 0.078 us | hrtick_update();
7238524.142016 | 0) 10.525 us | }
7238524.142016 | 0) 11.803 us | }
7238524.142016 | 0) 12.447 us | }
7238524.142017 | 0) | pick_next_task_fair() {
7238524.142017 | 0) 0.069 us | check_cfs_rq_runtime();
7238524.142017 | 0) | pick_next_entity() {
7238524.142018 | 0) 0.061 us | clear_buddies();
7238524.142018 | 0) 0.601 us | }
7238524.142019 | 0) | put_prev_entity() {
7238524.142019 | 0) 0.069 us | check_cfs_rq_runtime();
7238524.142019 | 0) 0.605 us | }
7238524.142020 | 0) | put_prev_entity() {
7238524.142020 | 0) 0.076 us | check_cfs_rq_runtime();
7238524.142020 | 0) 0.609 us | }
7238524.142021 | 0) | set_next_entity() {
7238524.142021 | 0) 0.088 us | update_stats_wait_end();
7238524.142022 | 0) 0.768 us | }
7238524.142022 | 0) 5.183 us | }
7238524.142023 | 0) 0.080 us | paravirt_start_context_switch();
7238524.142024 | 0) 0.076 us | xen_read_cr0();
7238524.142024 | 0) | xen_write_cr0() {
7238524.142025 | 0) 0.088 us | paravirt_get_lazy_mode();
7238524.142025 | 0) 0.096 us | __xen_mc_entry();
7238524.142026 | 0) 0.083 us | paravirt_get_lazy_mode();
7238524.142026 | 0) 1.802 us | }
7238524.142026 | 0) | xen_load_sp0() {
7238524.142027 | 0) 0.074 us | paravirt_get_lazy_mode();
7238524.142027 | 0) 0.098 us | __xen_mc_entry();
7238524.142028 | 0) 0.073 us | paravirt_get_lazy_mode();
7238524.142029 | 0) 2.289 us | }
7238524.142029 | 0) | xen_load_tls() {
7238524.142029 | 0) 0.073 us | paravirt_get_lazy_mode();
7238524.142030 | 0) 0.079 us | paravirt_get_lazy_mode();
7238524.142031 | 0) 0.135 us | load_TLS_descriptor();
7238524.142031 | 0) 0.082 us | load_TLS_descriptor();
7238524.142032 | 0) 0.091 us | load_TLS_descriptor();
7238524.142032 | 0) 0.081 us | paravirt_get_lazy_mode();
7238524.142033 | 0) 3.306 us | }
7238524.142033 | 0) | xen_end_context_switch() {
7238524.142033 | 0) 0.697 us | xen_mc_flush();
7238524.142034 | 0) 0.083 us | paravirt_end_context_switch();
7238524.142035 | 0) 1.876 us | }
7238524.269404 | 0) | finish_task_switch() {
7238524.269408 | 0) | xen_evtchn_do_upcall() {
7238524.269408 | 0) | irq_enter() {
7238524.269408 | 0) 0.132 us | rcu_irq_enter();
7238524.269409 | 0) 0.948 us | }
7238524.269409 | 0) 0.063 us | exit_idle();
7238524.269410 | 0) | __xen_evtchn_do_upcall() {
7238524.269410 | 0) | evtchn_2l_handle_events() {
7238524.269410 | 0) 0.057 us | irq_from_virq();
7238524.269411 | 0) | evtchn_from_irq() {
7238524.269411 | 0) | irq_get_irq_data() {
7238524.269411 | 0) 0.058 us | irq_to_desc();
7238524.269412 | 0) 0.579 us | }
7238524.269412 | 0) 0.898 us | }
7238524.269412 | 0) | get_evtchn_to_irq() {
7238524.269412 | 0) 0.049 us | evtchn_2l_max_channels();
7238524.269412 | 0) 0.390 us | }
7238524.269413 | 0) | generic_handle_irq() {
7238524.269413 | 0) 0.051 us | irq_to_desc();
7238524.269413 | 0) | handle_percpu_irq() {
7238524.269413 | 0) | ack_dynirq() {
7238524.269413 | 0) | evtchn_from_irq() {
7238524.269414 | 0) | irq_get_irq_data() {
7238524.269414 | 0) 0.057 us | irq_to_desc();
7238524.269414 | 0) 0.446 us | }
7238524.269414 | 0) 0.754 us | }
7238524.269414 | 0) 0.057 us | irq_move_irq();
7238524.269415 | 0) 0.057 us | evtchn_2l_clear_pending();
7238524.269415 | 0) 1.718 us | }
7238524.269415 | 0) | handle_irq_event_percpu() {
7238524.269416 | 0) | xen_irq_work_interrupt() {
7238524.269416 | 0) | irq_enter() {
7238524.269416 | 0) 0.059 us | rcu_irq_enter();
7238524.269416 | 0) 0.380 us | }
7238524.269417 | 0) | __wake_up() {
7238524.269417 | 0) 0.051 us | _raw_spin_lock_irqsave();
7238524.269417 | 0) | __wake_up_common() {
7238524.269417 | 0) | autoremove_wake_function() {
7238524.269418 | 0) | default_wake_function() {
7238524.269418 | 0) | try_to_wake_up() {
7238524.269418 | 0) 0.230 us | _raw_spin_lock_irqsave();
7238524.269419 | 0) 0.197 us | task_waking_fair();
7238524.269419 | 0) | select_task_rq_fair() {
7238524.269419 | 0) 0.050 us | source_load();
7238524.269420 | 0) 0.057 us | target_load();
7238524.269420 | 0) 0.065 us | idle_cpu();
7238524.269421 | 0) 0.055 us | cpus_share_cache();
7238524.269421 | 0) 0.076 us | idle_cpu();
7238524.269421 | 0) 2.041 us | }
7238524.269422 | 0) 0.050 us | _raw_spin_lock();
7238524.269422 | 0) | ttwu_do_activate.constprop.124() {
7238524.269422 | 0) | activate_task() {
7238524.269422 | 0) | enqueue_task() {
7238524.269422 | 0) 0.175 us | update_rq_clock();
7238524.269423 | 0) | enqueue_task_fair() {
7238524.269423 | 0) | enqueue_entity() {
7238524.269423 | 0) 0.065 us | update_curr();
7238524.269424 | 0) 0.070 us | __compute_runnable_contrib.part.51();
7238524.269424 | 0) 0.052 us | __update_entity_load_avg_contrib();
7238524.269424 | 0) 0.050 us | update_cfs_rq_blocked_load();
7238524.269425 | 0) 0.059 us | account_entity_enqueue();
7238524.269426 | 0) 0.134 us | update_cfs_shares();
7238524.269426 | 0) 0.055 us | place_entity();
7238524.269427 | 0) 0.083 us | __enqueue_entity();
7238524.269427 | 0) 4.026 us | }
7238524.269427 | 0) | enqueue_entity() {
7238524.269428 | 0) 0.065 us | update_curr();
7238524.269428 | 0) 0.051 us | update_cfs_rq_blocked_load();
7238524.269428 | 0) 0.058 us | account_entity_enqueue();
7238524.269429 | 0) 0.082 us | update_cfs_shares();
7238524.269429 | 0) 0.105 us | place_entity();
7238524.269429 | 0) 0.049 us | __enqueue_entity();
7238524.269430 | 0) 2.247 us | }
7238524.269430 | 0) 0.050 us | hrtick_update();
7238524.269430 | 0) 7.310 us | }
7238524.269430 | 0) 8.101 us | }
7238524.269431 | 0) 8.449 us | }
7238524.269431 | 0) | ttwu_do_wakeup() {
7238524.269431 | 0) | check_preempt_curr() {
7238524.269431 | 0) | resched_task() {
7238524.269431 | 0) | xen_smp_send_reschedule() {
7238524.269432 | 0) | xen_send_IPI_one() {
7238524.269432 | 0) | notify_remote_via_irq() {
7238524.269432 | 0) | evtchn_from_irq() {
7238524.269432 | 0) | irq_get_irq_data() {
7238524.269432 | 0) 0.051 us | irq_to_desc();
7238524.269433 | 0) 0.493 us | }
7238524.269433 | 0) 0.857 us | }
7238524.269434 | 0) 1.909 us | } /* notify_remote_via_irq */
7238524.269434 | 0) 2.288 us | }
7238524.269434 | 0) 2.655 us | }
7238524.269434 | 0) 3.127 us | }
7238524.269435 | 0) 3.590 us | }
7238524.269435 | 0) 4.506 us | }
7238524.269436 | 0) 13.594 us | }
7238524.269436 | 0) 0.070 us | _raw_spin_unlock();
7238524.269436 | 0) 0.163 us | ttwu_stat();
7238524.269437 | 0) 0.080 us | _raw_spin_unlock_irqrestore();
7238524.269438 | 0) 19.508 us | }
7238524.269438 | 0) 19.991 us | }
7238524.269438 | 0) 20.486 us | }
7238524.269438 | 0) 21.024 us | }
7238524.269438 | 0) 0.076 us | _raw_spin_unlock_irqrestore();
7238524.269439 | 0) 22.247 us | }
7238524.269439 | 0) | irq_exit() {
7238524.269439 | 0) 0.101 us | idle_cpu();
7238524.269440 | 0) 0.099 us | rcu_irq_exit();
7238524.269441 | 0) 1.207 us | }
7238524.269441 | 0) 25.035 us | }
7238524.269441 | 0) 0.131 us | add_interrupt_randomness();
7238524.269442 | 0) 0.076 us | note_interrupt();
7238524.269442 | 0) 26.909 us | }
7238524.269443 | 0) 29.377 us | }
7238524.269443 | 0) 30.139 us | }
7238524.269443 | 0) 32.759 us | }
7238524.269443 | 0) 33.204 us | }
7238524.269444 | 0) | irq_exit() {
7238524.269444 | 0) | __do_softirq() {
7238524.269444 | 0) 0.068 us | msecs_to_jiffies();
7238524.269445 | 0) | rcu_process_callbacks() {
7238524.269445 | 0) 0.070 us | note_gp_changes();
7238524.269445 | 0) 0.064 us | _raw_spin_lock_irqsave();
7238524.269446 | 0) 0.135 us | rcu_accelerate_cbs();
7238524.269447 | 0) | rcu_report_qs_rnp() {
7238524.269447 | 0) 0.061 us | _raw_spin_unlock_irqrestore();
7238524.269448 | 0) 0.779 us | }
7238524.269448 | 0) 0.081 us | cpu_needs_another_gp();
7238524.269449 | 0) | file_free_rcu() {
7238524.269449 | 0) 0.291 us | kmem_cache_free();
7238524.269450 | 0) 1.139 us | }
7238524.269451 | 0) | put_cred_rcu() {
7238524.269451 | 0) | security_cred_free() {
7238524.269452 | 0) | apparmor_cred_free() {
7238524.269453 | 0) | aa_free_task_context() {
7238524.269453 | 0) | kzfree() {
7238524.269454 | 0) 0.380 us | ksize();
7238524.269455 | 0) 0.147 us | kfree();
7238524.269455 | 0) 1.602 us | }
7238524.269455 | 0) 2.631 us | }
7238524.269456 | 0) 3.611 us | } /* apparmor_cred_free */
7238524.269456 | 0) 4.927 us | }
7238524.269457 | 0) 0.071 us | key_put();
7238524.269457 | 0) 0.071 us | key_put();
7238524.269458 | 0) 0.065 us | key_put();
7238524.269458 | 0) 0.066 us | key_put();
7238524.269459 | 0) 0.390 us | free_uid();
7238524.269460 | 0) 0.178 us | kmem_cache_free();
7238524.269460 | 0) 9.429 us | }
7238524.269461 | 0) 0.099 us | note_gp_changes();
7238524.269461 | 0) 0.080 us | cpu_needs_another_gp();
7238524.269462 | 0) 16.796 us | }
7238524.269462 | 0) 0.068 us | rcu_bh_qs();
7238524.269462 | 0) 0.066 us | __local_bh_enable();
7238524.269463 | 0) 18.770 us | }
7238524.269463 | 0) 0.073 us | idle_cpu();
7238524.269464 | 0) 0.088 us | rcu_irq_exit();
7238524.269464 | 0) 20.487 us | }
7238524.269465 | 0) 56.365 us | }
7238524.269465 | 0) 58.028 us | }
7238524.269466 | 0) 127463.5 us | }
7238524.269466 | 0) 127464.2 us | }
7238524.269467 | 0) 127465.0 us | }
7238524.269467 | 0) 0.095 us | down_read();
7238524.269468 | 0) | copy_from_read_buf() {
7238524.269469 | 0) | tty_audit_add_data() {
7238524.269469 | 0) 0.228 us | _raw_spin_lock_irqsave();
7238524.269470 | 0) 0.070 us | _raw_spin_unlock_irqrestore();
7238524.269471 | 0) 0.074 us | _raw_spin_lock_irqsave();
7238524.269471 | 0) 0.079 us | _raw_spin_unlock_irqrestore();
7238524.269472 | 0) 2.616 us | }
7238524.269472 | 0) 3.878 us | }
7238524.269473 | 0) 0.104 us | copy_from_read_buf();
7238524.269473 | 0) 0.074 us | n_tty_set_room();
7238524.269474 | 0) 0.067 us | n_tty_write_wakeup();
7238524.269474 | 0) | __wake_up() {
7238524.269475 | 0) 0.077 us | _raw_spin_lock_irqsave();
7238524.269475 | 0) | __wake_up_common() {
7238524.269476 | 0) 0.095 us | pollwake();
7238524.269476 | 0) 0.694 us | }
7238524.269476 | 0) 0.064 us | _raw_spin_unlock_irqrestore();
7238524.269477 | 0) 2.128 us | }
7238524.269477 | 0) 0.062 us | n_tty_set_room();
7238524.269477 | 0) 0.066 us | up_read();
7238524.269478 | 0) | remove_wait_queue() {
7238524.269478 | 0) 0.080 us | _raw_spin_lock_irqsave();
7238524.269479 | 0) 0.081 us | _raw_spin_unlock_irqrestore();
7238524.269480 | 0) 1.225 us | }
7238524.269480 | 0) 0.152 us | mutex_unlock();
7238524.269480 | 0) 127485.3 us | }
7238524.269481 | 0) | tty_ldisc_deref() {
7238524.269481 | 0) 0.081 us | ldsem_up_read();
7238524.269482 | 0) 0.655 us | }
7238524.269482 | 0) 0.089 us | get_seconds();
7238524.269483 | 0) 127490.1 us | }
7238524.269484 | 0) 0.287 us | __fsnotify_parent();
7238524.269484 | 0) 0.183 us | fsnotify();
7238524.269485 | 0) 127496.2 us | }
7238524.269559 | 0) | vfs_read() {
7238524.269559 | 0) | rw_verify_area() {
7238524.269560 | 0) | security_file_permission() {
7238524.269560 | 0) | apparmor_file_permission() {
7238524.269561 | 0) 0.164 us | common_file_perm();
7238524.269561 | 0) 0.831 us | }
7238524.269562 | 0) 0.078 us | __fsnotify_parent();
7238524.269562 | 0) 0.080 us | fsnotify();
7238524.269563 | 0) 2.765 us | }
7238524.269563 | 0) 3.490 us | }
7238524.269564 | 0) | tty_read() {
7238524.269564 | 0) 0.066 us | tty_paranoia_check();
7238524.269564 | 0) | tty_ldisc_ref_wait() {
7238524.269565 | 0) 0.085 us | ldsem_down_read();
7238524.269565 | 0) 0.656 us | }
7238524.269566 | 0) | n_tty_read() {
7238524.269566 | 0) 0.078 us | _raw_spin_lock_irq();
7238524.269567 | 0) 0.118 us | mutex_lock_interruptible();
7238524.269567 | 0) 0.078 us | down_read();
7238524.269568 | 0) | add_wait_queue() {
7238524.269568 | 0) 0.089 us | _raw_spin_lock_irqsave();
7238524.269569 | 0) 0.082 us | _raw_spin_unlock_irqrestore();
7238524.269569 | 0) 1.164 us | }
7238524.269570 | 0) 0.073 us | tty_hung_up_p();
7238524.269570 | 0) 0.076 us | n_tty_set_room();
7238524.269571 | 0) 0.078 us | up_read();
7238524.269571 | 0) | schedule_timeout() {
7238524.269572 | 0) | schedule() {
7238524.269572 | 0) | __schedule() {
7238524.269572 | 0) 0.078 us | rcu_note_context_switch();
7238524.269573 | 0) 0.085 us | _raw_spin_lock_irq();
7238524.269574 | 0) | deactivate_task() {
7238524.269574 | 0) | dequeue_task() {
7238524.269574 | 0) 0.185 us | update_rq_clock();
7238524.269575 | 0) | dequeue_task_fair() {
7238524.269575 | 0) | dequeue_entity() {
7238524.269575 | 0) | update_curr() {
7238524.269576 | 0) 0.206 us | cpuacct_charge();
7238524.269577 | 0) 0.937 us | }
7238524.269577 | 0) 0.084 us | __update_entity_load_avg_contrib();
7238524.269577 | 0) 0.077 us | update_cfs_rq_blocked_load();
7238524.269578 | 0) 0.075 us | clear_buddies();
7238524.269579 | 0) 0.096 us | account_entity_dequeue();
7238524.269579 | 0) | update_cfs_shares() {
7238524.269580 | 0) 0.095 us | update_curr();
7238524.269580 | 0) 0.104 us | account_entity_dequeue();
7238524.269581 | 0) 0.076 us | account_entity_enqueue();
7238524.269581 | 0) 1.898 us | }
7238524.269582 | 0) 6.120 us | }
7238524.269582 | 0) | dequeue_entity() {
7238524.269582 | 0) 0.093 us | update_curr();
7238524.269583 | 0) 0.116 us | __update_entity_load_avg_contrib();
7238524.269583 | 0) 0.085 us | update_cfs_rq_blocked_load();
7238524.269584 | 0) 0.067 us | clear_buddies();
7238524.269585 | 0) 0.082 us | account_entity_dequeue();
7238524.269585 | 0) 0.097 us | update_cfs_shares();
7238524.269586 | 0) 3.833 us | }
7238524.269586 | 0) 0.070 us | hrtick_update();
7238524.269587 | 0) 11.677 us | }
7238524.269587 | 0) 13.001 us | }
7238524.269587 | 0) 13.516 us | }
7238524.269588 | 0) | pick_next_task_fair() {
7238524.269588 | 0) 0.072 us | check_cfs_rq_runtime();
7238524.269588 | 0) | pick_next_entity() {
7238524.269589 | 0) 0.080 us | clear_buddies();
7238524.269589 | 0) 0.675 us | }
7238524.269590 | 0) | put_prev_entity() {
7238524.269590 | 0) 0.071 us | check_cfs_rq_runtime();
7238524.269591 | 0) 0.543 us | }
7238524.269591 | 0) | put_prev_entity() {
7238524.269591 | 0) 0.066 us | check_cfs_rq_runtime();
7238524.269592 | 0) 0.658 us | }
7238524.269592 | 0) | set_next_entity() {
7238524.269593 | 0) 0.082 us | update_stats_wait_end();
7238524.269593 | 0) 0.844 us | }
7238524.269594 | 0) 5.970 us | }
7238524.269594 | 0) 0.076 us | paravirt_start_context_switch();
7238524.269595 | 0) 0.074 us | xen_read_cr0();
7238524.269596 | 0) | xen_write_cr0() {
7238524.269597 | 0) 0.081 us | paravirt_get_lazy_mode();
7238524.269597 | 0) 0.086 us | __xen_mc_entry();
7238524.269598 | 0) 0.070 us | paravirt_get_lazy_mode();
7238524.269598 | 0) 1.739 us | }
7238524.269598 | 0) | xen_load_sp0() {
7238524.269599 | 0) 0.078 us | paravirt_get_lazy_mode();
7238524.269599 | 0) 0.078 us | __xen_mc_entry();
7238524.269600 | 0) 0.069 us | paravirt_get_lazy_mode();
7238524.269600 | 0) 1.568 us | }
7238524.269601 | 0) | xen_load_tls() {
7238524.269601 | 0) 0.068 us | paravirt_get_lazy_mode();
7238524.269601 | 0) 0.068 us | paravirt_get_lazy_mode();
7238524.269602 | 0) 0.078 us | load_TLS_descriptor();
7238524.269602 | 0) 0.071 us | load_TLS_descriptor();
7238524.269603 | 0) 0.073 us | load_TLS_descriptor();
7238524.269603 | 0) 0.063 us | paravirt_get_lazy_mode();
7238524.269604 | 0) 3.025 us | }
7238524.269604 | 0) | xen_end_context_switch() {
7238524.269604 | 0) 0.646 us | xen_mc_flush();
7238524.269605 | 0) 0.087 us | paravirt_end_context_switch();
7238524.269606 | 0) 1.604 us | }
^C
Ending tracing...
If you read through the durations carefully, you can see that the shell begins
by completing a 19 second read (time between commands), then has a series of
100 to 200 ms reads (inter-keystroke latency). For example, the vfs_read() that
completes above took 127496.2 us, about 127 ms.
The function times printed are inclusive of their children.
The -C option will print on-CPU times only, excluding sleeping or blocking
events from the function duration times. Eg:
# ./funcgraph -Ctp 25285 vfs_read
Tracing "vfs_read" for PID 25285... Ctrl-C to end.
7338520.591816 | 0) | finish_task_switch() {
7338520.591820 | 0) | xen_evtchn_do_upcall() {
7338520.591821 | 0) | irq_enter() {
7338520.591821 | 0) 0.134 us | rcu_irq_enter();
7338520.591822 | 0) 0.823 us | }
7338520.591822 | 0) 0.055 us | exit_idle();
7338520.591822 | 0) | __xen_evtchn_do_upcall() {
7338520.591823 | 0) | evtchn_2l_handle_events() {
7338520.591823 | 0) 0.051 us | irq_from_virq();
7338520.591823 | 0) | evtchn_from_irq() {
7338520.591823 | 0) | irq_get_irq_data() {
7338520.591824 | 0) 0.064 us | irq_to_desc();
7338520.591824 | 0) 0.572 us | }
7338520.591824 | 0) 0.973 us | }
7338520.591825 | 0) | get_evtchn_to_irq() {
7338520.591825 | 0) 0.049 us | evtchn_2l_max_channels();
7338520.591825 | 0) 0.386 us | }
7338520.591825 | 0) | generic_handle_irq() {
7338520.591825 | 0) 0.061 us | irq_to_desc();
7338520.591826 | 0) | handle_percpu_irq() {
7338520.591826 | 0) | ack_dynirq() {
7338520.591826 | 0) | evtchn_from_irq() {
7338520.591826 | 0) | irq_get_irq_data() {
7338520.591827 | 0) 0.050 us | irq_to_desc();
7338520.591827 | 0) 0.441 us | }
7338520.591827 | 0) 0.748 us | }
7338520.591827 | 0) 0.048 us | irq_move_irq();
7338520.591828 | 0) 0.053 us | evtchn_2l_clear_pending();
7338520.591828 | 0) 1.810 us | }
7338520.591828 | 0) | handle_irq_event_percpu() {
7338520.591828 | 0) | xen_irq_work_interrupt() {
7338520.591829 | 0) | irq_enter() {
7338520.591829 | 0) 0.069 us | rcu_irq_enter();
7338520.591829 | 0) 0.386 us | }
7338520.591830 | 0) | __wake_up() {
7338520.591830 | 0) 0.060 us | _raw_spin_lock_irqsave();
7338520.591830 | 0) | __wake_up_common() {
7338520.591830 | 0) | autoremove_wake_function() {
7338520.591831 | 0) | default_wake_function() {
7338520.591831 | 0) | try_to_wake_up() {
7338520.591831 | 0) 0.223 us | _raw_spin_lock_irqsave();
7338520.591832 | 0) 0.243 us | task_waking_fair();
7338520.591832 | 0) | select_task_rq_fair() {
7338520.591833 | 0) 0.063 us | source_load();
7338520.591833 | 0) 0.059 us | target_load();
7338520.591834 | 0) 0.060 us | idle_cpu();
7338520.591834 | 0) 0.059 us | cpus_share_cache();
7338520.591834 | 0) 0.085 us | idle_cpu();
7338520.591835 | 0) 2.176 us | }
7338520.591835 | 0) 0.050 us | _raw_spin_lock();
7338520.591835 | 0) | ttwu_do_activate.constprop.124() {
7338520.591835 | 0) | activate_task() {
7338520.591836 | 0) | enqueue_task() {
7338520.591836 | 0) 0.197 us | update_rq_clock();
7338520.591836 | 0) | enqueue_task_fair() {
7338520.591836 | 0) | enqueue_entity() {
7338520.591837 | 0) 0.118 us | update_curr();
7338520.591837 | 0) 0.060 us | __compute_runnable_contrib.part.51();
7338520.591838 | 0) 0.052 us | __update_entity_load_avg_contrib();
7338520.591838 | 0) 0.132 us | update_cfs_rq_blocked_load();
7338520.591838 | 0) 0.068 us | account_entity_enqueue();
7338520.591839 | 0) 0.327 us | update_cfs_shares();
7338520.591839 | 0) 0.055 us | place_entity();
7338520.591840 | 0) 0.086 us | __enqueue_entity();
7338520.591840 | 0) 0.069 us | update_cfs_rq_blocked_load();
7338520.591840 | 0) 3.870 us | }
7338520.591841 | 0) | enqueue_entity() {
7338520.591841 | 0) 0.050 us | update_curr();
7338520.591841 | 0) 0.048 us | __compute_runnable_contrib.part.51();
7338520.591842 | 0) 0.079 us | __update_entity_load_avg_contrib();
7338520.591842 | 0) 0.068 us | update_cfs_rq_blocked_load();
7338520.591842 | 0) 0.072 us | account_entity_enqueue();
7338520.591843 | 0) 0.068 us | update_cfs_shares();
7338520.591844 | 0) 0.123 us | place_entity();
7338520.591844 | 0) 0.051 us | __enqueue_entity();
7338520.591845 | 0) 3.919 us | }
7338520.591845 | 0) 0.059 us | hrtick_update();
7338520.591845 | 0) 8.895 us | }
7338520.591846 | 0) 9.770 us | }
7338520.591846 | 0) 10.197 us | }
7338520.591846 | 0) | ttwu_do_wakeup() {
7338520.591846 | 0) | check_preempt_curr() {
7338520.591846 | 0) | resched_task() {
7338520.591847 | 0) | xen_smp_send_reschedule() {
7338520.591847 | 0) | xen_send_IPI_one() {
7338520.591847 | 0) | notify_remote_via_irq() {
7338520.591847 | 0) | evtchn_from_irq() {
7338520.591848 | 0) | irq_get_irq_data() {
7338520.591848 | 0) 0.051 us | irq_to_desc();
7338520.591848 | 0) 0.503 us | }
7338520.591848 | 0) 1.031 us | }
7338520.591849 | 0) 2.112 us | }
7338520.591849 | 0) 2.484 us | }
7338520.591850 | 0) 2.851 us | }
7338520.591850 | 0) 3.311 us | }
7338520.591850 | 0) 3.828 us | }
7338520.591851 | 0) 4.788 us | }
7338520.591851 | 0) 15.731 us | }
7338520.591851 | 0) 0.074 us | _raw_spin_unlock();
7338520.591852 | 0) 0.156 us | ttwu_stat();
7338520.591852 | 0) 0.080 us | _raw_spin_unlock_irqrestore();
7338520.591853 | 0) 21.807 us | }
7338520.591853 | 0) 22.286 us | }
7338520.591853 | 0) 22.738 us | }
7338520.591854 | 0) 23.387 us | }
7338520.591854 | 0) 0.105 us | _raw_spin_unlock_irqrestore();
7338520.591854 | 0) 24.698 us | }
7338520.591855 | 0) | irq_exit() {
7338520.591855 | 0) 0.086 us | idle_cpu();
7338520.591856 | 0) 0.105 us | rcu_irq_exit();
7338520.591856 | 0) 1.272 us | }
7338520.591856 | 0) 27.818 us | }
7338520.591857 | 0) 0.140 us | add_interrupt_randomness();
7338520.591857 | 0) 0.084 us | note_interrupt();
7338520.591858 | 0) 29.866 us | }
7338520.591858 | 0) 32.390 us | }
7338520.591859 | 0) 33.204 us | }
7338520.591859 | 0) 36.137 us | }
7338520.591859 | 0) 36.574 us | }
7338520.591859 | 0) | irq_exit() {
7338520.591860 | 0) 0.073 us | idle_cpu();
7338520.591860 | 0) 0.076 us | rcu_irq_exit();
7338520.591861 | 0) 1.091 us | }
7338520.591861 | 0) 40.156 us | }
7338520.591862 | 0) 41.874 us | }
7338520.591862 | 0) 75.633 us | } /* __schedule */
7338520.591862 | 0) 76.182 us | } /* schedule */
7338520.591863 | 0) 76.965 us | } /* schedule_timeout */
7338520.591863 | 0) 0.070 us | down_read();
7338520.591864 | 0) | copy_from_read_buf() {
7338520.591865 | 0) | tty_audit_add_data() {
7338520.591865 | 0) 0.232 us | _raw_spin_lock_irqsave();
7338520.591866 | 0) 0.079 us | _raw_spin_unlock_irqrestore();
7338520.591867 | 0) 0.122 us | _raw_spin_lock_irqsave();
7338520.591867 | 0) 0.066 us | _raw_spin_unlock_irqrestore();
7338520.591868 | 0) 2.642 us | }
7338520.591868 | 0) 3.886 us | }
7338520.591868 | 0) 0.149 us | copy_from_read_buf();
7338520.591869 | 0) 0.072 us | n_tty_set_room();
7338520.591870 | 0) 0.071 us | n_tty_write_wakeup();
7338520.591870 | 0) | __wake_up() {
7338520.591871 | 0) 0.071 us | _raw_spin_lock_irqsave();
7338520.591872 | 0) | __wake_up_common() {
7338520.591872 | 0) 0.097 us | pollwake();
7338520.591873 | 0) 0.739 us | }
7338520.591873 | 0) 0.066 us | _raw_spin_unlock_irqrestore();
7338520.591874 | 0) 3.043 us | }
7338520.591874 | 0) 0.075 us | n_tty_set_room();
7338520.591875 | 0) 0.106 us | up_read();
7338520.591875 | 0) | remove_wait_queue() {
7338520.591875 | 0) 0.078 us | _raw_spin_lock_irqsave();
7338520.591876 | 0) 0.075 us | _raw_spin_unlock_irqrestore();
7338520.591877 | 0) 1.165 us | }
7338520.591877 | 0) 0.137 us | mutex_unlock();
7338520.591877 | 0) 98.321 us | } /* n_tty_read */
7338520.591878 | 0) | tty_ldisc_deref() {
7338520.591878 | 0) 0.072 us | ldsem_up_read();
7338520.591879 | 0) 0.561 us | }
7338520.591879 | 0) 0.090 us | get_seconds();
7338520.591880 | 0) 102.599 us | } /* tty_read */
7338520.591880 | 0) 0.362 us | __fsnotify_parent();
7338520.591881 | 0) 0.171 us | fsnotify();
7338520.591882 | 0) 109.640 us | } /* vfs_read */
7338520.591951 | 0) | vfs_read() {
7338520.591951 | 0) | rw_verify_area() {
7338520.591952 | 0) | security_file_permission() {
7338520.591952 | 0) | apparmor_file_permission() {
7338520.591952 | 0) 0.174 us | common_file_perm();
7338520.591953 | 0) 0.762 us | }
7338520.591953 | 0) 0.126 us | __fsnotify_parent();
7338520.591954 | 0) 0.088 us | fsnotify();
7338520.591954 | 0) 2.609 us | }
7338520.591955 | 0) 3.351 us | }
7338520.591955 | 0) | tty_read() {
7338520.591956 | 0) 0.081 us | tty_paranoia_check();
7338520.591956 | 0) | tty_ldisc_ref_wait() {
7338520.591956 | 0) 0.090 us | ldsem_down_read();
7338520.591957 | 0) 0.633 us | }
7338520.591957 | 0) | n_tty_read() {
7338520.591958 | 0) 0.073 us | _raw_spin_lock_irq();
7338520.591958 | 0) 0.089 us | mutex_lock_interruptible();
7338520.591959 | 0) 0.080 us | down_read();
7338520.591960 | 0) | add_wait_queue() {
7338520.591960 | 0) 0.084 us | _raw_spin_lock_irqsave();
7338520.591960 | 0) 0.087 us | _raw_spin_unlock_irqrestore();
7338520.591961 | 0) 1.215 us | }
7338520.591961 | 0) 0.078 us | tty_hung_up_p();
7338520.591962 | 0) 0.084 us | n_tty_set_room();
7338520.591962 | 0) 0.072 us | up_read();
7338520.591963 | 0) | schedule_timeout() {
7338520.591963 | 0) | schedule() {
7338520.591964 | 0) | __schedule() {
7338520.591964 | 0) 0.084 us | rcu_note_context_switch();
7338520.591965 | 0) 0.086 us | _raw_spin_lock_irq();
7338520.591965 | 0) | deactivate_task() {
7338520.591966 | 0) | dequeue_task() {
7338520.591966 | 0) 0.171 us | update_rq_clock();
7338520.591966 | 0) | dequeue_task_fair() {
7338520.591967 | 0) | dequeue_entity() {
7338520.591967 | 0) | update_curr() {
7338520.591967 | 0) 0.248 us | cpuacct_charge();
7338520.591968 | 0) 0.974 us | }
7338520.591969 | 0) 0.074 us | update_cfs_rq_blocked_load();
7338520.591969 | 0) 0.081 us | clear_buddies();
7338520.591970 | 0) 0.094 us | account_entity_dequeue();
7338520.591971 | 0) | update_cfs_shares() {
7338520.591971 | 0) 0.096 us | update_curr();
7338520.591971 | 0) 0.093 us | account_entity_dequeue();
7338520.591972 | 0) 0.079 us | account_entity_enqueue();
7338520.591972 | 0) 1.743 us | }
7338520.591972 | 0) 5.515 us | }
7338520.591973 | 0) | dequeue_entity() {
7338520.591973 | 0) 0.088 us | update_curr();
7338520.591974 | 0) 0.106 us | update_cfs_rq_blocked_load();
7338520.591975 | 0) 0.078 us | clear_buddies();
7338520.591975 | 0) 0.088 us | account_entity_dequeue();
7338520.591976 | 0) 0.091 us | update_cfs_shares();
7338520.591977 | 0) 3.639 us | }
7338520.591977 | 0) 0.078 us | hrtick_update();
7338520.591978 | 0) 10.851 us | }
7338520.591978 | 0) 11.992 us | }
7338520.591978 | 0) 12.496 us | }
7338520.591978 | 0) | pick_next_task_fair() {
7338520.591979 | 0) 0.079 us | check_cfs_rq_runtime();
7338520.591979 | 0) | pick_next_entity() {
7338520.591979 | 0) 0.080 us | clear_buddies();
7338520.591980 | 0) 0.594 us | }
7338520.591980 | 0) | put_prev_entity() {
7338520.591980 | 0) 0.078 us | check_cfs_rq_runtime();
7338520.591981 | 0) 0.641 us | }
7338520.591981 | 0) | put_prev_entity() {
7338520.591982 | 0) 0.076 us | check_cfs_rq_runtime();
7338520.591982 | 0) 0.610 us | }
7338520.591982 | 0) | set_next_entity() {
7338520.591983 | 0) 0.097 us | update_stats_wait_end();
7338520.591983 | 0) 0.744 us | }
7338520.591984 | 0) 5.115 us | }
7338520.591984 | 0) 0.076 us | paravirt_start_context_switch();
7338520.591985 | 0) 0.086 us | xen_read_cr0();
7338520.591986 | 0) | xen_write_cr0() {
7338520.591986 | 0) 0.078 us | paravirt_get_lazy_mode();
7338520.591987 | 0) 0.086 us | __xen_mc_entry();
7338520.591987 | 0) 0.078 us | paravirt_get_lazy_mode();
7338520.591988 | 0) 1.698 us | }
7338520.591988 | 0) | xen_load_sp0() {
7338520.591988 | 0) 0.074 us | paravirt_get_lazy_mode();
7338520.591989 | 0) 0.084 us | __xen_mc_entry();
7338520.591989 | 0) 0.084 us | paravirt_get_lazy_mode();
7338520.591990 | 0) 1.724 us | }
7338520.591990 | 0) | xen_load_tls() {
7338520.591991 | 0) 0.080 us | paravirt_get_lazy_mode();
7338520.591991 | 0) 0.088 us | paravirt_get_lazy_mode();
7338520.591992 | 0) 0.140 us | load_TLS_descriptor();
7338520.591992 | 0) 0.079 us | load_TLS_descriptor();
7338520.591993 | 0) 0.087 us | load_TLS_descriptor();
7338520.591994 | 0) 0.078 us | paravirt_get_lazy_mode();
7338520.591994 | 0) 3.666 us | }
7338520.591995 | 0) | xen_end_context_switch() {
7338520.591995 | 0) 0.644 us | xen_mc_flush();
7338520.591996 | 0) 0.080 us | paravirt_end_context_switch();
7338520.591997 | 0) 1.813 us | }
7338520.855105 | 0) | finish_task_switch() {
7338520.855110 | 0) | xen_evtchn_do_upcall() {
7338520.855110 | 0) | irq_enter() {
7338520.855110 | 0) 0.137 us | rcu_irq_enter();
7338520.855111 | 0) 0.673 us | }
7338520.855111 | 0) 0.063 us | exit_idle();
7338520.855111 | 0) | __xen_evtchn_do_upcall() {
7338520.855112 | 0) | evtchn_2l_handle_events() {
7338520.855112 | 0) 0.050 us | irq_from_virq();
7338520.855112 | 0) | evtchn_from_irq() {
7338520.855112 | 0) | irq_get_irq_data() {
7338520.855113 | 0) 0.050 us | irq_to_desc();
7338520.855113 | 0) 0.568 us | }
7338520.855113 | 0) 0.895 us | }
7338520.855114 | 0) | get_evtchn_to_irq() {
7338520.855114 | 0) 0.048 us | evtchn_2l_max_channels();
7338520.855114 | 0) 0.386 us | }
7338520.855114 | 0) | generic_handle_irq() {
7338520.855114 | 0) 0.051 us | irq_to_desc();
7338520.855115 | 0) | handle_percpu_irq() {
7338520.855115 | 0) | ack_dynirq() {
7338520.855115 | 0) | evtchn_from_irq() {
7338520.855115 | 0) | irq_get_irq_data() {
7338520.855116 | 0) 0.058 us | irq_to_desc();
7338520.855117 | 0) 1.264 us | }
7338520.855117 | 0) 1.644 us | }
7338520.855117 | 0) 0.048 us | irq_move_irq();
7338520.855118 | 0) 0.050 us | evtchn_2l_clear_pending();
7338520.855118 | 0) 2.876 us | }
7338520.855118 | 0) | handle_irq_event_percpu() {
7338520.855119 | 0) | xen_irq_work_interrupt() {
7338520.855119 | 0) | irq_enter() {
7338520.855119 | 0) 0.055 us | rcu_irq_enter();
7338520.855119 | 0) 0.460 us | }
7338520.855120 | 0) | __wake_up() {
7338520.855120 | 0) 0.057 us | _raw_spin_lock_irqsave();
7338520.855120 | 0) | __wake_up_common() {
7338520.855121 | 0) | autoremove_wake_function() {
7338520.855121 | 0) | default_wake_function() {
7338520.855121 | 0) | try_to_wake_up() {
7338520.855121 | 0) 0.203 us | _raw_spin_lock_irqsave();
7338520.855122 | 0) 0.179 us | task_waking_fair();
7338520.855123 | 0) | select_task_rq_fair() {
7338520.855123 | 0) 0.048 us | source_load();
7338520.855123 | 0) 0.059 us | target_load();
7338520.855124 | 0) 0.059 us | idle_cpu();
7338520.855124 | 0) 0.058 us | cpus_share_cache();
7338520.855124 | 0) 0.058 us | idle_cpu();
7338520.855125 | 0) 1.940 us | }
7338520.855125 | 0) 0.057 us | _raw_spin_lock();
7338520.855125 | 0) | ttwu_do_activate.constprop.124() {
7338520.855125 | 0) | activate_task() {
7338520.855126 | 0) | enqueue_task() {
7338520.855126 | 0) 0.171 us | update_rq_clock();
7338520.855126 | 0) | enqueue_task_fair() {
7338520.855126 | 0) | enqueue_entity() {
7338520.855127 | 0) 0.063 us | update_curr();
7338520.855127 | 0) 0.078 us | __compute_runnable_contrib.part.51();
7338520.855127 | 0) 0.066 us | __update_entity_load_avg_contrib();
7338520.855128 | 0) 0.061 us | update_cfs_rq_blocked_load();
7338520.855128 | 0) 0.072 us | account_entity_enqueue();
7338520.855128 | 0) 0.116 us | update_cfs_shares();
7338520.855129 | 0) 0.062 us | place_entity();
7338520.855129 | 0) 0.087 us | __enqueue_entity();
7338520.855129 | 0) 2.950 us | }
7338520.855130 | 0) | enqueue_entity() {
7338520.855130 | 0) 0.065 us | update_curr();
7338520.855130 | 0) 0.065 us | update_cfs_rq_blocked_load();
7338520.855130 | 0) 0.067 us | account_entity_enqueue();
7338520.855131 | 0) 0.084 us | update_cfs_shares();
7338520.855131 | 0) 0.112 us | place_entity();
7338520.855131 | 0) 0.051 us | __enqueue_entity();
7338520.855132 | 0) 2.074 us | }
7338520.855132 | 0) 0.055 us | hrtick_update();
7338520.855132 | 0) 5.983 us | }
7338520.855133 | 0) 6.790 us | }
7338520.855133 | 0) 7.138 us | }
7338520.855133 | 0) | ttwu_do_wakeup() {
7338520.855133 | 0) | check_preempt_curr() {
7338520.855133 | 0) | resched_task() {
7338520.855133 | 0) | xen_smp_send_reschedule() {
7338520.855134 | 0) | xen_send_IPI_one() {
7338520.855134 | 0) | notify_remote_via_irq() {
7338520.855134 | 0) | evtchn_from_irq() {
7338520.855134 | 0) | irq_get_irq_data() {
7338520.855134 | 0) 0.057 us | irq_to_desc();
7338520.855135 | 0) 0.502 us | }
7338520.855135 | 0) 0.865 us | }
7338520.855136 | 0) 1.975 us | } /* notify_remote_via_irq */
7338520.855136 | 0) 2.350 us | }
7338520.855136 | 0) 2.723 us | }
7338520.855136 | 0) 3.175 us | }
7338520.855137 | 0) 3.620 us | }
7338520.855138 | 0) 4.642 us | }
7338520.855138 | 0) 12.409 us | }
7338520.855138 | 0) 0.059 us | _raw_spin_unlock();
7338520.855139 | 0) 0.108 us | ttwu_stat();
7338520.855140 | 0) 0.073 us | _raw_spin_unlock_irqrestore();
7338520.855140 | 0) 18.857 us | }
7338520.855141 | 0) 19.415 us | }
7338520.855141 | 0) 19.993 us | }
7338520.855141 | 0) 20.587 us | }
7338520.855141 | 0) 0.070 us | _raw_spin_unlock_irqrestore();
7338520.855142 | 0) 21.858 us | }
7338520.855142 | 0) | irq_exit() {
7338520.855143 | 0) 0.084 us | idle_cpu();
7338520.855143 | 0) 0.082 us | rcu_irq_exit();
7338520.855144 | 0) 1.235 us | }
7338520.855144 | 0) 25.109 us | }
7338520.855144 | 0) 0.126 us | add_interrupt_randomness();
7338520.855145 | 0) 0.091 us | note_interrupt();
7338520.855145 | 0) 26.935 us | }
7338520.855146 | 0) 30.693 us | }
7338520.855146 | 0) 31.575 us | }
7338520.855146 | 0) 34.424 us | }
7338520.855147 | 0) 34.841 us | }
7338520.855147 | 0) | irq_exit() {
7338520.855147 | 0) 0.083 us | idle_cpu();
7338520.855148 | 0) 0.069 us | rcu_irq_exit();
7338520.855148 | 0) 1.056 us | }
7338520.855148 | 0) 38.284 us | }
7338520.855149 | 0) 39.892 us | }
7338520.855150 | 0) 72.181 us | }
7338520.855150 | 0) 72.925 us | }
7338520.855150 | 0) 73.638 us | }
7338520.855151 | 0) 0.078 us | down_read();
7338520.855152 | 0) | copy_from_read_buf() {
7338520.855153 | 0) | tty_audit_add_data() {
7338520.855153 | 0) 0.272 us | _raw_spin_lock_irqsave();
7338520.855154 | 0) 0.063 us | _raw_spin_unlock_irqrestore();
7338520.855155 | 0) 0.086 us | _raw_spin_lock_irqsave();
7338520.855155 | 0) 0.067 us | _raw_spin_unlock_irqrestore();
7338520.855156 | 0) 2.808 us | }
7338520.855156 | 0) 4.330 us | }
7338520.855156 | 0) 0.083 us | copy_from_read_buf();
7338520.855157 | 0) 0.062 us | n_tty_set_room();
7338520.855158 | 0) 0.079 us | n_tty_write_wakeup();
7338520.855158 | 0) | __wake_up() {
7338520.855158 | 0) 0.068 us | _raw_spin_lock_irqsave();
7338520.855159 | 0) | __wake_up_common() {
7338520.855159 | 0) 0.092 us | pollwake();
7338520.855160 | 0) 0.643 us | }
7338520.855160 | 0) 0.074 us | _raw_spin_unlock_irqrestore();
7338520.855160 | 0) 2.040 us | }
7338520.855161 | 0) 0.074 us | n_tty_set_room();
7338520.855162 | 0) 0.073 us | up_read();
7338520.855162 | 0) | remove_wait_queue() {
7338520.855162 | 0) 0.084 us | _raw_spin_lock_irqsave();
7338520.855163 | 0) 0.078 us | _raw_spin_unlock_irqrestore();
7338520.855163 | 0) 1.166 us | }
7338520.855164 | 0) 0.140 us | mutex_unlock();
7338520.855164 | 0) 93.360 us | }
7338520.855165 | 0) | tty_ldisc_deref() {
7338520.855165 | 0) 0.070 us | ldsem_up_read();
7338520.855166 | 0) 0.746 us | }
7338520.855166 | 0) 0.071 us | get_seconds();
7338520.855167 | 0) 97.713 us | }
7338520.855167 | 0) 0.283 us | __fsnotify_parent();
7338520.855168 | 0) 0.172 us | fsnotify();
7338520.855168 | 0) 103.847 us | }
7338520.855238 | 0) | vfs_read() {
7338520.855239 | 0) | rw_verify_area() {
7338520.855240 | 0) | security_file_permission() {
7338520.855240 | 0) | apparmor_file_permission() {
7338520.855240 | 0) 0.160 us | common_file_perm();
7338520.855241 | 0) 0.770 us | }
7338520.855241 | 0) 0.078 us | __fsnotify_parent();
7338520.855242 | 0) 0.087 us | fsnotify();
7338520.855243 | 0) 2.595 us | }
7338520.855243 | 0) 4.148 us | }
7338520.855243 | 0) | tty_read() {
7338520.855244 | 0) 0.078 us | tty_paranoia_check();
7338520.855244 | 0) | tty_ldisc_ref_wait() {
7338520.855244 | 0) 0.084 us | ldsem_down_read();
7338520.855245 | 0) 0.643 us | }
7338520.855245 | 0) | n_tty_read() {
7338520.855246 | 0) 0.079 us | _raw_spin_lock_irq();
7338520.855247 | 0) 0.171 us | mutex_lock_interruptible();
7338520.855247 | 0) 0.064 us | down_read();
7338520.855248 | 0) | add_wait_queue() {
7338520.855248 | 0) 0.078 us | _raw_spin_lock_irqsave();
7338520.855249 | 0) 0.082 us | _raw_spin_unlock_irqrestore();
7338520.855249 | 0) 1.076 us | }
7338520.855250 | 0) 0.075 us | tty_hung_up_p();
7338520.855250 | 0) 0.079 us | n_tty_set_room();
7338520.855251 | 0) 0.075 us | up_read();
7338520.855251 | 0) | schedule_timeout() {
7338520.855252 | 0) | schedule() {
7338520.855252 | 0) | __schedule() {
7338520.855252 | 0) 0.084 us | rcu_note_context_switch();
7338520.855253 | 0) 0.079 us | _raw_spin_lock_irq();
7338520.855254 | 0) | deactivate_task() {
7338520.855254 | 0) | dequeue_task() {
7338520.855254 | 0) 0.219 us | update_rq_clock();
7338520.855255 | 0) | dequeue_task_fair() {
7338520.855255 | 0) | dequeue_entity() {
7338520.855255 | 0) | update_curr() {
7338520.855256 | 0) 0.186 us | cpuacct_charge();
7338520.855257 | 0) 0.924 us | }
7338520.855257 | 0) 0.078 us | update_cfs_rq_blocked_load();
7338520.855258 | 0) 0.078 us | clear_buddies();
7338520.855258 | 0) 0.083 us | account_entity_dequeue();
7338520.855259 | 0) | update_cfs_shares() {
7338520.855259 | 0) 0.105 us | update_curr();
7338520.855260 | 0) 0.093 us | account_entity_dequeue();
7338520.855260 | 0) 0.098 us | account_entity_enqueue();
7338520.855261 | 0) 1.825 us | }
7338520.855261 | 0) 5.574 us | }
7338520.855261 | 0) | dequeue_entity() {
7338520.855261 | 0) 0.086 us | update_curr();
7338520.855262 | 0) 0.127 us | __update_entity_load_avg_contrib();
7338520.855263 | 0) 0.070 us | update_cfs_rq_blocked_load();
7338520.855263 | 0) 0.066 us | clear_buddies();
7338520.855264 | 0) 0.082 us | account_entity_dequeue();
7338520.855264 | 0) 0.104 us | update_cfs_shares();
7338520.855265 | 0) 3.439 us | }
7338520.855265 | 0) 0.078 us | hrtick_update();
7338520.855266 | 0) 10.741 us | }
7338520.855266 | 0) 11.990 us | }
7338520.855266 | 0) 12.580 us | }
7338520.855267 | 0) | pick_next_task_fair() {
7338520.855267 | 0) 0.074 us | check_cfs_rq_runtime();
7338520.855268 | 0) | pick_next_entity() {
7338520.855268 | 0) 0.078 us | clear_buddies();
7338520.855269 | 0) 0.696 us | }
7338520.855269 | 0) | put_prev_entity() {
7338520.855269 | 0) 0.084 us | check_cfs_rq_runtime();
7338520.855270 | 0) 0.628 us | }
7338520.855270 | 0) | put_prev_entity() {
7338520.855270 | 0) 0.074 us | check_cfs_rq_runtime();
7338520.855271 | 0) 0.575 us | }
7338520.855271 | 0) | set_next_entity() {
7338520.855272 | 0) 0.104 us | update_stats_wait_end();
7338520.855273 | 0) 0.834 us | }
7338520.855273 | 0) 5.872 us | }
7338520.855274 | 0) 0.079 us | paravirt_start_context_switch();
7338520.855275 | 0) 0.080 us | xen_read_cr0();
7338520.855276 | 0) | xen_write_cr0() {
7338520.855276 | 0) 0.091 us | paravirt_get_lazy_mode();
7338520.855277 | 0) 0.087 us | __xen_mc_entry();
7338520.855277 | 0) 0.076 us | paravirt_get_lazy_mode();
7338520.855278 | 0) 1.986 us | }
7338520.855278 | 0) | xen_load_sp0() {
7338520.855278 | 0) 0.066 us | paravirt_get_lazy_mode();
7338520.855279 | 0) 0.083 us | __xen_mc_entry();
7338520.855280 | 0) 0.082 us | paravirt_get_lazy_mode();
7338520.855280 | 0) 1.925 us | }
7338520.855281 | 0) | xen_load_tls() {
7338520.855281 | 0) 0.082 us | paravirt_get_lazy_mode();
7338520.855281 | 0) 0.080 us | paravirt_get_lazy_mode();
7338520.855282 | 0) 0.137 us | load_TLS_descriptor();
7338520.855283 | 0) 0.090 us | load_TLS_descriptor();
7338520.855283 | 0) 0.081 us | load_TLS_descriptor();
7338520.855284 | 0) 0.081 us | paravirt_get_lazy_mode();
7338520.855284 | 0) 3.397 us | }
7338520.855284 | 0) | xen_end_context_switch() {
7338520.855285 | 0) 0.618 us | xen_mc_flush();
7338520.855286 | 0) 0.086 us | paravirt_end_context_switch();
7338520.855286 | 0) 1.708 us | }
^C
Ending tracing...
Understanding whether the time is on-CPU or blocked off-CPU directs the
performance investigation.
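
I believe the on-CPU-only behavior corresponds to ftrace's sleep-time trace
option. Here is a minimal sketch of toggling it directly, assuming the usual
debugfs mount point and a kernel that exposes this option (read the funcgraph
source for what the script actually sets):

# cd /sys/kernel/debug/tracing
# echo 0 > options/sleep-time    # function_graph durations count on-CPU time only
# echo 1 > options/sleep-time    # default: durations include time scheduled out

With sleep-time disabled, time the task spends blocked (for example, waiting in
schedule_timeout() above) is not charged to the function's duration.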
Use -h to print the USAGE message:
# ./funcgraph -h
USAGE: funcgraph [-aCDhHPtT] [-m maxdepth] [-p PID] [-d secs] funcstring
-a # all info (same as -HPt)
-C # measure on-CPU time only
-d seconds # trace duration, and use buffers
-D # do not show function duration
-h # this usage message
-H # include column headers
-m maxdepth # max stack depth to show
-p PID # trace when this pid is on-CPU
-P # show process names & PIDs
-t # show timestamps
-T # comment function tails
eg,
funcgraph do_nanosleep # trace do_nanosleep() and children
funcgraph -m 3 do_sys_open # trace do_sys_open() to 3 levels only
funcgraph -a do_sys_open # include timestamps and process name
funcgraph -p 198 do_sys_open # trace do_sys_open() for PID 198 only
funcgraph -d 1 do_sys_open >out # trace 1 sec, then write to file
See the man page and example file for more info.
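
As a rough guide only, here is how some of these options relate to raw ftrace
tracefs files; this is a sketch based on standard ftrace interfaces, not a copy
of funcgraph's implementation:

# cd /sys/kernel/debug/tracing
# echo do_sys_open > set_graph_function   # funcstring: function to graph
# echo 3 > max_graph_depth                # like -m 3: limit the depth shown
# echo 198 > set_ftrace_pid               # like -p 198: only while PID 198 is on-CPU
# echo function_graph > current_tracer
# cat trace_pipe                          # stream live output (Ctrl-C to stop)

The -d duration mode instead leaves the tracer running and reads the buffered
trace file afterwards, which is cheaper than streaming trace_pipe.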
perf-tools-unstable-0.0.1~20150130+git85414b0/examples/funcslower_example.txt

Demonstrations of funcslower, the Linux ftrace version.
Show me ext3_readpages() calls slower than 1000 microseconds (1 ms):
# ./funcslower ext3_readpages 1000
Tracing "ext3_readpages" slower than 1000 us... Ctrl-C to end.
0) ! 8147.120 us | } /* ext3_readpages */
0) ! 8135.067 us | } /* ext3_readpages */
0) ! 12202.93 us | } /* ext3_readpages */
0) ! 12201.84 us | } /* ext3_readpages */
0) ! 8142.667 us | } /* ext3_readpages */
0) ! 12194.14 us | } /* ext3_readpages */
^C
Ending tracing...
Neat. So this confirms that there are ext3_readpages() calls that are taking
over 8000 us (8 ms).
funcslower uses the ftrace function graph tracer to dynamically instrument
the given kernel function, time it in-kernel, and only emit events slower
than the given latency threshold in-kernel. Since this all operates in
kernel context, the overheads are relatively low (compared to post-processing
in user space).
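
A minimal sketch of that mechanism using raw ftrace, assuming a kernel whose
function_graph tracer honors tracing_thresh (funcslower adds option handling,
headers, and cleanup on top of this):

# cd /sys/kernel/debug/tracing
# echo ext3_readpages > set_graph_function
# echo 1000 > tracing_thresh              # microseconds: emit only slower calls
# echo function_graph > current_tracer
# cat trace_pipe
# echo 0 > tracing_thresh                 # afterwards: restore normal output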
Now include the process name and PID (-P) of the process who is on-CPU, and the
absolute timestamp (-t) of the event:
# ./funcslower -Pt ext3_readpages 1000
Tracing "ext3_readpages" slower than 1000 us... Ctrl-C to end.
2678112.003180 | 0) cksum-26695 | ! 8145.268 us | } /* ext3_readpages */
2678113.538763 | 0) cksum-26695 | ! 8139.086 us | } /* ext3_readpages */
2678113.704901 | 0) cksum-26695 | ! 8147.549 us | } /* ext3_readpages */
2678113.721102 | 0) cksum-26695 | ! 8142.530 us | } /* ext3_readpages */
2678113.810269 | 0) cksum-26695 | ! 12234.70 us | } /* ext3_readpages */
2678113.996625 | 0) cksum-26695 | ! 8146.129 us | } /* ext3_readpages */
2678114.012832 | 0) cksum-26695 | ! 8148.153 us | } /* ext3_readpages */
^C
Ending tracing...
Great! Now I can see the process name, which in this case is the responsible
process. The timestamps also let me determine the rate of these slow events.
Now measure time differently: excluding time spent sleeping, so that we only
see on-CPU time:
# ./funcslower -Pct ext3_readpages 1000
Tracing "ext3_readpages" slower than 1000 us... Ctrl-C to end.
^C
Ending tracing...
I believe the workload hasn't changed, so these ext3_readpages() calls are
still happening; however, their CPU time doesn't exceed 1 ms. Compared to the
earlier output, this tells me that the latency in this function is due to time
spent blocked off-CPU, and not on-CPU. This makes sense: this function is
ultimately being blocked on disk I/O.
Were the function duration times to be similar with and without -C, that would
tell us that the high latency is due to time spent on-CPU executing code.
This traces the sys_nanosleep() kernel function, and shows calls taking over
100 us:
# ./funcslower sys_nanosleep 100
Tracing "sys_nanosleep" slower than 100 us... Ctrl-C to end.
0) ! 2000147 us | } /* sys_nanosleep */
------------------------------------------
0) registe-27414 => vmstat-27419
------------------------------------------
0) ! 1000143 us | } /* sys_nanosleep */
0) ! 1000154 us | } /* sys_nanosleep */
------------------------------------------
0) vmstat-27419 => registe-27414
------------------------------------------
0) ! 2000183 us | } /* sys_nanosleep */
------------------------------------------
0) registe-27414 => vmstat-27419
------------------------------------------
0) ! 1000141 us | } /* sys_nanosleep */
^C
Ending tracing...
This is an example where I did not use -P, but ftrace has included process
information anyway. Look for the lines containing "=>", which indicate a process
switch on the given CPU.
Use -h to print the USAGE message:
# ./funcslower -h
USAGE: funcslower [-aChHPt] [-p PID] [-d secs] funcstring latency_us
-a # all info (same as -HPt)
-C # measure on-CPU time only
-d seconds # trace duration, and use buffers
-h # this usage message
-H # include column headers
-p PID # trace when this pid is on-CPU
-P # show process names & PIDs
-t # show timestamps
eg,
funcslower vfs_read 10000 # trace vfs_read() slower than 10 ms
See the man page and example file for more info.
perf-tools-unstable-0.0.1~20150130+git85414b0/examples/functrace_example.txt

Demonstrations of functrace, the Linux ftrace version.
A (usually) good example to start with is do_nanosleep(), since it is not called
frequently, and easily triggered. Here's tracing it using functrace:
# ./functrace 'do_nanosleep'
Tracing "do_nanosleep"... Ctrl-C to end.
svscan-1678 [000] .... 6412438.703521: do_nanosleep <-hrtimer_nanosleep
svscan-1678 [000] .... 6412443.703678: do_nanosleep <-hrtimer_nanosleep
svscan-1678 [000] .... 6412448.703865: do_nanosleep <-hrtimer_nanosleep
vmstat-28371 [000] .... 6412453.216241: do_nanosleep <-hrtimer_nanosleep
svscan-1678 [000] .... 6412453.704049: do_nanosleep <-hrtimer_nanosleep
vmstat-28371 [000] .... 6412454.216524: do_nanosleep <-hrtimer_nanosleep
vmstat-28371 [000] .... 6412455.216816: do_nanosleep <-hrtimer_nanosleep
vmstat-28371 [000] .... 6412456.217093: do_nanosleep <-hrtimer_nanosleep
vmstat-28371 [000] .... 6412457.217378: do_nanosleep <-hrtimer_nanosleep
vmstat-28371 [000] .... 6412458.217660: do_nanosleep <-hrtimer_nanosleep
^C
Ending tracing...
While tracing, I ran a "vmstat 1" in another window. vmstat and its process ID
can be seen as the 1st column, and the timestamp and one second intervals can
be seen as the 4th column.
These are the basic details: who was on-CPU (process name and PID), flags, timestamp,
and calling function. Treat this as the next step, after funccount, for getting
a little more information on kernel function execution, before using more
capabilities to dig further.
This is Linux 3.16, and the output is the ftrace text buffer format, which has
changed slightly between kernel versions.
To see the column headers, use -H. This is Linux 3.16:
# ./functrace -H do_nanosleep
Tracing "do_nanosleep"... Ctrl-C to end.
# tracer: function
#
# entries-in-buffer/entries-written: 0/0 #P:2
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / delay
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# | | | |||| | |
svscan-1678 [001] .... 6413283.729520: do_nanosleep <-hrtimer_nanosleep
svscan-1678 [001] .... 6413288.729679: do_nanosleep <-hrtimer_nanosleep
For comparison, here's Linux 3.2:
# ./functrace -H do_nanosleep
Tracing "do_nanosleep"... Ctrl-C to end.
# tracer: function
#
# TASK-PID CPU# TIMESTAMP FUNCTION
# | | | | |
vmstat-11789 [000] 1763207.021204: do_nanosleep <-hrtimer_nanosleep
vmstat-11789 [000] 1763208.022970: do_nanosleep <-hrtimer_nanosleep
vmstat-11789 [000] 1763209.023267: do_nanosleep <-hrtimer_nanosleep
For documentation on the exact format, see the Linux kernel source under
Documentation/trace/ftrace.txt.
This error:
# ./functrace 'ext4_z*'
Tracing "ext4_z*"... Ctrl-C to end.
./functrace: line 136: echo: write error: Invalid argument
ERROR: enabling "ext4_z*". Exiting.
is because there were no functions beginning with "ext4_z". You can check
available functions in the /sys/kernel/debug/tracing/available_filter_functions
file.
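
For example, to check in advance whether a wildcard such as "ext4_z*" matches
anything (the path assumes the common debugfs mount point):

# grep '^ext4_z' /sys/kernel/debug/tracing/available_filter_functions

No output means no matching functions, and enabling the filter will fail as
shown above.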
You might want to use funccount to check the frequency of events before using
functrace. For example, counting ext3 events on a system:
# ./funccount -d 10 'ext3*'
Tracing "ext3*" for 10 seconds...
FUNC COUNT
ext3_journal_dirty_data 1
ext3_ordered_write_end 1
ext3_write_begin 1
ext3_writepage_trans_blocks 1
ext3_dirty_inode 2
ext3_do_update_inode 2
ext3_get_group_desc 2
ext3_get_inode_block.isra.20 2
ext3_get_inode_flags 2
ext3_get_inode_loc 2
ext3_mark_iloc_dirty 2
ext3_mark_inode_dirty 2
ext3_reserve_inode_write 2
ext3_journal_start_sb 3
ext3_block_to_path.isra.22 6
ext3_bmap 6
ext3_get_block 6
ext3_get_blocks_handle 6
ext3_get_branch 6
ext3_discard_reservation 11
ext3_ioctl 11
ext3_release_file 11
Ending tracing...
During 10 seconds, there weren't many ext3 calls. I might consider tracing
them all (warnings about dynamic tracing many kernel functions apply: test
before use, as in the past there have been bugs causing panics).
# ./functrace 'ext3_*'
Tracing "ext3_*"... Ctrl-C to end.
register_start.-17008 [000] 1763557.577985: ext3_release_file <-__fput
register_start.-17008 [000] 1763557.577987: ext3_discard_reservation <-ext3_release_file
register_start.-17026 [000] 1763558.163620: ext3_ioctl <-file_ioctl
register_start.-17026 [000] 1763558.481081: ext3_release_file <-__fput
register_start.-17026 [000] 1763558.481083: ext3_discard_reservation <-ext3_release_file
register_start.-17041 [000] 1763559.186984: ext3_ioctl <-file_ioctl
register_start.-17041 [000] 1763559.511267: ext3_release_file <-__fput
[...]
For comparison, here's a different system and ext4:
# ./funccount -d 10 'ext4*'
Tracing "ext4*" for 10 seconds...
FUNC COUNT
ext4_journal_commit_callback 2
ext4_htree_fill_tree 6
ext4_htree_free_dir_info 6
ext4_release_dir 6
ext4_readdir 12
ext4fs_dirhash 29
ext4_htree_store_dirent 29
ext4_follow_link 36
ext4_file_mmap 42
ext4_free_data_callback 44
ext4_getattr 45
ext4_bmap 62
ext4_get_block 62
ext4_add_entry 280
ext4_add_nondir 280
ext4_alloc_da_blocks 280
ext4_alloc_inode 280
ext4_bio_write_page 280
ext4_can_truncate 280
ext4_claim_free_clusters 280
ext4_clear_inode 280
ext4_create 280
ext4_da_get_block_prep 280
ext4_da_invalidatepage 280
ext4_da_update_reserve_space 280
ext4_da_write_begin 280
ext4_da_write_end 280
ext4_dec_count.isra.22 280
ext4_delete_entry 280
ext4_destroy_inode 280
ext4_drop_inode 280
ext4_end_bio 280
ext4_es_init_tree 280
ext4_es_lru_del 280
ext4_evict_inode 280
ext4_ext_calc_metadata_amount 280
ext4_ext_correct_indexes 280
ext4_ext_find_goal 280
ext4_ext_insert_extent 280
ext4_ext_remove_space 280
ext4_ext_tree_init 280
ext4_ext_truncate 280
ext4_ext_truncate_extend_resta 280
ext4_ext_try_to_merge 280
ext4_ext_try_to_merge_right 280
ext4_file_write_iter 280
ext4_find_dest_de 280
ext4_finish_bio 280
ext4_free_blocks 280
ext4_free_inode 280
ext4_generic_delete_entry 280
ext4_has_free_clusters 280
ext4_i_callback 280
ext4_init_acl 280
ext4_init_security 280
ext4_inode_attach_jinode 280
ext4_inode_to_goal_block 280
ext4_insert_dentry 280
ext4_invalidatepage 280
ext4_io_submit_init 280
ext4_itable_unused_count 280
ext4_lookup 280
ext4_mb_complex_scan_group 280
ext4_mb_find_by_goal 280
ext4_mb_free_metadata 280
ext4_mb_initialize_context 280
ext4_mb_mark_diskspace_used 280
ext4_mb_new_blocks 280
ext4_mb_normalize_request 280
ext4_mb_regular_allocator 280
ext4_mb_release_context 280
ext4_mb_use_best_found 280
ext4_mb_use_preallocated 280
ext4_nonda_switch 280
ext4_orphan_del 280
ext4_put_io_end_defer 280
ext4_releasepage 280
ext4_rename 280
ext4_set_aops 280
ext4_setent 280
ext4_set_inode_flags 280
ext4_truncate 280
ext4_writepages 280
ext4_writepage_trans_blocks 280
ext4_xattr_delete_inode 280
ext4_xattr_get 285
ext4_xattr_ibody_get 285
ext4_xattr_security_get 285
ext4_bread 286
ext4_release_file 288
ext4_file_open 305
ext4_superblock_csum_set 494
ext4_block_bitmap_csum_set 560
ext4_es_free_extent 560
ext4_es_insert_extent 560
ext4_es_remove_extent 560
ext4_ext_find_extent 560
ext4_ext_map_blocks 560
ext4_free_group_clusters_set 560
ext4_free_inodes_set 560
ext4_get_group_no_and_offset 560
ext4_get_reserved_space 560
ext4_init_io_end 560
ext4_inode_bitmap_csum_set 560
ext4_io_submit 560
ext4_mb_good_group 560
ext4_orphan_add 560
ext4_put_io_end 560
ext4_read_block_bitmap 560
ext4_read_block_bitmap_nowait 560
ext4_read_inode_bitmap 560
ext4_release_io_end 560
ext4_set_bits 560
ext4_validate_block_bitmap 560
ext4_wait_block_bitmap 560
ext4_mb_load_buddy 604
ext4_mb_unload_buddy.isra.24 604
ext4_block_bitmap 840
ext4_discard_preallocations 840
ext4_ext_drop_refs 840
ext4_ext_get_access.isra.30 840
ext4_ext_index_trans_blocks 840
ext4_find_entry 840
ext4_free_group_clusters 840
ext4_handle_dirty_dirent_node 840
ext4_inode_bitmap 840
ext4_meta_trans_blocks 840
ext4_dirty_inode 845
ext4_free_inodes_count 1120
ext4_group_desc_csum 1120
ext4_group_desc_csum_set 1120
ext4_getblk 1126
ext4_map_blocks 1468
ext4_es_lookup_extent 1748
ext4_mb_check_limits 1875
ext4_es_lru_add 2028
ext4_data_block_valid 2308
ext4_journal_check_start 3085
ext4_mark_inode_dirty 5325
ext4_get_inode_flags 5951
ext4_get_inode_loc 5951
ext4_mark_iloc_dirty 5951
ext4_reserve_inode_write 5951
ext4_inode_table 7071
ext4_get_group_desc 8471
ext4_has_inline_data 9486
Ending tracing...
There are many functions called frequently. Tracing them all may cost
significant performance overhead. I may read through this list and look for
the most interesting functions to trace, reducing overheads by only selecting
a few.
For example, ext4_create() looks interesting:
# ./functrace ext4_create
Tracing "ext4_create"... Ctrl-C to end.
supervise-1681 [000] .... 6414396.700163: ext4_create <-vfs_create
supervise-1684 [001] .... 6414396.700287: ext4_create <-vfs_create
supervise-1681 [000] .... 6414396.700598: ext4_create <-vfs_create
supervise-1684 [001] .... 6414396.700636: ext4_create <-vfs_create
supervise-1687 [001] .... 6414396.701577: ext4_create <-vfs_create
supervise-1688 [000] .... 6414396.702590: ext4_create <-vfs_create
supervise-1693 [001] .... 6414396.702829: ext4_create <-vfs_create
supervise-1693 [001] .... 6414396.703592: ext4_create <-vfs_create
supervise-1688 [000] .... 6414396.703598: ext4_create <-vfs_create
supervise-1687 [001] .... 6414396.703988: ext4_create <-vfs_create
supervise-1685 [001] .... 6414396.704126: ext4_create <-vfs_create
supervise-1685 [001] .... 6414396.704458: ext4_create <-vfs_create
supervise-1682 [001] .... 6414396.704577: ext4_create <-vfs_create
supervise-1683 [000] .... 6414396.704984: ext4_create <-vfs_create
supervise-1682 [001] .... 6414396.704985: ext4_create <-vfs_create
[...]
Now I know that different PIDs of the supervise program are calling ext4_create,
at around the same time, and from vfs_create().
The duration mode uses buffering, instead of printing events as they occur.
This greatly reduces overheads. For example:
# ./functrace -d 10 ext4_create > out.ext4_create
# wc out.ext4_create
283 1687 21059 out.ext4_create
Note that the buffer has a limited size. Check the timestamps: if their range
covers less than the duration you requested, that is a clue that the buffer was
exhausted and events were missed.
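
For example, a quick sanity check of the captured range (the timestamp field
position varies between kernel versions, so eyeballing the first and last lines
is the simplest approach):

# head -1 out.ext4_create
# tail -1 out.ext4_create

If the first and last timestamps span much less than the -d duration requested,
the buffer likely filled and events were lost.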
Use -h to print the USAGE message:
# ./functrace -h
USAGE: functrace [-hH] [-p PID] [-d secs] funcstring
-d seconds # trace duration, and use buffers
-h # this usage message
-H # include column headers
-p PID # trace when this pid is on-CPU
eg,
functrace do_nanosleep # trace the do_nanosleep() function
functrace '*sleep' # trace functions ending in "sleep"
functrace -p 198 'vfs*' # trace "vfs*" funcs for PID 198
functrace 'tcp*' > out # trace all "tcp*" funcs to out file
functrace -d 1 'tcp*' > out # trace 1 sec, then write out file
See the man page and example file for more info.
perf-tools-unstable-0.0.1~20150130+git85414b0/examples/iolatency_example.txt

Demonstrations of iolatency, the Linux ftrace version.
Here's a busy system doing over 4k disk IOPS:
# ./iolatency
Tracing block I/O. Output every 1 seconds. Ctrl-C to end.
>=(ms) .. <(ms) : I/O |Distribution |
0 -> 1 : 4381 |######################################|
1 -> 2 : 9 |# |
2 -> 4 : 5 |# |
4 -> 8 : 0 | |
8 -> 16 : 1 |# |
>=(ms) .. <(ms) : I/O |Distribution |
0 -> 1 : 4053 |######################################|
1 -> 2 : 18 |# |
2 -> 4 : 9 |# |
4 -> 8 : 2 |# |
8 -> 16 : 1 |# |
16 -> 32 : 1 |# |
>=(ms) .. <(ms) : I/O |Distribution |
0 -> 1 : 4658 |######################################|
1 -> 2 : 9 |# |
2 -> 4 : 2 |# |
>=(ms) .. <(ms) : I/O |Distribution |
0 -> 1 : 4298 |######################################|
1 -> 2 : 17 |# |
2 -> 4 : 10 |# |
4 -> 8 : 1 |# |
8 -> 16 : 1 |# |
^C
Ending tracing...
Disk I/O latency is usually between 0 and 1 milliseconds, as this system uses
SSDs. There are occasional outliers, up to the 16->32 ms range.
Identifying outliers like these is difficult from iostat(1) alone, which at
the same time reported:
# iostat 1
[...]
avg-cpu: %user %nice %system %iowait %steal %idle
0.53 0.00 1.05 46.84 0.53 51.05
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
xvdap1 0.00 0.00 0.00 28.00 0.00 112.00 8.00 0.02 0.71 0.00 0.71 0.29 0.80
xvdb 0.00 0.00 2134.00 0.00 18768.00 0.00 17.59 0.51 0.24 0.24 0.00 0.23 50.00
xvdc 0.00 0.00 2088.00 0.00 18504.00 0.00 17.72 0.47 0.22 0.22 0.00 0.22 46.40
md0 0.00 0.00 4222.00 0.00 37256.00 0.00 17.65 0.00 0.00 0.00 0.00 0.00 0.00
I/O latency ("await") averages 0.24 and 0.22 ms for our busy disks, but this
output doesn't show that it is occasionally much higher.
To get more information on these I/O, try the iosnoop(8) tool.
The -Q option includes the block I/O queued time, by tracing based on
block_rq_insert instead of block_rq_issue:
# ./iolatency -Q
Tracing block I/O. Output every 1 seconds. Ctrl-C to end.
>=(ms) .. <(ms) : I/O |Distribution |
0 -> 1 : 1913 |######################################|
1 -> 2 : 438 |######### |
2 -> 4 : 100 |## |
4 -> 8 : 145 |### |
8 -> 16 : 43 |# |
16 -> 32 : 43 |# |
32 -> 64 : 1 |# |
>=(ms) .. <(ms) : I/O |Distribution |
0 -> 1 : 2360 |######################################|
1 -> 2 : 132 |### |
2 -> 4 : 72 |## |
4 -> 8 : 14 |# |
8 -> 16 : 1 |# |
>=(ms) .. <(ms) : I/O |Distribution |
0 -> 1 : 2138 |######################################|
1 -> 2 : 496 |######### |
2 -> 4 : 81 |## |
4 -> 8 : 40 |# |
8 -> 16 : 1 |# |
16 -> 32 : 2 |# |
^C
Ending tracing...
I use this along with the default mode to separate problems of load (queueing)
from problems of the device itself (which is what the default mode shows).
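
These measurements come from standard block tracepoints, which can also be
watched directly with raw ftrace. This is only a sketch of the events involved;
iolatency does the per-I/O matching and histogram for you:

# cd /sys/kernel/debug/tracing
# echo 1 > events/block/block_rq_issue/enable      # or block_rq_insert, as with -Q
# echo 1 > events/block/block_rq_complete/enable
# cat trace_pipe
# echo 0 > events/block/block_rq_issue/enable      # disable when done
# echo 0 > events/block/block_rq_complete/enable

An I/O's latency is the time from its issue (or insert) event to its completion
event, matched by device and sector.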
Here's a more interesting system. This is doing a mixed read/write workload,
and has a pretty awful latency distribution:
# ./iolatency 5 3
Tracing block I/O. Output every 5 seconds.
>=(ms) .. <(ms) : I/O |Distribution |
0 -> 1 : 2809 |######################################|
1 -> 2 : 32 |# |
2 -> 4 : 14 |# |
4 -> 8 : 6 |# |
8 -> 16 : 7 |# |
16 -> 32 : 14 |# |
32 -> 64 : 39 |# |
64 -> 128 : 1556 |###################### |
>=(ms) .. <(ms) : I/O |Distribution |
0 -> 1 : 3027 |######################################|
1 -> 2 : 19 |# |
2 -> 4 : 6 |# |
4 -> 8 : 5 |# |
8 -> 16 : 3 |# |
16 -> 32 : 7 |# |
32 -> 64 : 14 |# |
64 -> 128 : 540 |####### |
>=(ms) .. <(ms) : I/O |Distribution |
0 -> 1 : 2939 |######################################|
1 -> 2 : 25 |# |
2 -> 4 : 15 |# |
4 -> 8 : 2 |# |
8 -> 16 : 3 |# |
16 -> 32 : 7 |# |
32 -> 64 : 17 |# |
64 -> 128 : 936 |############# |
Ending tracing...
It's multi-modal, with most I/O taking 0 to 1 milliseconds, then many between
64 and 128 milliseconds. This is how it looks in iostat:
# iostat -x 1
avg-cpu: %user %nice %system %iowait %steal %idle
0.52 0.00 12.37 32.99 0.00 54.12
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
xvdap1 0.00 12.00 0.00 156.00 0.00 19968.00 256.00 52.17 184.38 0.00 184.38 2.33 36.40
xvdb 0.00 0.00 298.00 0.00 2732.00 0.00 18.34 0.04 0.12 0.12 0.00 0.11 3.20
xvdc 0.00 0.00 297.00 0.00 2712.00 0.00 18.26 0.08 0.27 0.27 0.00 0.24 7.20
md0 0.00 0.00 595.00 0.00 5444.00 0.00 18.30 0.00 0.00 0.00 0.00 0.00 0.00
Fortunately, it turns out that the high latency is to xvdap1, which is for files
from a low priority application (processing and writing log files). A high
priority application is reading from the other disks, xvdb and xvdc.
Examining xvdap1 only:
# ./iolatency -d 202,1 5
Tracing block I/O. Output every 5 seconds. Ctrl-C to end.
>=(ms) .. <(ms) : I/O |Distribution |
0 -> 1 : 38 |## |
1 -> 2 : 18 |# |
2 -> 4 : 0 | |
4 -> 8 : 0 | |
8 -> 16 : 5 |# |
16 -> 32 : 11 |# |
32 -> 64 : 26 |## |
64 -> 128 : 894 |######################################|
>=(ms) .. <(ms) : I/O |Distribution |
0 -> 1 : 75 |### |
1 -> 2 : 11 |# |
2 -> 4 : 0 | |
4 -> 8 : 4 |# |
8 -> 16 : 4 |# |
16 -> 32 : 7 |# |
32 -> 64 : 13 |# |
64 -> 128 : 1141 |######################################|
>=(ms) .. <(ms) : I/O |Distribution |
0 -> 1 : 61 |######## |
1 -> 2 : 21 |### |
2 -> 4 : 5 |# |
4 -> 8 : 1 |# |
8 -> 16 : 5 |# |
16 -> 32 : 7 |# |
32 -> 64 : 19 |### |
64 -> 128 : 324 |######################################|
128 -> 256 : 7 |# |
256 -> 512 : 26 |#### |
^C
Ending tracing...
And now xvdb:
# ./iolatency -d 202,16 5
Tracing block I/O. Output every 5 seconds. Ctrl-C to end.
>=(ms) .. <(ms) : I/O |Distribution |
0 -> 1 : 1427 |######################################|
1 -> 2 : 5 |# |
2 -> 4 : 3 |# |
>=(ms) .. <(ms) : I/O |Distribution |
0 -> 1 : 1409 |######################################|
1 -> 2 : 6 |# |
2 -> 4 : 1 |# |
4 -> 8 : 1 |# |
>=(ms) .. <(ms) : I/O |Distribution |
0 -> 1 : 1478 |######################################|
1 -> 2 : 6 |# |
2 -> 4 : 5 |# |
4 -> 8 : 0 | |
8 -> 16 : 2 |# |
>=(ms) .. <(ms) : I/O |Distribution |
0 -> 1 : 1437 |######################################|
1 -> 2 : 5 |# |
2 -> 4 : 7 |# |
4 -> 8 : 0 | |
8 -> 16 : 1 |# |
[...]
While that's much better, it is reaching the 8 - 16 millisecond range,
and these are SSDs with a light workload (~1500 IOPS).
I already know from iosnoop(8) analysis the reason for these high latency
outliers: they are queued behind writes. However, these writes are to a
different disk -- somewhere in this virtualized guest (Xen) there may be a
shared I/O queue.
One way to explore this is to reduce the queue length for the low priority disk,
so that it is less likely to pollute any shared queue. (There are other ways to
investigate and fix this too.) Here I reduce the disk queue length from its
default of 128 to 4:
# echo 4 > /sys/block/xvda1/queue/nr_requests
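If you try this yourself, it may be worth noting the current value first so it
can be restored afterwards (same path as above; 128 was the default here):
# cat /sys/block/xvda1/queue/nr_requests          # note the current value
# echo 128 > /sys/block/xvda1/queue/nr_requests   # restore the default when done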
The overall distribution looks much better:
# ./iolatency 5
Tracing block I/O. Output every 5 seconds. Ctrl-C to end.
>=(ms) .. <(ms) : I/O |Distribution |
0 -> 1 : 3005 |######################################|
1 -> 2 : 19 |# |
2 -> 4 : 9 |# |
4 -> 8 : 45 |# |
8 -> 16 : 859 |########### |
16 -> 32 : 16 |# |
>=(ms) .. <(ms) : I/O |Distribution |
0 -> 1 : 2959 |######################################|
1 -> 2 : 43 |# |
2 -> 4 : 16 |# |
4 -> 8 : 39 |# |
8 -> 16 : 1009 |############# |
16 -> 32 : 76 |# |
>=(ms) .. <(ms) : I/O |Distribution |
0 -> 1 : 3031 |######################################|
1 -> 2 : 27 |# |
2 -> 4 : 9 |# |
4 -> 8 : 24 |# |
8 -> 16 : 422 |###### |
16 -> 32 : 5 |# |
^C
Ending tracing...
Latency now only reaches 32 ms.
Our important disk didn't appear to change much -- maybe a slight improvement
to the outliers:
# ./iolatency -d 202,16 5
Tracing block I/O. Output every 5 seconds. Ctrl-C to end.
>=(ms) .. <(ms) : I/O |Distribution |
0 -> 1 : 1449 |######################################|
1 -> 2 : 6 |# |
2 -> 4 : 5 |# |
4 -> 8 : 1 |# |
>=(ms) .. <(ms) : I/O |Distribution |
0 -> 1 : 1519 |######################################|
1 -> 2 : 12 |# |
>=(ms) .. <(ms) : I/O |Distribution |
0 -> 1 : 1466 |######################################|
1 -> 2 : 2 |# |
2 -> 4 : 3 |# |
>=(ms) .. <(ms) : I/O |Distribution |
0 -> 1 : 1460 |######################################|
1 -> 2 : 4 |# |
2 -> 4 : 7 |# |
[...]
And here's the other disk after the queue length change:
# ./iolatency -d 202,1 5
Tracing block I/O. Output every 5 seconds. Ctrl-C to end.
>=(ms) .. <(ms) : I/O |Distribution |
0 -> 1 : 85 |### |
1 -> 2 : 12 |# |
2 -> 4 : 21 |# |
4 -> 8 : 76 |## |
8 -> 16 : 1539 |######################################|
16 -> 32 : 10 |# |
>=(ms) .. <(ms) : I/O |Distribution |
0 -> 1 : 123 |################## |
1 -> 2 : 8 |## |
2 -> 4 : 6 |# |
4 -> 8 : 17 |### |
8 -> 16 : 270 |######################################|
16 -> 32 : 2 |# |
>=(ms) .. <(ms) : I/O |Distribution |
0 -> 1 : 91 |### |
1 -> 2 : 23 |# |
2 -> 4 : 8 |# |
4 -> 8 : 71 |### |
8 -> 16 : 1223 |######################################|
16 -> 32 : 12 |# |
^C
Ending tracing...
A much better-looking distribution.
Use -h to print the USAGE message:
# ./iolatency -h
USAGE: iolatency [-hQT] [-d device] [-i iotype] [interval [count]]
-d device # device string (eg, "202,1)
-i iotype # match type (eg, '*R*' for all reads)
-Q # use queue insert as start time
-T # timestamp on output
-h # this usage message
interval # summary interval, seconds (default 1)
count # number of summaries
eg,
iolatency # summarize latency every second
iolatency -Q # include block I/O queue time
iolatency 5 2 # 2 x 5 second summaries
iolatency -i '*R*' # trace reads
iolatency -d 202,1 # trace device 202,1 only
See the man page and example file for more info.
perf-tools-unstable-0.0.1~20150130+git85414b0/examples/iosnoop_example.txt 0000664 0000000 0000000 00000035745 12542613570 0025641 0 ustar 00root root 0000000 0000000 Demonstrations of iosnoop, the Linux ftrace version.
Here's Linux 3.16, tracing tar archiving a filesystem:
# ./iosnoop
Tracing block I/O... Ctrl-C to end.
COMM PID TYPE DEV BLOCK BYTES LATms
supervise 1809 W 202,1 17039968 4096 1.32
supervise 1809 W 202,1 17039976 4096 1.30
tar 14794 RM 202,1 8457608 4096 7.53
tar 14794 RM 202,1 8470336 4096 14.90
tar 14794 RM 202,1 8470368 4096 0.27
tar 14794 RM 202,1 8470784 4096 7.74
tar 14794 RM 202,1 8470360 4096 0.25
tar 14794 RM 202,1 8469968 4096 0.24
tar 14794 RM 202,1 8470240 4096 0.24
tar 14794 RM 202,1 8470392 4096 0.23
tar 14794 RM 202,1 8470544 4096 5.96
tar 14794 RM 202,1 8470552 4096 0.27
tar 14794 RM 202,1 8470384 4096 0.24
[...]
The "tar" I/O looks like it is slightly random (based on BLOCK) and 4 Kbytes
in size (BYTES). One returned in 14.9 milliseconds, but the rest were fast,
so fast (0.24 ms) some may be returning from some level of cache (disk or
controller).
The "RM" TYPE means Read of Metadata. The start of the trace shows a
couple of Writes by supervise PID 1809.
Here's a deliberate random I/O workload:
# ./iosnoop
Tracing block I/O. Ctrl-C to end.
COMM PID TYPE DEV BLOCK BYTES LATms
randread 9182 R 202,32 30835224 8192 0.18
randread 9182 R 202,32 21466088 8192 0.15
randread 9182 R 202,32 13529496 8192 0.16
randread 9182 R 202,16 21250648 8192 0.18
randread 9182 R 202,16 1536776 32768 0.30
randread 9182 R 202,32 17157560 24576 0.23
randread 9182 R 202,32 21313320 8192 0.16
randread 9182 R 202,32 862184 8192 0.18
randread 9182 R 202,16 25496872 8192 0.21
randread 9182 R 202,32 31471768 8192 0.18
randread 9182 R 202,16 27571336 8192 0.20
randread 9182 R 202,16 30783448 8192 0.16
randread 9182 R 202,16 21435224 8192 1.28
randread 9182 R 202,16 970616 8192 0.15
randread 9182 R 202,32 13855608 8192 0.16
randread 9182 R 202,32 17549960 8192 0.15
randread 9182 R 202,32 30938232 8192 0.14
[...]
Note the changing offsets. The resulting latencies are very good in this case,
because the storage devices are flash memory-based solid state disks (SSDs).
For rotational disks, I'd expect these latencies to be roughly 10 ms.
Here's an idle Linux 3.2 system:
# ./iosnoop
Tracing block I/O. Ctrl-C to end.
COMM PID TYPE DEV BLOCK BYTES LATms
supervise 3055 W 202,1 12852496 4096 0.64
supervise 3055 W 202,1 12852504 4096 1.32
supervise 3055 W 202,1 12852800 4096 0.55
supervise 3055 W 202,1 12852808 4096 0.52
jbd2/xvda1-212 212 WS 202,1 1066720 45056 41.52
jbd2/xvda1-212 212 WS 202,1 1066808 12288 41.52
jbd2/xvda1-212 212 WS 202,1 1066832 4096 32.37
supervise 3055 W 202,1 12852800 4096 14.28
supervise 3055 W 202,1 12855920 4096 14.07
supervise 3055 W 202,1 12855960 4096 0.67
supervise 3055 W 202,1 12858208 4096 1.00
flush:1-409 409 W 202,1 12939640 12288 18.00
[...]
This shows supervise doing various writes from PID 3055. The highest latency
was from jbd2/xvda1-212, the journaling block device driver, doing
synchronous writes (TYPE = WS).
Options can be added to show the start time (-s) and end time (-t):
# ./iosnoop -ts
Tracing block I/O. Ctrl-C to end.
STARTs ENDs COMM PID TYPE DEV BLOCK BYTES LATms
5982800.302061 5982800.302679 supervise 1809 W 202,1 17039600 4096 0.62
5982800.302423 5982800.302842 supervise 1809 W 202,1 17039608 4096 0.42
5982800.304962 5982800.305446 supervise 1801 W 202,1 17039616 4096 0.48
5982800.305250 5982800.305676 supervise 1801 W 202,1 17039624 4096 0.43
5982800.308849 5982800.309452 supervise 1810 W 202,1 12862464 4096 0.60
5982800.308856 5982800.309470 supervise 1806 W 202,1 17039632 4096 0.61
5982800.309206 5982800.309740 supervise 1806 W 202,1 17039640 4096 0.53
5982800.309211 5982800.309805 supervise 1810 W 202,1 12862472 4096 0.59
5982800.309332 5982800.309953 supervise 1812 W 202,1 17039648 4096 0.62
5982800.309676 5982800.310283 supervise 1812 W 202,1 17039656 4096 0.61
[...]
This is useful when gathering I/O event data for post-processing.
Now for matching on a single PID:
# ./iosnoop -p 1805
Tracing block I/O issued by PID 1805. Ctrl-C to end.
COMM PID TYPE DEV BLOCK BYTES LATms
supervise 1805 W 202,1 17039648 4096 0.68
supervise 1805 W 202,1 17039672 4096 0.60
supervise 1805 W 202,1 17040040 4096 0.62
supervise 1805 W 202,1 17040056 4096 0.47
supervise 1805 W 202,1 17040624 4096 0.49
supervise 1805 W 202,1 17040632 4096 0.44
^C
Ending tracing...
This option works by using an in-kernel filter for that PID on I/O issue. There
is also a "-n" option to match on process names; however, it currently does so
in user space, and so is less efficient.
I would say that this will generally identify the origin process, but there will
be an error margin. Depending on the file system, block I/O queueing, and I/O
subsystem, this could miss events that aren't issued in this PID context but are
related to this PID (eg, triggering a read readahead on the completion of
previous I/O. Again, whether this happens is up to the file system and storage
subsystem). You can try the -Q option for more reliable process identification.
The -Q option begins tracing on block I/O queue insert, instead of issue.
Here's before and after, while dd(1) writes a large file:
# ./iosnoop
Tracing block I/O. Ctrl-C to end.
COMM PID TYPE DEV BLOCK BYTES LATms
dd 26983 WS 202,16 4064416 45056 16.70
dd 26983 WS 202,16 4064504 45056 16.72
dd 26983 WS 202,16 4064592 45056 16.74
dd 26983 WS 202,16 4064680 45056 16.75
cat 27031 WS 202,16 4064768 45056 16.56
cat 27031 WS 202,16 4064856 45056 16.46
cat 27031 WS 202,16 4064944 45056 16.40
gawk 27030 WS 202,16 4065032 45056 0.88
gawk 27030 WS 202,16 4065120 45056 1.01
gawk 27030 WS 202,16 4065208 45056 16.15
gawk 27030 WS 202,16 4065296 45056 16.16
gawk 27030 WS 202,16 4065384 45056 16.16
[...]
The output here shows the block I/O time from issue to completion (LATms),
which is largely representative of the device.
The process names and PIDs identify dd, cat, and gawk. By default iosnoop shows
who is on-CPU at time of block I/O issue, but these may not be the processes
that originated the I/O. In this case (having debugged it), the reason is that
processes such as cat and gawk are making hypervisor calls (this is a Xen
guest instance), eg, for memory operations, and during hypervisor processing a
queue of pending work is checked and dispatched. So cat and gawk were on-CPU
when the block device I/O was issued, but they didn't originate it.
Now the -Q option is used:
# ./iosnoop -Q
Tracing block I/O. Ctrl-C to end.
COMM PID TYPE DEV BLOCK BYTES LATms
kjournald 1217 WS 202,16 6132200 45056 141.12
kjournald 1217 WS 202,16 6132288 45056 141.10
kjournald 1217 WS 202,16 6132376 45056 141.10
kjournald 1217 WS 202,16 6132464 45056 141.11
kjournald 1217 WS 202,16 6132552 40960 141.11
dd 27718 WS 202,16 6132624 4096 0.18
flush:16-1279 1279 W 202,16 6132632 20480 0.52
flush:16-1279 1279 W 202,16 5940856 4096 0.50
flush:16-1279 1279 W 202,16 5949056 4096 0.52
flush:16-1279 1279 W 202,16 5957256 4096 0.54
flush:16-1279 1279 W 202,16 5965456 4096 0.56
flush:16-1279 1279 W 202,16 5973656 4096 0.58
flush:16-1279 1279 W 202,16 5981856 4096 0.60
flush:16-1279 1279 W 202,16 5990056 4096 0.63
[...]
This uses the block_rq_insert tracepoint as the starting point of I/O, instead
of block_rq_issue. This makes the following differences to columns and options:
- COMM: more likely to show the originating process.
- PID: more likely to show the originating process.
- LATms: shows the I/O time, including time spent on the block I/O queue.
- STARTs (not shown above): shows the time of queue insert, not I/O issue.
- -p PID: more likely to match the originating process.
- -n name: more likely to match the originating process.
The reason that this ftrace-based iosnoop does not just instrument both insert
and issue tracepoints is one of overhead. Even with buffering, iosnoop can
have difficulty under high load.
If I want to capture events for post-processing, I use the duration mode, which
not only lets me set the duration, but also uses buffering, which reduces the
overheads of tracing.
Capturing 5 seconds, with both start timestamps (-s) and end timestamps (-t):
# time ./iosnoop -ts 5 > out
real 0m5.566s
user 0m0.336s
sys 0m0.140s
# wc out
27010 243072 2619744 out
This server is doing over 5,000 disk IOPS. Even with buffering, this did
consume a measurable amount of CPU to capture: 0.48 seconds of CPU time in
total. Note that the run took 5.57 seconds: this is 5 seconds for the capture,
followed by the CPU time for iosnoop to fetch and process the buffer.
Now tracing for 30 seconds:
# time ./iosnoop -ts 30 > out
real 0m31.207s
user 0m0.884s
sys 0m0.472s
# wc out
64259 578313 6232898 out
Since it's the same server and workload, this should have over 150k events,
but only has 64k. The tracing buffer has overflowed, and events have been
dropped. If I really must capture this many events, I can either increase
the trace buffer size (it's the bufsize_kb setting in the script), or, use
a different tracer (perf_events, SystemTap, ktap, etc.). If the IOPS rate is low
(eg, less than 5k), then unbuffered (no duration), despite the higher overheads,
may be sufficient, and will keep capturing events until Ctrl-C.
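As a sketch of the first option, assuming the setting is kept as a bufsize_kb=
line in the script (the 32768 is just an illustrative value):
# grep '^bufsize_kb=' ./iosnoop                          # check the current setting
# sed -i 's/^bufsize_kb=.*/bufsize_kb=32768/' ./iosnoop  # raise it to 32 Mbytes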
Here's an example of digging into the sequence of I/O to explain an outlier.
My randread program on an SSD server (which is an AWS EC2 instance) usually
experiences about 0.15 ms I/O latency, but there are some outliers as high as
20 milliseconds. Here's an excerpt:
# ./iosnoop -ts > out
# more out
Tracing block I/O. Ctrl-C to end.
STARTs ENDs COMM PID TYPE DEV BLOCK BYTES LATms
6037559.121523 6037559.121685 randread 22341 R 202,32 29295416 8192 0.16
6037559.121719 6037559.121874 randread 22341 R 202,16 27515304 8192 0.16
[...]
6037595.999508 6037596.000051 supervise 1692 W 202,1 12862968 4096 0.54
6037595.999513 6037596.000144 supervise 1687 W 202,1 17040160 4096 0.63
6037595.999634 6037596.000309 supervise 1693 W 202,1 17040168 4096 0.68
6037595.999937 6037596.000440 supervise 1693 W 202,1 17040176 4096 0.50
6037596.000579 6037596.001192 supervise 1689 W 202,1 17040184 4096 0.61
6037596.000826 6037596.001360 supervise 1689 W 202,1 17040192 4096 0.53
6037595.998302 6037596.018133 randread 22341 R 202,32 954168 8192 20.03
6037595.998303 6037596.018150 randread 22341 R 202,32 954200 8192 20.05
6037596.018182 6037596.018347 randread 22341 R 202,32 18836600 8192 0.16
[...]
It's important to sort on the I/O completion time (ENDs). In this case it's
already in the correct order.
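When it isn't, a quick post-sort works (a sketch, assuming the -ts column layout
above, where ENDs is the second field):
# sort -n -k2,2 out | less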
So my 20 ms reads happened after a large group of supervise writes were
completed (I truncated dozens of supervise write lines to keep this example
short). Other latency outliers in this output file showed the same sequence:
slow reads after a batch of writes.
Note the I/O request timestamp (STARTs), which shows that these 20 ms reads were
issued before the supervise writes – so they had been sitting on a queue. I've
debugged this type of issue many times before, but this one is different: those
writes were to a different device (202,1), so I would have assumed they would be
on different queues, and wouldn't interfere with each other. Somewhere in this
system (Xen guest) it looks like there is a shared queue. (Having just
discovered this using iosnoop, I can't yet tell you which queue, but I'd hope
that after identifying it there would be a way to tune its queueing behavior,
so that we can eliminate or reduce the severity of these outliers.)
Use -h to print the USAGE message:
# ./iosnoop -h
USAGE: iosnoop [-hQst] [-d device] [-i iotype] [-p PID] [-n name]
[duration]
-d device # device string (eg, "202,1)
-i iotype # match type (eg, '*R*' for all reads)
-n name # process name to match on I/O issue
-p PID # PID to match on I/O issue
-Q # use queue insert as start time
-s # include start time of I/O (s)
-t # include completion time of I/O (s)
-h # this usage message
duration # duration seconds, and use buffers
eg,
iosnoop # watch block I/O live (unbuffered)
iosnoop 1 # trace 1 sec (buffered)
iosnoop -Q # include queueing time in LATms
iosnoop -ts # include start and end timestamps
iosnoop -i '*R*' # trace reads
iosnoop -p 91 # show I/O issued when PID 91 is on-CPU
iosnoop -Qp 91 # show I/O queued by PID 91, queue time
See the man page and example file for more info.
perf-tools-unstable-0.0.1~20150130+git85414b0/examples/killsnoop_example.txt 0000664 0000000 0000000 00000004662 12542613570 0026157 0 ustar 00root root 0000000 0000000 Demonstrations of killsnoop, the Linux ftrace version.
What signals are happening on my system?
# ./killsnoop
Tracing kill()s. Ctrl-C to end.
COMM PID TPID SIGNAL RETURN
postgres 2209 2148 10 0
postgres 5416 2209 12 0
postgres 5416 2209 12 0
supervise 2135 5465 15 0
supervise 2135 5465 18 0
^C
Ending tracing...
The first line of output shows that PID 2209, process name "postgres", has
sent a signal 10 (SIGUSR1) to target PID 2148. This signal returned success (0).
killsnoop traces the kill() syscall, which is used to send signals to other
processes. These signals can include SIGKILL and SIGTERM, both of which
ultimately kill the target process (in different fashions), but the signals
may also include other operations, including checking if a process still
exists (signal 0). To read more about signals, see "man -s7 signal".
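For example, signal 0 is how shells and scripts commonly test whether a PID
still exists (the PID here is just a placeholder):
# kill -0 1234 && echo "PID 1234 is still alive"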
killsnoop can be useful to identify why some processes are abruptly and
unexpectedly ending (also check for the OOM killer in dmesg).
The -s option can be used to print signal names instead of numbers:
# ./killsnoop -s
Tracing kill()s. Ctrl-C to end.
COMM PID KILLED SIGNAL RETURN
postgres 2209 2148 SIGUSR1 0
postgres 5665 2209 SIGUSR2 0
postgres 5665 2209 SIGUSR2 0
supervise 2135 5711 SIGTERM 0
supervise 2135 5711 SIGCONT 0
bash 27450 27450 0 0
[...]
On the last line: there wasn't a friendly signal name for signal 0, so the
numeric 0 is printed. You'll see signal 0 used to check whether processes still
exist.
Use -h to print the USAGE message:
# ./killsnoop -h
USAGE: killsnoop [-ht] [-d secs] [-p PID] [-n name] [filename]
-d seconds # trace duration, and use buffers
-n name # process name to match
-p PID # PID to match on kill issue
-t # include time (seconds)
-s # human readable signal names
-h # this usage message
eg,
killsnoop # watch kill()s live (unbuffered)
killsnoop -d 1 # trace 1 sec (buffered)
killsnoop -p 181 # trace kill()s issued to PID 181 only
See the man page and example file for more info.
perf-tools-unstable-0.0.1~20150130+git85414b0/examples/kprobe_example.txt 0000664 0000000 0000000 00000052313 12542613570 0025423 0 ustar 00root root 0000000 0000000 Demonstrations of kprobe, the Linux ftrace version.
This traces the kernel do_sys_open() function, when it is called:
# ./kprobe p:do_sys_open
Tracing kprobe do_sys_open. Ctrl-C to end.
kprobe-26042 [001] d... 6910441.001452: do_sys_open: (do_sys_open+0x0/0x220)
kprobe-26042 [001] d... 6910441.001475: do_sys_open: (do_sys_open+0x0/0x220)
kprobe-26042 [001] d... 6910441.001866: do_sys_open: (do_sys_open+0x0/0x220)
kprobe-26042 [001] d... 6910441.001966: do_sys_open: (do_sys_open+0x0/0x220)
supervise-1689 [000] d... 6910441.083302: do_sys_open: (do_sys_open+0x0/0x220)
supervise-1693 [001] d... 6910441.083530: do_sys_open: (do_sys_open+0x0/0x220)
supervise-1689 [000] d... 6910441.083759: do_sys_open: (do_sys_open+0x0/0x220)
supervise-1693 [001] d... 6910441.083877: do_sys_open: (do_sys_open+0x0/0x220)
[...]
The "p:" is for creating a probe. Use "r:" to probe the return of the function:
# ./kprobe r:do_sys_open
Tracing kprobe do_sys_open. Ctrl-C to end.
kprobe-29475 [001] d... 6910688.229777: do_sys_open: (SyS_open+0x1e/0x20 <- do_sys_open)
<...>-29476 [001] d... 6910688.231101: do_sys_open: (SyS_open+0x1e/0x20 <- do_sys_open)
<...>-29476 [001] d... 6910688.231123: do_sys_open: (SyS_open+0x1e/0x20 <- do_sys_open)
<...>-29476 [001] d... 6910688.231530: do_sys_open: (SyS_open+0x1e/0x20 <- do_sys_open)
<...>-29476 [001] d... 6910688.231624: do_sys_open: (SyS_open+0x1e/0x20 <- do_sys_open)
supervise-1685 [001] d... 6910688.328776: do_sys_open: (SyS_open+0x1e/0x20 <- do_sys_open)
supervise-1689 [000] d... 6910688.328780: do_sys_open: (SyS_open+0x1e/0x20 <- do_sys_open)
[...]
This output includes the function that the traced function is returning to.
The trace output can be a little different between kernel versions. Use -H to
print the header:
# ./kprobe -H p:do_sys_open
Tracing kprobe do_sys_open. Ctrl-C to end.
# tracer: nop
#
# entries-in-buffer/entries-written: 4/4 #P:2
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / delay
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# | | | |||| | |
kprobe-27952 [001] d... 6910580.008086: do_sys_open: (do_sys_open+0x0/0x220)
kprobe-27952 [001] d... 6910580.008109: do_sys_open: (do_sys_open+0x0/0x220)
kprobe-27952 [001] d... 6910580.008483: do_sys_open: (do_sys_open+0x0/0x220)
[...]
These columns are explained in the kernel source under Documentation/trace/ftrace.txt.
This traces do_sys_open() returns, using a probe alias "myopen", and showing
the return value ($retval):
# ./kprobe 'r:myopen do_sys_open $retval'
Tracing kprobe myopen. Ctrl-C to end.
kprobe-26386 [001] d... 6593278.858754: myopen: (SyS_open+0x1e/0x20 <- do_sys_open) arg1=0x3
<...>-26387 [001] d... 6593278.860043: myopen: (SyS_open+0x1e/0x20 <- do_sys_open) arg1=0x3
<...>-26387 [001] d... 6593278.860064: myopen: (SyS_open+0x1e/0x20 <- do_sys_open) arg1=0x3
<...>-26387 [001] d... 6593278.860433: myopen: (SyS_open+0x1e/0x20 <- do_sys_open) arg1=0x3
<...>-26387 [001] d... 6593278.860521: myopen: (SyS_open+0x1e/0x20 <- do_sys_open) arg1=0x3
supervise-1685 [001] d... 6593279.178806: myopen: (SyS_open+0x1e/0x20 <- do_sys_open) arg1=0x9
supervise-1689 [001] d... 6593279.228756: myopen: (SyS_open+0x1e/0x20 <- do_sys_open) arg1=0x9
supervise-1689 [001] d... 6593279.229106: myopen: (SyS_open+0x1e/0x20 <- do_sys_open) arg1=0x9
supervise-1688 [000] d... 6593279.229501: myopen: (SyS_open+0x1e/0x20 <- do_sys_open) arg1=0x9
supervise-1695 [000] d... 6593279.229944: myopen: (SyS_open+0x1e/0x20 <- do_sys_open) arg1=0x9
supervise-1685 [001] d... 6593279.230104: myopen: (SyS_open+0x1e/0x20 <- do_sys_open) arg1=0x9
supervise-1687 [001] d... 6593279.230293: myopen: (SyS_open+0x1e/0x20 <- do_sys_open) arg1=0x9
supervise-1699 [000] d... 6593279.230381: myopen: (SyS_open+0x1e/0x20 <- do_sys_open) arg1=0x9
supervise-1692 [000] d... 6593279.230825: myopen: (SyS_open+0x1e/0x20 <- do_sys_open) arg1=0x9
supervise-1698 [000] d... 6593279.230915: myopen: (SyS_open+0x1e/0x20 <- do_sys_open) arg1=0x9
supervise-1698 [000] d... 6593279.231277: myopen: (SyS_open+0x1e/0x20 <- do_sys_open) arg1=0x9
supervise-1690 [000] d... 6593279.231703: myopen: (SyS_open+0x1e/0x20 <- do_sys_open) arg1=0x9
^C
Ending tracing...
The string specified, 'r:myopen do_sys_open $retval', is a kprobe definition,
and is the same as those documented in the Linux kernel source under
Documentation/trace/kprobetrace.txt, which can be written to the
/sys/kernel/debug/tracing/kprobe_events file.
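For reference, this is roughly what the kprobe tool automates; a manual sketch
using that interface looks like this (disable the event before removing it):
# echo 'r:myopen do_sys_open $retval' >> /sys/kernel/debug/tracing/kprobe_events
# echo 1 > /sys/kernel/debug/tracing/events/kprobes/myopen/enable
# cat /sys/kernel/debug/tracing/trace_pipe
# echo 0 > /sys/kernel/debug/tracing/events/kprobes/myopen/enable
# echo '-:myopen' >> /sys/kernel/debug/tracing/kprobe_events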
Apart from probe name aliases, you can also provide arbitrary names for
arguments. Eg, instead of the "arg1" default, calling it "rval":
# ./kprobe 'r:myopen do_sys_open rval=$retval'
Tracing kprobe myopen. Ctrl-C to end.
kprobe-27454 [001] d... 6593356.250019: myopen: (SyS_open+0x1e/0x20 <- do_sys_open) rval=0x3
<...>-27455 [001] d... 6593356.251280: myopen: (SyS_open+0x1e/0x20 <- do_sys_open) rval=0x3
<...>-27455 [001] d... 6593356.251301: myopen: (SyS_open+0x1e/0x20 <- do_sys_open) rval=0x3
<...>-27455 [001] d... 6593356.251672: myopen: (SyS_open+0x1e/0x20 <- do_sys_open) rval=0x3
<...>-27455 [001] d... 6593356.251769: myopen: (SyS_open+0x1e/0x20 <- do_sys_open) rval=0x3
supervise-1689 [000] d... 6593356.859758: myopen: (SyS_open+0x1e/0x20 <- do_sys_open) rval=0x9
supervise-1689 [000] d... 6593356.860143: myopen: (SyS_open+0x1e/0x20 <- do_sys_open) rval=0x9
supervise-1696 [000] d... 6593356.862682: myopen: (SyS_open+0x1e/0x20 <- do_sys_open) rval=0x9
supervise-1685 [001] d... 6593356.862684: myopen: (SyS_open+0x1e/0x20 <- do_sys_open) rval=0x9
[...]
That's a bit better.
Tracing the open() mode:
# ./kprobe 'p:myopen do_sys_open mode=%cx:u16'
Tracing kprobe myopen. Ctrl-C to end.
kprobe-29572 [001] d... 6593503.353923: myopen: (do_sys_open+0x0/0x220) mode=0x1
kprobe-29572 [001] d... 6593503.353945: myopen: (do_sys_open+0x0/0x220) mode=0x0
kprobe-29572 [001] d... 6593503.354307: myopen: (do_sys_open+0x0/0x220) mode=0x5c00
kprobe-29572 [001] d... 6593503.354401: myopen: (do_sys_open+0x0/0x220) mode=0x0
supervise-1689 [000] d... 6593503.944125: myopen: (do_sys_open+0x0/0x220) mode=0x1a4
supervise-1688 [001] d... 6593503.944125: myopen: (do_sys_open+0x0/0x220) mode=0x1a4
supervise-1688 [001] d... 6593503.944606: myopen: (do_sys_open+0x0/0x220) mode=0x1a4
supervise-1689 [000] d... 6593503.944606: myopen: (do_sys_open+0x0/0x220) mode=0x1a4
supervise-1698 [000] d... 6593503.944728: myopen: (do_sys_open+0x0/0x220) mode=0x1a4
supervise-1698 [000] d... 6593503.945077: myopen: (do_sys_open+0x0/0x220) mode=0x1a4
[...]
Here I guessed that the mode was in register %cx, and cast it as a 16-bit
unsigned integer (":u16"). Your platform and kernel may be different, and the
mode may be in a different register. If fiddling with such registers becomes too
painful or unreliable for you, consider installing kernel debuginfo and using
the named variables with perf_events "perf probe".
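As a sketch of that perf probe route (it requires kernel debuginfo, and the
available variable names depend on your kernel):
# perf probe -V do_sys_open                # list variables visible at this point
# perf probe --add 'do_sys_open mode'
# perf record -e probe:do_sys_open -a sleep 10
# perf script
# perf probe --del do_sys_open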
Tracing the open() filename:
# ./kprobe 'p:myopen do_sys_open filename=+0(%si):string'
Tracing kprobe myopen. Ctrl-C to end.
kprobe-32369 [001] d... 6593706.999728: myopen: (do_sys_open+0x0/0x220) filename="/etc/ld.so.cache"
kprobe-32369 [001] d... 6593706.999748: myopen: (do_sys_open+0x0/0x220) filename="/lib/x86_64-linux-gnu/libc.so.6"
kprobe-32369 [001] d... 6593707.000092: myopen: (do_sys_open+0x0/0x220) filename="/usr/lib/locale/locale-archive"
kprobe-32369 [001] d... 6593707.000176: myopen: (do_sys_open+0x0/0x220) filename="trace_pipe"
supervise-1699 [000] d... 6593707.254970: myopen: (do_sys_open+0x0/0x220) filename="supervise/status.new"
supervise-1689 [001] d... 6593707.254970: myopen: (do_sys_open+0x0/0x220) filename="supervise/status.new"
supervise-1689 [001] d... 6593707.255432: myopen: (do_sys_open+0x0/0x220) filename="supervise/status.new"
supervise-1699 [000] d... 6593707.255432: myopen: (do_sys_open+0x0/0x220) filename="supervise/status.new"
supervise-1695 [001] d... 6593707.258805: myopen: (do_sys_open+0x0/0x220) filename="supervise/status.new"
[...]
As mentioned previously, the %si register may be different on your platform.
In this example, I cast it as a string.
Specifying a duration will buffer in-kernel (reducing overhead), and write at
the end. Here's tracing for 10 seconds, and writing to the "out" file:
# ./kprobe -d 10 'p:myopen do_sys_open filename=+0(%si):string' > out
You can match on a single PID only:
# ./kprobe -p 1696 'p:myopen do_sys_open filename=+0(%si):string'
Tracing kprobe myopen. Ctrl-C to end.
supervise-1696 [001] d... 6593773.677033: myopen: (do_sys_open+0x0/0x220) filename="supervise/status.new"
supervise-1696 [001] d... 6593773.677332: myopen: (do_sys_open+0x0/0x220) filename="supervise/status.new"
supervise-1696 [001] d... 6593774.697144: myopen: (do_sys_open+0x0/0x220) filename="supervise/status.new"
supervise-1696 [001] d... 6593774.697675: myopen: (do_sys_open+0x0/0x220) filename="supervise/status.new"
supervise-1696 [001] d... 6593775.717986: myopen: (do_sys_open+0x0/0x220) filename="supervise/status.new"
supervise-1696 [001] d... 6593775.718499: myopen: (do_sys_open+0x0/0x220) filename="supervise/status.new"
^C
Ending tracing...
This will only show events when that PID is on-CPU.
The -v option will show you the available variables you can use in custom
filters:
# ./kprobe -v 'p:myopen do_sys_open filename=+0(%si):string'
name: myopen
ID: 1443
format:
field:unsigned short common_type; offset:0; size:2; signed:0;
field:unsigned char common_flags; offset:2; size:1; signed:0;
field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
field:int common_pid; offset:4; size:4; signed:1;
field:unsigned long __probe_ip; offset:8; size:8; signed:0;
field:__data_loc char[] filename; offset:16; size:4; signed:1;
print fmt: "(%lx) filename=\"%s\"", REC->__probe_ip, __get_str(filename)
Tracing filenames that end in "stat", by adding a filter:
# ./kprobe 'p:myopen do_sys_open filename=+0(%si):string' 'filename ~ "*stat"'
Tracing kprobe myopen. Ctrl-C to end.
postgres-1172 [000] d... 6594028.787166: myopen: (do_sys_open+0x0/0x220) filename="pg_stat_tmp/pgstat.stat"
postgres-1172 [001] d... 6594028.797410: myopen: (do_sys_open+0x0/0x220) filename="pg_stat_tmp/pgstat.stat"
postgres-1172 [001] d... 6594028.797467: myopen: (do_sys_open+0x0/0x220) filename="pg_stat_tmp/pgstat.stat"
postgres-4443 [001] d... 6594028.800908: myopen: (do_sys_open+0x0/0x220) filename="pg_stat_tmp/pgstat.stat"
postgres-4443 [000] d... 6594028.811237: myopen: (do_sys_open+0x0/0x220) filename="pg_stat_tmp/pgstat.stat"
postgres-4443 [000] d... 6594028.811290: myopen: (do_sys_open+0x0/0x220) filename="pg_stat_tmp/pgstat.stat"
^C
Ending tracing...
This filtering is done in-kernel context.
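Filters can also combine fields from the -v output; for example, matching
filenames while excluding a PID (the pattern and PID are placeholders):
# ./kprobe 'p:myopen do_sys_open filename=+0(%si):string' 'filename ~ "*.conf" && common_pid != 1234'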
As an example of tracing a deeper kernel function, let's trace bio_alloc() and
its entry registers:
# ./kprobe 'p:myprobe bio_alloc %ax %bx %cx %dx'
Tracing kprobe myprobe. Ctrl-C to end.
supervise-3055 [000] 2172148.728250: myprobe: (bio_alloc+0x0/0x30) arg1=ffff880064acc8d0 arg2=ffff8800e56a7990 arg3=0 arg4=ffff880064acc910
supervise-3055 [000] 2172148.728527: myprobe: (bio_alloc+0x0/0x30) arg1=ffff880064acf948 arg2=ffff8800e56a7990 arg3=0 arg4=ffff880064acf988
jbd2/xvda1-8-212 [000] 2172149.749474: myprobe: (bio_alloc+0x0/0x30) arg1=ffffffff arg2=ffff8800ad1f87b8 arg3=ffff8800ba22c06c arg4=8
jbd2/xvda1-8-212 [000] 2172149.749485: myprobe: (bio_alloc+0x0/0x30) arg1=0 arg2=ffff880089d053a8 arg3=10f16c5bb arg4=0
jbd2/xvda1-8-212 [000] 2172149.749487: myprobe: (bio_alloc+0x0/0x30) arg1=0 arg2=ffff880089d05958 arg3=5 arg4=0
jbd2/xvda1-8-212 [000] 2172149.749488: myprobe: (bio_alloc+0x0/0x30) arg1=0 arg2=ffff880089d05b60 arg3=5 arg4=0
jbd2/xvda1-8-212 [000] 2172149.749489: myprobe: (bio_alloc+0x0/0x30) arg1=0 arg2=ffff880089d05820 arg3=5 arg4=0
jbd2/xvda1-8-212 [000] 2172149.749489: myprobe: (bio_alloc+0x0/0x30) arg1=0 arg2=ffff880089d055b0 arg3=5 arg4=0
jbd2/xvda1-8-212 [000] 2172149.749490: myprobe: (bio_alloc+0x0/0x30) arg1=0 arg2=ffff88006ff22ea0 arg3=5 arg4=0
jbd2/xvda1-8-212 [000] 2172149.749491: myprobe: (bio_alloc+0x0/0x30) arg1=0 arg2=ffff880089d1f000 arg3=5 arg4=0
jbd2/xvda1-8-212 [000] 2172149.749492: myprobe: (bio_alloc+0x0/0x30) arg1=0 arg2=ffff880089d1f138 arg3=5 arg4=0
jbd2/xvda1-8-212 [000] 2172149.749493: myprobe: (bio_alloc+0x0/0x30) arg1=0 arg2=ffff88005d267138 arg3=5 arg4=0
jbd2/xvda1-8-212 [000] 2172149.749494: myprobe: (bio_alloc+0x0/0x30) arg1=0 arg2=ffff88005d267680 arg3=5 arg4=0
jbd2/xvda1-8-212 [000] 2172149.749495: myprobe: (bio_alloc+0x0/0x30) arg1=0 arg2=ffff88005d2675b0 arg3=5 arg4=0
jbd2/xvda1-8-212 [000] 2172149.751044: myprobe: (bio_alloc+0x0/0x30) arg1=ffffffff arg2=ffff8800cc241ea0 arg3=445f0300 arg4=ffff8800effba000
supervise-3055 [000] 2172149.751095: myprobe: (bio_alloc+0x0/0x30) arg1=ffff880064acf948 arg2=ffff8800e56a7990 arg3=0 arg4=ffff880064acf988
supervise-3055 [000] 2172149.751341: myprobe: (bio_alloc+0x0/0x30) arg1=ffff880064acc8d0 arg2=ffff8800e56a7990 arg3=0 arg4=ffff880064acc910
supervise-3055 [000] 2172150.772033: myprobe: (bio_alloc+0x0/0x30) arg1=ffff880064acc8d0 arg2=ffff8800e56a7990 arg3=0 arg4=ffff880064acc910
supervise-3055 [000] 2172150.772305: myprobe: (bio_alloc+0x0/0x30) arg1=ffff880064acf948 arg2=ffff8800e56a7990 arg3=0 arg4=ffff880064acf988
flush-202:1-409 [000] 2172151.087815: myprobe: (bio_alloc+0x0/0x30) arg1=ffffffff arg2=ffff8800da51d6e8 arg3=16afd arg4=1
flush-202:1-409 [000] 2172151.087829: myprobe: (bio_alloc+0x0/0x30) arg1=ffffffff arg2=ffff8800e7537f08 arg3=16afd arg4=2
flush-202:1-409 [000] 2172151.087844: myprobe: (bio_alloc+0x0/0x30) arg1=ffffffff arg2=ffff8800e7519af8 arg3=16afd arg4=3
flush-202:1-409 [000] 2172151.087846: myprobe: (bio_alloc+0x0/0x30) arg1=ffffffff arg2=ffff8800e7511478 arg3=16afd arg4=4
flush-202:1-409 [000] 2172151.087849: myprobe: (bio_alloc+0x0/0x30) arg1=ffffffff arg2=ffff8800e75e6a90 arg3=16afd arg4=5
flush-202:1-409 [000] 2172151.087851: myprobe: (bio_alloc+0x0/0x30) arg1=ffffffff arg2=ffff8800e7512bc8 arg3=16afd arg4=6
flush-202:1-409 [000] 2172151.087853: myprobe: (bio_alloc+0x0/0x30) arg1=ffffffff arg2=ffff8800eb3bf410 arg3=16afd arg4=7
^C
The output includes who is on-CPU, high resolution timestamps, and the arguments
we requested (registers %ax to %dx). These registers are platform dependent,
and are mapped by the compiler to the entry arguments of the function.
How are these useful? If you are debugging this kernel function, you'll know. :)
Note that you can add qualifiers, eg, if I knew %ax was a uint32:
# ./kprobe 'p:myprobe bio_alloc %ax:u32'
Tracing kprobe myprobe. Ctrl-C to end.
supervise-3055 [000] 2172389.734606: myprobe: (bio_alloc+0x0/0x30) arg1=64acf948
supervise-3055 [000] 2172389.734865: myprobe: (bio_alloc+0x0/0x30) arg1=64acc8d0
supervise-3055 [000] 2172390.772391: myprobe: (bio_alloc+0x0/0x30) arg1=64acf948
supervise-3055 [000] 2172390.772676: myprobe: (bio_alloc+0x0/0x30) arg1=64acc8d0
^C
Ending tracing...
You can give them aliases too, instead of the default arg1..N:
# ./kprobe 'p:myprobe bio_alloc ax=%ax'
Tracing kprobe myprobe. Ctrl-C to end.
supervise-3055 [000] 2172420.451663: myprobe: (bio_alloc+0x0/0x30) ax=ffff880064acc8d0
supervise-3055 [000] 2172420.451938: myprobe: (bio_alloc+0x0/0x30) ax=ffff880064acf948
flush-202:1-409 [000] 2172421.163462: myprobe: (bio_alloc+0x0/0x30) ax=ffff880064acc8d0
supervise-3055 [000] 2172421.500994: myprobe: (bio_alloc+0x0/0x30) ax=ffff880064acc8d0
supervise-3055 [000] 2172421.501307: myprobe: (bio_alloc+0x0/0x30) ax=ffff880064acf948
^C
Ending tracing...
Now for the return of bio_alloc():
# ./kprobe 'r:myprobe bio_alloc $retval'
Tracing kprobe myprobe. Ctrl-C to end.
supervise-3055 [000] 2172164.145533: myprobe: (io_submit_init.isra.6+0x74/0x100 <- bio_alloc) arg1=ffff8800e55843c0
supervise-3055 [000] 2172164.145829: myprobe: (io_submit_init.isra.6+0x74/0x100 <- bio_alloc) arg1=ffff8800e5584840
jbd2/xvda1-8-212 [000] 2172165.166453: myprobe: (submit_bh+0x76/0x120 <- bio_alloc) arg1=ffff8800e57596c0
jbd2/xvda1-8-212 [000] 2172165.166493: myprobe: (submit_bh+0x76/0x120 <- bio_alloc) arg1=ffff8800e5759c00
jbd2/xvda1-8-212 [000] 2172165.166496: myprobe: (submit_bh+0x76/0x120 <- bio_alloc) arg1=ffff8800e5759600
jbd2/xvda1-8-212 [000] 2172165.166497: myprobe: (submit_bh+0x76/0x120 <- bio_alloc) arg1=ffff8800e5759e40
jbd2/xvda1-8-212 [000] 2172165.166498: myprobe: (submit_bh+0x76/0x120 <- bio_alloc) arg1=ffff8800e57590c0
jbd2/xvda1-8-212 [000] 2172165.166500: myprobe: (submit_bh+0x76/0x120 <- bio_alloc) arg1=ffff8800e57599c0
jbd2/xvda1-8-212 [000] 2172165.166500: myprobe: (submit_bh+0x76/0x120 <- bio_alloc) arg1=ffff8800e5759a80
jbd2/xvda1-8-212 [000] 2172165.166502: myprobe: (submit_bh+0x76/0x120 <- bio_alloc) arg1=ffff8800e5759f00
jbd2/xvda1-8-212 [000] 2172165.166503: myprobe: (submit_bh+0x76/0x120 <- bio_alloc) arg1=ffff8800e5759540
jbd2/xvda1-8-212 [000] 2172165.166504: myprobe: (submit_bh+0x76/0x120 <- bio_alloc) arg1=ffff8800e5759180
jbd2/xvda1-8-212 [000] 2172165.166504: myprobe: (submit_bh+0x76/0x120 <- bio_alloc) arg1=ffff8800e5759900
jbd2/xvda1-8-212 [000] 2172165.166505: myprobe: (submit_bh+0x76/0x120 <- bio_alloc) arg1=ffff8800e5759000
jbd2/xvda1-8-212 [000] 2172165.166506: myprobe: (submit_bh+0x76/0x120 <- bio_alloc) arg1=ffff8800e5759480
<...>-212 [000] 2172165.176261: myprobe: (submit_bh+0x76/0x120 <- bio_alloc) arg1=ffff8800e5759480
supervise-3055 [000] 2172165.176317: myprobe: (io_submit_init.isra.6+0x74/0x100 <- bio_alloc) arg1=ffff8800e57596c0
supervise-3055 [000] 2172165.176586: myprobe: (io_submit_init.isra.6+0x74/0x100 <- bio_alloc) arg1=ffff8800e5759900
^C
Ending tracing...
Great. This output includes the function we are returning to, which in most
cases is submit_bh().
Note that this mode (without a duration) prints events as they happen,
so the overheads can be high for frequent events. You could try the -d mode,
which buffers in-kernel.
The -s option will print the kernel stack trace after the event:
# ./kprobe -s 'p:mytcp tcp_init_cwnd'
Tracing kprobe mytcp. Ctrl-C to end.
sshd-5121 [000] d... 6897275.911301: mytcp: (tcp_init_cwnd+0x0/0x40)
sshd-5121 [000] d... 6897275.911309:
=> tcp_write_xmit
=> __tcp_push_pending_frames
=> tcp_push
=> tcp_sendmsg
=> inet_sendmsg
=> sock_aio_write
=> do_sync_write
=> vfs_write
=> SyS_write
=> system_call_fastpath
sshd-32219 [000] d... 6897275.911467: mytcp: (tcp_init_cwnd+0x0/0x40)
sshd-32219 [000] d... 6897275.911471:
=> tcp_write_xmit
=> __tcp_push_pending_frames
=> tcp_push
=> tcp_sendmsg
=> inet_sendmsg
=> sock_aio_write
=> do_sync_write
=> vfs_write
=> SyS_write
=> system_call_fastpath
sshd-5121 [000] d... 6897277.878794: mytcp: (tcp_init_cwnd+0x0/0x40)
sshd-5121 [000] d... 6897277.878801:
=> tcp_write_xmit
=> __tcp_push_pending_frames
=> tcp_push
=> tcp_sendmsg
=> inet_sendmsg
=> sock_aio_write
=> do_sync_write
=> vfs_write
=> SyS_write
=> system_call_fastpath
This makes use of the kernel options/stacktrace feature.
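That option can also be flipped by hand, which is roughly what -s arranges
around the trace (a sketch; the tool may differ in detail):
# echo 1 > /sys/kernel/debug/tracing/options/stacktrace
# echo 0 > /sys/kernel/debug/tracing/options/stacktrace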
Use -h to print the USAGE message:
# ./kprobe -h
USAGE: kprobe [-FhHsv] [-d secs] [-p PID] kprobe_definition [filter]
-F # force. trace despite warnings.
-d seconds # trace duration, and use buffers
-p PID # PID to match on I/O issue
-v # view format file (don't trace)
-H # include column headers
-s # show kernel stack traces
-h # this usage message
Note that these examples may need modification to match your kernel
version's function names and platform's register usage.
eg,
kprobe p:do_sys_open
# trace open() entry
kprobe r:do_sys_open
# trace open() return
kprobe 'r:do_sys_open $retval'
# trace open() return value
kprobe 'r:myopen do_sys_open $retval'
# use a custom probe name
kprobe 'p:myopen do_sys_open mode=%cx:u16'
# trace open() file mode
kprobe 'p:myopen do_sys_open filename=+0(%si):string'
# trace open() with filename
kprobe -s 'p:myprobe tcp_retransmit_skb'
# show kernel stacks
See the man page and example file for more info.
perf-tools-unstable-0.0.1~20150130+git85414b0/examples/opensnoop_example.txt 0000664 0000000 0000000 00000003642 12542613570 0026162 0 ustar 00root root 0000000 0000000 Demonstrations of opensnoop, the Linux ftrace version.
# ./opensnoop
Tracing open()s. Ctrl-C to end.
COMM PID FD FILE
opensnoop 5334 0x3
<...> 5343 0x3 /etc/ld.so.cache
opensnoop 5342 0x3 /etc/ld.so.cache
<...> 5343 0x3 /lib/x86_64-linux-gnu/libc.so.6
opensnoop 5342 0x3 /lib/x86_64-linux-gnu/libm.so.6
opensnoop 5342 0x3 /lib/x86_64-linux-gnu/libc.so.6
<...> 5343 0x3 /usr/lib/locale/locale-archive
<...> 5343 0x3 trace_pipe
supervise 1684 0x9 supervise/status.new
supervise 1684 0x9 supervise/status.new
supervise 1688 0x9 supervise/status.new
supervise 1688 0x9 supervise/status.new
supervise 1686 0x9 supervise/status.new
supervise 1685 0x9 supervise/status.new
supervise 1685 0x9 supervise/status.new
supervise 1686 0x9 supervise/status.new
[...]
The first several lines show opensnoop catching itself initializing.
Use -h to print the USAGE message:
# ./opensnoop -h
USAGE: opensnoop [-htx] [-d secs] [-p PID] [-n name] [filename]
-d seconds # trace duration, and use buffers
-n name # process name to match on I/O issue
-p PID # PID to match on I/O issue
-t # include time (seconds)
-x # only show failed opens
-h # this usage message
filename # match filename (partials, REs, ok)
eg,
opensnoop # watch open()s live (unbuffered)
opensnoop -d 1 # trace 1 sec (buffered)
opensnoop -p 181 # trace I/O issued by PID 181 only
opensnoop conf # trace filenames containing "conf"
opensnoop 'log$' # filenames ending in "log"
See the man page and example file for more info.
perf-tools-unstable-0.0.1~20150130+git85414b0/examples/perf-stat-hist_example.txt 0000664 0000000 0000000 00000016364 12542613570 0027021 0 ustar 00root root 0000000 0000000 Demonstrations of perf-stat-hist, the Linux perf_events version.
Tracing the net:net_dev_xmit tracepoint, and building a power-of-4 histogram
for the "len" variable, for 10 seconds:
# ./perf-stat-hist net:net_dev_xmit len 10
Tracing net:net_dev_xmit, power-of-4, max 1048576, for 10 seconds...
Range : Count Distribution
0 : 0 | |
1 -> 3 : 0 | |
4 -> 15 : 0 | |
16 -> 63 : 2 |# |
64 -> 255 : 30 |### |
256 -> 1023 : 3 |# |
1024 -> 4095 : 446 |######################################|
4096 -> 16383 : 0 | |
16384 -> 65535 : 0 | |
65536 -> 262143 : 0 | |
262144 -> 1048575 : 0 | |
1048576 -> : 0 | |
This showed that most of the network transmits were between 1024 and 4095 bytes,
with a handful between 64 and 255 bytes.
Cat the format file for the tracepoint to see what other variables are available
to trace. Eg:
# cat /sys/kernel/debug/tracing/events/net/net_dev_xmit/format
name: net_dev_xmit
ID: 1078
format:
field:unsigned short common_type; offset:0; size:2; signed:0;
field:unsigned char common_flags; offset:2; size:1; signed:0;
field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
field:int common_pid; offset:4; size:4; signed:1;
field:void * skbaddr; offset:8; size:8; signed:0;
field:unsigned int len; offset:16; size:4; signed:0;
field:int rc; offset:20; size:4; signed:1;
field:__data_loc char[] name; offset:24; size:4; signed:1;
print fmt: "dev=%s skbaddr=%p len=%u rc=%d", __get_str(name), REC->skbaddr, REC->len, REC->rc
That's where "len" came from.
This works by creating a series of tracepoint and filter pairs for each
histogram bucket, and doing in-kernel counts. The overhead should in many cases
be lower than user-space post-processing; however, this approach is still
not ideal. I've called it a "perf hacktogram". The overhead is relative to
the frequency of events, multiplied by the number of buckets. You can modify
the script to use power-of-2 instead, or whatever you like, but the overhead
for more buckets will be higher.
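To illustrate, a single bucket amounts to a tracepoint plus filter pair like the
following (a sketch of one bucket only; the script generates the full series):
# perf stat -e net:net_dev_xmit --filter 'len >= 64 && len < 256' -a sleep 10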
Histogram of the returned read() syscall sizes:
# ./perf-stat-hist syscalls:sys_exit_read ret 10
Tracing syscalls:sys_exit_read, power-of-4, max 1048576, for 10 seconds...
Range : Count Distribution
0 : 90 |# |
1 -> 3 : 9587 |######################################|
4 -> 15 : 69 |# |
16 -> 63 : 590 |### |
64 -> 255 : 250 |# |
256 -> 1023 : 389 |## |
1024 -> 4095 : 296 |## |
4096 -> 16383 : 183 |# |
16384 -> 65535 : 12 |# |
65536 -> 262143 : 0 | |
262144 -> 1048575 : 0 | |
1048576 -> : 0 | |
Most of our read()s were tiny, between 1 and 3 bytes.
Using power-of-2, and a max of 1024:
# ./perf-stat-hist -P 2 -m 1024 syscalls:sys_exit_read ret
Tracing syscalls:sys_exit_read, power-of-2, max 1024, until Ctrl-C...
^C
Range : Count Distribution
-> -1 : 29 |## |
0 -> 0 : 1 |# |
1 -> 1 : 959 |######################################|
2 -> 3 : 1 |# |
4 -> 7 : 0 | |
8 -> 15 : 2 |# |
16 -> 31 : 14 |# |
32 -> 63 : 1 |# |
64 -> 127 : 0 | |
128 -> 255 : 0 | |
256 -> 511 : 0 | |
512 -> 1023 : 1 |# |
1024 -> : 1 |# |
Specifying custom bucket sizes:
# ./perf-stat-hist -b "10 50 100 5000" syscalls:sys_exit_read ret
Tracing syscalls:sys_exit_read, specified buckets, until Ctrl-C...
^C
Range : Count Distribution
-> 9 : 989 |######################################|
10 -> 49 : 5 |# |
50 -> 99 : 0 | |
100 -> 4999 : 2 |# |
5000 -> : 0 | |
Specifying a single value to bifurcate statistics:
# ./perf-stat-hist -b 10 syscalls:sys_exit_read ret
Tracing syscalls:sys_exit_read, specified buckets, until Ctrl-C...
^C
Range : Count Distribution
-> 9 : 2959 |######################################|
10 -> : 7 |# |
This has the lowest overhead for collection, since only two tracepoint
filter pairs are used.
Use -h to print the USAGE message:
# ./perf-stat-hist -h
USAGE: perf-stat-hist [-h] [-b buckets|-P power] [-m max] tracepoint
variable [seconds]
-b buckets # specify histogram bucket points
-P power # power-of (default is 4)
-m max # max value for power-of
-h # this usage message
eg,
perf-stat-hist syscalls:sys_enter_read count 5
# read() request histogram, 5 seconds
perf-stat-hist syscalls:sys_exit_read ret 5
# read() return histogram, 5 seconds
perf-stat-hist -P 10 syscalls:sys_exit_read ret 5
# ... use power-of-10
perf-stat-hist -P 2 -m 1024 syscalls:sys_exit_read ret 5
# ... use power-of-2, max 1024
perf-stat-hist -b "10 50 100 500" syscalls:sys_exit_read ret 5
# ... histogram based on these bucket ranges
perf-stat-hist -b 10 syscalls:sys_exit_read ret 5
# ... bifurcate by the value 10 (lowest overhead)
See the man page and example file for more info.
perf-tools-unstable-0.0.1~20150130+git85414b0/examples/reset-ftrace_example.txt 0000664 0000000 0000000 00000005107 12542613570 0026524 0 ustar 00root root 0000000 0000000 Demonstrations of reset-ftrace, the Linux ftrace tool.
You will probably never need this tool. If you kill -9 an ftrace-based tool,
leaving the kernel in a tracing enabled state, you could try using this tool
to reset ftrace and disable tracing. Make sure no other ftrace sessions are
in use on your system, or it will kill those.
Here's an example:
# ./opensnoop
Tracing open()s. Ctrl-C to end.
ERROR: ftrace may be in use by PID 2197 /var/tmp/.ftrace-lock
I tried to run opensnoop, but there's a lock file for PID 2197. Checking if it
exists:
# ps -fp 2197
UID PID PPID C STIME TTY TIME CMD
#
No.
I also know that no one is using ftrace on this system. So I'll use reset-ftrace
to clean up this lock file and ftrace state:
# ./reset-ftrace
ERROR: ftrace lock (/var/tmp/.ftrace-lock) exists. It shows ftrace may be in use by PID 2197.
Double check to see if that PID is still active. If not, consider using -f to force a reset. Exiting.
... except it's complaining about the lock file too. I'm already sure that this
PID doesn't exist, so I'll add the -f option:
# ./reset-ftrace -f
Reseting ftrace state...
current_tracer, before:
1 nop
current_tracer, after:
1 nop
set_ftrace_filter, before:
1 #### all functions enabled ####
set_ftrace_filter, after:
1 #### all functions enabled ####
set_ftrace_pid, before:
1 no pid
set_ftrace_pid, after:
1 no pid
kprobe_events, before:
kprobe_events, after:
Done.
The output shows what has been reset, including the before and after state of
these files.
Now I can run ftrace tools again; for example, iosnoop:
# ./iosnoop
Tracing block I/O. Ctrl-C to end.
COMM PID TYPE DEV BLOCK BYTES LATms
supervise 1689 W 202,1 17039664 4096 0.58
supervise 1689 W 202,1 17039672 4096 0.47
supervise 1694 W 202,1 17039744 4096 0.98
supervise 1694 W 202,1 17039752 4096 0.74
supervise 1684 W 202,1 17039760 4096 0.63
[...]
Fixed.
Note that reset-ftrace currently only resets a few methods of enabling
tracing, such as set_ftrace_filter and kprobe_events. Static tracepoints could
be enabled individually, and this script currently doesn't find and disable
those.
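If you suspect static tracepoints were left enabled, a rough manual check and
reset looks like this (careful: the second command disables every tracepoint
system-wide, so only run it if nothing else is tracing):
# grep -l '^1' /sys/kernel/debug/tracing/events/*/*/enable   # list enabled tracepoints
# echo 0 > /sys/kernel/debug/tracing/events/enable           # disable them all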
Use -h to print the USAGE message:
# ./reset-ftrace -h
USAGE: reset-ftrace [-fhq]
-f # force: delete ftrace lock file
-q # quiet: reset, but say nothing
-h # this usage message
eg,
reset-ftrace # disable active ftrace session
perf-tools-unstable-0.0.1~20150130+git85414b0/examples/syscount_example.txt 0000664 0000000 0000000 00000022053 12542613570 0026026 0 ustar 00root root 0000000 0000000 Demonstrations of syscount, the Linux perf_events version.
The first mode I use is "-c", where it behaves like "strace -c", but for the
entire system (all processes) and with much lower overhead:
# ./syscount -c
Tracing... Ctrl-C to end.
^Csleep: Interrupt
SYSCALL COUNT
accept 1
getsockopt 1
setsid 1
chdir 2
getcwd 2
getpeername 2
getsockname 2
setgid 2
setgroups 2
setpgid 2
setuid 2
getpgrp 4
getpid 4
rename 4
setitimer 4
setrlimit 4
setsockopt 4
statfs 4
set_tid_address 5
readlink 6
set_robust_list 6
nanosleep 7
newuname 7
faccessat 8
futex 10
clock_gettime 16
newlstat 20
pipe 20
epoll_wait 24
getrlimit 25
socket 27
connect 29
exit_group 30
getppid 31
dup2 34
wait4 51
fcntl 58
getegid 72
getgid 72
getuid 72
geteuid 75
perf_event_open 100
munmap 121
gettimeofday 216
access 266
ioctl 340
poll 348
sendto 374
mprotect 414
brk 597
rt_sigaction 632
recvfrom 664
lseek 749
newfstatat 2922
openat 2925
newfstat 3229
newstat 4334
open 4534
fchdir 5845
getdents 5854
read 7673
close 7728
select 9633
rt_sigprocmask 19886
write 34581
While tracing, the write() syscall was executed 34,581 times.
This mode uses "perf stat" to count the syscalls:* tracepoints in-kernel.
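The underlying one-liner is roughly this (a sketch; syscount adds the sorting
and formatting):
# perf stat -e 'syscalls:sys_enter_*' -a sleep 5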
You can add a duration (-d) and limit the number shown (-t):
# ./syscount -cd 5 -t 10
Tracing for 5 seconds. Top 10 only...
SYSCALL COUNT
gettimeofday 1009
write 3583
read 8174
openat 21550
newfstat 21558
open 21824
fchdir 43098
getdents 43106
close 43694
newfstatat 110936
While tracing for 5 seconds, the newfstatat() syscall was executed 110,936
times.
Without the -c, syscount shows syscalls by process name:
# ./syscount -d 5 -t 10
Tracing for 5 seconds. Top 10 only...
[ perf record: Woken up 66 times to write data ]
[ perf record: Captured and wrote 16.513 MB perf.data (~721455 samples) ]
COMM COUNT
stat 450
perl 537
catalina.sh 1700
postgres 2094
run 2362
:6946 4764
ps 5961
sshd 45796
find 61039
So processes named "find" called 61,039 syscalls during the 5 seconds of
tracing.
Note that this mode writes a perf.data file. This is higher overhead for a
few reasons:
- all data is passed from kernel to user space, which eats CPU for the memory
copy. Note that it is buffered in an efficient way by perf_events, which
wakes up and context switches only a small number of times: 66 in this case,
to hand 16 Mbytes of trace data to user space.
- data is post-processed in user space, eating more CPU.
- data is stored on the file system in the perf.data file, consuming available
storage.
This will be improved in future kernels, but it is difficult to improve this
much further in existing kernels. For example, using a pipe to "perf script"
instead of writing perf.data can have issues with feedback loops, where
perf traces itself. This syscount version goes to lengths to avoid tracing
its own perf, but the trip via perf.data is still necessary right now, given
the existing functionality in older kernels.
Running without options shows syscalls by process name until Ctrl-C:
# ./syscount
Tracing... Ctrl-C to end.
^C[ perf record: Woken up 39 times to write data ]
[ perf record: Captured and wrote 9.644 MB perf.data (~421335 samples) ]
COMM COUNT
apache2 8
apacheLogParser 13
platformservice 16
snmpd 16
ntpd 21
multilog 66
supervise 84
dirname 102
echo 102
svstat 108
cut 111
bash 113
grep 132
xargs 132
redis-server 190
sed 192
setuidgid 294
stat 450
perl 537
catalina.sh 1275
postgres 1736
run 2352
:7396 4527
ps 5925
sshd 20154
find 28700
Note again it is writing a perf.data file to do this.
The -v option adds process IDs:
# ./syscount -v
Tracing... Ctrl-C to end.
^C[ perf record: Woken up 48 times to write data ]
[ perf record: Captured and wrote 12.114 MB perf.data (~529276 samples) ]
PID COMM COUNT
3599 apacheLogParser 3
7977 xargs 3
7982 supervise 3
7993 xargs 3
3575 apache2 4
1311 ntpd 6
3135 postgres 6
3600 apacheLogParser 6
3210 platformservice 8
6503 sshd 9
7978 :7978 9
7994 run 9
7968 :7968 11
7984 run 11
1451 snmpd 16
3040 svscan 17
3066 postgres 17
3133 postgres 24
3134 postgres 24
3136 postgres 24
3061 multilog 29
3055 supervise 30
7979 bash 31
7977 echo 34
7981 dirname 34
7993 echo 34
7968 svstat 36
7984 svstat 36
7975 cut 37
7991 cut 37
9857 bash 37
7967 :7967 40
7983 run 40
7972 :7972 41
7976 xargs 41
7988 run 41
7992 xargs 41
7969 :7969 42
7976 :7976 42
7985 run 42
7992 run 42
7973 :7973 43
7974 :7974 43
7989 run 43
7990 run 43
7973 grep 44
7989 grep 44
7975 :7975 45
7991 run 45
7970 :7970 51
7986 run 51
7981 catalina.sh 52
7974 sed 64
7990 sed 64
3455 postgres 66
7971 :7971 66
7987 run 66
7966 :7966 96
7966 setuidgid 98
3064 redis-server 110
7970 stat 150
7986 stat 150
7969 perl 179
7985 perl 179
7982 run 341
7966 catalina.sh 373
7980 postgres 432
7972 ps 1971
7988 ps 1983
9832 sshd 37511
7979 find 51040
Once you've found a process ID of interest, you can use "-c" and "-p PID" to
show syscall names. This also switches to "perf stat" mode for in-kernel
counts, and lower overhead:
# ./syscount -cp 7979
Tracing PID 7979... Ctrl-C to end.
^CSYSCALL COUNT
brk 10
newfstat 2171
open 2171
newfstatat 2175
openat 2175
close 4346
fchdir 4346
getdents 4351
write 25482
So the most frequent syscall by PID 7979 was write().
Use -h to print the USAGE message:
# ./syscount -h
USAGE: syscount [-chv] [-t top] {-p PID|-d seconds|command}
syscount # count by process name
-c # show counts by syscall name
-h # this usage message
-v # verbose: shows PID
-p PID # trace this PID only
-d seconds # duration of trace
-t num # show top number only
command # run and trace this command
eg,
syscount # syscalls by process name
syscount -c # syscalls by syscall name
syscount -d 5 # trace for 5 seconds
syscount -cp 923 # syscall names for PID 923
syscount -c ls # syscall names for "ls"
See the man page and example file for more info.
perf-tools-unstable-0.0.1~20150130+git85414b0/examples/tcpretrans_example.txt 0000664 0000000 0000000 00000006677 12542613570 0026342 0 ustar 00root root 0000000 0000000 Demonstrations of tcpretrans, the Linux ftrace version.
Tracing TCP retransmits on a busy server:
# ./tcpretrans
TIME PID LADDR:LPORT -- RADDR:RPORT STATE
05:16:44 3375 10.150.18.225:53874 R> 10.105.152.3:6001 ESTABLISHED
05:16:44 3375 10.150.18.225:53874 R> 10.105.152.3:6001 ESTABLISHED
05:16:54 4028 10.150.18.225:6002 R> 10.150.30.249:1710 ESTABLISHED
05:16:54 4028 10.150.18.225:6002 R> 10.150.30.249:1710 ESTABLISHED
05:16:54 4028 10.150.18.225:6002 R> 10.150.30.249:1710 ESTABLISHED
05:16:54 4028 10.150.18.225:6002 R> 10.150.30.249:1710 ESTABLISHED
05:16:54 4028 10.150.18.225:6002 R> 10.150.30.249:1710 ESTABLISHED
05:16:54 4028 10.150.18.225:6002 R> 10.150.30.249:1710 ESTABLISHED
05:16:54 4028 10.150.18.225:6002 R> 10.150.30.249:1710 ESTABLISHED
05:16:55 0 10.150.18.225:47115 R> 10.71.171.158:6001 ESTABLISHED
05:16:58 0 10.150.18.225:44388 R> 10.103.130.120:6001 ESTABLISHED
05:16:58 0 10.150.18.225:44388 R> 10.103.130.120:6001 ESTABLISHED
05:16:58 0 10.150.18.225:44388 R> 10.103.130.120:6001 ESTABLISHED
05:16:59 0 10.150.18.225:56086 R> 10.150.32.107:6001 ESTABLISHED
05:16:59 0 10.150.18.225:56086 R> 10.150.32.107:6001 ESTABLISHED
^C
Ending tracing...
This shows TCP retransmits by dynamically tracing the kernel function that does
the retransmit. This is a low overhead approach.
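The same kernel function can also be traced raw using the kprobe tool from this
collection, which is handy if you want registers or stack traces (the function
name may differ on your kernel version):
# ./kprobe -s 'p:myprobe tcp_retransmit_skb'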
The PID may or may not make sense: it shows the PID that was on-CPU; however,
retransmits are often timer-based, in which case it is the kernel that is
on-CPU.
The STATE column shows the TCP state for the socket performing the retransmit.
The "--" column is the packet type. "R>" for retransmit.
Kernel stack traces can be included with -s, which may show the type of
retransmit:
# ./tcpretrans -s
TIME PID LADDR:LPORT -- RADDR:RPORT STATE
06:21:10 19516 10.144.107.151:22 R> 10.13.106.251:32167 ESTABLISHED
=> tcp_fastretrans_alert
=> tcp_ack
=> tcp_rcv_established
=> tcp_v4_do_rcv
=> tcp_v4_rcv
=> ip_local_deliver_finish
=> ip_local_deliver
=> ip_rcv_finish
=> ip_rcv
=> __netif_receive_skb
=> netif_receive_skb
=> handle_incoming_queue
=> xennet_poll
=> net_rx_action
=> __do_softirq
=> call_softirq
=> do_softirq
=> irq_exit
=> xen_evtchn_do_upcall
=> xen_do_hypervisor_callback
This looks like a fast retransmit (inclusion of tcp_fastretrans_alert(), and
being based on receiving an ACK, rather than a timer).
The -l option will include TCP tail loss probe events (TLP; see
http://lwn.net/Articles/542642/). Eg:
# ./tcpretrans -l
TIME PID LADDR:LPORT -- RADDR:RPORT STATE
21:56:06 0 10.100.155.200:22 R> 10.10.237.72:18554 LAST_ACK
21:56:08 0 10.100.155.200:22 R> 10.10.237.72:18554 LAST_ACK
21:56:10 16452 10.100.155.200:22 R> 10.10.237.72:18554 LAST_ACK
21:56:10 0 10.100.155.200:22 L> 10.10.237.72:46408 LAST_ACK
21:56:10 0 10.100.155.200:22 R> 10.10.237.72:46408 LAST_ACK
21:56:12 0 10.100.155.200:22 R> 10.10.237.72:46408 LAST_ACK
21:56:13 0 10.100.155.200:22 R> 10.10.237.72:46408 LAST_ACK
^C
Ending tracing...
Look for "L>" in the type column ("--") for TLP events.
Use -h to print the USAGE message:
# ./tcpretrans -h
USAGE: tcpretrans [-hls]
-h # help message
-l # include tail loss probe events
-s # print stack traces
eg,
tcpretrans # trace TCP retransmits
perf-tools-unstable-0.0.1~20150130+git85414b0/examples/tpoint_example.txt 0000664 0000000 0000000 00000017307 12542613570 0025462 0 ustar 00root root 0000000 0000000 Demonstrations of tpoint, the Linux ftrace version.
Let's trace block:block_rq_issue, to see block device (disk) I/O requests:
# ./tpoint block:block_rq_issue
Tracing block:block_rq_issue. Ctrl-C to end.
supervise-1692 [001] d... 7269912.982162: block_rq_issue: 202,1 W 0 () 17039656 + 8 [supervise]
supervise-1696 [000] d... 7269912.982243: block_rq_issue: 202,1 W 0 () 12862264 + 8 [supervise]
cksum-12994 [000] d... 7269913.317924: block_rq_issue: 202,1 R 0 () 9357056 + 72 [cksum]
cksum-12994 [000] d... 7269913.319013: block_rq_issue: 202,1 R 0 () 2977536 + 144 [cksum]
cksum-12994 [000] d... 7269913.320217: block_rq_issue: 202,1 R 0 () 2986240 + 216 [cksum]
cksum-12994 [000] d... 7269913.321677: block_rq_issue: 202,1 R 0 () 620344 + 56 [cksum]
cksum-12994 [001] d... 7269913.329309: block_rq_issue: 202,1 R 0 () 9107912 + 88 [cksum]
cksum-12994 [001] d... 7269913.340133: block_rq_issue: 202,1 R 0 () 3147008 + 248 [cksum]
cksum-12994 [001] d... 7269913.354551: block_rq_issue: 202,1 R 0 () 11583488 + 256 [cksum]
cksum-12994 [001] d... 7269913.379904: block_rq_issue: 202,1 R 0 () 11583744 + 256 [cksum]
[...]
^C
Ending tracing...
Great, that was easy!
perf_events can do this as well, and is better in many ways, including a more
efficient buffering strategy, and multi-user access. It's not that easy to do
this one-liner in perf_events, however. An equivalent for recent kernels is:
perf record --no-buffer -e block:block_rq_issue -a -o - | PAGER=cat stdbuf -oL perf script -i -
On older kernels, use -D instead of --no-buffer. Even better is to set the buffer
page size to a sufficient grouping (using -m) to minimize overheads, at the
expense of liveliness of updates (an example follows below). Note that stack
traces (-g) don't work on my systems with this perf one-liner; however, they do
work with tpoint -s.
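For example, the one-liner above with a larger ring buffer (the -m page count here
is arbitrary; tune it to your event rate) might look like:
perf record --no-buffer -m 512 -e block:block_rq_issue -a -o - | PAGER=cat stdbuf -oL perf script -i -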
Column headings can be printed using -H:
# ./tpoint -H block:block_rq_issue
Tracing block:block_rq_issue. Ctrl-C to end.
# tracer: nop
#
# entries-in-buffer/entries-written: 0/0 #P:2
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / delay
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# | | | |||| | |
supervise-1697 [000] d... 7270545.340856: block_rq_issue: 202,1 W 0 () 12862464 + 8 [supervise]
supervise-1697 [000] d... 7270545.341256: block_rq_issue: 202,1 W 0 () 12862472 + 8 [supervise]
supervise-1690 [000] d... 7270545.342363: block_rq_issue: 202,1 W 0 () 17040368 + 8 [supervise]
[...]
They are also documented in the Linux kernel source under:
Documentation/trace/ftrace.txt.
How about stacks traces for those block_rq_issue events? Adding -s:
# ./tpoint -s block:block_rq_issue
Tracing block:block_rq_issue. Ctrl-C to end.
supervise-1691 [000] d... 7269511.079179: block_rq_issue: 202,1 W 0 () 17040232 + 8 [supervise]
supervise-1691 [000] d... 7269511.079188:
=> blk_peek_request
=> do_blkif_request
=> __blk_run_queue
=> queue_unplugged
=> blk_flush_plug_list
=> blk_finish_plug
=> ext4_writepages
=> do_writepages
=> __filemap_fdatawrite_range
=> filemap_flush
=> ext4_alloc_da_blocks
=> ext4_rename
=> vfs_rename
=> SYSC_renameat2
=> SyS_renameat2
=> SyS_rename
=> system_call_fastpath
cksum-7428 [000] d... 7269511.331778: block_rq_issue: 202,1 R 0 () 9006848 + 208 [cksum]
cksum-7428 [000] d... 7269511.331784:
=> blk_peek_request
=> do_blkif_request
=> __blk_run_queue
=> queue_unplugged
=> blk_flush_plug_list
=> blk_finish_plug
=> __do_page_cache_readahead
=> ondemand_readahead
=> page_cache_async_readahead
=> generic_file_read_iter
=> new_sync_read
=> vfs_read
=> SyS_read
=> system_call_fastpath
cksum-7428 [000] d... 7269511.332631: block_rq_issue: 202,1 R 0 () 620992 + 200 [cksum]
cksum-7428 [000] d... 7269511.332639:
=> blk_peek_request
=> do_blkif_request
=> __blk_run_queue
=> queue_unplugged
=> blk_flush_plug_list
=> blk_finish_plug
=> __do_page_cache_readahead
=> ondemand_readahead
=> page_cache_sync_readahead
=> generic_file_read_iter
=> new_sync_read
=> vfs_read
=> SyS_read
=> system_call_fastpath
^C
Ending tracing...
Easy. Now I can read the ancestry to understand what actually led to issuing
a block device (disk) I/O.
Here's insertion onto the block I/O queue (better matches processes):
# ./tpoint -s block:block_rq_insert
Tracing block:block_rq_insert. Ctrl-C to end.
cksum-11908 [000] d... 7269834.882517: block_rq_insert: 202,1 R 0 () 736304 + 256 [cksum]
cksum-11908 [000] d... 7269834.882528:
=> __elv_add_request
=> blk_flush_plug_list
=> blk_finish_plug
=> __do_page_cache_readahead
=> ondemand_readahead
=> page_cache_sync_readahead
=> generic_file_read_iter
=> new_sync_read
=> vfs_read
=> SyS_read
=> system_call_fastpath
[...]
You can also add tracepoint filters. To see what variables you can use, use -v:
# ./tpoint -v block:block_rq_issue
name: block_rq_issue
ID: 942
format:
field:unsigned short common_type; offset:0; size:2; signed:0;
field:unsigned char common_flags; offset:2; size:1; signed:0;
field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
field:int common_pid; offset:4; size:4; signed:1;
field:dev_t dev; offset:8; size:4; signed:0;
field:sector_t sector; offset:16; size:8; signed:0;
field:unsigned int nr_sector; offset:24; size:4; signed:0;
field:unsigned int bytes; offset:28; size:4; signed:0;
field:char rwbs[8]; offset:32; size:8; signed:1;
field:char comm[16]; offset:40; size:16; signed:1;
field:__data_loc char[] cmd; offset:56; size:4; signed:1;
print fmt: "%d,%d %s %u (%s) %llu + %u [%s]", ((unsigned int) ((REC->dev) >> 20)), ((unsigned int) ((REC->dev) & ((1U << 20) - 1))), REC->rwbs, REC->bytes, __get_str(cmd), (unsigned long long)REC->sector, REC->nr_sector, REC->comm
Now I'll add a filter to check that the rwbs field (I/O type) includes an "R",
making it a read:
# ./tpoint -s block:block_rq_insert 'rwbs ~ "*R*"'
cksum-11908 [000] d... 7269839.919098: block_rq_insert: 202,1 R 0 () 736560 + 136 [cksum]
cksum-11908 [000] d... 7269839.919107:
=> __elv_add_request
=> blk_flush_plug_list
=> blk_finish_plug
=> __do_page_cache_readahead
=> ondemand_readahead
=> page_cache_async_readahead
=> generic_file_read_iter
=> new_sync_read
=> vfs_read
=> SyS_read
=> system_call_fastpath
[...]
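Numeric comparisons can also be used on the fields shown by -v. For example, a
filter such as the following should match only larger I/O (more than 128
sectors):
# ./tpoint block:block_rq_issue 'nr_sector > 128'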
Use -h to print the USAGE message:
# ./tpoint -h
USAGE: tpoint [-hHsv] [-d secs] [-p PID] tracepoint [filter]
tpoint -l
-d seconds # trace duration, and use buffers
-p PID # PID to match on I/O issue
-v # view format file (don't trace)
-H # include column headers
-l # list all tracepoints
-s # show kernel stack traces
-h # this usage message
Note that these examples may need modification to match your kernel
version's function names and platform's register usage.
eg,
tpoint -l | grep open
# find tracepoints containing "open"
tpoint syscalls:sys_enter_open
# trace open() syscall entry
tpoint block:block_rq_issue
# trace block I/O issue
tpoint -s block:block_rq_issue
# show kernel stacks
See the man page and example file for more info.
perf-tools-unstable-0.0.1~20150130+git85414b0/execsnoop 0000775 0000000 0000000 00000020753 12542613570 0022003 0 ustar 00root root 0000000 0000000 #!/bin/bash
#
# execsnoop - trace process exec() with arguments.
# Written using Linux ftrace.
#
# This shows the execution of new processes, especially short-lived ones that
# can be missed by sampling tools such as top(1).
#
# USAGE: ./execsnoop [-hrt] [-a argc] [-d secs] [name]
#
# REQUIREMENTS: FTRACE and KPROBE CONFIG, sched:sched_process_fork tracepoint,
# and either the sys_execve, stub_execve or do_execve kernel function. You may
# already have these on recent kernels. And awk.
#
# This traces exec() from the fork()->exec() sequence, which means it won't
# catch new processes that only fork(). With the -r option, it will also catch
# processes that re-exec. It makes a best-effort attempt to retrieve the program
# arguments and PPID; if these are unavailable, 0 and "[?]" are printed
# respectively. There is also a limit to the number of arguments printed (by
# default, 8), which can be increased using -a.
#
# This implementation is designed to work on older kernel versions, and without
# kernel debuginfo. It works by dynamic tracing an execve kernel function to
# read the arguments from the %si register. The sys_execve function is tried
# first, then stub_execve and do_execve. The sched:sched_process_fork
# tracepoint is used to get the PPID. This program is a workaround that should be
# improved in the future when other kernel capabilities are made available. If
# you need a more reliable tool now, then consider other tracing alternatives
# (eg, SystemTap). This tool is really a proof of concept to see what ftrace can
# currently do.
#
# From perf-tools: https://github.com/brendangregg/perf-tools
#
# See the execsnoop(8) man page (in perf-tools) for more info.
#
# COPYRIGHT: Copyright (c) 2014 Brendan Gregg.
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; either version 2
# of the License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software Foundation,
# Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
#
# (http://www.gnu.org/copyleft/gpl.html)
#
# 07-Jul-2014 Brendan Gregg Created this.
### default variables
tracing=/sys/kernel/debug/tracing
flock=/var/tmp/.ftrace-lock; wroteflock=0
opt_duration=0; duration=; opt_name=0; name=; opt_time=0; opt_reexec=0
opt_argc=0; argc=8; max_argc=16; ftext=
trap ':' INT QUIT TERM PIPE HUP # sends execution to end tracing section
function usage {
cat <<-END >&2
USAGE: execsnoop [-hrt] [-a argc] [-d secs] [name]
-d seconds # trace duration, and use buffers
-a argc # max args to show (default 8)
-r # include re-execs
-t # include time (seconds)
-h # this usage message
name # process name to match (REs allowed)
eg,
execsnoop # watch exec()s live (unbuffered)
execsnoop -d 1 # trace 1 sec (buffered)
execsnoop grep # trace process names containing grep
execsnoop 'log$' # process names ending in "log"
See the man page and example file for more info.
END
exit
}
function warn {
if ! eval "$@"; then
echo >&2 "WARNING: command failed \"$@\""
fi
}
function end {
# disable tracing
echo 2>/dev/null
echo "Ending tracing..." 2>/dev/null
cd $tracing
warn "echo 0 > events/kprobes/$kname/enable"
warn "echo 0 > events/sched/sched_process_fork/enable"
warn "echo -:$kname >> kprobe_events"
warn "echo > trace"
(( wroteflock )) && warn "rm $flock"
}
function die {
echo >&2 "$@"
exit 1
}
function edie {
# die with a quiet end()
echo >&2 "$@"
exec >/dev/null 2>&1
end
exit 1
}
### process options
while getopts a:d:hrt opt
do
case $opt in
a) opt_argc=1; argc=$OPTARG ;;
d) opt_duration=1; duration=$OPTARG ;;
r) opt_reexec=1 ;;
t) opt_time=1 ;;
h|?) usage ;;
esac
done
shift $(( $OPTIND - 1 ))
if (( $# )); then
opt_name=1
name=$1
shift
fi
(( $# )) && usage
### option logic
(( opt_pid && opt_name )) && die "ERROR: use either -p or -n."
(( opt_pid )) && ftext=" issued by PID $pid"
(( opt_name )) && ftext=" issued by process name \"$name\""
(( opt_file )) && ftext="$ftext for filenames containing \"$file\""
(( opt_argc && argc > max_argc )) && die "ERROR: max -a argc is $max_argc."
if (( opt_duration )); then
echo "Tracing exec()s$ftext for $duration seconds (buffered)..."
else
echo "Tracing exec()s$ftext. Ctrl-C to end."
fi
### select awk
if (( opt_duration )); then
[[ -x /usr/bin/mawk ]] && awk=mawk || awk=awk
else
# workarounds for mawk/gawk fflush behavior
if [[ -x /usr/bin/gawk ]]; then
awk=gawk
elif [[ -x /usr/bin/mawk ]]; then
awk="mawk -W interactive"
else
awk=awk
fi
fi
### check permissions
cd $tracing || die "ERROR: accessing tracing. Root user? Kernel has FTRACE?
debugfs mounted? (mount -t debugfs debugfs /sys/kernel/debug)"
### ftrace lock
[[ -e $flock ]] && die "ERROR: ftrace may be in use by PID $(cat $flock) $flock"
echo $$ > $flock || die "ERROR: unable to write $flock."
wroteflock=1
### build probe
if [[ -x /usr/bin/getconf ]]; then
bits=$(getconf LONG_BIT)
else
bits=64
[[ $(uname -m) == i* ]] && bits=32
fi
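# Each execve argument is a pointer in the argv array, so the stride between
# entries (offset, computed below) is the word size in bytes.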
(( offset = bits / 8 ))
function makeprobe {
func=$1
kname=execsnoop_$func
kprobe="p:$kname $func"
i=0
while (( i < argc + 1 )); do
# p:kname do_execve +0(+0(%si)):string +0(+8(%si)):string ...
kprobe="$kprobe +0(+$(( i * offset ))(%si)):string"
(( i++ ))
done
}
# try in this order: sys_execve, stub_execve, do_execve
makeprobe sys_execve
### setup and begin tracing
echo nop > current_tracer
if ! echo $kprobe >> kprobe_events 2>/dev/null; then
makeprobe stub_execve
if ! echo $kprobe >> kprobe_events 2>/dev/null; then
makeprobe do_execve
if ! echo $kprobe >> kprobe_events 2>/dev/null; then
edie "ERROR: adding a kprobe for execve. Exiting."
fi
fi
fi
if ! echo 1 > events/kprobes/$kname/enable; then
edie "ERROR: enabling kprobe for execve. Exiting."
fi
if ! echo 1 > events/sched/sched_process_fork/enable; then
edie "ERROR: enabling sched:sched_process_fork tracepoint. Exiting."
fi
echo "Instrumenting $func"
(( opt_time )) && printf "%-16s " "TIMEs"
printf "%6s %6s %s\n" "PID" "PPID" "ARGS"
#
# Determine output format. It may be one of the following (newest first):
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# TASK-PID CPU# TIMESTAMP FUNCTION
# To differentiate between them, the number of header fields is counted,
# and an offset set, to skip the extra column when needed.
#
offset=$($awk 'BEGIN { o = 0; }
$1 == "#" && $2 ~ /TASK/ && NF == 6 { o = 1; }
$2 ~ /TASK/ { print o; exit }' trace)
### print trace buffer
warn "echo > trace"
( if (( opt_duration )); then
# wait then dump buffer
sleep $duration
cat -v trace
else
# print buffer live
cat -v trace_pipe
fi ) | $awk -v o=$offset -v opt_name=$opt_name -v name=$name \
-v opt_duration=$opt_duration -v opt_time=$opt_time -v kname=$kname \
-v opt_reexec=$opt_reexec '
# common fields
$1 != "#" {
# task name can contain dashes
comm = pid = $1
sub(/-[0-9][0-9]*/, "", comm)
sub(/.*-/, "", pid)
}
$1 != "#" && $(4+o) ~ /sched_process_fork/ {
cpid=$0
sub(/.* child_pid=/, "", cpid)
sub(/ .*/, "", cpid)
getppid[cpid] = pid
delete seen[pid]
}
$1 != "#" && $(4+o) ~ kname {
if (seen[pid])
next
if (opt_name && comm !~ name)
next
#
# examples:
# ... arg1="/bin/echo" arg2="1" arg3="2" arg4="3" ...
# ... arg1="sleep" arg2="2" arg3=(fault) arg4="" ...
# ... arg1="" arg2=(fault) arg3="" arg4="" ...
# the last example is uncommon, and may be a race.
#
if ($0 ~ /arg1=""/) {
args = comm " [?]"
} else {
args=$0
sub(/ arg[0-9]*=\(fault\).*/, "", args)
sub(/.*arg1="/, "", args)
gsub(/" arg[0-9]*="/, " ", args)
sub(/"$/, "", args)
if ($0 !~ /\(fault\)/)
args = args " [...]"
}
if (opt_time) {
time = $(3+o); sub(":", "", time)
printf "%-16s ", time
}
printf "%6s %6d %s\n", pid, getppid[pid], args
if (!opt_duration)
fflush()
if (!opt_reexec) {
seen[pid] = 1
delete getppid[pid]
}
}
$0 ~ /LOST.*EVENT[S]/ { print "WARNING: " $0 > "/dev/stderr" }
'
### end tracing
end
perf-tools-unstable-0.0.1~20150130+git85414b0/fs/ 0000775 0000000 0000000 00000000000 12542613570 0020453 5 ustar 00root root 0000000 0000000 perf-tools-unstable-0.0.1~20150130+git85414b0/fs/cachestat 0000775 0000000 0000000 00000012473 12542613570 0022347 0 ustar 00root root 0000000 0000000 #!/bin/bash
#
# cachestat - show Linux page cache hit/miss statistics.
# Uses Linux ftrace.
#
# This is a proof of concept using Linux ftrace capabilities on older kernels,
# and works by using function profiling for in-kernel counters. Specifically,
# four kernel functions are traced:
#
# mark_page_accessed() for measuring cache accesses
# mark_buffer_dirty() for measuring cache writes
# add_to_page_cache_lru() for measuring page additions
# account_page_dirtied() for measuring page dirties
#
# It is possible that these functions have been renamed (or are different
# logically) for your kernel version, and this script will not work as-is.
# This script was written on Linux 3.13. This script is a sandcastle: the
# kernel may wash some away, and you'll need to rebuild.
#
# USAGE: cachestat [-Dht] [interval]
# eg,
# cachestat 5 # show stats every 5 seconds
#
# Run "cachestat -h" for full usage.
#
# WARNING: This uses dynamic tracing of kernel functions, and could cause
# kernel panics or freezes. Test, and know what you are doing, before use.
# It also traces cache activity, which can be frequent, and cost some overhead.
# The statistics should be treated as best-effort: there may be some error
# margin depending on unusual workload types.
#
# REQUIREMENTS: CONFIG_FUNCTION_PROFILER, awk.
#
# From perf-tools: https://github.com/brendangregg/perf-tools
#
# COPYRIGHT: Copyright (c) 2014 Brendan Gregg.
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; either version 2
# of the License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software Foundation,
# Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
#
# (http://www.gnu.org/copyleft/gpl.html)
#
# 28-Dec-2014 Brendan Gregg Created this.
### default variables
tracing=/sys/kernel/debug/tracing
interval=1; opt_timestamp=0; opt_debug=0
trap 'quit=1' INT QUIT TERM PIPE HUP # sends execution to end tracing section
function usage {
cat <<-END >&2
USAGE: cachestat [-Dht] [interval]
-D # print debug counters
-h # this usage message
-t # include timestamp
interval # output interval in secs (default 1)
eg,
cachestat # show stats every second
cachestat 5 # show stats every 5 seconds
See the man page and example file for more info.
END
exit
}
function warn {
if ! eval "$@"; then
echo >&2 "WARNING: command failed \"$@\""
fi
}
function die {
echo >&2 "$@"
exit 1
}
### process options
while getopts Dht opt
do
case $opt in
D) opt_debug=1 ;;
t) opt_timestamp=1 ;;
h|?) usage ;;
esac
done
shift $(( $OPTIND - 1 ))
### option logic
if (( $# )); then
interval=$1
fi
echo "Counting cache functions... Output every $interval seconds."
### check permissions
cd $tracing || die "ERROR: accessing tracing. Root user? Kernel has FTRACE?
debugfs mounted? (mount -t debugfs debugfs /sys/kernel/debug)"
### enable tracing
sysctl -q kernel.ftrace_enabled=1 # doesn't set exit status
printf "mark_page_accessed\nmark_buffer_dirty\nadd_to_page_cache_lru\naccount_page_dirtied\n" > set_ftrace_filter || \
die "ERROR: tracing these four kernel functions: mark_page_accessed,"\
"mark_buffer_dirty, add_to_page_cache_lru and account_page_dirtied (unknown kernel version?). Exiting."
warn "echo nop > current_tracer"
if ! echo 1 > function_profile_enabled; then
echo > set_ftrace_filter
die "ERROR: enabling function profiling. Have CONFIG_FUNCTION_PROFILER? Exiting."
fi
(( opt_timestamp )) && printf "%-8s " TIME
printf "%8s %8s %8s %8s %12s %10s" HITS MISSES DIRTIES RATIO "BUFFERS_MB" "CACHE_MB"
(( opt_debug )) && printf " DEBUG"
echo
### summarize
quit=0; secs=0
while (( !quit && (!opt_duration || secs < duration) )); do
(( secs += interval ))
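# toggle function profiling off and on to reset the in-kernel counters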
echo 0 > function_profile_enabled
echo 1 > function_profile_enabled
sleep $interval
(( opt_timestamp )) && printf "%(%H:%M:%S)T " -1
# cat both meminfo and trace stats, and let awk pick them apart
cat /proc/meminfo trace_stat/function* | awk -v debug=$opt_debug '
# match meminfo stats:
$1 == "Buffers:" && $3 == "kB" { buffers_mb = $2 / 1024 }
$1 == "Cached:" && $3 == "kB" { cached_mb = $2 / 1024 }
# identify and save trace counts:
$2 ~ /[0-9]/ && $3 != "kB" { a[$1] += $2 }
END {
mpa = a["mark_page_accessed"]
mbd = a["mark_buffer_dirty"]
apcl = a["add_to_page_cache_lru"]
apd = a["account_page_dirtied"]
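# derive stats: total cache accesses = accesses - writes (buffer dirties);
# misses = page cache additions - additions that were only due to dirtying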
total = mpa - mbd
misses = apcl - apd
if (misses < 0)
misses = 0
hits = total - misses
ratio = 100 * hits / total
printf "%8d %8d %8d %7.1f%% %12.0f %10.0f", hits, misses, mbd,
ratio, buffers_mb, cached_mb
if (debug)
printf " (%d %d %d %d)", mpa, mbd, apcl, apd
printf "\n"
}'
done
### end tracing
echo 2>/dev/null
echo "Ending tracing..." 2>/dev/null
warn "echo 0 > function_profile_enabled"
warn "echo > set_ftrace_filter"
perf-tools-unstable-0.0.1~20150130+git85414b0/iolatency 0000775 0000000 0000000 00000017003 12542613570 0021761 0 ustar 00root root 0000000 0000000 #!/bin/bash
#
# iolatency - summarize block device I/O latency as a histogram.
# Written using Linux ftrace.
#
# This shows the distribution of latency, allowing modes and latency outliers
# to be identified and studied.
#
# USAGE: ./iolatency [-hQT] [-d device] [-i iotype] [interval [count]]
#
# REQUIREMENTS: FTRACE CONFIG and block:block_rq_* tracepoints, which you may
# already have on recent kernels.
#
# OVERHEAD: block device I/O issue and completion events are traced and buffered
# in-kernel, then processed and summarized in user space. There may be
# measurable overhead with this approach, relative to the block device IOPS.
#
# This was written as a proof of concept for ftrace.
#
# From perf-tools: https://github.com/brendangregg/perf-tools
#
# COPYRIGHT: Copyright (c) 2014 Brendan Gregg.
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; either version 2
# of the License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software Foundation,
# Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
#
# (http://www.gnu.org/copyleft/gpl.html)
#
# 20-Jul-2014 Brendan Gregg Created this.
### default variables
tracing=/sys/kernel/debug/tracing
flock=/var/tmp/.ftrace-lock
bufsize_kb=4096
opt_device=0; device=; opt_iotype=0; iotype=; opt_timestamp=0
opt_interval=0; interval=1; opt_count=0; count=0; opt_queue=0
trap ':' INT QUIT TERM PIPE HUP # sends execution to end tracing section
function usage {
cat <<-END >&2
USAGE: iolatency [-hQT] [-d device] [-i iotype] [interval [count]]
-d device # device string (eg, "202,1)
-i iotype # match type (eg, '*R*' for all reads)
-Q # use queue insert as start time
-T # timestamp on output
-h # this usage message
interval # summary interval, seconds (default 1)
count # number of summaries
eg,
iolatency # summarize latency every second
iolatency -Q # include block I/O queue time
iolatency 5 2 # 2 x 5 second summaries
iolatency -i '*R*' # trace reads
iolatency -d 202,1 # trace device 202,1 only
See the man page and example file for more info.
END
exit
}
function warn {
if ! eval "$@"; then
echo >&2 "WARNING: command failed \"$@\""
fi
}
function end {
# disable tracing
echo 2>/dev/null
echo "Ending tracing..." 2>/dev/null
cd $tracing
warn "echo 0 > events/block/$b_start/enable"
warn "echo 0 > events/block/block_rq_complete/enable"
if (( opt_device || opt_iotype )); then
warn "echo 0 > events/block/$b_start/filter"
warn "echo 0 > events/block/block_rq_complete/filter"
fi
warn "echo > trace"
(( wroteflock )) && warn "rm $flock"
}
function die {
echo >&2 "$@"
exit 1
}
function edie {
# die with a quiet end()
echo >&2 "$@"
exec >/dev/null 2>&1
end
exit 1
}
### process options
while getopts d:hi:QT opt
do
case $opt in
d) opt_device=1; device=$OPTARG ;;
i) opt_iotype=1; iotype=$OPTARG ;;
Q) opt_queue=1 ;;
T) opt_timestamp=1 ;;
h|?) usage ;;
esac
done
shift $(( $OPTIND - 1 ))
if (( $# )); then
opt_interval=1
interval=$1
shift
fi
if (( $# )); then
opt_count=1
count=$1
fi
if (( opt_device )); then
major=${device%,*}
minor=${device#*,}
dev=$(( (major << 20) + minor ))
fi
if (( opt_queue )); then
b_start=block_rq_insert
else
b_start=block_rq_issue
fi
### select awk
[[ -x /usr/bin/mawk ]] && awk='mawk -W interactive' || awk=awk
### check permissions
cd $tracing || die "ERROR: accessing tracing. Root user? Kernel has FTRACE?
debugfs mounted? (mount -t debugfs debugfs /sys/kernel/debug)"
### ftrace lock
[[ -e $flock ]] && die "ERROR: ftrace may be in use by PID $(cat $flock) $flock"
echo $$ > $flock || die "ERROR: unable to write $flock."
wroteflock=1
### setup and begin tracing
warn "echo nop > current_tracer"
warn "echo $bufsize_kb > buffer_size_kb"
filter=
if (( opt_iotype )); then
filter="rwbs ~ \"$iotype\""
fi
if (( opt_device )); then
[[ "$filter" != "" ]] && filter="$filter && "
filter="${filter}dev == $dev"
fi
if (( opt_iotype || opt_device )); then
if ! echo "$filter" > events/block/$b_start/filter || \
! echo "$filter" > events/block/block_rq_complete/filter
then
edie "ERROR: setting -d or -t filter. Exiting."
fi
fi
if ! echo 1 > events/block/$b_start/enable || \
! echo 1 > events/block/block_rq_complete/enable; then
edie "ERROR: enabling block I/O tracepoints. Exiting."
fi
etext=
(( !opt_count )) && etext=" Ctrl-C to end."
echo "Tracing block I/O. Output every $interval seconds.$etext"
#
# Determine output format. It may be one of the following (newest first):
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# TASK-PID CPU# TIMESTAMP FUNCTION
# To differentiate between them, the number of header fields is counted,
# and an offset set, to skip the extra column when needed.
#
offset=$($awk 'BEGIN { o = 0; }
$1 == "#" && $2 ~ /TASK/ && NF == 6 { o = 1; }
$2 ~ /TASK/ { print o; exit }' trace)
### print trace buffer
warn "echo > trace"
i=0
while (( !opt_count || (i < count) )); do
(( i++ ))
sleep $interval
# snapshots were added in 3.10
if [[ -x snapshot ]]; then
echo 1 > snapshot
echo > trace
cat snapshot
else
cat trace
echo > trace
fi
(( opt_timestamp )) && printf "time %(%H:%M:%S)T:\n" -1
echo "tick"
done | \
$awk -v o=$offset -v opt_timestamp=$opt_timestamp -v b_start=$b_start '
function star(sval, smax, swidth) {
stars = ""
if (smax == 0) return ""
for (si = 0; si < (swidth * sval / smax); si++) {
stars = stars "#"
}
return stars
}
BEGIN { max_i = 0 }
# common fields
$1 != "#" {
time = $(3+o); sub(":", "", time)
dev = $(5+o)
}
# block I/O request
$1 != "#" && $0 ~ b_start {
#
# example: (fields1..4+o) 202,1 W 0 () 12862264 + 8 [tar]
# The cmd field "()" might contain multiple words (hex),
# hence stepping from the right (NF-3).
#
loc = $(NF-3)
starts[dev, loc] = time
next
}
# block I/O completion
$1 != "#" && $0 ~ /rq_complete/ {
#
# example: (fields1..4+o) 202,1 W () 12862256 + 8 [0]
#
dir = $(6+o)
loc = $(NF-3)
if (starts[dev, loc] > 0) {
latency_ms = 1000 * (time - starts[dev, loc])
i = 0
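# find the power-of-2 latency bucket (i == 0: <= 1 ms, i == 1: <= 2 ms, ...)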
for (ms = 1; latency_ms > ms; ms *= 2) { i++ }
hist[i]++
if (i > max_i)
max_i = i
delete starts[dev, loc]
}
next
}
# timestamp
$1 == "time" {
lasttime = $2
}
# print summary
$1 == "tick" {
print ""
if (opt_timestamp)
print lasttime
# find max value
max_v = 0
for (i = 0; i <= max_i; i++) {
if (hist[i] > max_v)
max_v = hist[i]
}
# print histogram
printf "%8s .. %-8s: %-8s |%-38s|\n", ">=(ms)", "<(ms)",
"I/O", "Distribution"
ms = 1
from = 0
for (i = 0; i <= max_i; i++) {
printf "%8d -> %-8d: %-8d |%-38s|\n", from, ms,
hist[i], star(hist[i], max_v, 38)
from = ms
ms *= 2
}
fflush()
delete hist
delete starts # invalid if events missed between snapshots
max_i = 0
}
$0 ~ /LOST.*EVENTS/ { print "WARNING: " $0 > "/dev/stderr" }
'
### end tracing
end
perf-tools-unstable-0.0.1~20150130+git85414b0/iosnoop 0000775 0000000 0000000 00000021630 12542613570 0021461 0 ustar 00root root 0000000 0000000 #!/bin/bash
#
# iosnoop - trace block device I/O.
# Written using Linux ftrace.
#
# This traces disk I/O at the block device interface, using the block:
# tracepoints. This can help characterize the I/O requested for the storage
# devices and their resulting performance. I/O completions can also be studied
# event-by-event for debugging disk and controller I/O scheduling issues.
#
# USAGE: ./iosnoop [-hQst] [-d device] [-i iotype] [-p pid] [-n name] [duration]
#
# Run "iosnoop -h" for full usage.
#
# REQUIREMENTS: FTRACE CONFIG, block:block_rq_* tracepoints (you may
# already have these on recent kernels).
#
# OVERHEAD: By default, iosnoop works without buffering, printing I/O events
# as they happen (uses trace_pipe), context switching and consuming CPU to do
# so. This has a limit of about 10,000 IOPS (depending on your platform), at
# which point iosnoop will be consuming 1 CPU. The duration mode uses buffering,
# and can handle much higher IOPS rates, however, the buffer has a limit of
# about 50,000 I/O, after which events will be dropped. You can tune this with
# bufsize_kb, which is per-CPU. Also note that the "-n" option is currently
# post-filtered, so all events are traced.
#
# This was written as a proof of concept for ftrace. It would be better written
# using perf_events (after some capabilities are added), which has a better
# buffering policy, or a tracer such as SystemTap or ktap.
#
# From perf-tools: https://github.com/brendangregg/perf-tools
#
# See the iosnoop(8) man page (in perf-tools) for more info.
#
# COPYRIGHT: Copyright (c) 2014 Brendan Gregg.
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; either version 2
# of the License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software Foundation,
# Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
#
# (http://www.gnu.org/copyleft/gpl.html)
#
# 12-Jul-2014 Brendan Gregg Created this.
### default variables
tracing=/sys/kernel/debug/tracing
flock=/var/tmp/.ftrace-lock
bufsize_kb=4096
opt_duration=0; duration=; opt_name=0; name=; opt_pid=0; pid=; ftext=
opt_start=0; opt_end=0; opt_device=0; device=; opt_iotype=0; iotype=
opt_queue=0
trap ':' INT QUIT TERM PIPE HUP # sends execution to end tracing section
function usage {
cat <<-END >&2
USAGE: iosnoop [-hQst] [-d device] [-i iotype] [-p PID] [-n name]
[duration]
-d device # device string (eg, "202,1")
-i iotype # match type (eg, '*R*' for all reads)
-n name # process name to match on I/O issue
-p PID # PID to match on I/O issue
-Q # use queue insert as start time
-s # include start time of I/O (s)
-t # include completion time of I/O (s)
-h # this usage message
duration # duration seconds, and use buffers
eg,
iosnoop # watch block I/O live (unbuffered)
iosnoop 1 # trace 1 sec (buffered)
iosnoop -Q # include queueing time in LATms
iosnoop -ts # include start and end timestamps
iosnoop -i '*R*' # trace reads
iosnoop -p 91 # show I/O issued when PID 91 is on-CPU
iosnoop -Qp 91 # show I/O queued by PID 91, queue time
See the man page and example file for more info.
END
exit
}
function warn {
if ! eval "$@"; then
echo >&2 "WARNING: command failed \"$@\""
fi
}
function end {
# disable tracing
echo 2>/dev/null
echo "Ending tracing..." 2>/dev/null
cd $tracing
warn "echo 0 > events/block/$b_start/enable"
warn "echo 0 > events/block/block_rq_complete/enable"
if (( opt_device || opt_iotype || opt_pid )); then
warn "echo 0 > events/block/$b_start/filter"
warn "echo 0 > events/block/block_rq_complete/filter"
fi
warn "echo > trace"
(( wroteflock )) && warn "rm $flock"
}
function die {
echo >&2 "$@"
exit 1
}
function edie {
# die with a quiet end()
echo >&2 "$@"
exec >/dev/null 2>&1
end
exit 1
}
### process options
while getopts d:hi:n:p:Qst opt
do
case $opt in
d) opt_device=1; device=$OPTARG ;;
i) opt_iotype=1; iotype=$OPTARG ;;
n) opt_name=1; name=$OPTARG ;;
p) opt_pid=1; pid=$OPTARG ;;
Q) opt_queue=1 ;;
s) opt_start=1 ;;
t) opt_end=1 ;;
h|?) usage ;;
esac
done
shift $(( $OPTIND - 1 ))
if (( $# )); then
opt_duration=1
duration=$1
shift
fi
if (( opt_device )); then
major=${device%,*}
minor=${device#*,}
dev=$(( (major << 20) + minor ))
fi
### option logic
(( opt_pid && opt_name )) && die "ERROR: use either -p or -n."
(( opt_pid )) && ftext=" issued by PID $pid"
(( opt_name )) && ftext=" issued by process name \"$name\""
if (( opt_duration )); then
echo "Tracing block I/O$ftext for $duration seconds (buffered)..."
else
echo "Tracing block I/O$ftext. Ctrl-C to end."
fi
if (( opt_queue )); then
b_start=block_rq_insert
else
b_start=block_rq_issue
fi
### select awk
(( opt_duration )) && use=mawk || use=gawk # workaround for mawk fflush()
[[ -x /usr/bin/$use ]] && awk=$use || awk=awk
wroteflock=1
### check permissions
cd $tracing || die "ERROR: accessing tracing. Root user? Kernel has FTRACE?
debugfs mounted? (mount -t debugfs debugfs /sys/kernel/debug)"
### ftrace lock
[[ -e $flock ]] && die "ERROR: ftrace may be in use by PID $(cat $flock) $flock"
echo $$ > $flock || die "ERROR: unable to write $flock."
### setup and begin tracing
echo nop > current_tracer
warn "echo $bufsize_kb > buffer_size_kb"
filter=
if (( opt_iotype )); then
filter="rwbs ~ \"$iotype\""
fi
if (( opt_device )); then
[[ "$filter" != "" ]] && filter="$filter && "
filter="${filter}dev == $dev"
fi
filter_i=$filter
if (( opt_pid )); then
[[ "$filter_i" != "" ]] && filter_i="$filter_i && "
filter_i="${filter_i}common_pid == $pid"
[[ "$filter" == "" ]] && filter=0
fi
if (( opt_iotype || opt_device || opt_pid )); then
if ! echo "$filter_i" > events/block/$b_start/filter || \
! echo "$filter" > events/block/block_rq_complete/filter
then
edie "ERROR: setting -d or -t filter. Exiting."
fi
fi
if ! echo 1 > events/block/$b_start/enable || \
! echo 1 > events/block/block_rq_complete/enable; then
edie "ERROR: enabling block I/O tracepoints. Exiting."
fi
(( opt_start )) && printf "%-15s " "STARTs"
(( opt_end )) && printf "%-15s " "ENDs"
printf "%-12.12s %-6s %-4s %-8s %-12s %-6s %8s\n" \
"COMM" "PID" "TYPE" "DEV" "BLOCK" "BYTES" "LATms"
#
# Determine output format. It may be one of the following (newest first):
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# TASK-PID CPU# TIMESTAMP FUNCTION
# To differentiate between them, the number of header fields is counted,
# and an offset set, to skip the extra column when needed.
#
offset=$($awk 'BEGIN { o = 0; }
$1 == "#" && $2 ~ /TASK/ && NF == 6 { o = 1; }
$2 ~ /TASK/ { print o; exit }' trace)
### print trace buffer
warn "echo > trace"
( if (( opt_duration )); then
# wait then dump buffer
sleep $duration
cat trace
else
# print buffer live
cat trace_pipe
fi ) | $awk -v o=$offset -v opt_name=$opt_name -v name=$name \
-v opt_duration=$opt_duration -v opt_start=$opt_start -v opt_end=$opt_end \
-v b_start=$b_start '
# common fields
$1 != "#" {
# task name can contain dashes
comm = pid = $1
sub(/-[0-9][0-9]*/, "", comm)
sub(/.*-/, "", pid)
time = $(3+o); sub(":", "", time)
dev = $(5+o)
}
# block I/O request
$1 != "#" && $0 ~ b_start {
if (opt_name && match(comm, name) == 0)
next
#
# example: (fields1..4+o) 202,1 W 0 () 12862264 + 8 [tar]
# The cmd field "()" might contain multiple words (hex),
# hence stepping from the right (NF-3).
#
loc = $(NF-3)
starts[dev, loc] = time
comms[dev, loc] = comm
pids[dev, loc] = pid
next
}
# block I/O completion
$1 != "#" && $0 ~ /rq_complete/ {
#
# example: (fields1..4+o) 202,1 W () 12862256 + 8 [0]
#
dir = $(6+o)
loc = $(NF-3)
nsec = $(NF-1)
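# nsec is the sector count (the "+ N" field); printed as bytes (nsec * 512) below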
if (starts[dev, loc] > 0) {
latency = sprintf("%.2f",
1000 * (time - starts[dev, loc]))
comm = comms[dev, loc]
pid = pids[dev, loc]
if (opt_start)
printf "%-15s ", starts[dev, loc]
if (opt_end)
printf "%-15s ", time
printf "%-12.12s %-6s %-4s %-8s %-12s %-6s %8s\n",
comm, pid, dir, dev, loc, nsec * 512, latency
if (!opt_duration)
fflush()
delete starts[dev, loc]
delete comms[dev, loc]
delete pids[dev, loc]
}
next
}
$0 ~ /LOST.*EVENTS/ { print "WARNING: " $0 > "/dev/stderr" }
'
### end tracing
end
perf-tools-unstable-0.0.1~20150130+git85414b0/kernel/ 0000775 0000000 0000000 00000000000 12542613570 0021323 5 ustar 00root root 0000000 0000000 perf-tools-unstable-0.0.1~20150130+git85414b0/kernel/funccount 0000775 0000000 0000000 00000010534 12542613570 0023260 0 ustar 00root root 0000000 0000000 #!/bin/bash
#
# funccount - count kernel function calls matching specified wildcards.
# Uses Linux ftrace.
#
# This is a proof of concept using Linux ftrace capabilities on older kernels,
# and works by using function profiling: in-kernel counters.
#
# USAGE: funccount [-hT] [-i secs] [-d secs] [-t top] funcstring
# eg,
# funccount 'ext3*' # count all ext3* kernel function calls
#
# Run "funccount -h" for full usage.
#
# WARNING: This uses dynamic tracing of kernel functions, and could cause
# kernel panics or freezes. Test, and know what you are doing, before use.
#
# REQUIREMENTS: CONFIG_FUNCTION_PROFILER, awk.
#
# From perf-tools: https://github.com/brendangregg/perf-tools
#
# COPYRIGHT: Copyright (c) 2014 Brendan Gregg.
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; either version 2
# of the License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software Foundation,
# Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
#
# (http://www.gnu.org/copyleft/gpl.html)
#
# 12-Jul-2014 Brendan Gregg Created this.
### default variables
tracing=/sys/kernel/debug/tracing
opt_duration=0; duration=; opt_interval=0; interval=999999; opt_timestamp=0
opt_tail=0; tcmd=cat; ttext=
trap 'quit=1' INT QUIT TERM PIPE HUP # sends execution to end tracing section
function usage {
cat <<-END >&2
USAGE: funccount [-hT] [-i secs] [-d secs] [-t top] funcstring
-d seconds # total duration of trace
-h # this usage message
-i seconds # interval summary
-t top # show top num entries only
-T # include timestamp (for -i)
eg,
funccount 'vfs*' # trace all funcs that match "vfs*"
funccount -d 5 'tcp*' # trace "tcp*" funcs for 5 seconds
funccount -t 10 'ext3*' # show top 10 "ext3*" funcs
funccount -i 1 'ext3*' # summary every 1 second
funccount -i 1 -d 5 'ext3*' # 5 x 1 second summaries
See the man page and example file for more info.
END
exit
}
function warn {
if ! eval "$@"; then
echo >&2 "WARNING: command failed \"$@\""
fi
}
function die {
echo >&2 "$@"
exit 1
}
### process options
while getopts d:hi:t:T opt
do
case $opt in
d) opt_duration=1; duration=$OPTARG ;;
i) opt_interval=1; interval=$OPTARG ;;
t) opt_tail=1; tnum=$OPTARG ;;
T) opt_timestamp=1 ;;
h|?) usage ;;
esac
done
shift $(( $OPTIND - 1 ))
### option logic
(( $# == 0 )) && usage
funcs="$1"
if (( opt_tail )); then
tcmd="tail -$tnum"
ttext=" Top $tnum only."
fi
if (( opt_duration )); then
echo "Tracing \"$funcs\" for $duration seconds.$ttext.."
else
echo "Tracing \"$funcs\".$ttext.. Ctrl-C to end."
fi
(( opt_duration && !opt_interval )) && interval=$duration
### check permissions
cd $tracing || die "ERROR: accessing tracing. Root user? Kernel has FTRACE?
debugfs mounted? (mount -t debugfs debugfs /sys/kernel/debug)"
### enable tracing
sysctl -q kernel.ftrace_enabled=1 # doesn't set exit status
echo "$funcs" > set_ftrace_filter || die "ERROR: enabling \"$funcs\". Exiting."
warn "echo nop > current_tracer"
if ! echo 1 > function_profile_enabled; then
echo > set_ftrace_filter
die "ERROR: enabling function profiling."\
"Have CONFIG_FUNCTION_PROFILER? Exiting."
fi
### summarize
quit=0; secs=0
while (( !quit && (!opt_duration || secs < duration) )); do
(( secs += interval ))
echo 0 > function_profile_enabled
echo 1 > function_profile_enabled
sleep $interval
echo
(( opt_timestamp )) && date
printf "%-30s %8s\n" "FUNC" "COUNT"
cat trace_stat/function* | awk '
# skip headers by matching on the numeric hit column
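# (trace_stat/function* has one file per CPU; the per-CPU counts are summed)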
$2 ~ /[0-9]/ { a[$1] += $2 }
END {
for (k in a) {
printf "%-30s %8d\n", k, a[k]
}
}' | sort -n -k2 | $tcmd
done
### end tracing
echo 2>/dev/null
echo "Ending tracing..." 2>/dev/null
warn "echo 0 > function_profile_enabled"
warn "echo > set_ftrace_filter"
perf-tools-unstable-0.0.1~20150130+git85414b0/kernel/funcgraph 0000775 0000000 0000000 00000020474 12542613570 0023235 0 ustar 00root root 0000000 0000000 #!/bin/bash
#
# funcgraph - trace kernel function graph, showing child function calls.
# Uses Linux ftrace.
#
# This is an exploratory tool that shows the graph of child function calls
# for a given kernel function. This can cost moderate overhead to execute, and
# should only be used to understand kernel behavior for a given function before
# using other, lower overhead tools. This is a proof of concept using Linux
# ftrace capabilities on older kernels.
#
# USAGE: funcgraph [-aCDhHPtT] [-m maxdepth] [-p PID] [-d secs] funcstring
#
# Run "funcgraph -h" for full usage.
#
# The output format is the same as the ftrace function graph trace format,
# described in the kernel source under Documentation/trace/ftrace.txt.
# Note that the output may be shuffled when different CPU buffers are read;
# check the CPU column for changes, or include timestamps (-t) and post sort.
#
# The "-d duration" mode leaves the trace data in the kernel buffer, and
# only reads it at the end. If the trace data is large, beware of exhausting
# buffer space (/sys/kernel/debug/tracing/buffer_size_kb) and losing data.
#
# Also beware of feedback loops: tracing tcp* functions over an ssh session,
# or writing ext4* functions to an ext4 file system. For the former, tcp
# trace data could be redirected to a file (as in the usage message). For
# the latter, trace to the screen or a different file system.
#
# WARNING: This uses dynamic tracing of kernel functions, and could cause
# kernel panics or freezes. Test, and know what you are doing, before use.
#
# OVERHEADS: This tool causes moderate to high overheads. Use with caution for
# exploratory purposes, then switch to lower overhead techniques based on
# findings. It's expected that the kernel will run at least 50% slower while
# this tool is running -- even while no output is being generated. This is
# because ALL kernel functions are traced, and filtered based on the function
# of interest. When output is generated, it can generate many lines quickly
# depending on the traced event. Such data will cause performance overheads.
# This also works without buffering by default, printing function events
# as they happen (uses trace_pipe), context switching and consuming CPU to do
# so. If needed, you can try the "-d secs" option, which buffers events
# instead, reducing overhead. If you think the buffer option is losing events,
# try increasing the buffer size (buffer_size_kb).
#
# From perf-tools: https://github.com/brendangregg/perf-tools
#
# COPYRIGHT: Copyright (c) 2014 Brendan Gregg.
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; either version 2
# of the License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software Foundation,
# Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
#
# (http://www.gnu.org/copyleft/gpl.html)
#
# 12-Jul-2014 Brendan Gregg Created this.
### default variables
tracing=/sys/kernel/debug/tracing
flock=/var/tmp/.ftrace-lock
opt_duration=0; duration=; opt_pid=0; pid=; pidtext=; opt_headers=0
opt_proc=0; opt_time=0; opt_tail=0; opt_nodur=0; opt_cpu=0
opt_max=0; max=0
trap ':' INT QUIT TERM PIPE HUP # sends execution to end tracing section
function usage {
cat <<-END >&2
USAGE: funcgraph [-aCDhHPtT] [-m maxdepth] [-p PID] [-d secs] funcstring
-a # all info (same as -HPt)
-C # measure on-CPU time only
-d seconds # trace duration, and use buffers
-D # do not show function duration
-h # this usage message
-H # include column headers
-m maxdepth # max stack depth to show
-p PID # trace when this pid is on-CPU
-P # show process names & PIDs
-t # show timestamps
-T # comment function tails
eg,
funcgraph do_nanosleep # trace do_nanosleep() and children
funcgraph -m 3 do_sys_open # trace do_sys_open() to 3 levels only
funcgraph -a do_sys_open # include timestamps and process name
funcgraph -p 198 do_sys_open # trace do_sys_open() for PID 198 only
funcgraph -d 1 do_sys_open >out # trace 1 sec, then write to file
See the man page and example file for more info.
END
exit
}
function warn {
if ! eval "$@"; then
echo >&2 "WARNING: command failed \"$@\""
fi
}
function end {
# disable tracing
echo 2>/dev/null
echo "Ending tracing..." 2>/dev/null
cd $tracing
(( opt_time )) && warn "echo nofuncgraph-abstime > trace_options"
(( opt_proc )) && warn "echo nofuncgraph-proc > trace_options"
(( opt_tail )) && warn "echo nofuncgraph-tail > trace_options"
(( opt_nodur )) && warn "echo funcgraph-duration > trace_options"
(( opt_cpu )) && warn "echo sleep-time > trace_options"
warn "echo nop > current_tracer"
(( opt_pid )) && warn "echo > set_ftrace_pid"
(( opt_max )) && warn "echo 0 > max_graph_depth"
warn "echo > set_graph_function"
warn "echo > trace"
(( wroteflock )) && warn "rm $flock"
}
function die {
echo >&2 "$@"
exit 1
}
function edie {
# die with a quiet end()
echo >&2 "$@"
exec >/dev/null 2>&1
end
exit 1
}
### process options
while getopts aCd:DhHm:p:PtT opt
do
case $opt in
a) opt_headers=1; opt_proc=1; opt_time=1 ;;
C) opt_cpu=1; ;;
d) opt_duration=1; duration=$OPTARG ;;
D) opt_nodur=1; ;;
m) opt_max=1; max=$OPTARG ;;
p) opt_pid=1; pid=$OPTARG ;;
H) opt_headers=1; ;;
P) opt_proc=1; ;;
t) opt_time=1; ;;
T) opt_tail=1; ;;
h|?) usage ;;
esac
done
shift $(( $OPTIND - 1 ))
### option logic
(( $# == 0 )) && usage
funcs="$1"
(( opt_pid )) && pidtext=" for PID $pid"
if (( opt_duration )); then
echo "Tracing \"$funcs\"$pidtext for $duration seconds..."
else
echo "Tracing \"$funcs\"$pidtext... Ctrl-C to end."
fi
### check permissions
cd $tracing || die "ERROR: accessing tracing. Root user? Kernel has FTRACE?
debugfs mounted? (mount -t debugfs debugfs /sys/kernel/debug)"
### ftrace lock
[[ -e $flock ]] && die "ERROR: ftrace may be in use by PID $(cat $flock) $flock"
echo $$ > $flock || die "ERROR: unable to write $flock."
wroteflock=1
### setup and commence tracing
sysctl -q kernel.ftrace_enabled=1 # doesn't set exit status
read mode < current_tracer
[[ "$mode" != "nop" ]] && edie "ERROR: ftrace active (current_tracer=$mode)"
if (( opt_max )); then
if ! echo $max > max_graph_depth; then
edie "ERROR: setting -m $max. Older kernel version? Exiting."
fi
fi
if (( opt_pid )); then
if ! echo $pid > set_ftrace_pid; then
edie "ERROR: setting -p $pid (PID exist?). Exiting."
fi
fi
if ! echo > set_ftrace_filter; then
edie "ERROR: writing to set_ftrace_filter. Exiting."
fi
if ! echo "$funcs" > set_graph_function; then
edie "ERROR: enabling \"$funcs\". Exiting."
fi
if ! echo function_graph > current_tracer; then
edie "ERROR: setting current_tracer to \"function\". Exiting."
fi
if (( opt_cpu )); then
if ! echo nosleep-time > trace_options; then
edie "ERROR: setting -C (nosleep-time). Exiting."
fi
fi
# the following must be done after setting current_tracer
if (( opt_time )); then
if ! echo funcgraph-abstime > trace_options; then
edie "ERROR: setting -t (funcgraph-abstime). Exiting."
fi
fi
if (( opt_proc )); then
if ! echo funcgraph-proc > trace_options; then
edie "ERROR: setting -P (funcgraph-proc). Exiting."
fi
fi
if (( opt_tail )); then
if ! echo funcgraph-tail > trace_options; then
edie "ERROR: setting -T (funcgraph-tail). Old kernel? Exiting."
fi
fi
if (( opt_nodur )); then
if ! echo nofuncgraph-duration > trace_options; then
edie "ERROR: setting -D (nofuncgraph-duration). Exiting."
fi
fi
### print trace buffer
warn "echo > trace"
if (( opt_duration )); then
sleep $duration
if (( opt_headers )); then
cat trace
else
grep -v '^#' trace
fi
else
# trace_pipe lacks headers, so fetch them from trace
(( opt_headers )) && cat trace
cat trace_pipe
fi
### end tracing
end
perf-tools-unstable-0.0.1~20150130+git85414b0/kernel/funcslower 0000775 0000000 0000000 00000015571 12542613570 0023451 0 ustar 00root root 0000000 0000000 #!/bin/bash
#
# funcslower - trace kernel functions slower than a threshold (microseconds).
# Uses Linux ftrace.
#
# This uses the Linux ftrace function graph profiler to time kernel functions
# and filter them based on a latency threshold. This is a proof of concept using
# Linux ftrace capabilities on older kernels.
#
# USAGE: funcslower [-aChHPt] [-p PID] [-d secs] funcstring latency_us
#
# Run "funcslower -h" for full usage.
#
# REQUIREMENTS: FTRACE function graph, which you may already have available
# and enabled in recent kernels. And awk.
#
# The output format is the same as the ftrace function graph trace format,
# described in the kernel source under Documentation/trace/ftrace.txt.
# Note that the output may be shuffled when different CPU buffers are read;
# check the CPU column for changes, or include timestamps (-t) and post sort.
#
# WARNING: This uses dynamic tracing of kernel functions, and could cause
# kernel panics or freezes. Test, and know what you are doing, before use.
#
# OVERHEADS: Timing and filtering is performed in-kernel context, costing
# lower overheads than post-processing in user space. If you trace frequent
# events (eg, pick a common function and a low threshold), you might want to
# try the "-d secs" option, which buffers events in-kernel instead of printing
# them live.
#
# From perf-tools: https://github.com/brendangregg/perf-tools
#
# COPYRIGHT: Copyright (c) 2014 Brendan Gregg.
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; either version 2
# of the License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software Foundation,
# Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
#
# (http://www.gnu.org/copyleft/gpl.html)
#
# 12-Jul-2014 Brendan Gregg Created this.
### default variables
tracing=/sys/kernel/debug/tracing
flock=/var/tmp/.ftrace-lock
opt_duration=0; duration=; opt_pid=0; pid=; pidtext=; opt_headers=0
opt_proc=0; opt_time=0; opt_cpu=0
trap ':' INT QUIT TERM PIPE HUP # sends execution to end tracing section
function usage {
cat <<-END >&2
USAGE: funcslower [-aChHPt] [-p PID] [-d secs] funcstring latency_us
-a # all info (same as -HPt)
-C # measure on-CPU time only
-d seconds # trace duration, and use buffers
-h # this usage message
-H # include column headers
-p PID # trace when this pid is on-CPU
-P # show process names & PIDs
-t # show timestamps
eg,
funcslower vfs_read 10000 # trace vfs_read() slower than 10 ms
See the man page and example file for more info.
END
exit
}
function warn {
if ! eval "$@"; then
echo >&2 "WARNING: command failed \"$@\""
fi
}
function end {
# disable tracing
echo 2>/dev/null
echo "Ending tracing..." 2>/dev/null
cd $tracing
(( opt_time )) && warn "echo nofuncgraph-abstime > trace_options"
(( opt_proc )) && warn "echo nofuncgraph-proc > trace_options"
(( opt_cpu )) && warn "echo sleep-time > trace_options"
warn "echo nop > current_tracer"
(( opt_pid )) && warn "echo > set_ftrace_pid"
warn "echo > set_ftrace_filter"
warn "echo > set_graph_function"
warn "echo 0 > tracing_thresh"
warn "echo > trace"
(( wroteflock )) && warn "rm $flock"
}
function die {
echo >&2 "$@"
exit 1
}
function edie {
# die with a quiet end()
echo >&2 "$@"
exec >/dev/null 2>&1
end
exit 1
}
### process options
while getopts aCd:hHp:Pt opt
do
case $opt in
a) opt_headers=1; opt_proc=1; opt_time=1 ;;
C) opt_cpu=1; ;;
d) opt_duration=1; duration=$OPTARG ;;
p) opt_pid=1; pid=$OPTARG ;;
H) opt_headers=1; ;;
P) opt_proc=1; ;;
t) opt_time=1; ;;
h|?) usage ;;
esac
done
shift $(( $OPTIND - 1 ))
### option logic
(( $# < 2 )) && usage
funcs="$1"
shift
thresh=$1
(( opt_pid )) && pidtext=" for PID $pid"
printf "Tracing \"$funcs\"$pidtext slower than $thresh us"
if (( opt_duration )); then
echo " for $duration seconds..."
else
echo "... Ctrl-C to end."
fi
## select awk
if (( opt_duration )); then
[[ -x /usr/bin/mawk ]] && awk=mawk || awk=awk
else
# workarounds for mawk/gawk fflush behavior
if [[ -x /usr/bin/gawk ]]; then
awk=gawk
elif [[ -x /usr/bin/mawk ]]; then
awk="mawk -W interactive"
else
awk=awk
fi
fi
### check permissions
cd $tracing || die "ERROR: accessing tracing. Root user? Kernel has FTRACE?
debugfs mounted? (mount -t debugfs debugfs /sys/kernel/debug)"
### ftrace lock
[[ -e $flock ]] && die "ERROR: ftrace may be in use by PID $(cat $flock) $flock"
echo $$ > $flock || die "ERROR: unable to write $flock."
wroteflock=1
### setup and commence tracing
sysctl -q kernel.ftrace_enabled=1 # doesn't set exit status
read mode < current_tracer
[[ "$mode" != "nop" ]] && edie "ERROR: ftrace active (current_tracer=$mode)"
if ! echo $thresh > tracing_thresh; then
edie "ERROR: setting tracing_thresh to $thresh. Exiting."
fi
if (( opt_pid )); then
if ! echo $pid > set_ftrace_pid; then
edie "ERROR: setting -p $pid (PID exist?). Exiting."
fi
fi
if ! echo "$funcs" > set_ftrace_filter; then
edie "ERROR: enabling \"$funcs\" filter. Function exist? Exiting."
fi
if ! echo "$funcs" > set_graph_function; then
edie "ERROR: enabling \"$funcs\" graph. Exiting."
fi
if ! echo function_graph > current_tracer; then
edie "ERROR: setting current_tracer to \"function_graph\". Exiting."
fi
if (( opt_cpu )); then
if ! echo nosleep-time > trace_options; then
edie "ERROR: setting -C (nosleep-time). Exiting."
fi
fi
# the following must be done after setting current_tracer
if (( opt_time )); then
if ! echo funcgraph-abstime > trace_options; then
edie "ERROR: setting -t (funcgraph-abstime). Exiting."
fi
fi
if (( opt_proc )); then
if ! echo funcgraph-proc > trace_options; then
edie "ERROR: setting -P (funcgraph-proc). Exiting."
fi
fi
### setup output filter
cat=cat
if (( opt_proc )); then
# remove proc change entries, since PID is included. example:
# ------------------------------------------
# 0) supervi-1699 => supervi-1693
# ------------------------------------------
#
cat="$awk '/(^ ---|^$)/ || \$3 == \"=>\" { next } { print \$0 }'"
fi
### print trace buffer
warn "echo > trace"
if (( opt_duration )); then
sleep $duration
if (( opt_headers )); then
eval $cat trace
else
eval $cat trace | grep -v '^#'
fi
else
# trace_pipe lacks headers, so fetch them from trace
(( opt_headers )) && cat trace
eval $cat trace_pipe
fi
### end tracing
end
perf-tools-unstable-0.0.1~20150130+git85414b0/kernel/functrace 0000775 0000000 0000000 00000013067 12542613570 0023232 0 ustar 00root root 0000000 0000000 #!/bin/bash
#
# functrace - trace kernel function calls matching specified wildcards.
# Uses Linux ftrace.
#
# This is a proof of concept using Linux ftrace capabilities on older kernels.
#
# USAGE: functrace [-hH] [-p PID] [-d secs] funcstring
# eg,
# functrace '*sleep' # trace all functions ending in "sleep"
#
# Run "functrace -h" for full usage.
#
# The output format is the same as the ftrace function trace format, described
# in the kernel source under Documentation/trace/ftrace.txt.
#
# The "-d duration" mode leaves the trace data in the kernel buffer, and
# only reads it at the end. If the trace data is large, beware of exhausting
# buffer space (/sys/kernel/debug/tracing/buffer_size_kb) and losing data.
#
# Also beware of feedback loops: tracing tcp* functions over an ssh session,
# or writing ext4* functions to an ext4 file system. For the former, tcp
# trace data could be redirected to a file (as in the usage message). For
# the latter, trace to the screen or a different file system.
#
# WARNING: This uses dynamic tracing of kernel functions, and could cause
# kernel panics or freezes. Test, and know what you are doing, before use.
#
# OVERHEADS: This can generate a lot of trace data quickly, depending on the
# frequency of the traced events. Such data will cause performance overheads.
# This also works without buffering by default, printing function events
# as they happen (uses trace_pipe), context switching and consuming CPU to do
# so. If needed, you can try the "-d secs" option, which buffers events
# instead, reducing overhead. If you think the buffer option is losing events,
# try increasing the buffer size (buffer_size_kb).
#
# From perf-tools: https://github.com/brendangregg/perf-tools
#
# COPYRIGHT: Copyright (c) 2014 Brendan Gregg.
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; either version 2
# of the License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software Foundation,
# Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
#
# (http://www.gnu.org/copyleft/gpl.html)
#
# 12-Jul-2014 Brendan Gregg Created this.
### default variables
tracing=/sys/kernel/debug/tracing
flock=/var/tmp/.ftrace-lock
opt_duration=0; duration=; opt_pid=0; pid=; pidtext=; opt_headers=0
trap ':' INT QUIT TERM PIPE HUP # sends execution to end tracing section
function usage {
cat <<-END >&2
USAGE: functrace [-hH] [-p PID] [-d secs] funcstring
-d seconds # trace duration, and use buffers
-h # this usage message
-H # include column headers
-p PID # trace when this pid is on-CPU
eg,
functrace do_nanosleep # trace the do_nanosleep() function
functrace '*sleep' # trace functions ending in "sleep"
functrace -p 198 'vfs*' # trace "vfs*" funcs for PID 198
functrace 'tcp*' > out # trace all "tcp*" funcs to out file
functrace -d 1 'tcp*' > out # trace 1 sec, then write out file
See the man page and example file for more info.
END
exit
}
function warn {
if ! eval "$@"; then
echo >&2 "WARNING: command failed \"$@\""
fi
}
function end {
# disable tracing
echo 2>/dev/null
echo "Ending tracing..." 2>/dev/null
cd $tracing
warn "echo nop > current_tracer"
(( opt_pid )) && warn "echo > set_ftrace_pid"
warn "echo > set_ftrace_filter"
warn "echo > trace"
(( wroteflock )) && warn "rm $flock"
}
function die {
echo >&2 "$@"
exit 1
}
function edie {
# die with a quiet end()
echo >&2 "$@"
exec >/dev/null 2>&1
end
exit 1
}
### process options
while getopts d:hHp: opt
do
case $opt in
d) opt_duration=1; duration=$OPTARG ;;
p) opt_pid=1; pid=$OPTARG ;;
H) opt_headers=1; ;;
h|?) usage ;;
esac
done
shift $(( $OPTIND - 1 ))
### option logic
(( $# == 0 )) && usage
funcs="$1"
(( opt_pid )) && pidtext=" for PID $pid"
if (( opt_duration )); then
echo "Tracing \"$funcs\"$pidtext for $duration seconds..."
else
echo "Tracing \"$funcs\"$pidtext... Ctrl-C to end."
fi
### check permissions
cd $tracing || die "ERROR: accessing tracing. Root user? Kernel has FTRACE?
debugfs mounted? (mount -t debugfs debugfs /sys/kernel/debug)"
### ftrace lock
[[ -e $flock ]] && die "ERROR: ftrace may be in use by PID $(cat $flock) $flock"
echo $$ > $flock || die "ERROR: unable to write $flock."
wroteflock=1
### setup and commence tracing
sysctl -q kernel.ftrace_enabled=1 # doesn't set exit status
read mode < current_tracer
[[ "$mode" != "nop" ]] && edie "ERROR: ftrace active (current_tracer=$mode)"
if (( opt_pid )); then
if ! echo $pid > set_ftrace_pid; then
edie "ERROR: setting -p $pid (PID exist?). Exiting."
fi
fi
if ! echo "$funcs" > set_ftrace_filter; then
edie "ERROR: enabling \"$funcs\". Exiting."
fi
if ! echo function > current_tracer; then
edie "ERROR: setting current_tracer to \"function\". Exiting."
fi
### print trace buffer
warn "echo > trace"
if (( opt_duration )); then
sleep $duration
if (( opt_headers )); then
cat trace
else
grep -v '^#' trace
fi
else
# trace_pipe lacks headers, so fetch them from trace
(( opt_headers )) && cat trace
cat trace_pipe
fi
### end tracing
end
perf-tools-unstable-0.0.1~20150130+git85414b0/kernel/kprobe 0000775 0000000 0000000 00000016502 12542613570 0022537 0 ustar 00root root 0000000 0000000 #!/bin/bash
#
# kprobe - trace a given kprobe definition. Kernel dynamic tracing.
# Written using Linux ftrace.
#
# This will create, trace, then destroy a given kprobe definition. See
# Documentation/trace/kprobetrace.txt in the Linux kernel source for the
# syntax of a kprobe definition, and "kprobe -h" for examples. With this tool,
# the probe alias is optional (it will default to the traced function name if
# not specified).
#
# USAGE: ./kprobe [-FhHsv] [-d secs] [-p pid] kprobe_definition [filter]
#
# Run "kprobe -h" for full usage.
#
# I wrote this because I kept testing different custom kprobes at the command
# line, and wanted a way to automate the steps.
#
# WARNING: This uses dynamic tracing of kernel functions, and could cause
# kernel panics or freezes, depending on the function traced. Test in a lab
# environment, and know what you are doing, before use.
#
# REQUIREMENTS: FTRACE and KPROBE CONFIG, which you may already have on recent
# kernel versions.
#
# From perf-tools: https://github.com/brendangregg/perf-tools
#
# See the kprobe(8) man page (in perf-tools) for more info.
#
# COPYRIGHT: Copyright (c) 2014 Brendan Gregg.
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; either version 2
# of the License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software Foundation,
# Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
#
# (http://www.gnu.org/copyleft/gpl.html)
#
# 22-Jul-2014 Brendan Gregg Created this.
### default variables
tracing=/sys/kernel/debug/tracing
flock=/var/tmp/.ftrace-lock; wroteflock=0
opt_duration=0; duration=; opt_pid=0; pid=; opt_filter=0; filter=
opt_view=0; opt_headers=0; opt_stack=0; dmesg=2; debug=0; opt_force=0
trap ':' INT QUIT TERM PIPE HUP # sends execution to end tracing section
function usage {
cat <<-END >&2
USAGE: kprobe [-FhHsv] [-d secs] [-p PID] kprobe_definition [filter]
-F # force. trace despite warnings.
-d seconds # trace duration, and use buffers
-p PID # PID to match on I/O issue
-v # view format file (don't trace)
-H # include column headers
-s # show kernel stack traces
-h # this usage message
Note that these examples may need modification to match your kernel
version's function names and platform's register usage.
eg,
kprobe p:do_sys_open
# trace open() entry
kprobe r:do_sys_open
# trace open() return
kprobe 'r:do_sys_open \$retval'
# trace open() return value
kprobe 'r:myopen do_sys_open \$retval'
# use a custom probe name
kprobe 'p:myopen do_sys_open mode=%cx:u16'
# trace open() file mode
kprobe 'p:myopen do_sys_open filename=+0(%si):string'
# trace open() with filename
kprobe -s 'p:myprobe tcp_retransmit_skb'
# show kernel stacks
See the man page and example file for more info.
END
exit
}
function warn {
if ! eval "$@"; then
echo >&2 "WARNING: command failed \"$@\""
fi
}
function end {
# disable tracing
echo 2>/dev/null
echo "Ending tracing..." 2>/dev/null
cd $tracing
warn "echo 0 > events/kprobes/$kname/enable"
if (( opt_filter )); then
warn "echo 0 > events/kprobes/$kname/filter"
fi
warn "echo -:$kname >> kprobe_events"
(( opt_stack )) && warn "echo 0 > options/stacktrace"
warn "echo > trace"
(( wroteflock )) && warn "rm $flock"
}
function die {
echo >&2 "$@"
exit 1
}
function edie {
# die with a quiet end()
echo >&2 "$@"
exec >/dev/null 2>&1
end
exit 1
}
### process options
while getopts Fd:hHp:sv opt
do
case $opt in
F) opt_force=1 ;;
d) opt_duration=1; duration=$OPTARG ;;
p) opt_pid=1; pid=$OPTARG ;;
H) opt_headers=1 ;;
s) opt_stack=1 ;;
v) opt_view=1 ;;
h|?) usage ;;
esac
done
shift $(( $OPTIND - 1 ))
(( $# )) || usage
kprobe=$1
shift
if (( $# )); then
opt_filter=1
filter=$1
fi
### option logic
(( opt_pid && opt_filter )) && die "ERROR: use either -p or a filter."
(( opt_duration && opt_view )) && die "ERROR: use either -d or -v."
if (( opt_pid )); then
# convert to filter
opt_filter=1
filter="common_pid == $pid"
fi
if [[ "$kprobe" != p:* && "$kprobe" != r:* ]]; then
echo >&2 "ERROR: invalid kprobe definition (should start with p: or r:)"
usage
fi
#
# parse the following:
# r:do_sys_open
# r:my_sys_open do_sys_open
# r:do_sys_open %ax
# r:do_sys_open $retval %ax
# r:my_sys_open do_sys_open $retval %ax
# r:do_sys_open rval=$retval
# r:my_sys_open do_sys_open rval=$retval
# r:my_sys_open do_sys_open rval=$retval %ax
# ... and examples from USAGE message
#
krest=${kprobe#*:}
kname=${krest%% *}
set -- $krest
if [[ $2 == "" || $2 == *[=%\$]* ]]; then
# if probe name unspecified, default to function name
ktype=${kprobe%%:*}
kprobe="$ktype:$kname $krest"
fi
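# eg, for kprobe='r:do_sys_open $retval' (no alias given): krest becomes
# "do_sys_open $retval", kname becomes "do_sys_open", and $2 ("$retval")
# matches the test above, so the definition is rebuilt as
# "r:do_sys_open do_sys_open $retval", defaulting the alias to the function name.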
if (( debug )); then
echo "kname: $kname, kprobe: $kprobe"
fi
### check permissions
cd $tracing || die "ERROR: accessing tracing. Root user? Kernel has FTRACE?
debugfs mounted? (mount -t debugfs debugfs /sys/kernel/debug)"
## check function
set -- $kprobe
fname=$2
if (( !opt_force )) && ! grep -w $fname available_filter_functions >/dev/null 2>&1
then
echo >&2 "ERROR: func $fname not in $PWD/available_filter_functions."
printf >&2 "Either it doesn't exist, or, it might be unsafe to kprobe. "
echo >&2 "Exiting. Use -F to override."
exit 1
fi
if (( !opt_view )); then
if (( opt_duration )); then
echo "Tracing kprobe $kname for $duration seconds (buffered)..."
else
echo "Tracing kprobe $kname. Ctrl-C to end."
fi
fi
### ftrace lock
[[ -e $flock ]] && die "ERROR: ftrace may be in use by PID $(cat $flock) $flock"
echo $$ > $flock || die "ERROR: unable to write $flock."
wroteflock=1
### setup and begin tracing
echo nop > current_tracer
if ! echo "$kprobe" >> kprobe_events; then
echo >&2 "ERROR: adding kprobe \"$kprobe\"."
if (( dmesg )); then
echo >&2 "Last $dmesg dmesg entries (might contain reason):"
dmesg | tail -$dmesg | sed 's/^/ /'
fi
edie "Exiting."
fi
if (( opt_view )); then
cat events/kprobes/$kname/format
edie ""
fi
if (( opt_filter )); then
if ! echo "$filter" > events/kprobes/$kname/filter; then
edie "ERROR: setting filter or -p. Exiting."
fi
fi
if (( opt_stack )); then
if ! echo 1 > options/stacktrace; then
edie "ERROR: enabling stack traces (-s). Exiting"
fi
fi
if ! echo 1 > events/kprobes/$kname/enable; then
edie "ERROR: enabling kprobe $kname. Exiting."
fi
### print trace buffer
warn "echo > trace"
if (( opt_duration )); then
sleep $duration
if (( opt_headers )); then
cat trace
else
grep -v '^#' trace
fi
else
# trace_pipe lacks headers, so fetch them from trace
(( opt_headers )) && cat trace
cat trace_pipe
fi
### end tracing
end
perf-tools-unstable-0.0.1~20150130+git85414b0/killsnoop 0000775 0000000 0000000 00000017563 12542613570 0022017 0 ustar 00root root 0000000 0000000 #!/bin/bash
#
# killsnoop - trace kill() syscalls with signal/process details.
# Written using Linux ftrace.
#
# This traces kill() syscalls, showing which process killed which pid and
# returns the returncode (0 for success, -1 for error).
#
# This implementation is designed to work on older kernel versions, and without
# kernel debuginfo. It works by tracing the kill() syscall entry and return
# tracepoints, and associating each return value with the preceding entry.
# This approach is kernel version specific, and may not work on your version.
# It is a workaround, and proof of concept for ftrace, until more kernel tracing
# functionality is available.
#
# USAGE: ./killsnoop [-hst] [-d secs] [-p pid] [-n name]
#
# Run "killsnoop -h" for full usage.
#
# REQUIREMENTS: FTRACE and KPROBE CONFIG, syscalls:sys_enter_kill and
# syscalls:sys_exit_kill kernel tracepoints (you may already have these
# on recent kernels) and awk.
#
# From perf-tools: https://github.com/brendangregg/perf-tools
#
# See the killsnoop(8) man page (in perf-tools) for more info.
#
# COPYRIGHT: Copyright (c) 2014 Brendan Gregg.
# COPYRIGHT: Copyright (c) 2014 Martin Probst.
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; either version 2
# of the License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software Foundation,
# Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
#
# (http://www.gnu.org/copyleft/gpl.html)
#
# 20-Jul-2014 Brendan Gregg Templated this.
# 13-Sep-2014 Martin Probst Created this.
### default variables
tracing=/sys/kernel/debug/tracing
flock=/var/tmp/.ftrace-lock; wroteflock=0
opt_duration=0; duration=; opt_name=0; name=; opt_pid=0; pid=; ftext=
opt_time=0; opt_fail=0; opt_file=0; file=
kevent_entry=events/syscalls/sys_enter_kill
kevent_return=events/syscalls/sys_exit_kill
trap ':' INT QUIT TERM PIPE HUP # sends execution to end tracing section
function usage {
cat <<-END >&2
USAGE: killsnoop [-hst] [-d secs] [-p PID] [-n name] [filename]
-d seconds # trace duration, and use buffers
-n name # process name to match
-p PID # PID to match on kill issue
-t # include time (seconds)
-s # human readable signal names
-h # this usage message
eg,
killsnoop # watch kill()s live (unbuffered)
killsnoop -d 1 # trace 1 sec (buffered)
killsnoop -p 181 # trace kill()s issued to PID 181 only
See the man page and example file for more info.
END
exit
}
function warn {
if ! eval "$@"; then
echo >&2 "WARNING: command failed \"$@\""
fi
}
function end {
# disable tracing
echo 2>/dev/null
echo "Ending tracing..." 2>/dev/null
cd $tracing
warn "echo 0 > $kevent_entry/enable"
warn "echo 0 > $kevent_return/enable"
warn "echo > trace"
(( wroteflock )) && warn "rm $flock"
}
function die {
echo >&2 "$@"
exit 1
}
function edie {
# die with a quiet end()
echo >&2 "$@"
exec >/dev/null 2>&1
end
exit 1
}
### process options
while getopts d:hn:p:st opt
do
case $opt in
d) opt_duration=1; duration=$OPTARG ;;
n) opt_name=1; name=$OPTARG ;;
p) opt_pid=1; pid=$OPTARG ;;
t) opt_time=1 ;;
s) opt_fancy=1 ;;
h|?) usage ;;
esac
done
shift $(( $OPTIND - 1 ))
(( $# )) && usage
### option logic
(( opt_pid && opt_name )) && die "ERROR: use either -p or -n."
(( opt_pid )) && ftext=" issued to PID $pid"
(( opt_name )) && ftext=" issued by process name \"$name\""
if (( opt_duration )); then
echo "Tracing kill()s$ftext for $duration seconds (buffered)..."
else
echo "Tracing kill()s$ftext. Ctrl-C to end."
fi
### select awk
awk=awk
# workaround for mawk fflush(): use -W interactive if this mawk supports it
[[ -x /usr/bin/mawk ]] && awk="mawk" && \
mawk -W interactive 'BEGIN { exit 0 }' 2>/dev/null && awk="mawk -W interactive"
# workaround for gawk strtonum()
[[ -x /usr/bin/gawk ]] && awk="gawk --non-decimal-data"
### check permissions
cd $tracing || die "ERROR: accessing tracing. Root user? Kernel has FTRACE?
debugfs mounted? (mount -t debugfs debugfs /sys/kernel/debug)"
### ftrace lock
[[ -e $flock ]] && die "ERROR: ftrace may be in use by PID $(cat $flock) $flock"
echo $$ > $flock || die "ERROR: unable to write $flock."
wroteflock=1
### setup and begin tracing
echo nop > current_tracer
if ! echo 1 > $kevent_entry/enable; then
edie "ERROR: enabling kill() entry tracepoint Exiting."
fi
if ! echo 1 > $kevent_return/enable; then
edie "ERROR: enabling kill() return tracepoint. Exiting."
fi
(( opt_time )) && printf "%-16s " "TIMEs"
printf "%-16.16s %-6s %-8s %-10s %4s\n" "COMM" "PID" "TPID" "SIGNAL" "RETURN"
#
# Determine output format. It may be one of the following (newest first):
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# TASK-PID CPU# TIMESTAMP FUNCTION
# To differentiate between them, the number of header fields is counted,
# and an offset set, to skip the extra column when needed.
#
offset=$($awk 'BEGIN { o = 0; }
$1 == "#" && $2 ~ /TASK/ && NF == 6 { o = 1; }
$2 ~ /TASK/ { print o; exit }' trace)
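# eg, on newer kernels the "# TASK-PID CPU# |||| TIMESTAMP FUNCTION" header has
# six fields, so offset becomes 1 and the awk field positions below shift right
# by one; on older kernels (five header fields) offset stays 0.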
### print trace buffer
warn "echo > trace"
( if (( opt_duration )); then
# wait then dump buffer
sleep $duration
cat trace
else
# print buffer live
cat trace_pipe
fi ) | $awk -v o=$offset -v opt_name=$opt_name -v name=$name \
-v opt_duration=$opt_duration -v opt_time=$opt_time \
-v opt_pid=$pid -v opt_fancy=$opt_fancy '
# fancy signal names
BEGIN {
signals[1] = "SIGHUP"
signals[2] = "SIGINT"
signals[3] = "SIGQUIT"
signals[4] = "SIGILL"
signals[6] = "SIGABRT"
signals[8] = "SIGFPE"
signals[9] = "SIGKILL"
signals[11] = "SIGSEGV"
signals[13] = "SIGPIPE"
signals[14] = "SIGALRM"
signals[15] = "SIGTERM"
signals[10] = "SIGUSR1"
signals[12] = "SIGUSR2"
signals[17] = "SIGCHLD"
signals[18] = "SIGCONT"
signals[19] = "SIGSTOP"
signals[20] = "SIGTSTP"
signals[21] = "SIGTTIN"
signals[22] = "SIGTTOU"
}
# common fields
$1 != "#" {
# task name can contain dashes
comm = pid = $1
sub(/-[0-9][0-9]*/, "", comm)
if (opt_name && match(comm, name) == 0)
next
sub(/.*-/, "", pid)
}
# sys_kill() entry
$1 != "#" && $(4+o) ~ /sys_kill/ && $(5+o) !~ /->/ {
#
# eg: ... sys_kill(pid:...
#
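# A full trace line (hypothetical values, older format where o=0) looks like:
#   bash-2462 [000] 2593.972605: sys_kill(pid: 99e, sig: f)
# giving kpid="99e," and signal="f)"; after stripping the punctuation and
# converting from hex (gawk --non-decimal-data), kpid=2462 and signal=15.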
kpid = $(5+o)
signal = $(7+o)
sub(/,$/, "", kpid)
sub(/\)$/, "", signal)
kpid = int("0x"kpid)
signal = int("0x"signal)
current[pid,"kpid"] = kpid
current[pid,"signal"] = signal
}
# sys_kill exit
$1 != "#" && $(5+o) ~ /->/ {
rv = int($NF)
killed_pid = current[pid,"kpid"]
signal = current[pid,"signal"]
delete current[pid,"kpid"]
delete current[pid,"signal"]
if(opt_pid && killed_pid != opt_pid) {
next
}
if (opt_time) {
time = $(3+o); sub(":", "", time)
printf "%-16s ", time
}
if (opt_fancy) {
if (signals[signal] != "") {
signal = signals[signal]
}
}
printf "%-16.16s %-6s %-8s %-10s %-4s\n", comm, pid, killed_pid, signal,
rv
}
$0 ~ /LOST.*EVENTS/ { print "WARNING: " $0 > "/dev/stderr" }
'
### end tracing
end
perf-tools-unstable-0.0.1~20150130+git85414b0/man/ 0000775 0000000 0000000 00000000000 12542613570 0020616 5 ustar 00root root 0000000 0000000 perf-tools-unstable-0.0.1~20150130+git85414b0/man/man8/ 0000775 0000000 0000000 00000000000 12542613570 0021461 5 ustar 00root root 0000000 0000000 perf-tools-unstable-0.0.1~20150130+git85414b0/man/man8/bitesize.8 0000664 0000000 0000000 00000004631 12542613570 0023374 0 ustar 00root root 0000000 0000000 .TH bitesize 8 "2014-07-07" "USER COMMANDS"
.SH NAME
bitesize \- show disk I/O size as a histogram. Uses Linux perf_events.
.SH SYNOPSIS
.B bitesize
[-h] [-b buckets] [seconds]
.SH DESCRIPTION
This can be used to characterize the distribution of block device (disk) I/O
sizes. To study block device I/O in more detail, see iosnoop(8).
This uses multiple counting tracepoints with different filters, one for each
histogram bucket. While this is summarized in-kernel, the use of multiple
tracepoints does add additional overhead, which is more evident if you add
more buckets. In the future this functionality will be available in an
efficient way in the kernel, and this tool can be rewritten.
.SH REQUIREMENTS
Linux perf_events: add linux-tools-common, run "perf", then add any additional
packages it requests. This also requires the block:block_rq_issue tracepoint,
which should already be available in recent kernels.
.SH OPTIONS
.TP
\-h
Usage message.
.TP
\-b buckets
Specify a list of bucket points for the histogram as a string (eg, "10 500
1000"). The histogram will include buckets for less-than the minimum, and
greater-than-or-equal-to the maximum. If a single value is specified, two
statistics only are gathered: for less-than and for greater-than-or-equal-to.
The overhead is relative to the number of buckets, so only specifying a
single value costs the lowest overhead.
.TP
seconds
Number of seconds to trace. If not specified, this runs until Ctrl-C.
.SH EXAMPLES
.TP
Show the distribution of block device I/O sizes, until Ctrl-C:
#
.B bitesize
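.TP
Show a two-bucket split at 64 Kbytes (the lowest overhead mode), for 30 seconds:
#
.B bitesize \-b 64 30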
.SH FIELDS
.TP
Kbytes
Kbyte range of the histogram bucket.
.TP
I/O
Number of I/O that occurred in this range while tracing.
.TP
Distribution
ASCII histogram representation of the I/O column.
.SH OVERHEAD
While the counts are performed in-kernel, one tracepoint is used per
histogram bucket, so the overheads are higher than usual function counting
using perf stat, and scale with the number of buckets. The lowest
overhead is when \-b is used to specify one bucket only, bifurcating
statistics.
.SH SOURCE
This is from the perf-tools collection.
.IP
https://github.com/brendangregg/perf-tools
.PP
Also look under the examples directory for a text file containing example
usage, output, and commentary for this tool.
.SH OS
Linux
.SH STABILITY
Unstable - in development.
.SH AUTHOR
Brendan Gregg
.SH SEE ALSO
iosnoop(8), iolatency(8), iostat(1)
perf-tools-unstable-0.0.1~20150130+git85414b0/man/man8/cachestat.8 0000664 0000000 0000000 00000007377 12542613570 0023527 0 ustar 00root root 0000000 0000000 .TH cachestat 8 "2014-12-28" "USER COMMANDS"
.SH NAME
cachestat \- Measure page cache hits/misses. Uses Linux ftrace.
.SH SYNOPSIS
.B cachestat
[\-Dht] [interval]
.SH DESCRIPTION
This tool provides basic cache hit/miss statistics for the Linux page cache.
Its current implementation uses Linux ftrace dynamic function profiling to
create custom in-kernel counters, which is a workaround until such counters
can be built-in to the kernel. Specifically, four kernel functions are counted:
.IP
mark_page_accessed() for measuring cache accesses
.IP
mark_buffer_dirty() for measuring cache writes
.IP
add_to_page_cache_lru() for measuring page additions
.IP
account_page_dirtied() for measuring page dirties
.PP
It is possible that these functions have been renamed (or are different
logically) for your kernel version, and this script will not work as-is.
This was written for a Linux 3.13 kernel, and tested on a few other versions.
This script is a sandcastle: the kernel may wash some away, and you'll
need to rebuild.
This program's implementation can be improved in the future when other
kernel capabilities are made available. If you need a more reliable tool now,
then consider other tracing alternatives (eg, SystemTap). This tool is really
a proof of concept to see what ftrace can currently do.
WARNING: This uses dynamic tracing of kernel functions, and could cause
kernel panics or freezes. Test, and know what you are doing, before use.
It also traces cache activity, which can be frequent, and cost some overhead.
The statistics should be treated as best-effort: there may be some error
margin depending on unusual workload types.
Since this uses ftrace, only the root user can use this tool.
.SH REQUIREMENTS
CONFIG_FUNCTION_PROFILER, which you may already have enabled and available on
recent kernels, and awk.
.SH OPTIONS
.TP
\-D
Include extra fields for debug purposes (see script).
.TP
\-h
Print usage message.
.TP
\-t
Include timestamps in units of seconds.
.TP
interval
Output interval in seconds. Default is 1.
.SH EXAMPLES
.TP
Show per-second page cache statistics:
#
.B cachestat
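.TP
Same as above, but include timestamps and print every 5 seconds:
#
.B cachestat \-t 5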
.SH FIELDS
.TP
TIME
Time, in HH:MM:SS.
.TP
HITS
Number of page cache hits (reads). Each hit is for one memory page (the size
depends on your processor architecture; commonly 4 Kbytes). Since this tool
outputs at a timed interval, this field indicates the cache hit rate.
.TP
MISSES
Number of page cache misses (reads from storage I/O). Each miss is for one
memory page. Cache misses should be causing disk I/O. Run iostat(1) for
correlation (although the miss count and size by the time disk I/O is issued
can differ due to I/O subsystem merging).
.TP
DIRTIES
Number of times a page in the page cache was written to and thus "dirtied".
The same page may be counted multiple times per interval, if it is written
to multiple times. This field gives an indication of how much cache churn there
is, caused by applications writing data.
.TP
RATIO
The ratio of cache hits to total cache accesses (hits + misses), as a
percentage.
.TP
BUFFERS_MB
Size of the buffer cache, for disk I/O. From /proc/meminfo.
.TP
CACHED_MB
Size of the page cache, for file system I/O. From /proc/meminfo.
.SH OVERHEAD
This tool currently uses ftrace function profiling, which provides efficient
in-kernel counters. However, the functions profiled are executed frequently,
so the overheads can add up. Test and measure before use. My own testing
showed around a 2% loss in application performance while this tool was running.
.SH SOURCE
This is from the perf-tools collection.
.IP
https://github.com/brendangregg/perf-tools
.PP
Also look under the examples directory for a text file containing example
usage, output, and commentary for this tool.
.SH OS
Linux
.SH STABILITY
Unstable - in development.
.SH AUTHOR
Brendan Gregg
.SH SEE ALSO
iostat(1), iosnoop(8)
perf-tools-unstable-0.0.1~20150130+git85414b0/man/man8/execsnoop.8 0000664 0000000 0000000 00000006651 12542613570 0023565 0 ustar 00root root 0000000 0000000 .TH execsnoop 8 "2014-07-07" "USER COMMANDS"
.SH NAME
execsnoop \- trace process exec() with arguments. Uses Linux ftrace.
.SH SYNOPSIS
.B execsnoop
[\-hrt] [\-a argc] [\-d secs] [name]
.SH DESCRIPTION
execsnoop traces process execution, showing PID, PPID, and argument details
if possible.
This traces exec() from the fork()->exec() sequence, which means it won't
catch new processes that only fork(). With the -r option, it will also catch
processes that re-exec. It makes a best-effort attempt to retrieve the program
arguments and PPID; if these are unavailable, 0 and "[?]" are printed
respectively. There is also a limit to the number of arguments printed (by
default, 8), which can be increased using -a.
This implementation is designed to work on older kernel versions, and without
kernel debuginfo. It works by dynamic tracing an execve kernel function to
read the arguments from the %si register. The stub_execve() function is tried
first, and then the do_execve() function. The sched:sched_process_fork
tracepoint is used for the PPID. Tracing registers and kernel functions is
an unstable technique, and this tool may not work for some kernels or platforms.
This program is a workaround that should be
improved in the future when other kernel capabilities are made available. If
you need a more reliable tool now, then consider other tracing alternatives
(eg, SystemTap). This tool is really a proof of concept to see what ftrace can
currently do.
Since this uses ftrace, only the root user can use this tool.
.SH REQUIREMENTS
FTRACE and KPROBE CONFIG, sched:sched_process_fork tracepoint,
and either the stub_execve() or do_execve() kernel function. You may already
have these on recent kernels. And awk.
.SH OPTIONS
.TP
\-a argc
Maximum number of arguments to show. The default is 8, and the maximum allowed
is 16. If execsnoop thinks it has truncated the argument list, an ellipsis
"[...]" will be shown.
.TP
\-d seconds
Duration to trace, in seconds. This also uses in-kernel buffering.
.TP
\-h
Print usage message.
.TP
\-r
Include re-exec()s.
.TP
\-t
Include timestamps in units of seconds.
.TP
name
Only show processes that match this name.
Partials and regular expressions are allowed, as this is filtered in
user space by awk.
.SH EXAMPLES
.TP
Trace all new processes and arguments (if possible):
#
.B execsnoop
.TP
Trace all new process names containing the text "http":
#
.B execsnoop http
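.TP
Trace new processes, including re-execs, with timestamps, for 10 seconds (buffered):
#
.B execsnoop \-rt \-d 10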
.SH FIELDS
.TP
TIMEs
Time of the exec(), in seconds.
.TP
PID
Process ID.
.TP
PPID
Parent process ID, if this was able to be read. If it wasn't, 0 is printed.
.TP
ARGS
Command line arguments, if these were able to be read. If they aren't able to be
read, "[?]" is printed (which would be due to a limitation in this tools
implementation, since this is workaround for older kernels; if you need
reliable argument tracing, use a different tracer). They will be truncated
to the argc limit, and an ellipsis "[...]" may be printed if execsnoop is
aware of the truncation.
.SH OVERHEAD
This reads and processes exec() events in user space as they occur. Since the
rate of exec() is expected to be low (< 500/s), the overhead is expected to
be small or negligible.
.SH SOURCE
This is from the perf-tools collection.
.IP
https://github.com/brendangregg/perf-tools
.PP
Also look under the examples directory for a text file containing example
usage, output, and commentary for this tool.
.SH OS
Linux
.SH STABILITY
Unstable - in development.
.SH AUTHOR
Brendan Gregg
.SH SEE ALSO
top(1)
perf-tools-unstable-0.0.1~20150130+git85414b0/man/man8/funccount.8 0000664 0000000 0000000 00000004056 12542613570 0023563 0 ustar 00root root 0000000 0000000 .TH funccount 8 "2014-07-19" "USER COMMANDS"
.SH NAME
funccount \- count kernel function calls matching specified wildcards. Uses Linux ftrace.
.SH SYNOPSIS
.B funccount
[\-hT] [\-i secs] [\-d secs] [\-t top] funcstring
.SH DESCRIPTION
This tool is a quick way to determine which kernel functions are being called,
and at what rate. It uses ftrace function profiling capabilities.
WARNING: This uses dynamic tracing of (what can be many) kernel functions,
and could cause kernel panics or freezes. Test, and know what you are doing,
before use.
Since this uses ftrace, only the root user can use this tool.
.SH REQUIREMENTS
CONFIG_FUNCTION_PROFILER, which you may already have enabled and available on
recent kernels, and awk.
.SH OPTIONS
.TP
\-d seconds
Total duration of the trace.
.TP
\-h
Print usage message.
.TP
\-i seconds
Print an interval summary every so many seconds.
.TP
\-t top
Print top number of entries only.
.TP
\-T
Include timestamp on each summary.
.TP
funcstring
A function name to trace, which may include file glob style wildcards ("*") at
the beginning or ending of a string only. Eg, "vfs*" means match "vfs" followed
by anything.
.SH EXAMPLES
.TP
Count every kernel function beginning with "bio_", until Ctrl-C is hit:
#
.B funccount 'bio_*'
.TP
Count every "tcp_*" kernel function, and print a summary every one second, five in total:
#
.B funccount \-i 1 \-d 5 'tcp_*'
.TP
Count every "ext4*" kernel function, and print the top 20 when Ctrl-C is hit:
#
.B funccount \-t 20 'ext4*'
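.TP
Count every "vfs*" kernel function call, printing a timestamped summary every second:
#
.B funccount \-T \-i 1 'vfs*'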
.SH FIELDS
.TP
FUNC
Kernel function name.
.TP
COUNT
Number of times this function was called during the tracing interval.
.SH OVERHEAD
This uses the ftrace profiling framework, which does in-kernel counts,
lowering the overhead (compared to tracing each event).
.SH SOURCE
This is from the perf-tools collection:
.IP
https://github.com/brendangregg/perf-tools
.PP
Also look under the examples directory for a text file containing example
usage, output, and commentary for this tool.
.SH OS
Linux
.SH STABILITY
Unstable - in development.
.SH AUTHOR
Brendan Gregg
.SH SEE ALSO
functrace(8)
perf-tools-unstable-0.0.1~20150130+git85414b0/man/man8/funcgraph.8 0000664 0000000 0000000 00000013451 12542613570 0023533 0 ustar 00root root 0000000 0000000 .TH funcgraph 8 "2014-07-29" "USER COMMANDS"
.SH NAME
funcgraph \- trace kernel function graph, showing child function calls and times. Uses Linux ftrace.
.SH SYNOPSIS
.B funcgraph
[\-aCDhHPtT] [\-m maxdepth] [\-p PID] [\-d secs] funcstring
.SH DESCRIPTION
This is an exploratory tool that shows the graph of child function calls
for a given kernel function. This can cost moderate overhead to execute, and
should only be used to understand kernel behavior before using other, lower
overhead tools. This is a proof of concept using Linux ftrace capabilities
on older kernels.
The output format is the same as the ftrace function graph trace format,
described in the kernel source under Documentation/trace/ftrace.txt.
Note that the output may be shuffled when different CPU buffers are read;
check the CPU column for changes, or include timestamps (-t) and post sort.
The "-d duration" mode leaves the trace data in the kernel buffer, and
only reads it at the end. If the trace data is large, beware of exhausting
buffer space (/sys/kernel/debug/tracing/buffer_size_kb) and losing data.
Also beware of feedback loops: tracing tcp* functions over an ssh session,
or writing ext4* functions to an ext4 file system. For the former, tcp
trace data could be redirected to a file (as in the usage message). For
the latter, trace to the screen or a different file system.
WARNING: This uses dynamic tracing of kernel functions, and could cause
kernel panics or freezes. Test, and know what you are doing, before use.
Also see the OVERHEAD section.
Since this uses ftrace, only the root user can use this tool.
.SH REQUIREMENTS
FTRACE CONFIG, which you may already have enabled and available on recent
kernels.
.SH OPTIONS
.TP
\-a
All info. Same as \-HPt. (But no -T, which isn't available in older kernels.)
.TP
\-C
Function durations measure on-CPU time only (exclude sleep time).
.TP
\-d seconds
Set the duration of tracing, in seconds. Trace output will be buffered and
printed at the end. This also reduces overheads by buffering in-kernel,
instead of printing events as they occur.
The ftrace buffer has a fixed size per-CPU (see
/sys/kernel/debug/tracing/buffer_size_kb). If you think events are missing,
try increasing that size.
.TP
\-D
Do not show function duration times.
.TP
\-h
Print usage message.
.TP
\-H
Print column headers.
.TP
\-m
Max depth to trace functions. By default, unlimited (0). This feature is only
available for newer Linux kernel versions.
.TP
\-p PID
Only trace kernel functions when this process ID is on-CPU.
.TP
\-P
Show process names and process IDs with every line of output.
.TP
\-t
Show timestamps on every line of output.
.TP
\-T
Tail mode: decorate function return lines with the name of the function. This
option may not be available for older kernels.
.TP
funcstring
A function name to trace, which may include file glob style wildcards ("*") at
the beginning or ending of a string only. Eg, "vfs*" means match "vfs" followed
by anything. Since the output is verbose, you probably only want to trace
single functions, and not use wildcards.
.SH EXAMPLES
.TP
Trace calls to do_nanosleep(), showing child functions and durations:
#
.B funcgraph do_nanosleep
.TP
Same as above, but include column headers:
#
.B funcgraph -H do_nanosleep
.TP
Same as above, but include timestamps and process names as well:
#
.B funcgraph -HtP do_nanosleep
.TP
Trace all vfs_read() kernel function calls, and child functions, for PID 198 only:
#
.B funcgraph \-p 198 vfs_read
.TP
Trace all vfs_read() kernel function calls, and child functions, for 1 second then write to a file.
#
.B funcgraph \-d 1 vfs_read > out
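.TP
Trace vfs_read() and only two levels of child functions (\-m, if supported by your kernel):
#
.B funcgraph \-m 2 vfs_read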
.SH FIELDS
The output format depends on the kernel version, and headings can be printed
using \-H. The format is the same as the ftrace function trace format, described
in the kernel source under Documentation/trace/ftrace.txt.
Typical fields are:
.TP
TIME
(Shown with \-t.) Time of event, in seconds.
.TP
CPU
The CPU this event occurred on.
.TP
TASK/PID
(Shown with \-P.) The process name (which could include dashes), a dash, and the process ID.
.TP
DURATION
Elapsed time during the function call, inclusive of children. This is also
inclusive of sleep time, unless -C is used. The time is either displayed on
the return of a function ("}"), or for a leaf function (no children), on the
same line.
If the trace output begins with some returns that lack entries, their durations
may not be trusted. This is usually only the case for the first dozen or so
lines.
.TP
FUNCTION CALLS
Entries and returns from kernel functions.
.SH OVERHEAD
This tool causes moderate to high overheads. Use with caution for
exploratory purposes, then switch to lower overhead techniques based on
findings. It's expected that the kernel will run at least 50% slower while
this tool is running -- even while no output is being generated. This is
because ALL kernel functions are traced, and filtered based on the function
of interest. When output is generated, it can generate many lines quickly
depending on the traced event. Such data will cause performance overheads.
This also works without buffering by default, printing function events
as they happen (uses trace_pipe), context switching and consuming CPU to do
so. If needed, you can try the "-d secs" option, which buffers events
instead, reducing overhead. If you think the buffer option is losing events,
try increasing the buffer size (buffer_size_kb).
It's a good idea to use funccount(8) first, which is lower overhead, to
help you select which functions you may want to trace using funcgraph(8).
.SH SOURCE
This is from the perf-tools collection:
.IP
https://github.com/brendangregg/perf-tools
.PP
Also look under the examples directory for a text file containing example
usage, output, and commentary for this tool.
.SH OS
Linux
.SH STABILITY
Unstable - in development.
.SH AUTHOR
Brendan Gregg
.SH SEE ALSO
funccount(8), functrace(8), kprobe(8)
perf-tools-unstable-0.0.1~20150130+git85414b0/man/man8/funcslower.8 0000664 0000000 0000000 00000010015 12542613570 0023736 0 ustar 00root root 0000000 0000000 .TH funcslower 8 "2014-07-30" "USER COMMANDS"
.SH NAME
funcslower \- trace kernel functions slower than a threshold (microseconds). Uses Linux ftrace.
.SH SYNOPSIS
.B funcslower
[\-aChHPt] [\-p PID] [\-d secs] funcstring latency_us
.SH DESCRIPTION
This uses the Linux ftrace function graph profiler to time kernel functions
and filter them based on a latency threshold. Latency outliers can be studied
this way, confirming their presence, duration, and rate. This tool
is a proof of concept using Linux ftrace capabilities on older kernels.
The output format is based on the ftrace function graph trace format,
described in the kernel source under Documentation/trace/ftrace.txt. Use the
\-H option to print column headings.
Note that the output may be shuffled when different CPU buffers are read;
check the CPU column for changes, or include timestamps (-t) and post sort.
WARNING: This uses dynamic tracing of kernel functions, and could cause
kernel panics or freezes. Test, and know what you are doing, before use.
Since this uses ftrace, only the root user can use this tool.
.SH REQUIREMENTS
FTRACE function graph, which you may already have enabled and available on
recent kernels. And awk.
.SH OPTIONS
.TP
\-a
All info. Same as \-HPt.
.TP
\-C
Function durations measure on-CPU time only (exclude sleep time).
.TP
\-d seconds
Set the duration of tracing, in seconds. Trace output will be buffered and
printed at the end. This also reduces overheads by buffering in-kernel,
instead of printing events as they occur.
The ftrace buffer has a fixed size per-CPU (see
/sys/kernel/debug/tracing/buffer_size_kb). If you think events are missing,
try increasing that size.
.TP
\-h
Print usage message.
.TP
\-H
Print column headers.
.TP
\-p PID
Only trace kernel functions when this process ID is on-CPU.
.TP
\-P
Show process names and process IDs with every line of output.
.TP
\-t
Show timestamps on every line of output.
.TP
funcstring
A function name to trace, which may include file glob style wildcards ("*") at
the beginning or ending of a string only. Eg, "vfs*" means match "vfs" followed
by anything. Since the output is verbose, you probably only want to trace
single functions, and not use wildcards.
.TP
latency_us
Minimum function duration to trace, in units of microseconds. This is filtered
in-kernel.
.SH EXAMPLES
.TP
Trace calls to vfs_read(), showing events slower than 10 ms:
#
.B funcslower vfs_read 10000
.TP
Same as above, but include column headers, event timestamps, and process names:
#
.B funcslower -HPt vfs_read 10000
.TP
Trace slow vfs_read()s for PID 198 only:
#
.B funcslower \-p 198 vfs_read 10000
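.TP
Same as the first example, but buffer in-kernel for 10 seconds (lower overhead) and write to a file:
#
.B funcslower \-d 10 vfs_read 10000 > out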
.SH FIELDS
The output format depends on the kernel version, and headings can be printed
using \-H. The format is the same as the ftrace function trace format, described
in the kernel source under Documentation/trace/ftrace.txt.
Typical fields are:
.TP
TIME
(Shown with \-t.) Time of event, in seconds.
.TP
CPU
The CPU this event occurred on.
.TP
TASK/PID
(Shown with \-P.) The process name (which could include dashes), a dash, and the process ID.
.TP
DURATION
Elapsed time during the function call, inclusive of children. This is also
inclusive of sleep time, unless -C is used.
.TP
FUNCTION CALLS
Kernel function returns.
.SH OVERHEAD
Timing and filtering are performed in kernel context, costing
lower overheads than post-processing in user space. If you trace frequent
events (eg, pick a common function and a low threshold), you might want to
try the "-d secs" option, which buffers events in-kernel instead of printing
them live.
It's a good idea to start with a high threshold (eg, "100000" for 100 ms) then
to decrease it. If you start low instead, you may start printing too many
events.
.SH SOURCE
This is from the perf-tools collection:
.IP
https://github.com/brendangregg/perf-tools
.PP
Also look under the examples directory for a text file containing example
usage, output, and commentary for this tool.
.SH OS
Linux
.SH STABILITY
Unstable - in development.
.SH AUTHOR
Brendan Gregg
.SH SEE ALSO
funccount(8), functrace(8), funcgraph(8), kprobe(8)
perf-tools-unstable-0.0.1~20150130+git85414b0/man/man8/functrace.8 0000664 0000000 0000000 00000007401 12542613570 0023526 0 ustar 00root root 0000000 0000000 .TH functrace 8 "2014-07-20" "USER COMMANDS"
.SH NAME
functrace \- trace kernel function calls matching specified wildcards. Uses Linux ftrace.
.SH SYNOPSIS
.B functrace
[\-hH] [\-p PID] [\-d secs] funcstring
.SH DESCRIPTION
This tool provides a quick way to capture the execution of kernel functions,
showing basic details including as the process ID, timestamp, and calling
function.
WARNING: This uses dynamic tracing of (what can be many) kernel functions,
and could cause kernel panics or freezes. Test, and know what you are doing,
before use.
Also beware of feedback loops: tracing tcp* functions over an ssh session,
or writing ext4* functions to an ext4 file system. For the former, tcp
trace data could be redirected to a file (as in the usage message). For
the latter, trace to the screen or a different file system.
SEE ALSO: kprobe(8), which can dynamically trace a single function call or
return, and examine CPU registers and return values.
Since this uses ftrace, only the root user can use this tool.
.SH REQUIREMENTS
FTRACE CONFIG, which you may already have enabled and available on recent
kernels.
.SH OPTIONS
.TP
\-d seconds
Set the duration of tracing, in seconds. Trace output will be buffered and
printed at the end. This also reduces overheads by buffering in-kernel,
instead of printing events as they occur.
The ftrace buffer has a fixed size per-CPU (see
/sys/kernel/debug/tracing/buffer_size_kb). If you think events are missing,
try increasing that size.
.TP
\-h
Print usage message.
.TP
\-H
Print column headers.
.TP
\-p PID
Only trace kernel functions when this process ID is on-CPU.
.TP
funcstring
A function name to trace, which may include file glob style wildcards ("*") at
the beginning or ending of a string only. Eg, "vfs*" means match "vfs" followed
by anything.
.SH EXAMPLES
.TP
Trace calls to do_nanosleep():
#
.B functrace do_nanosleep
.TP
Trace calls to all kernel functions ending in "*sleep":
#
.B functrace '*sleep'
.TP
Trace all "vfs*" kernel function calls for PID 198:
#
.B functrace \-p 198 'vfs*'
.TP
Trace all "tcp*" kernel function calls, and output to a file until Ctrl-C:
#
.B functrace 'tcp*' > out
.TP
Trace all "tcp*" kernel function calls, output to a file, for 1 second (buffered):
#
.B functrace \-d 1 'tcp*' > out
.SH FIELDS
The output format depends on the kernel version, and headings can be printed
using \-H. The format is the same as the ftrace function trace format, described
in the kernel source under Documentation/trace/ftrace.txt.
Typical fields are:
.TP
TASK-PID
The process name (which could include dashes), a dash, and the process ID.
.TP
CPU#
The CPU ID, in brackets.
.TP
||||
Kernel state flags. For example, on Linux 3.16 these are for irqs-off,
need-resched, hardirq/softirq, and preempt-depth.
.TP
TIMESTAMP
Time of event, in seconds.
.TP
FUNCTION
Kernel function name.
.SH OVERHEAD
This can generate a lot of trace data quickly, depending on the
frequency of the traced events. Such data will cause performance overheads.
This also works without buffering by default, printing function events
as they happen (uses trace_pipe), context switching and consuming CPU to do
so. If needed, you can try the "\-d secs" option, which buffers events
instead, reducing overhead. If you think the buffer option is losing events,
try increasing the buffer size (buffer_size_kb).
It's a good idea to use funccount(8) first, which is lower overhead, to
help you select which functions you may want to trace using functrace(8).
.SH SOURCE
This is from the perf-tools collection:
.IP
https://github.com/brendangregg/perf-tools
.PP
Also look under the examples directory for a text file containing example
usage, output, and commentary for this tool.
.SH OS
Linux
.SH STABILITY
Unstable - in development.
.SH AUTHOR
Brendan Gregg
.SH SEE ALSO
funccount(8), kprobe(8)
perf-tools-unstable-0.0.1~20150130+git85414b0/man/man8/iolatency.8 0000664 0000000 0000000 00000007110 12542613570 0023540 0 ustar 00root root 0000000 0000000 .TH iolatency 8 "2014-07-12" "USER COMMANDS"
.SH NAME
iolatency \- summarize block device I/O latency as a histogram. Uses Linux ftrace.
.SH SYNOPSIS
.B iolatency
[\-hQT] [\-d device] [\-i iotype] [interval [count]]
.SH DESCRIPTION
This shows the distribution of latency, allowing modes and latency outliers
to be identified and studied. For more details of block device I/O, use
iosnoop(8).
This is a proof of concept tool using ftrace, and involves user space
processing and related overheads. See the OVERHEAD section.
NOTE: Due to the way trace buffers are switched per interval, there is the
possibility of losing a small number of I/O (usually less than 1%). The
summary therefore shows the general distribution, but may be slightly
incomplete. If 100% of I/O must be studied, use iosnoop(8) and post-process.
Also note that I/O may be missed when the trace buffer is full: see the
interval section in OPTIONS.
Since this uses ftrace, only the root user can use this tool.
.SH REQUIREMENTS
FTRACE CONFIG, and the tracepoints block:block_rq_issue and
block:block_rq_complete, which you may already have enabled and available on
recent Linux kernels. And awk.
.SH OPTIONS
.TP
\-d device
Only show I/O issued by this device. (eg, "202,1"). This matches the DEV
column in the iolatency output, and is filtered in-kernel.
.TP
\-i iotype
Only show I/O issued that matches this I/O type. This matches the TYPE column
in the iolatency output, and wildcards ("*") can be used at the beginning or
end (only). Eg, "*R*" matches all reads. This is filtered in-kernel.
.TP
\-h
Print usage message.
.TP
\-Q
Include block I/O queueing time. This uses block I/O queue insertion as the
start tracepoint (block:block_rq_insert), instead of block I/O issue
(block:block_rq_issue).
.TP
\-T
Include timestamps with each summary output.
.TP
interval
Interval between summary histograms, in seconds.
During the interval, trace output will be buffered in-kernel, which is then
read and processed for the summary. This buffer has a fixed size per-CPU (see
/sys/kernel/debug/tracing/buffer_size_kb). If you think events are missing,
try increasing that size (the bufsize_kb setting in iolatency). With the
default setting (4 Mbytes), I'd expect this to happen around 50k I/O per
summary.
.TP
count
Number of summaries to print.
.SH EXAMPLES
.TP
Default output, print a summary of block I/O latency every 1 second:
#
.B iolatency
.TP
Include block I/O queue time:
#
.B iolatency \-Q
.TP
Print 5 x 1 second summaries:
#
.B iolatency 1 5
.TP
Trace reads only:
#
.B iolatency \-i '*R*'
.TP
Trace I/O issued to device 202,1 only:
#
.B iolatency \-d 202,1
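.TP
Include queueing time and timestamps, printing 12 x 5 second summaries:
#
.B iolatency \-QT 5 12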
.SH FIELDS
.TP
>=(ms)
Latency was greater than or equal-to this value, in milliseconds.
.TP
<(ms)
Latency was less than this value, in milliseconds.
.TP
I/O
Number of block device I/O in this latency range, during the interval.
.TP
Distribution
ASCII histogram representation of the I/O column.
.SH OVERHEAD
Block device I/O issue and completion events are traced and buffered
in-kernel, then processed and summarized in user space. There may be
measurable overhead with this approach, relative to the block device IOPS.
The overhead may be acceptable in many situations. If it isn't, this tool
can be reimplemented in C, or using a different tracer (eg, perf_events,
SystemTap, ktap.)
.SH SOURCE
This is from the perf-tools collection.
.IP
https://github.com/brendangregg/perf-tools
.PP
Also look under the examples directory for a text file containing example
usage, output, and commentary for this tool.
.SH OS
Linux
.SH STABILITY
Unstable - in development.
.SH AUTHOR
Brendan Gregg
.SH SEE ALSO
iosnoop(8), iostat(1)
perf-tools-unstable-0.0.1~20150130+git85414b0/man/man8/iosnoop.8 0000664 0000000 0000000 00000013326 12542613570 0023245 0 ustar 00root root 0000000 0000000 .TH iosnoop 8 "2014-07-12" "USER COMMANDS"
.SH NAME
iosnoop \- trace block I/O events as they occur. Uses Linux ftrace.
.SH SYNOPSIS
.B iosnoop
[\-hQst] [\-d device] [\-i iotype] [\-p pid] [\-n name] [duration]
.SH DESCRIPTION
iosnoop prints block device I/O events as they happen, with useful details such
as PID, device, I/O type, block number, I/O size, and latency.
This traces disk I/O at the block device interface, using the block:
tracepoints. This can help characterize the I/O requested for the storage
devices and their resulting performance. I/O completions can also be studied
event-by-event for debugging disk and controller I/O scheduling issues.
NOTE: Use of a duration buffers I/O, which reduces overheads, but this also
introduces a limit to the number of I/O that will be captured. See the duration
section in OPTIONS.
Since this uses ftrace, only the root user can use this tool.
.SH REQUIREMENTS
FTRACE CONFIG, and the tracepoints block:block_rq_insert, block:block_rq_issue,
and block:block_rq_complete, which you may already have enabled and available on
recent Linux kernels. And awk.
.SH OPTIONS
.TP
\-d device
Only show I/O issued by this device. (eg, "202,1"). This matches the DEV
column in the iosnoop output, and is filtered in-kernel.
.TP
\-i iotype
Only show I/O issued that matches this I/O type. This matches the TYPE column
in the iosnoop output, and wildcards ("*") can be used at the beginning or
end (only). Eg, "*R*" matches all reads. This is filtered in-kernel.
.TP
\-p PID
Only show I/O issued by this PID. This filters in-kernel. Note that I/O may be
issued indirectly; for example, as the result of a memory allocation, causing
dirty buffers (maybe from another PID) to be written to storage.
With the \-Q
option, the identified PID is more accurate, however, LATms now includes
queueing time (see the \-Q option).
.TP
\-n name
Only show I/O issued by processes with this name. Partial strings and regular
expressions are allowed. This is a post-filter, so all I/O is traced and then
filtered in user space. As with PID, this includes indirectly issued I/O,
and \-Q can be used to improve accuracy (see the \-Q option).
.TP
\-h
Print usage message.
.TP
\-Q
Use block I/O queue insertion as the start tracepoint (block:block_rq_insert),
instead of block I/O issue (block:block_rq_issue). This makes the following
changes: COMM and PID are more likely to identify the origin process, as are
\-p PID and \-n name; STARTs shows queue insert; and LATms shows I/O
time including time spent on the block I/O queue.
.TP
\-s
Include a column for the start time (issue time) of the I/O, in seconds.
If the \-Q option is used, this is the time the I/O is inserted on the block
I/O queue.
.TP
\-t
Include a column for the completion time of the I/O, in seconds.
.TP
duration
Set the duration of tracing, in seconds. Trace output will be buffered and
printed at the end. This also reduces overheads by buffering in-kernel,
instead of printing events as they occur.
The ftrace buffer has a fixed size per-CPU (see
/sys/kernel/debug/tracing/buffer_size_kb). If you think events are missing,
try increasing that size (the bufsize_kb setting in iosnoop). With the
default setting (4 Mbytes), I'd expect this to happen around 50k I/O.
.SH EXAMPLES
.TP
Default output, print I/O activity as it occurs:
#
.B iosnoop
.TP
Buffer for 5 seconds (lower overhead) and write to a file:
#
.B iosnoop 5 > outfile
.TP
Trace based on block I/O queue insertion, showing queueing time:
#
.B iosnoop -Q
.TP
Trace reads only:
#
.B iosnoop \-i '*R*'
.TP
Trace I/O issued to device 202,1 only:
#
.B iosnoop \-d 202,1
.TP
Include I/O start and completion timestamps:
#
.B iosnoop \-ts
.TP
Include I/O queueing and completion timestamps:
#
.B iosnoop \-Qts
.TP
Trace I/O issued when PID 181 was on-CPU only:
#
.B iosnoop \-p 181
.TP
Trace I/O queued when PID 181 was on-CPU (more accurate), and include queue time:
#
.B iosnoop \-Qp 181
.SH FIELDS
.TP
COMM
Process name (command) for the PID that was on-CPU when the I/O was issued, or
inserted if \-Q is used. See PID. This column is truncated to 12 characters.
.TP
PID
Process ID which was on-CPU when the I/O was issued, or inserted if \-Q is
used. This will usually be the
process directly requesting I/O, however, it may also include indirect I/O. For
example, a memory allocation by this PID which causes dirty memory from another
PID to be flushed to disk.
.TP
TYPE
Type of I/O. R=read, W=write, M=metadata, S=sync, A=readahead, F=flush or FUA (force unit access), D=discard, E=secure, N=null (not RWFD).
.TP
DEV
Storage device ID.
.TP
BLOCK
Disk block for the operation (location, relative to this device).
.TP
BYTES
Size of the I/O, in bytes.
.TP
LATms
Latency (time) for the I/O, in milliseconds.
.SH OVERHEAD
By default, iosnoop works without buffering, printing I/O events
as they happen (uses trace_pipe), context switching and consuming CPU to do
so. This has a limit of about 10,000 IOPS (depending on your platform), at
which point iosnoop will be consuming 1 CPU. The duration mode uses buffering,
and can handle much higher IOPS rates, however, the buffer has a limit of
about 50,000 I/O, after which events will be dropped. You can tune this with
bufsize_kb, which is per-CPU. Also note that the "-n" option is currently
post-filtered, so all events are traced.
The overhead may be acceptable in many situations. If it isn't, this tool
can be reimplemented in C, or using a different tracer (eg, perf_events,
SystemTap, ktap.)
.SH SOURCE
This is from the perf-tools collection.
.IP
https://github.com/brendangregg/perf-tools
.PP
Also look under the examples directory for a text file containing example
usage, output, and commentary for this tool.
.SH OS
Linux
.SH STABILITY
Unstable - in development.
.SH AUTHOR
Brendan Gregg
.SH SEE ALSO
iolatency(8), iostat(1)
perf-tools-unstable-0.0.1~20150130+git85414b0/man/man8/killsnoop.8 0000664 0000000 0000000 00000005600 12542613570 0023565 0 ustar 00root root 0000000 0000000 .TH killsnoop 8 "2014-09-15" "USER COMMANDS"
.SH NAME
killsnoop \- trace kill() syscalls with process and signal details. Uses Linux ftrace.
.SH SYNOPSIS
.B killsnoop
[\-hst] [\-d secs] [\-p pid] [\-n name]
.SH DESCRIPTION
This traces kill() syscalls, showing which process sent which signal to which
PID, along with the return code (0 for success, \-1 for error).
This implementation is designed to work on older kernel versions, and without
kernel debuginfo. It works by dynamic tracing of the return value of kill()
and associating it with the previous kill() syscall return.
This approach is kernel version specific, and may not work on your version.
It is a workaround, and proof of concept for ftrace, until more kernel tracing
functionality is available.
Since this uses ftrace, only the root user can use this tool.
.SH REQUIREMENTS
FTRACE and KPROBE CONFIG, syscalls:sys_enter_kill and
syscalls:sys_exit_kill kernel tracepoints (you may already have these
on recent kernels) and awk.
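One way to check for the tracepoints (a sketch; the debugfs path may differ) is:
.B grep kill /sys/kernel/debug/tracing/available_events
which should list syscalls:sys_enter_kill and syscalls:sys_exit_kill among the
matches.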
.SH OPTIONS
.TP
\-d secs
Set the duration of tracing, in seconds. Trace output will be buffered and
printed at the end. This also reduces overheads by buffering in-kernel,
instead of printing events as they occur.
The ftrace buffer has a fixed size per-CPU (see
/sys/kernel/debug/tracing/buffer_size_kb). If you think events are missing,
try increasing that size.
.TP
\-h
Print usage message.
.TP
\-n name
Only show processes matching this process name. Partial strings and regular
expressions are allowed. This is post-filtered using awk.
.TP
\-p PID
Only trace this process ID. This is filtered in-kernel.
.TP
\-s
Use human readable signal names, instead of signal numbers.
.TP
\-t
Include timestamps, in seconds.
.SH EXAMPLES
.TP
Trace all kill() syscalls with details:
#
.B killsnoop
.TP
Trace kill() syscalls with readable signal names, and times:
#
.B killsnoop -st
.TP
Track kill() syscalls for processes named "httpd":
#
.B killsnoop -n httpd
.SH FIELDS
.TP
TIMEs
Time of kill() completion, in units of seconds.
.TP
COMM
Process name (if known) of the process that issued the signal.
.TP
PID
Process ID that issued the signal.
.TP
TPID
Target PID for the signal.
.TP
SIGNAL
Signal number sent to the target process, or name if -s is used.
.TP
RETURN
Return status: 0 for success, -1 for failure.
.SH OVERHEAD
This traces kill() syscalls as they occur. For high rates of kills (> 500/s),
the overhead may begin to be measurable; however, the rate is unlikely to get
this high, and if it is, that itself is worth investigating. Test yourself. You
can also use the \-d mode to buffer output, reducing overheads.
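For example (a sketch), buffering for 10 seconds and writing the output to a file:
.B killsnoop \-d 10 > outfile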
.SH SOURCE
This is from the perf-tools collection.
.IP
https://github.com/brendangregg/perf-tools
.PP
Also look under the examples directory for a text file containing example
usage, output, and commentary for this tool.
.SH OS
Linux
.SH STABILITY
Unstable - in development.
.SH AUTHOR
Martin Probst
.SH SEE ALSO
tpoint(8), execsnoop(8), opensnoop(8)
perf-tools-unstable-0.0.1~20150130+git85414b0/man/man8/kprobe.8 0000664 0000000 0000000 00000012301 12542613570 0023031 0 ustar 00root root 0000000 0000000 .TH kprobe 8 "2014-07-20" "USER COMMANDS"
.SH NAME
kprobe \- trace a given kprobe definition. Kernel dynamic tracing. Uses Linux ftrace.
.SH SYNOPSIS
.B kprobe
[\-FhHsv] [\-d secs] [\-p PID] kprobe_definition [filter]
.SH DESCRIPTION
This will create, trace, then destroy a given kprobe definition. See
Documentation/trace/kprobetrace.txt in the Linux kernel source for the
syntax of a kprobe definition, and "kprobe -h" for examples. With this tool,
the probe alias is optional (if not specified, the tracepoint will become
kprobe:<function name>).
WARNING: This uses dynamic tracing of kernel functions, and could cause
kernel panics or freezes, depending on the function traced. Test in a lab
environment, and know what you are doing, before use.
Also beware of feedback loops: for example, tracing tcp functions over an ssh
session, or tracing ext4 functions while writing the output to an ext4 file
system. For the former, tcp
trace data could be redirected to a file (as in the usage message). For
the latter, trace to the screen or a different file system.
SEE ALSO: functrace(8), which can perform basic tracing (event only) of
multiple kernel functions using wildcards.
Since this uses ftrace, only the root user can use this tool.
.SH REQUIREMENTS
FTRACE and KPROBES CONFIG, which you may already have enabled and available on
recent kernels.
.SH OPTIONS
.TP
\-F
Force. Trace despite warnings. By default the specified kernel function must
exist in the available_filter_functions file. This option overrides this check.
This might expose you to more unsafe functions, which could cause kernel
panics or freezes when traced.
.TP
\-d seconds
Set the duration of tracing, in seconds. Trace output will be buffered and
printed at the end. This also reduces overheads by buffering in-kernel,
instead of printing events as they occur.
The ftrace buffer has a fixed size per-CPU (see
/sys/kernel/debug/tracing/buffer_size_kb). If you think events are missing,
try increasing that size.
.TP
\-h
Print usage message.
.TP
\-H
Print column headers.
.TP
\-s
Print kernel stack traces after each event.
.TP
\-v
Show the kprobe format file only (do not trace), identifying possible variables
for use in a custom filter.
.TP
\-p PID
Only trace kernel functions when this process ID is on-CPU.
.TP
kprobe_definition
A full kprobe definition, as documented by Documentation/trace/kprobetrace.txt
in the Linux kernel source. Note that the probe alias name is optional with
kprobe(8), and if not specified, the tracepoint will become kprobe:<function name>.
See the EXAMPLES section.
.TP
filter
An ftrace filter definition.
.SH EXAMPLES
These examples may need modification to match your kernel version's function
names and platform's register usage. If using platform specific registers
becomes too painful in practice, consider a kernel debuginfo-based tracer,
which can trace variables names instead. For example, perf_events.
.TP
Trace do_sys_open() entry:
#
.B kprobe p:do_sys_open
.TP
Trace do_sys_open() return:
#
.B kprobe r:do_sys_open
.TP
Trace do_sys_open() return value:
#
.B kprobe 'r:do_sys_open $retval'
.TP
Trace do_sys_open() return value, with a custom probe alias "myopen":
#
.B kprobe 'r:myopen do_sys_open $retval'
.TP
Trace do_sys_open() file mode:
#
.B kprobe 'p:myopen do_sys_open mode=%cx:u16'
.TP
Trace do_sys_open() file mode for PID 81:
#
.B kprobe -p 81 'p:myopen do_sys_open mode=%cx:u16'
.TP
Trace do_sys_open() with filename string:
#
.B kprobe 'p:myopen do_sys_open filename=+0(%si):string'
.TP
Trace do_sys_open() for filenames ending in "stat":
#
.B kprobe 'p:myopen do_sys_open fn=+0(%si):string' 'fn ~ """*stat"""'
.TP
Trace tcp_retransmit_skb() and show kernel stack traces, showing the path that led to it (can help explain why):
#
.B kprobe \-s 'p:myprobe tcp_retransmit_skb'
.SH FIELDS
The output format depends on the kernel version, and headings can be printed
using \-H. The format is the same as the ftrace function trace format, described
in the kernel source under Documentation/trace/ftrace.txt.
Typical fields are:
.TP
TASK-PID
The process name (which could include dashes), a dash, and the process ID.
.TP
CPU#
The CPU ID, in brackets.
.TP
||||
Kernel state flags. For example, on Linux 3.16 these are for irqs-off,
need-resched, hardirq/softirq, and preempt-depth.
.TP
TIMESTAMP
Time of event, in seconds.
.TP
FUNCTION
Kernel function name.
.SH OVERHEAD
This can generate a lot of trace data quickly, depending on the
frequency of the traced events. Such data will cause performance overheads.
This also works without buffering by default, printing function events
as they happen (uses trace_pipe), context switching and consuming CPU to do
so. If needed, you can try the "\-d secs" option, which buffers events
instead, reducing overhead. If you think the buffer option is losing events,
try increasing the buffer size (buffer_size_kb).
It's a good idea to use funccount(8) first, which is lower overhead, to
help you select which functions you may want to trace using kprobe(8).
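For example (a sketch; see funccount(8) for the available options):
.B funccount 'tcp_*'
counts calls to kernel functions beginning with "tcp_".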
.SH SOURCE
This is from the perf-tools collection:
.IP
https://github.com/brendangregg/perf-tools
.PP
Also look under the examples directory for a text file containing example
usage, output, and commentary for this tool.
.SH OS
Linux
.SH STABILITY
Unstable - in development.
.SH AUTHOR
Brendan Gregg
.SH SEE ALSO
functrace(8), funccount(8)
perf-tools-unstable-0.0.1~20150130+git85414b0/man/man8/opensnoop.8 0000664 0000000 0000000 00000006055 12542613570 0023600 0 ustar 00root root 0000000 0000000 .TH opensnoop 8 "2014-07-20" "USER COMMANDS"
.SH NAME
opensnoop \- trace open() syscalls with file details. Uses Linux ftrace.
.SH SYNOPSIS
.B opensnoop
[\-htx] [\-d secs] [\-p pid] [\-n name] [filename]
.SH DESCRIPTION
This traces open() syscalls, showing the file name (pathname) and returned file
descriptor number (or \-1, for error).
This implementation is designed to work on older kernel versions, and without
kernel debuginfo. It works by dynamic tracing of the return value of getname()
as a string, and associating it with the following open() syscall return.
This approach is kernel version specific, and may not work on your version.
It is a workaround, and proof of concept for ftrace, until more kernel tracing
functionality is available.
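As a rough sketch of the mechanism, on kernels where getname() returns a string
pointer, the script adds a return probe similar to the following (the exact
definition varies by kernel version; see the script source):
.B echo 'r:getnameprobe getname +0($retval):string' >> /sys/kernel/debug/tracing/kprobe_events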
Since this uses ftrace, only the root user can use this tool.
.SH REQUIREMENTS
FTRACE and KPROBE CONFIG, the syscalls:sys_exit_open tracepoint, and the
getname() kernel function. You may already have these enabled and available
on recent Linux kernels. And awk.
.SH OPTIONS
.TP
\-d secs
Set the duration of tracing, in seconds. Trace output will be buffered and
printed at the end. This also reduces overheads by buffering in-kernel,
instead of printing events as they occur.
The ftrace buffer has a fixed size per-CPU (see
/sys/kernel/debug/tracing/buffer_size_kb). If you think events are missing,
try increasing that size.
.TP
\-h
Print usage message.
.TP
\-n name
Only show processes matching this process name. Partial strings and regular
expressions are allowed. This is post-filtered using awk.
.TP
\-p PID
Only trace this process ID. This is filtered in-kernel.
.TP
\-t
Include timestamps, in seconds.
.TP
\-x
Only print failed open()s.
.TP
filename
Only show open()s which match this filename. Partial strings and regular
expressions are allowed. This is post-filtered using awk.
.SH EXAMPLES
.TP
Trace all open() syscalls with details:
#
.B opensnoop
.TP
Only trace open()s for PID 81:
#
.B opensnoop -p 81
.TP
Trace failed open() syscalls:
#
.B opensnoop -x
.TP
Trace open() syscalls for filenames containing "conf":
#
.B opensnoop conf
.TP
Trace open() syscalls for filenames ending in "log":
#
.B opensnoop 'log$'
.SH FIELDS
.TP
TIMEs
Time of open() completion, in units of seconds.
.TP
COMM
Process name (if known).
.TP
PID
Process ID.
.TP
FD
File descriptor. If this is a successful open, the file descriptor number is
shown. If this is unsuccessful, -1 is shown. Numbers beginning with 0x are
hexadecimal.
.TP
FILE
Filename (pathname) used by the open() syscall.
.SH OVERHEAD
This traces open() syscalls and getname() kernel function calls as they occur.
For high rates of opens (> 500/s), the overhead may begin to be measurable.
Test yourself. You can use the \-d mode to buffer output, reducing overheads.
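For example (a sketch), buffering for 10 seconds and writing the output to a file:
.B opensnoop \-d 10 > outfile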
.SH SOURCE
This is from the perf-tools collection.
.IP
https://github.com/brendangregg/perf-tools
.PP
Also look under the examples directory for a text file containing example
usage, output, and commentary for this tool.
.SH OS
Linux
.SH STABILITY
Unstable - in development.
.SH AUTHOR
Brendan Gregg
.SH SEE ALSO
execsnoop(8), strace(1)
perf-tools-unstable-0.0.1~20150130+git85414b0/man/man8/perf-stat-hist.8 0000664 0000000 0000000 00000007520 12542613570 0024430 0 ustar 00root root 0000000 0000000 .TH perf-stat-hist 8 "2014-07-07" "USER COMMANDS"
.SH NAME
perf-stat-hist \- histogram summary of tracepoint values. Uses Linux perf_events.
.SH SYNOPSIS
.B perf-stat-hist
[-h] [-b buckets|-P power] [-m max] tracepoint variable [seconds]
.SH DESCRIPTION
This is a proof-of-concept showing in-kernel histograms using Linux perf_events
(aka the "perf" command), on older kernels where perf_events does not have
this native capability.
These histograms show the distribution of the variable, allowing details
including multiple modes and outliers to be studied.
This uses multiple counting tracepoints with different filters, one for each
histogram bucket. While this is summarized in-kernel, the use of multiple
tracepoints does add additional overhead. Hopefully, in the
future this functionality will be provided in an efficient way from
perf_events itself, at which point this tool can be deleted or rewritten.
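As a rough sketch, the bucketing is implemented by generating one filtered
counting event per bucket, along the lines of the following (simplified; the
actual tracepoint, variable, and bucket boundaries depend on the arguments
given):
.B perf stat \-e syscalls:sys_exit_read \-\-filter 'ret < 10' \-e syscalls:sys_exit_read \-\-filter 'ret >= 10' \-a sleep 5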
.SH REQUIREMENTS
Linux perf_events: add linux-tools-common, run "perf", then add any additional
packages it requests. Also uses awk.
.SH OPTIONS
.TP
\-h
Usage message.
.TP
\-b buckets
Specify a list of bucket points for the histogram as a string (eg, "10 500
1000"). The histogram will include buckets for less-than the minimum, and
greater-than-or-equal-to the maximum. If a single value is specified, two
statistics only are gathered: for less-than and for greater-than-or-equal-to.
The overhead is relative to the number of buckets, so specifying only a
single value has the lowest overhead.
.TP
\-P power
Power for power-of histogram. By default, a power-of-4 histogram is created.
This and the \-b option are exclusive.
.TP
\-m max
Max value for power-of histograms.
.TP
tracepoint
Tracepoint specification. Eg, syscalls:sys_enter_read.
.TP
variable
The tracepoint variable name to summarize. To see what are available, cat the
format file under /sys/kernel/debug/tracing/events/*/*/format.
.TP
seconds
Number of seconds to trace. If not specified, this runs until Ctrl-C.
.SH EXAMPLES
.TP
Trace read() syscalls until Ctrl-C, and show histogram of requested size:
#
.B perf\-stat\-hist syscalls:sys_enter_read count
.TP
Trace read() syscall completions until Ctrl-C, and show histogram of successful returned size:
#
.B perf\-stat\-hist syscalls:sys_exit_read ret
.TP
Trace read() return sizes for 10 seconds, showing histogram:
#
.B perf\-stat\-hist syscalls:sys_exit_read ret 10
.TP
Trace network transmits until Ctrl-C, and show histogram of packet size:
#
.B perf\-stat\-hist net:net_dev_xmit len
.TP
Trace read() return sizes, using a power-of-10 histogram:
.B perf\-stat\-hist \-P 10 syscalls:sys_exit_read ret
.TP
Trace read() return sizes, using a power-of-2 histogram, and a max of 1024:
.B perf\-stat\-hist \-P 2 \-m 1024 syscalls:sys_exit_read ret
.TP
Trace read() return sizes, using the specified bucket points:
.B perf\-stat\-hist \-b """10 50 100 5000""" syscalls:sys_exit_read ret
.TP
Trace read() return sizes, and bifurcate statistics by the value 10:
.B perf-stat-hist \-b 10 syscalls:sys_exit_read ret
.SH FIELDS
.TP
Range
Range of the histogram bucket, in units of the variable specified.
.TP
Count
Number of occurrences (tracepoint events) of the variable in this range.
.TP
Distribution
ASCII histogram representation of the Count column.
.SH OVERHEAD
While the counts are performed in-kernel, one tracepoint is used per
histogram bucket, so the overheads are higher (scaling with the
number of buckets) than simple event counting using perf stat. The lowest
overhead is when \-b is used to specify a single bucket point only, bifurcating
statistics.
.SH SOURCE
This is from the perf-tools collection.
.IP
https://github.com/brendangregg/perf-tools
.PP
Also look under the examples directory for a text file containing example
usage, output, and commentary for this tool.
.SH OS
Linux
.SH STABILITY
Unstable - in development.
.SH AUTHOR
Brendan Gregg
.SH SEE ALSO
perf(1)
perf-tools-unstable-0.0.1~20150130+git85414b0/man/man8/reset-ftrace.8 0000664 0000000 0000000 00000002414 12542613570 0024137 0 ustar 00root root 0000000 0000000 .TH reset-ftrace 8 "2014-07-07" "USER COMMANDS"
.SH NAME
reset-ftrace \- reset state of ftrace, disabling all tracing. Written for Linux ftrace.
.SH SYNOPSIS
.B reset-ftrace
[\-fhq]
.SH DESCRIPTION
This resets the state of various ftrace files, and shows the before and after
state.
This may only be of use to ftrace hackers who, in the process of developing
ftrace software, often get the subsystem into a partially active state, and
would like a quick way to reset state. Check the end of this script for the
actual files that are reset, and add more if you need to.
WARNING: Only use this if and when you are sure that there are no other active
ftrace sessions on your system, as otherwise it will kill them.
.SH REQUIREMENTS
FTRACE CONFIG.
.SH OPTIONS
.TP
\-f
Force. If the ftrace lock file exists (/var/tmp/.ftrace-lock), delete it.
.TP
\-h
Print usage message.
.TP
\-q
Quiet. Run, but don't print any output.
.SH EXAMPLES
.TP
Reset various ftrace files:
#
.B reset-ftrace
.SH SOURCE
This is from the perf-tools collection.
.IP
https://github.com/brendangregg/perf-tools
.PP
Also look under the examples directory for a text file containing example
usage, output, and commentary for this tool.
.SH OS
Linux
.SH STABILITY
Unstable - in development.
.SH AUTHOR
Brendan Gregg
.SH SEE ALSO
perf(1)
perf-tools-unstable-0.0.1~20150130+git85414b0/man/man8/syscount.8 0000664 0000000 0000000 00000005047 12542613570 0023447 0 ustar 00root root 0000000 0000000 .TH syscount 8 "2014-07-07" "USER COMMANDS"
.SH NAME
syscount \- count system calls. Uses Linux perf_events.
.SH SYNOPSIS
.B syscount
[\-chv] [\-t top] {\-p PID|\-d seconds|command}
.SH DESCRIPTION
This is a proof-of-concept using perf_events capabilities for older kernel
versions that lack custom in-kernel aggregations. Once they exist, this
script can be substantially rewritten and improved (lower overhead).
.SH REQUIREMENTS
Linux perf_events: add linux-tools-common, run "perf", then
add any additional packages it requests. Also needs awk.
.SH OPTIONS
.TP
\-c
Show counts by syscall name. This mode (without -v) uses in-kernel counts, which
have lower overhead than the default mode.
.TP
\-h
Usage message.
.TP
\-v
Verbose: include PID.
.TP
\-p PID
Trace this process ID only.
.TP
\-d seconds
Duration of trace in seconds.
.TP
command
Run and trace this command.
.SH EXAMPLES
.TP
Trace and summarize syscalls by process name:
#
.B syscount
.TP
Trace and summarize syscalls by syscall name (lower overhead):
#
.B syscount \-c
.TP
Trace for 5 seconds, showing by process name:
#
.B syscount \-d 5
.TP
Trace PID 923 only, and show by syscall name (lower overhead):
#
.B syscount \-cp 923
.TP
Execute the """ls""" command, and show by syscall name:
#
.B syscount -c ls
.SH FIELDS
.TP
PID
Process ID.
.TP
COMM
Process command name.
.TP
SYSCALL
Syscall name.
.TP
COUNT
Number of syscalls during tracing.
.SH OVERHEAD
Modes that report syscall names only (\-c, \-cp PID, \-cd secs) have
lower overhead, since they use in-kernel counts. Other modes which report
process IDs (\-cv) or process names (default) create a perf.data file for
post processing, and you will see messages about it doing this. Beware of
the file size (test for short durations, or use \-c to see counts based on
in-kernel counters), and gauge overheads based on the perf.data size.
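For example, after a short test run the capture file size can be checked
(assuming the default perf.data file in the current directory):
.B ls \-lh perf.data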
Note that this script deliberately does not pipe perf record into
perf script, which would avoid perf.data, because it can create a feedback
loop where the perf script syscalls are recorded. Hopefully there will be a
fix for this in a later perf version, so perf.data can be skipped, or other
kernel features to aggregate by process name in-kernel directly (eg, via
eBPF, ktap, or SystemTap).
.SH SOURCE
This is from the perf-tools collection.
.IP
https://github.com/brendangregg/perf-tools
.PP
Also look under the examples directory for a text file containing example
usage, output, and commentary for this tool.
.SH OS
Linux
.SH STABILITY
Unstable - in development.
.SH AUTHOR
Brendan Gregg
.SH SEE ALSO
iosnoop(8), iolatency(8), iostat(1)
perf-tools-unstable-0.0.1~20150130+git85414b0/man/man8/tcpretrans.8 0000664 0000000 0000000 00000005325 12542613570 0023744 0 ustar 00root root 0000000 0000000 .TH tcpretrans 8 "2014-07-31" "USER COMMANDS"
.SH NAME
tcpretrans \- show TCP retransmits, with address and other details. Uses Linux ftrace.
.SH SYNOPSIS
.B tcpretrans
[\-hls]
.SH DESCRIPTION
This traces TCP retransmits that are sent by the system tcpretrans is executed
from, showing address, port, and TCP state information,
and sometimes the PID (although usually not, since retransmits are usually
sent by the kernel on timeout events). To keep overhead low, only
tcp_retransmit_skb() kernel calls are traced (this does not trace every packet).
This was written as a proof of concept for ftrace, for older Linux systems,
and without kernel debuginfo. It uses dynamic tracing of tcp_retransmit_skb(),
and reads /proc/net/tcp for socket details. Its use of dynamic tracing and
CPU registers is an unstable platform-specific workaround, and may require
modifications to work on different kernels and platforms. This would be better
written using a tracer such as SystemTap, and will likely be rewritten in the
future when certain tracing features are added to the Linux kernel.
When \-l is used, this also uses dynamic tracing of tcp_send_loss_probe() and
a register.
Currently only IPv4 is supported, on x86_64. If you try this on a different
architecture, you'll likely need to adjust the register locations (search
for %di).
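As a sketch of the mechanism on x86_64 (the register location is an assumption
of this implementation), the script creates a kprobe roughly equivalent to:
.B echo 'p:tcpretrans_tcp_retransmit_skb tcp_retransmit_skb sk=%di' >> /sys/kernel/debug/tracing/kprobe_events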
Since this uses ftrace, only the root user can use this tool.
.SH REQUIREMENTS
FTRACE and KPROBE CONFIG, tcp_retransmit_skb() kernel function.
You may already have these on recent kernels. Also needs Perl.
TCP tail loss probes were added in Linux 3.10.
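One quick way to check that the kernel function exists (a sketch) is:
.B grep \-w tcp_retransmit_skb /proc/kallsyms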
.SH OPTIONS
.TP
\-h
Print usage message.
.TP
\-s
Include kernel stack traces.
.TP
\-l
Include TCP tail loss probes.
.SH EXAMPLES
.TP
Trace TCP retransmits:
#
.B tcpretrans
.SH FIELDS
.TP
TIME
Time of retransmit (may be rounded up to the nearest second).
.TP
PID
Process ID that was on-CPU. This is less useful than it might sound, as it
is usually 0 (the kernel) for timer-based retransmits.
.TP
LADDR
Local address.
.TP
LPORT
Local port.
.TP
\-\-
Packet type: "R>" for retransmit, and "L>" for tail loss probe.
.TP
RADDR
Remote address.
.TP
RPORT
Remote port.
.TP
STATE
TCP session state.
.SH OVERHEAD
The CPU overhead is relative to the rate of TCP retransmits, and is
designed to be low as this does not examine every packet. Once per second the
/proc/net/tcp file is read, and a buffer of retransmit trace events is
retrieved from the kernel and processed.
.SH SOURCE
This is from the perf-tools collection.
.IP
https://github.com/brendangregg/perf-tools
.PP
Also look under the examples directory for a text file containing example
usage, output, and commentary for this tool.
.SH OS
Linux
.SH STABILITY
Unstable - in development.
.SH AUTHOR
Brendan Gregg
.SH SEE ALSO
tcpdump(1)
perf-tools-unstable-0.0.1~20150130+git85414b0/man/man8/tpoint.8 0000664 0000000 0000000 00000007727 12542613570 0023104 0 ustar 00root root 0000000 0000000 .TH tpoint 8 "2014-07-20" "USER COMMANDS"
.SH NAME
tpoint \- trace a given tracepoint. Static tracing. Uses Linux ftrace.
.SH SYNOPSIS
.B tpoint
[\-hHsv] [\-d secs] [\-p PID] tracepoint [filter]
.B tpoint
\-l
.SH DESCRIPTION
This will enable a given tracepoint, print events, then disable the tracepoint
when the program ends. This is like a simple version of the "perf" command for
printing live tracepoint events only. Wildcards are currently not supported.
If for any reason tpoint(8) is insufficient, use the more powerful perf
command for tracing tracepoints instead.
Beware of feedback loops: for example, tracing tcp tracepoints over an ssh
session, or tracing ext4 events while writing the output to an ext4 file
system. For the former, tcp
trace data could be redirected to a file (as in the usage message). For
the latter, trace to the screen or a different file system.
Since this uses ftrace, only the root user can use this tool.
.SH REQUIREMENTS
FTRACE CONFIG and tracepoints, which you may already have enabled and available
on recent kernels.
.SH OPTIONS
.TP
\-d seconds
Set the duration of tracing, in seconds. Trace output will be buffered and
printed at the end. This also reduces overheads by buffering in-kernel,
instead of printing events as they occur.
The ftrace buffer has a fixed size per-CPU (see
/sys/kernel/debug/tracing/buffer_size_kb). If you think events are missing,
try increasing that size.
.TP
\-h
Print usage message.
.TP
\-H
Print column headers.
.TP
\-l
List tracepoints only.
.TP
\-s
Print kernel stack traces after each event.
.TP
\-v
Show the tpoint format file only (do not trace), identifying possible variables
for use in a custom filter.
.TP
\-p PID
Only trace kernel functions when this process ID is on-CPU.
.TP
tracepoint
A tracepoint name. Eg, block:block_rq_issue. See the EXAMPLES section.
.TP
filter
An ftrace filter definition.
.SH EXAMPLES
.TP
List tracepoints containing "open":
#
.B tpoint -l | grep open
.TP
Trace open() syscall entry:
#
.B tpoint syscalls:sys_enter_open
.TP
Trace open() syscall entry, showing column headers:
#
.B tpoint -H syscalls:sys_enter_open
.TP
Trace block I/O issue:
#
.B tpoint block:block_rq_issue
.TP
Trace block I/O issue with stack traces:
#
.B tpoint \-s block:block_rq_issue
.SH FIELDS
The output format depends on the kernel version, and headings can be printed
using \-H. The format is the same as the ftrace function trace format, described
in the kernel source under Documentation/trace/ftrace.txt.
Typical fields are:
.TP
TASK-PID
The process name (which could include dashes), a dash, and the process ID.
.TP
CPU#
The CPU ID, in brackets.
.TP
||||
Kernel state flags. For example, on Linux 3.16 these are for irqs-off,
need-resched, hardirq/softirq, and preempt-depth.
.TP
TIMESTAMP
Time of event, in seconds.
.TP
FUNCTION
Kernel function name.
.SH OVERHEAD
This can generate a lot of trace data quickly, depending on the
frequency of the traced events. Such data will cause performance overheads.
This also works without buffering by default, printing function events
as they happen (uses trace_pipe), context switching and consuming CPU to do
so. If needed, you can try the "\-d secs" option, which buffers events
instead, reducing overhead. If you think the buffer option is losing events,
try increasing the buffer size (buffer_size_kb).
Before using tpoint(8), you can use perf_events to count the rate of events
for the tracepoint of interest, to gauge overhead. For example:
.B perf stat \-e block:block_rq_issue \-a sleep 5
That counts the occurrences of the block:block_rq_issue tracepoint for
5 seconds.
Also consider using perf_events, which manages buffers differently and more
efficiently, for higher frequency applications.
.SH SOURCE
This is from the perf-tools collection:
.IP
https://github.com/brendangregg/perf-tools
.PP
Also look under the examples directory for a text file containing example
usage, output, and commentary for this tool.
.SH OS
Linux
.SH STABILITY
Unstable - in development.
.SH AUTHOR
Brendan Gregg
.SH SEE ALSO
functrace(8), funccount(8), perf(1)
perf-tools-unstable-0.0.1~20150130+git85414b0/misc/ 0000775 0000000 0000000 00000000000 12542613570 0020776 5 ustar 00root root 0000000 0000000 perf-tools-unstable-0.0.1~20150130+git85414b0/misc/perf-stat-hist 0000775 0000000 0000000 00000014361 12542613570 0023603 0 ustar 00root root 0000000 0000000 #!/bin/bash
#
# perf-stat-hist - perf_events stat histogram hack.
# Written using Linux perf_events (aka "perf").
#
# This is a proof-of-concept showing in-kernel histogram summaries of a
# tracepoint variable.
#
# USAGE: perf-stat-hist [-h] [-b buckets|-P power] [-m max] tracepoint
# variable [seconds]
#
# Run "perf-stat-hist -h" for full usage.
#
# This uses multiple counting tracepoints with different filters, one for each
# histogram bucket. While this is summarized in-kernel, the use of multiple
# tracepoints does add additional overhead, which is more evident if you change
# the power-of size from 4 to 2 (which creates more buckets). Hopefully, in the
# future this functionality will be provided in an efficient way from
# perf_events itself, at which point this tool can be rewritten.
#
# From perf-tools: https://github.com/brendangregg/perf-tools
#
# COPYRIGHT: Copyright (c) 2014 Brendan Gregg.
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; either version 2
# of the License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software Foundation,
# Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
#
# (http://www.gnu.org/copyleft/gpl.html)
#
# 30-Jun-2014 Brendan Gregg Created this.
opt_buckets=0; buckets=; opt_power=0; power=4; opt_max=0; max=$((1024 * 1024))
duration=0; debug=0
trap ':' INT QUIT TERM PIPE HUP
function usage {
cat <<-END >&2
USAGE: perf-stat-hist [-h] [-b buckets|-P power] [-m max] tracepoint
variable [seconds]
-b buckets # specify histogram bucket points
-P power # power-of (default is 4)
-m max # max value for power-of
-h # this usage message
eg,
perf-stat-hist syscalls:sys_enter_read count 5
# read() request histogram, 5 seconds
perf-stat-hist syscalls:sys_exit_read ret 5
# read() return histogram, 5 seconds
perf-stat-hist -P 10 syscalls:sys_exit_read ret 5
# ... use power-of-10
perf-stat-hist -P 2 -m 1024 syscalls:sys_exit_read ret 5
# ... use power-of-2, max 1024
perf-stat-hist -b "10 50 100 500" syscalls:sys_exit_read ret 5
# ... histogram based on these bucket ranges
perf-stat-hist -b 10 syscalls:sys_exit_read ret 5
# ... bifurcate by the value 10 (lowest overhead)
See the man page and example file for more info.
END
exit
}
function die {
echo >&2 "$@"
exit 1
}
### process options
while getopts b:hm:P: opt
do
case $opt in
b) opt_buckets=1; buckets=($OPTARG) ;;
P) opt_power=1; power=$OPTARG ;;
m) opt_max=1; max=$OPTARG ;;
h|?) usage ;;
esac
done
shift $(( $OPTIND - 1 ))
(( $# < 2 )) && usage
tpoint=$1 # tracepoint
var=$2 # variable for histogram
duration=${3}
### option logic
(( opt_buckets && opt_power )) && die "ERROR: use either -b or -P"
(( opt_power && power < 2 )) && die "ERROR: -P power must be 2 or higher"
### check that tracepoint exists
if ! grep "^$tpoint\$" /sys/kernel/debug/tracing/available_events > /dev/null
then
echo >&2 "ERROR: tracepoint \"$tpoint\" not found. Exiting..."
[[ "$USER" != "root" ]] && echo >&2 "Not root user?"
exit 1
fi
### auto build power-of buckets
if (( !opt_buckets )); then
b=0
s=1
while (( s <= max )); do
b="$b $s"
(( s *= power ))
done
buckets=($b)
fi
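# For example, with power=4 and -m 1024 the generated bucket list would be:
#   0 1 4 16 64 256 1024
# (illustration only; the default max is 1024*1024)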
### build list of tracepoints and filters for each histogram bucket
max=${buckets[${#buckets[@]} - 1]} # last element
((max_i = ${#buckets[*]} - 1))
tpoints="-e $tpoint --filter \"$var < ${buckets[0]}\""
awkarray=
i=0
while (( i < max_i )); do
if (( i && ${buckets[$i]} <= ${buckets[$i - 1]} )); then
die "ERROR: bucket list must increase in size."
fi
tpoints="$tpoints -e $tpoint --filter \"$var >= ${buckets[$i]} && "
tpoints="$tpoints $var < ${buckets[$i + 1]}\""
awkarray="$awkarray buckets[$i]=${buckets[$i]};"
(( i++ ))
done
awkarray="$awkarray buckets[$max_i]=${buckets[$max_i]};"
tpoints="$tpoints -e $tpoint --filter \"$var >= ${buckets[$max_i]}\""
if (( debug )); then
echo buckets: ${buckets[*]}
echo tracepoints: $tpoints
echo awkarray: ${awkarray[*]}
fi
### prepare to run
if (( duration )); then
etext="for $duration seconds"
cmd="sleep $duration"
else
etext="until Ctrl-C"
cmd="sleep 999999"
fi
if (( opt_buckets )); then
echo "Tracing $tpoint, specified buckets, $etext..."
else
echo "Tracing $tpoint, power-of-$power, max $max, $etext..."
fi
### run perf
out="-o /dev/stdout" # a workaround needed in linux 3.2; not by 3.4.15
stat=$(eval perf stat $tpoints -a $out $cmd 2>&1)
if (( $? != 0 )); then
echo >&2 "ERROR running perf:"
echo >&2 "$stat"
exit
fi
if (( debug )); then
echo raw output:
echo "$stat"
echo
fi
### find max value for ASCII histogram
most=$(echo "$stat" | awk -v tpoint=$tpoint '
$2 == tpoint { gsub(/,/, ""); if ($1 > m) { m = $1 } }
END { print m }'
)
### process output
echo
echo "$stat" | awk -v tpoint=$tpoint -v max_i=$max_i -v most=$most '
function star(sval, smax, swidth) {
stars = ""
if (smax == 0) return ""
for (si = 0; si < (swidth * sval / smax); si++) {
stars = stars "#"
}
return stars
}
BEGIN {
'"$awkarray"'
printf(" %-15s: %-8s %s\n", "Range", "Count",
"Distribution")
}
/Performance counter stats/ { i = -1 }
# reverse order of rule set is important
{ ok = 0 }
$2 == tpoint { num = $1; gsub(/,/, "", num); ok = 1 }
ok && i >= max_i {
printf(" %10d -> %-10s: %-8s |%-38s|\n", buckets[i],
"", num, star(num, most, 38))
next
}
ok && i >= 0 && i < max_i {
printf(" %10d -> %-10d: %-8s |%-38s|\n", buckets[i],
buckets[i+1] - 1, num, star(num, most, 38))
i++
next
}
ok && i == -1 {
printf(" %10s -> %-10d: %-8s |%-38s|\n", "",
buckets[0] - 1, num, star(num, most, 38))
i++
}
'
perf-tools-unstable-0.0.1~20150130+git85414b0/net/ 0000775 0000000 0000000 00000000000 12542613570 0020631 5 ustar 00root root 0000000 0000000 perf-tools-unstable-0.0.1~20150130+git85414b0/net/tcpretrans 0000775 0000000 0000000 00000021434 12542613570 0022750 0 ustar 00root root 0000000 0000000 #!/usr/bin/perl
#
# tcpretrans - show TCP retransmits, with address and other details.
# Written using Linux ftrace.
#
# This traces TCP retransmits, showing address, port, and TCP state information,
# and sometimes the PID (although usually not, since retransmits are usually
# sent by the kernel on timeouts). To keep overhead low, only
# tcp_retransmit_skb() calls are traced (this does not trace every packet).
#
# USAGE: ./tcpretrans [-hls]
#
# REQUIREMENTS: FTRACE and KPROBE CONFIG, tcp_retransmit_skb() kernel function,
# and tcp_send_loss_probe() when -l is used. You may already have these on
# recent kernels. And Perl.
#
# This was written as a proof of concept for ftrace, for older Linux systems,
# and without kernel debuginfo. It uses dynamic tracing of tcp_retransmit_skb(),
# and reads /proc/net/tcp for socket details. Its use of dynamic tracing and
# CPU registers is an unstable platform-specific workaround, and may require
# modifications to work on different kernels and platforms. This would be better
# written using a tracer such as SystemTap, and will likely be rewritten in the
# future when certain tracing features are added to the Linux kernel.
#
# When -l is used, this also uses dynamic tracing of tcp_send_loss_probe() and
# a register.
#
# Currently only IPv4 is supported, on x86_64. If you try this on a different
# architecture, you'll likely need to adjust the register locations (search
# for %di).
#
# OVERHEAD: The CPU overhead is relative to the rate of TCP retransmits, and is
# designed to be low as this does not examine every packet. Once per second the
# /proc/net/tcp file is read, and a buffer of retransmit trace events is
# retrieved from the kernel and processed.
#
# From perf-tools: https://github.com/brendangregg/perf-tools
#
# See the tcpretrans(8) man page (in perf-tools) for more info.
#
# COPYRIGHT: Copyright (c) 2014 Brendan Gregg.
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; either version 2
# of the License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software Foundation,
# Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
#
# (http://www.gnu.org/copyleft/gpl.html)
#
# 28-Jul-2014 Brendan Gregg Created this.
use strict;
use warnings;
use POSIX qw(strftime);
use Getopt::Long;
my $tracing = "/sys/kernel/debug/tracing";
my $flock = "/var/tmp/.ftrace-lock";
my $interval = 1;
local $SIG{INT} = \&cleanup;
local $SIG{QUIT} = \&cleanup;
local $SIG{TERM} = \&cleanup;
local $SIG{PIPE} = \&cleanup;
local $SIG{HUP} = \&cleanup;
$| = 1;
### options
my ($help, $stacks, $tlp);
GetOptions("help|h" => \$help,
"stacks|s" => \$stacks,
"tlp|l" => \$tlp)
or usage();
usage() if $help;
sub usage {
print STDERR "USAGE: tcpretrans [-hls]\n";
print STDERR " -h # help message\n";
print STDERR " -l # trace TCP tail loss probes\n";
print STDERR " -s # print stack traces\n";
print STDERR " eg,\n";
print STDERR " tcpretrans # trace TCP retransmits\n";
exit;
}
# delete lock and die
sub ldie {
unlink $flock;
die @_;
}
# end tracing (silently) and die
sub edie {
print STDERR "@_\n";
close STDOUT;
close STDERR;
cleanup();
}
sub writeto {
my ($string, $file) = @_;
open FILE, ">$file" or return 0;
print FILE $string or return 0;
close FILE or return 0;
}
sub appendto {
my ($string, $file) = @_;
open FILE, ">>$file" or return 0;
print FILE $string or return 0;
close FILE or return 0;
}
# kprobe functions
sub create_kprobe {
my ($kname, $kval) = @_;
appendto "p:$kname $kval", "kprobe_events" or return 0;
}
sub enable_kprobe {
my ($kname) = @_;
writeto "1", "events/kprobes/$kname/enable" or return 0;
}
sub remove_kprobe {
my ($kname) = @_;
writeto "0", "events/kprobes/$kname/enable" or return 0;
appendto "-:$kname", "kprobe_events" or return 0;
}
# tcp socket cache
my %tcp;
sub cache_tcp {
undef %tcp;
open(TCP, "/proc/net/tcp") or ldie "ERROR: reading /proc/net/tcp.";
while (<TCP>) {
next if /^ *sl/;
my ($sl, $local_address, $rem_address, $st, $tx_rx, $tr_tm,
$retrnsmt, $uid, $timeout, $inode, $jf, $sk) = split;
$sk =~ s/^0x//;
$tcp{$sk}{laddr} = $local_address;
$tcp{$sk}{raddr} = $rem_address;
$tcp{$sk}{state} = $st;
}
close TCP;
}
my @tcpstate;
sub map_tcp_states {
push @tcpstate, "NULL";
for (<DATA>) {
chomp;
s/.*TCP_//;
s/[, ].*$//;
push @tcpstate, $_;
}
}
# /proc/net/tcp hex addr to dotted quad decimal
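# e.g. inet_h2a("0100007F") returns "127.0.0.1"; the hex from /proc/net/tcp is
# in little-endian byte order (illustration only)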
sub inet_h2a {
my ($haddr) = @_;
my @addr = ();
for my $num ($haddr =~ /(..)(..)(..)(..)/) {
unshift @addr, hex($num);
}
return join(".", @addr);
}
### check permissions
chdir "$tracing" or die "ERROR: accessing tracing. Root? Kernel has FTRACE?" .
"\ndebugfs mounted? (mount -t debugfs debugfs /sys/kernel/debug)";
### ftrace lock
if (-e $flock) {
open FLOCK, $flock; my $fpid = <FLOCK>; chomp $fpid; close FLOCK;
die "ERROR: ftrace may be in use by PID $fpid ($flock)";
}
writeto "$$", $flock or die "ERROR: unable to write $flock.";
#
# Setup and begin tracing.
# Use of ldie() and edie() ensures that if an error is encountered, the
# kernel is not left in a partially configured state.
#
writeto "nop", "current_tracer" or ldie "ERROR: disabling current_tracer.";
my $kname_rtr = "tcpretrans_tcp_retransmit_skb";
my $kname_tlp = "tcpretrans_tcp_send_loss_probe";
create_kprobe $kname_rtr, "tcp_retransmit_skb sk=%di" or
ldie "ERROR: creating kprobe for tcp_retransmit_skb().";;
if ($tlp) {
create_kprobe $kname_tlp, "tcp_send_loss_probe sk=%di" or
edie "ERROR: creating kprobe for tcp_send_loss_probe(). " .
"Older kernel version?";
}
if ($stacks) {
writeto "1", "options/stacktrace" or print STDERR "WARNING: " .
"unable to enable stacktraces.\n";
}
enable_kprobe $kname_rtr or edie "ERROR: enabling $kname_rtr probe.";
if ($tlp) {
enable_kprobe $kname_tlp or edie "ERROR: enabling $kname_tlp probe.";
}
map_tcp_states();
printf "%-8s %-6s %-20s -- %-20s %-12s\n", "TIME", "PID", "LADDR:LPORT",
"RADDR:RPORT", "STATE";
#
# Read and print event data. This loop waits one second then reads the buffered
# trace data, then caches /proc/net/tcp, then iterates over the buffered trace
# data using the cached state. While this minimizes CPU overheads, it only
# works because sockets that are retransmitting are usually long lived, and
# remain in /proc/net/tcp for at least our sleep interval.
#
while (1) {
sleep $interval;
# buffer trace data
open TPIPE, "trace" or edie "ERROR: opening trace_pipe.";
my @trace = ();
while (<TPIPE>) {
next if /^#/;
push @trace, $_;
}
close TPIPE;
writeto "0", "trace" or edie "ERROR: clearing trace";
# cache /proc/net/tcp state
if (scalar @trace) {
cache_tcp();
}
# process and print events
for (@trace) {
if ($stacks && /^ *=>/) {
print $_;
next;
}
my ($taskpid, $rest) = split ' ', $_, 2;
my ($task, $pid) = $taskpid =~ /(.*)-(\d+)/;
my ($skp) = $rest =~ /sk=([0-9a-fx]*)/;
next unless defined $skp and $skp ne "";
$skp =~ s/^0x//;
my ($laddr, $lport, $raddr, $rport, $state);
if (defined $tcp{$skp}) {
# convert /proc/net/tcp hex to dotted quads
my ($hladdr, $hlport) = split /:/, $tcp{$skp}{laddr};
my ($hraddr, $hrport) = split /:/, $tcp{$skp}{raddr};
$laddr = inet_h2a($hladdr);
$raddr = inet_h2a($hraddr);
$lport = hex($hlport);
$rport = hex($hrport);
$state = $tcpstate[hex($tcp{$skp}{state})];
} else {
# socket closed too quickly
($laddr, $raddr) = ("-", "-");
($lport, $rport) = ("-", "-");
$state = "-";
}
my $now = strftime "%H:%M:%S", localtime;
printf "%-8s %-6s %-20s %s> %-20s %-12s\n", $now, $pid,
"$laddr:$lport", $rest =~ /$kname_tlp/ ? "L" : "R",
"$raddr:$rport", $state,
}
}
### end tracing
cleanup();
sub cleanup {
print "\nEnding tracing...\n";
close TPIPE;
if ($stacks) {
writeto "0", "options/stacktrace" or print STDERR "WARNING: " .
"unable to disable stacktraces.\n";
}
remove_kprobe $kname_rtr
or print STDERR "ERROR: removing kprobe $kname_rtr\n";
if ($tlp) {
remove_kprobe $kname_tlp
or print STDERR "ERROR: removing kprobe $kname_tlp\n";
}
writeto "", "trace";
unlink $flock;
exit;
}
# from /usr/include/netinet/tcp.h:
__DATA__
TCP_ESTABLISHED = 1,
TCP_SYN_SENT,
TCP_SYN_RECV,
TCP_FIN_WAIT1,
TCP_FIN_WAIT2,
TCP_TIME_WAIT,
TCP_CLOSE,
TCP_CLOSE_WAIT,
TCP_LAST_ACK,
TCP_LISTEN,
TCP_CLOSING /* now a valid state */
perf-tools-unstable-0.0.1~20150130+git85414b0/opensnoop 0000775 0000000 0000000 00000017055 12542613570 0022021 0 ustar 00root root 0000000 0000000 #!/bin/bash
#
# opensnoop - trace open() syscalls with file details.
# Written using Linux ftrace.
#
# This traces open() syscalls, showing the file name and returned file
# descriptor number (or -1, for error).
#
# This implementation is designed to work on older kernel versions, and without
# kernel debuginfo. It works by dynamic tracing of the return value of getname()
# as a string, and associating it with the following open() syscall return.
# This approach is kernel version specific, and may not work on your version.
# It is a workaround, and proof of concept for ftrace, until more kernel tracing
# functionality is available.
#
# USAGE: ./opensnoop [-htx] [-d secs] [-p pid] [-n name] [filename]
#
# Run "opensnoop -h" for full usage.
#
# REQUIREMENTS: FTRACE and KPROBE CONFIG, syscalls:sys_exit_open tracepoint,
# getname() kernel function (you may already have these on recent kernels),
# and awk.
#
# From perf-tools: https://github.com/brendangregg/perf-tools
#
# See the opensnoop(8) man page (in perf-tools) for more info.
#
# COPYRIGHT: Copyright (c) 2014 Brendan Gregg.
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; either version 2
# of the License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software Foundation,
# Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
#
# (http://www.gnu.org/copyleft/gpl.html)
#
# 20-Jul-2014 Brendan Gregg Created this.
### default variables
tracing=/sys/kernel/debug/tracing
flock=/var/tmp/.ftrace-lock; wroteflock=0
opt_duration=0; duration=; opt_name=0; name=; opt_pid=0; pid=; ftext=
opt_time=0; opt_fail=0; opt_file=0; file=
trap ':' INT QUIT TERM PIPE HUP # sends execution to end tracing section
function usage {
cat <<-END >&2
USAGE: opensnoop [-htx] [-d secs] [-p PID] [-n name] [filename]
-d seconds # trace duration, and use buffers
-n name # process name to match on I/O issue
-p PID # PID to match on I/O issue
-t # include time (seconds)
-x # only show failed opens
-h # this usage message
filename # match filename (partials, REs, ok)
eg,
opensnoop # watch open()s live (unbuffered)
opensnoop -d 1 # trace 1 sec (buffered)
opensnoop -p 181 # trace I/O issued by PID 181 only
opensnoop conf # trace filenames containing "conf"
opensnoop 'log$' # filenames ending in "log"
See the man page and example file for more info.
END
exit
}
function warn {
if ! eval "$@"; then
echo >&2 "WARNING: command failed \"$@\""
fi
}
function end {
# disable tracing
echo 2>/dev/null
echo "Ending tracing..." 2>/dev/null
cd $tracing
warn "echo 0 > events/kprobes/getnameprobe/enable"
warn "echo 0 > events/syscalls/sys_exit_open/enable"
if (( opt_pid )); then
warn "echo 0 > events/kprobes/getnameprobe/filter"
warn "echo 0 > events/syscalls/sys_exit_open/filter"
fi
warn "echo -:getnameprobe >> kprobe_events"
warn "echo > trace"
(( wroteflock )) && warn "rm $flock"
}
function die {
echo >&2 "$@"
exit 1
}
function edie {
# die with a quiet end()
echo >&2 "$@"
exec >/dev/null 2>&1
end
exit 1
}
### process options
while getopts d:hn:p:tx opt
do
case $opt in
d) opt_duration=1; duration=$OPTARG ;;
n) opt_name=1; name=$OPTARG ;;
p) opt_pid=1; pid=$OPTARG ;;
t) opt_time=1 ;;
x) opt_fail=1 ;;
h|?) usage ;;
esac
done
shift $(( $OPTIND - 1 ))
if (( $# )); then
opt_file=1
file=$1
shift
fi
(( $# )) && usage
### option logic
(( opt_pid && opt_name )) && die "ERROR: use either -p or -n."
(( opt_pid )) && ftext=" issued by PID $pid"
(( opt_name )) && ftext=" issued by process name \"$name\""
(( opt_file )) && ftext="$ftext for filenames containing \"$file\""
if (( opt_duration )); then
echo "Tracing open()s$ftext for $duration seconds (buffered)..."
else
echo "Tracing open()s$ftext. Ctrl-C to end."
fi
### select awk
(( opt_duration )) && use=mawk || use=gawk # workaround for mawk fflush()
[[ -x /usr/bin/$use ]] && awk=$use || awk=awk
### check permissions
cd $tracing || die "ERROR: accessing tracing. Root user? Kernel has FTRACE?
debugfs mounted? (mount -t debugfs debugfs /sys/kernel/debug)"
### ftrace lock
[[ -e $flock ]] && die "ERROR: ftrace may be in use by PID $(cat $flock) $flock"
echo $$ > $flock || die "ERROR: unable to write $flock."
wroteflock=1
### setup and begin tracing
echo nop > current_tracer
ver=$(uname -r)
if [[ "$ver" == 2.* || "$ver" == 3.[1-6].* ]]; then
# rval is char *
kprobe='r:getnameprobe getname +0($retval):string'
else
# rval is struct filename *
kprobe='r:getnameprobe getname +0(+0($retval)):string'
fi
if ! echo $kprobe >> kprobe_events; then
edie "ERROR: adding a kprobe for getname(). Exiting."
fi
if (( opt_pid )); then
if ! echo "common_pid==$pid" > events/kprobes/getnameprobe/filter || \
! echo "common_pid==$pid" > events/syscalls/sys_exit_open/filter
then
edie "ERROR: setting -p $pid. Exiting."
fi
fi
if ! echo 1 > events/kprobes/getnameprobe/enable; then
edie "ERROR: enabling kprobe for getname(). Exiting."
fi
if ! echo 1 > events/syscalls/sys_exit_open/enable; then
edie "ERROR: enabling open() exit tracepoint. Exiting."
fi
(( opt_time )) && printf "%-16s " "TIMEs"
printf "%-16.16s %-6s %4s %s\n" "COMM" "PID" "FD" "FILE"
#
# Determine output format. It may be one of the following (newest first):
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# TASK-PID CPU# TIMESTAMP FUNCTION
# To differentiate between them, the number of header fields is counted,
# and an offset set, to skip the extra column when needed.
#
offset=$($awk 'BEGIN { o = 0; }
$1 == "#" && $2 ~ /TASK/ && NF == 6 { o = 1; }
$2 ~ /TASK/ { print o; exit }' trace)
### print trace buffer
warn "echo > trace"
( if (( opt_duration )); then
# wait then dump buffer
sleep $duration
cat trace
else
# print buffer live
cat trace_pipe
fi ) | $awk -v o=$offset -v opt_name=$opt_name -v name=$name \
-v opt_duration=$opt_duration -v opt_time=$opt_time -v opt_fail=$opt_fail \
-v opt_file=$opt_file -v file=$file '
# common fields
$1 != "#" {
# task name can contain dashes and space
split($0, line, "-")
sub(/^[ \t\r\n]+/, "", line[1])
comm = line[1]
if (opt_name && match(comm, name) == 0)
next
sub(/ .*$/, "", line[2])
pid = line[2]
}
# do_sys_open()
$1 != "#" && $(5+o) ~ /do_sys_open/ {
#
# eg: ... (do_sys_open+0xc3/0x220 <- getname) arg1="file1"
#
filename = $NF
sub(/"$/, "", filename)
sub(/.*"/, "", filename)
lastfile[pid] = filename
}
# sys_open()
$1 != "#" && $(4+o) == "sys_open" {
filename = lastfile[pid]
delete lastfile[pid]
if (opt_file && filename !~ file)
next
rval = $NF
# failed opens have return values beginning with 0xfffff (negative errno)
if (opt_fail && rval !~ /0xfffff/)
next
if (rval ~ /0xfffff/)
rval = -1
if (opt_time) {
time = $(3+o); sub(":", "", time)
printf "%-16s ", time
}
printf "%-16.16s %-6s %4s %s\n", comm, pid, rval, filename
}
$0 ~ /LOST.*EVENTS/ { print "WARNING: " $0 > "/dev/stderr" }
'
### end tracing
end
perf-tools-unstable-0.0.1~20150130+git85414b0/syscount 0000775 0000000 0000000 00000014447 12542613570 0021672 0 ustar 00root root 0000000 0000000 #!/bin/bash
#
# syscount - count system calls.
# Written using Linux perf_events (aka "perf").
#
# This is a proof-of-concept using perf_events capabilities for older kernel
# versions, that lack custom in-kernel aggregations. Once they exist, this
# script can be substantially rewritten and improved (lower overhead).
#
# USAGE: syscount [-chv] [-t top] {-p PID|-d seconds|command}
#
# Run "syscount -h" for full usage.
#
# REQUIREMENTS: Linux perf_events: add linux-tools-common, run "perf", then
# add any additional packages it requests. Also needs awk.
#
# OVERHEADS: Modes that report syscall names only (-c, -cp PID, -cd secs) have
# lower overhead, since they use in-kernel counts. Other modes which report
# process IDs (-cv) or process names (default) create a perf.data file for
# post processing, and you will see messages about it doing this. Beware of
# the file size (test for short durations, or use -c to see counts based on
# in-kernel counters), and gauge overheads based on the perf.data size.
#
# Note that this script deliberately does not pipe perf record into
# perf script, which would avoid perf.data, because it can create a feedback
# loop where the perf script syscalls are recorded. Hopefully there will be a
# fix for this in a later perf version, so perf.data can be skipped, or other
# kernel features to aggregate by process name in-kernel directly (eg, via
# eBPF, ktap, or SystemTap).
#
# From perf-tools: https://github.com/brendangregg/perf-tools
#
# See the syscount(8) man page (in perf-tools) for more info.
#
# COPYRIGHT: Copyright (c) 2014 Brendan Gregg.
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; either version 2
# of the License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software Foundation,
# Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
#
# (http://www.gnu.org/copyleft/gpl.html)
#
# 07-Jul-2014 Brendan Gregg Created this.
# default variables
opt_count=0; opt_pid=0; opt_verbose=0; opt_cmd=0; opt_duration=0; opt_tail=0
tnum=; pid=; duration=; cmd=; cpus=-a; opts=; tcmd=cat; ttext=
trap '' INT QUIT TERM PIPE HUP
stdout_workaround=1 # needed for older perf versions
write_workaround=1 # needed for perf versions that trace their own writes
### parse options
while getopts cd:hp:t:v opt
do
case $opt in
c) opt_count=1 ;;
d) opt_duration=1; duration=$OPTARG ;;
p) opt_pid=1; pid=$OPTARG ;;
t) opt_tail=1; tnum=$OPTARG ;;
v) opt_verbose=1 ;;
h|?) cat <<-END >&2
USAGE: syscount [-chv] [-t top] {-p PID|-d seconds|command}
syscount # count by process name
-c # show counts by syscall name
-h # this usage message
-v # verbose: shows PID
-p PID # trace this PID only
-d seconds # duration of trace
-t num # show top number only
command # run and trace this command
eg,
syscount # syscalls by process name
syscount -c # syscalls by syscall name
syscount -d 5 # trace for 5 seconds
syscount -cp 923 # syscall names for PID 923
syscount -c ls # syscall names for "ls"
See the man page and example file for more info.
END
exit 1
esac
done
shift $(( $OPTIND - 1 ))
### option logic
if (( $# > 0 )); then
opt_cmd=1
cmd="$@"
cpus=
fi
if (( opt_pid + opt_duration + opt_cmd > 1 )); then
echo >&2 "ERROR: Pick one of {-p PID|-n name|-d seconds|command}"
exit 1
fi
if (( opt_tail )); then
tcmd="tail -$tnum"
ttext=" Top $tnum only."
fi
if (( opt_duration )); then
cmd="sleep $duration"
echo "Tracing for $duration seconds.$ttext.."
fi
if (( opt_pid )); then
cpus=
cmd="-p $pid"
echo "Tracing PID $pid.$ttext.. Ctrl-C to end."
fi
(( opt_cmd )) && echo "Tracing while running: \"$cmd\".$ttext.."
(( opt_pid + opt_duration + opt_cmd == 0 )) && \
echo "Tracing.$ttext.. Ctrl-C to end."
(( stdout_workaround )) && opts="-o /dev/stdout"
ulimit -n 32768 # often needed
### execute syscall name mode
if (( opt_count && ! opt_verbose )); then
: ${cmd:=sleep 999999}
out=$(perf stat $opts -e 'syscalls:sys_enter_*' $cpus $cmd)
printf "%-17s %8s\n" "SYSCALL" "COUNT"
echo "$out" | awk '
$1 && $2 ~ /syscalls:/ {
sub("syscalls:sys_enter_", ""); sub(":", "")
gsub(",", "")
printf "%-17s %8s\n", $2, $1
}' | sort -n -k2 | $tcmd
exit
fi
### execute syscall name with pid mode
if (( opt_count && opt_verbose )); then
if (( write_workaround )); then
# this list must end in write to associate the filter
tp=$(perf list syscalls:sys_enter_* | awk '
$1 != "syscalls:sys_enter_write" { printf "-e %s ", $1 }')
tp="$tp -e syscalls:sys_enter_write"
sh -c "perf record $tp --filter 'common_pid != '\$\$ $cpus $cmd"
else
perf record 'syscalls:sys_enter_*' $cpus $cmd
# could also pipe direct to perf script
fi
printf "%-6s %-16s %-17s %8s\n" "PID" "COMM" "SYSCALL" "COUNT"
perf script -f pid,comm,event | awk '$1 != "#" {
sub("sys_enter_", ""); sub(":", "")
a[$1 ";" $2 ";" $3]++
}
END {
for (k in a) {
split(k, b, ";");
printf "%-6s %-16s %-17s %8d\n", b[2], b[1], b[3], a[k]
}
}' | sort -n -k4 | $tcmd
exit
fi
### execute process name mode
tp="-e raw_syscalls:sys_enter"
if (( write_workaround )); then
sh -c "perf record $tp --filter 'common_pid != '\$\$ $cpus $cmd"
else
perf record $tp $cpus $cmd
fi
if (( opt_verbose )); then
printf "%-6s %-16s %8s\n" "PID" "COMM" "COUNT"
perf script -f pid,comm | awk '$1 != "#" { a[$1 ";" $2]++ }
END {
for (k in a) {
split(k, b, ";");
printf "%-6s %-16s %8d\n", b[2], b[1], a[k]
}
}' | sort -n -k3 | $tcmd
else
printf "%-16s %8s\n" "COMM" "COUNT"
perf script -f comm | awk '$1 != "#" { a[$1]++ }
END {
for (k in a) {
printf "%-16s %8d\n", k, a[k]
}
}' | sort -n -k2 | $tcmd
fi
perf-tools-unstable-0.0.1~20150130+git85414b0/system/ 0000775 0000000 0000000 00000000000 12542613570 0021367 5 ustar 00root root 0000000 0000000 perf-tools-unstable-0.0.1~20150130+git85414b0/system/tpoint 0000775 0000000 0000000 00000013505 12542613570 0022636 0 ustar 00root root 0000000 0000000 #!/bin/bash
#
# tpoint - trace a given tracepoint. Static tracing.
# Written using Linux ftrace.
#
# This will enable a given tracepoint, print events, then disable the tracepoint
# when the program ends. This is like a simple version of the "perf" command for
# printing live tracepoint events only. Wildcards are currently not supported.
# If this is insufficient for any reason, use the perf command instead.
#
# USAGE: ./tpoint [-hHsv] [-d secs] [-p pid] tracepoint [filter]
# ./tpoint -l
#
# Run "tpoint -h" for full usage.
#
# I wrote this because I often needed a quick way to dump stack traces for a
# given tracepoint.
#
# OVERHEADS: Relative to the frequency of traced events. You might want to
# check their frequency beforehand, using perf_events. Eg:
#
# perf stat -e block:block_rq_issue -a sleep 5
#
# To count occurrences of that tracepoint for 5 seconds.
#
# REQUIREMENTS: FTRACE and tracepoints, which you may already have on recent
# kernel versions.
#
# From perf-tools: https://github.com/brendangregg/perf-tools
#
# See the tpoint(8) man page (in perf-tools) for more info.
#
# COPYRIGHT: Copyright (c) 2014 Brendan Gregg.
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; either version 2
# of the License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software Foundation,
# Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
#
# (http://www.gnu.org/copyleft/gpl.html)
#
# 22-Jul-2014 Brendan Gregg Created this.
### default variables
tracing=/sys/kernel/debug/tracing
flock=/var/tmp/.ftrace-lock; wroteflock=0
opt_duration=0; duration=; opt_pid=0; pid=; opt_filter=0; filter=
opt_view=0; opt_headers=0; opt_stack=0; opt_list=0; dmesg=2
trap ':' INT QUIT TERM PIPE HUP # sends execution to end tracing section
function usage {
cat <<-END >&2
USAGE: tpoint [-hHsv] [-d secs] [-p PID] tracepoint [filter]
tpoint -l
-d seconds # trace duration, and use buffers
-p PID # PID to match on events
-v # view format file (don't trace)
-H # include column headers
-l # list all tracepoints
-s # show kernel stack traces
-h # this usage message
eg,
tpoint -l | grep open
# find tracepoints containing "open"
tpoint syscalls:sys_enter_open
# trace open() syscall entry
tpoint block:block_rq_issue
# trace block I/O issue
tpoint -s block:block_rq_issue
# show kernel stacks
See the man page and example file for more info.
END
exit
}
function warn {
if ! eval "$@"; then
echo >&2 "WARNING: command failed \"$@\""
fi
}
function end {
# disable tracing
echo 2>/dev/null
echo "Ending tracing..." 2>/dev/null
cd $tracing
warn "echo 0 > $tdir/enable"
if (( opt_filter )); then
warn "echo 0 > $tdir/filter"
fi
(( opt_stack )) && warn "echo 0 > options/stacktrace"
warn "echo > trace"
(( wroteflock )) && warn "rm $flock"
}
function die {
echo >&2 "$@"
exit 1
}
function edie {
# die with a quiet end()
echo >&2 "$@"
exec >/dev/null 2>&1
end
exit 1
}
### process options
while getopts d:hHlp:sv opt
do
case $opt in
d) opt_duration=1; duration=$OPTARG ;;
p) opt_pid=1; pid=$OPTARG ;;
H) opt_headers=1 ;;
l) opt_list=1 ;;
s) opt_stack=1 ;;
v) opt_view=1 ;;
h|?) usage ;;
esac
done
if (( !opt_list )); then
shift $(( $OPTIND - 1 ))
(( $# )) || usage
tpoint=$1
shift
if (( $# )); then
opt_filter=1
filter=$1
fi
fi
### option logic
(( opt_pid && opt_filter )) && die "ERROR: use either -p PID or a filter expression."
(( opt_duration && opt_view )) && die "ERROR: use either -d or -v."
if (( opt_pid )); then
# convert to filter
opt_filter=1
filter="common_pid == $pid"
fi
if (( !opt_view && !opt_list )); then
if (( opt_duration )); then
echo "Tracing $tpoint for $duration seconds (buffered)..."
else
echo "Tracing $tpoint. Ctrl-C to end."
fi
fi
### check permissions
cd $tracing || die "ERROR: accessing tracing. Root user? Kernel has FTRACE?
debugfs mounted? (mount -t debugfs debugfs /sys/kernel/debug)"
### do list tracepoints
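# tracepoints are exposed as events/<subsystem>/<name> directories; print them
# in the usual subsystem:name form (eg, events/block/block_rq_issue is printed
# as block:block_rq_issue)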
if (( opt_list )); then
cd events
for tp in */*; do
# skip filter/enable files
[[ -f $tp ]] && continue
echo ${tp/\//:}
done
exit
fi
### check tracepoints
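# map subsystem:name back to its events directory,
# eg, block:block_rq_issue -> events/block/block_rq_issue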
tdir=events/${tpoint/:/\/}
[[ -e $tdir ]] || die "ERROR: tracepoint $tpoint not found. Exiting"
### view
if (( opt_view )); then
cat $tdir/format
exit
fi
### ftrace lock
[[ -e $flock ]] && die "ERROR: ftrace may be in use by PID $(cat $flock) ($flock)"
echo $$ > $flock || die "ERROR: unable to write $flock."
wroteflock=1
### setup and begin tracing
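# select the nop tracer so the trace buffer carries only the tracepoint events
# enabled below, with no function tracing mixed in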
echo nop > current_tracer
if (( opt_filter )); then
if ! echo "$filter" > $tdir/filter; then
edie "ERROR: setting filter or -p. Exiting."
fi
fi
if (( opt_stack )); then
if ! echo 1 > options/stacktrace; then
edie "ERROR: enabling stack traces (-s). Exiting"
fi
fi
if ! echo 1 >> $tdir/enable; then
edie "ERROR: enabling tracepoint $tprobe. Exiting."
fi
### print trace buffer
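# clear the trace buffer first so events buffered during setup aren't reported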
warn "echo > trace"
if (( opt_duration )); then
sleep $duration
if (( opt_headers )); then
cat trace
else
grep -v '^#' trace
fi
else
# trace_pipe lacks headers, so fetch them from trace
(( opt_headers )) && cat trace
cat trace_pipe
fi
### end tracing
end
perf-tools-unstable-0.0.1~20150130+git85414b0/tools/ 0000775 0000000 0000000 00000000000 12542613570 0021203 5 ustar 00root root 0000000 0000000 perf-tools-unstable-0.0.1~20150130+git85414b0/tools/reset-ftrace 0000775 0000000 0000000 00000006265 12542613570 0023526 0 ustar 00root root 0000000 0000000 #!/bin/bash
#
# reset-ftrace - reset state of ftrace, disabling all tracing.
# Written for Linux ftrace.
#
# This may only be of use to ftrace hackers who, in the process of developing
# ftrace software, often get the subsystem into a partially active state, and
# would like a quick way to reset state. Check the end of this script for the
# actual files reset, and add more if you need.
#
# USAGE: ./reset-ftrace [-fhq]
#
# REQUIREMENTS: FTRACE CONFIG.
#
# From perf-tools: https://github.com/brendangregg/perf-tools
#
# See the reset-ftrace(8) man page (in perf-tools) for more info.
#
# COPYRIGHT: Copyright (c) 2014 Brendan Gregg.
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; either version 2
# of the License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software Foundation,
# Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
#
# (http://www.gnu.org/copyleft/gpl.html)
#
# 20-Jul-2014 Brendan Gregg Created this.
tracing=/sys/kernel/debug/tracing
flock=/var/tmp/.ftrace-lock
opt_force=0; opt_quiet=0
function usage {
cat <<-END >&2
USAGE: reset-ftrace [-fhq]
-f # force: delete ftrace lock file
-q # quiet: reset, but say nothing
-h # this usage message
eg,
reset-ftrace # disable active ftrace session
END
exit
}
function warn {
if ! eval "$@"; then
echo >&2 "WARNING: command failed \"$@\""
fi
}
function die {
echo >&2 "$@"
exit 1
}
function vecho {
(( opt_quiet )) && return
echo "$@"
}
# write to file
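# usage: writefile file [string]
# eg, writefile current_tracer nop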
function writefile {
file=$1
string=$2 # optional
if [[ ! -w $file ]]; then
echo >&2 "WARNING: file $file not writable/exists. Skipping."
return
fi
vecho "$file, before:"
(( ! opt_quiet )) && cat -n $file
warn "echo $string > $file"
vecho "$file, after:"
(( ! opt_quiet )) && cat -n $file
vecho
}
### process options
while getopts fhq opt
do
case $opt in
f) opt_force=1 ;;
q) opt_quiet=1 ;;
h|?) usage ;;
esac
done
shift $(( $OPTIND - 1 ))
### ftrace lock
if [[ -e $flock ]]; then
if (( opt_force )); then
warn rm $flock
else
echo -e >&2 "ERROR: ftrace lock ($flock) exists. It shows" \
"ftrace may be in use by PID $(cat $flock).\nDouble check" \
"to see if that PID is still active. If not, consider" \
"using -f to force a reset. Exiting."
exit 1
fi
fi
### reset ftrace state
vecho "Reseting ftrace state..."
vecho
cd $tracing || die "ERROR: accessing tracing. Root user? Kernel has FTRACE?"
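# these writes restore the ftrace defaults: the nop tracer, empty filter,
# graph, PID and kprobe lists, all events disabled, a zero tracing threshold,
# and tracing re-enabled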
writefile current_tracer nop
writefile set_ftrace_filter
writefile set_graph_function
writefile set_ftrace_pid
writefile events/enable 0
writefile tracing_thresh 0
writefile kprobe_events
writefile tracing_on 1
vecho "Done."