collectl-4.3.1/collectl.conf

# Copyright 2003-2009 Hewlett-Packard Development Company, LP

# Like most Linux configuration files, this specifies a set of user controlled
# parameters. In many cases these are commented out which simply means those
# are the default values already used by collectl. To change a value
# uncomment the line and change it. To revert back to the default all you
# need do is recomment the line.

############################
# daemon/service handling
############################

# When someone specifies a daemon is to be run, typically but not limited
# to running collectl as a service, this string will cause the associated
# values to be used. They CAN also be overridden via a command line switch.
# In other words, if DaemonCommands is set to '-s cdm' and collectl is
# invoked with -D, it will process subsystems 'cdm'. However, if it is invoked
# with '-D -s mnp' it will process subsystems 'mnp', there is no combining the
# two sets of values. Be sure to include any switches a daemon is required to
# have such as -f and either -r or -R.
# NOTE - if things aren't behaving as expected, you can always try running
# collectl in non-daemon mode just to see if there are any error messages. If
# you include the -m switch, you can also look in the collectl log, which is
# stored in the logging directory.
DaemonCommands = -f /var/log/collectl -r00:00,7 -m -F60 -s+YZ

# This defines the location to look for all additional required files if
# formatit.ph is not in the same directory as collectl itself.
#ReqDir = /usr/share/collectl

# E x t r a   L i b r a r i e s

# So far this has only been used during development, but if there are extra
# library locations that should be 'used', put them here.
#Libraries =

# S t a n d a r d   U t i l i t i e s

# Note that by default collectl will look for lspci in both /sbin and
# /usr/sbin, but if listed here will only look in that one place.
#Grep = /bin/grep
#Egrep = /bin/egrep
#Ps = /bin/ps
#Rpm = /bin/rpm
#Lspci = /sbin/lspci
#Lctl = /usr/sbin/lctl

# I n f i n i b a n d   S u p p o r t

# Collectl will assume open fabric and will attempt to use the perfquery
# utility to get the counters. If not there, it assumes Voltaire and will
# first look in /proc/voltaire/adaptor-mlx/stats and failing that will use
# the get_pcounter utility. Since collectl resets IB counters in the
# hardware you can disable its collection by commenting out the appropriate
# variable below. PQuery for OFED, PCounter for get_pcounter calls and
# VStat for ALL non-ofed access of any kind.
# You can disable either by commenting out the reference to VStat/PQuery below.
PQuery = /usr/sbin/perfquery:/usr/bin/perfquery:/usr/local/ofed/bin/perfquery
PCounter = /usr/mellanox/bin/get_pcounter
VStat = /usr/mellanox/bin/vstat:/usr/bin/vstat
OfedInfo = /usr/bin/ofed_info:/usr/local/ofed/bin/ofed_info

# D e f a u l t s

# These variables are actually all set in collectl and you need not
# change them.

# This parameter controls subsystem selection. The 'core' subsystems are
# selected when the user omits the -s switch OR uses the '+' or '-' to
# add/remove from that list. Note that changing this will also change the
# default for -s displayed in help.
#SubsysCore = bcdfijlmnstx

# Although these can all be overridden by switches, they're assumed to
# always be defined so don't remove or comment any of them out! Over time
# more may be added
#Interval = 10
#Interval2 = 60
#Interval3 = 120

# These are SFS lustre specific. When using the -OD switch, any partitions
# found to be smaller than LustreSvcLunMax, which is in GB, will be ignored.
# When displaying data in verbose mode, only LustreMaxBlkSize will be
# displayed, but ALL block sizes will be read and recorded
#LustreSvcLunMax = 10
#LustreMaxBlkSize = 512

# By default, we check at these frequencies to see if lustre or interconnect
# configurations have changed. Things are efficient enough that now we can
# check for lustre changes every polling interval but I'm leaving the code
# in place rather than remove it in case needed again in the future.
#LustreConfigInt = 1
#InterConnectInt = 900

# These apply to disk/partition limits for exception (-o x/X) processing
#LimSVC = 30 # Minimum partition Avg Service time
#LimIOS = 10 # Minimum number of Disk OR Partition I/Os
#LimBool = 0 # generate exception record if EITHER limit exceeded
#LimLusKBS = 100 # Minimum number of Lustre OSS KB/sec
#LimLusReints = 1000 # Minimum number of Lustre MDS Reint operations

# Socket I/O Defaults
#Port = 2655
#Timeout = 10

# Maximum allowable zlib errors in a single day or run.
#MaxZlibErrors = 20

# To disable bogus network data checking, set this to any negative value
#DefNetSpeed=10000

# Collectl will automatically size the frequency of headers in 'brief format'
# to the height of your display window which it determines using the resize
# utility. If that utility can't be found, it will use the height specified
# in 'TermHeight'. If 'resize' is in your path but you want a fixed/different
# size, comment out the Resize line and uncomment TermHeight, setting it to
# what you want.
#TermHeight = 24
Resize=/usr/bin/resize:/usr/X11R6/bin/resize

# To turn off Time::HiRes/glibc incompatibility checking, the following
# should be enabled and set to 0
#TimeHiResCheck = 1

# These control environmental monitoring and to use it you MUST have ipmitool
# installed (see http://ipmitool.sourceforge.net/). If not in the path shown
# below, you must change it.
Ipmitool = /usr/bin/ipmitool:/usr/local/bin/ipmitool:/opt/hptc/sbin/ipmitool
IpmiCache = /var/run/collectl-ipmicache
IpmiTypes = fan,temp,current

# passwd file for UID to usernames mapping during process monitoring
#Passwd = /etc/passwd

# If a cciss device is reset (such as during a lun scan) while collectl is
# running, disk rates will be excessive. If one is seen above the following
# value, reset ALL stats for that disk to 0. To disable set this to -1
#DiskMaxValue=5000000

# When collectl reads disk data, it filters out any that don't match the
# DiskFilter, which by default looks for cciss, hd, sd, xvd, dm, emcpower and
# psv. All others are ignored. To change the filter, set the string below to
# those you want to keep BUT you need to know what a perl regular expression
# looks like or you may not get the desired results.
# CAUTION - white space is CRITICAL for this to work.
#DiskFilter = /hd[ab] | sd[a-z]+ |dm-\d+ |xvd[a-z] |fio[a-z]+ | vd[a-z]+ |emcpower[a-z]+ |psv\d+ |nvme\d+n\d+ /

# Kernel Efficiency Test
# On kernels 2.6.32 forward (and you can't tell how distros patched) there is
# a read inefficiency in the /proc filesystem for 4 or more sockets and the
# only way to tell is to test it. If slow, generate a warning that patching
# the kernel may be recommended. To bypass the test/message, set the
# following to 'no'
#ProcReadTest = yes

collectl-4.3.1/docs/SlowProc.html
collectl - Slow Proc Access?

Is Your Kernel Reading /proc Too Slowly?

Introduction

The 2.6.32 release of the kernel introduced a regression that causes /proc to be read significantly more slowly and with significantly more overhead on systems with high core counts. In fact, this overhead has been measured at over a factor of 50 when reading /proc/stat on a system with 8 sockets and 48 cores.

The good news is newer RedHat and SUSE distros have been updated to mitigate this problem, specifically RHEL 6.2 and SLES SP1. As for other distros, I just don't have the access to verify everything, so if you are running a different distro and can verify this problem has been resolved, I'd appreciate hearing about which specific version addresses it so I can publish the news here.

So why should you even care about this? You may have high core counts and be running a kernel that has not yet been patched. This probably won't have any impact on your running applications, but do you ever run top? iostat? sar? Or any other monitoring tools? If you're reading this you probably run collectl. Most monitoring tools are fairly light-weight and for a good reason - if you're trying to measure something you don't want the tool's overhead to get in the way. Unfortunately, with this regression it now will!

The Analysis

Whenever a monitoring change is made to collectl I measure its overhead, just to make sure it's being done efficiently. The way I do this is to simulate a day's worth of sampling at an interval of 0 seconds and time it. In this case I had recently added numa monitoring and initially found this problem reading /sys/devices/system/node, but with more testing found it appears elsewhere as well.

In the following example, you can see monitoring CPU data takes about 3 seconds to read almost 9K samples and write them to a file on a 2-socket/dual-core system. Very efficient!

time collectl -sc -i0 -c 8640 -f/tmp
real    0m2.879s
user    0m1.908s
sys     0m0.913s
Next I ran the same command on an 8-socket/48-core system; look at the difference. Note that the overhead was so high and it took so long that I only took 1/10th the number of samples (I get impatient). This system is running Fedora 14, which has a 2.6.35 kernel, and this run alone took over 5 times as long as the previous example, which when normalized to a full day of samples would be over 50 times the load:
time collectl -sc -i0 -c 864 -f/tmp
real    0m16.783s
user    0m3.003s
sys     0m13.523s
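
To make the "over 50 times" figure concrete, the normalization can be worked through as a short calculation. This is only an illustrative sketch in Python (collectl itself is written in Perl); the function name is hypothetical and the timings are the `real` values reported by the two runs above.

```python
def seconds_per_sample(elapsed_seconds, samples):
    """Average wall-clock cost of collecting a single sample."""
    return elapsed_seconds / samples


# 'real' times and sample counts taken from the two runs above
small = seconds_per_sample(2.879, 8640)   # 2-socket/dual-core system
large = seconds_per_sample(16.783, 864)   # 8-socket/48-core system

raw_ratio = 16.783 / 2.879                # elapsed time, ignoring sample count
per_sample_ratio = large / small          # normalized to the same sample count

print(f"raw elapsed ratio: {raw_ratio:.1f}x")
print(f"per-sample ratio:  {per_sample_ratio:.1f}x")
```

Because the second run took only one tenth of the samples, the per-sample ratio is ten times the raw elapsed-time ratio.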

How can you tell if your system has this problem?

Before you panic, there are 2 things to keep in mind:

Since a simple uname command will tell you your kernel version, you might think that's all it takes, but nothing is ever that simple because most vendors patch their kernels and you can't always be sure what code it's actually running.

One simple way to tell for sure is to run the very simple test below, which times a read of /proc/stat (which seems to be the most heavily affected) by using strace to see how much time is spent in the actual read.

The following is on my 2-socket/dual-core system:

strace -c cat /proc/stat>/dev/null
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00    0.000251         251         1           execve
  0.00    0.000000           0         3           read
  0.00    0.000000           0         1           write
  0.00    0.000000           0         4           open
  0.00    0.000000           0         5           close
  0.00    0.000000           0         5           fstat
  0.00    0.000000           0         8           mmap
  0.00    0.000000           0         3           mprotect
  0.00    0.000000           0         1           munmap
  0.00    0.000000           0         3           brk
  0.00    0.000000           0         1         1 access
  0.00    0.000000           0         1           uname
  0.00    0.000000           0         1           arch_prctl
------ ----------- ----------- --------- --------- ----------------
100.00    0.000251                    37         1 total
while the following is on the 8-socket/48-core system:
strace -c cat /proc/stat >/dev/null
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00    0.014997        4999         3           read
  0.00    0.000000           0         1           write
  0.00    0.000000           0        20        16 open
  0.00    0.000000           0         6           close
  0.00    0.000000           0        12        10 stat
  0.00    0.000000           0         5           fstat
  0.00    0.000000           0         8           mmap
  0.00    0.000000           0         3           mprotect
  0.00    0.000000           0         1           munmap
  0.00    0.000000           0         4           brk
  0.00    0.000000           0         1         1 access
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         1           arch_prctl
------ ----------- ----------- --------- --------- ----------------
100.00    0.014997                    66        27 total
As you can see the differences are dramatic! On my 4-core machine virtually 0 time is spent doing the read while on the 48 core machine almost 15 msec is spent reading, and that's only reading /proc/stat one time! Also remember - monitoring tools typically read a lot of different /proc structures. Perhaps now you can get a better appreciation of how significant this problem really is.
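
If strace is not available, a rough equivalent can be sketched by timing the read directly. The following Python sketch is an assumption-laden stand-in, not the test collectl itself performs: the function name and the repeat count are hypothetical, and it measures the whole open/read/close cycle rather than only the read system call that strace -c isolates.

```python
import os
import time


def time_read(path="/proc/stat", repeats=100):
    """Return the average seconds taken to open and fully read `path`."""
    start = time.perf_counter()
    for _ in range(repeats):
        with open(path, "rb") as handle:
            handle.read()
    return (time.perf_counter() - start) / repeats


# Only meaningful on Linux, where /proc/stat exists
if os.path.exists("/proc/stat"):
    print(f"{time_read() * 1e6:.0f} usec per read of /proc/stat")
```

Comparing the result between a small machine and a high core count machine should show the same kind of gap as the strace tables, though the absolute numbers will differ.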

What is collectl doing about this?

While there is nothing collectl can do to reduce this overhead, starting with the next release it will include code that, upon startup, times the reading of /proc on newer kernels with core counts of 32 or more and warns users if their systems exhibit this problem. It will also skip this test for RHEL 6.2 and SLES SP1. As I hear of other distros that have also fixed this problem, I'll start excluding them from the test as well.

updated Jan 18, 2012
collectl-4.3.1/docs/BuddyInfo.html
collectl - Memory Fragmentation

Memory Fragmentation

Introduction

Version 3.2.1 of collectl introduces support for /proc/buddyinfo, which shows the distribution of memory fragments where the size of each fragment is a power of 2 pages. The memory is categorized by node, depending on the system architecture, and then subcategorized by the type of memory, referred to as a zone. For example, /proc/buddyinfo might look like:
Node 0, zone      DMA      5      5      3      4      2      4      3      1      0      0      0
Node 0, zone    DMA32     79     61      4     12      0      0      0      2      0      1      0
Node 1, zone    DMA32    134     57     27     60      0      0      1      1      0      1      0
Node 2, zone   Normal    865    357     37      1      6      0      2      1      0      1      0
Node 3, zone   Normal    651     47     19     10      1      1      1      1      0      1      0
Running collectl with -sb --verbose would produce a single line of output that shows the totals of each column. For example, taking 1 second samples with timestamps included:
collectl -sb --verbose -oT
# MEMORY FRAGMENTATION SUMMARY (4K pages)
#                1       2       4       8      16      32      64     128     256     512    1024
16:11:26      1296     483     157      85       9       5       7       6       0       4       0
16:11:27      1354     485     163      87       9       5       7       6       0       4       0
16:11:28      1395     480     165      89       9       5       7       6       0       4       0
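
The column totals in the summary can be reproduced with a few lines of code. The sketch below is illustrative Python rather than collectl's own Perl, and the function name is hypothetical; it assumes every line has the "Node N, zone NAME counts..." layout shown in the /proc/buddyinfo sample above and that all lines carry the same number of columns.

```python
def sum_buddyinfo(text):
    """Sum each free-fragment column across all node/zone lines."""
    totals = []
    for line in text.splitlines():
        fields = line.split()
        # Expected layout: Node <n>, zone <name> <count> <count> ...
        if len(fields) < 5 or fields[0] != "Node":
            continue
        counts = [int(value) for value in fields[4:]]
        if not totals:
            totals = [0] * len(counts)
        for index, count in enumerate(counts):
            totals[index] += count
    return totals


# The /proc/buddyinfo sample shown earlier in this section
SAMPLE = """\
Node 0, zone      DMA      5      5      3      4      2      4      3      1      0      0      0
Node 0, zone    DMA32     79     61      4     12      0      0      0      2      0      1      0
Node 1, zone    DMA32    134     57     27     60      0      0      1      1      0      1      0
Node 2, zone   Normal    865    357     37      1      6      0      2      1      0      1      0
Node 3, zone   Normal    651     47     19     10      1      1      1      1      0      1      0
"""

print(sum_buddyinfo(SAMPLE))
```

Each position in the returned list corresponds to one fragment size, from 1 page up to 1024 pages.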
And in detail mode we see one line per entry, again with timestamps:
collectl -sB -oT
# MEMORY FRAGMENTATION (4K pages)
#         Node    Zone        1       2       4       8      16      32      64     128     256     512    1024
16:13:33     0     DMA        5       5       3       4       2       4       3       1       0       0       0
16:13:33     0   DMA32      175      97       3      12       0       0       0       2       0       1       0
16:13:33     1   DMA32      933     389      60      68       0       0       1       1       0       1       0
16:13:33     2  Normal        0       1       8       1       6       0       2       1       0       1       0
16:13:33     3  Normal        1       2      57      10       1       1       1       1       0       1       0
Where things get interesting is in brief mode. The challenge here is to show the maximum amount of information in the least amount of space, thus allowing you to look at other information on the same line as well. To better understand the methodology chosen for this, think in terms of base 36. However, instead of mapping each character to a number from 0 to 35, we're going to do something very different. Since we're mainly interested in seeing what's happening as the numbers of fragments shrink: This results in a display like the following:
collectl -sb -oT
#         <---Memory-->
#Time        Fragments
16:24:13   qmji9576040
16:24:14   rmji9576040
16:24:15   smji9576040
16:24:16   smji9576040
16:24:17   rmji9576040
As a final example of what this looks like when combined with other data, remembering you can choose virtually any combination of subsystem for your display both in collection and playback modes:
collectl -sbcmn -oT
#         <--------CPU--------><------------------Memory-----------------><----------Network---------->
#Time     cpu sys inter  ctxsw Free Buff Cach Inac Slab  Map    Fragments   KBIn  PktIn  KBOut  PktOut
16:44:46    0   0  1029    146  23M 178M   6G   5G 461M 234M  lljj9576040      2      8      0       2
16:44:47    0   0  1020    136  24M 178M   6G   5G 461M 234M  nljj9576040      2      8      1       2
16:44:48    1   0  1062    371  22M 178M   6G   5G 461M 235M  kljj9576040      3     31      2      27
16:44:49    0   0  1009    146  22M 178M   6G   5G 461M 235M  kljj9576040     14     13      0       2
updated Feb 12, 2009
collectl-4.3.1/docs/WhySummary.html
collectl - Why Summary

Why would I monitor summary data?

Introduction

The answer to this question is fundamental to understanding system monitoring in general, whether you're using collectl or some other utilities. To really understand what your system is actually doing you should be looking at individual disks, cpus or networks. But in reality that's often too much to keep track of unless of course you have a single-cpu/disk system.

Let's say you have multiple devices such as disks and the one you're interested in is misbehaving and reading or writing too slowly. Won't the total disk activity also be low? Similarly, if disk traffic is being reported high on a lightly loaded system, won't this also jump out at you by simply looking at the total disk activity? The same can be said about networks and most other subsystems for which there is summary data, and simply looking at the totals will often alert you to the fact that something is not right. The key thing to keep in mind is that you are looking at totals.

CPU monitoring can be a little trickier as these are reported as averages as opposed to totals, and as the number of cores increases so does the divisor of the calculation. In most cases when you have a system with excessive load, it will affect all CPUs and so will be very visible even as an average, but in some cases it won't. What if you have a 2 core system and see a CPU load of 45% when you're expecting a much lighter load? Looking at individual CPUs you may see one running at near zero load and the second at 90%. Your only clue was that the 45% load was unexpected and so you looked closer. But what if you had a heavily loaded CPU on a 48 core system? You'd never even realize it. In other words, just pay attention.
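
The effect of the growing divisor is easy to see with a little arithmetic. The sketch below is illustrative Python with a hypothetical function name, and it assumes the summary figure is a plain mean of the per-CPU loads.

```python
def average_load(per_cpu_loads):
    """Summary CPU load as a simple mean of the per-CPU percentages."""
    return sum(per_cpu_loads) / len(per_cpu_loads)


two_core = [0, 90]              # one idle CPU, one at 90%
many_core = [100] + [0] * 47    # one saturated CPU out of 48

print(average_load(two_core))             # 45.0
print(round(average_load(many_core), 1))  # 2.1
```

A single fully loaded CPU moves the 48-core average by only about 2%, which is why it is so easy to miss in summary data.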

Stated slightly differently, summary data is often a starting point to help identify potential trouble areas and from there you can determine if you need to dig deeper.

So why brief data?

If you have ever tried to look at multiple lines of different text and identify what was changing over time you should already know the answer - it's really difficult! For example, here's what collectl might show for CPU, Disk and Network data:

collectl.pl --verbose
### RECORD    1 >>> poker <<< (1314712401.002) (Tue Aug 30 09:53:21 2011) ###

# CPU SUMMARY (INTR, CTXSW & PROC /sec)
# User  Nice   Sys  Wait   IRQ  Soft Steal  Idle  CPUs  Intr  Ctxsw  Proc  RunQ   Run   Avg1  Avg5 Avg15
     0     0     0     0     0     0     0   100     4  1120    192     0   363     0   0.00  0.00  0.00

# DISK SUMMARY (/sec)
#KBRead RMerged  Reads SizeKB  KBWrite WMerged Writes SizeKB
      0       0      0      0        0       0      0      0

# NETWORK SUMMARY (/sec)
# KBIn  PktIn SizeIn  MultI   CmpI  ErrsI  KBOut PktOut  SizeO   CmpO  ErrsO
     0      1     60      0      0      0      0      0      0      0      0

### RECORD    2 >>> poker <<< (1314712402.002) (Tue Aug 30 09:53:22 2011) ###

# CPU SUMMARY (INTR, CTXSW & PROC /sec)
# User  Nice   Sys  Wait   IRQ  Soft Steal  Idle  CPUs  Intr  Ctxsw  Proc  RunQ   Run   Avg1  Avg5 Avg15
     0     0     0     0     0     0     0    99     4  1111    200     0   363     0   0.00  0.00  0.00

# DISK SUMMARY (/sec)
#KBRead RMerged  Reads SizeKB  KBWrite WMerged Writes SizeKB
      0       0      0      0      256      59      5     51

# NETWORK SUMMARY (/sec)
# KBIn  PktIn SizeIn  MultI   CmpI  ErrsI  KBOut PktOut  SizeO   CmpO  ErrsO
     0      2     60      0      0      0      0      3    328      0      0
and that's only 2 samples! Think about trying to watch this level of detail changing every second and identifying what changed. It is extremely difficult, if not impossible. It might be a little easier to watch in top format, which you can get by including --home, but then you lose the previous records to compare the values to.

Now consider the fact that in many cases network errors or disk merges or even the percentage of time the CPU spent processing interrupts, while important, may not be what you need to see when trying to identify anomalous behaviors. And that's where brief mode comes in. Here we identify those few nuggets of information which will tell us whether or not things are functioning as expected, so that we can display them all on the same line and make it easier to spot change. In fact, during the following run I did a ping -f; see how easy it is to spot the network burst?

collectl
#<--------CPU--------><----------Disks-----------><----------Network---------->
#cpu sys inter  ctxsw KBRead  Reads KBWrit Writes   KBIn  PktIn  KBOut  PktOut
   0   0  1124    203      0      0    240      4      0      0      0       0
   0   0  1105    253      0      0     12      2      0      1      0       1
   0   0  1123    206      0      0      0      0      0      3      0       2
   2   1  6051   8584      0      0      0      0    173   2099    297    2860
   3   2  7828  11270      0      0      0      0    222   2770    411    3936
   0   0  1115    204      0      0     92      5      0      5      1       5
   0   0  1121    198      0      0      0      0      0      1      0       1
Now, if you think there is a network problem you can then run collectl in verbose or detail mode and only look at network and not be distracted by other data.

In summary, just keep in mind that there is no single recipe for how to monitor a system, what format to display the output in and how to drill deeper. However, as you become more familiar with the types of data and collectl formats your ability to better utilize collectl will increase.
updated Sept 19, 2011
collectl-4.3.1/docs/Socket.html
collectl - Socket Info

Socket Monitoring

Introduction

Not really a whole lot to say here other than collectl does not report any socket details at this time but only summary data, which it gets from /proc/net/sockstat. In brief format the data reported is the number of sockets in use by their type, specifically TCP, UDP and Raw. It also reports the number of IP fragments. In verbose mode it breaks things out at lower levels of detail.
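
For readers who want to look at the same source data themselves, here is a minimal Python sketch of parsing /proc/net/sockstat. It is not collectl's own code; the function name is hypothetical and the sample text is made up for illustration, following the usual "NAME: counter value counter value" layout of that file.

```python
def parse_sockstat(text):
    """Return {protocol: {counter: value}} from /proc/net/sockstat text."""
    stats = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        tokens = rest.split()
        if not tokens:
            continue
        # Counters come in name/value pairs, e.g. "inuse 8 orphan 0"
        stats[name.strip()] = {
            tokens[i]: int(tokens[i + 1]) for i in range(0, len(tokens) - 1, 2)
        }
    return stats


# Illustrative sample only, not taken from a real system
SOCKSTAT_SAMPLE = """\
sockets: used 285
TCP: inuse 8 orphan 0 tw 0 alloc 10 mem 1
UDP: inuse 5 mem 2
RAW: inuse 0
FRAG: inuse 0 memory 0
"""

stats = parse_sockstat(SOCKSTAT_SAMPLE)
print(stats["TCP"]["inuse"], stats["UDP"]["inuse"], stats["RAW"]["inuse"])
```

On a live Linux system the same function can be fed the contents of /proc/net/sockstat.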

In most cases this is of minimal interest unless you're trying to track down a specific socket related problem. In the cases of a runaway process or someone opening but not closing sockets this number has been seen to grow quite large and even consume all resources causing a system crash, but those cases are pretty rare. In any event, during times of strange behavior it can't hurt to have a look at these numbers if for no other reason than to rule out socket problems.
updated August 30, 2011
collectl-4.3.1/docs/colmux.html
Colmux

Colmux

Introduction

Have you ever seen an nfs server getting beaten up but didn't know which of the many hundreds of clients were doing the beating? Or have you wondered if an application was leaking memory when it ran but there was no easy way to observe all the memory on all the nodes at the same time? Or how about whether or not a few disks in a large farm had slow access times and so were slowing down all the disks? It has always been easy to observe all of these types of behaviors with collectl one node at a time, or even plot the data after the fact with colplot. But observing cluster-wide activity in real-time has never been that easy, until now.

As its name implies, colmux is a collectl multiplexor, which allows one to collect data from multiple systems and treat it as a single data stream, essentially extending collectl's functionality to a set of hosts rather than a single one. Colmux has been tested on clusters of over 1000 nodes but one should also take note that this will put a heavier load on the system on which colmux is running.

Colmux runs in 2 distinct modes: Real-Time and Playback. In real-time mode, colmux actually communicates with instances of collectl running on remote systems which in turn are collecting real-time performance metrics. In playback mode colmux also communicates with a remote copy of collectl but in this case collectl is playing back a data file collected some time in the past.

Colmux can also provide its output in 2 distinct formats: single-line and multi-line. In single-line format colmux reports the multiplexed data from all systems on a single line by allowing the user to choose a small number of variables to display, based on both the display width and the number of systems. While it is possible to handle more than a couple of dozen systems (see the example at the bottom of this page), one rarely does so because of the screen width. However, it is also possible to redirect the output to a file for off-line viewing, via a text editor or a spreadsheet.

Colmux has been extensively tested on versions of collectl from V3.3.6 forward and there have been some additional enhancements made to V3.5.0, which is the recommended minimal version. You should first make sure all the systems of interest have the latest versions of collectl installed or at least those at V3.3.6 or newer.

Colmux also provides the ability for dynamic interaction with the keyboard arrow keys if the optional perl module Term::ReadKey has been installed. To see if this is the case and that colmux can find it run with -v and you should see the following:

colmux -v
colmux: 3.0 (Term::ReadKey: V2.30)

Restriction
Colmux requires passwordless ssh between it and all hosts it is monitoring

Using colmux

Although colmux does not have any required switches, -command is one of the two most important as you use it to tell collectl what switches to use when running. Colmux will then take care of multiplexing the command out to multiple instances of collectl, running them in either real-time or playback mode. The other key switch, also not required but typically used, is -address because it identifies the remote system(s) on which to run collectl. The default address is that of the host colmux is running on.

The inclusion of a playback filename in the collectl command instructs colmux to run in playback mode and the use of colmux's -cols switch tells it to produce output in single-line format. By using various combinations of these switches you can get colmux to run in any of 4 distinct modes as shown in the following table:

             Real-Time   Playback
Single-line  -cols       -command "-p filename" -cols
Multi-line   default     -command "-p filename"

Let's discuss these 4 options separately to give a better feel for what they actually mean and when you might use them. Note that the 2 operational modes have nothing to do with the way the data is displayed and the 2 formats have nothing to do with the way the data is collected - in other words a complete separation between form and function.

Real-Time Mode
If you've ever run collectl before, and you probably have if you're looking at these utilities, you already know the real-time nature of the tool. The difference here is that with colmux you're actually able to look at multiple systems at the same time. By default, colmux runs in real-time mode unless you explicitly instruct it to run in playback mode by including -p in the collectl command.

Playback Mode
The way you tell colmux to run in this mode is to simply point the collectl command at a file with the -p switch as you would normally do when you want to play back a file. The main restriction is that the file needs to exist on all the systems you've pointed colmux to, and therefore wildcards are required for the portion of the filename that includes the hostname and are often used for the timestamp portion as well.

Typically one simply uses a collectl playback filename in a format something like /var/log/collectl/*20101225*, wildcarding all but the date of interest. If you want, you can put all the collectl files in one directory on the same system colmux is running on, but this method is restricted to only running on the local system.
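
To see why such a wildcard works across hosts, here is a small Python sketch using shell-style pattern matching. The file names are hypothetical examples that assume a hostname-date-time naming layout; only the pattern itself comes from the text above.

```python
from fnmatch import fnmatch

PATTERN = "/var/log/collectl/*20101225*"

# Hypothetical file names, one per host and day
candidates = [
    "/var/log/collectl/xx1n1-20101225-000000.raw.gz",
    "/var/log/collectl/xx1n2-20101225-000000.raw.gz",
    "/var/log/collectl/xx1n1-20101224-000000.raw.gz",
]

# The leading * absorbs the hostname, the trailing * the time and suffix
matches = [name for name in candidates if fnmatch(name, PATTERN)]
for name in matches:
    print(name)
```

The same pattern selects the file for the date of interest on every host, regardless of the hostname embedded in the name.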

During playback by colmux, only the data falling in the same time interval will be reported and so the header that reports how many nodes have had their data included becomes more meaningful in case there is missing data.

Multi-Line Format
Once you've decided which systems you want to monitor and what collectl command you want to execute, you need to decide whether you're interested in single- or multi-line output. Most users will probably be interested in the multi-line format, at least at first. Think of the linux top command but not being limited to just processes.

Multi-line format reports all data provided by collectl in its original format. Further, it sorts it by a column of the user's choice (the default is column 1) and presents as much data as will fit on the screen. The result is a top-like utility capable of reporting the top consumers of virtually any resource on the cluster, be it the more traditional process statistics or something more exotic such as memory or network consumption.

Since this IS native collectl data it can be virtually anything, including that provided by any custom import modules you may have written. You will also see the identical information in playback mode, though this is presented as scrolling text (there is also a -home switch to display the data in top format if you wish, but unless you include -delay it may scroll by too fast to be of use).

The main consideration with multi-line format is that colmux can only deal with collectl commands that themselves only produce single line output OR multiple lines that are all the same format, noting that data provided by a custom import are considered to be a single device themselves. These include:

While the column number to sort on should also be a consideration, and you can manually select it at startup with -column, you can easily change the sort column once colmux is running by either using the arrow keys (if Term::ReadKey has been installed) or simply typing the desired column number followed by the enter key. This works in both real-time and playback mode.

Below is an example of examining the network traffic on 5 nodes and sorting it by column 2. As you can also see, all 5 nodes have reported data for the interval being displayed and colmux highlights the selected column, though not as the bolded text shown in the examples that follow but rather in reverse video:

colmux -addr 'xx1n[1-5]' -command "-sn" -column 2

# Wed Dec 29 05:42:21 2010  Connected: 5 of 5
#         <----------Network---------->
#Host       KBIn  PktIn  KBOut  PktOut
xx1n1          9     82     42     326
xx1n2          9     77     41     320
xx1n5          8     75     41     318
xx1n4          8     74     41     317
xx1n3          8     71     40     314
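As a rough illustration of the ordering shown above, here is a minimal Python sketch that sorts the same 5 rows by column 2. It assumes the sort is descending, which is what the display suggests, and the function name sort_by_column is made up for this example; it is not part of colmux.

```python
# Rows as (host, KBIn, PktIn, KBOut, PktOut), copied from the display above
# but deliberately shuffled so the sort has something to do.
rows = [
    ("xx1n3", 8, 71, 40, 314),
    ("xx1n1", 9, 82, 42, 326),
    ("xx1n5", 8, 75, 41, 318),
    ("xx1n2", 9, 77, 41, 320),
    ("xx1n4", 8, 74, 41, 317),
]


def sort_by_column(rows, column):
    """Return rows ordered by the given column, largest value first.

    Column 0 is the host name, so column 2 is PktIn.
    (Hypothetical helper; descending order is an assumption.)
    """
    return sorted(rows, key=lambda row: row[column], reverse=True)


top = sort_by_column(rows, 2)
for row in top:
    print("%-8s %6d %6d %6d %6d" % row)
```

Changing the second argument mimics picking a different sort column while colmux is running.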

Single-Line Format
Unlike multi-line format which only displays output for the top systems which can change from interval to interval, in single-line format you always see the selected data for all systems and it is never sorted but rather reported in a fixed format. Therefore you need to tell colmux which data fields you're interested in when you first start it up. To determine the correct column numbers you can either run the desired collectl command manually and start counting columns, noting the first column is always 1, or you can use colmux's -test switch, which you can also use in multi-line format. This switch will display the header line of collectl's output including the hostname as column 0, with the column(s) you have selected highlighted, as well as a list of all the columns and their numbers for quick reference.

colmux -command "-sc" -test -cols 3,4
>>> Headers <<<
#          <--------CPU-------->
#Host      cpu sys inter  ctxsw

>>> Column Numbering <<<
 0 #Host  1 cpu    2 sys    3 inter  4 ctxsw
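If you'd rather not count columns by eye, the numbering rule is easy to reproduce. This is only a sketch of the rule described above (hostname is column 0, data columns start at 1); number_columns is a made-up name and this is not how colmux itself is implemented.

```python
def number_columns(header_line):
    """Split a header line on whitespace and number the fields from 0.

    Hypothetical helper: field 0 is the '#Host' column, so the first
    collectl data column comes out as 1.
    """
    return list(enumerate(header_line.split()))


numbered = number_columns("#Host      cpu sys inter  ctxsw")
print("  ".join("%d %s" % (index, name) for index, name in numbered))
```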

Once you have decided on the column numbers, there are a couple of other optional switches you may choose to use, including timestamps, data type totals and, for very wide displays, you can even request that the columns be narrower and that each value be divided by 1000 or 1024. To preface each line with a timestamp, you actually include the appropriate time format switch with the collectl command itself, rather than using a distinct colmux switch. Caution: when including timestamps the column numbering is shifted appropriately and so you may want to use -test to be sure you're specifying the correct columns.

Here is an example of the same command to look at network data, except in this case colmux has been instructed to report data only for the columns selected with -cols 2,4 (KBIn and KBOut, since the timestamp shifts the numbering), to print timestamps at the beginning of each line and to report totals at the far right. When colmux first starts you can see the data being reported as all -1s since those systems have not yet sent any data back:

colmux -addr 'xx1n[1-5]' -command "-sn -oT" -cols 2,4 -coltot

#Time    xx1n1  xx1n2  xx1n3  xx1n4  xx1n5  |  xx1n1  xx1n2  xx1n3  xx1n4  xx1n5  |     KBIn    KBOut
05:29:48    -1     -1     -1     -1     -1  |     -1     -1     -1     -1     -1  |        0        0
05:29:49     2      2      2      2      2  |      0      0      0      0      0  |       10        0
05:29:50     2      2      2      2      2  |      0      0      0      0      0  |       10        0
05:29:51     9     10      4      8      9  |     41     42      3     41     42  |       40      169
05:29:52     2      2      9      2      2  |      0      0     40      0      0  |       17       40
05:29:53     2      2      2      2      2  |      0      0      0      0      0  |       10        0
05:29:54     2      2      2      2      2  |      0      0      0      0      0  |       10        0
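The totals at the far right are just per-column sums across hosts. The sketch below reproduces them from the rows above, assuming that a -1 placeholder (no data yet) contributes nothing to the total, which is consistent with the first line of output; column_total is a made-up name for illustration.

```python
def column_total(values):
    """Sum one selected column across all hosts.

    Assumption: -1 means the host has not reported yet and is skipped,
    which matches the 0 totals on the first line of the output above.
    """
    return sum(value for value in values if value != -1)


# Values from the 05:29:51 line: KBIn for 5 hosts, then KBOut for 5 hosts.
kb_in = [9, 10, 4, 8, 9]
kb_out = [41, 42, 3, 41, 42]
print(column_total(kb_in), column_total(kb_out))
```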

The following screenshot is an example of looking at Infiniband traffic between 16 clients writing to 4 lustre servers and even though the font is small, you can still make out the patterns of the column widths changing. The left half of the display shows network received KB and the right half network transmitted KB. The first 4 columns in each section are the lustre servers and the next 16 columns the clients. As expected during a client write test, the lustre servers show high receive traffic and the clients show high transmit traffic. Look how easy it is to see drops in the client transmission rates even if you can't easily read the numbers. Also notice that the second client isn't doing any transmitting at all and since it's not displaying -1 we know collectl is running correctly.

Here's an even more dense example showing CPU load on a large cluster which is so wide it takes 3 monitors to display it all. Even though you can't read the output you can still see different patterns as some systems start/stop and others sit idle.

updated March 9, 2015
Collectl Tutorial - Getting Started With Collectl

Collectl Tutorial - The Basics

Getting started using collectl may seem a little challenging to the new user because of its many options, but it shouldn't be. After all, how many people simply run the top command and don't even realize there are a rich set of options available? In that same spirit, you can simply enter the collectl command and get a lot of useful information, but you would also be losing out on a lot. The intent of this tutorial is to give you a better appreciation of what you can do with collectl and hopefully encourage you to experiment with even more options than those described below.

Measuring Disk Activity

For this first set of examples I'll be using Robin Miller's dt to write a large file to /tmp using the command dt of=/tmp/test limit=1g bs=1m disable=compare,verify dispose=keep while running collectl in another window:
#<--------CPU--------><----------Disks-----------><----------Network---------->
#cpu sys inter  ctxsw KBRead  Reads KBWrit Writes netKBi pkt-in netKBo pkt-out
  30  30   254     65      8      2   7920     97      0      4      0       2
  10  10   377     65      0      0  32500    282      4     52      2      19
  10  10   332     61      0      0  29312    246      0      3      0       3
   9   9   330     65      0      0  32512    275      3     45      1       9
  11  11   331     53      4      1  29684    270      0      2      0       2
   8   8   352     63      0      0  35004    273      3     33      1       8
  13  12   329    116      0      0  28924    249      0      2      0       2
Here we see a few things including a burst of cpu activity when the test first starts as well as an I/O rate of about 30MB/sec which corresponds to what dt is telling us in the following summary line:
Average transfer rates: 32051995 bytes/sec, 31300.776 Kbytes/sec
If we compare the write rates to the number of writes we can also infer writes of about 128KB which is good to know because that means we're being efficient in the size of the data blocks being handed to the driver. However if we don't mind using the extra columns, we can include --iosize, which tells collectl to include the average I/O size when using this default display format also known as brief mode. In verbose mode the I/O sizes are always included.
#<--------CPU--------><---------------Disks---------------->
#cpu sys inter  ctxsw KBRead  Reads Size KBWrit Writes Size
   9   8   381     71      0      0    0  30644    276  111
  14  13   325     85      0      0    0  32888    258  127
  11  10   313     80      0      0    0  31064    261  119
  12  11   421    186      0      0    0  32376    276  117
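The Size column can be checked by hand: it is the KB written divided by the number of writes in the interval. Here is a small sketch using the 4 samples above. Truncating to a whole number of KB happens to reproduce the table exactly, but whether collectl truncates or rounds internally is an assumption on my part, and avg_io_size_kb is a made-up name.

```python
def avg_io_size_kb(kbytes, ios):
    """Average I/O size in KB for one interval, truncated to an integer.

    Returns 0 when there were no I/Os, as in the read columns above.
    """
    return kbytes // ios if ios else 0


# (KBWrit, Writes) pairs from the --iosize output above.
samples = [(30644, 276), (32888, 258), (31064, 261), (32376, 276)]
sizes = [avg_io_size_kb(kbytes, writes) for kbytes, writes in samples]
print(sizes)
```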

This may also be a good time to mention screen real estate. There is a lot of information that collectl can display and everything takes space! More often than not you don't really care about time and so by default it isn't displayed. However there may be times you do care and so you can simply add the switch -oT to add the time to the display. In fact, sometimes you may want to include the date as well, in which case -oD will do both. You can even show the times in msec by including m with -o, which can be useful when running at sub-second monitoring levels and/or if you want to correlate data to system or application logs which may themselves have finer grained time. Here's an example of the command collectl -scd -i.25 -oDm which shows the cpu and disk loads every quarter second and includes the date and time in msecs:

#                      <--------CPU--------><----------Disks----------->
#Date    Time          cpu sys inter  ctxsw KBRead  Reads KBWrit Writes
20080212 11:22:47.008    2   0   364     84      0      0  31328    284
20080212 11:22:47.258    8   6   392     92      0      0  30832    356
20080212 11:22:47.508    8   6   308     84      0      0  36256    268
20080212 11:22:47.758    2   0   292     44      0      0  31152    196

So what about that CPU load? Given that this is a 2 CPU system we might be interested in seeing how that load is being distributed by running the command collectl -sC, since an uppercase subsystem type, like cpu, disk or network tells collectl to show instance level details:

# SINGLE CPU STATISTICS
#   CPU  USER NICE  SYS WAIT IRQ  SOFT STEAL IDLE
      0     0    0   17    0    0    0     0   83
      1     0    0    4    0    0    0     0   96
      0     0    0   14    0    0    0     0   86
      1     0    0    0    0    0    0     0  100
      0     0    0   20    0    0    0     0   80
      1     0    0    0    0    0    0     0  100
noting all the load is being delivered by a single CPU as expected. Ok, so now let's read back the 1G file we just wrote and see what happens.
#<--------CPU--------><----------Disks-----------><----------Network---------->
#cpu sys inter  ctxsw KBRead  Reads KBWrit Writes netKBi pkt-in netKBo pkt-out
  38  37   248    189   7283    111      0      0      1      9      1       8
  24  23   153     81     32      0      0      0      2     32      1       9
Now we see a big burst of CPU load and not much from disk. Furthermore dt is reporting
Average transfer rates: 872960833 bytes/sec, 852500.813 Kbytes/sec
which in fact confirms that reads are coming from cache and not disk since no local disk can read at this rate! In general, when doing disk I/O testing one should use file sizes that are larger than cache to force all I/O to come from disk. So repeating the tests with a larger file we now see more realistic read rates:
#<--------CPU--------><----------Disks-----------><----------Network---------->
#cpu sys inter  ctxsw KBRead  Reads KBWrit Writes netKBi pkt-in netKBo pkt-out
   9   8   773    743  41376    629      0      0      1      8      1       7
   9   8   619    639  31716    476      0      0      2     33      1       8
  16  15   510    554  23016    370      0      0      0      4      0       2
  10  10   572    624  27272    429      0      0      2     27      1       8
  16  15   458    504  19560    306     12      2      0      4      0       2
So just what is happening to cache during testing? To see memory utilization we can simply add the memory subsystem to the default selections as collectl -s+m but that also makes the display wider and since for our purposes we don't need network information I'm just going to run the following collectl -scmd:
#<--------CPU--------><-----------Memory----------><----------Disks----------->
#cpu sys inter  ctxsw free buff cach inac slab  map KBRead  Reads KBWrit Writes
   3   0   159     80   2G 395M 189M   1M    0    0      0      0     20      3
   1   0   153     52   2G 395M 189M   1M    0    0      0      0      0      0
  43  42   238     68   2G 395M 340M 152M    0    0      0      0   3060     72
  25  25   376     53   1G 395M 431M 242M    0    0      0      0  29808    273
   6   6   377     59   1G 395M 455M 266M    0    0      0      0  30900    266
  10  10   347     55   1G 395M 492M 303M    0    0      0      0  35004    265
   5   4   389     60   1G 395M 506M 318M    0    0      0      0  27308    262
and watch the cache fill up. In fact, if we keep running collectl eventually we use up all available memory (but that's what it's there for) and even after the test completes and there is no more I/O, we still see hardly any free memory. But that too is ok because until someone else needs it or deletes the file, that data stays in cache. Look at the last sample where I manually deleted the file. You can see the cache drop to 204M and the free memory rise to 2G during a single reporting interval:
#<--------CPU--------><-----------Memory----------><----------Disks----------->
#cpu sys inter  ctxsw free buff cach inac slab  map KBRead  Reads KBWrit Writes
   1   1   374     91 171M 397M   2G   2G    0    0      0      0  34624    288
   1   1   368     82 171M 397M   2G   2G    0    0      0      0  31408    260
   2   2   319     56 171M 397M   2G   2G    0    0      0      0  31148    266
   0   0   385     70 172M 397M   2G   2G    0    0      0      0  25844    273
   0   0   167     70 172M 397M   2G   2G    0    0      0      0      0      0
   0   0   173     51 172M 397M   2G   2G    0    0      0      0      0      0
   2   0   181    108 172M 397M   2G   2G    0    0      0      0     12      2
  41  41   148     52   2G 397M 204M  15M    0    0      0      0     72      5
For one more test, I'm going to write that same 1G file to my home directory and look what collectl tells me:
#<--------CPU--------><----------Disks-----------><----------Network---------->
#cpu sys inter  ctxsw KBRead  Reads KBWrit Writes netKBi pkt-in netKBo pkt-out
   0   0   145     48      0      0      0      0      2     38      2      13
  13  13  6716   3491      0      0      0      0    136    682  21144   14762
  18  18  6802   3426      0      0      0      0    248   1256  39111   27278
  14  14  4680   2420      0      0     28      2    252   1256  40166   28008
   7   7  3105   1520      0      0      0      0    148    752  23256   16228
Since my home directory is mounted via nfs, all I/O goes through the network! In fact, if I run collectl as collectl -scfn I see:
#<--------CPU--------><----------Network----------><------NFS Totals------>
#cpu sys inter  ctxsw   KBIn  PktIn  KBOut  PktOut   read  write meta comm
  19  19  1672    429      1     11      2      12      0   3885    6    0
  27  27  8466  12909   1652  20875  56112   39495      0  19383    0    4
   9   9  4042   1632    301   3781  10125    7129      0   9508    0    0
   7   7 18677   9074   3557  44897 120375   84729      0      0    0    0
   8   8 18082   8874   3559  44928 120359   84717      0      0    0    0
I first see a batch of over 3K nfs writes which also include 6 metadata calls, which are clearly doing a variety of directory accesses to see if the file currently exists as well as for creating the new one. During the next interval the bulk of the data starts going out over the network, along with 4 commits (nfs does commits for a batch of writes as opposed to a single commit/write which would be excessive and slow). In the intervals that follow, nfs need do no more writes as they've already been queued up and so for the next several intervals all we see is the network traffic. The CPU load has also gone down because the data has already been moved into the outbound I/O buffers. For more details on nfs, see this page.

So in conclusion you can see there is really quite a lot you can do with just a few basic switches and I haven't even gotten into --verbose, which as they say is an exercise left for the student. So try some simple dt tests yourself or use your own personal favorite load generator, while trying out collectl -sc --verbose or collectl -sm --verbose or even collectl -sn --verbose. You can even put them all together as collectl -scmn --verbose, but then as you'll see you end up using a lot of that valuable screen real estate. As a final bonus, try adding the --home switch which moves the cursor to the home (upper left-hand corner) position of the screen. Think of this as something like the linux top command (collectl also has a --top switch for displaying slab/process data) since each sample is displayed at the top of the screen. That command would then look like collectl -smcn --verbose --home.

enjoy...
updated Feb 21, 2011

Exporting Data In Ganglia Format

Introduction

With the release of Collectl Version 3.3.1, one can now send collectl data directly to a ganglia gmond in binary format using the custom export gexpr.ph. This results in several benefits for existing users of ganglia:

As of V3.5 of collectl, the experimental status of this capability has been lifted since enough people are currently using it to verify it works as described.

Ganglia Configuration

The following figure shows one possible configuration of a large cluster running ganglia and is only intended to be illustrative. For complete documentation on how to set up and configure ganglia see the official wiki.

This configuration assumes one has set up ganglia gmonds in a hierarchy, such that those at the bottom of the tree collect system statistics and send them up to a higher level aggregation gmond and ultimately percolate up to the gmetad which writes the data to a round-robin database.

To use collectl as a data source there are 2 alternatives. In the diagram below at the left you simply have collectl send data to a local gmond to supplement whatever data it is already collecting, noting there won't be any way to record any of gmond's data locally. The diagram at the right would replace all gmonds and have collectl do all the data collection, optionally logging data locally, and sending data to an aggregation level gmond. Whatever method you choose you must ensure the gmond(s) are listening on their udp receive channel and enable the ganglia communications feature in collectl as described in the following sections. There are probably other hybrid configurations which are beyond the scope of this document as well as this author.

One last component is configuring the rrd to use the metrics being supplied by collectl and the details of that discussion are beyond the scope of this document as well.

Usage

Like any other custom export used by collectl, you tell collectl to use gexpr with --export gexpr and in addition include the gmond hostname:port followed by one or more of the standard switches. There are also 2 more switches unique to gexpr and they are:

The following example shows collectl gathering data on many system components but only sending cpu, disk, memory and network data to a gmond on system gmond using port 8108. It sends a set of data every 20 seconds while also writing the data in plot format to a local file in /tmp every 5 seconds.

collectl -scdfijmntx --export gexpr,gmond:8108,s=cdmn,i=20 -i5 -f /tmp -P

This module also supports sending its output to a multicast address, which is used if hostname is an address in the range of 225.0.0.0 through 239.255.255.255. To use this feature you will have to first install IO::Socket::Multicast which in turn requires the module IO::Socket::Interface be installed as well. Note that both these modules may be updated and so you should verify you're actually installing the latest one.
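If you're not sure whether a given address will be treated as multicast, the range quoted in this section (225.0.0.0 through 239.255.255.255) is easy to check. The following Python sketch only encodes that documented range; the function name is made up and this is not gexpr's own code, which is written in perl.

```python
import ipaddress

# Bounds exactly as stated in this document.
MCAST_LOW = ipaddress.IPv4Address("225.0.0.0")
MCAST_HIGH = ipaddress.IPv4Address("239.255.255.255")


def in_gexpr_multicast_range(host):
    """Return True if host is a literal IPv4 address inside the range
    this document gives for multicast (hypothetical helper)."""
    try:
        address = ipaddress.IPv4Address(host)
    except ValueError:
        # A name such as 'gmond' is not a literal address.
        return False
    return MCAST_LOW <= address <= MCAST_HIGH


for host in ("gmond", "192.168.253.168", "239.2.11.71"):
    print(host, in_gexpr_multicast_range(host))
```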

Verification

There are 2 ways to make sure everything is working as expected. The first is to make sure you're using the ganglia export module correctly, and to do this you can use the debugging parameter. Like collectl itself, which has its own debugging variable (see -d), the value of this debugging variable should be interpreted as a bit mask, where each bit results in different behavior. Refer to the header of gexpr for the complete set, noting the following example uses only 1 bit, namely the one for printing the data being sent over the socket. If for some reason you want to simultaneously disable actually sending the data over the socket to ganglia, use the debugging value of 8 alone or a combined value of 9. Note that in this latter case you are still required to supply a socket:port because they will be opened, but also realize you can use just about anything you want since no data is actually sent over it.
 collectl -scd --export gexpr,192.168.253.168:2222,d=1

 07:32:11.004 Name: cputotals.user       Units: percent               Val: 0
 07:32:11.004 Name: cputotals.nice       Units: percent               Val: 0
 07:32:11.004 Name: cputotals.sys        Units: percent               Val: 0
 07:32:11.004 Name: cputotals.wait       Units: percent               Val: 0
 07:32:11.004 Name: cputotals.irq        Units: percent               Val: 0
 07:32:11.004 Name: cputotals.soft       Units: percent               Val: 0
 07:32:11.004 Name: cputotals.steal      Units: percent               Val: 0
 07:32:11.004 Name: cputotals.idle       Units: percent               Val: 99
 07:32:11.004 Name: ctxint.ctx           Units: switches/sec          Val: 173
 07:32:11.004 Name: ctxint.int           Units: intrpts/sec           Val: 1031
 07:32:11.004 Name: ctxint.proc          Units: pcreates/sec          Val: 4
 07:32:11.004 Name: ctxint.runq          Units: runqSize              Val: 238
 07:32:11.005 Name: disktotals.reads     Units: reads/sec             Val: 0
 07:32:11.005 Name: disktotals.readkbs   Units: readkbs/sec           Val: 0
 07:32:11.005 Name: disktotals.writes    Units: writes/sec            Val: 0
 07:32:11.005 Name: disktotals.writekbs  Units: writekbs/sec          Val: 0
This second example shows the use of the g option which only sends the core ganglia data using ganglia variable naming. Also notice a nifty trick: since we're telling gexpr not to send the data over a socket, we can use a dummy network address/port and save ourselves some typing. Also note that since we didn't select memory or network stats, they aren't sent to ganglia either.
 collectl -scd --export gexpr,1.2.3.4:5,d=9,g

 05:43:19.003 Name: cpu_user             Units: percent      Val:        0 TTL: 5 sent
 05:43:19.004 Name: cpu_nice             Units: percent      Val:        0 TTL: 5 sent
 05:43:19.004 Name: cpu_system           Units: percent      Val:        1 TTL: 5 sent
 05:43:19.004 Name: cpu_wio              Units: percent      Val:        0 TTL: 5 sent
 05:43:19.004 Name: cpu_idle             Units: percent      Val:       99 TTL: 5 sent
 05:43:19.004 Name: cpu_aidle            Units: percent      Val:       99 TTL: 5 sent
 05:43:19.005 Name: cpu_num              Units: CPUs         Val:        1 TTL: 5 sent
 05:43:19.005 Name: proc_total           Units: Load/Procs   Val:      141 TTL: 5 sent
 05:43:19.005 Name: proc_run             Units: Load/Procs   Val:        0 TTL: 5 sent
 05:43:19.005 Name: load_one             Units: Load/Procs   Val:        0 TTL: 5 sent
 05:43:19.005 Name: load_five            Units: Load/Procs   Val:        0 TTL: 5 sent
 05:43:19.006 Name: load_fifteen         Units: Load/Procs   Val:        0 TTL: 5 sent
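Since the debugging value is a bit mask, d=9 is simply 1 (print the data) combined with 8 (don't send it). Here is a small sketch of how such a value decodes. Only these two bits are taken from this document; the constant and function names are made up, and the header of gexpr remains the authority for the full set.

```python
PRINT_DATA = 1  # print the data being sent over the socket
NO_SEND = 8     # do not actually send the data to ganglia


def decode_debug(value):
    """Decode the two debug bits described in this section.

    Hypothetical helper; any other bits in value are ignored here.
    """
    return {
        "print_data": bool(value & PRINT_DATA),
        "send": not (value & NO_SEND),
    }


for value in (1, 8, 9):
    print(value, decode_debug(value))
```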

Assuming you're now successfully calling gexpr and can see the output above, it's time to make sure it is going to the expected gmond. The easiest way to do this is to simply run the gmond with debugging enabled at a level of at least 2 and make sure it can see the data from collectl, being sure you haven't set a debug level of 8 in gexpr. If you can see the data in gmond (note that only the variable names and not their values are reported by gmond), you're done. If not, you need to make sure gmond is correctly listening for udp data and that the port it expects to see data on is in fact the one gexpr is sending to. If all looks correct, it may be necessary to watch the network traffic with a tool like udpdump or wireshark.

This is an example of what you should expect to see, noting that in this case gmond is still configured to collect its standard metrics, which can be disabled in gmond.conf since you no longer need these.

 gmond -d 2
 loaded module: core_metrics
 loaded module: cpu_module
 loaded module: disk_module
 loaded module: load_module
 loaded module: mem_module
 loaded module: net_module
 loaded module: proc_module
 loaded module: sys_module
 udp_recv_channel mcast_join=NULL mcast_if=NULL port=8108 bind=NULL
 tcp_accept_channel bind=NULL port=8109
 Processing a metric metadata message from cag-dl585-02.cag
 ***Allocating metadata packet for host--cag-dl585-02.cag-- and metric --cputotals.user-- ****
 saving metadata for metric: cputotals.user host: cag-dl585-02.cag
 Processing a metric value message from cag-dl585-02.cag
 ***Allocating value packet for host--cag-dl585-02.cag-- and metric --cputotals.user-- ****
 saving metadata for metric: cputotals.nice host: cag-dl585-02.cag
updated Feb 21, 2011

Importing Custom Data

Introduction

The mechanism for including custom recording/reporting code into collectl is very similar to that for exporting custom data. One uses the switch --import followed by one or more file names, separated by colons. Following each file name are one or more file-specific arguments which if specified are comma separated as shown below:

collectl --import file1,d:file2
In this example collectl will look for the files file1.ph and file2.ph, noting that the first has the single argument 'd'. Collectl will execute a perl require on each file (in the order they're specified) and subsequently call functions in them from various locations within collectl. Looking for strings in both collectl and formatit.ph that begin with &{$imp will identify the locations where collectl calls the functions named in the API and may help during the development/testing process to better understand what collectl is doing.

As a reference, a simple module has been included in the same main directory as collectl itself, which is named hello.ph as collectl's version of Hello World. Since it can't read anything from /proc it is hardcoded to generate 3 lines of data with increasing data values. Beyond that bit of hand-waving, everything else it does is fully functional. You can mix its output with any standard collectl data, record to raw or plot files, play back the data and even send its output over a socket.

From time to time additional import modules may be included in collectl which may also be used as reference. For example, the module misc.ph is now also part of collectl. It imports data about the uptime, number of people logged in, the cpu frequency and the number of mounted nfs file systems.

It should be noted that although collectl itself does not use strict, which is a long story, it is recommended these routines do. This will help ensure that they do not accidentally reuse a variable of the same name that collectl does and accidentally step on it.

A couple of words about performance

One of the key design objectives for collectl is efficiency and it is indeed very lightweight, typically using less than 0.2% of the CPU when sampling once every 10 seconds. Another way to look at this is it often uses less than 192 CPU seconds in the course of an entire day. If you care about overhead, and you should, be sure to be as efficient as you can in your own code. If you have to run a command to get your data instead of reading it from /proc, that will be more expensive. If that command has to do a lot of work, it will be even more expensive.

It is recommended you take advantage of collectl's built-in mechanism for measuring its own performance. For example, measuring the performance of the hello.ph module, which does almost nothing since it doesn't even look at /proc data, uses less than 1 second to read 8840 samples on an older 2GHz system, which is the equivalent of a full day's worth of sampling. Monitoring CPU performance data takes about 3-1/2 seconds and memory counters take about 7 seconds, just to give a few examples of the more efficient types of data it collects.

Access to collectl internal functions, variables and constants

Collectl is relatively big, at least for a perl script, consisting of over 100 internal subroutines, most of which are simply for internal housekeeping, but some of which are of a more general purpose. It also keeps most of its statistical data in single variables and one dimensional arrays. Clearly hashes could make it more convenient for passing data around but it was felt that the use of more complex data structures would generate more overhead and so their use has been minimized.

While it is literally impossible to enumerate them all, there are a relatively small number of functions, variables and constants that should be considered when writing your routines to ensure a more seamless integration with collectl. The following table is really only a means to get started. If you need more details of what a function actually does or how a variable is used, read the code.

Function     Definition
cvt()        Convert a string to a maximum number of characters, appending 'K', 'M', etc as appropriate. Can also be instructed to divide counters by 1000 and sizes by 1024.
error()      Report error on terminal and if logging to a message file write a type 'E' message. Then exit.
fix()        When a counter turns negative, it has wrapped. This function will convert to a positive number by adding back the size of a 32-bit word OR a user specified data width.
getexec()    Execute a command and record its output to a raw file when operating in collectl mode, prepended with the supplied string.
getproc()    Read data from /proc, prepending a string as with getexec except in this case you can also instruct it to skip lines at the beginning or end. See the function itself for details.
record()     Only needed if not using getproc, which will call it for you. It writes a single line of data to a raw file when in record mode or calls the appropriate print routines in interactive mode.

Variable     Definition
$datetime    The date/time stamp associated with the current set of data, in the user requested format, based on the use of -o. See the constant $miniFiller which is a string of spaces of the same width.
$intSecs     The number of seconds in the current interval. This is not an integer.

Constants    Definition
$miniFiller  A string of spaces, the same number of characters as in the $datetime variable.
$rate        A text string that is set to /secs and appended to most of the verbose format headers, indicating rates are being displayed. However, if the user specifies -on with the collectl command to indicate non-normalized data, it is set to /int to indicate per-interval data is being reported.
$SEP         This is the current plot format separator character, set to a space by default, but can be changed with --sep so never hard code spaces into your plot format output.
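To make the description of fix() a little more concrete, here is the idea rendered as a short Python sketch. The real function lives in collectl and is written in perl, so the signature and the width_bits parameter below are only assumptions for illustration: when the difference between two counter readings goes negative, the counter has wrapped and the word size is added back.

```python
def fix(counter, width_bits=32):
    """Undo a counter wrap, per the table entry for fix().

    Hypothetical Python rendering, not collectl's own code: a negative
    value is corrected by adding back 2**width_bits (32 bits by default).
    """
    if counter < 0:
        return counter + 2 ** width_bits
    return counter


# A 32-bit counter that wrapped between two samples.
previous, current = 4294967000, 200
delta = fix(current - previous)
print(delta)
```

A delta that is already positive passes through unchanged, so it is safe to apply to every counter difference.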

The API

The API between collectl and user written code is actually a fixed number of callbacks. In other words, when you tell collectl to import a piece of code, it not only uses that name to identify the code, it also uses that name as a qualifier on the name of the functions it calls. If you load a module called mymodule, collectl will then make calls to mymoduleInit(), mymoduleGetData() and several others as enumerated in the table below. You must include all these function callbacks in your code or prevent them from being called by restricting which switches the user is allowed to specify in the collectl command line. For example if your module doesn't want to support plot data and you generate an error if the user specified -P (which can be checked by examining $plotFlag in your init routine), you can safely leave off the PrintPlot callback.

Function      Definition
Analyze       Examine performance counters and generate values for current interval
GetData       Read performance data from /proc or any other mechanism of choice
GetHeader     During playback only, supply the header for additional initialization
Init          One time initializations are performed here
InitInterval  Initializations required for each processing cycle
IntervalEnd   Optional routine, called at end of each processing cycle if defined
PrintBrief    Build output strings for brief format
PrintExport   Build output strings for formatting by gexpr, graphite and lexpr, which are 3 standard collectl --export modules
PrintPlot     Build output string in plot format
PrintVerbose  Build output string in verbose format
UpdateHeader  Add custom line(s) to all file headers
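The naming rule is mechanical: the module name is glued in front of each callback name from the table. The sketch below just lists what collectl would look for given a module name. It is written in Python purely for illustration (a real module is a perl .ph file), and callback_names is a made-up helper.

```python
# Callback suffixes, exactly as listed in the table above.
CALLBACKS = [
    "Analyze", "GetData", "GetHeader", "Init", "InitInterval",
    "IntervalEnd", "PrintBrief", "PrintExport", "PrintPlot",
    "PrintVerbose", "UpdateHeader",
]


def callback_names(module):
    """Return the function names derived from an imported module name.

    Hypothetical helper that only illustrates the naming scheme.
    """
    return [module + suffix for suffix in CALLBACKS]


names = callback_names("mymodule")
for name in names:
    print(name + "()")
```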

There are also several constants that must be passed back to collectl during initialization. See Init() for more details.

Analyze($type, \$data)

This function is called for each line of recorded data that begins with the qualifier string that has been set in Init. Any lines that don't begin with that string will never be seen by this routine. You should also be sure that string is unique enough that you aren't passed data you don't expect.

GetData()

This function takes no arguments and is responsible for reading in the data to be recorded and processed by collectl, so you should strive to make it as efficient as possible. If reading data from /proc, you can probably use the getproc() function, using 0 as the first parameter for doing generic reads. If you wish to execute a command, you can call getexec() and pass it a 3, which is its generic argument for capturing the entire contents of whatever command is being executed.

If you want to do your own thing you can basically do anything you want, but be sure to call record() to actually write the data to the raw file and optionally pass it to the analysis routine later on.

In any case, each record must use the same discriminator that Analyze is expecting so collectl can identify that data as coming from this module. You may also want to look at the data gathering loop inside of collectl to get a better feel for how data collection works in general.

To make sure you're collecting data correctly, run collectl with -d4 as shown below for reading socket data, which uses the string sock as its own discriminator. The Analyze routine then needs to look at the second field to identify how to interpret the remainder of the data line.

collectl -ss -d4
>>> 1238001124.002 <<<
sock sockets: used 405
sock TCP: inuse 10 orphan 0 tw 0 alloc 12 mem 0
sock UDP: inuse 8 mem 0
sock RAW: inuse 0
sock FRAG: inuse 0 memory 0
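
As an illustration of how such records can be interpreted, the sketch below checks the discriminator and then uses the second field to decide what the rest of the line means. It is Python rather than the Perl an actual Analyze routine would use, and the function name parse_record is hypothetical.

```python
def parse_record(line, discriminator="sock"):
    """Split one raw record into (kind, counters), or return None if it is not ours."""
    fields = line.split()
    if len(fields) < 2 or fields[0] != discriminator:
        return None
    # The second field, e.g. "TCP:", says how to read the remainder of the line.
    kind = fields[1].rstrip(":")
    rest = fields[2:]
    # The remaining fields come in name/value pairs, e.g. "inuse 10 orphan 0".
    counters = {rest[i]: int(rest[i + 1]) for i in range(0, len(rest) - 1, 2)}
    return kind, counters


print(parse_record("sock TCP: inuse 10 orphan 0 tw 0 alloc 12 mem 0"))
print(parse_record(">>> 1238001124.002 <<<"))
```

Lines that do not start with the discriminator, such as the timestamp marker, are ignored, which mirrors the rule that Analyze only ever sees its own records.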

GetHeader(\$header)

This function is called when one needs to know what is stored in the header of a file being played back - it is only called if collectl doesn't find it and therefore optional. While it is impossible to know how a module will use GetHeader, it is often used to retrieve instance numbers, such as how many nvidia GPUs were being monitored or their type (see nvidia.ph), both of which would have been written to the header using a call to UpdateHeader.

Since standard collectl processing is to always play back what the user requests, even if that data was never collected, the same holds true here. If it is determined at this point that there is no data, the API does not provide a failure return code as it does for Init. Rather, one would simply end up reporting 0s for all values.

Init(\$options, \$key)

This function is called once by collectl, before any data collection begins. If there are any one-time initializations of variables to do, this is the place to do them. For example, when processing incrementing variables one often subtracts the previous value from the current one and this is the ideal place to initialize their previous values to 0. Naturally that will lead to erroneous results for the first interval, which is why collectl never includes those stats in its output. However, if you don't initialize them to something you will get uninitialized variable warnings the first time they're used.

Upon completion, return 1 to indicate success. Returning -1 indicates failure and causes the imported module's callbacks to be removed from collectl's call stack, so they will no longer be called.

InitInterval()

During each data collection interval, collectl may need to reset some counters. For example, when processing disk data, collectl adds together all the disk stats for each interval which are then reported as summary data. At the beginning of each interval these counters must be reset to 0 and it's at that point in the processing that this routine is called.
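
The bookkeeping described for Init and InitInterval can be sketched as follows. This is a Python model of the idea only, not collectl code: the class and method names are hypothetical, previous values start at 0, summary totals are reset at the start of every interval, and the first interval is discarded because it was measured against 0.

```python
class DeltaTracker:
    """Models the per-interval bookkeeping for incrementing counters."""

    def __init__(self):
        # Init: previous values default to 0 so the first subtraction is defined.
        self.previous = {}
        self.total = 0
        self.first_interval = True

    def init_interval(self):
        # InitInterval: the summary total is reset at the start of each cycle.
        self.total = 0

    def analyze(self, name, current):
        # Convert a raw incrementing counter into the change since the last sample.
        delta = current - self.previous.get(name, 0)
        self.previous[name] = current
        self.total += delta

    def report(self):
        # The first interval is meaningless, so nothing is reported for it.
        if self.first_interval:
            self.first_interval = False
            return None
        return self.total


tracker = DeltaTracker()
results = []
for sample in [{"sda": 1000, "sdb": 500}, {"sda": 1100, "sdb": 550}, {"sda": 1150, "sdb": 650}]:
    tracker.init_interval()
    for disk, value in sample.items():
        tracker.analyze(disk, value)
    results.append(tracker.report())
print(results)
```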

IntervalEnd()

As described earlier, if this routine exists it is called at the end of an interval processing cycle. This makes it possible to do any post processing that may be required before the start of the next interval. In many cases this is not necessary.

PrintBrief($type, \$line)

The trick with brief mode is that multiple types of data are displayed together on the same line. That means each imported module must append its own data to the current line of output as it is being constructed, without any carriage returns. Further, since there are 2 header lines and brief format can print running totals when the user presses return during processing, there are a number of places from which your code needs to be called.

PrintExport($type, \$ref1, \$ref2, \$ref3, \$ref4)

What about custom export modules, and how does this affect them? The good news is that the 3 standard modules, lexpr, gexpr and graphite, all support --import. In other words, they too have callbacks that you must respond to if your code is being run at the same time as one of them.

Again, see hello.ph for an example, but suffice it to say you need to do something when called, even if only a null function is supplied.

lexpr can write its output to the terminal, so the easiest way to test it is to run collectl and have it display there. The output of gexpr and graphite, however, is binary, so the easiest way to test that code is to tell them not to open a socket (though you must still supply an address/port for gexpr, even if invalid) and to print the data elements they are about to send by running with a debug value of 9. Note that this is gexpr's and graphite's own internal debugging switch, not collectl's. The 8 bit tells them not to open the output socket and the 1 bit tells them to display their output, nicely formatted, on the terminal.

collectl --import hello --export gexpr,1.2.3.4:5,d=9
Name: hwtotals.hw          Units: num/sec               Val: 140
Name: hwtotals.hw          Units: num/sec               Val: 230
Name: hwtotals.hw          Units: num/sec               Val: 320
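
The debug value of 9 is simply the two bits described above combined (8 + 1). A small Python sketch of that decoding follows; the constant and function names are hypothetical and exist only for this illustration.

```python
# Bit meanings as described for the gexpr/graphite debug switch.
DEBUG_SHOW_OUTPUT = 1  # display the data elements on the terminal
DEBUG_NO_SOCKET = 8    # do not open the output socket


def decode_debug(value):
    """Report which of the two documented debug bits are set in value."""
    return {
        "show_output": bool(value & DEBUG_SHOW_OUTPUT),
        "no_socket": bool(value & DEBUG_NO_SOCKET),
    }


print(decode_debug(9))
```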

PrintPlot($type, \$line)

This type of output is formatted for plotting, which can get quite complicated based on whether you are writing to a terminal, multiple files or a socket. Fortunately all that headache is handled for you by collectl. All you need to do is append your summary or detail data to the current line being constructed, similar to the way brief data is handled. Since it has to handle both headers as well as data, there are 4 types included in the call.

PrintVerbose($printHeader, $homeFlag, \$line)

Like PrintBrief, this routine is in charge of printing verbose data but is much simpler since it doesn't have to insert code into the middle of running strings.

UpdateHeader(\$line)

updated Feb 21, 2011
collectl - File Naming

File Naming

All files generated by collectl via the -f switch, both raw and plot, will always contain the name of the host from which they have been generated according to the following rules:

  • If the specified name is a directory, the resultant file(s) will be created in that directory and begin with the hostname. If the name is not a directory, that name will be prepended to -hostname
  • The name is then followed with -yyyymmdd
  • If this is a raw data file or one generated using -P and -ou has been specified, it will also have -hhmmss appended as well to indicate the starting time of the sample. Note that the colons have been left off the time field to make it easy to move the file to a PC for further analysis if so desired.
  • The appropriate extension is added and if a compressed file, .gz is then appended.
warning
Never rename a file and expect collectl to be able to process it in playback mode. When running in this mode, collectl ignores any files that do not look like they were generated by collectl. It also verifies that the hostname portion of the filename matches the one recorded in the header.
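
The naming rules can be sketched in Python as follows. This is an illustration of the rules as listed, not collectl's own code; the function name and its parameters are hypothetical, and the extension is passed in by the caller because the text only says that the appropriate one is added.

```python
import os
from datetime import datetime


def collectl_filename(name, hostname, start, extension, with_time=True, compressed=False):
    """Build an output file name following the rules listed for the -f switch."""
    if os.path.isdir(name):
        # A directory: the file is created there and begins with the hostname.
        base = os.path.join(name, hostname)
    else:
        # Not a directory: the name is prepended to -hostname.
        base = name + "-" + hostname
    base += start.strftime("-%Y%m%d")
    if with_time:
        # Raw files, and plot files written with -ou, also carry the start time.
        base += start.strftime("-%H%M%S")
    base += "." + extension
    if compressed:
        base += ".gz"
    return base


print(collectl_filename("/no/such/prefix", "node1", datetime(2011, 2, 21), "raw", compressed=True))
```

Here "raw" and "node1" are example values chosen for the sketch.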
updated Feb 21, 2011
collectl - The Math

The Math

The Basics

At first glance, the way collectl calculates its numbers is pretty straightforward. It looks at successive intervals of counters, calculates their differences and normalizes the result by dividing by the interval, the result of which is the counter's rate/second. If -on is specified, collectl does not divide by the interval and simply reports the difference. However, one occasionally may see numbers that don't make sense, such as a 1Gb network reporting rates almost double what it is capable of, or other anomalous numbers.

The Interval Time Stamps

By design, collectl takes one time stamp at the start of each monitoring interval and associates that time with all the samples taken during that interval. This has been done for one major reason - there needs to be a single time associated with all data points, especially if you want to plot the data. The overhead in collecting the data is fairly constant and therefore the interval for that sample is fairly consistent and so the rates reported are also consistent.

However, there can be a problem that is important to understand and has been seen in the past. A device had the wrong firmware level and under some conditions caused a long delay in the middle of the collection interval. Some samples were collected close to the starting time of that interval while all that followed the delay were actually collected at a time much later than was being reported.

Consider the following in which we're looking at raw data collected for 2 subsystems, call them XXX and YYY. Let's also assume that the counters we're monitoring are increasing at a steady rate of 100 units/sec. In this example, during the 10:00:01 interval there was a 10 second hang in collecting the YYY sample. The XXX sample was correctly recorded, but by the time the YYY sample was collected, 1000 units were recorded. As we move to the next interval which was delayed by 10 seconds, the sample for XXX has accumulated 1000 units and the sample for YYY is 100.

TYPE            XXX     YYY
10:00:00        100     100
10:00:01        200     1100
10:00:11        1200    1200
10:00:12        1300    1300
The problem here is that when reporting the 2 rates at 10:00:01, we'll see a rate of 1000 units/sec for YYY because, based on the timestamp, that interval only appears to be 1 second long. Conversely, the rate reported for that same subsystem at 10:00:11 will be 10 units/sec because this interval is reported as 10 seconds long. Also note that for this interval the counter for XXX has been incremented correctly and the resultant rates are reported correctly. This is because the sampling occurred before the delays. If one were to move the timestamp to the end of the interval, it would fix the problem with YYY, but then move it to XXX.
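
The arithmetic can be checked with a short Python sketch that divides each counter difference by the difference between the recorded timestamps. It is only a worked example of the table of XXX and YYY samples, not collectl code.

```python
from datetime import datetime

# Raw counter samples from the example: (timestamp, XXX, YYY).
samples = [
    ("10:00:00", 100, 100),
    ("10:00:01", 200, 1100),
    ("10:00:11", 1200, 1200),
    ("10:00:12", 1300, 1300),
]


def rates(rows):
    """Return (timestamp, XXX rate, YYY rate) for each interval after the first."""
    fmt = "%H:%M:%S"
    out = []
    for (t0, x0, y0), (t1, x1, y1) in zip(rows, rows[1:]):
        seconds = (datetime.strptime(t1, fmt) - datetime.strptime(t0, fmt)).total_seconds()
        out.append((t1, (x1 - x0) / seconds, (y1 - y0) / seconds))
    return out


for row in rates(samples):
    print(row)
```

XXX comes out at 100 units/sec in every interval, while YYY shows 1000, then 10, then 100, even though both counters really grew at the same steady rate.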

It IS important to understand that this is only a problem if the delay is during the data collection itself. If there is a system delay that causes all data collection to be delayed but, once started, collection runs as expected (and this has been seen to be the typical case), the intervals may be longer but the counters will have increased proportionally and the results will be consistent.

The only real answer to this problem would be to timestamp individual samples, however it is also felt that this problem is rare enough as to not be of serious concern and changing the methodology of timestamping would cause more problems than it solves.

The Counter Update Rate

This is a problem that is very real and worth understanding even if you never personally see it. If the rate at which a counter is updated is too coarse, especially if it is close to the monitoring interval, the reported numbers will be off. For most of the data collectl reports on, this is not a problem because these counters ARE updated frequently. However, it turns out that some network drivers only update about once a second, and in early versions of the 2.6 kernel (and maybe the one you're currently running), you may see some very strange anomalies in the output if you look at 1-second data samples. See this page for more details.
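
A simplified model makes the effect easier to see. The Python sketch below assumes a hypothetical driver that refreshes its counter exactly once per second, adding 100 units each time, and a sampler that reads it roughly once per second with a little jitter; the numbers and names are invented for illustration.

```python
import math

UPDATE_PERIOD = 1.0     # the modelled driver refreshes its counter once per second
UNITS_PER_UPDATE = 100  # traffic accumulated between refreshes


def counter_at(t):
    """Counter value visible at time t: only completed update periods are counted."""
    return UNITS_PER_UPDATE * math.floor(t / UPDATE_PERIOD)


def reported_rates(sample_times, nominal_interval=1):
    """Rates computed from successive samples, dividing by the nominal interval."""
    values = [counter_at(t) for t in sample_times]
    return [(b - a) / nominal_interval for a, b in zip(values, values[1:])]


# Samples land just before or just after the counter refresh.
jittered = reported_rates([0.0, 0.99, 2.01, 3.01, 3.99, 5.01])
print(jittered)
```

Although the underlying rate is a steady 100 units/sec, some intervals see no update at all and others see two, so the reported rate swings between 0 and double the true value.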

Normalization

As described elsewhere, collectl divides counter values by the interval between samples and then rounds off the results. When run interactively with a default interval of 1 second, this is not an issue. However, for data collected in daemon mode this can actually be of significance. Consider a network that has 1-4 errors over a 10 second period. This will normalize to a value of less than 0.5 per second and be rounded off to 0! The same is true for values reported for process/slab statistics, where these are typically measured over 60 seconds. Normalization in these cases can be even more dramatic.

One other thing to consider is that when selecting that only non-zero values be reported, one might occasionally be surprised to see values of 0 being reported. This will occur if there is a non-zero value that is then normalized to 0.

If you think you might need to see these close to 0 values, you should include -on which tells collectl not to normalize its output before reporting.
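
The rounding effect can be reproduced with a few lines of Python. This is a sketch under the assumption that values are rounded to the nearest whole number after dividing by the interval; the function name and the exact rounding method are not taken from collectl.

```python
def report(delta, interval, normalize=True):
    """Return the value shown for a counter change observed over interval seconds."""
    if not normalize:
        # In the spirit of -on: report the raw difference.
        return delta
    # Divide by the interval, then round to the nearest whole number.
    return int(delta / interval + 0.5)


for errors in (1, 4, 5):
    print(errors, report(errors, 10), report(errors, 10, normalize=False))
```

With a 10 second interval, anything from 1 to 4 errors is reported as 0 once normalized, while the unnormalized value keeps the actual count.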

In conclusion

As they say, garbage in, garbage out. If the numbers you're seeing look wrong, it's worth trying to understand why, and you shouldn't necessarily take them at face value.
updated Feb 21, 2011

IPMI/Environmental Monitoring

experimental

Overview

Most modern computer systems have the capability of reporting temperatures and fan speeds (as well as other information) via ipmi. However, the format of the device names is not standardized, making it extremely difficult if not impossible to programmatically interpret and report it. This feature has been declared experimental in order to evaluate its success on a broader set of hardware, which I expect will be determined by the number of bug reports. As such it is disabled in collectl.conf, so if you want to enable it when running as a daemon be sure to include an E in the -s string in the DaemonCommands line as shown below:
DaemonCommands = -f /var/log/collectl -r00:01,7 -m -F60 -s+CEYZ

Prerequisites

Collectl depends on the open source tool ipmitool, which must be installed first. Installation is as simple as pulling down and unpacking the kit with tar and executing the commands:
./configure
make
make install
That's all it takes. However, collectl must be run as root and your system must support ipmi. The easiest way to tell is to check whether the command dmidecode | grep IPMI runs without error. If you get the error Could not open device at /dev/ipmi0... you are not running as root. If you get some other error, your system probably does not support ipmi and even if you were able to install ipmitool, you won't be able to use it.

The next step is to start the ipmi driver, which is generally done via the command service ipmi start on a RedHat system or something like /etc/init.d/ipmi start on others. On some systems such as HP blades, you may need to install a custom ipmi driver such as hp-OpenIPMI and start that instead of the standard driver.

At this point you should be able to execute the command ipmitool sdr and see all your sensor data, or the commands ipmitool sdr type fan and ipmitool sdr type temp to see just fan and temperature data:

[root@bl460-63 ipmitool-1.8.9]# ipmitool sdr
UID Light        | 0 unspecified     | ok
Int. Health LED  | 0 unspecified     | ok
VRM 1            | 0 unspecified     | cr
VRM 2            | 0 unspecified     | cr
Temp 1           | 47 degrees C      | ok
Temp 2           | 34 degrees C      | ok
Temp 3           | 30 degrees C      | ok
Temp 4           | 30 degrees C      | ok
Temp 5           | 31 degrees C      | ok
Temp 6           | 30 degrees C      | ok
Temp 7           | 30 degrees C      | ok
Temp 8           | 66 degrees C      | ok
Temp 9           | 20 degrees C      | ok
Virtual Fan      | 37.24 unspecifi | nc
Enclosure Status | 0 unspecified     | nc
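
To show why this output is awkward to interpret programmatically, here is a Python sketch that splits each line on the | separator and picks out the readings expressed in degrees C. It is an illustration only, not how collectl itself parses the data, and the function names are hypothetical.

```python
def parse_sdr(text):
    """Parse ipmitool sdr style output into (name, reading, status) tuples."""
    sensors = []
    for line in text.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3:
            sensors.append(tuple(parts))
    return sensors


def temperatures(sensors):
    """Keep only readings given in degrees C, converted to integers."""
    temps = {}
    for name, reading, _status in sensors:
        value, _, unit = reading.partition(" ")
        if unit == "degrees C":
            temps[name] = int(value)
    return temps


sample = """Temp 1           | 47 degrees C      | ok
Temp 8           | 66 degrees C      | ok
Virtual Fan      | 37.24 unspecifi | nc"""
print(temperatures(parse_sdr(sample)))
```

The temperature lines are easy to recognize, but the fan reading carries a truncated unit and a name that varies between systems, which is the kind of inconsistency that makes generic handling difficult.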

The collectl interface

Collectl uses a 3rd monitoring interval to collect ipmi data, which by default is 2 minutes. This is done for several reasons:
  • The ipmitool interface involves running another process and adds to the load, using a less frequent interval helps minimize collectl's footprint
  • This type of data changes at a slow enough rate as to not require monitoring too frequently, although the current readings which have just been added to Version 3.1.2 do seem to change in near real-time and so the monitoring interval may need to be rethought
Although you can run collectl interactively to report a single sample, its typical use is expected to be as a daemon. When collectl is run as a daemon and -sE is specified, which is currently the default, it first checks whether ipmitool is present and whether a communications device of the form /dev/ipmi* exists. If so, it starts collecting ipmi data for fans and temperature sensors. If it detects that a current sensor is present, it reports on that too.
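
A comparable check can be made by hand before enabling -sE. The Python sketch below looks for the tool on the PATH and for at least one device node matching /dev/ipmi*; it is an assumption about what such a check might look like, not collectl's own logic, and the function name is hypothetical.

```python
import glob
import shutil


def ipmi_available(tool="ipmitool", device_pattern="/dev/ipmi*"):
    """True when the tool is on PATH and at least one matching device node exists."""
    return shutil.which(tool) is not None and bool(glob.glob(device_pattern))


print(ipmi_available())
```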

You can control the way ipmi data is displayed in playback mode using --envopts and one of 3 switches. Two of them restrict the report to only fan or only temperature data. If you are reporting both, which is the default, the third requests that the 2 types of data be displayed on separate lines, which can be useful if you have a lot of devices on which to report.

The following is an example of time-stamped output on an HP BL460c Blade, first without any options

collectl.pl -sE -i::1 -oT
# ENVIRONMENTAL STATISTICS
#             VFan   Temp1   Temp2   Temp3   Temp4   Temp5   Temp6   Temp7   Temp8   Temp9   Power
08:39:15    37.240      47      35      30      30      33      30      30      58      24     206
08:39:16    37.240      47      35      30      30      33      30      30      58      24     206
08:39:17    37.240      47      35      30      30      33      30      30      58      24     206
Here we see the effect of --envopts M when examining an HP DL380-G5. Its main purpose is to deal with too much data to comfortably display on a single line, although it does also generate a lot more noise in the output.
collectl.pl -sE -i::1 -oT --envopts M
### RECORD    1 >>> opteron167 <<< (1218022891.002) (Wed Aug  6 07:41:31 2008) ###

# ENVIRONMENTAL STATISTICS
#   CFAN1   CFAN2   CFAN3   CFAN4   CFAN5   CFAN6   CFAN7   CFAN8   CFAN9  CFAN10   SFAN1   SFAN2
     6200    6000    6200    6200    6200    5800    6200    6000    6200    6000    6000    6200
#  CTEMP0  CTEMP1   STEMP
       51      48      29

### RECORD    2 >>> opteron167 <<< (1218022892.002) (Wed Aug  6 07:41:32 2008) ###

# ENVIRONMENTAL STATISTICS
#   CFAN1   CFAN2   CFAN3   CFAN4   CFAN5   CFAN6   CFAN7   CFAN8   CFAN9  CFAN10   SFAN1   SFAN2
     6200    6000    6200    6200    6200    5800    6200    6000    6200    6000    6000    6200
#  CTEMP0  CTEMP1   STEMP
       51      48      29
If you choose to convert the data to plot format, a file with the extension env will be created.

Device Names and the collectl challenge

As already mentioned, there is no standard on how one names an ipmi device and as a result the names used even on the small sample of systems tested during development have been quite different. Here are just a few ways fan names are reported:
Fan 1
Fans
CPU FAN1
SYS FAN1
Fan1A (CPU)
FAN CPU0
FAN MOD 1A RPM
Fan Redundancy
On the one hand, collectl could simply report the exact names as they are reported, but formatting them in a way that provides a compact display is impossible. Given that the collectl standard reporting format is a single data header line, multiple-line headers are not an option. While it is tempting to simply determine the widest device name and use that for a header width, systems that report over a dozen devices couldn't fit them on the same line, and that's only for systems that have been tested.

After looking at all these different names and formats, one common theme did emerge. All devices appear to have optional numbers (I didn't see any with just letters) and those numbers, if present, have optional letters. Furthermore, there seems to be some sort of optional type associated with many as well. This led to the idea of a standard naming for these devices as follows:

[type]Fan|Temp[devicenumber[deviceletter]]

in which the type field would be limited to a single character. Applying this scheme to the examples above leads to the following name mapping:

Fan 1              Fan1
Fans               Fan
CPU FAN1           CFAN1
SYS FAN1           SFAN1
Fan1A (CPU)        CFan1A
FAN CPU0           CFAN0
FAN MOD 1A RPM     MFAN1A
Fan Redundancy     RFan
This is admittedly not perfect but seems like a reasonable compromise and since collectl will report the device names in the same order returned by ipmitool it is not all that difficult to figure out how collectl chose to map them.
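
The naming format above can be written as a regular expression, which is handy for checking whether a generated name fits the scheme. This is a small Python sketch under the assumption that case is not significant. The pattern and the function name is_short_name are illustrative and are not taken from collectl.

```python
import re

# [type]Fan|Temp[devicenumber[deviceletter]] with a single-character type
SHORT_NAME = re.compile(r"^[A-Za-z]?(fan|temp)(\d+[A-Za-z]?)?$", re.IGNORECASE)


def is_short_name(name):
    """Return True if name fits the standard naming scheme."""
    return SHORT_NAME.match(name) is not None


for name in ["Fan1", "Fan", "CFAN1", "CFan1A", "MFAN1A", "RFan", "CPU FAN1"]:
    print(name, is_short_name(name))
```

The last name in the loop is a raw ipmitool name rather than a mapped one, so it is expected to fail the check.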

Parsing Names and Customization

This is where the fun begins or things get really ugly, depending on your perspective.

After examining many different types of device name formats, it was determined that most tended to follow the pattern of

prefix type instanceNumber suffix

Where things get a little crazy is that sometimes the actual instance number can be part of the prefix OR sometimes the instance contains a letter.

All that said, collectl breaks a device name into these components, assuming a numeric instance. It then applies the minimal set of tests/modifications, note there are examples of all these cases in the sample names shown earlier:

  • if the suffix contains the string in ()s, set the prefix to the string contained within
  • if an instance and suffix, append first word of suffix to the instance
  • if no prefix or instance and the suffix doesn't start with a digit, set the prefix to the suffix
  • if there is a prefix, prepend the first letter to the device name which is fan or temp
  • if there is a prefix and it contains a digit, use that as the instance number
  • remove all whitespace
Since this can get very confusing, a special switch named --envdebug has been included which will show the actual parsing of the device names. The following is an example of parsing some of the names listed above:
Fan CPU0 Tach,3480
  Prefix:   Name: Fan  Instance:   Suffix: CPU0 Tach
Fan1A (CPU),EAh,ok,29.3,Performance Met
  Prefix:   Name: Fan  Instance: 1  Suffix: A (CPU)
FAN MOD 1A RPM,5775,RPM,ok
  Prefix:   Name: FAN  Instance:   Suffix: MOD 1A RPM
  • In the first example, the prefix is set to CPU0. Then the instance is set to 0 and the C prepended to Fan
  • In the second case the prefix is set to CPU and ultimately the name resolved to CFan1A
  • In the third case the prefix is set to MOD and the device name set to MFAN and the instance number is lost when what we really want is to have it set to 1A!!!
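
The rules above can be sketched in a few lines of Python. This is an interpretation of the description, not collectl's actual perl code: the regular expression, the function name short_name and the choice to take only the first word of the suffix as the prefix are all assumptions. It also makes no attempt to handle every name listed earlier (for example Fans).

```python
import re

# prefix, type (fan or temp), numeric instance, suffix
NAME_RE = re.compile(
    r"^(?P<prefix>.*?)\s*(?P<name>fan|temp)\s*(?P<instance>\d*)\s*(?P<suffix>.*)$",
    re.IGNORECASE,
)


def short_name(sensor):
    """Derive a compact device name from a raw sensor name, or None if no match."""
    match = NAME_RE.match(sensor.strip())
    if match is None:
        return None
    prefix = match.group("prefix")
    name = match.group("name")
    instance = match.group("instance")
    suffix = match.group("suffix")

    # a string in ()s within the suffix becomes the prefix
    paren = re.search(r"\((.*?)\)", suffix)
    if paren:
        prefix = paren.group(1)
        suffix = (suffix[:paren.start()] + suffix[paren.end():]).strip()

    # instance and suffix: append the first word of the suffix to the instance
    words = suffix.split()
    if instance and words:
        instance += words[0]

    # no prefix or instance and suffix does not start with a digit:
    # take the prefix from the suffix (first word only, an assumption)
    if not prefix and not instance and words and not suffix[0].isdigit():
        prefix = words[0]

    if prefix:
        # a digit in the prefix is used as the instance number
        digits = re.search(r"\d+", prefix)
        if digits:
            instance = digits.group(0)
        # prepend the first letter of the prefix to the device name
        name = prefix[0] + name

    return re.sub(r"\s+", "", name + instance)


for raw in ["Fan 1", "CPU FAN1", "Fan1A (CPU)", "FAN CPU0", "FAN MOD 1A RPM"]:
    print(f"{raw:16} {short_name(raw)}")
```

Note that FAN MOD 1A RPM comes out as MFAN, reproducing the lost instance described in the third case.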

Standard and User Defined Parsing Rules

Collectl provides a mechanism for dealing with device names that do not result in the generation of satisfactory names as described in the last section, by providing a file of translation rules for the system(s) they apply to. This file is currently populated with a number of HP systems this mechanism has been tested on and the rules are actually one or more perl pattern matching/replacement expressions. If the system name can be obtained via dmidecode, this file is searched for a matching stanza and its translations applied to device names before their initial parsing and/or immediately after a device name is generated.

If your system is not in this standard set, you can either add your own rules to /usr/share/collectl/envrules.std (assuming your system type can be obtained through dmidecode) or put them in a standalone file and tell collectl to use it instead of the standard one using --envrules. If you do use your own file it should simply contain lines of the following form (no stanza preface), noting that spaces and comments (lines preceded with a #) are permitted:

[ignore]
/pattern/
...
[pre]
/pattern1/replace1/
/pattern2/replace2/
...
[post]
/pattern1/replace1/
/pattern2/replace2/
...
If you know perl (and you really should if you want to do this), collectl builds a perl pattern match command if you specify [ignore] and ignores any strings returned by ipmitool that match. This is a good way to reduce the volume of sensors on systems that may have dozens of them and you're only interested in a specific subset.

In the cases of [pre] and [post] a perl substitution command is built out of the pattern and replace strings and applied to the sensor names. There is one caveat about [post]: it only applies to the actual derived sensor name and not the instance, so if you want to change a specific instance consider using a pre string to make a unique sensor name and then change it to what you really want with post. So looking at the string

FAN MOD 1A RPM
and the processing rules described in the previous section, the MOD suffix will be prepended to FAN and the first letter used to name the device MFAN, losing the instance information, which is 1A.

There are at least 3 options here. The first is to simply remove MOD from each name which we can do with the rule:

/ MOD//
which will result in the instance names being picked up correctly because they will now immediately follow FAN. In fact, if you include --envdebug along with your rules you'll see the results of the replacement:
FAN MOD 1A RPM,5775,RPM,ok
  Pre-Remapped 'FAN MOD 1A RPM' to 'FAN 1A RPM'
  Prefix:   Name: FAN  Instance: 1  Suffix: A RPM
The second option would be to move MOD to the front of the string so that the rule that uses the first letter as part of the final name will take effect and that rule will look like:
/(.*) MOD (.*)/MOD $1$2/
and results in the following parsing:
FAN MOD 1A RPM,5775,RPM,ok
  Pre-Remapped 'FAN MOD 1A RPM' to 'MOD FAN 1A RPM'
  Prefix: MOD  Name: FAN  Instance: 1  Suffix: A RPM

Unfortunately, in order to make perl interpret the $1$2 symbols an eval is required, which generates a little extra overhead. While that is not horrible, an even better solution is the third option, which doesn't use any special $ symbols:

/FAN MOD/MOD FAN/
which produces exactly the same results as the previous example except without the eval command.

There is in fact at least one other mechanism for those that are not all that familiar with perl and is only being included for completeness, and that is to simply hardcode the replacement of each device with the desired output. In other words

/FAN MOD 1A RPM/MOD FAN1 A/
/FAN MOD 2A RPM/MOD FAN2 A/
/FAN MOD 3A RPM/MOD FAN3 A/
etc
will produce strings that can also be properly parsed without involving $ variables, but this means you need to specify each unique device name to remap. It will also cause all pattern matching statements to be executed for each device, which results in slightly more overhead.
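
To try out [ignore] and [pre] rules before handing them to collectl, the file format can be mimicked in Python. This is only a sketch under several assumptions: patterns contain no / characters, replacements are literal text (no $1 style back-references), and only the first match is replaced as a perl substitution without the g modifier would do. The names parse_rules and apply_pre are made up for illustration.

```python
import re


def parse_rules(text):
    """Read [ignore], [pre] and [post] sections of /pattern/ or /pattern/replace/ lines."""
    rules = {"ignore": [], "pre": [], "post": []}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments are permitted
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        parts = line.split("/")
        if section == "ignore" and len(parts) == 3:
            rules["ignore"].append(parts[1])
        elif section in ("pre", "post") and len(parts) == 4:
            rules[section].append((parts[1], parts[2]))
    return rules


def apply_pre(name, rules):
    """Return the remapped sensor name, or None if an [ignore] pattern matches."""
    if any(re.search(pattern, name) for pattern in rules["ignore"]):
        return None
    for pattern, replacement in rules["pre"]:
        # the lambda keeps the replacement literal; count=1 replaces the first match only
        name = re.sub(pattern, lambda m, r=replacement: r, name, count=1)
    return name


demo = parse_rules("""
# drop redundancy sensors, then strip MOD
[ignore]
/Redundancy/
[pre]
/ MOD//
""")
print(apply_pre("FAN MOD 1A RPM", demo))
print(apply_pre("Fan Redundancy", demo))
```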

Power Monitoring

Currently all the systems that power monitoring has been tested on report it as the field Power Meter and without more examples, the parsing is currently set up to specifically look for that field.

Performance and Alternate IPMI Devices

In some situations there may be multiple ipmi devices over which to communicate and if so, the default one may not necessarily be the fastest one. If you think the ipmi commands are taking too long to execute, try a simple experiment like this:

ipmitool sdr dump /tmp/xxx
time for i in `seq 1 10`; do ipmitool -S /tmp/xxx sdr > /dev/null; done;
real    0m20.476s
user    0m0.004s
sys     0m0.015s
As you can see, even though the command only used 0.02 seconds of CPU time, the elapsed time was over 20 seconds, a good indication something is not right. If you look in /proc/ipmi you may see more than one directory as in the following case:
[root@hpdc3dmgt1 ~]# ls /proc/ipmi
0  1
This means there are 2 different IPMI devices and since the default is only one of them, let's try repeating the command above on the other device. Also notice that since we've already initialized our cache file we do not need to reissue the ipmitool sdr dump command:
time for i in `seq 1 10`; do ipmitool -S /tmp/xxx -d1 sdr > /dev/null; done;
real    0m0.487s
user    0m0.004s
sys     0m0.013s
See how the elapsed time is only a fraction using device 1? To tell collectl to use this device instead of the default, simply specify the number in the --envopts switch, for example collectl -sE --envopts 1
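
If there are several devices to compare, the timing loop can be scripted. The following Python sketch simply repeats the experiment above for each device number. The function names are hypothetical, the -S and -d switches are the ones used in the commands above, and it assumes ipmitool is installed and the cache file has already been created with ipmitool sdr dump.

```python
import subprocess
import time


def time_command(argv, runs=10):
    """Run argv the given number of times and return the total elapsed seconds."""
    start = time.perf_counter()
    for _ in range(runs):
        subprocess.run(argv, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=False)
    return time.perf_counter() - start


def compare_devices(cache_file, devices, runs=10):
    """Time 'ipmitool -S cache_file -d N sdr' for each device number N."""
    results = {}
    for device in devices:
        argv = ["ipmitool", "-S", cache_file, "-d", str(device), "sdr"]
        results[device] = time_command(argv, runs)
    return results


# Example, to be run on a system with ipmitool installed:
#   for device, elapsed in compare_devices("/tmp/xxx", [0, 1]).items():
#       print(f"device {device}: {elapsed:.3f}s")
```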

Restrictions

Some systems report what appear to be device codes in the data field and the data in the 4th field, and I don't know why. For now, when this occurs collectl reports the 4th column as the data instead. If this breaks other things it will have to be removed and invalid data reported for those systems that do not report it in column 2.
updated June 25, 2010
collectl - Process Monitoring

Process Monitoring

Introduction

Collectl has the ability to monitor processes in pretty much the same way as ps or top do, as can be seen here:
# collectl -sZ
# PROCESS SUMMARY (faults are /sec)
# PID  User     PR  PPID S   VSZ   RSS  SysT  UsrT Pct  AccuTime MajF MinF Command
21502  root     15  1749 S    6M    2M  0.00  0.00   0   0:06.40    0    0 /usr/sbin/sshd
21504  root     15 21502 S    4M    1M  0.00  0.00   0   0:00.79    0    0 -bash
22984  root     15     1 S    7M    1M  0.00  0.00   0   0:00.78    0    0 cupsd
23073  apache   15  1914 S   18M    8M  0.00  0.00   0   0:00.01    0    0 /usr/sbin/httpd
You can select processes to monitor by pid, parent, owner or command name (see section on Filters below). When using names, you can use partial or full names or even use strings that were part of the command invocation string such as parameters. The main benefit of monitoring processes with collectl is that you can coordinate the sample times of process data with any of the other subsystems collectl can monitor.

The way you tell collectl to monitor processes is to specify the Z subsystem and any optional parameters with --procopts. Since monitoring processes is a heavier-weight function, it is recommended to use a different interval, which can be specified after the main monitoring interval separated by a colon. The default is 60 seconds. Therefore, to monitor all the processes once every 20 seconds and the rest of the parameters every 5 simply say:

collectl -sZ -i5:20
The biggest mistake people make when running this command interactively is to leave off the interval or specify something like -i1 and not see any process data. That is because the default process interval is 60 seconds and they just haven't waited long enough for the output! This should be obvious since collectl will announce it is waiting for a 60 second sample.

There are also a few restrictions to the way these intervals are specified. The process interval must be a multiple of the main interval AND cannot be less than it. If you specify a process interval without a main interval, the main interval defaults to the process interval.
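
These restrictions are easy to check before starting collectl. The following is a Python sketch of the two rules just described, applied to the value given to -i. The function name parse_intervals is made up, any third interval is simply ignored, and the error messages are not collectl's own.

```python
from decimal import Decimal


def parse_intervals(spec):
    """Return (main, process) intervals from a 'main:process' string such as '5:20'."""
    main_text, _, rest = spec.partition(":")
    process_text = rest.split(":")[0]  # ignore a third interval if present
    if not process_text:
        raise ValueError("no process interval specified")
    process = Decimal(process_text)
    # without a main interval, the main interval defaults to the process interval
    main = Decimal(main_text) if main_text else process
    if process < main:
        raise ValueError("process interval cannot be less than the main interval")
    if process % main != 0:
        raise ValueError("process interval must be a multiple of the main interval")
    return main, process


print(parse_intervals("5:20"))
print(parse_intervals(":.1"))
```

Decimal is used rather than float so that fractional intervals such as .1 divide evenly.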

Finally, as with other data collected by collectl, you can play back process data by specifying -p. While not exactly plottable data, you can specify -P and the output will be written to a separate file as time stamped space delimited data, one process per line.

Options

As stated earlier, you can specify options specific to process monitoring. These apply to all forms of process output unless otherwise noted, including --top and --procanalyze
  • c: include cpu times of children who have exited with their parent
  • f: use cumulative totals for maj/min page faults
  • i: Show alternate format which includes all io counters
  • m: Show alternate format which includes all memory sizes
  • p: Never look for new pids or threads after startup (improves performance)
  • r: Only show root name of command, leaving off path
  • t: Look for threads for all processes
  • w: Include command arguments, making a wider display. Can be combined with r.
  • z: Only show processes with non-zero sort field. This only applies to --top.

Filters

You can tell collectl to monitor a subset of processes by using the --procfilt switch followed by one or more process selectors, separated by commas (see collectl -x to see a detailed list). For the most part, the use of filters is pretty straightforward in that if you want to see all processes whose parent is 1234 as well as those that contain http in their command name, you would specify a filter of --procfilt P1234,chttp. However there is one important distinction to keep in mind. The c prefix says to select on the command name only whereas the f prefix says to look at the entire command string including arguments. In other words, if you're editing the file abc and try to select it via --procfilt cabc you'll never see it. This is particularly annoying when monitoring perl scripts since the name of the command is perl and the name of the script shows up as an argument.

If a plus sign immediately follows a process selector any processes selected by it will have their threads monitored as well. See collectl -x or man collectl for more details.
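
The difference between the c and f selectors can be illustrated with a short Python sketch. It only covers the three selector types discussed here (P, c and f) plus the trailing plus sign, and it uses plain substring matching. The function names are hypothetical and collectl's real matching may be more general.

```python
def parse_procfilt(spec):
    """Split a --procfilt string into (type, value, threads) tuples."""
    selectors = []
    for item in spec.split(","):
        if not item:
            continue
        threads = item.endswith("+")  # a trailing + requests thread monitoring
        if threads:
            item = item[:-1]
        selectors.append((item[0], item[1:], threads))
    return selectors


def selected(selectors, ppid, command, cmdline):
    """Return True if any selector matches the given process."""
    for kind, value, _threads in selectors:
        if kind == "P" and str(ppid) == value:
            return True
        if kind == "c" and value in command:   # command name only
            return True
        if kind == "f" and value in cmdline:   # full string including arguments
            return True
    return False


# editing the file abc with vi: the command name is vi, abc is an argument
print(selected(parse_procfilt("cabc"), 4321, "vi", "vi abc"))
print(selected(parse_procfilt("fabc"), 4321, "vi", "vi abc"))
```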

Dynamic Process/Thread Monitoring

A unique feature of process monitoring is that processes specified with a selection list via --procfilt do not have to exist at the time collectl is run. In other words, collectl will continue to look for new processes that match this selection list during every collection cycle! While this is indeed a good thing if that is what you want to do, it does come with a price in overhead: not a lot, but overhead nevertheless.

If you do not want this effect and only want to look at those processes that match the selection list at the time collectl is started, specify --procopts p to suppress dynamic process discovery. This holds for process threads as well, suppressing looking for new ones.

Perhaps the best way to see this in effect is to run collectl with the following command:

collectl -i:.1 -sZ --procfilt fabc
The .1 for an interval is not a mistake. It is there to show that you can indeed use collectl to spot the appearance of short lived processes - just don't do it unless you really need to. The --procfilt switch is saying to look for any processes invoked with a command that contains the string 'abc' in it. When this command is invoked there shouldn't be any output unless someone IS running a command with 'abc' in it. Now go to a different window or terminal and edit the file abc with your favorite editor. You will immediately see collectl display process information for your editor and when you exit the editor the output will stop.

The Time Fields

The SysT and UsrT fields represent the system and user time the line item spent during the current interval. One might think this means that in a 60 second interval the most time a process could spend is 60 seconds. Not quite! If this is a multi-processor/multi-core system the process could actually spend up to 60 seconds on each core, so just be careful how the times are interpreted. The Pct field is the percentage of the current interval the process has consumed in system and user time, which can also exceed 100% in multi-processor situations. Finally, since the AccuTime field accumulates these times it can exceed the actual wall clock time.
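
As a worked example of the arithmetic, the percentage is the sum of the two times divided by the interval length. The Python sketch below assumes rounding to the nearest whole number, and collectl's own rounding may differ. The function name cpu_percent is made up.

```python
def cpu_percent(sys_time, usr_time, interval):
    """Percentage of the interval consumed in system plus user time."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    return round(100 * (sys_time + usr_time) / interval)


# one core fully busy for a 60 second interval
print(cpu_percent(20, 40, 60))
# two cores fully busy for the same interval, which exceeds 100%
print(cpu_percent(60, 60, 60))
```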

When run in non-threaded mode, the times reported include all time consumed by all threads. When run in threaded mode, times are reported for individual threads as well as the main process. In other words, if a process's only job is to start threads, it will typically show times of 0. If you rerun collectl in non-threaded mode you will see it report aggregated times.

Process Memory Utilization

The types of memory utilization displayed as part of the process monitoring output are the Virtual and Resident sizes. However there are additional types of memory that collectl tracks and to see them as well as page faults you can select an alternate process display format as follows:
# collectl -sZ -i:1 --procopts m
# PID  User     S VmSize  VmLck  VmRSS VmData  VmStk  VmExe  VmLib MajF MinF Command
 9410  root     R 81760K      0 15828K 14132K    84K    16K  3620K    0   18 /usr/bin/perl
    1  root     S  4832K      0   556K   212K    84K    36K  1388K    0    0 init
    2  root     S      0      0      0      0      0      0      0    0    0 kthreadd
    3  root     S      0      0      0      0      0      0      0    0    0 migration/0

Process I/O Statistics

As of collectl Version 2.4.0, if process I/O stats have been built into the kernel collectl will add 2 additional columns to the process display named RKB and WKB. Note that in the following example I've set the display interval to 1 second and removed the initialization message from the output. As with all fields reported as rates/sec these will show consistent values independent of the interval, and if you want the unnormalized values be sure to include that option with the -o switch as -on.
# collectl -sZ -i:1
# PROCESS SUMMARY (faults are /sec)
# PID  User     PR  PPID S   VSZ   RSS  SysT  UsrT Pct  AccuTime  RKB  WKB MajF MinF Command
    1  root     20     0 S    4M  552K  0.00  0.00   0   0:00.68    0    0    0    0 init
    2  root     15     0 S     0     0  0.00  0.00   0   0:00.00    0    0    0    0 kthreadd
    3  root     RT     2 S     0     0  0.00  0.00   0   0:00.02    0    0    0    0 migration/0
A particularly useful feature I've found is monitoring one or more processes by name (you can also monitor by pid, ppid and uid) to see what they're doing. In this case I'm using the dt program to write a large file and telling collectl to display any process whose command string matches dt as well as to include time stamps.
# collectl -sZ -i:1 --procfilt cdt -oT
# PROCESS SUMMARY (faults are /sec)
#          PID  User     PR  PPID S   VSZ   RSS  SysT  UsrT Pct  AccuTime  RKB  WKB MajF MinF Command
09:01:03 13577  root     20 12775 R    1M    1M  0.04  0.00   4   0:01.92    0  16K    0    0 ./dt
09:01:04 13577  root     20 12775 D    1M    1M  0.40  0.00  40   0:02.32    0 118K    0    0 ./dt
09:01:05 13577  root     20 12775 D    1M    1M  0.24  0.00  24   0:02.56    0  65K    0    0 ./dt
Finally, note that there is more process I/O data available but I chose to leave it off the default display and instead have the following alternate format. This is the same methodology used for reporting process memory utilization, namely you only see VSZ and RSS in the default display but much more with --procopts m. Also note that in this case I chose 1/2 second monitoring as well as showing time in msec resolution:
# collectl -sZ --procopts i -i:.5 --procfilt cdt -oTm
#              PID  User     S  SysT  UsrT   RKB   WKB  RKBC  WKBC  RSYS  WSYS  CNCL  Command
09:03:24.003 13614  root     D  0.12  0.00     0   32K     0   32K     0    64     0  ./dt
09:03:24.503 13614  root     D  0.14  0.00     0   32K     0   32K     0    64     0  ./dt
09:03:25.003 13614  root     R  0.10  0.00     0   24K     0   24K     0    48     0  ./dt

The --top Switch

A feature that has been in collectl for awhile has been the --top switch which generates data in a format similar to the linux top command, though it was limited to process data only. However, since the inclusion of process I/O statistics as well as the inclusion of a few additional handy switches, this command has become much more useful as explained below.

In its simplest form, this switch tells collectl to simply display the top consumers of cpu. However, as of collectl V2.6.4 you can now tell it to optionally display the list sorted by I/O or page faults. Here I'm simply looking for the top processes sorted by page faults with the command collectl --top flt and the display fills my window, which in this case is only 10 lines high. To look at the top consumers of I/O, simply use --top io instead:

# PID  User     PR  PPID S   VSZ   RSS CP  SysT  UsrT Pct  AccuTime  RKB  WKB MajF MinF Command
 3009  root     20     1 S    2M  280K  3  0.00  0.00   0   0:43.01    0    0    0    8 irqbalance
 7144  root     20  6485 R   81M   15M  2  0.00  0.06   6   0:01.70    0    0    0    5 /usr/bin/perl
    1  root     20     0 S    4M  556K  2  0.00  0.00   0   0:03.60    0    0    0    0 init
    2  root     15     0 S     0     0  2  0.00  0.00   0   0:00.00    0    0    0    0 kthreadd
    3  root     RT     2 S     0     0  0  0.00  0.00   0   0:00.10    0    0    0    0 migration/0
    4  root     15     2 S     0     0  0  0.00  0.00   0   0:00.06    0    0    0    0 ksoftirqd/0
    5  root     RT     2 S     0     0  0  0.00  0.00   0   0:00.30    0    0    0    0 watchdog/0
    6  root     RT     2 S     0     0  1  0.00  0.00   0   0:00.08    0    0    0    0 migration/1
    7  root     15     2 S     0     0  1  0.00  0.00   0   0:00.02    0    0    0    0 ksoftirqd/1
As discussed earlier, threads can be considered for display with --procopts t, which requests that all selected processes be examined for threads. You can also request that only a subset of processes be examined by specifying a + with --procfilt, but that's getting into more advanced concepts. Finally, in the spirit of saving screen real estate collectl doesn't include command arguments in the output, but including --procopts w will request a wider display that does include them. In fact you can get an even narrower display by including --procopts r, which requests that only a command's root name be displayed, so in the example above we would see perl instead of /usr/bin/perl.

The following 3 successive displays are the result of monitoring a process named thread.pl which creates a couple of threads 10 seconds apart which then do some I/O. In the first display we see the main script, which is actually run under the perl interpreter and so the string thread does exist as part of the argument string, but I chose to leave it off the output to save screen real estate:

collectl --top io --procfilt fthread --procopt t
# PROCESS SUMMARY (faults are /sec) 06:57:42
# PID  User     PR  PPID S   VSZ   RSS CP  SysT  UsrT Pct  AccuTime  RKB  WKB MajF MinF Command
 7024  root     20  6725 S   61M    2M  2  0.00  0.00   0   0:00.00    0    0    0    0 /usr/bin/perl
A few seconds later the first thread starts up and immediately goes to the top of the list since it does have the dominant I/O:
# PROCESS SUMMARY (faults are /sec) 06:57:52
# PID  User     PR  PPID S   VSZ   RSS CP  SysT  UsrT Pct  AccuTime  RKB  WKB MajF MinF Command
 7065+ root     20  6725 R   73M    5M  0  0.88  0.12 100   0:01.98    0 291K    0    0 thread.pl
 7064  root     20  6725 S   73M    5M  2  0.88  0.11  99   0:01.98    0    0    0    0 /usr/bin/perl
And still later the second thread shows up, it too having a higher sort order than the root script:
# PROCESS SUMMARY (faults are /sec) 06:58:02
# PID  User     PR  PPID S   VSZ   RSS CP  SysT  UsrT Pct  AccuTime  RKB  WKB MajF MinF Command
 7098+ root     20  6725 R   83M    8M  1  0.12  0.02  14   0:00.86    0  29K    0    0 thread.pl
 7096+ root     20  6725 R   83M    8M  0  0.16  0.00  16   0:04.24    0  27K    0    0 thread.pl
 7095  root     20  6725 S   83M    8M  2  0.28  0.02  30   0:05.13    0    0    0    0 /usr/bin/perl
Naturally, as with all other data in collectl, you can record it to a file and play it back later using various combinations of options with --procopts and even --top. If you are using --top collectl simply displays a blank line between intervals. Also don't forget to try different sort options and experiment with the number of lines per process interval you want to examine since the default is your screen height and may be too big for playback purposes.

Including non-process data
The linux top command can show other types of data besides the top processes and so can collectl. Just specify those subsystems you are interested in with -s and they will be displayed in a scrolling window above the process data. By scrolling multiple lines of data, you are able to see history, something the linux command cannot do. You may also want to include timestamps with the brief data by using -oT to make it easier to read.

But don't stop with brief data, you can even show verbose data as well. However in the case of multiple subsystems it just isn't practical to show scrolling history and so you will only see the latest sample. If you choose to show a single verbose subsystem you will see scrolled data.

Finally, if you want to customize the way the screen real-estate is allocated between the process and other data, you can change the size of the process section by including the number of lines to display as the second argument to --top. You can also control the size of the subsystem data with --hr lines, a synonym for --headerrepeat lines.

Experimental --export proctree

This is an experimental (meaning subject to change) alternate format for displaying process data. Rather than simply show processes in the default PID order or sorted by a particular field when using the --top format, this format displays processes in a parent/child relationship. As with all --export formats, one can use this interactively, when playing back data or to send the data over a socket when using --address. At the very least, this could offer a good starting point for developing your own alternative process output formats.

There are actually 2 main functional components to this format, the main one being to determine the parent child relationship between all processes (there IS some additional overhead involved here). A second function is the aggregation of various counters and meters.

Proctree can also be combined with --top to limit the number of processes displayed OR in playback mode with or without --top. Consider the following output when playing back a file with --top --export proctree:

#  PID       PPID User     PR S   VSZ   RSS CP  SysT  UsrT Pct  AccuTime MajF MinF Command
00001           0 root     15 S    2G  108M  0  0.03  0.03   0   0:18.21    0    0 init
 05535          1 root     15 R  106M   15M  2  0.01  0.03   0   0:07.68    0    0  /usr/bin/perl
 05452          1 haldaemo 15 S   85M    7M  1  0.02  0.00   0   0:06.99    0    0  hald
  05453      5452 root     15 S   55M    3M  0  0.02  0.00   0   0:05.42    0    0   hald-runner
   05474     5453 root     16 S    9M  652K  0  0.02  0.00   0   0:05.41    0    0    hald-addon-storage:
One can quickly see that the total CPU consumption for this monitoring interval is 0.03 of both system and user time by simply looking at root process 1. Furthermore, of the system time 0.01 is consumed by 05535 while the other 0.02 is consumed by one of the descendants of 05452, actually the grandchild 05474. One should also note that any processes with no CPU time will be excluded from the display to keep the output reasonably dense. Without --top all processes are shown.

One can also use most of the process options as well (see --showsubopts for the complete set).

Additional interactive --top options
Proctree was really developed for real-time display with --top and so there are more available options, the main one to consider is the suppression of fields with zero in them. In the previous example, fields with 0 CPU were suppressed because by default --top sorts by CPU (even though we're not sorting). If one were to choose a different sort field with --top, proctree will use that field to suppress entries with zero in them. In fact, there are a number of different switches one can select interactively, one of which is to change the suppression value from 0 to something else.

So let's take a closer look at running in interactive mode by typing the command

collectl --top --export proctree
and at some time after the first data screen is displayed, type RETURN. You will now see a menu like this:
Enter a command and RETURN while in display mode:
  pid    only display this pid and its children
  a      toggle aggregation between 'on' and 'off'
  dxx    change display hierarchy depth to xx
  i      change display format to 'I/O'
  k      toggle multiplication of I/O numbers by 1024 between 'on' and 'off'
  m      change display format to 'memory'
  p      change display format to 'process'
  h      show this menu
  stype  where 'type' is a valid sorting type (see --showtopopts)
         entries with 0s in those field(s) will be skipped
  wxx    max width for display of command arguments
  z      toggle 'skip' logic between 'on' and 'off'
  Zxx    when skipping, only keep entries with I/O fields > xxKB
Press RETURN to go back to display mode...
This list shows a number of commands which will change the display contents and/or format much in the way you can do with the standard linux top command. First type RETURN to get back into real-time display mode and then simply type a command at any time while collectl is running and the command will take effect on the next display cycle.

These commands fall into several categories, one being those that toggle behavior such as aggregation, multiplication and the skip logic. By default, all values are aggregated up through their parent hierarchy and typing the a command followed by a RETURN will turn this behavior off. Similarly, when the values of the I/O counters are too large to easily read you can force their division by 1K with the k command. And finally, you can disable the logic that skips zero-based entries with the z command. If you'd rather skip on some value other than 0 you can set the skip value with Zxx.

Look at the display line above the following process data:

Process Tree 09:06:03 [skip when 'time'<=0 is 'on' aggr: 'on' x1024: 'off' depth 5]
#  PID       PPID User     PR S   VSZ   RSS CP  SysT  UsrT Pct  AccuTime MajF MinF Command
00001           0 root     15 S  674M  272M  1  0.00  0.06   6   0:09.96    0    0 init
 01766          1 root     15 S   50M   24M  0  0.00  0.06   6   0:01.30    0    0  /usr/sbin/sshd
  02142      1766 root     15 S   25M   14M  1  0.00  0.06   6   0:00.88    0    0   /usr/sbin/sshd
   02144     2142 root     15 S   18M   12M  1  0.00  0.06   6   0:00.87    0    0    -bash
    02229    2144 root     19 R   14M   10M  0  0.00  0.06   6   0:00.84    0    0     /usr/bin/perl
Following the time field you can see the states of the three toggles as well as the skip value and display depth. By default you only see 5 levels of the process hierarchy but can change this with the d command. For example d7 will set the depth to 7.

As with other process displays, you can also choose whether you want to see the default display, one that shows all I/O fields or one focused on memory using the p, i or m commands. You can easily switch between these formats at any time.

If you enter a number as a command, this is interpreted as a process PID and the display will be adjusted such that this becomes the first entry in the display. If you would like to skip on something other than the current field, you can easily change that with the s command immediately followed by one of the sort field names listed with --showtopopts. Finally, if using the wide command option with --procopts w, long command strings will cause wrapping and make the display unreadable. The w command can be used to set the maximum width of the command field.

As with other collectl options, there are simply far too many combinations to describe which are appropriate for a particular situation (such as using --procopts) so it is recommended you experiment to better understand the many capabilities of proctree.

Process Analysis

If you've run collectl as a daemon and collected process data, you now have a huge pile of data and it's not entirely clear what you could do with it. However, when that data is played back with collectl --procanalyze, it generates a process summary file with the extension of prcs in the same directory as specified with the -f switch. This file will contain one line for each unique process and the fields will be separated by collectl's field separator, which by default is a space but which you can change with the --sep switch.

The fields themselves summarize all the key data elements associated with each process making it possible to see the process start/end times, cpu consumption, I/O (if the kernel supports I/O stats), page faults and even the ranges of the different types of memory consumed. And since the data elements are separated by a single character delimiter you can easily load the file into your favorite spreadsheet and perform deeper analysis (the data is actually not very user friendly as written).

It is also important to remember a couple of things:

  • Processes that were created and exited between a single pair of samples will not show up in the output
  • You can control the time frame that is analyzed by using the --from and --thru switches, which can be a useful thing to do when you're interested in a specific time period and don't want this file too cluttered
  • The --procanalyze switch overrides collectl's default behavior of processing all the data that has been collected and so will only generate the prcs file(s). If you want to generate plot data for other subsystems at the same time be sure to include them with -s.
  • By default, all data is written in space-separated format which is collectl's standard default. Since command arguments (if any) are also space-separated they will each show up in a different spreadsheet cell, which shouldn't be a big deal but if you want the entire command string together, you can always use --sep 9 to make the data tab-delimited or even choose a different separator.
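To see why the separator matters for the command string, here is a small sketch that counts the fields a spreadsheet would see. The two records are invented for illustration and do NOT reflect the real layout of a prcs file; count_fields is a hypothetical helper.

```shell
# Hypothetical records: pid, user, cpu seconds, command line with arguments.
space_line='1234 root 0.50 /usr/bin/perl -w collectl'
tab_line=$(printf '1234\troot\t0.50\t/usr/bin/perl -w collectl')

# Count the fields produced when splitting a record on a given separator.
count_fields() {
    # $1 = separator character, $2 = record
    printf '%s\n' "$2" | awk -F"$1" '{ print NF }'
}

count_fields ' ' "$space_line"              # arguments land in separate cells
count_fields "$(printf '\t')" "$tab_line"   # command string stays in one cell
```

With the space separator the command and its two arguments count as three fields, whereas with tabs (what --sep 9 produces) the whole command string is a single field.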

Understanding Processing Overhead

This is intended to be a brief description of how process monitoring works with the hope that it will help one use the capability more efficiently and avoid unnecessary processing overhead. Normally the overhead is modest, but if you intend to run at higher monitoring rates or to look at threads it's worth reading further.

Collectl maintains 2 data structures that control monitoring: pids-to-monitor and pids-to-ignore. These lists are built at the time collectl starts. If --procopts p is not specified, the effect is to execute a ps command and save all the pids in the pids-to-monitor list. If filters are specified with --procfilt, only those pids that match are placed in the pids-to-monitor list and the rest are placed in the pids-to-ignore list. You can see that when filters are used there can be a significant reduction in overhead, since collectl need not examine every process's data.

If collectl is only monitoring a specific set of processes, either because --procopts p was specified or --procfilt was used and only specified specific pids (not ppids), on each monitoring pass collectl only looks at the pids in the to-be-monitored list. In other words, this is as efficient as it gets because it needn't look for processes that are in neither list, aka newly created processes.

If doing dynamic process monitoring, every monitoring pass collectl has to read all the pids in /proc to get a list of ALL current processes. While it ignores any in the do-not-monitor list, it must look at the rest. If any of these are in the to-be-monitored list and have had thread monitoring requested, additional work is required to see if any new threads have shown up. Any processes not in the to-be-monitored list are obviously NEW processes and must then be examined to see if they match any selection criteria and this involves reading the /proc/pid/stat file. That pid is then placed in one of the two lists. It should be understood that during any particular interval a lot of processes come and go, such as cat, ls, etc. However, these are short lived enough as to not even be seen by collectl, unless of course collectl is running at a very fine grained monitoring level.

Occasionally a process being monitored disappears because it has terminated. When this happens its pid is removed from the to-be-monitored list.
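The bookkeeping described above can be sketched in a few lines of shell. This is only an illustration of the two-list idea under simplifying assumptions, not collectl's implementation: the pid:command input format, the FILTER variable and the scan_pass name are all hypothetical, and thread handling is left out.

```shell
monitor=' '     # pids-to-monitor, space-delimited
ignore=' '      # pids-to-ignore, space-delimited
FILTER=sshd     # hypothetical stand-in for a --procfilt selection

# One monitoring pass. Arguments are the processes currently present,
# written as pid:command (collectl would discover these by reading /proc).
scan_pass() {
    present=' '
    for entry in "$@"; do
        pid=${entry%%:*}
        cmd=${entry#*:}
        present="$present$pid "
        # Already classified: nothing more to decide for this pid.
        case "$monitor$ignore" in *" $pid "*) continue ;; esac
        # New process: examine it once and file it in one of the two lists.
        if [ "$cmd" = "$FILTER" ]; then
            monitor="$monitor$pid "
        else
            ignore="$ignore$pid "
        fi
    done
    # Remove monitored pids that have terminated since the last pass.
    kept=' '
    for pid in $monitor; do
        case "$present" in *" $pid "*) kept="$kept$pid " ;; esac
    done
    monitor=$kept
}

scan_pass 1:init 50:sshd 60:bash   # first pass classifies all three pids
scan_pass 1:init 60:bash 70:sshd   # pid 50 has exited, pid 70 is new
echo "monitor:$monitor ignore:$ignore"
```

Note that only new pids cost a classification step, which is why bounding the set of processes keeps the overhead down. The sketch never prunes the ignore list, which is part of the extra maintenance real code has to do.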

Finally, these data structures (and a couple of others that have not been described) need maintenance to keep them from growing. If the number of processes to monitor has been fixed, this maintenance is significantly reduced.

So the bottom line is if you have to use dynamic monitoring, try to bound the number of processes and/or threads. If you really need to see it all, don't be afraid to but just be mindful of the overhead. Collecting all process data with the default interval has been observed to take about 1 minute of CPU time, which is less than 0.1%, on a lightly loaded Proliant DL380, but that load will be higher with more active processes.
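As a sanity check on that figure, the percentage is easy to work out yourself. The text does not say over what period the minute of CPU time was measured, so the 24 hours used below is purely an assumption, and cpu_percent is a hypothetical helper.

```shell
# Percentage of one CPU used, given CPU seconds consumed over an elapsed period.
cpu_percent() {
    # $1 = CPU seconds consumed, $2 = elapsed wall-clock seconds
    awk -v cpu="$1" -v wall="$2" 'BEGIN { printf "%.3f\n", cpu / wall * 100 }'
}

# ASSUMPTION: 1 minute of CPU time spread over a 24 hour day.
cpu_percent 60 86400    # prints 0.069, i.e. below 0.1%
```

You can plug in your own measurements, for example the CPU time reported by ps for the collectl process and the time it has been running.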

RESTRICTIONS

  • You cannot specify --procfilt during playback mode. If you need to look at a subset of the data consider using a filter like grep.
  • Thread monitoring is limited to 2.6 kernels.
  • Process I/O monitoring is limited to kernels that have that capability enabled and that didn't even appear before 2.6.22. If you don't see the file /proc/self/io, your kernel was not built with process I/O accounting enabled and you need to get one that has the following enabled: CONFIG_TASKSTATS, CONFIG_TASK_XACCT and CONFIG_TASK_IO_ACCOUNTING.
  • There is a bug in the way the kernel reports I/O stats (see bugzilla 10702) such that when you exclude threads from the display the parent process I/O counts are not aggregated into the I/O counts from the threads resulting in misleading results. Andrea Righi has provided a patch here that provides the correct aggregation, but of course that will require a kernel rebuild. He has also informed me that his patch has been included in the 2.6.26 kernel so this should no longer be an issue from that version forward.
updated Dec 16, 2008
collectl-4.3.1/docs/OutputFiles.html0000664000175000017500000000470413366602004015574 0ustar mjsmjs collectl - Output Files

Output Files

Raw Data

All raw data is recorded in a single compressed file with the extension raw.gz if the perl compression library has been installed. If that library is not there it will be written to a non-compressed file with the extension raw. The only exception to this rule is the process raw file which can be useful on systems with a large number of processes (see the description of --tworaw) and which has the extension rawp.

Plot Data

There are actually 2 main types of plot data - summary and detail. Summary plot data, for those subsystems selected with lower case letters, is always stored in a single file, one line per time period, with the extension tab. The primary reason for this is that the data for each subsystem is of a fixed length and there is really no benefit in separating it into multiple files.

Detail plot data, which is typically for devices for which there can be multiple instances such as CPUs, is recorded in one file per detail type with an extension that reflects the type of data stored in that file. Each line contains instance data of a fixed number of fields for that particular device. Although TCP does not have instance data, it does have a detail component and is also written to its own detail file. Process and Slab data are also treated like detail data because they too require multiple lines per monitoring period.

Exception Data

Exception data is written to a file in the same format as detail data with an X appended to its name. Since exception data is not of a known format across the entire device as is detail data, it cannot be written as a single line, but rather is written as one line per device. Each line is prefaced with a date/time stamp and the number of the device (0 based).

Collectl Messages

Periodically collectl logs various types of messages to its own message log to avoid situations in which an unexpected condition or a collectl bug might cause the flooding of /var/log/messages. However, the more serious messages are written to both as described here.
updated Feb 21, 2011
collectl-4.3.1/docs/HiResTime.html0000664000175000017500000000431213366602004015135 0ustar mjsmjs High Resolution Timer Warnings

High Resolution Timer Warnings

There has been a recent problem identified when running collectl on systems with versions of the Time::HiRes perl module less than 1.91 and versions of glibc newer than 2.3. So far these messages appear to be harmless as they've only been identified as occurring during system boot. However, since the problem is that HiRes is actually calling the setitimer system service incorrectly, it is really living on borrowed time (if you'll pardon the pun) and users would be much better off and safer to simply upgrade to a newer version which they can get at http://search.cpan.org/dist/Time-HiRes/.

If you've never installed something from CPAN before you shouldn't be intimidated even if you're not a programmer as all you need to do is perform the following steps. If the version you're downloading is not 1.9715, replace that string in the instructions below with the appropriate version number:

  • download Time-HiRes-1.9715.tar.gz to /tmp
  • cd /tmp
  • unpack it with the command tar -zxf Time-HiRes-1.9715.tar.gz
  • cd Time-HiRes-1.9715
  • perl Makefile.PL
  • make
  • make test
  • make install
  • collectl -v should identify the newer version

In a few cases, rather than replacing the older version the new one ends up in a different location, and collectl still sees the old version. You can usually get around this problem by re-executing the make install command as make install UNINST=1
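If you want to check which version perl actually loads, before or after the upgrade, something along these lines should do. It is a sketch rather than part of collectl: both function names are made up, and it only looks at the Time::HiRes version, not at the glibc version that is the other half of the problem.

```shell
# Print the version of Time::HiRes that perl loads (requires perl on the PATH).
installed_hires_version() {
    perl -MTime::HiRes -e 'print "$Time::HiRes::VERSION\n"'
}

# Succeed (exit status 0) when the given version is older than 1.91.
hires_needs_upgrade() {
    # $1 = version string such as 1.86 or 1.9715
    awk -v v="$1" 'BEGIN { exit !(v + 0 < 1.91) }'
}

# Example use:
#   hires_needs_upgrade "$(installed_hires_version)" && echo "upgrade Time::HiRes"
```

If the version printed here differs from what you just installed, the old copy is probably still first in perl's search path, which is the situation UNINST=1 is meant to address.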

Update - April 08, 2009
It looks like RedHat has finally responded to my bugzilla and posted a solution, which makes me optimistic we should see a newer version of Time::HiRes in the RHEL5.4 timeframe. That still doesn't mean the problem has been resolved on distros that use older versions.
updated April 8, 2009
collectl-4.3.1/docs/OutputFormats.html0000664000175000017500000000760013366602004016143 0ustar mjsmjs Output Formats

Output Formats

Basic Formatting

By design, collectl gathers more data than it is possible to display in an efficient, easy to read, compact form. However, most users want their data displayed in such a form for easy interpretation. Therefore, collectl will attempt to display all data in a single line, often choosing a subset of the complete data for each subsystem. If the user has selected too many subsystems, each line may exceed the display width and wrap. When this happens either make the terminal window wider (maybe even using a smaller font) or choose fewer subsystems. This is referred to as brief format and is collectl's display format of choice and therefore the default. Verbose mode displays more information and results in multiple lines of output.

Collectl will try its best to select a format consistent with the user's selection criteria, using brief mode whenever possible unless explicitly told not to do so. However there are several instances when this mode doesn't make sense. For example, detail data will always be displayed in verbose mode since it takes multiple lines for each sample. When this occurs, collectl will automatically use verbose mode, which can also be manually forced for non-detail data using --verbose.

One should note that these formats are not just for interactive use and can also be applied to playback mode.

An additional feature of brief output is subtotal mode. If one hits the enter key at any time, the next line of output will be the subtotals (or averages on non-counters) of all columns since the start of collectl OR the last time the counters were zeroed. To zero the counters enter Z followed by a carriage return. Furthermore, if you type A followed by the enter key, the averages will be reported. The averages/totals can also be displayed during playback in brief mode by specifying -oA.

To get a better idea of what the output actually looks like, see the examples.

You can even export your own custom output.

Additional Control

There are several switches that provide even more control over the look of the output in addition to --verbose as described above. They are:
  • --home moves the cursor to the home position before displaying verbose output at the start of each interval. Only available in interactive mode, this results in a look-and-feel similar to the top command.
  • --procfilt and --slabfilt affect the output format for their respective data types in that they typically cause a much smaller number of processes or slabs (if used in conjunction with --slabopts S) to be displayed, sometimes as little as a line or two. It was felt that repeating the interval header when processes or slabs are the only data being reported was too distracting and so it is left off. Be sure to try it with -oT for better clarity.
  • --top is very similar to the linux top command in that it shows a small subset of processes sorted by the top consumers of the cpu, I/O or even page faults. You can even use -s to add subsystems to the display in brief or verbose mode as well. By default this format sorts by the top CPU users but you can choose virtually any field. If you choose one of the slab field names it will show the top slabs sorted by that field name.
The best way to really understand how these work in conjunction with each other is to try them out. And don't forget you can use --top with playback too!
updated Feb 21, 2011
collectl-4.3.1/docs/Lustre.html0000664000175000017500000001444613366602004014573 0ustar mjsmjs collectl - Lustre

Lustre

Overview

The first thing to understand about lustre reporting is in most cases, where one has configured the server(s) and just wants to monitor them, all one need do is specify -sl or -sL and collectl will do the right thing. It will automatically detect the type of service(s) currently running and will either record or display the appropriate data. If you select -sl and the system doesn't have lustre installed, it will warn you and then disable that switch.

Controlling Which Data is Displayed

Lustre records a wealth of performance data, far more than makes sense to display all the time, and so by default collectl displays minimal information such as bytes/operations read and written. At the client detail level lustre can differentiate this data at the filesystem and even the OST level! In order to accommodate the broadest flexibility one is allowed to control the way data is collected/displayed via several complementary switches.
  • -s: As is normally the case, one can specify '-sl' for summary level data, '-sL' for detail data or combine them to get both. However, since the client detail data can actually be presented at the individual filesystem or OST level, there is an option to show the OST level details (filesystem details are the default) see --lustopts O.
  • --lustopts: This switch is used to provide further detail about the types of data that is to be collected/displayed. There are 5 such values that collectl cares about:
    • B - rpc buffer level data.
    • D - disk block statistics, which applies to both MDS and OSS servers. One should also note this is specific to HP SFS and this data is not available in the open source version.
    • M - client metadata (note that this was the default prior to collectl V1.6.2).
    • O - for client details only, show results by OST
    • R - read_ahead statistics. Unlike the other options, which generate a lot of data, --lustopts R may be used with brief mode.
    • As it turns out, nothing is quite as simple as it seems and while the following case is not typical, it needs to be addressed for completeness. Since collectl allows one to collect one set of data and to later display a different set, consider what happens if one were to collect multiple types of lustre data for a client using --lustopts MR, but then just play back the basic client data which is collected without specifying --lustopts. By default, playback mode uses the settings the data was collected with and to change the display one needs to explicitly change those settings. To meet this need, use --lustsvc, which is described in more detail later.

In the spirit of letting the user display whatever they want to, collectl will allow one to select multiple values for --lustopts and it will try to display the results appropriately. Perhaps the easiest thing to do is just experiment and in most cases you'll get what you're looking for. There are a few combinations of -s and --lustopts that do not make sense and if you choose one, you will be told.

What About Playback?

As is always the case with playback, unless otherwise told to do something else, collectl will playback its recorded data based on the parameters selected for collection. In other words, if you specify --lustopts OBR in record mode, collectl will record both RPC buffer and read_ahead stats. When you play the data back, it will then display both as well. However, you also have the option of specifying --lustopts, typically thought of as a collection-only switch, and it will force the output to what you'd like it to be. If you select a statistics type that hasn't been recorded, that information will be displayed, but as zeros.

Recognizing Service Configuration Changes

In some cases lustre services may change after collectl starts. In fact, it may not even be running and if so you'll get a message telling you it is not and that collectl cannot determine the system type since it could be a client, MDS, OSS or some combination. This includes services starting and stopping as well as the configurations of those services themselves changing. For example one might occasionally mount/umount different lustre filesystems on a client. Not to worry. Collectl periodically checks for configuration changes and automatically adjusts the data it collects as well as anything it may be currently displaying. However this can also lead to the output format changing. If you know that the system type could change and you simply want to force the type of output to be consistent, use --lustsvc as described in the next section.

Changing the Default Recording/Display Behavior

There are some times when you want specific control over what data is recorded or displayed rather than the default behavior OR collectl starts before lustre does and it can't determine the type of system it is. This is typically the case when a system is playing multiple roles by providing more than one service. For example, if a system has been configured as both an OSS and a client, every time you run collectl you will collect or display data about both and sometimes this is NOT what you want. There may be other times where you have developed some reports or graphs that expect data in a standard format and you've collected a subset (or superset) of data.

To override this behavior of the lustre portion of the data (remember you can control the displaying of individual subsystems with -s), use --lustsvc to specify the type of service(s) you're interested in and collectl will only pay attention to those, both for recording to a file as well as display. Naturally when displaying data for services you never collected data on, those services will print as zeros.

If all this sounds confusing, just experiment with various combinations of -s, --lustopts and --lustsvc and observe the behavior.
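As a starting point for that experimentation, here is a sketch of a record-then-playback pair built from the switches mentioned on this page. The function names, the directory and the file name are made up for illustration, -p is assumed to be collectl's usual playback switch, and whether a given combination makes sense on your system is something collectl itself will tell you.

```shell
# Record client summary data with metadata (M) and read_ahead (R) statistics.
# $1 = logging directory, for example /var/log/collectl (an assumed path).
lustre_record() {
    collectl -sl --lustopts MR -f "$1"
}

# Play a recorded file back, forcing the display to read_ahead statistics
# only instead of the settings the data was collected with.
# $1 = raw file written by the recording run.
lustre_playback_readahead() {
    collectl -p "$1" -sl --lustopts R
}
```

Since playback otherwise uses the settings the data was collected with, the second function is what shows the effect of overriding --lustopts.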
updated Mar 26, 2010
collectl-4.3.1/docs/Documentation.html0000664000175000017500000000575013366602004016124 0ustar mjsmjs collectl - Documentation

Documentation

Home | Architecture | Features | Releases | FAQ | Support

Getting Started

Examples
Lustre Tutorial
Mapping to Other Tools
Tutorial
Why Summary Data?

General Information

Colmux
Data Definitions
Exception Reporting
File Naming
Ganglia Interface
Graphite Interface
Input Files
Logging
The Math
Operational Messages
Output Files
Output Formats
OpenStack Support
Operational Modes
Performance
Playback of Raw Files
Generating Plottable Files
Running as a service
Socket Interface
Startup and Initialization

Subsystem Specific

Buddyinfo (Memory Fragmentation)
CPU Monitoring
Disk Monitoring
Environmental Monitoring
Infiniband
Inodes
Interrupts
Lustre
Memory Monitoring
Network Monitoring
NFS Monitoring
Process Monitoring
Slab
Socket Monitoring

APIs for Customization

Exporting Custom Output
Importing Custom Data

Custom Modules

Hello World
Miscellaneous Data

Troubleshooting

setitimer console messages
updated Mar 31, 2014
collectl-4.3.1/docs/Inodes.html0000664000175000017500000000354513366602004014534 0ustar mjsmjs collectl - Inode Info

Inode Monitoring

The name of this subsystem is a slight misnomer because in addition to monitoring inodes it also monitors dentries as well. The information being reported is quite self-explanatory and there should really be no mystery here.

However, it may be worth noting that while inspired by sar -v the information isn't quite the same in case that's what you're expecting. Sar also reports on the number of pseudo terminals and somehow that didn't seem to fit with the types of data collectl reports and so was left out.

Another difference worth noting is that while both report the number of open files, actually calling them allocated file handles, it was felt that this number all by itself wasn't providing as much info as it could and so the percentage of the total has also been included in collectl's output.

While I wanted to do the same with dentries, reporting the percentage of the max, I ran into a problem - the number of unused cache entries is wrong, or at least the definition is! Rather than decrease as new directories are created it goes up, which makes no sense to me and so trying to report it as a percentage doesn't seem to make any sense. However, a closer investigation makes me think this is in fact tracking the slab objects named dentry_cache which does increase as new directories are created. However something still isn't quite right because while close, the numbers aren't close enough.

Investigation will continue, though at a lower priority. If/when resolved it will be fixed/documented accordingly.

updated October 17, 2011
collectl-4.3.1/docs/Graphite.html
Exporting Data to Graphite

Exporting Data to Graphite

Introduction

With the release of Collectl Version 3.6.1, you can now send collectl data directly to graphite. For existing collectl users this provides yet another way to store/plot collectl data, whether on a single system or hundreds. For graphite users who are not yet collectl users, you now have access to literally hundreds of performance metrics:

  • Since all collectl instances collect this data at the same time, system noise on clusters running fine-grained parallel jobs is reduced, though for larger clusters.
  • You can still log all the data collectl collects locally and only send a subset to graphite, reducing the load on both graphite and your network.
  • You can monitor as frequently as you like and send data to graphite at a coarser frequency, choosing the average, minimum or maximum of the values over that interval.
  • The r= switch, something unique to the graphite plugin, can help reduce the instantaneous load on the graphite server itself.
  • All this at collectl's low monitoring overhead

Usage

You use this export like any other, the only required option being the address to send the data to as in the following example:

collectl --export graphite,192.168.1.113
However you should also note that since by design this export does not provide any terminal output, there are only 2 real ways to make sure it is doing what you expect, the first being to inspect graphite's whisper storage area for your particular host name and make sure the data you're collecting is in fact showing up there:
ls /opt/graphite/storage/whisper/poker
cpuload  cputotals  ctxint  disktotals  nettotals
or to simply run with the debug mask set to 1, which tells the graphite module to echo all the data it is sending to graphite, noting in this case even though collectl is collecting cpu, disk and network data we're not sending cpu data to graphite. This is something you might do if logging more data to disk than you are sending to graphite, which in this case we are:
collectl --export graphite,192.168.1.113,d=1,s=dn --rawtoo -f /var/log/collectl
poker.disktotals.reads 0 1325609789
poker.disktotals.readkbs 0 1325609789
poker.disktotals.writes 0 1325609789
poker.disktotals.writekbs 0 1325609789
poker.nettotals.kbin 0 1325609789
poker.nettotals.pktin 1 1325609789
poker.nettotals.kbout 0 1325609789
poker.nettotals.pktout 0 1325609789
tip - if you add 8 to the debug flag, eg d=9, this tells the graphite module not to actually establish the connection with graphite's carbon listener but to only echo the data that would have been sent.
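
Each line echoed above has the shape "path value timestamp", which is what graphite's carbon listener expects on its plaintext port. The following Python sketch shows how such lines could be assembled. It is an illustration only, not the code in graphite.ph, and the function name carbon_lines is hypothetical.

```python
import time


def carbon_lines(host, subsystem, metrics, timestamp=None):
    """Build 'host.subsystem.name value timestamp' lines for a set of metrics."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    return [f"{host}.{subsystem}.{name} {value} {ts}"
            for name, value in metrics.items()]


# Reproduce two of the network lines from the debug output above.
for line in carbon_lines("poker", "nettotals", {"kbin": 0, "pktin": 1}, 1325609789):
    print(line)
```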

Once you're happy with the switch settings, be sure to update the DaemonCommands in /etc/collectl.conf and restart the collectl daemon to make them take effect.

Switches unique to graphite

e=escape
When sending data to graphite, collectl prefaces each line item with the hostname. If that name includes a domain name, the extra dots add additional levels to the variable names, which may not be desirable. By including an escape character, those dots will be replaced by that character.
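
As a small illustration of what e= does to the metric prefix, here is a Python sketch. The host name poker.example.com and the function name are hypothetical, and the real substitution lives in graphite.ph.

```python
def escape_hostname(hostname, escape):
    """Replace every dot in the host name with the chosen escape character."""
    return hostname.replace(".", escape)


# With e=_ a fully qualified name stays a single level in graphite's tree.
print(escape_hostname("poker.example.com", "_") + ".cputotals.user")
```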

r=seconds
By design, collectl calls the export module as soon as the required data has been collected, and collection is synchronized to the nearest millisecond across a cluster. This means all instances of collectl will send their data to graphite at almost exactly the same time. This burst of data can overwhelm graphite, so to reduce the load when that is found to be a problem, OR if you just want to smooth out the load, you can use r=seconds, which delays sending your data to graphite by a random number of microseconds <= seconds.

There is an additional caveat and that is that this stall must have completed by the end of the current data collection period, so you're restricted to a maximum delay of the interval less 1 second. This means if you run collectl with -i1, you can't use r=. However, since most users run collectl with intervals of 5 or 10 seconds, values of 4 or 9 should be more than sufficient. And if you choose a collection interval of 30 seconds you may still want to use a value of r closer to 5 or 10 seconds so that the data will arrive at graphite reasonably close together.
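
The rule for r= can be sketched as follows in Python. This is only an illustration of the constraint described above (a random stall no longer than r, with r capped at the interval less 1 second). The function names are hypothetical and the actual check in graphite.ph may differ.

```python
import random


def max_delay(interval):
    """Largest r= value allowed for a given collection interval, in seconds."""
    return interval - 1


def pick_delay(r_seconds, interval, rng=random):
    """Return a random stall in seconds, uniformly chosen from 0..r_seconds."""
    if r_seconds <= 0 or r_seconds > max_delay(interval):
        raise ValueError(
            f"r={r_seconds} is not valid for an interval of {interval} seconds")
    return rng.uniform(0, r_seconds)


# An interval of 10 seconds leaves room for r=9; an interval of 1 leaves none.
print(f"stalling for {pick_delay(9, 10):.6f} seconds")
```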

For help with what other valid switches are, you can actually get the graphite module itself to tell you like this:

collectl --export graphite,h

Communications

Collectl will attempt to establish a TCP connection to the specified address/port, noting the default port is 2003. If that connection cannot be established, collectl will report an error but not exit! This is because graphite itself may be down and need to be restarted.

collectl --export graphite,192.168.1.113,d=1,s=dn
Could not create socket to 192.168.1.113:2003.  Reason: Connection refused
By design collectl assumes the graphite address is correct and will try to reconnect every monitoring interval. Further, to avoid generating too many errors, it will silently continue to retry and only report the connection failure every 100 times, a constant you can modify in the graphite.ph header if you really care. Once graphite comes back online collectl will again start sending data to it.
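
The "report only every 100 failures" behavior can be sketched like this in Python. The class and constant names are hypothetical, and treating the first failure as the one that gets reported (then the 101st, 201st and so on) is an assumption, since the text above does not say exactly which attempts are reported.

```python
REPORT_EVERY = 100  # hypothetical name for the constant in the graphite.ph header


class FailureReporter:
    """Count consecutive connection failures and decide which ones to report."""

    def __init__(self, report_every=REPORT_EVERY):
        self.report_every = report_every
        self.failures = 0

    def record_failure(self):
        """Return True when this failure should produce an error message."""
        self.failures += 1
        return (self.failures - 1) % self.report_every == 0

    def record_success(self):
        """Reset the counter once the connection is back."""
        self.failures = 0


reporter = FailureReporter()
reported = [n for n in range(1, 251) if reporter.record_failure()]
print("failures that were reported:", reported)
```

One reconnection attempt per monitoring interval would call record_failure() or record_success(), so a long graphite outage produces an occasional message rather than one per interval.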

updated November 9, 2012
collectl-4.3.1/docs/InputFiles.html
collectl - Input Files

Input Files

The following is a list of the files read by collectl to support the different types of data being collected. Also included are the basic linux commands that should produce the same numbers as collectl for those times you may want to know if you've uncovered a collectl problem OR it's a linux problem. The one exception is Infiniband data which is obtained by the perfquery OFED utility as noted.

Subsystem    | File(s)                                     | Commands
Buddy        | /proc/buddyinfo                             | cat /proc/buddyinfo
CPU          | /proc/loadavg                               | mpstat, iostat -c, vmstat
             | /proc/stat                                  |
Disk         | /proc/diskstats                             | iostat -d, iostat -x
             | /proc/partitions                            |
             | /proc/stat                                  |
Inode        | /proc/sys/fs/dentry-state                   | sar -v
             | /proc/sys/fs/dquot-nr                       |
             | /proc/sys/fs/file-nr                        |
             | /proc/sys/fs/inode-state                    |
             | /proc/sys/fs/super-nr                       |
Interrupts   | /proc/interrupts                            |
Interconnect | /proc/qsnet/ep/rail[0-1]/stats              | perfquery
             | perfquery *                                 |
Lustre       | /proc/fs/lustre/llite/.../stats             |
             | /proc/fs/lustre/llite/.../read_ahead_stats  |
             | /proc/fs/lustre/mdt/MDT/mds/stats           |
             | /proc/fs/lustre/osc/OST_...client.../stats  |
             | /proc/fs/lustre/obdfilter/OST_.../stats     |
             | /proc/fs/lustre/obdfilter/OST_.../brw_stats |
             | /proc/fs/lustre/osc/OSC...mds.../stats      |
Memory       | /proc/meminfo                               | sar -rB, free, vmstat
             | /proc/vmstat                                |
             | /sys/devices/system/node/node*/meminfo/     |
             | /sys/devices/system/node/node*/numastat/    |
Network      | /proc/net/dev                               | netstat -i
NFS          | /proc/net/rpc/nfs                           | nfsstat -c/s [c if -o C]
             | /proc/net/rpc/nfsd                          |
Process      | /proc/pid/cmdline                           | ps or top
             | /proc/pid/io                                |
             | /proc/pid/stat                              |
             | /proc/pid/status                            |
Slab         | /proc/slabinfo                              | slabtop
             | /sys/slab/object_size                       |
             | /sys/slab/objects                           |
             | /sys/slab/objs_per_slab                     |
             | /sys/slab/order                             |
             | /sys/slab/slab_size                         |
             | /sys/slab/slabs                             |
Socket       | /proc/net/sockstat                          | sar -n SOCK
Tcp          | /proc/net/netstat                           |
updated Sept 15, 2011
collectl-4.3.1/docs/Export.html
Exporting Custom Output

Exporting Custom Output

Introduction

As with --import, the --export option allows one to build custom modules, but in this case for generating output. Unlike --import, which allows multiple modules to be specified, --export accepts only a single module, which in turn overrides all of collectl's output formatting routines.

These modules all end in the extension .ph and are searched for first in the directory collectl is being executed from and then /usr/share/collectl. If the module name is prepended with a directory name, collectl will search only there.

The reason you might care about this is that now if you want to produce your own exportable form of output and be able to print it locally, make it available to another program over a socket or even write to a local file while still being able to log to raw and/or plot formats, you get all that functionality for free.

How It Works

The interface to all this is really quite simple. At the command line the user types collectl --export name[,options] where:
  • name both names a file to be required by collectl as well as the name of the entry points to both an initialization routine as well as one that produces the output
  • options specifies an optional list of arguments that are passed to both routines to do with whatever they wish
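
To illustrate how a single --export argument maps onto a file and two entry points, here is a Python sketch (collectl itself does this in Perl). The function name parse_export is hypothetical, and using only the last path component of the name for the routine names is an assumption.

```python
import os


def parse_export(arg):
    """Split 'name[,options]' into the required file, entry points and options."""
    name, _, rest = arg.partition(",")
    base = os.path.basename(name)  # assumption: routines are named for the base name
    return {
        "file": name + ".ph",           # file collectl will 'require'
        "init": base + "Init",          # initialization routine
        "output": base,                 # routine that produces the output
        "options": rest.split(",") if rest else [],
    }


print(parse_export("graphite,192.168.1.113,d=1,s=dn"))
```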
There are currently 5 different custom exports that are part of a standard collectl distribution:
  • gexpr allows one to send collectl data to ganglia
  • graphite allows one to send collectl data to graphite
  • lexpr generates output in an easy to parse format
  • proctree provides another way to look at process data
  • vmstat is the standard unix vmstat command most people know and love, so why reinvent it? It's both a useful example of how to write an export as well as provides the benefit of allowing one to play back collectl data in vmstat format!
The first four are the most interesting because they have been built to take advantage of collectl's capability of sending their output over a socket and are therefore the ideal vehicle for interfacing with other tools and environments. All four share a common set of options as described in the following table:

align
Used in conjunction with i=. When specified, data samples will be aligned to whole minute boundaries. In other words if used with i=15, data will be reported at the top of the minute, 15 seconds past, etc. The first sample may therefore be partial.

avg|max|min|tot
Used in conjunction with i=, sends the average, maximum, minimum or total of the data over the associated set of intervals specified by i=. If none of these are specified, the values from the most recent monitoring interval will be reported.

co
This does not take a value and indicates changes only, such that only data elements that have changed since they were last sampled are reported in an attempt to minimize processing and network bandwidth. If not specified, samples for all reporting intervals will be sent.

d=mask
Debugging mask, see the beginning of the actual export file for details.

f=file
Names the output snapshot file, which applies only to lexpr. If this option is not used, -f must be, and the snapshot file is given the single-character name L and written into the directory associated with -f.
caution
If you are writing your own export module that doesn't use a snapshot file, you must explicitly include error checking to assure it is not run without at least -P or --rawtoo in combination with -f.

h
Shows help/usage.

i=secs
Specifies the reporting interval in seconds. In other words, if you specify i=60, a sample will be reported every 60 seconds independent of collectl's monitoring interval. The default is to report every sample. This interval must be a multiple of the base collectl interval. Note that while collectl always rounds rates to the next whole value, when multiple intervals are added together only the totals are rounded.

s
Specifies a subset of those subsystems specified with -s in the collectl command line, and only data collected for that subset will be reported. The default is to report everything. In the case where you only want to report data collected via --import, use s= with no args.

ttl
The time to live, in intervals, for each piece of performance data. If more than this number of intervals passes, data will be sent regardless of whether it changed or not and the ttl countdown timer is reset. The default is 5. This actually has a second use for gexpr and that is to set the gmond ttl to double this number multiplied by the interval.
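
The interaction between i= and avg|max|min|tot can be sketched as below in Python. This only illustrates the semantics described above, with hypothetical function names, and it leaves out the rounding rules and the align, co and ttl handling.

```python
def samples_per_report(base_interval, report_interval):
    """Number of monitoring samples folded into one report for a given i=."""
    if report_interval % base_interval != 0:
        raise ValueError("i= must be a multiple of the base collectl interval")
    return report_interval // base_interval


def aggregate(samples, mode=None):
    """Reduce the samples of one reporting period according to the chosen mode."""
    if not samples:
        raise ValueError("no samples to aggregate")
    if mode is None:
        return samples[-1]  # default: most recent monitoring interval
    if mode == "avg":
        return sum(samples) / len(samples)
    if mode == "max":
        return max(samples)
    if mode == "min":
        return min(samples)
    if mode == "tot":
        return sum(samples)
    raise ValueError(f"unknown mode: {mode}")


# With -i10 and i=60 each report covers six samples.
window = [12, 40, 7, 19, 3, 21][:samples_per_report(10, 60)]
for mode in (None, "avg", "max", "min", "tot"):
    print(mode, aggregate(window, mode))
```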

Logging
Collectl can actually create up to 3 different types of log files and it's worth spending a little more time enumerating how collectl decides where and when to create them.

  • In its simplest form, when one runs collectl with --export and no options, all output is sent to the terminal, noting gexpr and graphite always require an address.
  • If collectl is run with -f, and one is exporting with lexpr a snapshot file is created in the directory associated with -f.
  • If a socket is to be used by specifying collectl -A and the export supports socket I/O:
    • all output goes out over the socket and no snapshot file is involved
    • if one specifies -f, a raw file is created in the directory specified by -f. If -P is also used, a plot file is created instead and if -P and --rawtoo are used, both types of files are created. caution: this is different behavior than one sees when running without sockets in which case a snapshot file is written to the directory named by -f
    • Since gexpr and graphite do not use -A for specifying socket I/O and do their own communications, collectl's rules for non-socket logging apply, in which case -f by itself implies a snapshot file is involved and so requires -P or --rawtoo or both to do any logging.
  • To create the snapshot file in a directory other than the one specified with -f, use the f= option with --export
  • If one tries an invalid combination of switches or something not supported, it is up to the export itself to determine that, since by design collectl has no knowledge of a module's inner workings.

Example

Perhaps the best way to see how all this works is with a simple example and it turns out that vmstat.ph is small enough to meet that need. You may also wish to refer to the others as well to see how some of the more exotic capabilities are implemented.

This first section gets called almost immediately by collectl after reading in the various user switches. This is the place to catch switch errors and since this routine always requires -scm we'll just hardcode it to that and reject any user entered ones. This initialization subroutine must be named for our module followed by Init.

sub vmstatInit
{
  error("-s not allowed with 'vmstat'")          if $userSubsys ne '';
  error("-f requires either --rawtoo or -P")     if $filename ne '' && !$rawtooFlag && !$plotFlag;
  error("-P or --rawtoo require -f")             if $filename eq '' && ($rawtooFlag || $plotFlag);
  $subsys=$userSubsys='cm';
}
Next we define the output routine, with the same base name as that of our included file.

The if statement uses collectl's standard idiom for printing headers based on the number of lines printed and whether or not the user wants only a single header, no header or even to clear the screen between headers.

sub vmstat
{
  my $line;
  if (printHeader())
  {
    $line= "${cls}#${miniBlanks}procs ---------------memory (KB)--------------- --swaps-- -----io---- --system-- ----cpu-----\n";
    $line.="#$miniDateTime r  b   swpd   free   buff  cache  inact active   si   so    bi    bo   in    cs us sy  id wa\n";
  }
Next comes the handling of optional date/time prefixes that I stole from printTerm() in formatit.ph and which can be controlled by various switch options. Again, if you have no intent of supporting these you can even put in error handling in your initialization routine or simply ignore the switches.
  my $datetime='';
  if ($options=~/[dDTm]/)
  {
    ($ss, $mm, $hh, $mday, $mon, $year)=localtime($lastSecs);
    $datetime=sprintf("%02d:%02d:%02d", $hh, $mm, $ss);
    $datetime=sprintf("%02d/%02d %s", $mon+1, $mday, $datetime)                  if $options=~/d/;
    $datetime=sprintf("%04d%02d%02d %s", $year+1900, $mon+1, $mday, $datetime)   if $options=~/D/;
    $datetime.=".$usecs"                                                         if ($options=~/m/);
    $datetime.=" ";
  }
Here we build the actual output, noting that we're not really printing anything yet, but rather building up a string (which may contain the header) that we will print in one shot.
  my $i=$NumCpus;
  my $usr=$userP[$i]+$niceP[$i];
  my $sys=$sysP[$i]+$irqP[$i]+$softP[$i]+$stealP[$i];
  $line.=sprintf("%s %2d %2d %6s %6s %6s %6s %6s %6s %4d %4d %5d %5d %4d %5d %2d %2d %3d %2d\n",
                $datetime, $procsRun, $procsBlock,
                cvt($swapUsed,6,1,1),  cvt($memFree,6,1,1),  cvt($memBuf,6,1,1),
                cvt($memCached,6,1,1), cvt($inactive,6,1,1), cvt($active,6,1,1),
                $swapin/$intSecs, $swapout/$intSecs, $pagein/$intSecs, $pageout/$intSecs,
                $intrpt/$intSecs, $ctxt/$intSecs,
                $usr, $sys, $idleP[$i], $waitP[$i]);
Finally comes the output. There is actually a lot of latitude here and in this case we're calling printText() which will send the output to the terminal or over a socket. It will not write to a local file as does lexpr, but if you want to see how to do that, refer to its source. As with all perl require files, they must return true and therefore the final line is the digit 1.
  printText($line);
}
1;
Try running it and you'll see all the pagination and time formats work just as they do with standard output formats.
updated November 9, 2012
collectl-4.3.1/docs/Interrupts.html
collectl - Interrupts

Interrupts

Introduction

Prior to V2.5.0, collectl reported the total number of interrupts across all CPUs as part of the CPU summary data and nothing about interrupts in the CPU detail. Since interrupt counts are actually reported in the kernel for each CPU by individual interrupt number, that information will now be made available both in summary and detail formats. However, a slightly different methodology for categorization will be used because interrupt summary data will be reported by CPU and detailed interrupt data will break out the data by individual interrupts for each CPU.

Requesting Interrupt Statistics

Since -si has already been taken for reporting inode statistics, interrupt summary data should be requested using -sj and detail data using -sJ. Interrupts are also treated differently than other statistics in a couple of ways:
  • Rather than reporting a fixed number of fields like other summary data, the number of fields are variable and equal to the number of CPUs
  • If one specifies -sCj one will NOT get a separate verbose format for the Interrupt Summary, but rather will have those details included as part of the CPU detail report as shown in the examples below.
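
To show why the summary has one field per CPU, here is a Python sketch that sums the per-interrupt counters in /proc/interrupts by CPU column. It is not collectl's Perl code: the function name and the sample text are made up, rows that do not carry one counter per CPU are simply skipped (an assumption), and collectl reports rates over an interval where this sketch only adds up raw counters from a single snapshot.

```python
def per_cpu_totals(text):
    """Sum the interrupt counters in /proc/interrupts for each CPU column."""
    lines = text.splitlines()
    if not lines:
        return {}
    cpus = lines[0].split()          # header row: CPU0 CPU1 ...
    totals = [0] * len(cpus)
    for line in lines[1:]:
        fields = line.split()
        counts = fields[1:1 + len(cpus)]
        # Skip rows (such as ERR:) that lack a counter for every CPU.
        if len(counts) < len(cpus) or not all(c.isdigit() for c in counts):
            continue
        for i, count in enumerate(counts):
            totals[i] += int(count)
    return dict(zip(cpus, totals))


# Made-up sample in the general layout of /proc/interrupts.
SAMPLE = """\
           CPU0       CPU1
  0:        100          0   IO-APIC-edge      timer
 82:          5       7731   PCI-MSI-X         eth2
ERR:          0
"""
print(per_cpu_totals(SAMPLE))
```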

Plot Data

Although looking at interrupt data in brief or verbose fits quite well with collectl's reporting methodology, it doesn't fit the plot format. This is because it expects a fixed number of fields for summary data and a variable number of fields for detail data, indexed by a device number. However, interrupt summary data is actually variable size based on the number of CPUs and the detail data doesn't really fit with anything and so trying to do so will generate an error. Therefore to report interrupt data in plot format you will be required to request CPU detail data as well since interrupts are reported as part of that data.

Examples

The following examples only show interrupt data except where CPU data is required. Any of these reports can be extended to include other data types.

This first example shows the basic interrupt output displayed in brief mode on an 8 processor system, with timestamps. If you include --verbose you essentially get the same display except with more significant digits for each interrupt.

[mjs]# ./collectl.pl -sj -oT
#         <-----------------Int----------------->
#Time     Cpu0 Cpu1 Cpu2 Cpu3 Cpu4 Cpu5 Cpu6 Cpu7
12:49:55  4828  16K 1000  16K   18  16K  16K    0
12:49:56  4804  16K 1000  16K    0  16K  16K    0
12:49:57  4811  16K 1000  16K   18  16K  16K    0

As mentioned above, specifying -sCj is special as it takes the data for 2 separate subsystems (CPU detail and interrupt summary) and combines them in a single display as shown for two sampling periods on a 4 processor system:

[mjs]# ./collectl.pl -sCj -oT
# SINGLE CPU STATISTICS
#            CPU  USER NICE  SYS WAIT IRQ  SOFT STEAL IDLE INTRPT
14:36:12       0     0    0    0    0    0    0     0  100     11
14:36:12       1     0    0    0    0    0    0     0   99    999
14:36:12       2     0    0    0    0    0    0     0  100      0
14:36:12       3     0    0    0    0    0    0     0  100      0
14:36:13       0     0    0    0    0    0    0     0  100     13
14:36:13       1     0    0    0    0    0    0     0  100   1000
14:36:13       2     0    0    0    0    0    0     0  100      0
14:36:13       3     0    0    0    0    0    0     0  100      0
A third form that can be particularly useful is the Interrupt Details, which shows the distribution of the individual interrupts across the CPUs. What makes this form especially handy is that it only shows those interrupts that changed during the monitoring cycle, which is considerably easier to read than /proc/interrupts itself.
[mjs]# ./collectl.pl -sJ -oT
# INTERRUPT DETAILS
#          Int    Cpu0   Cpu1   Cpu2   Cpu3   Cpu4   Cpu5   Cpu6   Cpu7   Type            Device(s)
12:48:50   082       0      0      0   7731      0      0      0      0   PCI-MSI-X       eth2 (queue 0)
12:48:50   098       0      0      0      0   2037      0      0      0   PCI-MSI-X       eth2 (queue 2)
12:48:50   122       0      0   2240      0      0      0      0      0   PCI-MSI-X       eth2 (queue 5)
12:48:50   138       0   7084      0      0      0      0      0      0   PCI-MSI-X       eth2 (queue 7)
12:48:50   154       0      0      0      0      0   7723      0      0   PCI-MSI-X       eth3 (queue 0)
12:48:50   162    9082      0      0      0      0      0      0      0   PCI-MSI-X       eth3 (queue 1)
12:48:50   178       0      0      0      0      0      0   8253      0   PCI-MSI-X       eth3 (queue 3)
12:48:50   210       0      0      0      0      0      0      0   6417   PCI-MSI-X       eth3 (queue 7)
12:48:50   218       1      0      0      0      0      0      0      0   PCI-MSI         eth0

Restrictions

If the number of CPUs changes during processing, this can only be detected when monitoring CPU data at the same time. If you are only monitoring interrupt data and there is a state change, things will get very messy. As users typically monitor Interrupts and CPU data at the same time it is not felt to be worth the extra effort or processing overhead to try and accommodate this rare case.
updated June 25, 2010
collectl-4.3.1/docs/Startup.html
collectl - Startup and Initialization

Startup and Initialization

Introduction

Much of this is really intended for those who want to learn more about how collectl initializes and don't feel the urge to wade through the code. Quite frankly I don't blame you!

When you run the collectl command, it goes through a number of initialization steps, most of which are not being documented here. Rather, as initialization internals become more important or are identified as areas people want to hear more about, I'll now have a place to put them.

Finding formatit.ph

Collectl basically consists of 2 main perl scripts. The first is collectl itself, whose main job is reading data from /proc and writing it to a raw file. The functions for generating formatted output live in formatit.ph. When collectl is installed from an rpm, debian package or one has simply executed INSTALL, formatit.ph is copied to /usr/share/collectl. This is also the directory collectl looks in for --import and --export modules.

However, there are some cases in which one wishes to install collectl in a non-standard location by modifying the INSTALL script, or one is modifying collectl and wants to test those changes without disturbing the installed copy. To accommodate this, collectl actually looks in a couple of places to locate formatit.ph, based on the directory collectl itself is run from, so let's call that $bindir:

  • If $bindir/formatit.ph exists, load that one
  • Otherwise, if $bindir/../share/collectl/formatit.ph exists, load that one
  • Otherwise, if /usr/share/collectl/formatit.ph exists, load that one
  • Otherwise, error!
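
The search order above amounts to a first-match lookup, sketched here in Python rather than collectl's Perl. The function names are hypothetical, and the exists parameter is only there so the lookup can be tried without touching the filesystem.

```python
import os


def formatit_candidates(bindir):
    """Locations tried for formatit.ph, in the order listed above."""
    return [
        os.path.join(bindir, "formatit.ph"),
        os.path.join(bindir, "..", "share", "collectl", "formatit.ph"),
        "/usr/share/collectl/formatit.ph",
    ]


def find_formatit(bindir, exists=os.path.isfile):
    """Return the first candidate that exists, or raise if none does."""
    for path in formatit_candidates(bindir):
        if exists(path):
            return path
    raise FileNotFoundError("formatit.ph not found relative to " + bindir)


# Pretend only the system-wide copy is installed.
installed = {"/usr/share/collectl/formatit.ph"}
print(find_formatit("/opt/collectl/bin", exists=lambda p: p in installed))
```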

Finding collectl.conf

A similar mechanism exists for opening/reading the collectl.conf file, which among other things sets the parameters used by collectl when running as a daemon. This search path is also based on $bindir.
  • If collectl is run with -C configfile read settings from there
  • Otherwise, if $bindir/collectl.conf exists, read settings from there
  • Otherwise, if $bindir../../etc/collectl.conf exists, use that one
  • Otherwise, if /etc/collectl.conf exists, read that one
  • Otherwise, error!
updated April 29, 2014
collectl-4.3.1/docs/Performance.html
collectl - Performance

Performance

Introduction

When thinking about the performance of collectl keep in mind that efficiency is one of the main design principles and it should therefore be possible to run collectl as a daemon on most systems with little concern for the overhead it generates. However, given that another design principle is to be able to collect a very broad and deep set of data, it is worth understanding a little more about just what goes on in collectl.

During the earlier days of collectl development a third design consideration was minimizing the amount of storage used for raw files. At the time disks were smaller and the amount of data collected was small enough that being more judicious about what was saved felt like it made a difference. However, since the inclusion of process, slab and now interrupt data along with larger disks the amount of storage used has become less of a consideration. In other words don't spend extra data collection cycles trying to be more selective about what is recorded if it's going to add to the overhead.

Measuring the Overhead

If you take a closer look at the architecture you can see that the path of minimal overhead is when collectl reads from /proc and writes it to a file. This is not an accident! In fact, on many systems this has been observed to take less than 0.1% of the CPU and this is when collecting almost all the data collectl is capable of. However, this also raises several important questions for consideration:
  • What is the overhead on a specific system?
  • Is there a way to reduce the overhead further?
  • Is collection overhead for all data types the same and if not, what is it?
In fact, these are the very questions I asked during the initial development as well as every time I add support for new data types. In order to be able to answer these questions quantitatively, collectl should be run with a collection interval of 0 and told to collect 8640 samples, the number of 10-second samples in a day. By timing the execution you can then get a reasonable estimate of the daily overhead. Consider the following:
# time collectl -scdnm -i0 -c8640 -f /tmp
real    0m9.711s
user    0m7.480s
sys     0m2.140s
and you can see that collectl uses about 10 seconds out of 86400 or about 0.01% of the cpu to collect cpu, disk, network and memory data. If we repeat the test again for just cpu the time drops to under 4 seconds, so you can see if performance is really critical, you can improve things by recording less data or maybe just do it less frequently. The point is these are tradeoffs only you can make if you feel collectl is using too much resource.
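
The arithmetic behind that estimate can be sketched in Python as follows. The function names are hypothetical, and using user plus sys time (rather than the real time) as the CPU cost is an assumption made for this illustration.

```python
import re

SECONDS_PER_DAY = 86400


def time_to_seconds(stamp):
    """Convert a value printed by 'time', such as '1m8.453s', to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", stamp)
    if match is None:
        raise ValueError(f"unrecognized time value: {stamp}")
    return int(match.group(1)) * 60 + float(match.group(2))


def daily_overhead_percent(user, sys_time):
    """CPU seconds for a day's worth of samples as a percentage of one day."""
    cpu_seconds = time_to_seconds(user) + time_to_seconds(sys_time)
    return 100.0 * cpu_seconds / SECONDS_PER_DAY


# Figures from the -scdnm run above.
print(f"{daily_overhead_percent('0m7.480s', '0m2.140s'):.4f}% of one CPU")
```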

So what happens if you take a different processing path and save collectl data in plot format? This means adding the additional overhead of parsing the /proc data and performing some basic math of the values. If we use the same command as above and include -P:

# time collectl -scdnm -i0 -c8640 -P -f /tmp
real    0m20.607s
user    0m17.970s
sys     0m2.580s
we can see that this takes a little over twice as much overhead even though it is still pretty low.

One other example worth mentioning is process monitoring overhead, which is the highest overhead operation collectl can do and one of the reasons it has its own monitoring interval. The overhead for collection of this type of data can vary quite broadly depending on how many processes are running at the time, and on a system with only 138 processes look at this:

# time ./collectl.pl -sZ -i0 -c8640 -f /tmp
real    1m8.453s
user    0m54.650s
sys     0m13.430s
noting collectl is also smart enough to only look at 1/6 as many samples of process data since that is the default relationship of process monitoring to other subsystem data. This also leads to a way to further optimize process monitoring. If you are monitoring a specific set of processes, say http daemons, collectl no longer has to look at as much data in /proc and so we now see:
# time ./collectl.pl -sZ -Zchttp -i0 -c8640 -f /tmp
real    0m5.721s
user    0m4.480s
sys     0m1.180s
In fact, if we know there are never going to be any new http processes appearing (collectl looks for new processes that match selection strings by default):
# time ./collectl.pl -sZ -Zchttp -i0 -c8640 --procopts p  -f /tmp
real    0m5.080s
user    0m3.930s
sys     0m1.130s
And things get even better, as you could even imagine monitoring these processes at the same interval as everything else with almost no additional overhead!

Another wrinkle - process I/O statistics

Since the inclusion of process I/O stats, collectl now has to read an additional set of data for each process, specifically /proc/pid/io, which adds over 25% to the total process data collection load. For most users, collectl's overhead is reasonably low and this extra overhead is not a problem. Remember - we're talking about an increase of 25% to a relatively small number. But for those concerned with this extra overhead, a new --procopts value of I has been added to V3.6.1 which suppresses the reading of process I/O stats.

Remember - your mileage can and will vary!

The number of different combinations of switches one can measure far exceeds the scope of this discussion - for example we haven't even talked about the pros/cons of compression. But in conclusion, the main thing to remember is collectl's overhead is already pretty low but if you're really concerned, measure it yourself and adjust accordingly.
updated Jan 15, 2012
collectl-4.3.1/docs/Messages.html
collectl - Messages

Operational Messages

When something that may be of interest occurs, collectl calls an internal message reporting routine and assigns that message a status of Informational, Warning, Error or Fatal. In all cases, if a message of type Fatal is encountered, collectl will terminate. In all other cases it continues executing, often skipping what it was trying to do when the error occurred. The way collectl deals with these messages is controlled by several factors:

Interactive Messages

Daemon Mode Messages, requires -f and -m
updated Feb 21, 2011
collectl-4.3.1/docs/Logging.html
collectl - Logging

Logging

Overview

Collectl supports 2 very basic data logging mechanisms. In the first case it will log the data as read from /proc to a file with the extension raw or raw.gz, depending on whether or not the perl module Compress::Zlib.pm has been installed. If not, one can always install compression at a later time and collectl will happily use it the next time it is started. One useful property of raw files is that one can play them back using different switches/options for display or generation of plottable files from them.

The second major form of logging is writing data to one or more tabularized, also known as plottable files, which have the extension tab for data associated with the core subsystems or one of several other files for the detail data associated with devices like cpus, disks, networks, etc.

The biggest benefit of raw files is they are very lightweight to create in that no additional processing is performed on the data. Since they contain the unaltered /proc data from which collectl derives its numbers to report, it is always possible to go back and look at the original data. In some cases, there is data in the raw file that was easier to collect than ignore and in these situations one can actually see more data than is normally available. In fact the --grep switch is available for looking for data in the raw files and prefacing them with timestamps, something the standard grep command cannot do.

As their type implies, plottable files have their data in a form that is ready to be plotted with tools like gnuplot or immediately loadable into a spreadsheet like OpenOffice or Excel or any other tool that can read space-separated data. When generated by collectl while it is running, this data can be read while it is being generated making it possible to do real-time monitoring/display of it. For situations where a tool requires data be delimited by something other than spaces, one can change the data separator with --sep. In fact, for the case where a tool such as rrd requires the date be in UTC format, you can even change the timestamp format using --utc.
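As a rough illustration of consuming such a file from another tool, the sketch below splits delimited lines into a header and data rows. The function name, the sample column names and the assumption that header lines start with a # character are all hypothetical and chosen only for this example; check an actual plottable file before relying on any of them.

```python
def read_plot_rows(lines, sep=None):
    """Split plottable-style lines into (header, rows).

    sep=None splits on runs of whitespace; pass the character given
    to --sep if a different separator was requested.
    Assumption: header lines begin with '#', and the last such line
    seen holds the column names.
    """
    header, rows = None, []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("#"):
            header = line.lstrip("#").split(sep)
            continue
        rows.append(line.split(sep))
    return header, rows


# Hypothetical sample data, not taken from a real collectl file
sample = [
    "#Date Time CpuUser CpuSys\n",
    "20110221 10:00:00 3 1\n",
    "20110221 10:00:10 5 2\n",
]
header, rows = read_plot_rows(sample)
print(header)
print(rows)
```

Because the function accepts any iterable of lines, the same code could be handed an open file object, which is one way to follow a file while it is still being written.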

Logging Options

There are 2 switches you should become familiar with for logging data, noting that you cannot write to a file and the terminal at the same time. As collectl continues to grow in functionality and collect more data, Linux is also growing in complexity and increasing the number of active processes as well as expanding the number of slabs. There is a 3rd option which has been around for quite a while but has had minimal use or discussion and that is the -G or --group switch. When specified, this tells collectl to write process and slab data to a second file named rawp (initially it only contained process data). The main reason for doing so is that without this switch a typical raw file, even when compressed, can approach 50MB or more, growing even larger as the number of active processes grows.

While large files are nothing new to collectl, playing them back either for the purpose of drilling into the data or to simply generate plot files can become very expensive in terms of time and CPU load. In some extreme cases it can take tens of minutes to process a single, large raw file and even in normal cases it will take multiple minutes. Having collectl write to 2 separate files doesn't add any additional overhead or disk space but can significantly reduce the playback time when you are not interested in slab or process data, which is often the case during initial analysis. As a data point, on my development system, single compressed collectl logs are on the order of 35MB. When using the -G switch, it generated a pair of files where the process/slab data is about 34MB and the file with the rest of the data is only 1MB. This makes the raw file, where all the subsystem details are stored, very efficient to process in playback mode, taking about a minute compared to 5 minutes when that file includes slab and process data.

For most users this is all you need to know. On the other hand if you want to use collectl to feed data to other tools or perhaps log to both raw and plot files at the same time, read on...

Logging both raw and plottable data at the same time!

The main benefit in requesting collectl to write its data in plottable form is that data becomes available for immediate plotting without any post-processing required, the one expense being some additional processing overhead. However there are a few potential limits in doing so that should be understood.

First and foremost, once a plottable file has been created the original data from which it was created is lost forever. In many cases that is fine as many users feel there is really no need to go back to the original source. However, one often collects summary data because that is what they are interested in, but then later decides they want to look at the details. This can be easily done by just replaying the raw file and requesting details be displayed or (re)written to a plottable file. If the raw file had not been generated, this option would not be possible.

A second limitation with plottable data files is that one cannot easily examine the data by timeframes and when there are multiple data files involved, it is not easy to look at all the data together as time-oriented samples without plotting it. It is always possible to write a script that merges this data together, but that functionality is natively built into collectl when used in playback mode.

Finally, there are times when one might wish to go back and look at non-normalized data. For example, if 3 processes are created over a 10 second period, collectl will report a rate of 0 process creations/second because it rounds down. The only way to see what really happened is to play the data back with -on, which tells collectl not to normalize the data and therefore reports the value of the counter rather than its rate.
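The rounding effect can be seen with a little arithmetic. This is only a sketch: the function name and the use of integer division to model rounding down are assumptions for illustration, not collectl's actual code.

```python
def normalized_rate(counter_delta, interval_secs):
    """Per-second rate, rounded down to a whole number."""
    return counter_delta // interval_secs


# 3 process creations observed over a 10 second interval
delta, interval = 3, 10
print("normalized rate:", normalized_rate(delta, interval))
print("raw counter change:", delta)
```

The normalized figure hides the activity entirely, while the raw counter change still shows that 3 processes were created.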

In most cases none of these restrictions should be a concern, but there may be occasions in which they are and that is where the --rawtoo switch comes in. When specified in conjunction with -P, collectl will generate raw data in addition to the plottable data, making it possible to go back to the source if/when necessary. The only real overhead is the amount of disk space required since the raw data is already sitting in a buffer and ready to be written. If the plottable files are being generated in uncompressed format, the size of the compressed raw file becomes even less significant.

Exported output, the 3rd type of file

We finally come to a third type of output, intended primarily for feeding collectl data to other programs, and that is exported output. There are currently a variety of types of exports delivered with collectl though only three are capable of generating local data files and their use complicates the picture. To better understand how logging works in the context of --export, see the description of how they work and in particular how collectl decides where and when to do logging.

If you are a little confused, and you probably should be, try experimenting with various combinations of switches and see which files get generated.

The overhead

So what is the overhead associated with all this logging? From the perspective of CPU load it can be quite minimal since in most cases the data is already in hand and all that needs to be done is to write it out to one or more additional files, something that is a fairly low-overhead operation on Linux systems. If this is really a concern, measure it yourself. If you want to see how much disk space is involved just examine the sizes of the file(s) created during the performance tests and see for yourself.
updated Feb 21, 2011
collectl-4.3.1/docs/Data-detail.html

Detail Data

Buddy (Memory Fragmentation) Data, collectl -sB

# MEMORY FRAGMENTATION (4K pages)
#Node    Zone      1Pg    2Pgs    4Pgs    8Pgs   16Pgs   32Pgs   64Pgs  128Pgs  256Pgs  512Pgs 1024Pgs
This table shows the number of memory fragments by pagesize in increasing powers of 2 for various types of memory defined by the combination of Node and Zone.
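Since each column counts free fragments of a fixed size, the total free memory behind one Node/Zone row can be estimated by weighting each count by its fragment size. The sketch below assumes the 4K page size stated in the report header; the function name and the sample counts are hypothetical.

```python
PAGE_KB = 4  # the report header above states 4K pages


def free_kb(fragment_counts):
    """Total free KB for one row.

    fragment_counts[i] is the number of free fragments that are
    2**i pages long, i.e. the 1Pg, 2Pgs, 4Pgs ... columns in order.
    """
    return sum(count * (2 ** order) * PAGE_KB
               for order, count in enumerate(fragment_counts))


# Hypothetical counts for the 1Pg .. 1024Pgs columns of one row
row = [10, 4, 2, 0, 0, 0, 0, 0, 0, 0, 1]
print(free_kb(row))
```

Two rows with the same total can still behave very differently: a row dominated by small fragments may be unable to satisfy a large contiguous allocation.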

CPU Data, collectl -sC

# SINGLE CPU STATISTICS
#   Cpu  User Nice  Sys Wait IRQ  Soft Steal Guest NiceG Idle INTRPT
These are the same fields as reported for verbose CPU Summary, each preceded by the CPU number. If running collectl V2.5.0 or greater AND you request interrupt summary data, the INTRPT field will also be included.

CPU The CPU number which the stats are associated with
User Time spent in User mode, not including time spent in "nice" mode.
Nice Time spent in Nice mode, that is, running at a lower priority as adjusted by the nice command. Such processes have the "N" status flag set when examined with "ps".
Sys This is time spent in "pure" system time.
Wait Also known as "iowait", this is the time the CPU was idle during an outstanding disk I/O request. This is not considered to be part of the total or system times reported in brief mode.
Irq Time spent processing interrupts and also considered to be part of the summary system time reported in "brief" mode.
Soft Time spent processing soft interrupts and also considered to be part of the summary system time reported in "brief" mode.
Steal Time spent in involuntary wait state while the hypervisor was servicing another virtual processor.
Guest Time spent running a virtual CPU for guest operating systems under the control of the Linux kernel, new since 2.6.24
NiceG Time spent running a niced guest (virtual CPU for guest operating systems under the control of the Linux kernel), new since 2.6.33
Idle Time spent idle, noting that since CPU numbers are rounded off, they may not always add up to 100%
Intrpt If the interrupt summary stats were requested at the same time, this will be included which is the aggregate number of interrupts for each CPU.

Disk Data, collectl -sD

If you specify filtering with --dskfilt, the disk names that match the pattern(s) will either be included or excluded from the summary data. However, the data will still be collected so if recorded to a file it can later be viewed. Note: if you specify --dskopts f, fractional values will be reported for some of the fields for more precision.

# DISK STATISTICS (/sec)
#          <---------reads---------><---------writes---------><--------averages--------> Pct
#Name       KBytes Merged  IOs Size  KBytes Merged  IOs Size  RWSize  QLen  Wait SvcTim Util
Name Name of the disk the statistics are being reported for.
KBytes KB read/sec
Merged Read requests merged per second when being dequeued.
IOs Number of reads/sec
Size Average read I/O size in KBytes
KBytes KB written/sec
Merged Write requests merged per second when being dequeued.
IOs Number of writes/sec
Size Average write I/O size in KBytes
RWSize Average combined read and write I/O size in KBytes. This is not the average of the read and write sizes but rather the sum of the reads/write divided by the number of I/Os
QLen Average number of requests queued
Wait Average time in msec a request has been waiting in the queue
SvcTim Average time in msec for a request to be serviced by the device
Util Percentage of CPU time during which I/O requests were issued
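The distinction drawn for RWSize is easy to see with numbers. The following sketch uses a hypothetical helper and made-up values; it only illustrates the difference between the combined size and a simple average of the two per-direction sizes.

```python
def rw_size(read_kb, read_ios, write_kb, write_ios):
    """Combined average I/O size in KB: total KB divided by total I/Os."""
    total_ios = read_ios + write_ios
    if total_ios == 0:
        return 0.0
    return (read_kb + write_kb) / total_ios


# Hypothetical interval: 100 reads of 4 KB each, 10 writes of 128 KB each
read_kb, read_ios = 400, 100
write_kb, write_ios = 1280, 10

combined = rw_size(read_kb, read_ios, write_kb, write_ios)
naive = (read_kb / read_ios + write_kb / write_ios) / 2

print("RWSize-style value:", round(combined, 2))
print("average of the two sizes:", naive)
```

Because reads far outnumber writes in this example, the combined figure stays close to the read size, while the simple average is pulled toward the much larger write size.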

Infiniband, collectl -sX

# INFINIBAND STATISTICS (/sec)
#HCA    KBIn   PktIn  SizeIn   KBOut  PktOut SizeOut  Errors
HCA HCA instance name
KBIn KB received/sec.
PktIn Received packets/sec.
SizeIn Average incoming packet size in KB
KBOut KB transmitted/sec.
PktOut Transmitted packets/sec.
SizeOut Average outgoing packet size in KB
Errs Count of current errors. Since these are typically infrequent, it is felt that reporting them as a rate would result in either not seeing them OR round-off hiding their values.

Interrupts, collectl -sJ

# INTERRUPT DETAILS
# Int    Cpu0   [Cpu...]   Type            Device(s)

Int Interrupt number within the range 0-255. Note that only those interrupts that have had any activity since the last monitoring interval will be reported
CPUn... The CPU for which the interrupt count is being reported. There will be one column/CPU
Type Interrupt type, whitespace removed
Device The names of the devices which are generating this interrupt as a comma separated list

Lustre Data, collectl -sL

There are several formats the lustre detail data can take based on whether you're looking at a client or an OSS (there is not any MDS specific detail data, though it does share the same disk-level buffer size data as the OSS). Furthermore, if one specifies the -sLL form of the detail switch OST level details will be reported where appropriate.

Lustre Client, collectl -sL

# LUSTRE CLIENT DETAIL (/sec)
#Fils  KBRead  Reads SizeKB KBWrite Writes SizeKB
Filsys Name of the filesystem these stats apply to
KBRead KBs read/sec
Reads Reads/sec
SizeKB Average read size
KBWrite KBs written/sec
Writes Writes/sec
SizeKB Average write size

Lustre Client, collectl --lustopts O

# LUSTRE CLIENT DETAIL (/sec)
#Fils  Ost     KBRead  Reads SizeKB KBWrite Writes SizeKB
The data here is the same as that reported for the standard client side lustre data except now it is broken down by OST within the file system.

Filsys Name of the filesystem these stats apply to
Ost OST name within the filesystem
KBRead KBs read/sec
Reads Reads/sec
SizeKB Average read size
KBWrite KBs written/sec
Writes Writes/sec
SizeKB Average write size

Lustre Client RPC-Buffer Stats, collectl --lustopts B

# LUSTRE CLIENT DETAIL: RPC-BUFFERS (pages)
#Filsys  Ost   RdK  Rds   1K   2K   ...   WrtK Wrts   1K   2K   ...
This form also includes the reads/writes within the filesystem, but also adds the sizes of the RPC buffers. Since these numbers always apply to OSTs you need to use the -sLL form of the subsystem switch.

Filsys Name of the filesystem these stats apply to
Ost OST name within the filesystem
RdK KBs read/sec
Rds Reads/sec
nK Number of pages of this size read
WrtK KBs written/sec
Wrts Writes/sec
nK Number of pages of this size written

Lustre Client Metadata, collectl -sL --lustopts M

# LUSTRE CLIENT DETAIL: METADATA
#Filsys   KBRead  Reads KBWrite  Writes  Open Close GAttr SAttr  Seek Fsync DrtHit DrtMis

Filsys Name of the filesystem these stats apply to
KBRead KBs read/sec
Reads Reads/sec
KBWrite KBs written/sec
Writes Writes/sec
Open Opens/sec
Close Closes/sec
GAttr Get Attributes/sec
SAttr Set Attributes/sec
Seek Seeks/sec
Fsync FSyncs/sec
DrtHit Dirty Hits/sec
DrtMis Dirty Misses/sec

Lustre Client Readahead, collectl -sL --lustopts R

# LUSTRE CLIENT DETAIL: READAHEAD
#Filsys   KBRead Reads  KBWrite Writes  Pend  Hits Misses NotCon MisWin LckFal  Discrd ZFile ZerWin RA2Eof HitMax

Filsys Name of the filesystem these stats apply to
KBRead KBs read/sec
Reads Reads/sec
KBWrite KBs written/sec
Writes Writes/sec
Pend Pending issued pages
Hits Hits
Misses Misses
NotCon Readpage not consecutive
MisWin Miss inside window
LckFal Failed lock match
Discrd Read but discarded
ZFile Zero length file
ZerWin Zero size window
RA2Eof Read-ahead to EOF
HitMax Hit max r-a issue

Lustre OSS, collectl -sL

# LUSTRE FILESYSTEM SINGLE OST STATISTICS (/sec)
#Ost            KBRead   Reads  SizeKB    KBWrite  Writes  SizeKB
Ost OST name
KBRead KBs read/sec
Reads Reads/sec
SizeKB Average read size
KBWrite KBs written/sec
Writes Writes/sec
SizeKB Average write size

Lustre OSS RPC Buffers, collectl -sL --lustopts B

# LUSTRE FILESYSTEM SINGLE OST STATISTICS
#Ost            RdK  Rds   1P   2P  ...  WrtK Wrts   1P   2P  ...
Filsys Name of the filesystem these stats apply to
Ost OST name within the filesystem
RdK KBs read/sec
Rds Reads/sec
nP Number of pages of this size read
WrtK KBs written/sec
Wrts Writes/sec
nP Number of pages of this size written

Lustre OSS and MDS Disk Buffers, collectl -sL --lustopts D

This display is very similar to the RPC buffers in that the sizes of different size I/O requests are reported. In this case these are requests sent to the disk driver. Note that this report is only available for HP's SFS.

# LUSTRE DISK BLOCK LEVEL DETAIL (units are 512 bytes)
#DISK RdK  Rds 0.5K   1K   2K   ...   WrtK Wrts 0.5K   1K   2K   ...
Disk Name of the disk these stats apply to
RdK KBs read/sec
Rds Reads/sec
nK Number of blocks of this size read
WrtK KBs written/sec
Wrts Writes/sec
nK Number of blocks of this size written

Memory Data, collectl -sM

This is also known as NUMA data and provides detailed information about memory utilization in each NUMA node.

# MEMORY STATISTICS
# Node    Total     Used     Free     Slab   Mapped     Anon    Locked  Inact HitPct
Node Numa node number, which is usually the same as a physical socket
Total Total physical memory
Used Used physical memory. This does not include memory used by the kernel itself.
Free Unallocated memory
Slab Memory used for slabs, see collectl -sY
Mapped Memory mapped by processes
Anon Anonymous memory
Locked Locked memory
Inactive Inactive pages, which is the sum of Inactive(anon) and Inactive(file). Note that Inactive(anon) is not considered nor included in the previous anonymous memory field.
Hit% It is currently not entirely clear how useful this number actually is as it refers to the hit percentages for both local and foreign memory as a single number. Most important, it does not refer to memory access but rather memory allocation. In other words, it does not differentiate between an allocation miss whose memory is then referenced only a small number of times and one whose memory is referenced a very large number of times. Clearly the latter would be more interesting from a performance perspective.

Network Data, collectl -sN

If you specify filtering with --netfilt, the names that match the pattern(s) will either be included or excluded from the summary data. However, the data will still be collected so if recorded to a file it can later be viewed.

# NETWORK STATISTICS (/sec)
#Num    Name  KBIn  PktIn SizeIn  MultI   CmpI  ErrsI  KBOut PktOut  SizeO   CmpO   ErrsO

Num Each network interface is numbered, starting with 0
Name Name of the interface
KBIn Incoming KB/sec
PktIn Incoming packets/sec
SizeI Average incoming packet size in bytes
MultI Incoming multicast packets/sec
CmpI Incoming compressed packets/sec
ErrsI Total incoming errors/sec. This is an aggregation of incoming errors. To see explicit error counters use --netopts e
KBOut Outgoing KB/sec
PktOut Outgoing packets/sec
SizeO Average outgoing packet size in bytes
CmpO Outgoing compressed packets/sec
ErrsO Total outgoing errors/sec. This is an aggregation of outgoing errors. To see explicit error counters use --netopts e

Network Data, collectl -sN --netopts e

# NETWORK ERRORS SUMMARY (/sec)
#Num    Name   ErrIn  DropIn  FifoIn FrameIn    ErrOut DropOut FifoOut CollOut CarrOut
Num Each network interface is numbered, starting with 0
Name Name of the interface
ErrIn Receive errors/sec detected by the device driver
DropIn Receive packets dropped/sec
FifoIn Receive packet FIFO buffer errors/sec
FrameIn Receive packet framing errors/sec
ErrOut Transmit errors/sec detected by the device driver
DropOut Transmit packets dropped/sec
FifoOut Transmit packet FIFO buffer errors/sec
CollOut Transmit collisions/sec detected on the interface
CarrOut Transmit packet carrier loss errors detected/sec

NFS Data, collectl -sF

By default, collectl will report all 6 types of nfs data (clients and servers for all 3 versions of nfs) unless one has limited the reporting with --nfsfilt. For versions prior to 3.2.1 which only collected a single type of data, collectl will only report that single type.

Also note that nfs V2 records 2 fields that V3 doesn't record and V4 records many more. At this time the detail data is standardized on the V3 format and fields not collected will be left blank. These fields map onto those reported by nfsstat as indicated and are reported as rates.

# NFS SERVER/CLIENT DETAILS (/sec)
#Type Read Writ Comm Look Accs Gttr Sttr Rdir Cre8 Rmov Rnam Link Rlnk Null Syml Mkdr Rmdr Fsta Finf Path Mknd Rdr+
Type A combination of Clt or Svr and one of 2, 3 or 4
Read Reads
Writ Writes
Comm Commits
Look Lookups
Accs Accesses
Gttr Getattrs
Sttr Setattrs
Rdir Readdirs
Cre8 Creates
Rmov Removes
Rnam Renames
Link Links
Rlnk Readlinks
Null Nulls
Syml Symlinks
Mkdr Mkdirs
Rmdr Rmdirs
Fsta Fsstats
Finf Fsinfos
Path Pathconfs
Mknd Mknods
Rdr+ Readdirpluses

Process Data, collectl -sZ

There are actually multiple formats process data can be displayed in, the default being the one shown immediately below. By using --procopts as shown in later examples, you can change what is displayed, noting that when playing back the data from a raw file, all information has been recorded and so you can actually play it back multiple times and see different views.

These switches can also be used in conjunction with --top.

# PROCESS SUMMARY (faults are /sec)
# PID  User     PR  PPID S   VSZ   RSS  CP  SysT  UsrT Pct  AccuTime  RKB  WKB MajF MinF Command
PID Pid of the process
User Name of user which this process is running under. In playback mode on a different machine, use -oP to direct collectl to use the password file named in collectl.conf (default is /etc/passwd) to lookup the corresponding username. Otherwise the UID will be reported instead.
PR Process priority
PPID PID of this process's parent
S Process State: S - Sleeping, D - Uninterruptible Sleep, R - Running, Z - Zombie or T - Stopped/Traced
VSZ This is the amount of VS memory used by this process
RSS This is the amount of RSS memory used by this process
CP CPU number this process is currently running on
SysT The amount of System Time this process used during this interval
UsrT The amount of User Time this process used during this interval
Pct Percentage of the current interval taken up by this task (the User and System time are used for this calculation)
AccuTime Total accumulated System and User time since the process began execution
RKB This is the number of kilobytes of data read by each process. Both this and the WKB field are only present if the kernel has process I/O monitoring enabled, which is not the default as of 2.6.23.
WKB This is the number of kilobytes of data written by each process
MajF Major Page Faults per second
MinF Minor Page Faults per second
Command Command that is running. Path and command line options are NOT included unless --procopts w is specified

Process Data, collectl -sZ --procopts x

This format is essentially identical to the last except that it adds extended information to the display, specifically VCtx and NCtx.

# PROCESS SUMMARY (counters are /sec)
# PID  User     PR  PPID THRD S   VSZ   RSS CP  SysT  UsrT Pct  AccuTime  RKB  WKB VCtx NCtx MajF MinF Command
VCtx Voluntary context switches
NCtx Non-voluntary context switches

Process I/O Data, collectl --procopts i

# PID  User    PPID S  SysT  UsrT   Pct AccuTime   RKB   WKB  RKBC  WKBC  RSys  WSys  Cncl  Command
PID Pid of the process
User Name of user which this process is running under. In playback mode on a different machine, use -oP to direct collectl to use the password file named in collectl.conf (default is /etc/passwd) to lookup the corresponding username. Otherwise the UID will be reported instead.
PPID PID of this process's parent
S Process State: S - Sleeping, D - Uninterruptible Sleep, R - Running, Z - Zombie or T - Stopped/Traced
SysT The amount of System Time this process used during this interval
UsrT The amount of User Time this process used during this interval
Pct Percentage of the current interval taken up by this task (the User and System time are used for this calculation)
AccuTime Total accumulated System and User time since the process began execution
RKB Attempt to count the number of bytes which this process really did cause to be fetched from the storage layer by doing calls to read_bytes. This is done at the submit_bio() level, so it is accurate for block-backed filesystems.
WKB Attempt to count the number of bytes which this process caused to be sent to the storage layer by doing calls to write_bytes. This is done at page-dirtying time.
RKBC Number of bytes which were read via read, readv, pread and sendfile. Since these requests are satisfied from kernel pagecache they won't be accounted for by RKB, because they didn't require any I/O.
WKBC Number of bytes which were written via write, writev, pwrite and sendfile. Like RKBC, since the I/O uses the pagecache these values won't be accounted for by WKB.
RSys Number of read syscalls, specifically: read, pread, readv and sendfile
WSys Number of write syscalls, specifically: write, pwrite, writev and sendfile
Cncl Number of cancelled write bytes.
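For readers who want to look at the underlying counters directly, the sketch below parses text in the "name: value" layout that the kernel uses for /proc/pid/io. The correspondence assumed here (read_bytes and write_bytes for RKB and WKB, rchar and wchar for RKBC and WKBC, syscr and syscw for RSys and WSys, cancelled_write_bytes for Cncl) is inferred from the field descriptions above rather than taken from collectl's source, and the helper names, sample values and per-second conversion are illustrative only.

```python
# Sample text in the layout used by /proc/<pid>/io.
# The numbers are made up for illustration.
SAMPLE_IO = """\
rchar: 4096000
wchar: 2048000
syscr: 1500
syscw: 700
read_bytes: 1024000
write_bytes: 512000
cancelled_write_bytes: 0
"""


def parse_proc_io(text):
    """Return a dict mapping counter name to its integer value."""
    counters = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep and value.strip():
            counters[name.strip()] = int(value)
    return counters


def kb_per_sec(prev, curr, field, interval_secs):
    """Change in a byte counter between two samples, as KB/sec."""
    return (curr[field] - prev[field]) / 1024 / interval_secs


# On a live Linux system the text could come from something like
# open("/proc/self/io").read(), subject to permissions.
first = parse_proc_io(SAMPLE_IO)
second = dict(first, read_bytes=first["read_bytes"] + 20480)
print(kb_per_sec(first, second, "read_bytes", 10))
```

The counters are cumulative, so any per-interval figure has to be derived from the difference between two samples, as the second helper does.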

Process Memory Data, collectl --procmem

# PID  User     S VmSize  VmLck  VmRSS VmData  VmStk  VmExe  VmLib  VmSwp MajF MinF Command
PID Pid of the process
User Name of user which this process is running under. In playback mode on a different machine, use -oP to direct collectl to use the password file named in collectl.conf (default is /etc/passwd) to lookup the corresponding username. Otherwise the UID will be reported instead.
S Process State: S - Sleeping, D - Uninterruptible Sleep, R - Running, Z - Zombie or T - Stopped/Traced
VmSize Size of Virtual memory used by the entire process
VmLck Size of Locked Virtual Memory
VmRSS Size of Resident Virtual Memory
VmData Size of Virtual Memory used for heap
VmStk Size of Virtual Memory used for stack
VmExe Size of Virtual Memory used for exe and statically linked libraries
VmLib Size of Virtual Memory used for dynamically linked libraries
VmSwp Size of Virtual Memory used for swapping. This does not necessarily mean a process is actively swapping but only that the memory has been mapped.
MajF Major Page Faults per second
MinF Minor Page Faults per second
Command Command that is running. Path and command line options are NOT included unless --procopt w is specified

Slab Data, collectl -sY

There are actually 3 different formats for slab data. The first applies to all kernels prior to 2.6.22 and contains the same fields as the Summary report for each named slab and loses the caches fields.

# SLAB DETAIL
#          <-----------Objects----------><---------Slab Allocation------><----Change-->
#Name      InUse   Bytes   Alloc   Bytes   InUse   Bytes   Total   Bytes   Diff    Pct
Objects
InUse Total number of objects that are currently in use.
Bytes Total size of all the objects in use.
Alloc Total number of objects that have been allocated but not necessarily in use.
Bytes Total size of all the allocated objects whether in use or not.
Slab Allocation
InUse Number of slabs that have at least one active object in them.
Bytes Total size of all the slabs.
Total Total number of slabs that have been allocated whether in use or not.
Bytes Total size of all the slabs that have been allocated whether in use or not.
Diff Change in size of this slab since last sample in bytes
Pct Percentage change in size of this slab

This second format applies to the new SLUB allocator starting with the 2.6.22 kernel. As with the old format slab detail report, the same fields as are found in the Slab Summary Report are shown for each named slab.

# SLAB DETAIL
#                             <----------- objects --------><--- slabs ---><---------allocated memory-------->
#Slab Name                    Size  /slab   In Use    Avail  SizeK  Number     UsedK    TotalK   Change    Pct
Objects
Size Size of a single slab object
/Slab The number of objects in a single slab
InUse The total number of objects that have been allocated to processes.
Avail The total number of objects that are available in the currently allocated slabs. This includes those that have already been allocated to processes.
Slabs
SizeK The size of one slab, which typically contains multiple objects
Number This is the number of individual slabs that have been allocated and taking physical memory.
Memory
UsedK Memory used by those objects that have been allocated to processes.
TotalK Total physical memory allocated to processes. When there is no filtering in effect, this number will be equal to the Slabs field reported by -sm.
Change Change in size of this slab since last sample in bytes
Pct Percentage change in size of this slab

The third format uses a common format based on the slabtop utility for displaying top slab data sorted by any of the listed column headers. All one needs to do is use one of these names, in lower case, as the argument to the --top switch (use --showtopopts for the full list of options to the --top switch). All the same rules apply for controlling the number of lines in the data section, mixing in subsystem data and even using it with playback mode.

# TOP SLABS 15:38:08
#NumObj  ActObj  ObjSize  NumSlab  Obj/Slab  TotSize  TotChg  TotPct  Name
NumObj Total number of objects that are available, which includes those in use
ActObj Number of objects that are in use
ObjSize Size of an individual slab object
NumSlab Total number of slabs that have been allocated
Obj/Slab This is the constant number of objects that fit into one slab
TotSize Amount of memory consumed by this slab even if not all objects are actually allocated
TotChg Change in size of this slab since last sample in bytes
TotPct Percentage change in size of this slab
Name Slab Name
updated July 23, 2014
collectl-4.3.1/docs/CPUs.html
collectl - CPU Monitoring

CPU Monitoring

Introduction

As of Version 3.4.2, collectl can detect dynamic changes to a CPU's state, in other words, a CPU going offline or coming back online. When one or more CPUs is indeed found to be offline, collectl will include a message in an output header to indicate this. Furthermore, when displaying CPU numbers in headers, those names will have their number changed to X to indicate this has occurred, such as in the following:
[root@node02 mjs]# ./collectl.pl -scj
waiting for 1 second sample...
# *** One or more CPUs disabled ***
#<--------CPU--------><-----------------Int------------------>
#cpu sys inter  ctxsw Cpu0 Cpu1 Cpu2 CpuX Cpu4 Cpu5 Cpu6 Cpu7
   0   0  1051     49 1000   17    0    0    4    0    0   29
As the state changes, the headers will change accordingly. If there is a state change between headers this won't be seen until the next headers are displayed. If displaying detail data, one CAN tell the state has changed. In the case of looking at only CPU data, ALL percentages for a CPU that is offline will display as zeros. If looking at interrupts, the CPU number will be changed to an X in the header (see Restrictions below).
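The header convention shown in the example output can be mimicked in a few lines. This is purely a sketch of the labelling rule; the function name and the list-of-booleans input are assumptions and do not reflect how collectl tracks CPU state internally.

```python
def cpu_labels(online):
    """Build header labels, where online[i] is True if CPU i is online.

    An offline CPU has its number replaced by X.
    """
    return ["Cpu%d" % i if up else "CpuX" for i, up in enumerate(online)]


# CPU 3 offline on an 8-CPU system, as in the example output
state = [True, True, True, False, True, True, True, True]
print(" ".join(cpu_labels(state)))
```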

When logging to a file, if any CPUs are found to be offline when collectl starts, that number will be written to the file header in the field CPUsDis. A new flag D will also be added to the Flags field. However, one will still see the same effects of a CPU state change in the output during playback.

Restrictions
If a CPU goes offline after collectl has started and one is logging to disk, it will not be noted in the file header.

When monitoring process data, this header will indicate if a CPU was found to be offline at the time collectl started as well as during processing. However, if the state changes and you're not explicitly displaying CPU data, there will be no indication of dynamic CPU state changes reported.

If you are only monitoring interrupt data and there is a state change things will get very messy. As users typically monitor Interrupts and CPU data at the same time it is not felt to be worth the extra effort or processing overhead to try and accommodate this rare case.
updated Dec 13, 2011
collectl-4.3.1/docs/Matrix.html

Command Equivalence Matrix

The following table needs a little explaining as it is rarely the case that the output of a particular utility maps identically to an equivalent collectl command. In some cases a single sar switch will approximate a single collectl subsystem and in other cases it takes multiple sar switches to do so.

The point is this matrix is only a rough guideline to some command equivalences and simply changing the case of the collectl subsystem or adding the verbose switch will provide even more output. There are even some options (see the uppercase O switch) that can also affect what gets displayed. And of course there is the lowercase o switch which allows you to specify formatting options like date, time, etc.

In other words there are simply too many permutations to list them all here and you're just going to have to experiment.

Finally, there is a special case worth noting. There is no collectl command equivalent to sar -R. It's not entirely clear to me if this data is needed, but it certainly could be added if the demand is there.

SAR Commands   Equivalence                Other Commands          Equivalence
sar -b         collectl -sd               iostat -x               collectl -sD
sar -B         collectl -sm --verbose     /proc/fs/lustre         collectl -sl --verbose
sar -clquw     collectl -sx --verbose     mpstat                  collectl -sc --verbose
sar -d         collectl -sD               netstat -i              collectl -sN
sar -cI XALL   collectl -sJ               nfsstat                 collectl -sfF
sar -n ALL     collectl -sfsN             perfquery               collectl -sx
sar -P ALL     collectl -sC               cat /proc/net/netstat   collectl -sY
sar -r         collectl -sm --verbose     ps                      collectl -sZ
sar -R         collectl -sm --memopts R   slabtop                 collectl -sY --slabopts S
sar -v         collectl -si               top                     collectl --top -scm
sar -W         collectl --vmstat          vmstat                  collectl --vmstat
updated October 4, 2011
collectl-4.3.1/docs/Data-brief.html0000664000175000017500000003276213366602004015254 0ustar mjsmjs Brief Data

Brief Data

This format does NOT include data for any individual devices such as cpu, disk, network, nfs, lustre, process, slabs or tcp. If you do select any one of them collectl will force --verbose format.

Buddy (Memory Fragmentation) Data, collectl -sB

The smaller the value the more accuracy as follows:

Remember, the whole purpose of monitoring fragments in brief mode is to identify trends, particularly when there are a small number of them.
#<---Memory-->
#   Fragments
  smkj9576040

CPU Data, collectl -sc

#<--------CPU-------->
#cpu sys inter  ctxsw
cpu    Percent of time the cpu was busy during the current interval, averaged across all CPUs. This is the total percentage of time the CPU was in one of the following modes: system, user, nice, irq, soft-irq and steal. Note that this does NOT include time spent in I/O wait.
sys    Percentage of time the cpu was executing in system mode during the current interval. This includes all the modes above except user and nice, so to determine the amount of time spent as a user you need to subtract this value from the total cpu field.
inter  Total number of interrupts/sec.
ctxsw  Total number of context switches/sec.
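
The relationship between the cpu and sys fields can be sketched in a few lines of Python. This is only an illustration of the field definitions above, not collectl's actual code; the function name and the sample percentages are made up for the example.

```python
def brief_cpu_fields(user, nice, system, irq, soft, steal):
    """Return the brief-mode (cpu, sys) pair from per-mode percentages.

    Hypothetical helper: it only mirrors the field definitions in the text.
    """
    # cpu: every busy mode; I/O wait and idle are deliberately left out
    cpu = user + nice + system + irq + soft + steal
    # sys: the same modes minus user and nice
    sys_ = system + irq + soft + steal
    return cpu, sys_


cpu, sys_ = brief_cpu_fields(user=10, nice=2, system=5, irq=1, soft=1, steal=0)
user_time = cpu - sys_  # time spent in user + nice
print(cpu, sys_, user_time)
```

Subtracting sys from cpu is the only way to recover user time in brief mode; use --verbose if you need the individual modes.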

Disk, collectl -sd

If you specify filtering with --dskfilt, the disks that match the pattern(s) will either be included in or excluded from the summary data. However, the data will still be collected, so if recorded to a file it can later be viewed.

#<---------------Disks---------------->
#KBRead  Reads  Size KBWrit Writes Size
KBRead   KB read/sec
Reads    Number of reads/sec
Size     Average read size in KB. This field only included if --dskopts i or --iosize specified
KBWrite  KB written/sec
Writes   Number of writes/sec
Size     Average write size in KB. This field only included if --dskopts i or --iosize specified

Infiniband, collectl -sx

#<---------------InfiniBand--------------->
#  KBIn  PktIn Size  KBOut PktOut Size Errs
KBIn    KB received/sec.
PktIn   Packets received/sec.
Size    Average incoming packet size in KB. This field is only included if --xopts i or --iosize included
KBOut   KB sent/sec.
PktOut  Packets sent/sec.
Size    Average outgoing packet size in KB. This field is only included if --xopts i or --iosize included
Errs    Count of current errors. Since these are typically infrequent, it is felt that reporting them as a rate would result in either not seeing them OR round-off hiding their values.

Inodes/Filesystem, collectl -si

#<----Files--->
#Handle Inodes
   5100 116110
Handles  Number of allocated file handles
Inodes   Number of inodes in use

Lustre

Lustre data actually falls into one of 3 categories - client, mds and oss. Collectl determines the type of system it is running on (a system can have multiple personalities) and reports on all it finds, unless specifically selected via -L.

Lustre Client, collectl -sl

#<-------------Lustre Client------------->
# KBRead  Reads Size  KBWrite Writes Size
KBRead   KB/sec delivered to the client.
Reads    Reads/sec delivered to the client, not necessarily from the lustre storage servers.
Size     Average read size in KB. This field only included if --iosize specified
KBWrite  KB/sec delivered to the storage servers.
Writes   Writes/sec delivered to the storage servers.
Size     Average write size in KB. This field only included if --iosize specified

The following format of lustre client data is selected by including -OR and adds readahead statistics to the previous six, noting the Size fields are dependent on --iosize being specified.

#<--------------------Lustre Client-------------------->
# KBRead  Reads Size  KBWrite Writes Size   Hits Misses
KBRead   KB/sec delivered to the client.
Reads    Reads/sec delivered to the client, not necessarily from the lustre storage servers.
Size     Average read size in KB.
KBWrite  KB/sec delivered to the storage servers.
Writes   Writes/sec delivered to the storage servers.
Size     Average write size in KB.
Hits     Number of reads/sec from the lustre prefetch cache.
Misses   Number of misses/sec from the prefetch cache which must then be satisfied by reading from the storage servers.

Lustre MDS (Meta-Data Server)
The first format is for lustre versions 1.6.5 and beyond while the second format is for earlier releases

#<--------Lustre MDS-------->
#Gattr+ Sattr+   Sync  Unlnk

#<--------Lustre MDS-------->
#Gattr+ Sattr+   Sync  Reint
Gattr+  Total number of all getattr operations/sec. See Getattr, GttrLck and Gxattr in the verbose data section
Sattr+  Total number of all setattr operations/sec. See Setattr and Sxattr in the verbose data section
Sync    Number of syncs/sec
Unlnk   Number of file deletes/sec
Reint   Number of reints/sec which include unlinks and setattrs. Since older versions did not break out setattrs, they are not included in Sattr+.

Lustre OSS (Object Storage Server), collectl -sl

#<--------------Lustre OST-------------->
# KBRead  Reads Size  KBWrit Writes Size
KBRead   KB/sec read
Reads    Reads/sec
Size     Average read size in KB. This field only included if --iosize specified
KBWrite  KB/sec written
Writes   Writes/sec
Size     Average write size in KB. This field only included if --iosize specified

Memory, collectl -sm

#<-----------Memory---------->
#free buff cach inac slab  map
free  Total free memory, which unfortunately is NOT the difference between total memory and the following amounts allocated to used memory.
buff  Memory used as system buffers.
cach  This is also commonly known as the file system buffer cache as buffered I/O uses this memory to cache the data.
inac  Inactive memory.
slab  Total memory allocated to slabs.
map   Total mapped memory, which includes AnonPages.

Note
If you include --memopts R, these values will be displayed as changes/sec between intervals rather than absolute values. This switch will also honor -on in that the values will not be normalized to a rate but rather displayed as changes in size per interval.

Network, collectl -sn

If you specify filtering with --netfilt, the names that match the pattern(s) will either be included in or excluded from the summary data. However, the data will still be collected, so if recorded to a file it can later be viewed.

Also be sure to keep in mind that like all other data, network counters are always normalized to /sec rates. This means that something like errors, whose counts might be really small, could be reported as 0 if the count is less than 1/2 the sampling interval. To see unnormalized values use -on.

#<------------------Network------------------>
#  KBIn  PktIn Size  KBOut  PktOut Size Error
KBIn    KB received/sec over all real network interfaces and therefore excludes 'lo' and 'sit'.
PktIn   Packets received/sec over all real network interfaces.
Size    Average incoming packet size in bytes. This field is only included if --netopts i or --iosize specified
KBOut   KB sent/sec over all real network interfaces.
PktOut  Packets sent/sec over all real network interfaces.
Size    Average outgoing packet size in bytes. This field is only included if --netopts i or --iosize specified
Error   Total incoming/outgoing errors/sec. To see individual error counts, use --verbose. This field is only included if --netopts e specified
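
To see why a small error count can vanish once it is normalized to a /sec rate, consider the following Python sketch. It assumes rates are rounded to the nearest whole number, which is what the "less than 1/2 the sampling interval" remark suggests; the function name is hypothetical and this is not collectl's own code.

```python
def per_second(count, interval_secs):
    """Normalize a raw counter delta to a whole-number /sec rate.

    Round-to-nearest is an assumption made for this sketch.
    """
    return int(count / interval_secs + 0.5)


interval = 10  # seconds between samples
for errors in (3, 4, 5, 12):
    # Counts below interval/2 round down to a rate of 0
    print(errors, "errors ->", per_second(errors, interval), "/sec")
```

With a 10 second interval, anything under 5 errors per sample disappears from the rate column, which is the case -on is meant to expose.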

NFS, collectl -sf

As of version 3.2.1 collectl collects all types of nfs data, both clients and servers as well as versions 2, 3 and 4. In brief format it therefore reports summarized data across all nfs types as shown below. If --nfsfilt was included in the command to limit the types of data reported, those values will be included in the header line as a reminder, as shown in the second form to the right, in which case only V3 server and V4 client data are being summarized.
#<------NFS Totals------>            #<------NFS [s3,c4]----->
# Reads Writes Meta Comm             # Reads Writes Meta Comm
Reads   Total nfs reads/sec.
Writes  Total nfs writes/sec.
Meta    Total nfs meta data calls/sec, where meta data is considered to be any of: lookup, access, getattr, setattr, readdir and readdirplus, noting that not all nfs versions report all of these as V3 clients/servers do.
Comm    Total nfs commits/sec.

Slabs, collectl -sy

#<----slab---->
# Alloc   Bytes
Alloc  Total Number of slabs allocated
Bytes  Total Number of bytes allocated as slabs

Sockets, collectl -ss

#<------Sockets----->
#  Tcp  Udp  Raw Frag
Tcp   Total TCP sockets currently in use.
Udp   Total UDP sockets currently in use.
Raw   Total RAW sockets currently in use.
Frag  Total number of IP fragment queues currently in use.

TCP, collectl -st

The TCP data collected actually depends on the value of --tcpfilt, which by default is set to itcu, which stands for IP, Tcp, Udp and ICMP. A 5th filter, T, will result in Tcp Extended values being reported.
#<----------TCP---------->
#  IP  Tcp  Udp Icmp TcpX 
    0    0    0    0    0 
IP    Summary of a number of IP errors from /proc/net/snmp: InHdrErrors+InAddrErrors+InUnknownProtos+InDiscards+OutDiscards+ReasmFails+FragFails
Tcp   Summary of a number of Tcp errors from /proc/net/snmp: AttemptFails+InErrs
Udp   Summary of a number of Udp errors from /proc/net/snmp: NoPorts+InErrors
Icmp  Summary of a number of Icmp errors from /proc/net/snmp: InErrors+InDestUnreachs+OutErrors
TcpX  Summary of a number of Tcp Extended errors from /proc/net/snmp: TcpLoss+TCPFastRetrans
updated September 29, 2011
collectl-4.3.1/docs/SlabInfo.html0000664000175000017500000003043413366602004015005 0ustar mjsmjs collectl - SlabInfo

SlabInfo

Introduction

In version 2.6.22 of the Linux kernel, the slab allocator has been replaced by a new one called SLUB, the Unqueued Slab Allocator, and more importantly from collectl's perspective, the way slab statistics are reported has changed as well. Rather than reporting all slab data in the single file /proc/slabinfo, there is now one subdirectory for each slab under /sys/slab. But before getting into all that here's a quick review of how slabs are organized by referring to the following diagram:

As you can see, for a given slab name there are multiple slabs and each slab consists of multiple objects. When a process requests an allocation of slab memory it is provided as an object from a slab if there is one available. If there are none, a new slab is allocated and the object provided from it. Furthermore, slub allows slabs of different names but whose objects are the same sizes to share the same slab as you can see below for the slab with the very ugly name of :0001024, which in this case is a slab which contains 1K objects. These additional entries are called aliases for obvious reasons:

drwxr-xr-x  2 root root 0 Dec 27 07:48 /sys/slab/:0001024
lrwxrwxrwx  1 root root 0 Dec 27 07:48 /sys/slab/biovec-64 -> ../slab/:0001024
lrwxrwxrwx  1 root root 0 Dec 27 07:48 /sys/slab/kmalloc-1024 -> ../slab/:0001024
lrwxrwxrwx  1 root root 0 Dec 27 07:48 /sys/slab/sgpool-32 -> ../slab/:0001024
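
The alias relationship in the listing above can be recovered programmatically by following the symlinks. The Python sketch below is only an illustration: slab_aliases is a hypothetical helper, not part of collectl, and the demo builds a throwaway directory that mimics the listing rather than reading a real /sys tree.

```python
import os
import tempfile


def slab_aliases(slab_root):
    """Map each real slab directory name to the alias names linking to it."""
    aliases = {}
    for name in sorted(os.listdir(slab_root)):
        path = os.path.join(slab_root, name)
        if os.path.islink(path):
            # An alias: record it under the directory it resolves to
            target = os.path.basename(os.path.realpath(path))
            aliases.setdefault(target, []).append(name)
        elif os.path.isdir(path):
            # A real slab directory, possibly with no aliases at all
            aliases.setdefault(name, [])
    return aliases


# Demo against a temporary directory shaped like the listing above
with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, ":0001024"))
    for alias in ("biovec-64", "kmalloc-1024", "sgpool-32"):
        os.symlink(":0001024", os.path.join(root, alias))
    demo = slab_aliases(root)
print(demo)
```

On a live system you would pass the slab directory itself as slab_root; its exact location depends on the kernel version.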

Good news! The slab memory field reported in /proc/meminfo finally matches the total memory reported by the individual slabs and so the need for collectl's slab summary usage for all slabs has been reduced. However, when selecting a subset of slabs by filter(s), the summary will show the totals for the selected slabs and will therefore be more useful. More on this in the examples below.

The main pieces of information collectl reports on for each named slab are the number of slabs that have been allocated, the corresponding number of objects and the number of objects that have actually been allocated to processes. Collectl reports the total memory associated with the slabs as well as the amount of slab memory actually being used by processes. It also reports some constants such as the number of objects/slab and the physical sizes of both objects and slabs.

Examples

The following examples show several different output formats and the commands used to produce them. One should note there is similar output for the old style slab data. It should also be noted that an interval of 1 second has been chosen in each case and one should always consult the help and/or man pages for more detail as there are many other formatting options. Perhaps the easiest way to get started is to just type the command collectl -i:1 -sY and later add some additional switches to see their impact. For those new to collectl, you should also realize that collectl can run as a daemon logging all this in the background for later playback and that all the different subsystems it supports can be included by simply adding them to -sY.

Summary

This is the verbose, time-stamped slab summary output for only those slabs beginning with 'blk' or 'ext'
collectl -i:1 -sy --slabfilt blk,ext --verbose -oT
# SLAB SUMMARY
#         <---Objects---><-Slabs-><-----memory----->
#          In Use   Avail  Number      Used   TotalK
13:21:10   120625  124233   30701   113894K  122832K
13:21:11   120625  124233   30701   113894K  122832K
13:21:12   120625  124233   30701   113894K  122832K

Standard Detail

Here's the same report, only now we're looking at details and tossing in msec timestamps
collectl -i:1 -sY --slabfilt blk,ext -oTm
waiting for 1 second sample...
# SLAB DETAIL
#                                          <----------- objects --------><--- slabs ---><---------allocated memory-------->
#             Slab Name                    Size  /slab   In Use    Avail  SizeK  Number     UsedK    TotalK   Change    Pct
09:30:56.004 blkdev_ioc                      64     64     1183     1472      4      23        73        92        0    0.0
09:30:56.004 blkdev_queue                  1608      5       29       30      8       6        45        48        0    0.0
09:30:56.004 blkdev_requests                288     14       32       56      4       4         9        16        0    0.0
09:30:56.004 ext2_inode_cache               928      4        0        0      4       0         0         0        0    0.0
09:30:56.004 ext3_inode_cache               976      4    36916    36916      4    9229     35185     36916        0    0.0
09:30:56.004 ext3_xattr                      88     46        0        0      4       0         0         0        0    0.0
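
The UsedK and TotalK columns above appear to follow directly from the other columns: object size times objects in use, and slab size times number of slabs. The Python sketch below checks that reading against rows from the sample output. The truncation to whole KB is inferred from those rows, not taken from collectl's source, and the function name is hypothetical.

```python
def slab_memory_k(obj_size, objs_in_use, slab_size_k, num_slabs):
    """Return (UsedK, TotalK) for one slab row of the detail report."""
    # Memory handed out to processes, in KB, truncated (an inference)
    used_k = obj_size * objs_in_use // 1024
    # Physical memory held by all slabs of this name, in KB
    total_k = slab_size_k * num_slabs
    return used_k, total_k


# (Size, In Use, SizeK, Number) taken from the sample rows above
rows = {
    "blkdev_ioc": (64, 1183, 4, 23),
    "blkdev_queue": (1608, 29, 8, 6),
    "ext3_inode_cache": (976, 36916, 4, 9229),
}
for name, (size, in_use, size_k, number) in rows.items():
    print(name, slab_memory_k(size, in_use, size_k, number))
```

The gap between the two numbers is memory that is allocated to the slab but not currently in use by any process.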

Standard detail, changes only

Here we see the same output again only this time we're simultaneously writing a large file and choosing to report on only those slabs which have changed between monitoring intervals. To make the output a little more interesting we've added filtering on 'dentry' as well:
collectl -i:1 -sY --slabfilt blk,ext,dentry --slabopts S -oT
# SLAB DETAIL
# SLAB DETAIL
#                                      <----------- objects --------><--- slabs ---><---------allocated memory-------->
#         Slab Name                    Size  /slab   In Use    Avail  SizeK  Number     UsedK    TotalK   Change    Pct
09:33:49 blkdev_ioc                      64     64     1193     1472      4      23        74        92        0    0.0
09:33:49 blkdev_queue                  1608      5       29       30      8       6        45        48        0    0.0
09:33:49 blkdev_requests                288     14       51       70      4       5        14        20        0    0.0
09:33:49 dentry                         224     18    42000    42048      4    2336      9187      9344        0    0.0
09:33:49 ext3_inode_cache               976      4    36916    36916      4    9229     35185     36916        0    0.0
09:33:51 blkdev_requests                288     14       40       70      4       5        11        20        0    0.0
09:33:51 dentry                         224     18    42006    42048      4    2336      9188      9344        0    0.0
09:34:00 dentry                         224     18    42000    42030      4    2335      9187      9340    -4096   -0.0
09:34:01 blkdev_ioc                      64     64     1191     1472      4      23        74        92        0    0.0
09:34:01 blkdev_requests                288     14       37       70      4       5        10        20        0    0.0
09:34:01 dentry                         224     18    42008    42030      4    2335      9189      9340        0    0.0

--top format

Version 3.1.1 of collectl introduced a new format for slab data, specified by --top, which produces output similar to the slabtop command but adds two new fields, TotChg and TotPct, that allow one to see and sort on the change to the actual physical memory allocation. These fields show which slabs are changing the most and/or having the most impact on physical memory: sometimes a small percentage change to a very large slab can make a big difference in memory while a large percentage change to a small slab may not, hence the 2 additional sorting alternatives.

As with process data, the argument of the switch describes how to sort the data and an optional line count. In the case of slabs the sort name matches the column headers making them easy to identify and if you forget how to get started, just run collectl with --showtopopts. Naturally filtering can be applied as well and the following shows the output when used with a couple of filters.

collectl --top numobj --slabfilt nfs,tcp

# TOP SLABS 15:56:41
#NumObj  ActObj  ObjSize  NumSlab  Obj/Slab  TotSize  TotChg  TotPct  Name
    336     336       32        3       112      12K       0     0.0  tcp_bind_bucket
    330     330      128       11        30      44K       0     0.0  nfs_page
    270     144      128        6        30      36K       0     0.0  tcp_open_request
    130     130      384       13        10      52K       0     0.0  nfs_write_data
    130      68      384        8        10      52K       0     0.0  nfs_read_data
     60      60      128        2        30     8192       0     0.0  tcp_tw_bucket

Slab Analysis

Similar to process data analysis, the --slabanalyze switch will cause the slab data to be analyzed and summarized in its own file with the .slbs extension, written into the directory pointed to by -f. The focus of that analysis is currently on how much memory is actually consumed by each slab, the contents of that file making it possible to determine which slabs may have had the greatest impact on memory utilization. Slabs which do not allocate any memory will not be included. The following is an example report:
anon_vma                  548864      548864      548864      548864         0      0.00
arp_cache                   4096        4096        4096        8192      4096    100.00
avc_node                    4096        4096        4096        4096         0      0.00
bdev_cache                 45056       45056       45056       45056         0      0.00
bio                        65536      114688       53248     1662976   1609728   3023.08
biovec-1                   24576       28672       12288      348160    335872   2733.33
biovec-128                  4096       12288        4096       24576     20480    500.00
biovec-16                  12288       24576        4096      102400     98304   2400.00
This report shows the starting/ending values of memory utilization for each slab as well as the minimum and maximum values over the course of the day. The final two, and perhaps the most interesting columns, show the difference in memory usage over the course of the day as well as the percentage change over the low value.
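
The final two columns can be reproduced from the minimum and maximum values. The Python sketch below assumes the column order is start, end, minimum, maximum, difference, percent, which is inferred from the sample rows and the description above; the function name is hypothetical and this is not collectl's own code.

```python
def slab_change(min_bytes, max_bytes):
    """Return (difference, percent change over the low value)."""
    diff = max_bytes - min_bytes
    # Guard against a zero minimum, which would otherwise divide by zero
    pct = round(diff / min_bytes * 100, 2) if min_bytes else 0.0
    return diff, pct


# (minimum, maximum) columns from the sample report
for name, low, high in (
    ("arp_cache", 4096, 8192),
    ("bio", 53248, 1662976),
    ("biovec-1", 12288, 348160),
):
    print(name, slab_change(low, high))
```

Note that the percentage is relative to the low value, so a slab that starts tiny can show a change of several thousand percent while consuming little memory overall.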

As with the process analysis report, you can control the timeframe of the report using --from/--thru and unless you explicitly specify one or more subsystems no other data will be reported. On the other hand if you do specify -s, you will get the additional data reported in the standard way in plot-formatted files.

Slabs and the rawp file

As of collectl V3.3.5, it is now possible to request slab data as well as process data be logged to a separate file as specified by the -G and --group switches. To read more about the mechanics of this see Logging and in particular the section on Grouping data into 2 files.

If you have indeed chosen to use this mechanism there are a few things that have changed in collectl's behavior:

If you do not use -G and write slab data to the same file, -sy will automatically put you in verbose mode as brief slab data is no longer reported. If you want to see brief slab usage, use -sm.
updated June 7, 2009
collectl-4.3.1/docs/Misc.html0000664000175000017500000000511413366602004014200 0ustar mjsmjs Importing Miscellaneous Data

Importing Miscellaneous Data

This module reports on several variables that are not collected as part of collectl's core metrics, partly because some of them don't exactly fit into collectl's main stats, but which some users still find useful. There are also a few instructive techniques used in this simple module that are worth calling out:

The following example shows one importing both hello.ph and misc.ph while displaying cpu data and running all at the same interval:

[root@cag-dl585-02 collectl]# collectl -sc --import hello:misc
#<--------CPU--------><-Hello-><------CMU Extras----->
#cpu sys inter  ctxsw   Total   UTim  MHz MT Huge Log
   0   0  1034    149     140     94 2197  1    0   4
   0   0  1010    138     230     94 2197  1    0   4
In this example we're just doing the 2 imports, setting the misc monitoring interval to 2 and exporting the data with lexpr. As you can see, the hello data is reported every interval but the misc data only every other one:
[root@cag-dl585-02 collectl]# collectl --import hello:misc,i=2 --export lexpr
sample.time 1239625280.001
hwtotals.val 140
misc.uptime 93
misc.cpuMHz 2197
misc.mounts 1
misc.logins 4
sample.time 1239625281.001
hwtotals.val 230
sample.time 1239625282.002
hwtotals.val 319
misc.uptime 93
misc.cpuMHz 2197
misc.mounts 1
misc.logins 4
sample.time 1239625283.002
hwtotals.val 410
sample.time 1239625284.002
hwtotals.val 500
misc.uptime 93
misc.cpuMHz 2197
misc.mounts 1
misc.logins 4
sample.time 1239625285.002
hwtotals.val 590
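
Because lexpr output is a flat stream of name/value pairs, a consumer has to regroup it into samples. The Python sketch below shows one way to do that, treating each sample.time line as the start of a new record; parse_lexpr is a hypothetical helper written for this illustration and the input is copied from the output above.

```python
def parse_lexpr(lines):
    """Group 'name value' lines into one dict per sample.time record."""
    samples = []
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        name, value = parts
        if name == "sample.time":
            samples.append({"time": float(value)})
        elif samples:
            samples[-1][name] = float(value)
    return samples


text = """sample.time 1239625280.001
hwtotals.val 140
misc.uptime 93
misc.cpuMHz 2197
misc.mounts 1
misc.logins 4
sample.time 1239625281.001
hwtotals.val 230
"""
samples = parse_lexpr(text.splitlines())
for sample in samples:
    # misc.* keys only appear in every other sample because of i=2
    print(sample["time"], sorted(k for k in sample if k != "time"))
```

A consumer should therefore not assume every record carries every variable; modules imported with their own interval simply skip the samples in between.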
updated Feb 21, 2011
collectl-4.3.1/docs/Examples.html0000664000175000017500000003437313366602004015074 0ustar mjsmjs Collectl Examples

Examples

There are far too many combinations of switches and output formats so only a few of the more basic ones will be shown below. These examples show both the command and in most cases the resultant output. For more examples see both the FAQ and collectl man page after you install it.

Interactive Commands

The following examples show the results of running collectl interactively and seeing system performance numbers in real-time.

Default

Notice that in this mode you see one line per sampling interval. You are only limited by the width of your terminal window.
[root@poker]# collectl
#<-------CPU--------><-----------Disks-----------><-----------Network---------->
#cpu sys inter ctxsw KBRead  Reads  KBWrit Writes netKBi pkt-in  netKBo pkt-out
   0   0   134    30      0      0       0      0      0      1       0       1
   0   0   136    39      0      0     200      3      2     20       0       4
   0   0   130    30      0      0       0      0      0      1       0       0
   2   2   134    24      0      0       0      0      2     18       0       2

Default, but in verbose mode

In this mode you give up one line per interval and are rewarded with more details than could fit on a single line. This format always includes the date and time.
[root@poker]# collectl --verbose
### RECORD    1 >>> cag-dl380-01 <<< (1179493640.005) (Fri May 18 09:07:20 2007) ###

# CPU SUMMARY (INTR, CTXSW & PROC /sec)
# USER  NICE   SYS  WAIT   IRQ  SOFT STEAL  IDLE  INTR  CTXSW  PROC  RUNQ   RUN   AVG1  AVG5 AVG15
     1     0     0     0     0     0     0    98   354    501     1   100     1   0.41  0.12  0.04

# DISK SUMMARY (/sec)
#KBRead RMerged  Reads SizeKB  KBWrite WMerged Writes SizeKB
      0       0      0      0        0       0      0      0

# NETWORK SUMMARY (/sec)
# KBIn  PktIn SizeIn  MultI   CmpI  ErrIn  KBOut PktOut  SizeO   CmpO ErrOut
    17    121    149      0      0      0     15     88    175      0      0

Detail Data

If you would rather see details on specific instances, use the uppercase subsystem names with -s, so rather than the default of -scdn use -sCDN, which also forces --verbose, noting that you can also mix lower and uppercase subsystem types.
[root@poker]# collectl -sCDN
### RECORD    1 >>> cag-dl380-01 <<< (1179493735.005) (Fri May 18 09:08:55 2007) ###

# SINGLE CPU STATISTICS
#   CPU  USER NICE  SYS WAIT IRQ  SOFT STEAL IDLE
      0     0    0    0    0    0    0     0  100
      1     0    0    0    0    0    0     0   99

# DISK STATISTICS (/sec)
#          <---------reads---------><---------writes---------><--------averages--------> Pct
#Name       KBytes Merged  IOs Size  KBytes Merged  IOs Size  RWSize  QLen  Wait SvcTim Util
cciss/c0d0       0      0    0    0       0      0    0    0       0     0     0      0    0
cciss/c0d1       0      0    0    0       0      0    0    0       0     0     0      0    0
cciss/c0d2       0      0    0    0       0      0    0    0       0     0     0      0    0

# NETWORK STATISTICS (/sec)
#Num    Name   KBIn  PktIn SizeIn  MultI   CmpI  ErrIn  KBOut PktOut  SizeO   CmpO ErrOut
   0     lo:      0      0      0      0      0      0      0      0      0      0      0
   1   eth0:      0      2    207      0      0      0      0      0      0      0      0
   2   eth1:      0      2    207      0      0      0      0      0      0      0      0
   3   eth2:      1     20     72      0      0      0      0      4    122      0      0
   4   eth3:      0      0      0      0      0      0      0      0      0      0      0

Mixed Summary and Detail Data

For brevity we're only showing cpu and disk data. Note that we can show both cpu summary as well as detail while we're only showing disk details.
[root@poker]# collectl -scCD
### RECORD    1 >>> cag-dl380-01 <<< (1192729823.010) (Thu Oct 18 13:50:23 2007) ###

# CPU SUMMARY (INTR, CTXSW & PROC /sec)
# USER  NICE   SYS  WAIT   IRQ  SOFT STEAL  IDLE  INTR  CTXSW  PROC  RUNQ   RUN   AVG1  AVG5 AVG15
     0     0     0     0     0     0     0    99   135     30     0   145     0   0.01  0.01  0.00

# SINGLE CPU STATISTICS
#   CPU  USER NICE  SYS WAIT IRQ  SOFT STEAL IDLE
      0     0    0    0    0    0    0     0  100
      1     0    0    0    0    0    0     0   99

# DISK STATISTICS (/sec)
#          <---------reads---------><---------writes---------><--------averages--------> Pct
#Name       KBytes Merged  IOs Size  KBytes Merged  IOs Size  RWSize  QLen  Wait SvcTim Util
cciss/c0d0       0      0    0    0       0      0    0    0       0     0     0      0    0
cciss/c0d1       0      0    0    0       0      0    0    0       0     0     0      0    0
cciss/c0d2       0      0    0    0       0      0    0    0       0     0     0      0    0

Different Subsystems With Timestamps

[root@poker]# collectl -scft -oT
waiting for 1 second sample...
#         <--------CPU--------><------------TCP-------------><------NFS Totals------>
#Time     cpu sys inter  ctxsw PureAcks HPAcks   Loss FTrans   read  write meta comm
08:12:20    0   0  1007    120        1      0      0      0      0      0    0    0
08:12:21    1   1  1077    400        1      0      0      0      0      0    0    0

When you just don't know...

A great way to familiarize yourself with the types of data collectl can generate is to do (as of Version 2.6.0) --all --verbose and see all the summary data generated at once, excluding processes and slabs. Since this can be a lot to watch, especially as it scrolls off the screen between samples, the --home switch can be your friend. It will clear the screen between samples and remove extra lines to give the appearance of a continuously refreshing screen-based utility. Try it with different combinations of subsystems to reduce the amount of information displayed.
### RECORD    1 >>> hadesn1 <<< (1214918640.001) (Tue Jul  1 09:24:00 2008) ###
# CPU SUMMARY (INTR, CTXSW & PROC /sec)
# USER  NICE   SYS  WAIT   IRQ  SOFT STEAL  IDLE  INTR  CTXSW  PROC  RUNQ   RUN   AVG1  AVG5 AVG15
     0     0     0     0     0     0     0   100  1327    647     1   341     0   0.00  0.00  0.00
# INTERRUPT SUMMARY
#    Cpu0   Cpu1   Cpu2   Cpu3   Cpu4   Cpu5   Cpu6   Cpu7
      999      0    300      0      0      0     26      0
# DISK SUMMARY (/sec)
#KBRead RMerged  Reads SizeKB  KBWrite WMerged Writes SizeKB
      0       0      0      0        0       0      0      0
# NFS SUMMARY (/sec)
#<---------------------------server---------------------------><----------------client---------------->
# Reads Writes Meta Comm  UDP   TCP  TCPConn  BadAuth  BadClnt  Reads Writes Meta Comm Retrans  Authref
      0      0    0    0    0     0        0        0        0      0      0    0    0       0        0
# INODE SUMMARY
#    Dentries      File Handles    Inodes
# Number  Unused   Alloc   % Max   Number
   42532   39837     510    0.03    39011
# LUSTRE CLIENT SUMMARY
# KBRead  Reads  KBWrite Writes
       0      0        0      0
# MEMORY STATISTICS
#<------------------------Physical Memory-----------------------><-----------Swap----------><-Inactive->
#   TOTAL    USED    FREE    BUFF  CACHED    SLAB  MAPPED  COMMIT     TOTAL    USED    FREE     TOTAL     IN    OUT
   16053M   1276M  14776M 130816K 403696K 587884K  80992K 130684K    15308M       0  15308M   201028K      0      0
# NETWORK SUMMARY (/sec)
# KBIn  PktIn SizeIn  MultI   CmpI  ErrIn  KBOut PktOut  SizeO   CmpO ErrOut
     1     15    103      0      0      0      0      2    150      0      0
# SOCKET STATISTICS
#      <-------------Tcp------------->   Udp   Raw   <---Frag-->
#Used  Inuse Orphan    Tw  Alloc   Mem  Inuse Inuse  Inuse   Mem
   90      8      0     1     11     0      8     0      0     0
# TCP SUMMARY (/sec)
# PureAcks HPAcks   Loss FTrans
         0      1      0      0
# INFINIBAND SUMMARY (/sec)
#  KBIn   PktIn  SizeIn   KBOut  PktOut SizeOut  Errors
      0       0       0       0       0       0       0

Record Mode

This mode is often used when running a test of a limited duration from a couple of minutes to several hours or more and since collectl is not being run as a daemon, the default sampling rate is 1 second.

Collect 100 Samples and Exit

[root@poker]# collectl -c100 -f/tmp

Run Until Terminated With ^C

[root@poker]# collectl -f/tmp

Playback Mode

There are a couple of things to remember about playback:

Playback Between 2 Time Periods

In this example we're not selecting any device details and so the output defaults to brief mode and all data is printed on the same line. We've also chosen to display time in msecs and use --from to specify both times.
collectl -scdn -p /var/log/collectl/cag-dl380-01-20070830-082013.raw.gz --from 08:29-08:30 -oTm
#             <--------CPU--------><-----------Disks-----------><-----------Network---------->
#Time         cpu sys inter  ctxsw KBRead  Reads  KBWrit Writes netKBi pkt-in  netKBo pkt-out
08:29:00.012    0   0   135     38      0      0       2      0      0     11       0       2
08:29:10.012    2   0   142    142      0      0     142      2      1     14       1       5
08:29:20.012    1   0   138     45      0      0      33      1      1     14       0       3
08:29:30.012    0   0   135     52      0      0       5      0      1     11       0       3
08:29:40.012    0   0   136     44      0      0      21      0      1     11       0       3
08:29:50.012    1   0   177    123     14      2     385     38      1     13       1       4

Same Data File, Shorter Interval, Disk Details

As expected this defaults to --verbose mode. Also notice we left off the leading 0 in the from time.
collectl -sD -p /var/log/collectl/cag-dl380-01-20070830-082013.raw.gz --from 8:29 --thru 08:29:10 -oD
# DISK STATISTICS (/sec)
#                   <---------reads---------><---------writes---------><--------averages--------> Pct
#         Name       KBytes Merged  IOs Size  KBytes Merged  IOs Size  RWSize  QLen  Wait SvcTim Util
08:29:00 c0d0             0      0    0    0       2      0    0    6       6     0     6      6    0
08:29:00 c0d1             0      0    0    0       0      0    0    0       0     0     0      0    0
08:29:00 c0d2             0      0    0    0       0      0    0    0       0     0     0      0    0

Same Data File, Different Data

Notice here that we're using -oD instead of -oT and so both the date and time are displayed. We could have also chosen to force the time in msecs but chose not to in order to save screen real estate.
collectl -sms -p /var/log/collectl/cag-dl380-01-20070830-082013.raw.gz --from 08:29-08:30 -oD
#                  <-----------Memory----------><------Sockets----->
#Date    Time      free buff cach inac slab  map   Tcp  Udp  Raw Frag
20070830 08:29:00   64M 529M   2G 483M    0    0    89   17    0    0
20070830 08:29:10   64M 529M   2G 483M    0    0    89   17    0    0
20070830 08:29:20   64M 529M   2G 483M    0    0    89   17    0    0
20070830 08:29:30   64M 529M   2G 483M    0    0    89   17    0    0
20070830 08:29:40   64M 529M   2G 483M    0    0    89   17    0    0
20070830 08:29:50   64M 529M   2G 483M    0    0    89   17    0    0

Same As Last Command But Display In Plot Format

This format is rarely used, but nothing prevents one from doing so.
[root@poker]# collectl -P -sms -p /var/log/collectl/cag-dl380-01-20070830-082013.raw.gz --from 08:29-08:30
#Date Time [MEM]Tot [MEM]Used [MEM]Free [MEM]Shared [MEM]Buf [MEM]Cached [MEM]Slab [MEM]Map [MEM]Commit [MEM]SwapTot [MEM]SwapUsed
[MEM]SwapFree [MEM]Dirty [MEM]Clean [MEM]Laundry [MEM]Inactive [MEM]PageIn [MEM]PageOut [SOCK]Used [SOCK]Tcp [SOCK]Orph [SOCK]Tw  [SOCK]Alloc
[SOCK]Mem [SOCK]Udp [SOCK]Raw [SOCK]Frag [SOCK]FragMem
20070830 08:29:00 3098632 3032208 66424 0 542188 2258744 0 0 0 2044056 36596 2007460 336 62176 432280 494792 0 2 89 39 0 0 39 2 17 0 0 0
20070830 08:29:10 3098632 3032412 66220 0 542244 2258744 0 0 0 2044056 36596 2007460 1208 62176 431404 494788 0 142 89 39 0 0 39 1 17 0 0 0
20070830 08:29:20 3098632 3032440 66192 0 542276 2258776 0 0 0 2044056 36596 2007460 1412 62176 431232 494820 0 34 89 39 0 0 39 2 17 0 0 0
20070830 08:29:30 3098632 3032464 66168 0 542292 2258776 0 0 0 2044056 36596 2007460 1412 62176 431232 494820 0 5 89 39 0 0 39 2 17 0 0 0
20070830 08:29:40 3098632 3032464 66168 0 542300 2258776 0 0 0 2044056 36596 2007460 1584 62176 431060 494820 0 22 89 39 0 0 39 2 17 0 0 0
20070830 08:29:50 3098632 3033020 65612 0 542532 2258700 0 0 0 2044056 36596 2007460 2504 61628 430748 494880 14 386 89 39 0 0 39 1 17 0 0 0

Playback and Convert To Non-Compressed Plot File

There's really not much to see here since the only output this command will produce is error messages.
[root@poker]# collectl -p /var/log/collectl/cag-dl380-01-20070830-082013.raw.gz -P -f /tmp -oz
updated Feb 18, 2009

OpenStack Support

For the most part, the hardware configuration is static and so is collectl. When you boot, it discovers disks, networks and CPUs and that configuration doesn't change until you shut down, reconfigure and reboot. Clouds and virtual machines have turned this whole notion on its head, something that was bound to happen anyway. But it wasn't until several years ago, when collectl started to be used in OpenStack environments, that this design restriction became significant and needed to be changed. Since 2012 collectl has had the notion of dynamic Disks and Networks embedded in its core. Dynamic CPUs were added even earlier. Collectl has been heavily tested in OpenStack clouds and is as rock solid as ever.

But what about cloud-specific subsystems? Once collectl was able to deal with dynamic devices, additional capabilities were added to deal with new cloud-specific subsystems as well. While not everything in an OpenStack cloud can be monitored one has to start somewhere and collectl has chosen to focus on the following:

Nova

In the case of Nova, almost all the data one needs to report on what a VM is doing is already being collected. Specifically, a VM is using a CPU, a Disk and a Network, so if one can tell which host resources they correspond to, one can then associate their instance data with the VM and report something that looks like standard collectl process data like this:

# PROCESS SUMMARY (counters are /sec)
# PID  THRD S   VSZ   RSS CP  SysT  UsrT Pct  N   AccumTim DskI DskO NetI NetO Instance
15622     1 S    5G  562M  8  0.00  0.00   0  1   07:26.72    0    0    0    0 0094eed9
32738     4 S    6G  632M  5  0.01  0.00   0  2   01:11:25    0    0    0    0 0093c0ef
36432     2 S    4G  944M  4  0.32  0.41  73  1   13:24:35    0   16  445  445 009570b9
36841     1 S    4G  935M  7  0.24  0.32  56  1   12:31:27    0    0  445  445 009570bb
The magic that makes this all work comes from 3 places. First, the actual command that starts the VM contains the CPU number, the MAC address of the virtual network and even the instance ID. The process data for the VM also contains runtime information as well as memory usage, all of which are available if one runs the command collectl -scnZ -i:1, which is exactly what the export module named vmsum does under the covers.

The second thing is one needs to figure out how to map the network MAC address to an actual network name and for that a second plugin, this time an import module called vnet has been developed. It doesn't generate any output as do other import modules but rather loads the required data structures that vmsum needs to find the network virtual device.

Finally, the disk stats come from the process data, but are only available when run as root, so the ultimate command one needs to run to see the above output is the following, noting you will be warned to use sudo if you are not root.

sudo collectl --import vnet --export vmsum
Be sure to try displaying VM output across a cluster with colmux. You'll never look at your cluster the same way again.

Swift

Getting swift data is slightly more complicated because it doesn't report statistics in an easy-to-use form. Rather its standard mechanism is to use statsd, which requires a statsd listener. Further, since swift can only send to one listener, you can't have multiple consumers of the data, which is why statstee was developed. It is based on the philosophy of the unix tee command in that it can sit between the source and destination and record data locally, in this case to a file that looks like a /proc data structure.

For example, when you install/run statstee, it creates a file like the following which is updated with rolling counters every tenth of a second (as long as something changes). This means anyone can read that file as often as they choose and simply report the differences between samples as rates, just like collectl already does for all the other data it reports.

cat /var/log/swift/swift-stats 
V1.0 1425398070.323784
#       errs pass fail
accaudt 0 2784 0
#       errs cfail cdel cremain cposs_remain ofail odel oremain oposs_remain
accreap 0 0 0 0 0 0 0 0 0
#       diff diff_cap nochg hasmat rsync rem_merge attmpt fail remov succ
accrepl 0 0 167004 0 0 0 167004 0 0 167004
#       put get post del head repl errs
accsrvr 153 56960 0 0 57398 175140 0
#       errs pass fail
conaudt 0 4770 0
#       diff diff_cap nochg hasmat rsync rem_merge attmpt fail remov succ
conrepl 74 0 551811 0 0 0 1306955 0 0 551885
#       put get post del head repl errs
consrvr 16884 104 0 7203 630 616300 11
#       skip fail sync del put
consync 0 0 0 0 0
#       succ fail no_chg
conupdt 43 0 130683
#       quar errs
objaudt 0 0
#       obj errs
objexpr 0 0
#       part_del part_upd suff_hashes suff_sync
objrepl 0 73646775 7514 0
#       put get post del head repl errs quar async_pend putcount puttime
objsrvr 16771 3819 0 7189 243 1615248 0 0 49 16614 17031.689711
#       errs quar succ fail unlk
objupdt 0 0 49 0 49
#       put get post del head copy opt bad_meth errs handoff handoff_all timout discon status
prxyacc 0 0 0 0 716 0 0 0 0 0 0 0 0 204:716
prxycon 37 195 0 19 1051 0 0 0 0 0 0 0 0 200:195 201:4 202:33 204:1059 409:11
prxyobj 12560 8155 0 7099 533 0 0 0 0 0 0 0 0 200:8681 201:12560 204:7099 404:7
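
The "report the differences between samples as rates" idea can be sketched outside of collectl as well. The following is a minimal Python sketch, not the statsd plugin's actual code: the function names parse_stats and rates are made up for illustration, and it assumes the layout shown above (a version/timestamp line, '#' lines naming the columns, then one line of rolling counters per service). Non-numeric fields such as the proxy status counts (204:716) are simply skipped.

```python
def parse_stats(text):
    """Parse statstee-style text into (timestamp, {service: [counters]}).

    Hypothetical helper: assumes the first line is 'V1.0 <epoch seconds>'
    and that lines starting with '#' only name the columns.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    _version, stamp = lines[0].split()
    counters = {}
    for line in lines[1:]:
        if line.startswith("#"):
            continue  # column-name line, carries no data
        name, *fields = line.split()
        values = []
        for field in fields:
            try:
                values.append(float(field))
            except ValueError:
                pass  # skip non-numeric fields such as '204:716'
        counters[name] = values
    return float(stamp), counters


def rates(prev, curr):
    """Turn two parsed samples into per-second rates for each counter."""
    t0, c0 = prev
    t1, c1 = curr
    elapsed = t1 - t0
    if elapsed <= 0:
        return {}  # same or out-of-order samples: nothing to report
    return {
        name: [(new - old) / elapsed for old, new in zip(c0[name], values)]
        for name, values in c1.items()
        if name in c0
    }
```

Reading the file twice, a known interval apart, and passing both parsed samples to rates() gives per-second values in the same spirit as the other counters collectl reports.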
To make this data available to collectl, one simply imports the statsd plugin. As it turns out, there is a LOT of data that is provided by swift and in fact too much to display in a meaningful way. Therefore one should get in the habit of running statsd with the help option like this:
collectl --import statsd,h

usage: statsd, switches...
  d=mask  debug mask, see header for details
  h       print this help test
  f file  reads stats from specified file
  r       include return codes with proxy stats
  s       server: a, c, o and/or p
  t       data type to report, noting from the following
          that not all servers report all types

           t  name             servers
           a  auditor       acc  con  obj
           x  expirer                 obj
           p  reaper        acc   
           r  replicator    acc  con  obj
           s  server        acc  con  obj
           y  sync               con
           u  updater            con  obj

  p	   proxies require their own service type
	   a  account service
           c  container service
           o  object service

  v        show version and default settings
  xx       2 char specific types built from -s, -t and -p

  NOTE = setting s, t or p to * selects everything
This can be a bit of a mouthful so perhaps an example will help. Also try to get into the habit of NOT using -s, -t or -p as those may eventually go away as I'm not sure they're really useful. Looking at the matrix above, think of the types of data you'd like to look at for which types of servers. So say you want to look at object server server data and container server replicator data at the same time. These translate to os and cr so you'd run the following, noting since this is standard collectl you can even include time stamps:
collectl --import statsd,os,cr
waiting for 1 second sample...
#                       Container                                                Object                        
#<----------------------Replicator----------------------><-----------------------Server----------------------->
# Diff DCap Nochg Hasm Rsync RMerg Atmpt Fail Remov Succ   Put  Get Post Dele Head Repl Errs Quar Asyn PutTime 
     0    0    0     0    0     0     0     0    0     0     0    0    0    0    0    0    0    0    0   0.000
     0    0    0    17    0     0     0    60    0     0     0    0    0    0    0    0    0    0    0   0.000
You can try various combinations of servers and services. You can also use this plugin with colmux or even generate output in plot format as there are now also a number of custom plots available for swift, many of which are included when you select All Summary Plots. See the help next to the Plots by Name for more information.

Neutron

There is currently one big issue with neutron and that is whenever you create a new VM, you get a new tap and so over time, the number of these devices continues to grow. If you are generating device specific files, you will see the number of networks collectl is tracking includes ALL networks that have ever existed since collectl started. This in turn means the columns in the net file will continue to grow uncontrolled. If you have a set number of VMs that aren't changing, this may be ok but in most cases it won't be. While there is currently no good solution for how to deal with this, collectl does have a new option for --netopts, specifically o, which tells collectl that whenever there is a change in the network configuration, drop any unused networks from the current list. This means you'll end up breaking the column alignment in the detail file but at least it won't grow uncontrollably. Since the names of the networks ARE retained in the line items, you can still see what's happening but you won't be able to get a consistent view with colplot.

A second situation is the sheer volume of virtual network devices that one can have in a large cluster. Trying to collect data for them on the neutron nodes themselves can easily involve monitoring thousands of devices, which may start to consume more CPU cycles than you wish to use. If this becomes the case, consider using --rawnetfilt which tells collectl to not even collect data on the specified network(s).

updated March 9, 2015

Plot Files

One of collectl's main features is its ability to generate output in a ready-to-plot format and write that data to one or more files which are compatible with what gnuplot expects. There are actually 2 main types of files that it generates. The first, which has an extension of tab, represents a table of all the summary data. What makes this file unique is that all data elements are in a fixed set of columns - some columns may get added over time, but for all intents and purposes, the set of data for say CPUs does not change regardless of how many CPUs are in the system. The second type of file deals with detail data, the amount of which changes with the number of instances so a 4 CPU system will have 1/2 the data an 8 CPU system has. There is one file for each type of detail data and like raw files you tell collectl where to put the plot files with -f.

Plot files can be generated in 2 ways and each has its own advantages as well as disadvantages.

Caution - though one can leave off -f and have collectl write its plot formatted output to the terminal, this should be avoided unless there is a specific need. In fact, since multiple detail files from multiple systems can actually contain different numbers of columns, collectl explicitly only allows generating detail plot data from a single raw file.

At first glance, it sounds like you'd always want to generate plot files directly since you avoid the need for the conversion step, but you should also realize a few things about this methodology:

Generating Plot Files On-The-Fly

While generating files this way is as easy as appending -P to the collectl command either when run interactively or in /etc/collectl.conf, there are a couple of things to keep in mind:

Generating Plot Files from RAW Files

Collectl has the capability to play back a single file or multiple ones but in either case the first thing collectl does is examine the raw file header to get the source host name and creation date. There will always be a new set of data generated for each unique combination of host and creation date. Note that depending on the subsystems chosen there may be multiple output files generated. This also means a single raw file that spans multiple dates will result in a single set of data.

By default, the name of the plot file contains only the date and a test is made to see if a file with that name already exists. If not, it is created in append mode. This means that multiple raw data files for the same host on the same date will result in a single set of data. However, if that file already exists, collectl will NOT process any data and will request that you specify -oc to tell it to perform the first open in create mode so that subsequent files can be appended. If you specify -oa all files will be appended to the original one, which may not be what you want. Collectl cannot read your mind so to be safe, be explicit. If you want to generate a unique set of data files for each raw file use -ou, which causes the time to be included in file names, resulting in a unique output file name for each raw file.
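
The open-mode rules in the previous paragraph can be summarized as a small decision function. This is only a simplified Python model of the behavior as described in the text, not collectl's own logic: the name first_open_mode and its return strings are invented for illustration, and -ou is left out because it changes the file name rather than the open mode.

```python
def first_open_mode(file_exists, option=None):
    """Model how the first plot file of a playback run is opened.

    option is 'c' for -oc, 'a' for -oa, or None when neither is given.
    Returns 'create', 'append' or 'refuse' (hypothetical labels).
    """
    if option == "c":
        return "create"   # -oc: first open in create mode, later files append
    if option == "a":
        return "append"   # -oa: everything is appended to the original file
    if not file_exists:
        return "append"   # default: a new file is simply opened in append mode
    return "refuse"       # default with an existing file: no data is processed
```

So first_open_mode(True) models the case where collectl declines to process any data until -oc or -oa is given.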

This certainly maximizes your flexibility for all the reasons listed earlier. However, this now puts the responsibility of managing your data more squarely on your shoulders. Some of the questions you need to answer include:

Having answered these questions and perhaps others, it now just becomes a matter of executing the appropriate copy and/or collectl commands, which can be relatively easily scripted.

TIP - If you rsync raw files to another server and then process them using a wildcard in your playback command, you will probably end up processing some of today's files too! If you then later copy over the rest of today's file(s) you will need to recreate today's plot file since collectl will not overwrite an existing file by default. But if you specify the -oc switch with a wildcard you will end up recreating all the plot files, which will result in a lot more processing than you were planning on. Collectl supports a special syntax that allows you to play back just the files from yesterday by replacing that string with yesterday's date as in the following:

collectl -p "YESTERDAY*" etc...
noting that all uppercase characters are required and you can include other characters in the string such as a host name if need be.
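
For scripts that need the same substitution outside of collectl, for example to decide which files to rsync, the idea can be sketched in a few lines of Python. This is an illustration only: expand_yesterday is a made-up name, and the YYYYMMDD date format is an assumption based on the raw file names shown in the playback examples (such as cag-dl380-01-20070830-082013.raw.gz).

```python
from datetime import date, timedelta


def expand_yesterday(pattern, today=None):
    """Replace the uppercase string YESTERDAY with yesterday's date.

    Assumes dates appear in file names as YYYYMMDD; 'today' can be
    passed in explicitly to make the result reproducible.
    """
    today = today or date.today()
    yesterday = today - timedelta(days=1)
    return pattern.replace("YESTERDAY", yesterday.strftime("%Y%m%d"))
```

The lowercase string is left alone, matching the note above that all uppercase characters are required.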

TIP - If you want to create multiple sets of plot files from the same raw file, you can always include a unique qualifier along with the directory name with the -f switch to give each set a different prefix.

Daily vs Unique Plot Files

Collectl raw files are created every time a new instance of collectl is run or whenever collectl is instructed to create a new one via -r such as when running as a Daemon. This is why each file name includes a time as well as a date. However, trying to plot multiple files for any given day can be problematic even for an automation script that might help generate plots for you and so by default collectl creates non-timestamped, daily plot files.

Whether you choose to create plot files on-the-fly or manually (by playing back existing files), if you've not instructed collectl to do anything with unique files, it will simply append new data onto an existing file. On the other hand, if you explicitly ask for unique files, whenever a new raw file is processed or a new instance of collectl is run, a new plot file will be created that includes a corresponding timestamp.

The obvious question then becomes, why would you ever choose to create unique files when they're such a pain to plot? There are actually several good reasons you might choose to do so:

Dealing with configuration changes
Consider the following situation: you run collectl twice during the same day with the following commands:
collectl -scd  -P -f/tmp
collectl -scdm -P -f/tmp
or perhaps you even generated raw files first and later play them back, converting them to plottable files. In either case you now want to plot the data. Since it's all in the same file, the headers that are initially written will only tell you there is cpu and disk data in the file! Collectl will have written a second set of headers in the file at the time the second instance was run, but do you really want to have to make a pass through the whole file every time you want to generate a plot looking for additional data? Furthermore, there will be fewer columns of data in the first part of the file than the latter, a condition that will probably cause most plotting packages to blow up, so you really need unique files.

Another situation that can cause this is when dealing with detail data for dynamic subsystems such as Lustre and disks. Dynamic change detection for Lustre has always been a part of collectl but support for dynamic disk configuration changes was added in collectl V3.3.4. When a configuration change is detected, it forces collectl to create a new file, whether generating data in raw or plot format. Furthermore, if you later try to combine disk data from multiple raw files into a single plot file that has disk configuration change data in it, collectl won't let you unless you specify -ou, forcing the generation of unique files. Lustre changes can be combined into a single plot file by adding in any missing columns and 0-filling them.

Caution
If you are generating non-unique plot files on-the-fly and a configuration change occurs, that new data will simply be appended to the single file. There is nothing collectl can do about this, because it wants to keep all files in a consistent name format and will not switch to unique name formatting without explicitly being told to do so.

Configuration changes do not happen often and this is only an issue when generating plot files in real-time. Furthermore, this only affects detail data, because the summary data will always accurately reflect the sum of the instance data, so it typically will not affect anyone but is being stated for completeness.

If after all this you choose to generate real-time, non-unique detailed plot files and find yourself in a situation where you should have generated unique ones, you can always write a script to split the plot files back into individual ones since there is sufficient data in the internal headers to do so. If you have multiple unique files and find single files easier to deal with, you can also choose to write a post-processing script that merges these into a single file with zero-filled columns where there is missing data.
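
As a starting point for the merge script mentioned above, here is a minimal Python sketch. It is not part of collectl and makes simplifying assumptions: each input is plot-format text whose last line beginning with '#' names the space-separated columns, every other non-blank line is data, and the function names parse_plot and merge_plots are hypothetical. Real plot files carry additional header lines, so treat this as an outline rather than a finished tool.

```python
def parse_plot(text):
    """Return (column_names, rows) from simplified plot-format text."""
    header, rows = [], []
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith("#"):
            header = line[1:].split()   # last '#' line wins as column names
        else:
            rows.append(line.split())
    return header, rows


def merge_plots(texts):
    """Merge several plot texts into one table, zero-filling missing columns."""
    columns, parsed = [], []
    for text in texts:
        header, rows = parse_plot(text)
        parsed.append((header, rows))
        for name in header:
            if name not in columns:
                columns.append(name)    # keep first-seen column order
    merged = []
    for header, rows in parsed:
        for row in rows:
            values = dict(zip(header, row))
            merged.append([values.get(name, "0") for name in columns])
    return columns, merged
```

Writing the result back out is then a matter of printing '#' plus the joined column names, followed by each merged row joined with spaces.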
updated June 9, 2017
‰¶O&ãòÜhüŸûb«Z;fxfP3åßH¼uûÈjÍ¢î×|<ÍÀ›Oòóø:Õ;W”øgKu`­o©”VÎ6åTÿ0jÉd—Pºé^&ŒôPC‘Ðî?­hËoj!º¾‹–ŒçÊ^ž½* õÄþ0€p$ Û@súÔÖäI®é¤sçhÅ[þý¸þ”!Øm[ŸHH9.„ƒÇúΟ­AdDz:€8‡ZL{øS´ðÇO𻪒Ë} uÿ+!JÕØ)ØšºöÁoñЋz’~çűÿvâ)?Z·7Ï­ßÏ¢ƒÿŽŠR\Þx±?½oŸÊ¦‡çÖ¬üöÑŠýx5B!ÓXÏ ¿÷­eô5ZÌíÒôVÿžZ«)üê}-¿wá)=%•?Z…FÍ?鎵ýiˆ/FÝ7\P?Õêˆß­Oª kŸŒr`‰¿•7SLAâ„þíÔOúÔ×ãv¡®'y4Ôp=pHÐÙ†ýFãþšè þ• ¯7Z)þþšëúž?ŸP´ÿ¦š1¥A§Œ¿‡O¬§ó¤÷9ÝXgAÑØŽ‚UÿÇ«N&Cï]ª?âžÓ¿ÙšeýkŸ§ÔW•‰øÎÊ_Œ~cõ­?œx†Ïœ|ýY®0íõ­ ã_±ãþ[(ÅM=‹a±áÔÿgQÏþ:*þ ƒí'®6ûãüj“©¹ûš€ã?ìšÑ¾É¾ñ(9·SŸøפŽVM!ω-ŽGͧ0ÿÈf³í04íÿ»|ãÿZÑÉÿ„‹Jl}ë1ÿjÍ·`4}#Ÿ»¨·óZ}D:ôcOñž×¨V«Ró¬^ÿµ¥çÿZ†ø¢ø˜qr„ßMSH3¬KƒÚO§ûØñqნýáôùÍS`°f ðºçÅ[±\?†XùhÃÿªÒñ ^ tÔGò4†\¾_xŒcþX©ÿÇ…>ÔíÕtsŽ?³þ‚Ô]ø˜ø„œÛ)ýV‹U?ÚZ)ZÀÿè-LBÙcìž?ôðÿÌÓ.üIµû)ýiöcý ÃÇÒéÿ6åGöN²9ÿñÍhIvèÄÛ[ÿ¯þTÛ/øýðïôFãð§NGö¦®Î,sôÛ?øüðé?óêßÊ‹N!ÿ;/Oí3üê{Ñû¿öÒ¡‡×êsíÍO|?uâ^‰(ÈÇöÒ‘Œ/ úñUlÔøgž’?õ«H5dETÒð¶*½ŽL^ÈÆ%qõó@ŒÛ“ÿ½kž·‰ýjõ×¶«ßx¢Õ;+X㟶¯õ«w8:ž®yÀ±Qÿ Öw,e™kððî-ß?øõTs£ÙžçQ8ý*å˜mÐAÿŸVÎ>UaRt›Oíøò(KÜù^"aÿ=PÌÔíªËÉãLú¨¯Aû7ˆOý7Oæjy1ý©t{ 4!EÀŠÄa¼8:ýóÄÕnN‹ŒpÚ•Z±\\ø{þ¹¹ýMW@‘÷mKü(è×Ù-â#­]ˆbêsÓn–£ôªWØÙâA‘üêèùno†yrc?Jq)Yç>_Mçõ5Xs¥öµ·d1q úœŸÖ«F3¦Úÿµ¨úÒ%éÿûz²/ëKu7‘¨#*?bT#ÐŒÓoqäk¤tiÐ –@~ßwæaŠØŽßJ}E$Üt+$r¢D‡»ÏäNk:ICX¼eFMÑ$úÕû)wvÓHHUí¸Vqù­”c­É5U‚Nu®è%eÍþW'Žk½kÍÒóâ²OZà­Ð뀘â®éƒ7ñ½U1Z::¾R{JΟĊ–̲9´9ê×U¿5Ù<¹Aü« !›kö®3ú×A–ÑñÁ/z:ýkÕ¥¹ÇSbíÉÿIÕ›°¶Uý)Ðü·öCû–dÔWwY=°‰Sà%ññt˜ôÌ›]0c†ºwþt—|é7_ôÒôÖ¥Óît…#³½G(Ý¥Û¯üô¾Ï׿¤¶R{¢s«r¾R&{ô©¾íù9ÿWcІï ¨qÃL‹üªIÎÛ‹÷ÕGéT„P³2yZ Ïñ;çð¤%ÚÇž<ÍC#ó©ìÃ,ºJ0 fþTÈ †×MrnY¿j:6ì±Með2̉üªi™…õÓpvYš†WcixF>kÀ£ó%Ó¶ucÄJ£JšÌymdRÕŽj¡nOÙôÑ×3³ãó­%ZBßòÎÌ ¡h»[LrDlÄRlh]ÄÚJG %èþtûŒ•Õ›<’©õ¦Ä¹·³¼º/üéÒðÞ`æ]ýE ;ˆ®çÏ;-@S`ùg²–ÅŽ;RÝó&¤Çj*T†5K·aÿ,í1@Ê!uÓÀåÙÿ6XP‘2ë=}êì ¶[!´ÜECþ²ÞÀÿ-ã×­ +ÇÝln‰ä¼¡~½(g_´Ýœ"TéÒ’n,¢£Ügõ¤Rî'2*æ€-¹Û,‡¦È¦Æ0ðîÂMžo°Zí–CýÈ@ª$l#‹@}KSW˜ãÿjbiëò¼#û‘IÏÙGÕ©Ù9IÛ” %ér~‹BŒ¤_íJZ›ÉWÏF—Š`,œ4Çû±M<8ÿf*$ŸV’\?B€(¸nSl Áš©’V,÷rjÝÇúÁ2‘`ƒU¤C<$YÉêh‘§÷'œnzåõ#›·çÚºi¿Õ&<šåoußs\x§î›ÑZ®vÑš~ÜF)˜ú×Ð@;VÆ•†’â2~ý»Ó5ÓšØÑ[œ#³‚¿˜"¶ ýâ*l9ŽíþåÁšÿõ«E>oŸõðÖ:ΈgAºCÖ9ÿ1WƒíÕô‰»1q§äß#ü+aÖà=mïÃ}>lVÚ¿üVºL½®tð>¿)¦#9ÿ„?D›þ}ïÊý9ÍjÊ„ø£Ä¶Ûr“Ù$Ÿ(#ëÍcàÿÂv½íµËœVþ7øå†8ºÓ??—ÿ­Hf ì™ð߆nGü³•þ 
ŸéS^š¯Œaë¾&“ÿVþµFHöü>¶q’WP;ìvâµ@x·WN×Zk·×0†þ•Ü´U·›^Ÿû¹Bsé)ÿ¨ð‘¡ø†‚-ïc~mì¿Ö”>JŸéT-æ+iáY³þªæEÿÈŠ­!•<#z„s¤ ª°þ•nÔ .uU']$?âô¦\)o‰!ÏܾGÇü Çõ§é£~¡8ÄÚK¯äŒ?öZž¥t,éì«yàù³Ô´møJÆ Xcÿ„bAó®ƒÐŒL´”®á©sþ®òEú|È­[¹Mž×#Sþ£U 8ÿhŠÒ;Ëz„Aµ&Ì“n’ôÆÓóéVlÂ{C*ï´’¿øé¢åKkzâ67M¤†8èNÑQØJ¢óÂSh}úcõDì ]ÃÎØV¤È±#¯ãO½Mº_‰£ÿžzŠ?ëP‰ ¡o½e«åÇ 9ÿVõ¾o‹-ÿ¾±Ü)üWÿŠ¥qسv7júâÿÏ]-_ôiçv§áÆ?ǧº~†•˜6¾ˆåëHÚ1ë°ŸéQiŽøNSÝdˆþxþ´œvéž ó è}²i.pšn®ŸÅª²cØ’3úTvÊSL*o•¬ ÈíGô©uçÅIé,n?ï¿þ½O0X›R`n¼M È2CÃM¿ãHíçêé(3ÀøRN7êäŸõº:1÷ùTÿJ[1º÷F9ÿY¦HŸqRØÅÓ\Í>ŠÇk£? TZkÃÍýÉfˆû¿ÖŸ£ð<>Í€£í ŸóõªÚtÑGœ ŒÅ¨;0¥G8ô¢û\f^¬¿ñOÂ1÷/eZçHùÓýá]Nµ É¢6ëyÑMó:»FUH ã“\¹ûËìEy¸ŸŒê¥ð„˶wt&®h¼k–$ùnŸÎ¡»º—ýê·¢Z]O©A%½´Ó¥F) `gÚ²¥ñ#Iì]v蚢Â_©ý¯Þƒý©®ïXƒÓýÃN“EÕÇR…t«ÒÓ\¬‘þᱟñ«si:œ—×Ò"ø¤öb>Iåð¿á^Š’îaì§Ø®œëZp7Zc?ðY‘ŒhV< &¤öZèbÑõU¿ÑæþɾÛmIsÐàŠ©ÿî´ºB[&ì:^ùܦÜ ®u}Å즕ìR¾\/‰ùj‡·ÌjUµxùûÚWþÓ«×Z.«4ºÙM2ç…LYtlóÍ*hÚ²ßÁ9Ó'Ú–_g<¯ÞÛ^™§s;™y1øoÚwøýA*ãBÔ‡¥øþF¶mô f8t¥:l¹´™¤Þ' °#¶œÈÙºOnh—D¿’Öþ¶€ÜÎ%\ݧzóWt#6v~¥û³DʦḅZ³]ø|óŸ³7ò¨|Cgp—ê_ #…@#™\‚|U*9/ …Äoh‚ÉZ6Ynž‡UIés r哃)D1£Úñÿ1?ëSßÝø“¯,•mtk…²Šw¦—>~~Ö9éÒ¤ŸL’eÔ¿Ò×í„n~î? ždnBä5ÿpÏéUì—äðçûÏZ`év.ÿKPÖ¾@h?LýÚ‚…tàu](›2OúÖù³øRæCjÆ“¥ê¿õú¿Ö¯\©–°xâÉF*±&‘Û\ÂÚÞ˜Ó s¹Ž1Û¥I-•´—RwNâ!yÆ1ÏOj†Âå Qþ¢×›&ªÐý™¦æý¿˜­x­,¢šÒC¯Xmˆ È=ÿZbXi©¼_ðÚb ŒªD.rOj.†g]ô]tö7+üÍXœÄÎÿÛOôbKM.H®£o[ââA#bݸ#Óš{Ǥ´ÓÊÚô{¦ˆDض<ÎŽd"•’âëC㥻ŸçU¡Pt»ÜêùŠÖŽ=! ¼«®ôT1®-CëÍÏFŠÎu¹LqÊdR-¿‹ßš\ñ]KP“Õ#6ô£ë‡òð‚¯8Äš¯|YF?JþöŽæ6Ö.JÜ8wŰê?ÐúDšuÍðÕ.šTG. 
\àt§íaØáJu4Š1-F.´aØZ±ý W·Pl4ïS|Çõ­¸ðäO KPÌ)塯Ý5Üxj4‰æ¦V2&åG<{‘fŠW?ñá«¶3›¥©$“ͺÔ\.1h¢­K?†M¼€Ï©²NûØ£-ùSç†K¸Î§™#|ê2?*N¢îZƒÚÆVçšU ÷`ç€ÔRÉ,»‰ó]®‰£¼ÏwÛBÊ3·Ó¥s7‚ÊÞâéNc^b-Ã/ûÿùúQJ¬*ÅÅ;ØŒV¶j«V¹?ú»£ØÉYgô­­Aí?³ÐE‹3¼ža*ßéXÄq\X…iXè£%8ó ¥£ñtHù f‚ºM?TÒ¬­cK9æÇ2yŒ¤þF¢’\×eÔnÄvãY`ËRk ¶éÖC~ðŸÖ³Çˆ4Ú?áO—‘ûæãõ©×Å2ª¨Ð†îÎn¯Zô)Ô„wg$ã'нp3mªï\*ÿ*ž–ïPoîZTí|A¤^Ü-¸ÐÇï›-ûæëëÖµ/&µŠwòü=ç#(Rþyöë]‘NPç[îIK‘îT°³F>í«5C·u–”ŸÞ¸-üêY5K;H¼×ðæÄŒmÈœð=:ÕOøJ4]¨¿Ø#r¿¾nçXʬc£5P“è\u?dœwÞŸÄS®¹þÖ?ìªÊ©ÂS¢AІ Ý9¹>½kVÃQÒõ+K‰ÓDç¡_5ŽÿÖ´¤ý«´53©ûµyìÙyÿ\m?6Ù:bó•Fsǵ<ê0+å¼0Û±·>qééÖªMâ=2ÒUŽ_ùoƒ+p?:™TQܵKaÉk[`§™n‹tô4éãó-ïzþöpŸN•Xx«F¡( r?zÜΉ´y\Gý†Ÿ¼pOï_^´£V3vLr„’»E«¯•5D…P~UM2—6Ø#ä¶5¯{ui+ÃÞzºä9ýyªªYD¦Gð¸UA‚KžçUU8JÒ&œ¹ÕâEj™M-H?Äçò¤T kÏvOëQèé·nÉ÷~sÅð”èý7nû篭b«C¹§$»—¼$ä4ê¥}G=èköÚr¨«øU‹ GJÔ`žTТÜq¸Çó¨›Uƒ-¿Âã,0Ù=kyEÆ*ofe©IÅt²¤ìŸÝÛd{T9Ä–!NÐ#-Í6iÖòí›ÃèŽ0[µB|U¤áâCËÓæéX:Ñîl¡.ŶÃ-‚îXþ´–ê2Ú¸5&Òæš(¿°á7=3Z7—öÖÒ‘‡’T?6áë[B<ðs[J\³P{ÉÊMÓæ˜ %a›£ôZ‚]z‘žO \剪ÇÅÚyÝzóÖ³u ·e¨Iìh¹å9,@R.‰ÈùaÍgÿÂaa×ûF:Ö•¾¯gq¦ýªuà ­í›PèEGìÒr#Œ¯ú8ÈàM\m‡‘˓֣þÝ €<4€ŽƒŠ­'Ší#m¡@¬§¡ì*ÍÔŒw-A½‹e•—¨ù¤õ¦»$äràuªGŶ}´K~¹íþ‡Å¶œÿÄšßž½9ý*>±O¹^Î]‡Ý:‰g;±…«3«1í„õ©_Å6›wc[ÜþñU¶1ýmÓøTºÐ}F©Ë±Zåªáº-r“Ò1=Ítz–¿Õ±Œi°Æ{2ñҹϼàzšãÄÔR·)ÑN-n>QµT_…Y»YGµ@1ŠåF¥ZÑÓË¿µIó¬ÚµnÅ^6þëWIÚhSØÖDÛ³8PX~K#‘k¤N*Y&úÔÁs­jqÏH\ÅsUîðý³±Ü‘ù¨?Ò½ÎbéLj:äø£‘€üCUiOüJ4ÙsÊK"ÿ#W€Ï‰eóñoÿ¡GYëóøhñÝÕOøS¸%s®êñö’{ü¡«[Q—ý·€òŸ.~„…f!ÝâxéâÙÇŠ·xÛü ¤ËÔÃrÉü릓÷L§º/ÉþÛñ…®ï¿’qß5J%ð}×ý³'ñžÈ²xîö22·šq8õÌ_ýj¡æÅ# \t0^•'ñÍk6ïûé-¦ÚÞ›ýHŒ}$aýjå£+ç> ‡Ò GŸ^Sÿ­[ñˆ¢Ïü|ií(Ǻ+ÖV<1¬[•Á†þ6ÁíË­k£™<[$¤i sïÒ¤¢œ-¯….ó÷'xÏá"ŸëI5ºÇaâ»`?ÔÝ#¯¶%#ù®‡þ)=.bÇ÷ƒ®=2ª¥hßF±ê^/„d†Ìÿ¶­ýh`>Ü |O§n9:XSúäËý+)$Ç…´·ÉÌ:ƒŽžª§úV¥ƒcZð¬ÙÆøyú;/õ¬­§þIÔËD~ªÃúRBî0.|_¥ö”ëQhDlhÄ瑞=ÜZ¹x¥µ¿(óôíøþTj£¡œjôÿHhÿñáÿÅTý¡ô Œ…ðÅ„™æE†1ꪥjÞn âÛ~JîY€ôýàçò5BH‚xfú"Ã0jKßî°þ•«s*˪ëñÄe»´]žZ—,øC|þTù’B³cížI5Í3qȺÒ67û_#T-w&“áÉŽ@K÷é¹øÖÅŽ“¬5ΉršMá[Ks¾byc«c–Çb*Ð.­ôÛkKÝCK´ò.MÆdº zŒ.})s¡Y™÷‘mÓ|I9ŠýgýçzåCjúÀýn’¯øìCO¸Móuâ(½I*YÚ“È$€ zOí?´²8µÔ®äye¥œF.tàT:±CQrÙ¼«.±áùÏȯdc% eªÚMÂGe£ÅÞÞúBÊ ±U!yÀíÖøI­¢ ,ô : ƒ Ò«JGýôjH|[­Ì“,W+„8[x•1ÈôÄ%¨(]Ù–¡Ò59­ïcƒM»p÷é)ÑøV²Õì-®í ®Ÿ,„sÎ?µ¦‚#•›5%M. 
÷\k±—¢ÑÖÚÙŸ+Œub9¥·þÌ1Û½¥†µ~ SO‘É#å÷ýjô 5K›™ ÅÌ.ãÐ5é>ðøðæ–,„æ|¹råq×Ë[8Fè¨:r›ŠgœZéZ´»ÃÁ°D«÷Zè³mÏûÇúV¨Ñ¼Ox¥¼¶š|°®gò(ç¦09¯OÁéYÑ@Ë«ÞÈO‰l ó§¯&¬vRT£\ñßx?QÓôA}{«ËxÂEp3ß“^nÃú×Ð#S໓‘ÉûèWÏÒµ­)NZÌŠu!/„šùvÞIõÍt¾ÒäÔ.®fŠòH ª v¯ü´ö5ª([öÇBªyú ìþk½I Æë|~tçÍÉîîo‘§5)lo‹«Ž?}'ýôiës9Ïï¤ëýãN6鿯€ŽFãŸJ¹g=±n’ÀWšðØ­õ>½f8%m ~|ÿóÖOûèÕ-f+ÛÝ.Xm¼ç”‘€„ç­v¯k§G‚{Õ«(m#•š[¥FŽ#^癌Ï0žÆ\‹SÃæð÷ˆ¡fmvv¾ÌÄ}~žôÑáÏ?üº^rÛrC~J÷ÇŒlR Ä`LW¸½¥„xÙ¶xðψoúßÌOPÜbx__#?a»åIåOåõ¯ü(êÒýà}ng„Â-¯L¬˜ §¿o­DÞ×Àcö “´§œ×¾¯CõÍ#1I{B¥Œ›ì|ûsá­bÔÊfµ–4‰w–nÞµ Z¯3ªGcpÌÝ^µîº½¤Wp˜æŒ:2m Õk[O.HWk óÛ·Tj8óòÌjF\¶<‰¼!®ÁK%„¡7±ÇAßòªÙó žXÙm¦r‰/bGQ_Cýå*Éx ÷¯3Ò´woâ Éþ² ~ÑjOcСï£ZÐräq—õqºòrSêsð‚xƒ~Ñ`åwc9=~”?¤4¹ã— ol4«›ëÛ” åb„n-øÖß…¼"FŠ‘Ýä½Óeã=kѧ1˜¤Fݼæ¹Û©®ÊZK§ãbL¡”ÿçÓžua¡®4–£“g¨|+Õ%¾•í^Ö( ù±à~USþ>´zÜÚÄÿ…{8‘HŒÓל0¢1šV9êbç&å͹äKðŸWû;F×–½r:ñRè å¶Öd¡I#ˆ ›z9õ¯Y3.XV½}si|ÒKÃyePd6}j•MY“õú°jIÞÆ»¥5–#F8q^4×r}±šg, ÚwuÅ}«2ÜX• ’ã_>ø†Ý-µyÑFâHô®\+•®(úIâV;œÞ¤$¬2›y?ÔÉ÷O¥Q¹… ¡éØúÕ¨ÿÒ­LGýd|¯¸¤‰–î³Êq"ýÆ5êÎ*vKåþ_äyTäé¶ßMÿÏüʧ”ãhö¨Ú6Ê· :Š|ÃçÀW"OTvÝ=DÝNܨði@ªI…ÑÐx^7R2‘©<úÕ›Ý?T–îIÑ„c•aÒ¡ÑåŽËGº™D®0£<ÖšBsæ7ç^ÝiÆ–0îy´âçˆ”Í È/-|ùC+ÿuóTCS ±’~¦“<×'vzуÏZ묭®O†‚Úñ4‡ îÛ\r‚ÌÖº=bíbÓ­-à—>m§§ëe¶„'7ØàƧ9B+¹XkE°Òóï0¬›“7œË;ëÁÉÍGö‰Yó¦–,rI'Ô×›Rj[°VÜQW4´ó5(ý¡T³šÔÐBMØPNI­0Q½xÜÏÿtìlê¶ú¬·…íXˆqŽ$ô¬‹˜5H!i&s³¡ÃƒQjwr¶¡9I[nî0xªm4Œ0Ò1Á5Ó˜UR­#<$i">ôž´ìf€3^b½Î¶ÑÒèqÊtk#ýa=D«òŽjkö vä #Qå­8«„´39©¢?)úÔL0Õ$Ä(ƒ÷„ö:8ˆo[±é4*?5ÅRˆg@º^ñN‡ùб¿mæ?Oø1Õl:Ôø~`>Ž+Ñ9ËHÛu½"_ùé `þ«TâR4}Nùå*7äÄZ’GÛ‹?÷w)?FÏõ©L{nõë~™Gaø04ÄH’mÔô9ó÷¢Eü˜ŠÑt-à+È»Û_ôöÎ+Fņ‘?÷×ò`­tˆ›´/[à˜Jãšè¥³3ŸBÌ$ÂcáéÁ#í6Q©>ø+T ‘àKˆûÚê?—8©<¯àûÂÇŒÆ}¶Éÿשd„®‹â»B0aºß‚süUµôdÖToØÊNMõ‚…eΈ#òâ¹õØ| 2rZÛQBàžA~Ur)vê^¸',FOÒB?­VXvèÞ&·í Â8ãÒB?­KÒ±cS·kËoe{d¸H×¶S$ÈÕ).§k+h#*¢Úc,D OQýkR;›xçZ,lZèn|=§Ç}y,ú•õÜÚˆ+2YÛ¬jGbqÐTéöIV¾¹º0.ØÞòôíQœðÈñÔÚG ²ìKû,ä@Aá ã.¡áÔ¨Ïb¬?!Zî~ѯjÏn’Ü%Õ†Å0D_.c_—ê+¡µµÕI·Ó´M;û»-Dù¶jɲ׮!¹Tþ庈Çé\óÍhG©¼2|LºXÀ´Ðu¿'A“û.xÚÁÙ¥iöÄ.`±ô×ÐŒvz…­æ­¤Z%ÕÀŸ›ƒ+& aÏZß³l›™..YñÌÒ–­|?amǤ[ÝÍrÏ:‡ÙG\2)ý¹¼1hÓjbFÖîgšX¡–XR›BXúw«§xgKÕìtèìo®ÙŽðó\ì žà(•ÔGaXªL.*«h0n+ébv”ž1þMsÿjU©/u°Ê0ôãï;³2ËËI-îí´]*I®hŒ’¹ù‰b}+z]bò+y<—ª©!a@½…>ïKU×´+x<«/6BUzci<ýrk£µÝó°í\õ«b&Ó¾†t«àhsE¥sÇ´K]sZ· ·w»e˜‘¸©ãžüŠM?áψuc™`†äV–P8ú ×°øcEM 
ÂêÝYLou$¨Ùí[¬qƪ¸b½89?xùœV.S÷c¢˜êzÄË›(%Ô¯æ*†‘w¼iªF±ÀÀÀÎMuÆ“”9*x©F|·8 x*Õµ‰¾Ýj³CLdƒµ€SùW¤Úèú}¢•·²·„c$`V&‘vÑÝ32±, !Fy'5´ngpvÀßV8¢µdEÜé¹*€A\î½§Û]kú=̹ßlå—ÔŽ¿•kvÝv/ãšÍ½F¥ªÉ&ì÷æK›Vê5ÄÝ2®Iâ¡“Q†˜dôªÉm Îæfúµ+%´l§j¼š‡F ÙêW´•®´õpI¬ßAUEÍÃÜÊËn(¹ŠšKûh‰Ô}+:}fîL»ÁVM¿•k *–Ð#ëq‚jRÜÁñº^IáKÓ6Ѐ)À<ýá^&NêöŸkñ\xvúy?¨¯s’jqtgNÎJÇVU5%+;—µbíÖ?øè®—áÜsMªÝG ÁÎ{ó\¥ÔžcDÙÏîÔ~B·ü¨?YyÎb#N ›T¢vfNÔ='þ‹¶}ßmNFÜí©#ð½ÚˆÀ¾Qå±#åªãÅòS—Ådü•˜Xë+]—dðíôA¿^~åY²Ñolæ/öÐÀŽ›k/þ™¸(ÿ„¦oîŠÉá1Yñpjͳ£6·¥OúRø *AxWþ>Wþù®i¼Q9SòŠhñ-ÊðTýJ½ºõš^gR-¯?çåïš>ÍwŸøù_ûæ¹_øI¯=ð“^gµQ¯ä[¥æu‹mt iÿ³GÙnˆÁº÷ÍrCÄ×™=)O‰o=¨ú… úÝ/3¦–Ú`{çJ¨¨Ì8—a·jÇ·×®'œE&0Z»<í:U*‡»"%R/ÞHØ[ ‚3öçÁöÆëVÒh¾<Ó/MÃï—È’Lsž?š~UløŠí2¼qÅ`ø«PŸSÒÁo¿‰‡QØÿ?Ò—Ôë+ÉìkKIÉG¹¡¯ÚKá_ZëqLÿd¼>UÓ(èOSù ýT×l,Œˆ®·ÎÊÂÁµÁÜêòkÚ‘p,уœ}Öõü Vð÷‰oa³þÍ•³%¯È3ýÞß—OÊ“ÂUvWÜ¿¬AÅ»lz8ÓXcý2\~0“Íäߘ®Dø‚ôŸ½À¥ ¼îÔþ¡_¹×(ö:Á¥ÿÓÜܵR-€@GÚ$9õ5Ç~ïzšuë²~õÙõŸPúå.ˆì¾Â¹ÿ]'çGØõÒßUÆÿnÞ~®]÷z?³ëwשÿ)Ù}>tŸ7û><çÏ“þú®?ûrëþzgöÕßüô<ÓþÏ­Ü—§ü§iöÈÿ_'ýõXZì³iÒÆ ¶°äšÉþÚºÇß5NâêK—ß+5µ á;ÍÝVÅFq´U™%Æ­tÐ8’W)Ž@­½SŠM?´6רí\è‘Jž„sM!Õ#m ùWeL,f’Z Ž!F =ÏEµKy­ÒMÍÈîÕ#AlÞÿÇ«„]Ná(€)?µ.IϘkå³nüÆ«­nS»6ö£©ýj­í¤–’&ž£<à×uK“ÿ-ó¦Å¬N—*˜Í%–O~`xµ%ðšZƽ>™¤8Š &¹#ní¼ ñP]–šê2¯!Ýõ¯Tñ‰mŸM’ÖBˆÒ®#5å×°F",/¼æŠó±xeIÞÚžöO)8;ìgC)†UqÛ¨ö©¯Sc­ÄG ü‚;ª{ÕËFDö²*} qS|צúíêzÕW+UWMý?à _B`LG¨®ý¶©3B.(|‚¤ýq^oû{¡ÙÁ»ko£Je—P7KèÐúŒWUs]Û^¦#Q¿ºöÿ/ò;x¡ðLN"o³*IÉÎ.!ð]Ä+o+Û„•’?”×!åx?©¸Ô1þ¢/ÿÏÆ¥ùð­}§’gêvÐÏà™Y…‰bŒ ¼ðÇÜÕÏì ‚àÛÚ Y¾läóï/Á~Ѩþcü*]Þ#ïSÇLnÿëSU;ØNŸk…¼þ w[=Ä•&›‚|¼†¶ßæïb¥qâ?Ï}Góá@ÁóßQüÇøRç]{7æuÒi~ûJÜC$[Ãä¡í?†jí̽&™m¡*6¶®Êð_üõÔï¯þµ8'ƒ17QǦïþµ5Q-4Mù•Æ…á+=!¼Èà`ãr¸çð9ªÖÑø) á .àì~aø×2áe ÓjEG@_åLòü?‹Pÿ¾ÿúÔ¹ãÑ!¨>·;‘£ø2舭ÖÛ,¸8|ciþ°¹}¾@—hM¥ø®)ƒ£`ÈÚˆ>¢B?¥#·ƒ™‹7ö‹Ô™øP¦—D.Gæv·ZO„¶Êû-÷²>sçPŸ øf]5fˆÅµ>ndþf¸ó/ƒŽAöó =n|"¨cS¨„=WÍ8¡Ô‹z¤>F–—;ë *ÄvѼ˜!–£X<'<¢7ŠÛå$pJçì#ð]ìë{<’CŠêm< á›ÈŒK;®8"lŒÐ¥…!5Ë»dñxúyD1ùh;??ãPGáï B!_&ّƳdçÞ©ßøs@Ñô׺•îV>˜I۟¹ðþh<Ãk©Týíï´ÑéºBоÌêƒü6X£Ç ;²²×¥5tÀIglt%²ZäÛQð¸s„¿lŒgÎnŸB.|#œýŽèýdoñ¬å(.ˆµ ulì£Ò<gßuÎ0Ü~"˜4 ß3–ò÷1ùO™ŒW$.¼'ÚÆçþûoñ§.¡ádû¶wCÓ7øÐªÅvæu«áÏZòò89·sùÒ®‡ái­€ÙºžnµÈɪxjQ‰,îŸýéXÿZhÔ|0ŸqúèßãC©O²Iyké>fòßÈ Ô8 ÓŽ™ád‹È„ÄU?5qçPð¿}6cÿ?ãIý£á€r4É?ï£þ4*±OKÙ·½Î¸xKÃÒÌ&„F@ûÀ¿•VMÖ÷2£,E‰Æ 
µs©­hçe„êO\;FÚ¯‡‹1Ëz–?ãG´§¾‚䟙Ԧ‡á¨×摞w•FMÃÈòñ>UÝXO«xxõÒØýXÿC.¥ ²šÈ݈=?ZJ~CP—™‹¨,i{"Å€àfª*IJ´¬T¤Ôg§5æÍÞLìZ"¤§÷•6œBÜ3J‚O¾Nh¶³k>sºVlõ5j5Äkôª}[š¼¿tV”Ñ2fLã¸÷¢¸õ’1w,zžµ=„âî(w„ÞÁw7Ašši¶’)»#FBN›§KŸ¹#¯ò5q“þ'·ñòVPI àÖ¤ÞºŠ³›RÒâŠ)‹,òݸc(É©†›£G¨}¢çÄð4Ì‚=–v¯&xÛÕ±^š\®ìãkÚ'õ0¯a0éÀùW ö üªÙP|ExžÙŽ~±æ®Çmf/‚jË=”r1QŠ1€}«´¸¶ðÆ›`×vÚ ´Ï´l’æF”œôêk,Mztæz=Žœ¿ [œb½èîyp+&…jwJ—„œ¿ ì4ËK룬$eôéylª†;v ¾Ð:œwï]¬±Óôå•mìm@XŦ׋¥êš–¥›«Û¶ûDÍ"…íU^ƒé\k7¥Üu=eÃõ¥e&‘˜¾ñ èúTØGfös3.®Q F 稭+ÿµšj÷š–·¦Úè°$Jó²ûƒ5ÐM§G}hdÜˉ#ù¸-ïYšž–uX­¸Âº“ŸcÎ=ÿƹ¥žKK-Θdµ¼¯b<#¡Ç ŸÚ5MNåm—|&8Ò 9ëÉÍiE¢øj’EÑÞâIŽéêåŸyÎr@À<Ô“e†I Œ€:TñÛK"© ÁèkŠy–*§ÂwÃ)ÁÓä‰!¸¶´èZ^ŸlCºçó<ÓäÕ¯åk©ÿuNéM:lû áäoZ¿g¤¬ñ«–„w¬_×jëv\ê`0ÊîÇ-©è)¬³Íæ²Þ6ÈNzv5ƒöUžål5%6:¢cɹ }3þ5ê¿ÙvöYg àç#¸¬ßè¶µ]š>C¯ÞP{ð¯S WÁRöèú¯ó]Ó>k4Æ`dý¥]îº?òk£Zœ®­u¤Ü­Ž½LñÈ+ñüë°‚Ë톌«+€ÊÊr®=u)4ø×Dñ$"æÉ¿ÔÜ‘Ûë×Ì})Ö÷‡„g3iÒý»JÎ^Ý›%Aç ÿQøŠÞ¦Jª¾mŸ[lÿËÐÆ—º1µù—KüKɾ¾»˜Ó\Œ·qSÁ§,ŠàõQš§¥ø–×ZÞl„’\´{~d>ãúÕ›x51¶Û6Ìœ8È=ϵ(d”㬙58¦M5êK½¾y gŽ{Yš uYÝVÃûÞìFoõ“Ãy8ÉüjÌ~HžòGÜ0BŒ]qÀàéùœ53ìeGî«mv¹™ÔnSúÿ,ÓŸTŠé-dR+íaß¶aðö›æ6€]É«qYéöÃ[˜ôQ[¥†‡Áš¶a‹ª­)XÃRíÝå6Q•IìGõ«‡í×R8µ‰Æ=+Aï­¢#.£éUäÖíP}ðhärwŒ6SIûóc¾‚ËdþXr;òsRCe2ErÝ»m\p1XÚ—ˆ’Ki>½ª«xšfUÀçÖÑÂVjéXš¸ê7ô5õ‹xâ´ÎçvÏ𛦛u– ¸F¸ÏÒ°UžõÂ9ù}*´×r+SŒ±_Ëã]1ÂK—–OS†X•)óAhoéWq[ÌæCŒ®?Z¿.»jƒïæ¸Û™ðªÄóϵU,OsZýF|Ò3XÊ´£d޾_ļ"泦ÕäÞ~1åŽ+iQ9wUúš­q«¥¼l‘°d#÷Œ;EiõZT•ÐéËŠ—)¹ý·uÈÜGãPK©\IŒÈxéÍbA«Cvìª_iä±âk(ø 5ÈËXÓ8.NqïZÞ„{ê8¹É£¦’âW$³“øÕg•‹ÏÍs:–½*ÊÎèºmåŒ{yöªw:ÚËk°Æ|â¸iZCœû‚¡ãhCcªŽMUëQ›:äè4«€]yB1šà3Åh>¡"ÚM 6D£ ÆI¬óÀÆ9¯0Æ*í$¶=ü a¢ÕÇ“Âýiè2¤Z.Ø d¢•œæ¬Ù›‘>mƒÁû½q\¸LG°¨¦ub(ª°p}NÙ5 V8¡#œNþÑ´BCNŠ}ø®N]N<È ‘[ûÀŠtÖz•̆IavcÔ–ím¯åj1èI52Ìm½ ®r—]S:èu(®t+$ƒ¾J‰5»G›É]þfq·ë–H%2²%Ç•íÎéNþË—vEÒî-·;NséW<ÙÅÙ¢)ä´gcª¹Õ¡³U3£¦îiÑê)4BHãvB2\|¶€²ê Jñ‚§Šh†01ý öSSý°ûaQîu°kV÷âW.; .u˜­VxÝwtÁ¹"9üŸD¤h­Û¯XÿÀik˰aÐ; udŠ+Fvá…2ÛZ†ésæ¹#¾0o$#ýÚo—j£‹©?¡æòìWö%¬m~¹òlã9§Ï¬%¼^c #=Á5Ç”´<ý¢R~‚‹\cΔýhþØ—`þÄ v0k pÐ(ûÏŠ‚?G%ÃCåí`q–n+“ÿG—ó¦æßþš~t¿¶'ѲZYsâ¶#F<å"&¶‰š<²1¡¹®@µ¿uÿÀ© ÁÓc~t¿¶'Øk&¡ØêíüAÈ8 ßjŠO$w"ŠG÷Áâ¹}ðcˆÎ~´ÒñÏ?ÌÔ<æ¯BÖO‡] mvXîÂJ²ÆYxÂ÷¬§-üó4²-kÏÄbeˆŸ3= 4#F±Ø¨ëɦ©(êÊpAÈ­=FÂK'D•P]À¡ÈÅgÁ°© S•¥£4„£8Ýl^™VuŽá}@ojEŸÊ»¡ê)-% ûî¿ëK4’G!BªqÐãµm)èªÇçëÿç„´e·OOøwVâ3½9ºUl 
¿ÎO—&6žžÕáâ©ú:ŠŠ–kÚGoȺRiû9ïù¢—ãI“š°]³÷GåI½½åXó›ØƒïéHYýåB˜Xˆ1÷¥ô§'P?JaioÒŽ`°ñ6)ÂaPï“Óô¦——û¿¥>`±hJ§½Ó®j¡wôý)cYä8Dcôs‰ÆiCZÚî5Ü𰸨ǘGÝ?•AbÊÊ£ø«GO×/tÖÖîHÆs´?*£cn'ºHä!êH®”xNÃf|OJJ£O@å¾ä×þ&·×ôÏ/Q•¢¹NT§Ý¨­7\³¿²µÑ­ü±&dßÔã·Ö¹{­ŠVŠ@Ä ¯=k,ѾU\7¨º#‹•îÌ¥Al޳SÐ`¹•§°¸„&HØN9®jæ3i1Ž\‡ Ô ÜàãÍü3Mhç“–I ÷•J°ž©8èØï5½Gœ˜ëQùÿÏ'ÿ¾M'ÙåÿžOÿ|ÖÅ’ùÈxÍ'œžµ·—þy?ýòh0JLOÿ|ÒR‰|äõ£ÎOZˆ[Êsˆ›ò¥³ž¿ýóG0XžŸÞ£ÏOZgÙ'ÿž/ÿ|Ðlç?òÁÿ*9‚üôõ£Ï_ZoØ®?çƒÿß4¢Æç?êò£œ,rzÓ|Õçï°ÜÿÏÇÒ¥·²q:ùðHbÏ8ÕBÓ•®&ì®g6I4È܆ÇéZW2‰˜E˜û;S­ôƘl8ÿÚ¥8¨»\"î®Q¸ ~µ;hWcîìo£S?±ï‡#ø0¥Xm51¿›ìãnùxíUâb²)kø†%FëÆzñX«Ã šoDÆÑbb|æÉï[þ±û]Гj¹F^ éÏZÀœ~û> µ¦ßÜi·I<%”Ž~¢ª·4¢Òz)•U9­躭´vú­¼²ô{•òeÿä †®#¾‡F¸`bY|ÄcüK×çÞ®¥Ý¯‰¼11…€»ˆy†#×#®? Õµ´_xjÞâÔãRµêHëùà­:8J•¨F==-ð¿Ñù‰ÇQÂâå^‹þòóOIÅüÒ’ìËzŽ–º­£ZÈΊH9NµÒh>´ƒ@+>øA%ŸÓ­/…µ Oi‹!„%ô»¹Œ.>o\{ÿˆ®…í%X¾Ïس|›»Œõ¯9eõ)EÆG\ø†ž"qTÕ¼ÎYâ0¢©?xnàÒÛÍ“ ™£Ü€íéÆjíÆ•p“¶g‰®®¥ds¾mÛHd‘Ð)ÁõëSÙZꊲ*ÚìF;”»jÜmBÚ1̪*´šõ¢gœViÉé«uj• ­¤j*Vkˆ‘O¦Iè´¶E2]Èí÷ÔRøž >E$ÕQâ¹r»@Uÿ!šÑRÄ[k#™ÖÃ-/sRÿÚV¥k$poGî[•=†¸x$¼øw¨y±‹ífÄwx½¿úÝQÜV›k÷r`îÁ#=jÝä—°<3âHÜa•¹¶§ƒªþ'¡œñô ì¢O«x~Âðǯx^ý,¯ˆÞ­Ärúäv>½½EO£|BVs§ëÐ}PN7tI=ýè{zWéyáÉZkFytö9x³÷=ÿúÿhÈ–> ² Nñü,8d?çµRÁ).Y=PåŽq÷’÷_Tzž&„”f©Éâw<*×›­Åþ€Â;×6ÂÈ:¯ùô­˜µ;)cYæ2¤qósùV´ð”v¶¦q—ð»§Øé$×îH8ɪRjw2L†°nµË+dF‘ßiÎӰᾕëa¬MäQ/’9’@¤ã°Mn©Ò‡b ]^ŒÚyäsó9¨÷“Ôšä¡ñcOq²P–ñ`åö—#ðªˆ.d¸+ky)‡–P¤žü ™bhÁno Ÿ?‹C¸•ÂFÄœqÔÕdÔ-Ö/:–Ç ŸÈW.º^×Ê6ñ™ mi‰cî; ϶Õ'²w{yÊ;¤¯\V2Ì©'dwÒÈýÛMÕ׈­­XÄŽë/#ª÷Ú¼úr¤’l‘› /š äç$«ˆ–êiä.åÝñ5FdsÉ*>§5ÉS2w¼QéaòÚT¡Ékm¶½ ‰åžgŠl¶Õ7g>ç¥fǯÝý =Ä,c9vÐGáXOöóôÝãП©¬cQìΧ„§-Ñ­¨jkvêV(áUByúÓdÖÆw0…Û° V^ï@9#šS„Gcèkšx¹ËvkŽÈ›íûÔfgìÃð«Qèš„¼ý™z¹ ?Z—û"ÿ›ûxý”î?¥`ë>æŠnòÜcøÒ“Žõ­3Y.®û*S…í¬\[iOf”–¬ý£+”£dËtãÍò°Iò×$žÂŸ“}1%-dÛž :¹ý£«J6©ô§.“®ß‘Ÿ=óêM9V\ª ¡ï\ ÚuÖï-ai>÷–7b•tíANVÒp}”ÖÒøc]´Bc2©n¢3N]Ä=ÍßçXûDº•ÊcÿgêŸóïsùwön¨åÚãò5´4?ùùüZž4¢ãþú£Ú®ãå0¿²õNöÓþT¿ÙZ¡ÿ—y«xxo^<‘7ý÷J<1­Ÿá—þû£Ú å0²53ÿ.Òþ&—ûS?òîÿ˜­ñá]hÿ Ÿ÷Ý8xOYîþû£Ú å9ÿìMO÷ Ÿ÷…#hÚ’õ‡ÿÑk¨oÅé?áÕ°I¿IUSžG¾'0>®*ÚxsQ`HÿV´SÃ!Ÿ#(Ý­´›HSiVcÜ’y­cvCHÀÿ„oPî!ÿ¿ÂáËî»­Çý¶ÑÿfZg>PüéFgÿSÊ]¬Ï[qÀÉ݈ßû².*ౘŒ¤ÈãÕk´¼ðö›zk>¸æ±.<Ös@{ÙJ§q8˜Om*pÍ¢1?w•iM¥x†È`¯v<Ï–ôDJßiÓ@¼£Šµ4M†߳ʣ–}áùU¨¤³¹[Ý®ãü/Á¦OÂûd?έ$Äg˜æSÆÓøUÄÓµ'@R,©éŠDŠY›lHXûWaáû[¸ 
+qÊöÏjÊv[2£©É+TÿžTdê‡þY×£ì_AI´zV|Ì»sý‘ª“þª“ûV?Á^ŒTÔÒíEØXóŸìMTÿaj§°¯D*)6ŠwacÎÆ…«w¦>¸—”|ÞÕÞ•…A"‹°±äÓÚÜÃ!xÅCþ’8óó¯X’Æ 9(§ðªçHµ'>J~Tù‚ÇŽê×1\Â…q¥dŠy棭b¬fÝËRrb>«]RIáVhÀŒ•ü+žõ1Ÿ¨«:}ëZL­÷“?2úŠ×“šHNXIXö?x{O}yõ4t’QsÁcê+S†-Æ:¶‡ Q ‡Ï¶'°ëøüv¸6ý`D¼Ñ™‘ ÌîåOµ]ÖîçûE–±“æÄB±ïƒÛùƽøà$©s_CäªâdªrËvtþ!'ÂÞ&ÄšnÚs²ú=sßñëõú×Z¾'±’æ†Pñ¸ ¬=ë‚‘þÙYø¤^sÜÄÓ¥“K¿m2w&'; sïÛü÷úÖ/‚kŸSŽ9…IEòh×äzmï‰bšˆC u®[QÖ%¼ˆ£ãÌŒúŽÆ¡¬û‰TI; dž„×e,-*_ %j¸š¶ž¨ŠÏW»±½2,„pÀwо£q6ÈÜŠáÍõ¨¸ß$Ê·é[³k1Áfn’Þg·TÝæÂãñ¬0òŒ¹¥.ç½ÄXDãMaã­µ±®f‘º¹¤ÜOS\­¿‹£º¹ò‚G`Ÿ2Và~UCTñEÔWUÜrÇ´"Ç´gÐf¶•z1ê|å<£7ª±Üd Oµ¤‡šˆX`å‡JàG‰­PO×;‹JU:ú Ç‚þk{¥¸W êr7ò?*ç©£7;©d3½å#ÐN¹a!åœe‰<y§&°—¦âÞ"ñ ‚ÎÁZóûýfëQØ.§óU@ úP/Ú$\,r•8â¹åš%²;ÖGI»ÈëSÅÂiÒ?%"‰Žœ“ô¬ë‹èlî÷iwó¯ïMŠzž•‰äL>ÿ—ûÍHV5û÷ ÿ¸µË<ÊRÔë†WF ÑFòë͸K›µK»!åíǦeÁ|Ð\,ÊÁ$S•aØÔ 4/ˆ¾wã-ZÚ~Ž/.L*b‰—;‘—çü3Séúçu=®°zö6‹t´©·óìW$óH‹ô9¦—¡wcíÅ]‡Ã×òƲ0Š8Ød3¸çò©F‹i>Ѩ}"Œµrº¯©Ô¢eyŠþg4¾kvÂýmÅi¥¡ù,¯nHþ÷Ê*ô^zöMÚFuÜZ\ñÍGosp@Š)d?ì©5z?jNhKë+­ó±8Ú÷žRÿv%Å"èjç3Ï,‡ý¦401± ˆÿ¤êvê»\ÓÒÏICò‹«“ì6Šè"Ò­"äF ÷©Y"p¨£ð¥ÊÂèçÅŵ¿i‘)õ?­<_êqòÁí]~‹ F´O6ëÐâ5ÕÁ¤YÀ—n‹ôZÂm§cD®yJé:¥éÉIß>¹­ ßË詟ïר¬½¤Ú¢ì«·€ƒ4ÿ‚ŠØ¶ðf› F\ïéöâ—ƒC2ÎܼkôZ¸–ȸ@ú ².(°ÈÄKýÚ<•ô0—X<•ô¾Pô©¶ÑŽh°ùKž”yC=*nÜRqÖ€ËqIåJ”â’‹–¾•Zö ž 9ä»F(@s®%‰­Ø{š„Åk)è®§‚0Fj ,í¦ûñ)> VʳFnš9¦Ò‘¹Fæ±u)áÓ&ò]Œ“qFMvͤª¶è%d=³ÍGm ÙÛÎÓ´bIØå¤~MS®í¡*ž§m¦ëZ¹#ân¸úWC¦ø2ÆÐ‰.7\M×tœÊºpF§` ÅÉËsEˆc#P¨@ìr::+Çz–œxŠí|ø‡©ëÇæß•v€Wã5þÍÕ4qÄ2ùR‘ýÓÎ?-ßm†WnŸuøîŒ1>êU?•þ3¯+•ñ®(‚ nÌubÁ‰Ó?Óù]h €G ŒƒëHè’#FêXÊzk:Stæ¤kV ¤J:eì:®›ì?rUÎ?º{ÀÕ½¾ÕÆø}›Ãž&¹Ð''ì×Í´f?§â>«ï]­UjjÓgª&G8ûÛ­¨Í™£gð(¬ll3miô”ìvÐE:ŒPzRšv)1JÀ4{Òÿ:\QE€m…:šHN(° IQIyo/*ƨO¯ZG÷Io¥5ØWF™¤5ËÝøÂ8¨ §Þ°¯•^É-عû†£©E¤ø‘uk/õS·úŸ_ëõõjëÆjØêµp·îà} H¬­ÕãJXU˜ÈÒ•R9Öâ§ ÛUùŠNKt—çÿèµo¥ý«Å).¤äg±¬¸u&òFÜ:žî+36‘&v—üiÆñcØRµINüèÔœ‚É罚RNì)¨|Ö;C±e©"Á%väšGFFÁ£p%Èiv„ƒëSÇk:ÆØp öª±Lñä¯4s+õcE€’Ka%¤ª!) 
´ò)[æ‹nõ¤Â£dr=èQ´AÝìhó 1qèjÀ7PÎ\òsH JD›Õ¶0éŠèôÞéî©pLÐŽ=Erc' 5<6sÜ6#Øû ‰r½ÊWG¯éþ/ÓoÔmfSÜc¥lÃ{mp3È}³\‚t[‹k¶’æ#aÑ…vÒèÖrüÊ›ÕN+Ù½ eÇtEÜÄX—úÅ’eLk&=«3ÄQÞé–Ûáä‹<†çÅ^k2ÉT«¦¶„cö‰“} :ýþŸ;ÿ£Û*HÞQŠÌ}^ñ¡XØî Оµ˜L¥²O9ïW-%Œ¾.•õ«W%Є›74¯gíYíï ï4Xk 2“º yÕÆ§On^Ú› ¬D–}>íe…Ê:‚+9SºÐjMði bøg[Κ’7úÕá…mç5¸˜¤"”õ¤<Ó¦›ƒëO£ÐãÖ”)Ô©€˜£Š3E†|înÅ7¥K ¬ÒÊŸ/\Ó|ÄWA7[aìÔÑÅ>!˜} µ¥ö$ÝðÕÑŠôD_j¿_¥uš–»§}’KG2uàlÁ‡šæ|-y§é·rÜÞ3TÄ8]ÃqõJènük¤ [ik3!'|,;ôädgô±éPQ‘ãâp«W¢Þ}s=‚C FÍË—cÓ“Çò¦Üi^%ÕT´ÚtPG{¸R3ïŸÆ°ìõk»1ô¨dŒÉ…E˜nn;‚qÏ'ó4·Úˆµu¿ÔR(Ý‹2¼À˜è=«*¹‡,H¡—SrnÚÜ/µÙ>ÂIpn[þZyŸ À{Ve¦³=«JÅc™œf€ü(À"í{ÌOü³múQ»iVìPÛÜÌãþzI´~•Ç[VII=ì=A¸[Uø¢;Ëùo®ZâvRäÀ=© ÍÕʈ÷M*¨ÀQ’§jGü{ØÛGîWqýi¯«ß:ãÏ*=åÅí¤v¸’.|ÊÙŠY_çNûÏõ×¶ñû),J¨‹wvøUšcì Uø|5«Ì3öF½) ?ZhÂÄ%tøþõÄóöT(¦}¦Í?ÕÙƒï#“Wÿáò‡ú^¥i¨VÞJziºJy×WMém<㱘u‡úµŽ1þ c\\Nq¾G>ƒ5º_K²]çNEã=Ë1ü*«xŠòOÝéö±@¿ÞXÀ4¹‡b¤:F£qÊZJG« Ö¬hÆnn­mÇ}ò!H-õ›ó™g¸“=”œUûßÜ`›wç»T¹Øv),5± .¦ò°çÇýMM}®XÜ8’ºiò3+ü…tßæ` †4˜ÍmZü?³k³{Š#]ÁÞ$Î’šåháŸU»HÐYÁl Ž_Ëù¾¼Ô>µ~´ò0ÏÝ Çå^qà:[CJb—ød8>þ¢©i7çE½V¹DÝ"¹Ûò¸÷?×ó­t¨¹é+>«üŒStß%]Sëú3‹]W6¥þÏ2…ä”ùU­GÕuDckeê÷‡á^¼±(jçõ %ÌßnÓ$ûúÁ“…sïéõüé*ʪµMóÿ2)RÖž«·ù‘Íø­>íÌ øÑý“âôíþ5Ði~(xn†›¯ÅöK±ÀŒ#ûú¯O¥u@ qÒ²©”Ý™­9¢¼O46ž.­”/ùRñB}í&3ôÅzfÚ6w¨ç—sNTy¼7š²8z4g©A‚+RÇD7÷¡¶º[¸ï?Jí¶J]£< U{I ‘ Š‚%Ž5 Š0§â—m5™BRš\Râ‹€`Òõ4b—X R÷£b€sHîK‚” ©"Æ‘ïlØëN*í z!âìá¥7'ÐVq¿µå2®~´‡S³UÏžŸur@ÚF‡ÚN~íh>‚²[Z² œ‡ñ¤mnÉIÝ*~ŽHv ÈÖ7 ;RyíÖ±ÛÄ6 2fOΡ>%Ó•OïTþ4rð^FéÀÍIm1•Nzƒ\Ôž+Ó‚àHö­Mú+øe’Ü7ã5m¡P½õ6¥Å-Îj6Žôî)(ÅQL¬oØhønö2êžb¼¼þ¸Çã[4bª2p’’èLâ¥ÔÄðÿö‡†¬äfËÆ¾SýWåƒøÖßjã|!ÿÍwYÐÛ…ŽO:þÏÿ¨¥vU¥x(ÔvÙë÷™áäåM_u§Üs4Ò^óL[û\­å‹y¨Ã®Ñɦ ÔÐuTÖ´ˆ/Øm‘Gð¸ê?Ï­i‘ƒÒ¸;>ñ”škiúßì¬zý—þùª‡ï)¸uZ¯N«õ3Ÿîª©ôz?^ô;~ôQŒRö®s¨n9¢Šo4”b–Ž”XÅ%?SH¡ Š1NÅ žµç~#ñ݆·$nò1Æ+Ñšáügö;ˆñ4$L£åqU Å-ŽOP×¥“iµË/ñzÔv·3OåÜ«yg£Ò²¤Ìoµ‡¥CqwrÍó7Õ¿1mZÑÌÉv„šÇ@­ûÉÉ'°¨Cù’~ñÈ_­5О0ÍŠMÝ–É&ÈâÉ÷æÝ»K°*®? ¨òî•X€§ÔQ#~÷,wgÒ‹‡fpòž”ëGÞ¯ä«t>† æ93·¯­! 
›Àë‘ZS›„®EZjq°òŽªñ¿)ïÖ£4D9ªÜÛf‰nd¯ *¢ÈbrS€{UT‡,´ÙŠ”ùã®ërͤâ18ù?•2âC’Ã#§EWÞwf­ÄÂê/%ÏïÜoZ¸þò“4Äy+ÛBz å®c)pëèk»š®GS…ó¤çž8=H’+ÙF²JRI–%a˰ÈxC¤Å÷î®'#´hΡƒGÔ.1åÚJAîWõ«ñøfðÿ®–GûO“ù Ûœ‹ Í:/õ:psë3–ý)ßÛ—(1oúg­]MÂ/øø¿g>‘'øÕ„¶Ñà%¬“Yú Nh|¬Ê·Ô..®ÜÌò Ç¥B,/&•–8%”ƒŒ…'5Ðý© Aeo# “ùÓa–쯑’mµ²Ÿ='Þ?“0q䬻Kó_ð È|9©’‘ œÈàb­M¥Ä÷(—*¿/Í$Cp5¥‹¨Üž!•³ÜÖ„^ Ô$\áPã€ME*é>Ylÿ«•Z“iJþ¬sÂÓF·8òînÜ…ò«V¿;bËG·_ö¤¿kh^Qšxf•¢¹°ð‘ÛÖº/øGní€òJ8CQ?rn2.U"¥žKMm×þ>ô‰BÐt9$ù®nf“K1ÅoyWp¸Y¡uã8Ȫ·š&­âÒ7-:?–5?yÇ©ô9+f`\O$mKýÕæ¤´Óµ½gdµvÇøØc#ù×o¤ø?KÒö¸‹Í˜u’NNk}c+7>Å(œ=§Ã«%ÃÞË%ć®NoÚøcKµP#´ñ­°)بweX©”1 $J°©Ö!éRñKŽh°ÆžzP)à Q`jž©¤ZkfÚî=ËÕX}ä> ÕübЍ·tõ&IIYìpö·÷þºK WtúkœArvOþ·nÕÚÄñÍK+Æã*Êr¦ÝÙÁ}lö×1,‘8Ã+WWPð5Öåßw¢HÜ∟åü±®›F¾ÚKóÿ‚rÞX}õ‡åÿêµ=ËWµò/!?…‡ ‡ÔåĺǂÜ,áõ 8>üCú/¥vW¶Ú¢\ÚʲDã‚?‘ô5; e*À#†¢erjë·õ±¬éF~üŸër®¨Új–¢âÎe’3×TúØÕºäu \é×M©xnSÝ^ØŸ‘Ç ÿøb¯h~+·Ô¤ûâ=ENÖ†NösüºýiΊkž›ºüW¨¡Y§ÉUYþúìt4b” \Vã@¥ÇZ\ ZLsKŠZ(âŠ1ERâ“ 'N•ÈøæGZ…b6°Hvõnk‘Ö쎮Kîxü–­sÈÈUúdŒ[V>Ôî°Lb5=Ízµç‡ì/4ù,ÞUaò²ŽTö"±¼9¨Ï¦ß7‡uS‰cÿiOI°Ïòü»Vò“¯uñ-üüÿÌçŒ}ŒùÂöòòÿ#Çá´K†º”·°âº+?é–x+’;‘]8¤"¸ž»–H©œ0¨Ù¨ö8^)ô˜âÆÒšuÀfÑIŠ“ùÒb„dRRHE1‘F)Ø ŽÔ€f)¸©1M"€H?˜¤HâšE<ŠLQ`ŠmIHE4€f)¸§âŠf)1ùÓðqM"ÅIƒLâ“ê6ÕÇ]i¬ §ò¬)kgeˆa}¼×¤˜Õ¾òƒM]2Ô¾ónõÅe{Gšˆõ“ÇšßLÕ˜|7©NAòX{±ÅzZ[Ęڠ}JÒŸ0¬p0x.åù–E_§5©oà«eÁ–WoҺݸ¥•ØXăÃ:|8>@cþ×5•©ÚÅ¢x–ÂõVÚã÷R pLþ þÙb²o³-üŸ™Ù`È š^¦—®c¨LR€iE/9¦RâŒS¨bŒRâ” `–Š\q@-¢š@/zl‘¤±²HŠèà ¬2÷§Py "÷K¾ðÛêZ(i´öæ{BIÚ=G·¿Qî+¨Ò5‹MnÈ\Z¾{:¼‡ÐŠÐ·jä5]i·§XðéòçËjË ïý?.k¥J5•§¤»ÿŸùœ® ù¡¬{vôÿ#¯Åcëž±×"Ì«å\¨ù'A󯨣Ãþ"µ×`!uvŸë`n£ÜzŠÚÅeïÒŸf}ÊÐî™ÄÛkº§†nËÄÓÚ¶+ÔñïëüþµÙ[ÜCwMo*IŒ«¡È4\ÛAwAqËŒ28È5ÆÜhš¯…g{ÝšâÈÒÙ9ÉO_¨çëZû•¼¥ø?ò2÷èÿz?Šÿ3·Å.+Cñ%Ž»îÇp£çÏÌ¿â=ëc½sÊ.ÒÐÞ2Œ×4]Ðé@£½.*[,1F(¢¤u£¥4˜Ã”@âÅÅ.Ú3N¥  KŠu%&(ÛKŠ\P`RÒâŒS”bŠLS)iqF1@ ŒQKŠ1@ E(¸ DSD—I ƒrH¥z‚0k‘ðÏo££L{ep@Ï¡Èþ`ŸÆ»ô'+ýßðzÞìá?;}ÿðNÖŠwj@+šçA‹â}uÍ[`ž¿¼„žÌ;~=?«àídêú*¤Äý²ÔùSÔúõóºJá5P|)ã(udÈÓõ²à·sÿ³ßUÕK÷tžû¯òùœÕw5W¦Ïô#¹Å¥ŒƒÅ+”é¥%-Æ%%;b“´”üR`ç¥7µ4Ô˜¤Å&esI¶¤Å!†DS?JÇÕ|5§êÑ‘< »³¨Á­ÌPTP<‹Xø}g¹ìÏŸaüB±í|«Ý?ü{”êÕî…E4 §Ï-“'‘_cðÒC†ºŸ¡k¤±ð&—i‚Ñù„wnk®Å!µ{•"…¾—ilŠ\{TÞ¯¤Íf@FèÛû®:éô5¬Gµ&* Ü”z PR‹‹êsÕd¼°{ ¼­í‘òÝ[©QÀ?¦ÓÞºLWâh$Ðu»Ú¡1±Ý¢÷Œþ#õºøeŽæç…ÃÇ"‡VÁé[׊v©Ÿàú£jôå¼ц9¬_è+­Yˆˆïaù úgÓùVî(Åe 
ÊRŽæÓ‚œ\e±Îxc]mR´¼^£mòÌÁlq»ük|ƒ\ÇŠ4iã5í(m¾·æEQþµG·sÌVƉ«Ûëzj]BpÝ$<£w­X©/k žþOúØÂ”å짺ÛÍ[—±HG!˜ïX$x¢ŸŠL~tÜ LSñIІRSñÅ&)Œa˜§â“ Å& <ŠLS”„T˜¤"ˆ©1RIŠO#4”ˆñI¶¤8¤¢àEŽi1ÍJE7\öÒRbšV˜Æc4˜©1MÅ6ØðN¤úûOûí¿øšxð^¢?彯ýôßüMwTVžÍó³‡ÔGü¶µÿ¾›ÿ‰§Â¨Ïk_ûé¿øší¨£ÙÄ9™Å¨Ïk_ûé¿øš?áÔ?絯ýôßüMv´QìâÌâÇ„/ÿçµ·ýôßüMðˆßôó­¿ï¦ÿâk´¢ŽDÌñOè—:WŽ5/ ¼‘&ñö‹rÄía€p8ô?øé®çþ+ÿùímÿ}7øVGÄèdÑõMÅöÈKÙN!ŸoVŒäÿ¡øz43GqsÂáâ‘C£„k¢´T­S¿æŽjpr§ÛoFrÂWÃþZÛßMþCYø}u¬i²ZË-°cÌo¹¾VìzW ÑXÅr¾do/y8½ðZÞßÏu _¸ÕIôÖøw^µñ&‡o©Ú–A‡ByÇU?Oåƒ[U§~ò=wòf4g(¿e-ÖÞhÆÿ„VûþzÛÿßMþÂ+}ÿ=mÿï¦ÿ 먬9Q¿39øE¯ç­¿ýôßáKÿ½ïüõ·ÿ¾›ü+­¢ŽTÌäÿá½ÿž¶ÿ÷Ó…ðŒ^ÿÏ[ûèÿ…u”Qʃ™œ§ü#7¿óÖßþúoð¥ÿ„fóþzÛÿßGü+ª¢Ÿ*fr¿ðŒÞÏX?ï£þÂ5yÿ= ÿ¾øWUE¨9™ÊÿÂ5yÿ=`ÿ¾øRÿÂ5yõ°ßGü+©¢ŽTÌåÿá¼ÿž°ßGü(ÿ„nóþzAÿ}𮢊\¨9™æ!øi}s7ö®“<ú¤0ØÄ O¡ã÷è{Õo ë¬Þ>‘¨ìu˜NÖ‚l¯˜G]¼uöüFEzÅrž1ð=ŸŠa[ˆß욬#0]§#G¿QÛÐïFQä©ò}¿àò„¡.z5ßþ 'ü#—ŸóÖûèÿ…ðŽÝÿÏH?ï£þ…á^Xê#Ã^0O³jI…†éøIÇlž™=Cìzú%g:.ÌÖeQ]k®ü0—Q—íÖÙjJw,±³Ç߯¸ýkOׯ4ÍQt_ƶ7]éøŠAØä ~#\W²Vn· é¾!°k=NÙfˆò§£!õSÔÒ3MrTÕ~(ÎTš|ôÝŸàÌ¥ðýÓ¨tšÝ•†A H#ò¥ÿ„vïþzAÿ}ð®<§‰~¹hüÍcà ä©ÿYn?§þ‚}‰¯BмC¦x’À^i—+*3èñÿ"¢xu̵]ʧ_™ò½c;þë¿ùéýôÂøGnÿç¤÷Ñÿ éh¬¹¯39ŸøGo?ç¤÷Ñÿ ?á»ÿžßGü+¦¢f‡ÌÎgþÛ¿ùéýô—þÛ¿ùéýôºZ(öqfs_ð]ÿÏHï£þ¿ð]ÿÏH3þÒQG³ˆs3›ÿ„zïþzCÿ}ð£þë¯ùéýôºJ(öqvsð]ÿÏHï£þÂ?uÿ=!ÿ¾øWIEÎ!ÌÎsûëþzCÿ}ð£ûëþzCÿ}𮎊=œC™œçö×üô‡þú?áKýuÿ=!ÿ¾øWEEÍ39Ïì ¯ùéýôÂì ¯ùéýôº:(öh9™Î`]gýd?÷Ñÿ _ì¯ùéæº*(öh\Ìç°n¿ç¤?÷Ñÿ ?°n¿ç¤?÷Ñÿ 訣‘™œïö ×üô‡ó?á\WÄ¿ÝÛh0kÑ™4ùÕò¤’ g§÷¶×«Õ oMMcC¾Ó¤Æ.ahÁ=‰ÀàÖ”m ©V‹7NÓåÔ´Ë[è$„ÅqÊ¿1èÃ>•gûëûðÿßGü+á.¤÷^:|ùtí)ê;‡ó#þ]å*”T&â:U\à¥Üç°.¿ç¤?™ÿ Ì×¼6µ£ÜY<uÌlIù\t=?ÈÍv´RŒyZ’Ý/y8½*ø{%Ö«aq¥\²G¨iäˤîÚ ôí‚?ë]—ö×üô‡ó?á\—Œá“Á¾5°ñ¢±Ü°·Ô{ç¿â~ª=kÓaš;ˆ#šWŠEާ!VµéE¿h¶ŸS’Nœ·åÐçÿáºÿžþgü(ÿ„~ëþzCÿ}𮎊ÑÌç?áºÿžÿßGü)?á»ÿžÿßGü+¤¢ŽDÌæÿá»ÿžÿßGü)áºÿžþgü+£¢ŽDÌæÿáºÿžÿßGü(ÿ„~ïþzAÿ}ð®’Š=œC™œ×ü#×óÒûèÿ…'ü#·óÒûèÿ…tÔQìâÌæ¿á»ÿžßGü)?á»ÿžßGü+¦¢gæg3ÿíßüôƒþú?áGü#·óÒûèÿ…tÔQìâÌæ?á¼ÿžßGü(ÿ„róþzAÿ}𮞊=œC™œ¿ü#wŸóÒûèÿ…ðÞÏH?ï£þÔQG³ˆs3¼ð„×ÖsZÎÐ4R©Vøu®+ÁÖ·šv½yàûé#[›r^Ùœ$N¿/‡wçé^É^ñ3C¸6öÞ(Ò†ÝKI`ä¨åâ'>¸äý Vôc'³üÎz÷MUŽëò6?á¼ÿžßGü)?á¼ÿž°ßGü+Wú忈´+]NÛfO™3Ê8á”ý jV.’NÌÝTæWG-ÿÕçüõƒþú?á^â/êÔ†¿h‹&™;»†"HBO\``g§¡ã½{MEsm 嬶×1,°J¥d0=A«¤Õ7ä÷3«Qvkfr6Cê–_YÝ[Ko:‡GV<˯µXÿ„f÷þzÛÿßGü+”Ó®'ø_âì›Ùü7¨9kiÜÿ¨osùøQ^ª 
A¡T¡;­žÁJ³š³Ñ­ÎSþ‹ßùëoÿ}7øQÿ½ïüõ·ÿ¾øWYEgìâkÌÎKþ{ßùëoÿ}7øQÿ½ïüõ·ÿ¾›ü+­¢ŽDÌä¿á½ÿž¶ÿ÷Ó…'ü"׿óÖßþúo𮺊=šfr·ßóÖßþúoð¤ÿ„RûþzÛßMþØQG"fqÿðŠ_Ïkûé¿Â“þKïùëmÿ}7øWcEˆ9™ÇÂ'}ÿ=m¿ï¦ÿ „ïÿçµ·ýôßá]r ægÿ•÷üõ¶ÿ¾›ü)?á¿ÿžÖß÷Ó…vtQȃ™œ_ü"7ÿóÚÛþúoð£þ ÿùímÿ}7ÿ]¥r ægÿ~¡ÿ=­ï¦ÿâiƒõùíkÿ}7ÿ]µrD9™Ä¨Ïk_ûé¿øšƒuùíkÿ}7ÿ]½r ægÿn¡ÿ=­ï¦ÿâhÿ„3QÿžÖ¿÷Óñ5ÜQG"fpßð†j?óÚ×þúoþ&“þ½Gþ{ÚÿßMÿÄ×uEˆ9˜QEd…Q@Q@Q@~$Ò^ðíö˜øÌñ„öqÊŸÀ\ß­]õ -ÆEÖ›!¶‘[¨QÊþŸ/ü»Šó+ø¤þ2Mܱף޾‚^Oç¸7ýö+z~ô%™ÏWÜ©üŸéøž›EV@„<kÊþo†>7Ï#ÃZ»ÿÀmäþ˜ÿÐO}µêµ—â-×Äš%Æ™v>IQÀæ7}?ÄV´¦¢í-žæU©¹+Çu±¦` ƒÈ"–¼÷áÞ½um=ǃõ³·RÓ¾XYú؇Lzà`U#ÐסTÔƒ„¬Ê§QN<È(¢Š‚Š( Š( Š( Š( Š( Š(  ?xSMñ^šmo£Ä‹“ ê>x¨õ£½qZ?‰õ_jQø{ÅŤ±o–ÏR‘·°cÜ~«î0kÔj†±£XkÚl–Œ 4ëÕOfSØŠÚ\“Õ~^†5)6ù᤿?Rê:KÉ«£ÊÊr=4êòx.õŸ…wëiæê•ñàe­ÉííôèzŽâ½BÊú×R²ŠòÎtžÞUÜ’!È"¦¥7V©õ:ªz=èN@`AƒÁ¼ó]ø{qc~uÏ\gê ËÚƒˆ¥ö ú>•è”R…IAÝ¥8ÍYœG…þ!Á©Ýdkp/ZC±¡”Yû$ô'Ðþ×o\ÿŠ<¥x®×eì[.Pb+¨ø‘?ãØþ•ÆÚø‹Ä?ncÓüN’j9; Ôc™`Þ¿CÏ\+^HÔÖž·ù{IRÒ¦«¿ùž¥EV°Ô-5K(ï,n#¸·edŒäð>Õf¹Ú¶çJwÕQ@Q@Q@Q@Q@Q@Q@Q@Q@Q@i¢Å9ñ“VÓËo«Åö˜‡«òÇõókÒë;)FÚU÷‡üS Ö7B9qÕüÀ}8aÿ¯GŽD–$’6 Ž+àô­êûÊ3î¿#ž»)C³¿Þ:Š(¬ƒ?]Ò-õíïL¹˸Œ®ìgkvaîá\gà bâ(¯<)©œ_éNUþ(³Ž=@?£ ô:ó?ˆ–sø{^Ó¼o§#¬7¨¿Æ‡€OÔ|¿÷ÍoKÞN›ë·©ÏYr5UtßÓþé”TWj60^[8x'ŒHêÈ©ë¡;…Q@Q@Q@Q@Q@Q@Q@Q@#*º2:†V Œ‚)h -Й¾üAŸÃó4mY¼Û&nˆç€¿û/ýòkÔ«•ñÿ†?á&ðä‰Åý©ó­XuÜ:®}Ç\Ô¾ñ?ü$ÞŽIÎ/í“t§ƒ¸tl{Ž~¹ô­ê~ò>Ó®ÏüÎj»›§ÓuþGSEVI—âÏÄš4Úmêå$GæÇFãüEq~ׯtMUüâÅÔXÎlj£ì ý:~]F+Ò+”ñÏ„Åb½³y:­§ï-'iÏ]¤ú^ÇŸZڜչ'³ü *ÁßÚCuø®ÇWEqÞñ{ø‚Î]?R_'[±ù.ba´¾7õàŽÇê+±¬ç r³HMN<È(¢Š’Š( Š( Š( Š( Š( Š( Š( Š( Š( Š( Š( Š( Š( Š( ¸Šú\³ør bÓ+y¥N³£¡IüˆSøW}PÞZÅ}e=¤ëºãhÝ}TŒºsä’‘a϶‰ªE­h–Z”8ÙsÉü$ŽGàr? ¿^qð®ê[í Ý·ïôˆ1纃lóÿ¯G§V“hTgÏ&QEfhp|9qq%Ñòš¾™ûÀPs$c’=ñÉÇpHï]…Ì}+¹ÑõK}oHµÔ­0ÜFzƒÜpr? 
Þ²æJ¢ë¿©ÏAò·IôÛлEV@QEQEQEQEQEQEQEQEW–kŠß¾ ï¥tmY¼«ÅQÄny'ÿfÿ¾…zeø‹C·ñ…u¥Üà,Ëò>9G«¡­iMFZì÷2­8û»­5etŒXdr¥®áž¹pÖ×>Õ~]OIc yxÀÇ®8BµßÔÔƒ„œYTæ§$QEAgžø÷Ãw–÷‘x¿ÃÃf©góOõñÎGrwA]G…¼Kgâ-BÔío»4DäÄýÁþ‡¸­ªò½~Âçáωá&ÒbgÑnÜ%ýªtBOP;rr= tEûXò=ÖßåþG4ײ—:Ùïþæz¥^ÂúÛS°†öÎU–Þd޽ÅX®v¬t§}PQEQEQEQEQEQEQEQEQEQEQEQEQES^D‰äuDQ’Ìp¨¯o Óìg¼¹}AI#z(5åZf•ª|S¹“UÖ.æµÐV¶qp_ý=‹sÎ@ÅkNŸ2rnÉT©ÊÔb®Ùé_ðèŸôÓÿð%?ÆøHtOú iÿøŸã\ú|*ðj( ¤³Ÿï5̹ýÿ ·ÁŸôÿÉ©¿øºv£ÝýËüɽ~Ëïäsþ§§èŸôzÊúÚkkõû5ç“*¶:.æÁ邇þ^¯\‚|/ðj:ºèÃ*r3s)‘zëè«8I.^F‹“•µ×@¢Š+pªÖ‘k¯i:mên†uÚOu=˜{ƒÍ_¢šm;¡4š³<ËÁ>"o ]ÝxCÄ·Q[µ—6—3¸Dxû Äþ#ñ«µÿ„·Ãô0i_øøÓuÏè^$–)um=n$ˆmGÞÈ@ôÊ‘‘õ¬ŸøU¾ ÿ 7þMMÿÅÖò•)¾i]?#ž1­Ë5çscþß ÿÐÁ¥àlãGü%¾ÿ¡ƒJÿÀØÿƱÿáVø3þ€ßù57ÿGü*ßÐÿ&¦ÿâéZwø˜ï_²ûßùð–øoþ† +ÿcÿ?á-ðßý WþÇþ5ÿ ·ÁŸôÿÉ©¿øº?áVø3þ€ßù57ÿE¨÷‡ù…ëö_{ÿ#cþß ÿÐÁ¥àlãGü%¾ÿ¡ƒJÿÀØÿƱÿáVø3þ€ßù57ÿGü*ßÐÿ&¦ÿâèµïðÿ0½~ËïälÂ[á¿ú4¯ü ühÿ„·Ãô0i_øøÖ?ü*ßÐÿ&¦ÿâèÿ…[àÏúäÔßü]£Ýþæ¯Ù}ïüøK|7ÿC•ÿ±ÿð–øoþ† +ÿcÿÇÿ…[àÏúäÔßü]ð«|ÿ@oüš›ÿ‹¢Ô{¿ÃüÂõû/½ÿ‘²¾+ðã0U×ô¢O ÈùýkJÞæ ¸D¶ÓG4G£ÆÁüErmð³Á¥HA÷Sqÿ×7®x*ûÁ 'ˆ<y:$#uÅ”‡zº¿PcÏR ¥¤[¿˜:•`¯(¦¼U¢²ü;®[øBµÕ-U™~d=Q‡ ¿­JŦ™ºjJè*›Ë[(üË«˜`8Ý+…úšÄñ§‰£ð§‡f¿*á•oèÒqŸa‚OÒ¹áÄÞ UÖ¼gysuyp¡–Ø1Až@8è}†÷­!M8óMÙN«Rä‚»üŽ÷þþƒþ§ø×Ÿx¯GÓcÔ?á$ðž¹§Újñ’òB—Q„¸õã8ÉîëÍtCágƒ1ÿ üš›ÿ‹¥ÿ…[àÏúäÔßü]i Óƒºoî_ægRj+4¾÷þAàYø²³ÊÛUŒ~öÜžY=G·Qú×a\æ“à? 
蚌z†¦n£$žtŒ‚ Ät&º:Ê£ƒ•á±µ%QFÕ7 (¢³4(êú6Ÿ®Ø=–¥l“ÀÝ›ªŸPzƒî+Î$±ñ'Ã}8É«øo%žÝÏï Ï> cÔµê´Vªã£Õv2©IMó-s›Ó<{áRÅ.—X´·ÝÖ+©–'SèA?¨â®Â[á¿ú4¯ü ük6ïá¿„on¤¹›FÍîo.Y#ÿuXø ‡þoƒ?è ÿ“Sñu¸óüÿhòüMøK|7ÿC•ÿ±ÿð–øoþ† +ÿcÿÇÿ…[àÏúäÔßü]ð«|ÿ@oüš›ÿ‹¢Ô{¿ÃüÂõû/½ÿ‘±ÿ o†ÿè`Ò¿ð6?ñ£þß ÿÐÁ¥àlãXÿð«|ÿ@oüš›ÿ‹£þoƒ?è ÿ“SñtZwø˜^¿e÷¿ò6?á-ðßý WþÇþ4Â[á¿ú4¯ü ükþoƒ?è ÿ“Sñt­ðgý¿òjoþ.‹Qîÿó ×ì¾÷þFÇü%¾ÿ¡ƒJÿÀØÿÆøK|7ÿC•ÿ±ÿcÿ­ðgý¿òjoþ.øU¾ ÿ 7þMMÿÅÑj=ßáþazý—ÞÿÈØÿ„·Ãô0i_øøÑÿ o†ÿè`Ò¿ð6?ñ¬øU¾ ÿ 7þMMÿÅÑÿ ·ÁŸôÿÉ©¿øº-G»ü?Ì/_²ûßùqxŸÃó>ȵÍ2Fô[¸Éþu¨e ¤FA댗áOƒ¤B«¥¼GûÉs.V"¹™#ÔþjÖ¬/&¼ðµÔ»$h þ½øÆpxÎ(Tá= õóV¤5¨´îZ¢šŽ²"º0eaGB*+ÛÈ4ûï._dFÒHÞŠMac¢ýI^D‰äuDQ’Ìp¬ÿøHtOú iÿøŸã^k¦iZ§Å;™5]bîk]%akgñßÓØ·<ä WTŸ ¼Ši,çûÍs.F­Ý:pÒo_#U©=`´ó:øHtOú iÿøŸã^wâÝOOÓ¾#øwÄWÖÓ$‡ì·^LªÛW8ÉÁôsÿ|×Mÿ ·ÁŸôÿÉ©¿øºUø]àÕ`ÃF<ÜÊG循¥}~åþdÔ…i«Y}ïü޾Š(®c¨(¢ŠŽâÞ+»imçA$2¡GFèÊF¯/ðv¤žñ&§á=bí!²Ýö‹+‰Ü"àöÉã‘ÅO­z¥bëžмI,2êÖ q$*UÌt ß)ükZsI8ËfcVœ›R†èwü%¾ÿ¡ƒJÿÀØÿÆøK|7ÿC•ÿ±ÿcÿ­ðgý¿òjoþ.øU¾ ÿ 7þMMÿÅÕZwø™7¯Ù}ïüøK|7ÿC•ÿ±ÿð–øoþ† +ÿcÿÇÿ…[àÏúäÔßü]ð«|ÿ@oüš›ÿ‹¢Ô{¿ÃüÂõû/½ÿ‘±ÿ o†ÿè`Ò¿ð6?ñ£þß ÿÐÁ¥àlãXÿð«|ÿ@oüš›ÿ‹£þoƒ?è ÿ“SñtZwø˜^¿e÷¿ò6?á-ðßý WþÇþ4Â[á¿ú4¯ü ükþoƒ?è ÿ“Sñt­ðgý¿òjoþ.‹Qîÿó ×ì¾÷þFÇü%¾ÿ¡ƒJÿÀØÿÆøK|7ÿC•ÿ±ÿcÿ­ðgý¿òjoþ.øU¾ ÿ 7þMMÿÅÑj=ßáþazý—ÞÿÈØÿ„·Ãô0i_øøÒ¯Šü8Ìuý(“Àò>ZÆÿ…[àÏúäÔßü]#|,ðiRA=ÅÔÜãôZwø˜^¿e÷¿ò:Ë{˜.áÛMÑñ-yV¹à«ï,ž ð}äè×Rêè:ý@=H5è>×-üG¡Zê–ÀªÌ¿2¨Ã†_ÀÔΚKš.èªu[—$•™©EVFÁEåÚÞ¡«xóÅw^ÑoÏI³o®Tr휞ã9dgô­)Óç}’3©S‘mvö= ]sH†FŽ]VÆ9á•î}ÆiŸðèŸôÓÿð%?ƹ«_„¾·…R[®X –áÁ>ÿ)§ÿ…[àÏúäÔßü]U¨÷wü/_²ûßù¿n¬´­wOñއ¨YÍuˆîàŠáI•:gú|§þé^¦êÚ®›ohûíî:cØûކ¹ŸøU¾ ÿ 7þMMÿÅ×K¦i–z> …„ XA $ã''“Éäšug E%{¯ÈT¡R3nV³íÜ·EV@TWV°^ÚËks˪RDaÀõ-å:mÌÿ üNt‹ùü7~å­nþX7¡??õ®ûþß ÿÐÁ¥àlãVu}N׬ –§j·åƒmbFî ƒô®{þoƒ?è ÿ“SñuÐçNzÎ÷ò9”*ÓÒ·ŸCcþß ÿÐÁ¥àlãGü%¾ÿ¡ƒJÿÀØÿƱÿáVø3þ€ßù57ÿGü*ßÐÿ&¦ÿâéZwø˜ï_²ûßùð–øoþ† +ÿcÿ?á-ðßý WþÇþ5ÿ ·ÁŸôÿÉ©¿øº?áVø3þ€ßù57ÿE¨÷‡ù…ëö_{ÿ#cþß ÿÐÁ¥àlãGü%¾ÿ¡ƒJÿÀØÿƱÿáVø3þ€ßù57ÿGü*ßÐÿ&¦ÿâèµïðÿ0½~ËïälÂ[á¿ú4¯ü ühÿ„·Ãô0i_øøÖ?ü*ßÐÿ&¦ÿâèÿ…[àÏúäÔßü]£Ýþæ¯Ù}ïüøK|7ÿC•ÿ±ÿð–øoþ† +ÿcÿÇÿ…[àÏúäÔßü]ð«|ÿ@oüš›ÿ‹¢Ô{¿ÃüÂõû/½ÿ‘±ÿ o†ÿè`Ò¿ð6?ñ«Öz•†  ²½¶¹d˜eWÇäk™ÿ…[àÏúäÔßü]ej¿ ì#Qwá›™ô½Fº&³)>„“‘õÏàhå¢ôM¯sWZ¸§èÿà‹EqÞñUνesaª¯—¬éïåÜ©]»¹ 6=x ãÓÞ»Êpp—+5„Ô⤂Š(©,(¢Š(¢Š(¢Š(¢Š(¢Šä¾'HÑ|:Õ™ 
collectl-4.3.1/docs/Tutorial-Lustre.html0000664000175000017500000002723413366602004016373 0ustar mjsmjs Tutorial - Lustre

Tutorial - Lustre

Introduction

This tutorial is intended to help you get started monitoring a system on which the lustre filesystem has been installed. It is not intended to be a tutorial on how to use such a filesystem. In terms of the basics, there's really not much to say other than to tell collectl to monitor lustre by specifying an l with -s, along with any other subsystems you may want to monitor.

As you should be aware by now, collectl will try to display everything you've selected in brief format if it can, writing all your data out on a single line for each sampling interval. This can get quite wide depending on how many subsystems you choose to monitor, and while you can certainly specify collectl -s+l to add lustre to the default subsystems, the output width is too cumbersome for this tutorial. Therefore I'll use some other subsystems and switches in the examples to mix things up, showing that there are a lot of possible combinations. In this first example, we see what happens when you run collectl on a lustre client and request cpu and memory data along with lustre.

$ collectl -scml
#<--------CPU--------><-----------Memory----------><-------Lustre Client------>
#cpu sys inter  ctxsw free buff cach inac slab  map  Reads KBRead Writes KBWrite
   0   0   101     26   3G   6M  29M  12M  43M  65M      0      0      0       0
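
If you want to post-process brief output like the sample above, a small parser can help. The following is only a sketch: the function names are hypothetical, it assumes the column names on the second header line are unique and whitespace-separated, and it assumes the K/M/G suffixes are powers of 1024, which you should verify against your collectl version.

```python
# Hypothetical helper names; the suffix multipliers are an assumption (powers of 1024).
SUFFIXES = {"K": 1024, "M": 1024**2, "G": 1024**3}

def to_number(token):
    """Convert a token such as '29M' or '101' to an integer."""
    if token[-1] in SUFFIXES:
        return int(token[:-1]) * SUFFIXES[token[-1]]
    return int(token)

def parse_brief(header, row):
    """Pair the column names of a '#'-prefixed header line with one data row."""
    names = header.lstrip("#").split()
    values = [to_number(t) for t in row.split()]
    if len(names) != len(values):
        raise ValueError("header and row have different column counts")
    return dict(zip(names, values))

# Header and data line copied from the sample output above.
header = "#cpu sys inter  ctxsw free buff cach inac slab  map  Reads KBRead Writes KBWrite"
row = "   0   0   101     26   3G   6M  29M  12M  43M  65M      0      0      0       0"
sample = parse_brief(header, row)
print(sample["inter"], sample["free"], sample["KBRead"])
```

This approach does not work unchanged for displays that repeat a column name, such as a combined OST and client display, where you would need to qualify names by their section.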

Collectl adapts to the lustre services it finds on the system: if you run a similar command on an OST, it recognizes that too and changes what it shows accordingly, as you can see below.

$ collectl -scdl
#<--------CPU--------><-----------Disks-----------><--------Lustre OST------->
#cpu sys inter  ctxsw KBRead  Reads  KBWrit Writes KBRead  Reads KBWrit Writes
   0   0   100     28      0      0       0      0      0      0      0      0
In fact, if you're also running a client on an OST it will show both! I've also included time stamps to make the output a little more interesting.
$ collectl -scl -oT
#         <--------CPU--------><--------Lustre OST-------><-------Lustre Client------>
#Time     cpu sys inter  ctxsw KBRead  Reads KBWrit Writes  Reads KBRead Writes KBWrite
14:35:32    0   0   103     24      0      0      0      0      0      0      0       0
14:35:33    0   0   123     53      0      0      0      0      0      0      0       0
As expected, you can run collectl on a system that has any combination of MDS, OST and client services and it will show you what you want to see. For more detail on some of the more advanced concepts beyond the scope of this tutorial you can read more about how collectl deals with Lustre here.

And finally, don't forget any of this data can be written to a file for continuous logging, played back later and even converted to a format suitable for plotting. In fact collectl is configured to monitor lustre by default when run as a daemon so all you need to do to begin collecting the basic data shown above is service collectl start.

Beyond the Basics

For many users of collectl, you are now sufficiently equipped to tackle most lustre monitoring tasks, but for those more exotic situations there's so much more you can do.

CLIENTS

Let's start off by looking more closely at client data. As with all other collectl data, one can switch between summary and detail data by simply entering an upper case L instead of a lower case one, as I've done below. The actual content of what is displayed will depend on whether you are on a client or an OSS (there is currently no detail data for an MDS). As you can see, for clients this data is broken down by filesystem, and just to make the display a little more interesting I decided to show the timestamps in milliseconds. Naturally, if there is only one filesystem, the data will match that displayed in summary mode.

$ collectl -sL -oTm
# LUSTRE CLIENT DETAIL
#            Filsys   Reads ReadKB  Writes WriteKB
15:10:06.009 spfs1       0      0       0       0
15:10:06.009 spfs2       0      0       0       0
Just to take it one step further, it turns out that lustre actually tracks client I/O by individual OSTs and sometimes that is more interesting, so there is a special option for lustre clients, --lustopts O.
$ collectl -sL --lustopts O -oTm
# LUSTRE CLIENT DETAIL
#            Filsys  Ost      Reads ReadKB  Writes WriteKB
15:19:17.007 spfs1  OST0000      0      0       0       0
15:19:17.007 spfs1  OST0001      0      0       0       0
15:19:17.007 spfs2  OST0000      0      0       0       0
15:19:17.007 spfs2  OST0001      0      0       0       0
So what else can we look at? Lots! Lustre tracks readahead data, which you can show in brief format by using the R option; note that this time we're also requesting that date/time stamps be included. Here we see the cache hits/misses added to the brief display for the client.
$ collectl -sl --lustopts R -oD
#                  <-------------Lustre Client-------------->
#Date    Time       Reads KBRead Writes KBWrite   Hits Misses
20080319 15:20:12       0      0      0       0      0      0
20080319 15:20:13       0      0      0       0      0      0
or you can look at it in verbose format like this:
$ collectl -sl --lustopts R --verbose
# LUSTRE CLIENT SUMMARY: READAHEAD
# Reads ReadKB  Writes WriteKB  Pend  Hits Misses NotCon MisWin LckFal  Discrd ZFile ZerWin RA2Eof HitMax
      0      0       0       0     0     0      0      0      0      0      0      0      0      0      0
      0      0       0       0     0     0      0      0      0      0      0      0      0      0      0
Lustre also tracks client metadata, which you can request by specifying --lustopts M and BRW stats which are selected by --lustopts B. However both are so wide they only have a verbose form and the following shows both being displayed at the same time.
$ collectl -sl --lustopts BM
# LUSTRE CLIENT SUMMARY: RPC-BUFFERS (pages) METADATA
#Rds  RdK   1P   2P   4P   8P  16P  32P  64P 128P 256P Wrts WrtK   1P   2P   4P   8P  16P  32P  64P 128P 256P
   0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
# Reads ReadKB  Writes WriteKB  Open Close GAttr SAttr  Seek Fsynk DrtHit DrtMis
      0      0       0       0     0     0     0     0     0     0      0      0
And if that's not enough, these even have a detailed mode of display and you can look at them by filesystem or OST. You can specify any combination of B, M and R with --lustopts to show the associated data in either summary or detail format, though only readahead appears in the brief display. All others will force verbose format.

Object Storage Server

If you haven't figured it out yet, it's also possible to display OST level information on an OSS by simply using the uppercase subsystem specification as you can see here, noting just to be different I've chosen a second form of the date/timestamp.

$ collectl -sL -od
# LUSTRE FILESYSTEM SINGLE OST STATISTICS
#              Ost            Read Ops   Read KB      Write Ops   Write KB
03/19 15:32:14 spfs1-OST0000        0         0              0          0
03/19 15:32:14 spfs2-OST0000        0         0              0          0
03/19 15:32:15 spfs1-OST0000        0         0              0          0
03/19 15:32:15 spfs2-OST0000        0         0              0          0
You can also show BRW stats as both summary and detail data, and since they look identical to the way they're displayed for a client, I won't bother repeating those forms here.

Metadata Server

As mentioned earlier, there is no detail data for an MDS nor are there any other types of data other than that which can be displayed in summary mode.

Summary

As you have seen, there is a wealth of data here and often you may not even know what you want to look for. When such is the case, just collect as much as you can and save it in a file. Then you can play it back later and display it in multiple formats.

I just want to close with an example of a problem with readaheads and Lustre 1.4 which demonstrates the power of a tool like collectl that can show multiple types of data at once because I know a lot of people may still not be convinced. Consider the following sample which was collected while doing random 32KB reads of a large file. See anything wrong? Do you know why the network bandwidth is so much higher than the lustre client read rate? How long would it have taken to even realize there is a problem?

$ collectl -snl -oT
#         <----------Network----------><-------------Lustre Client-------------->
#Time     netKBi pkt-in netKBo pkt-out  Reads KBRead Writes KBWrite   Hits Misses
08:14:18   41776  28310   1065   14786     50    200      0       0     30     20
08:14:19   38328  25987   1032   14078     62    248      0       0     35     19
08:14:20   44763  30337   1167   16114     58    232      0       0     30     20
08:14:21   43666  29596   1137   15632     46    184      0       0     30     16
08:14:22   33777  22905    891   12191     58    232      0       0     35     23
It turned out the old algorithm triggered a readahead after 2 consecutive pages were read, so every 32KB read (which is 8 pages) resulted in 1MB being read over the network, loaded into cache and then discarded! If the value of max_read_ahead_mb is set to 0 for the associated filesystem on each client involved, you see 2 things - the network rate drops to track the client read rate and the misses go to zero.
$ collectl -snl -oT
#         <----------Network----------><-------------Lustre Client-------------->
#Time     netKBi pkt-in netKBo pkt-out  Reads KBRead Writes KBWrite   Hits Misses
08:21:22       0      3      0       3     59    236      0       0      0      0
08:21:23     317    335     58     298     91    364      0       0      0      0
08:21:24     442    457     77     388    107    428      0       0      0      0
08:21:25     442    457     77     383     97    388      0       0      0      0
08:21:26     432    446     75     373     89    356      0       0      0      0
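
To put numbers on the two samples above, one can compare network traffic against what the client actually asked for. The following is a rough sketch using the values copied from the two listings; the function names are hypothetical, and treating netKBi as entirely lustre traffic is an assumption that only holds on an otherwise quiet network.

```python
# (netKBi, Reads, KBRead) per second, copied from the two samples above.
before = [(41776, 50, 200), (38328, 62, 248), (44763, 58, 232),
          (43666, 46, 184), (33777, 58, 232)]
after = [(0, 59, 236), (317, 91, 364), (442, 107, 428),
         (442, 97, 388), (432, 89, 356)]

def net_kb_per_read(samples):
    """Incoming network KB per client read, over all samples."""
    return sum(s[0] for s in samples) / sum(s[1] for s in samples)

def net_kb_per_kb_read(samples):
    """Incoming network KB per KB the client reports as read."""
    return sum(s[0] for s in samples) / sum(s[2] for s in samples)

print(round(net_kb_per_read(before)))        # network KB per read, readahead on
print(round(net_kb_per_kb_read(before)))     # amplification factor, readahead on
print(round(net_kb_per_kb_read(after), 2))   # amplification factor, readahead off
```

With readahead enabled this works out to several hundred KB of network traffic per client read, the same order of magnitude as the 1MB readahead window, and roughly 185 times more data received than the client reports reading. With readahead disabled the ratio falls to about 1.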
Caution - changing the value of the readahead variable in /proc to 0 will eliminate all readahead for all readers, meaning any applications that do sequential reads could have significant performance problems. In other words, this is not a recommendation to turn off readahead but rather to be aware of what it does and, if you do choose to turn it off, to be aware of the consequences.

Note: This readahead algorithm has changed with V1.6 and lustre no longer triggers readahead on the third page read but rather on the third sequential read.
updated Mar 25, 2010
collectl-4.3.1/docs/Data-verbose.html0000664000175000017500000010402213366602004015617 0ustar mjsmjs Verbose Data

Verbose Data

Data is reported in this form when either --verbose is used OR if there is at least one type of data requested that doesn't have a brief form such as any detail data or ionodes, processes or slabs. Specifying some of the lustre output options with --lustopts such as B, D and M will also force verbose format.

Buddy (Memory Fragmentation) Data, collectl -sb

# MEMORY FRAGMENTATION SUMMARY (4K pages)
#     1Pg    2Pgs    4Pgs    8Pgs   16Pgs   32Pgs   64Pgs  128Pgs  256Pgs  512Pgs 1024Pgs
This table shows the total number of memory fragments by pagesize in increasing powers of 2 for all the memory types.
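
As a rough illustration of how such a summary can be derived, the sketch below sums free-block counts per order across all nodes and zones. It assumes the counts come from /proc/buddyinfo (an assumption about the data source, not taken from this page), the function name is hypothetical and the sample text is made up.

```python
def fragment_totals(buddyinfo_text):
    """Sum free-block counts per order across all nodes and zones."""
    totals = []
    for line in buddyinfo_text.splitlines():
        fields = line.split()
        if len(fields) < 5 or fields[0] != "Node":
            continue
        # Counts follow the four leading fields: 'Node', '0,', 'zone', zone name.
        counts = [int(x) for x in fields[4:]]
        if len(totals) < len(counts):
            totals.extend([0] * (len(counts) - len(totals)))
        for order, count in enumerate(counts):
            totals[order] += count
    return totals

# Made-up sample in the /proc/buddyinfo layout (11 orders: 1 to 1024 pages).
SAMPLE = """\
Node 0, zone      DMA      1      1      0      0      2      1      1      0      1      1      3
Node 0, zone   Normal    216     55    189    101     84     38     37     27      5      3    587
"""
totals = fragment_totals(SAMPLE)
print(totals)
```

Each position in the result corresponds to one column of the table, from 1Pg up to 1024Pgs.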

CPU, collectl -sc

# CPU SUMMARY (INTR, CTXSW & PROC /sec)
# User  Nice   Sys  Wait   IRQ  Soft Steal Guest NiceG  Idle  CPUs  Intr  Ctxsw  Proc  RunQ   Run   Avg1  Avg5 Avg15 RunT BlkT
These are the percentages of time the system is running in each of the modes, noting that these are averaged across all CPUs. While User and Sys modes are self-explanatory, the others may not be:

User Time spent in User mode, not including time spent in "nice" mode.
Nice Time spent in Nice mode, that is, running processes whose priority has been lowered with the nice command and which have the "N" status flag set when examined with "ps".
Sys This is time spent in "pure" system time.
Wait Also known as "iowait", this is the time the CPU was idle during an outstanding disk I/O request. This is not considered to be part of the total or system times reported in brief mode.
Irq Time spent processing interrupts and also considered to be part of the summary system time reported in "brief" mode.
Soft Time spent processing soft interrupts and also considered to be part of the summary system time reported in "brief" mode.
Steal Time spent in other operating systems when running in a virtualized environment
Guest Time spent running a virtual CPU for guest operating systems under the control of the Linux kernel, new since 2.6.24
NiceG Time spent running a niced guest (virtual CPU for guest operating systems under the control of the Linux kernel), new since 2.6.33

This next set of fields apply to processes

Proc Process creations/sec.
RunQ Number of processes in the run queue.
Run Number of processes in the run state.
Avg1, Avg5, Avg15 Load average over the last 1, 5 and 15 minutes.
RunT Total number of processes in the run state, not counting collectl itself
BlkT Total number of processes blocked, waiting on I/O

Disks, collectl -sd

If you specify filtering with --dskfilt, the disks that match the pattern(s) will either be included in or excluded from the summary data. However, the data will still be collected, so if recorded to a file it can be viewed later.

# DISK SUMMARY (/sec)
#KBRead RMerged  Reads SizeKB   KBWrit WMerged Writes SizeKB
KBRead KB read/sec
RMerged Read requests merged per second when being dequeued.
Reads Number of reads/sec
SizeKB Average read size in KB
KBWrite KB written/sec
WMerged Write requests merged per second when being dequeued.
Writes Number of writes/sec
SizeKB Average write size in KB
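
The average-size columns follow from the other two columns on the same side of the display. As a minimal sketch, assuming SizeKB is simply KB/sec divided by operations/sec (the function name is hypothetical and any rounding collectl applies is not shown):

```python
def avg_size_kb(kb_per_sec, ops_per_sec):
    """Average request size in KB; 0 when there was no I/O in the interval."""
    return kb_per_sec / ops_per_sec if ops_per_sec else 0

print(avg_size_kb(2048, 16))  # 128.0
print(avg_size_kb(0, 0))      # 0
```

The guard against a zero operation count matters because idle intervals are common.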

Inodes/Filesystem, collectl -si

# INODE SUMMARY
#    Dentries      File Handles    Inodes
# Number  Unused   Alloc  MaxPct   Number
   40585   39442     576    0.17    38348
DCache
Dentries Number Number of entries in directory cache
Dentries Unused Number of unused entries in directory cache
Handles Alloc Number of allocated file handles
Handles % Max Percentage of maximum available file handles
Inodes Number Number of inodes in use

NOTE - as of this writing I'm baffled by the dentry unused field. No matter how many files and/or directories I create, this number goes up! Shouldn't it go down?

Infiniband, collectl -sx

# INFINIBAND SUMMARY (/sec)
#  KBIn   PktIn  SizeIn   KBOut  PktOut SizeOut  Errors
KBIn KB received/sec.
PktIn Packets received/sec.
SizeIn Average incoming packet size in KB
KBOut KB transmitted/sec.
PktOut Packets transmitted/sec.
SizeOut Average outgoing packet size in KB
Errs Count of current errors. Since these are typically infrequent, it is felt that reporting them as a rate would result in either not seeing them OR round-off hiding their values.

Lustre

Lustre Client, collectl -sl

There are several formats here, controlled by the --lustopts switch. Detail data is available for these as well: specifying -sL results in data broken out by file system and --lustopts O further breaks it out by OST. Also note the average read/write sizes are only reported when --lustopts is not specified.

# LUSTRE CLIENT SUMMARY
# KBRead  Reads SizeKB  KBWrite Writes SizeKB
KBRead KB/sec delivered to the client.
Reads Reads/sec delivered to the client, not necessarily from the lustre storage servers.
SizeKB Average read size in KB
KBWrite KB/sec delivered to the storage servers.
Writes Writes/sec delivered to the storage servers.
SizeKB Average write size in KB
# LUSTRE CLIENT SUMMARY: METADATA
# KBRead  Reads KBWrite Writes  Open Close GAttr SAttr  Seek Fsynk DrtHit DrtMis
KBRead KB/sec delivered to the client.
Reads Reads/sec delivered to the client, not necessarily from the lustre storage servers.
KBWrite KB/sec delivered to the storage servers.
Writes Writes/sec delivered to the storage servers.
Open File opens/sec
Close File closes/sec
GAttr getattrs/sec
Seek seeks/sec
Fsync fsyncs/sec
DrtHit dirty hits/sec
DrtMis dirty misses/sec
# LUSTRE CLIENT SUMMARY: READAHEAD
# KBRead  Reads KBWrite Writes  Pend  Hits Misses NotCon MisWin FalGrb LckFal  Discrd ZFile ZerWin RA2Eof HitMax  Wrong
KBRead KB/sec delivered to the client.
Reads Reads/sec delivered to the client, not necessarily from the lustre storage servers.
KBWrite KB/sec delivered to the storage servers.
Writes Writes/sec delivered to the storage servers.
Pend Pending issued pages
Hits prefetch cache hits
Misses prefetch cache misses
NotCon The current pages read that were not consecutive with the previous ones.
MisWin Miss inside window. The pages that were expected to be in the prefetch cache but weren't. They were probably reclaimed due to memory pressure
LckFal Failed grab_cache_pages. Tried to prefetch page but it was locked.
Discrd Read but discarded. Prefetched pages (but not read by the application) have been discarded either because of memory pressure or lock revocation.
ZFile Zero length file.
ZerWin Zero size window.
RA2Eof Read ahead to end of file
HitMax Hit maximum readahead issue. The read-ahead window has grown to the maximum specified by max_read_ahead_mb
# LUSTRE CLIENT SUMMARY: RPC-BUFFERS (pages)
#RdK  Rds   1K   2K   ...  WrtK Wrts   1K   2K   ...
This display shows the size of rpc buffer distribution buckets in K-pages. You can find the page size for your system in the header (collectl --showheader).

RdK KBs read/sec
Rds Reads/sec
nK Number of pages of this size read
WrtK KBs written/sec
Wrts Writes/sec
nK Number of pages of this size written

Lustre Meta-Data Server, collectl -sl

As of Lustre 1.6.5, the data reported for the MDS changed, breaking out the Reint data into 5 individual buckets, which are the last 5 fields described below. For earlier versions those 5 fields will be replaced by a single one named Reint.

# LUSTRE MDS SUMMARY
#Getattr GttrLck  StatFS    Sync  Gxattr  Sxattr Connect Disconn Create   Link Setattr Rename Unlink
Getattr Number of getattr calls, for example lfs osts. Note that this counter is not incremented as the result of ls - see Gxattr
GttrLck These are getattrs that also return a lock on the file
StatFS Number of stat calls, for example df or lfs df. Note that lustre caches data for up to a second so many calls within a second may only show up as a single statfs
Sync Number of sync calls
Gxattr Extended attribute get operations, for example getfattr, getfacl or even ls. Note that the MDS must have been mounted with -o acl for this counter to be enabled.
Sxattr Extended attribute set operations, for example setfattr or setfacl
Connect Client mount operations
Disconn Client umount operations
Create Count of mknod and mkdir operations, also used by NFS servers internally when creating files
Link Hard and symbolic links, for example ln
Setattr All operations that modify inode attributes including chmod, chown, touch, etc
Rename File and directory renames, for example mv
Unlink File/directory removals, for example rm or rmdir

The following display is very similar to the RPC buffers in that the sizes of different size I/O requests are reported. In this case these are requests sent to the disk driver. Note that this report is only available for HP's SFS.

# LUSTRE DISK BLOCK LEVEL SUMMARY
#Rds  RdK 0.5K   1K   ...  Wrts WrtK 0.5K   1K   ...
Rds Reads/sec
RdK KBs read/sec
nK Number of blocks of this size read
Wrts Writes/sec
WrtK KBs written/sec
nK Number of blocks of this size written

Lustre Object Storage Server, collectl -sl

# LUSTRE OST SUMMARY
# KBRead   Reads  SizeKB KBWrite  Writes  SizeKB
KBRead KB/sec read
Reads Reads/sec
SizeKB Average read size in KB
KBWrite KB/sec written
Writes Writes/sec
SizeKB Average write size in KB

Lustre Object Storage Server, collectl -sl --lustopts B

As with client data, you only get read/write average sizes when --lustopts is not specified.

# LUSTRE OST SUMMARY
#<--------reads-----------|----writes-----------------
#RdK  Rds   1K   2K   ...  WrtK Wrts   1K   2K   ....
RdK KBs read/sec
Rds Reads/sec
nK Number of pages of this size read
WrtK KBs written/sec
Wrts Writes/sec
nK Number of pages of this size written

Lustre Object Storage Server, collectl -sl --lustopts D

# LUSTRE DISK BLOCK LEVEL SUMMARY
#RdK  Rds 0.5K   1K   ...   WrtK Wrts 0.5K   1K   ...
RdK KBs read/sec
Rds Reads/sec
nK Number of blocks of this size read
WrtK KBs written/sec
Wrts Writes/sec
nK Number of blocks of this size written

Memory, collectl -sm

# MEMORY SUMMARY
#<-------------------------------Physical Memory-------------------------------------><-----------Swap------------><-------Paging------>
#   Total    Used    Free    Buff  Cached    Slab  Mapped    Anon  Commit Locked Inact Total  Used  Free   In  Out Fault MajFt   In  Out
Total Total physical memory
Used Used physical memory. This does not include memory used by the kernel itself.
Free Unallocated memory
Buff Memory used for system buffers
Cached Memory used for caching data between the kernel and disk, noting direct I/O does not use the cache
Slab Memory used for slabs, see collectl -sY
Mapped Memory mapped by processes
Anon Anonymous memory. NOTE - this is included with mapped memory in brief format
Commit According to RedHat: "An estimate of how much RAM you would need to make a 99.99% guarantee that there never is OOM (out of memory) for this workload."
Locked Locked Memory
Inactive Inactive pages. On earlier kernels this number is the sum of the clean, dirty and laundry pages.
Swap Total Total Swap
Swap Used Used Swap
Swap Free Free Swap
Swap In Kb swapped in/sec
Swap Out Kb swapped out/sec
Fault Page faults/sec resolved by not going to disk
MajFt These page faults are resolved by going to disk
Paging In Total number of pages read by block devices
Paging Out Total number of pages written by block devices

Notes
If you include --memopts R, memory and swap values will be displayed as changes/sec between intervals rather than absolute values, in addition to page fault information, which is already displayed as rates. This switch will also honor -on in that the values will not be normalized to a rate but rather displayed as changes in size per interval.
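
The difference between a normalized rate and a per-interval change can be sketched as follows. This is an illustration only: the function and its normalize parameter are hypothetical and merely mirror the behavior described for -on, and the sample values are made up.

```python
def change(previous, current, interval_secs, normalize=True):
    """Change between two samples, per second by default or per interval."""
    delta = current - previous
    return delta / interval_secs if normalize else delta

# Free memory drops from 4096 MB to 3996 MB over a 10 second interval.
print(change(4096, 3996, 10))                   # -10.0 (MB/sec)
print(change(4096, 3996, 10, normalize=False))  # -100 (MB per interval)
```

Note that values can be negative here, unlike the counters behind most other rates, because memory and swap sizes can shrink as well as grow.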

If you include --memopts with P or V, collectl will only display Physical or Virtual memory. The default is PV and will display both.

Memory, collectl -sm --memopts ps

The p and s options allow you to display data about page and/or steal and scan information. If you want this data combined with the standard physical or virtual data you must explicitly request them as well. The columns show how the memory is allocated for the respective sections.

# MEMORY SUMMARY
#<---Other---|-------Page Alloc------|------Page Refill-----><------Page Steal-------|-------Scan KSwap------|------Scan Direct----->
#  Free Activ   Dma Dma32  Norm  Move   Dma Dma32  Norm  Move   Dma Dma32  Norm  Move   Dma Dma32  Norm  Move   Dma Dma32  Norm  Move
    14M  136K     2    69   13M     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0

Network, collectl -sn

The entries for error counts in the following are actually the total of several types of errors. To get individual error counts, you must either include --netopts e or report details on individual interfaces in plot format by specifying -P. Transmission errors are categorized by errors, dropped, fifo, collisions and carrier. Receive errors are broken out for errors, dropped, fifo and framing errors.

If you specify filtering with --netfilt, the names that match the pattern(s) will either be included in or excluded from the summary data. However, the data will still be collected, so if recorded to a file it can be viewed later.

# NETWORK SUMMARY (/sec)
# KBIn  PktIn SizeIn  MultI   CmpI  ErrsI  KBOut PktOut  SizeO   CmpO  ErrsO
KBIn Incoming KB/sec
PktIn Incoming packets/sec
SizeI Average incoming packet size in bytes
MultI Incoming multicast packets/sec
CmpI Incoming compressed packets/sec
ErrsI Total incoming errors/sec. This is an aggregation of incoming error counters. To see explicit error counters use --netopts e
KBOut Outgoing KB/sec
PktOut Outgoing packets/sec
SizeO Average outgoing packet size in bytes
CmpO Outgoing compressed packets/sec
ErrsO Total outgoing errors/sec. This is an aggregation of outgoing error counters. To see explicit error counters use --netopts e

Network, collectl -sn --netopts e

This alternative format, which is displayed when you specify --netopts e enumerates the individual error types. You cannot see both output formats at the same time.

# NETWORK ERRORS SUMMARY (/sec)
#  ErrIn  DropIn  FifoIn FrameIn    ErrOut DropOut FifoOut CollOut CarrOut
ErrIn Receive errors/sec detected by the device driver
DropIn Receive packets dropped/sec
FifoIn Receive packet FIFO buffer errors/sec
FrameIn Receive packet framing errors/sec
ErrOut Transmit errors/sec detected by the device driver
DropOut Transmit packets dropped/sec
FifoOut Transmit packet FIFO buffer errors/sec
CollOut Transmit collisions/sec detected on the interface
CarrOut Transmit packet carrier loss errors detected/sec
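
To make the relationship between the aggregated ErrsI/ErrsO columns and the individual error types concrete, here is a rough sketch. It assumes the counters come from /proc/net/dev with its usual 8 receive and 8 transmit columns (an assumption about the data source); the function names are hypothetical and the sample line is made up. The values in /proc/net/dev are cumulative, so a real tool would difference two samples before reporting rates.

```python
# Column order of /proc/net/dev (assumed data source).
RX = ["bytes", "packets", "errs", "drop", "fifo", "frame", "compressed", "multicast"]
TX = ["bytes", "packets", "errs", "drop", "fifo", "colls", "carrier", "compressed"]

def parse_net_dev_line(line):
    """Split one interface line into its name and receive/transmit counters."""
    name, _, rest = line.partition(":")
    nums = [int(x) for x in rest.split()]
    return name.strip(), dict(zip(RX, nums[:8])), dict(zip(TX, nums[8:16]))

def total_errors(rx, tx):
    """Aggregate the error types listed above into one count per direction."""
    errs_in = rx["errs"] + rx["drop"] + rx["fifo"] + rx["frame"]
    errs_out = tx["errs"] + tx["drop"] + tx["fifo"] + tx["colls"] + tx["carrier"]
    return errs_in, errs_out

# Made-up counters for illustration.
line = "  eth0: 1000 10 1 2 0 3 0 0 2000 20 0 1 0 4 5 0"
name, rx, tx = parse_net_dev_line(line)
print(name, total_errors(rx, tx))
```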

NFS, collectl -sf

As of version 3.2.1, by default collectl collects and reports on all versions of nfs data, both clients and servers. One can limit the types of data reported with --nfsfilt and if only server or client data has been selected, only that type of data will be reported as shown in the 2 forms below. When both server and client data are being reported they will be displayed side by side. As with brief format, if filters have been selected they will be displayed in the header.

# NFS SUMMARY (/sec)
#<---------------------------server--------------------------->
# Reads Writes Meta Comm  UDP   TCP  TCPConn  BadAuth  BadClnt
Reads Total reads/sec
Writes Total writes/sec
Meta Total nfs meta data calls/sec, where meta data is considered to be any of: lookup, access, getattr, setattr, readdir and readdirplus. Note that not every nfs version reports all of these the way V3 clients/servers do.
Comm Total commits/sec
UDP Number of UDP packets/sec
TCP Number of TCP packets/sec
TCPConn Number of TCP connections/sec
BadAuth Number of authentication failures/sec
BadClnt Number of unknown clients/sec
# NFS SUMMARY (/sec)
#<----------------client---------------->
# Reads Writes Meta Comm Retrans  Authref
Reads Total reads/sec
Writes Total writes/sec
Meta Total nfs meta data calls/sec, where meta data is considered to be any of: lookup, access, getattr, setattr, readdir and readdirplus. Note that not every nfs version reports all of these the way V3 clients/servers do.
Comm Total commits/sec
Retrans Number of retransmissions/sec
Authref Number of authrefreshes/sec

NFS, collectl -sf -nfsopts C

The data reported for clients is slightly different, specifically the retrans and authref fields.

# NFS CLIENT (/sec)
#<----------RPC---------><---NFS V3--->
#CALLS  RETRANS  AUTHREF    READ  WRITE
Calls Number of RPC calls/sec
Retrans Retransmitted calls
Authref Authentication failed
Read Number of reads/sec
Write Number of writes/sec

Slabs, collectl -sy

As of the 2.6.22 kernel, there is a new slab allocator, called SLUB, and since there is not a 1:1 mapping between what it reports and the older slab allocator, the format of this listing will depend on which allocator is being used. The following format is for the older allocator.

# SLAB SUMMARY
#<------------Objects------------><--------Slab Allocation-------><--Caches--->
#  InUse   Bytes    Alloc   Bytes   InUse   Bytes   Total   Bytes  InUse  Total
Objects
InUse Total number of objects that are currently in use.
Bytes Total size of all the objects in use.
Alloc Total number of objects that have been allocated but not necessarily in use.
Bytes Total size of all the allocated objects whether in use or not.
Slab Allocation
InUse Number of slabs that have at least one active object in them.
Bytes Total size of all the slabs.
Total Total number of slabs that have been allocated whether in use or not.
Bytes Total size of all the slabs that have been allocated whether in use or not.
Caches
InUse Not all caches are actually in use. This includes only those with non-zero counts.
Total This is the count of all caches, whether currently in use or not.

This is the format for the new SLUB allocator:

# SLAB SUMMARY
#<---Objects---><-Slabs-><-----memory----->
# In Use   Avail  Number      Used    Total
One should note that this report summarizes those slabs being monitored. In general this represents all slabs, but if filtering is being used these numbers will only apply to those slabs that have matched the filter.

Objects
InUse The total number of objects that have been allocated to processes.
Avail The total number of objects that are available in the currently allocated slabs. This includes those that have already been allocated to processes.
Slabs
Number This is the number of individual slabs that have been allocated and taking physical memory.
Memory
Used Used memory corresponds to those objects that have been allocated to processes.
Total Total physical memory allocated to processes. When there is no filtering in effect, this number will be equal to the Slabs field reported by -sm.

Sockets, collectl -ss

# SOCKET STATISTICS
#      <-------------Tcp------------->   Udp   Raw   <---Frag-->
#Used  Inuse Orphan    Tw  Alloc   Mem  Inuse Inuse  Inuse   Mem
Used Total number of sockets allocated, which can include additional types such as domain.
Tcp
Inuse Number of TCP connections in use
Orphan Number of TCP orphaned connections
Tw Number of connections in TIME_WAIT
Alloc TCP sockets allocated
Mem
Udp
Inuse Number of UDP connections in use
Raw
Inuse Number of RAW connections in use
Frag
Inuse
Mem

TCP, collectl -st

These are the counters one sees when running the command netstat -s, whose output is very verbose. Since this format is an attempt to compress those field names to 6 characters or less, sometimes something gets lost in the translation. As described in the brief data formats, the actual TCP data displayed is based on the value of --tcpfilt and, like brief data, everything is displayed on a single line which can be quite wide. That is even more reason to use this switch, especially since the default format is over 200 columns wide! The following definitions are based on the value of that filter:

--tcpfilt i

# TCP SUMMARY (/sec)# TCP STACK SUMMARY (/sec)
#<----------------------------------IpPkts----------------------------------->
# Receiv Delivr Forwrd DiscdI InvAdd   Sent DiscrO ReasRq ReasOK FragOK FragCr
Receiv- total packets received/sec
Delivr- incoming packets delivered/sec
Forwrd- packets forwarded
DiscdI- discarded incoming packets
InvAdd- packets received with invalid addresses
Sent - requests sent out/sec
DiscrO- discarded outbound requests
ReasRq- reassembled requests
ReasOK- reassembled OK
FragOK- fragments received OK
FragCr- fragments created

--tcpfilt t

# TCP SUMMARY (/sec)# TCP STACK SUMMARY (/sec)
#<---------------------------------Tcp--------------------------------->
# ActOpn PasOpn Failed ResetR  Estab   SegIn SegOut SegRtn SegBad SegRes
ActOpn- active connections opened/sec
PasOpn- passive connection opened/sec
Failed- failed connection attempts
ResetR- connection resets received
Estab - connections established
SegIn - segments received/sec
SegOut- segments sent out/sec
SegRtn- segments retransmitted
SegBad- bad segments received
SegRes- resets sent

--tcpfilt u

# TCP SUMMARY (/sec)# TCP STACK SUMMARY (/sec)
#<------------Udp----------->
#  InDgm OutDgm NoPort Errors
InDgm- packets received/sec
OutDgm- packets sent/sec
NoPort- packets received to unknown port
Errors- packet receive errors

--tcpfilt c

# TCP SUMMARY (/sec)# TCP STACK SUMMARY (/sec)
#<----------------------------Icmp--------------------------->
# Recvd FailI UnreI EchoI ReplI  Trans FailO UnreO EchoO ReplO
Recvd- ICMP messages received
FailI- incoming ICMP messages failed
UnreI- input destination unreachable
EchoI- input echo requests
ReplI- input echo replies
Trans- ICMP messages sent
FailO- outbound ICMP messages failed
UnreO- output destination unreachable
EchoO - output echo requests
ReplO - output echo replies

--tcpfilt T

# TCP SUMMARY (/sec)# TCP STACK SUMMARY (/sec)
#<------------------------------------------TcpExt----------------------------------------->
# FasTim Reject DelAck QikAck PktQue PreQuB HdPdct AkNoPy PreAck DsAcks RUData REClos  SackS
FasTim- TCP sockets finished time wait in fast timer
Reject- packet rejects in established connections because of timestamp
DelAck- delayed ACKs sent
QikAck- times quick ACK mode activated
PktQue- packets directly queued to recvmsg prequeue
PreQuB- bytes directly received in process context from prequeue
HdPdct- packet headers predicted
PurAck- acknowledgements for received packets not containing data
HPAcks- predicted acknowledgements
DsAcks- DSACKS sent for old packets
RUData- connections reset due to unexpected data
REClos- connections reset due to early close
SackS- SackShiftFallback
PkLoss- Packet Loss
FTrans- Fast Retransmission
updated Nov 02, 2016

Data Definitions

Introduction

Collectl can generate output in either summary or detail format. Furthermore, summary data can be generated in either brief or verbose format. Although verbose is the most complete, the brief format, which is also the default, is the one viewed by most users.
updated Feb 21, 2011

Exception Reporting

By default, collectl always reports all data for all devices. However, in the cases where there are dozens or possibly hundreds of devices such as with large disk farms, it may be desirable to only look at those devices that are actually doing something of interest. These are referred to as exceptions, because their activity has crossed a level of minimal activity. The defaults for these levels can be displayed with the -V switch or changed to different values with the -l switch. To change one or more values simply specify them as a string. There are currently 4 levels one can set:

Note that one can also specify whether all conditions must be met or just 1 must be met by adding a selection of AND (the default) or OR, respectively.

For example, to set the minimal SVC level to 50 and require both SVC and IOS limits be reached, simply add the switch -l SVC:50. To change both values and require only 1 be met, separate them with a hyphen and be sure to include OR as one of the parameters such as -l SVC:50-IOS:10-OR, noting that order is not important.
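
As an illustration of how such a string breaks down, here is a minimal Python sketch. It is not collectl's own parser: the function names, the use of >= for "reached", and the sample values are assumptions made for this example, and the defaults collectl applies to levels that are not named (see -V) are not modelled.

```python
def parse_limits(spec):
    """Split a hyphen-separated -l style string into thresholds and a mode.

    Hypothetical helper: order does not matter, and AND is used unless
    OR appears as one of the parameters, as described in the text.
    """
    limits = {}
    mode = "AND"
    for part in spec.split("-"):
        if part.upper() in ("AND", "OR"):
            mode = part.upper()
        else:
            name, value = part.split(":", 1)
            limits[name] = float(value)
    return limits, mode


def is_exception(sample, limits, mode):
    """Return True when the sample reaches the limits under the given mode.

    Assumption: a limit counts as reached when the value is >= the limit.
    """
    hits = [sample[name] >= limit for name, limit in limits.items()]
    return all(hits) if mode == "AND" else any(hits)


limits, mode = parse_limits("SVC:50-IOS:10-OR")
busy_disk = {"SVC": 60, "IOS": 5}   # made-up values for one device
print(limits, mode, is_exception(busy_disk, limits, mode))
```

With OR the made-up device above qualifies because SVC alone reaches its limit; with AND it would not, since IOS stays below 10.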

One should not confuse exceptions which are based on threshold values, with filters which are based on the presence of explicit field values.
updated Feb 21, 2011

Running As A Service

Assuming collectl has been installed from the rpm kit, it has been configured to be run as a service, but disabled from automatically starting at boot. To enable it, simply run chkconfig collectl on, noting that by default collectl is configured to collect most data. To see what the specific subsystems are, execute collectl -V and look at the daemon default values for -s. You should then look at the DaemonCommands string in /etc/collectl.conf to see if any changes to -s have been explicitly set. At the time of this writing, collectl has been further configured to add slab and process data to the base defaults.

Further inspection of this command string will show the daemon has also been configured to write all its data to a set of compressed text files in /var/log/collectl, which was created when the kit was installed. To verify collectl will properly run as a service, you can execute the command /etc/init.d/collectl start (or as a shortcut on a redhat system use the command service collectl start) and examine the log file in /var/log/collectl for the startup (and hopefully no termination) messages as well as the appearance of either a raw or raw.gz data file in that same directory. Note that since the output is buffered, the data file will probably have a length of 0 until the buffer fills, the flush interval (currently set to 60 seconds) passes, or the command /etc/init.d/collectl flush is executed, whichever comes first.

In order to write its output as a compressed file, the perl Compress module must be present as it is with newer perl distributions. If not present you should install it, otherwise you will get messages warning you that compression is not installed.

To change any behaviors of the daemon such as the flush interval, output file location, etc., simply change the DaemonCommands line in /etc/collectl.conf, which specifies the actual command string collectl is passed at startup. Use care in setting this string as incorrect settings may cause collectl to abnormally exit and if it does, you should examine the log file for messages.

Caution about pipes in DaemonCommands

Since some of the filters can include pipes, one might choose to use the perl form of "abc|def|xyz" when using them interactively, having to use quotation marks to prevent the shell from acting on them. However if you include the quotes in the DaemonCommands line, the filters will not work correctly as collectl will see the quotes as part of the filter itself.

One-time modification of runtime parameters

If you want to change the way collectl runs as a daemon for a specific instance, you can pass the normal collectl switches to the start script as its second parameter (more on the first parameter later). For example, to start collectl with a monitoring interval of 15 seconds, just start it as follows:

/etc/init.d/collectl start '-i 15'

The next time it is started (or restarted) it will use its default values.

Running multiple instances of a collectl daemon

By default, collectl only supports running a single instance of a daemon and if you try to start a second you will get an error message. However, there may be times you really want to run a second instance, most typically if you want to collect a subset of data at a different monitoring interval. To do this one uses an alternative syntax which prefaces the parameters with a string such as test, as in the following example:

/etc/init.d/collectl start test -i15

Also note in this example quotes weren't needed because there were no spaces in the second argument. In this case a process named collectl-test will be created and use the argument -i15. You must be careful when using this format because if you leave off the second argument you'd actually start the main process with the invalid switch of test.

To perform other operations on this second instance, such as stop or flush, simply add the test qualifier to the command. If you want to restart one of these instances be sure to include the appropriate arguments because you must use the 2 argument form of the command. This syntax was also chosen to assure the user does use additional switches because without them you'd essentially be running an identical copy of the default configuration.
updated Sep 5, 2013

Collectl Frequently Asked Questions

General Questions

Running collectl

Gotchas - what you don't know can hurt you

Operational Problems

General Questions

What is the difference between collectl and sar?

At the highest level, both collectl and sar provide lightweight collection of device performance information. However, when used in a diagnostic mode sar falls short on a number of points, though admittedly some could be addressed by wrapping it with scripts that reformat the data:

Isn't a default monitoring frequency of 1 second going to kill my system?

Running collectl interactively at a 1 second interval has been shown to provide minimal load. However, for running collectl for long periods of time it is recommended to use a monitoring period of 10 seconds, which is in fact the default when collectl is run as a daemon and started using the 'service collectl start' command.
A lot of effort has gone into making collectl very efficient in spite of the fact that it's written in an interpretive language like perl, which by the way is known for its efficiency. collectl has been measured to use less than 0.01% of the cpu on most systems at an interval of 10 seconds. To measure collectl's load on your own system you can use the command "time collectl -i0 -c8640 -s??? -f." to see the load of collecting a day's worth of data for the specific subsystems included with the -s switch.
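
The -c8640 in that command is simply the number of 10 second samples in one day, which a couple of lines of Python confirm. The variable names are made up for this illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60   # 86400
daemon_interval = 10             # seconds, the daemon default mentioned above

samples_per_day = SECONDS_PER_DAY // daemon_interval
print(samples_per_day)
```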

What is the best monitoring frequency?

There really isn't a 'best' per se. In general collecting counter data every 10 seconds and process/slab data every minute has been observed to produce a maximum amount of data with a minimal load. When this granularity isn't sufficient there have been uses for collecting data at 0.1 second intervals! There have even been times, when wanting to verify that a short lived process really does start, that process monitoring by name at an interval of 0.01 seconds has been found to be useful.

Why so many switches?

In general, most people will not need most switches and that's the main reason for 'basic' vs 'extended' help. However, it's also possible that there may be an extended switch that provides some specific piece of functionality not there with the basic ones and it is recommended that once you feel more comfortable with the basic operations that you spend a little time looking at them too.

Why doesn't --top show as much data as the top command?

The simple answer is because this is collectl, not top. Actually I thought of that and then decided with all the different switches and options, the easiest thing to do is just run a second instance of collectl in another window, showing whatever else you want to see in whatever format you like. You can even pick different monitoring intervals.

What does collectl stand for?

Collectl is based on the very popular collect tool written by Rob Urban, which was distributed with DEC's Tru64 Unix Operating System, and therefore stands for collect for linux.

How do you pronounce collectl?

It rhymes with pterodactyl.

Why is the default socket port 2655?

Those are the first 4 digits of collectl on a telephone numeric key pad.
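
The mapping is easy to check. The short Python sketch below uses the standard telephone keypad letter layout; the helper name keypad_digits is made up for this illustration.

```python
# Standard telephone keypad letter groups.
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
LETTER_TO_DIGIT = {
    letter: digit for digit, letters in KEYPAD.items() for letter in letters
}


def keypad_digits(word, count):
    """Return the keypad digits for the first `count` letters of `word`."""
    return "".join(LETTER_TO_DIGIT[ch] for ch in word.lower()[:count])


print(keypad_digits("collectl", 4))
```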

Running collectl

How do I get started?

The easiest way to get started is to just type 'collectl'. It will report summary statistics on cpu, disk and network once a second. If you want to change the subsystems being reported on use -s and to change the interval use -i. More verbose information can be displayed with --verbose. See the man pages for more detail.

How do I make a plot?

Collectl supports saving data in plot format - space separated fields - through the use of the -P switch. The resultant output can then be easily plotted using gnuplot, excel or any other packages that understand this format. You can redirect collectl's output to a file OR it's much easier to just use the -f switch to specify a location to write the data.

How do I drill down to get a closer look at what's going on?

The first order of business is to familiarize yourself with the types of data collectl is capable of collecting. This is best done by looking at the data produced by all the different settings for -s, both lower and upper case as there is some detail data that is not visible at the summary level. Take a look at -sd and -sD. If you still don't see something it might actually be written in -P format. See -sT for an example.
Next, run collectl and instruct it to log everything (or at least as much as you think you'll need) to a file. When you believe you've collected enough data - and this could span multiple days - identify times of interest or just plot everything (see the -P switch). Visually inspecting the plotted data can often show times of unusually heavy resource loads. Often times there is a strong time delineation between good and bad.
If you want to see the actual numbers in the data as opposed to plots, play back the data using the --from switch to select a begin time, usually a few samples prior to the time when things started to go bad. To reduce the amount of output you can also use --thru to set the end time for the collection. You can also start selecting specific subsystems to look at as well as individual devices. For example, if you've discovered that at 11:03 there was an unusual network load, try 'collectl -p filename --from 11:02 --thru 11:05 -sN' to see the activity at each NIC.
And don't forget process and/or slab activity if either has been collected. You can also play back this data at specific time intervals too.

I want to look at detail data but forgot to specify it when I collected the data. Now what?

Good news! With the exception of CPU data, collectl always collects detail data whether you ask for it or not - that's how it generates the summaries. When you extract data into plot format, by default it extracts the data based on the switches you used when you collected it. So, if you specified -sd you'll only see summary data when you extract it. BUT if you include -s+D during the generation of plotting data you WILL generate disk details as well.

How do I configure collectl to run all the time as a service?

Use chkconfig to change collectl's setting to 'on'. On boot, collectl will be automatically started. To start collectl immediately, type 'service collectl start'.

How do I change the monitoring parameters for the 'service'?

Edit /etc/collectl.conf and add any switches you like to the 'DaemonCommands' line. To verify these are indeed compatible (some switches aren't), cut/paste that line into a collectl run command to make sure they work before trying to start the service.

What are the differences between --rawtoo, --lexpr and --sexpr?

Looking at --lexpr and --sexpr first, these will cause the contents of most counters to be written as a list or s-expression to either a file or socket based on whether -f or -A is specified. If both are specified the data will be sent over the socket and written locally as a raw file. Adding -P will cause the local file to be written in plot format while adding --rawtoo will cause both plot and raw files to be written locally.

For more information also see Logging.

How can I pass collectl data to other applications?

You actually have several choices, all of which are based on --export in which you specify a routine that will export collectl's output in some other format that your application may prefer to see it in. There are currently 2 such routines: If you don't like either, you're free to write your own.

The next thing you need to decide is whether you simply want to write a data snapshot to a local file which some other program/script can retrieve OR send the data over a socket to your application. Clearly using the socket is more efficient, but the choice is all yours.

An example parser script called readS has been provided as a convenient way to parse a file that is an s-expression, but just keep in mind that it is written in perl and every invocation involves starting up the perl interpreter, which may be a little heavier-weight than you wish to use.

If you do choose to use it, the arguments to readS take the following form:

dir category variable [instance [divisor]]

Detailed customization instructions for use of data returned by readS are beyond the scope of this FAQ.

Gotchas

This section is for those things people might otherwise miss in the FAQ and shouldn't, because they can cause misunderstanding of what is being reported.

Round-off vs normalization

Since most of the numbers collectl reports are large, certainly greater than the number of seconds in an interval, they are not affected and so this is rarely an issue. However there are values that can be quite low such as the number of processes created/second or network errors. This can also be the case for many of the nfs metadata operations counters, some of which report values as low as 1 over many minutes. The issue is if you're looking at data at some interval other than 1 second, the result of dividing these numbers by the interval size can result in values less than 0.5 and since collectl rounds off, those values would be reported as 0 and you could miss a critical data point. If you tell collectl not to normalize the data, in other words NOT to divide by the interval, you'll always see the actual numbers. However, since most people prefer values/sec, it's easy to forget. Try not to...
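
A small worked example makes the effect concrete. This Python sketch only mimics the arithmetic described above: the function name is hypothetical, and rounding half up is an assumption about how the round-off behaves.

```python
def reported_rate(count, interval_secs, normalize=True):
    """Value shown for a counter that advanced by `count` in one interval.

    With normalization the count is divided by the interval length and
    rounded to a whole number (assumed here to round half up). Without it
    the actual count is shown.
    """
    if not normalize:
        return count
    return int(count / interval_secs + 0.5)


# 3 network errors during a 10 second interval:
print(reported_rate(3, 10))                    # normalized: 0.3/sec shows as 0
print(reported_rate(3, 10, normalize=False))   # not normalized: the real 3
```

Any count below half the interval length vanishes when normalized, which is exactly the kind of critical data point that can be missed.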

Operational Problems

Why won't collectl run as a service?

As configured, collectl will write its date/time named log files to /var/log/collectl, rolling them every day just after midnight and retaining one week's worth. In addition it also maintains a 'message log' file named for the host, year and month, eg hostname-200508.log - the creation of the message log is driven off the -m switch in DaemonCommands. Check this log for any messages that should explain what is going on.
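
To know which message log to look for, the name can be put together from the host, year and month as in the example above. This is a sketch of the naming pattern only; the helper name is made up and the host is passed in rather than looked up.

```python
def message_log_name(host, year, month):
    """Build a message log name of the form host-YYYYMM.log."""
    return f"{host}-{year:04d}{month:02d}.log"


print(message_log_name("hostname", 2005, 8))
```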

Why is my 'raw' file so big?

By default, collectl will collect a lot of data - as much as 10 or more MB/day! If the perl-Compress library is installed, these logs will automatically be compressed and are typically less than 2MB/day.
The output file size is also affected by the number of devices being monitored. In general, even on large systems the number of network interfaces is small and shouldn't matter, but if the number of disks gets very high, say in the dozens or more, this can begin to have an effect on the file size. The other big variable is the number of processes when collecting process data. As this number grows to the many hundreds (or more), you will see the size of the data file grow.
Finally the other parameter that affects size is the monitoring interval. The aforementioned sizes are based on the defaults which are process/slab monitoring once every 60 seconds and device monitoring once every 10 seconds. Did you override these and make them too small?

Playing back multiple files to terminal doesn't show file names

By design, collectl is expected to be used in multiple ways and a lot of flexibility in the output format has been provided. The most common way of using playback mode is to play back a single file and therefore the name of the file is not displayed. The -m switch will provide the file names as they are processed.

Why don't the averages/totals produced in brief mode look correct?

There may be two reasons for this, the most obvious being that by default the intermediate numbers are normalized into a /sec rate and the averages/totals are based on the raw numbers. If the monitoring interval is 1 sec or you use -on to suppress normalization, the results will be very close.
The other point to consider is that numbers are often stored at a higher resolution than displayed and so there is less round-off error with the averages and totals.

What does New slab created after logging started mean?

When collectl first starts, it builds a list of all the existing slabs. As the message states, collectl has discovered a new slab and adds it to its list. This is relatively rare but can also indicate collection was started too soon, possibly before system processes or applications have allocated system data structures. It is really just an informational message and can safely be ignored.

Why does collectl say waiting for 60 second sample... but doesn't?

This is very rare as it will only happen when collecting a small number of process or slab data samples, but it is also worth understanding what is happening because it gets into the internal mechanics of data collection. In addition to the normal counter collectl uses to collect most data, it also maintains a second one for coarser samples such as process and slab data. When reporting how long collectl is going to wait for a sample, it uses a number based on the type of data being collected. In almost all cases this is the value of the fine-grained counter, but if only collecting process or slab data, it reports the second counter whose default is 60 seconds.

Collection of counters, such as disk traffic or cpu load, always requires 2 samples since it's their difference that represents the actual value. Other data such as memory in use or process data only require a single sample but in order to synchronize all the values being reported, collectl always uses its first sampling interval to collect a base sample and doesn't actually report anything until the second sample is taken, which is why it reports the waiting... message even if it isn't being asked to report any counters.

Finally, the -c switch, which specifies the number of samples to collect, applies to the finer-grained counter. This means if you try to collect a number of samples that will cause the -c switch limit to be reached before any data is actually collected, you will see collectl exit without reporting anything! The best example of this would be the command collectl -sZ -c1. Since the default interactive sample counters are 1 and 60 seconds respectively and collectl has to actually take 2 samples, collectl will only run long enough for one tick of the fine-grained counter or 1 second and immediately exit with no output. Therefore to collect 1 process sample you will actually need to use -c60 but will also have to wait 60 seconds to see anything. Alternatively you could set the fine-grained sample counter to the same as the process sample counter and so the command collectl -i60:60 -sZ -c1 would also report 1 sample after waiting for 60 seconds. If you want to collect a sample after just 1 second, you should use collectl -i:1 -sZ -c1.
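
The four commands in this explanation can be checked against a deliberately simplified model. The Python sketch below is not how collectl is implemented; it assumes only what the text states, namely that -c counts ticks of the fine-grained interval and that a process sample is reported once per full second interval after the base sample. The function and parameter names are made up.

```python
def process_samples_reported(count, interval=1, interval2=60):
    """Process samples reported before the -c limit ends the run.

    count     -- value given to -c (ticks of the fine-grained interval)
    interval  -- fine-grained interval in seconds (interactive default 1)
    interval2 -- process/slab interval in seconds (default 60)
    """
    run_time = count * interval        # how long collectl keeps running
    return run_time // interval2       # full process intervals that fit


print(process_samples_reported(1))            # -sZ -c1
print(process_samples_reported(60))           # -sZ -c60
print(process_samples_reported(1, 60, 60))    # -i60:60 -sZ -c1
print(process_samples_reported(1, 1, 1))      # -i:1 -sZ -c1
```

Under this model the first command yields nothing while the other three each yield one sample, matching the behavior described above.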

Why am I not seeing exceptions only with -ox?

Exception processing requires --verbose. Did you forget to include it?

I'm seeing a bogus data point!

This message means collectl has read a corrupted network statistics record and is ignoring it. It also turns out this has been attributed to some bnx2 chips and a workaround has been generated for newer drivers. If you want a little more information you can read about it here.

The way collectl determines a record is bogus is to look at the transmit and receive rates for each interface and compare them to the speed of that interface from /sys/devices (OR, if no entry is found, the value of DefNetSpeed, which can be overridden in collectl.conf). If either exceeds twice the interface rate, the record is considered bogus and ignored. This will cause collectl to report the previous rate for this interval. While not foolproof, it is hoped this will reduce the frequency of this type of data.
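
The test just described can be sketched in a few lines of Python. This is an illustration of the rule, not collectl's code: the function names are hypothetical, rates and speed are assumed to be in the same units, and starting from (0, 0) when there is no earlier good record is an assumption.

```python
def is_bogus(rx_rate, tx_rate, interface_speed):
    """A record is bogus if either rate exceeds twice the interface speed."""
    limit = 2 * interface_speed
    return rx_rate > limit or tx_rate > limit


def clean_rates(samples, interface_speed):
    """Replace bogus (rx, tx) records with the previous good rates."""
    cleaned = []
    previous = (0, 0)  # assumption: nothing to repeat before the first good record
    for rx, tx in samples:
        if not is_bogus(rx, tx, interface_speed):
            previous = (rx, tx)
        cleaned.append(previous)
    return cleaned


# Made-up rates for an interface of speed 1000; the middle record is corrupt.
print(clean_rates([(100, 50), (5000, 60), (120, 70)], 1000))
```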

What does the error -sj or -sJ with -P also requires CPU details so add C or remove J. mean?

Interrupt reporting has a unique property in that summary data provides CPU specific data while detail data provides data about individual interrupts and you will get this error if you request interrupt plot data but not CPU detail data. The most common place this can happen is if you run collectl V2.5.0 as a daemon because it collects interrupt data but not CPU detail data.

In order to play back plot data from a file that did not specify CPU details be collected, you can either tell collectl not to include interrupts by the command collectl -s-j... or tell it to also include CPU details with the command collectl -s+C....

In order to make this less confusing with future releases, and until I think of a simpler way to do this, the collectl daemon will be set up to include CPU details, noting this has no impact on data collection but only on the playback. You can always request CPU detail data not be generated on playback but will now also have to request interrupts not be included as well by collectl -s-jC....

Why can't I see Process I/O statistics?

You need to be running version 2.6.22 of the kernel or greater and it must have process I/O statistics enabled. The easiest way to check is to see if /proc/self/io exists. If not, you don't have them enabled and will need to rebuild your kernel and the instructions for doing so are beyond the scope of this FAQ. If you do rebuild, make sure you have the following symbols enabled: CONFIG_TASKSTATS, CONFIG_TASK_XACCT and CONFIG_TASK_IO_ACCOUNTING.

I'm getting an error that formatit.ph can't be found

This component of collectl must be in the same directory as collectl itself. On startup collectl looks at the command used to start it and from there determines its location by following as many links as may be associated with that command. It then extracts its directory name from the last link (if any) in the chain. If one has set up a set of links such that the last one uses a relative path, when collectl prepends that path to formatit.ph it's likely not to find it and hence this message. To fix the problem simply specify the complete path in the final link.

When I use an interval >4 seconds I'm getting non-uniform sample times

Awhile back I found a problem on a SuSE 10 system that was running with a new version of glibc that changed the granularity of timers from microseconds to nanoseconds and therefore went from 32 to 64 bits. Guess what, 4.3 seconds is > 32 bits! Once I reported this to the author of HiRes he immediately (within hours) released version 1.91 which addressed the problem. A newer version of HiRes should be the remedy.
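
The 4 second boundary follows from simple arithmetic, shown here in Python. The reading that an interval expressed in nanoseconds no longer fits in an unsigned 32-bit value is an interpretation of the remark above, not something taken from the HiRes source.

```python
MAX_32BIT = 2**32            # number of values an unsigned 32-bit field can hold
NANOS_PER_SEC = 10**9

# Longest interval whose nanosecond count still fits in 32 bits.
longest_fit_secs = MAX_32BIT / NANOS_PER_SEC
print(longest_fit_secs)      # a little under 4.3 seconds

four_secs_ns = 4 * NANOS_PER_SEC
four_point_three_ns = 4300 * 10**6   # 4.3 seconds, kept in integer arithmetic
print(four_secs_ns < MAX_32BIT, four_point_three_ns < MAX_32BIT)
```

An interval of 4 seconds still fits while 4.3 seconds does not, which matches the >4 second symptom in the question.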

I'm getting settimer messages on the console and in dmesg

This problem is actually another form of the previous one and is related to version 2.5 of glibc. See more details on what this means and how to correct it here.

Why is a set of process data missing during playback?

This actually applies to slab data too. Both types of data contain counters and in order to display them as rates, collectl needs to read a sample as a baselevel - that's why you see the waiting for n second sample message when collectl first starts. This applies to playback too and that's why when you playback data from a file you never see the first sample. Since process/slab data are usually collected at a different frequency, collectl has to read more intervals to get to the proper baselevel. As an example, consider data collected at 00:01AM. When you play it back, collectl is able to include processes/slabs in its baselevel since this data is typically collected on an integral minute boundary. It will therefore start playing back non-process data at 00:01:10 and process data at 00:02:00. In fact, if you play back data from 00:01:10 collectl reads data from the previous interval and will still report process data for 00:02:00. However if you play back data from 00:01:20 it will set its baselevel to 00:01:10 which does not contain process data and so will not report process data until 00:03:00.

Therefore, if you're not seeing process data for the time interval you've chosen use an earlier value for --from.
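
The timing in this example can be reproduced with a small model. The Python sketch below is a simplification built only from the description above: it assumes 10 and 60 second intervals, process samples on whole-minute boundaries, and a baselevel taken from the interval just before the requested start. The function name is made up.

```python
def first_process_report(from_secs, interval=10, interval2=60):
    """Time (seconds after midnight) of the first process data reported.

    from_secs -- the --from time in seconds after midnight
    """
    baselevel = from_secs - interval   # collectl reads the previous interval first
    # First process sample at or after the baselevel (rounded up to a
    # multiple of interval2); it serves as the process baselevel.
    process_base = -(-baselevel // interval2) * interval2
    return process_base + interval2    # the next process sample is reported


def hms(secs):
    """Format seconds after midnight as HH:MM:SS."""
    return f"{secs // 3600:02d}:{secs % 3600 // 60:02d}:{secs % 60:02d}"


print(hms(first_process_report(70)))   # --from 00:01:10
print(hms(first_process_report(80)))   # --from 00:01:20
```

Starting just 10 seconds later pushes the first process data out by a full minute, which is why an earlier --from value helps.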

I played back all summary data with -P but got a cpu detail file. Why?

This will happen when you try to convert interrupt data to a plot format file. In fact if you include -m you will see a message that this is automatically being done for you. The basic problem is that by definition summary data is written in a fixed format to help ease the automated processing of the tab file. However interrupt data by definition is variable since there is a counter for each CPU. By forcing a CPU detail file, there is now a place to put the interrupt data. To get rid of the message simply include cpu detail when in playback mode.

What does this message mean: -sj or -sJ with -P also requires CPU details so adding -sC

This message is displayed whenever you try to convert interrupt data to a file in plot format and have not included CPU detail data as explained in the previous section. This message is only displayed when -m is included.

Why are some process usernames being reported as numbers?

Collectl uses the usernames from /etc/passwd to associate with a process's UID. If it can't find that UID in the file there is no username to associate with it and so collectl will report the UID instead. This can happen when using NIS, in which case username information is stored remotely, OR if playing back a file on a different host than where the raw data was collected. To get around this problem, copy (or create) the passwd file with the correct relationships to the local system and point collectl to it with --passwd OR change the Passwd entry in collectl.conf to point to that file if you get tired of using --passwd.
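
The fallback is easy to picture with a short Python sketch. It is not collectl's code: it parses text in the standard colon-separated passwd layout (name, password, UID, ...), and the function names and sample entries are made up for this illustration.

```python
def load_usernames(passwd_text):
    """Map UID to username from passwd-format text (name:passwd:uid:...)."""
    names = {}
    for line in passwd_text.splitlines():
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) >= 3 and fields[2].isdigit():
            names[int(fields[2])] = fields[0]
    return names


def display_name(uid, names):
    """Return the username for a UID, or the UID itself when it is unknown."""
    return names.get(uid, str(uid))


sample = "root:x:0:0:root:/root:/bin/bash\nmjs:x:1000:1000::/home/mjs:/bin/bash\n"
names = load_usernames(sample)
print(display_name(1000, names), display_name(4242, names))
```

A UID that is missing from the file, such as one defined only in NIS or on another host, comes back as a number, which is the symptom described above.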

Why don't I get a prc file with -P and --rawtoo?

The data in the raw file is essentially identical to that in the prc file so the theory is why repeat it when you can easily create it from the raw file. Furthermore the raw file is already compressed and so is a lot smaller.

What happened to -sy in brief mode?

The brief form was always a hack because slab data is typically collected at a different frequency than everything else. With the 2.6 kernel, slab usage was included in the brief memory display anyway and was therefore less important via the -sy switch. With the move to supporting slab data in a separate file with -G/--group, that became an even better reason to drop it so I did.

Why do I get an empty file during play back with -sY/-sZ and -P -f?

Trying to get this right was just considered too much effort for too rare a case.

Why don't I get slb/prc files with -P, -sY/Z and --rawtoo but without --rawtoo I do?

Process data in raw format is almost identical to process data in plot format and it felt like a waste of space to replicate the data. Since slab data tends to be treated the same, at least in rawp files, it just came along for the ride. You can always force the generation of the slb/prc files by playing back the raw file with -sY/Z for a second pass.

I'm getting an error from RPM: perl >= 0:5.008000 is needed by collectl...

This seems to be a problem with some older versions of RPM. Include --nodeps to override the dependency check and collectl will install and run just fine.

What does this mean: New network device found: xxx?

When collectl was first developed, the assumption was that adding a new device, especially a network device, required powering off the machine and rebooting, which made it unlikely for a new device to show up after boot. Therefore, when collectl first starts up it identifies all the networks on the system and, if logging to a file, writes this information into the header so that during playback it can get all the network names without having to look into the file. If a new network device does become visible after collectl has started, or during playback, you'll see this message.

However, this message has recently been seen when running with InfiniBand, so a device obviously showed up after collectl started. The message also means that the new device has been added to the end of the list of known networks. If it should in fact have been inserted in the middle of that list, the results are unpredictable and will probably be wrong. Be sure to let me know if this occurs.

Clearly as newer versions of linux/hardware emerge, the types of hardware changes that can occur without shutting down the systems will no doubt increase and so perhaps more of these types of situations will occur.

What does this mean: NumNets in header is 'x' but only 'y' listed...?

This message only occurs during playback. Versions of collectl older than 3.5.1 updated the internal number of network devices when a new one was encountered in real-time or during playback, but they forgot to update the names used in the header as well. This meant that when a new logfile was created after midnight (or whenever it was rotated), the number of networks in the header changed but their names did not. If you see this message it is simply pointing out that fact, and you WILL later see New network device found: xxx messages as those new networks are found in the raw file during playback. This should not occur with raw files created after V3.5.1.

Why is -sC showing CPUs with no load and no idle?

Somewhere along the way the kernel started incorrectly updating the CPU usage stats in /proc/stat, and if that's the case with your kernel you'll see this behavior. There is really nothing collectl can do about it, as it relies on the kernel for accurate stats. Furthermore, before you run out and try a different tool, remember that all tools use /proc/stat to report CPU usage, so that won't help.

When did playing back a file start taking so long?

This problem was first discovered when a user upgraded to RHEL6.2, but it turns out to have nothing to do with that distro. The cause is the underlying compression library that collectl relies on, namely Compress::Zlib. Playback slowed down when that library moved on from 1.42 (the last rev of the 1.0 architecture), so any version in the 2.0 generation will show this problem. You can see which version of zlib you're using by running 'collectl -v'. If the playback times are too long for you, just do what I've been doing for years now and run collectl with -P --rawtoo; you'll get both a raw file and a plot file without a lot of extra overhead.

What does the message -D requires -f OR -A server mean?

First of all, this should only be seen by someone who has modified /etc/collectl.conf, because it already specifies -f. The point of the message is that when you run collectl as a daemon, it needs to know what to do with its output since there is no terminal available. That destination must be a file, OR collectl must be run as a server, sending its output over a socket.
updated July 30, 2013
collectl-4.3.1/docs/OperationalModes.html

Operational Modes

Depending on which combination of switches is selected, collectl will run in one of 3 main modes, with various options for added flexibility. The most basic mode, which you get if you don't select one of the other 2, is display. In this mode the output is displayed on the terminal in real-time as it is collected. In record mode, specified by the -f switch, data is written in real-time to a directory of the user's choosing with an optional prefix. In playback mode, selected with -p, data is read from a file that was generated in record mode at an earlier time.

The format of the results can also be selected as either Terminal or Plot. Terminal data is always displayed on the terminal while Plot data, selected by including -P with any of the 3 modes, can be either written to a file or displayed on the terminal. Since plot data is not intended for human consumption, the reason one would typically send it to a terminal would be with the intent of redirecting the output to a file or piping it into another script.

Using the -f, -p and -P switches in different combinations results in the following behaviors:

  • No switches: Data is displayed on the terminal as formatted text.
  • -P: Data is displayed on the terminal in Plot Format.
  • -f file: Raw data is written to the file (whose name is constructed by collectl) in the same format as it occurred in /proc, with the extension raw. For more details, see the section on file naming.
  • -f file -P: Data is written to the specified file in plot format, with one or more of a number of extensions depending on what detail data may have been requested.
  • -p file: Data is played back from the raw file specified by -p and displayed on the terminal as formatted text. To view a subset of the recorded data, include -s to select subsystems or --from/--thru to select a subset of the timeframe. Note that subsystems for which no data was recorded will be displayed as zeros. You can also change the format in which the data is displayed through various switches such as --verbose and -o.
  • -p file -P: Data is played back from the raw file and displayed on the terminal in Plot Format. Since this mode is often used to produce output for other tools/programs, you can force the output format by including -s, in which case only the specified subsystems are displayed. Subsystems for which no data was collected are again displayed as zeros, to ensure consistent formatting across multiple data files.
  • -p file1 -f file2: This is NOT supported, as data that is played back can only be written to another file in plot format. Someone wanting to do this should rethink what it is they are trying to do.
  • -p file1 -f file2 -P: Data is played back from the raw file and written to the specified file in Plot Format. Note that here too -s will force specific subsystems to be displayed.
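The switch combinations listed above can be condensed into a small decision function. The following Python sketch only restates that list; the function name, the parameter names and the returned labels are made up for illustration and do not correspond to anything inside collectl.

```python
def describe_mode(f=None, p=None, P=False):
    """Map the -f, -p and -P switches to (mode, format, destination).

    f and p stand for the arguments of -f and -p (None when absent),
    P is True when -P is given. All labels are illustrative only.
    """
    if p and f and not P:
        # Played-back data can only be written to a file in plot format.
        raise ValueError("-p with -f requires -P")
    mode = "playback" if p else ("record" if f else "display")
    destination = "file" if f else "terminal"
    if P:
        fmt = "plot"
    elif mode == "record":
        fmt = "raw"
    else:
        fmt = "text"
    return mode, fmt, destination


print(describe_mode())                          # no switches
print(describe_mode(f="/tmp/out"))              # -f file
print(describe_mode(p="x.raw", f="/tmp/out", P=True))  # -p file1 -f file2 -P
```

The one combination that raises an error is -p together with -f but without -P, which the list above marks as not supported.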
updated Feb 21, 2011
collectl-4.3.1/docs/Playback.html

Playback

Playing back one or more files

There are actually 2 reasons for playing back a file: one is to generate plottable files and the other is simply to examine the data in the same format you would see if running collectl interactively. The following discussion applies to both cases; the only real difference is that to generate plot files you include the switches -P and -f.

You tell collectl to play back one or more files using -p followed by any combination of one or more filenames separated with commas or whitespace, noting you need to quote the string if it contains spaces or wildcard characters. The files will be played back as if a single file with monotonically increasing sample numbers for each unique host. It should be noted that if these files contain samples for different subsystems the resultant stream will contain data elements for all, zero filling as appropriate. When this occurs, a message will be displayed if -m has been specified.

Collectl will generate plot format if requested with -P, writing the output to multiple output files if both summary and detail data are specified or when a file with data for a different host or a different collection date is encountered. NOTE: files that contain data that crosses midnight will not force creation of a second file when the date changes.

Filtering with --from and --thru
You restrict the timeframe between which data is reported by using --from and --thru. However, since collectl doesn't require you to specify both switches nor does it require you to specify both a date and time for each switch, it tries to make an intelligent guess as to what timeframe you really meant. In most cases it guesses right but sometimes it guesses wrong. The simple fix if it gets confused is to remove the ambiguity and just specify full dates/times with both switches.

When you specify playback files using wildcarding in the name string, collectl initially selects all the files that match. However, in some cases you may not really intend for them to all be played back, especially if you selected a timeframe using --from and --thru switches. Have no fear. Collectl will use these switches to select the appropriate subset of files that match your selection.

Caution
When filtering wildcarded files using --from and/or --thru switches, collectl compares those values with the timestamps in the filenames to determine whether or not a file is likely to contain the data requested. However, collectl doesn't know whether a file contains data that spans midnight, so it simply assumes it doesn't, since this is a rarely used feature. Therefore, collectl only relaxes these tests when wildcards are not specified in the filename playback string.

Processing files that span midnight
The main reason for the complexity in the interpretation of --from and --thru is to allow collectl to deal with files that contain data that crosses midnight. As an example, consider a single file with data collected from midnight on one day to 2AM, 26 hours later. If you want collectl to process the entire file, don't specify any time filters and it will report everything. But if you want to process the data from midnight to 1AM, you need to tell collectl which dates are involved! Following the interpretation rules described later in this section:

  • If you only specify --from 00:00-01:00 (note we're using the alternate switch format), collectl will process 1 hour's worth of data.
  • If you want 25 hours' worth of data you will need to include the date of the second day with --thru.
  • If you only want one hour's worth of data for the second day you will need to specify both dates.

A word about the first record reported
Collectl always needs data from a base interval from which to begin calculating changes in counters and that interval is never displayed. In other words, if you collected data every 10 seconds starting at 10:00:00 and then played it back, the first time reported will be for 10:00:10.

To mitigate this when playing back data with a --from time, collectl attempts to read a sample from the previous interval so that you actually see the time you requested. Further, when multiple files are processed, collectl is smart enough to know whether they are contiguous and, if so, to use the last set of data from one file as the base interval for the next one, so there will be no holes in the data as reported. However, if they are not contiguous, a new base level must be taken for the new file and its first record skipped. This can be confusing and is probably not even that important, but consider 2 files generated contiguously:

  • If you process each file one at a time, their first samples will not be displayed
  • If you process them in one command using a wild card for the date/time, you will see the first record of the second file
If you have 2 non-contiguous files you will see the same results whether you process them one at a time or together using a wild card, that is no first record for either.
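The base-interval behavior can be illustrated with a short Python sketch. It is a simplified model, not collectl's implementation: samples are hypothetical (timestamp, counter) pairs, and two files are treated as contiguous when the gap between the last sample of one and the first sample of the next equals the collection interval, which is an assumption made purely for this example.

```python
def playback(files, interval):
    """Return (timestamp, rate) pairs for a list of files.

    Each file is a list of (timestamp_seconds, counter) samples. A sample is
    reported only when the previous sample is exactly one interval older, so
    the first sample after any gap serves as a base and is not reported.
    """
    reported = []
    base = None
    for samples in files:
        for t, counter in samples:
            if base is not None and t - base[0] == interval:
                reported.append((t, (counter - base[1]) / interval))
            # This sample becomes the base for the next one either way.
            base = (t, counter)
    return reported


first = [(0, 0), (10, 50)]
second = [(20, 150), (30, 200)]      # starts one interval after 'first' ends
later = [(100, 0), (110, 30)]        # not contiguous with 'first'

print(playback([first], 10) + playback([second], 10))  # one at a time
print(playback([first, second], 10))                   # together
print(playback([first, later], 10))                    # non-contiguous pair
```

Played back together, the contiguous pair yields one more record than when each file is processed separately, namely the first record of the second file; for the non-contiguous pair the results are the same either way.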

How --from and --thru are really interpreted
This section probably contains more detail than you really need to care about and is here more for completeness than anything else. Remember, these switches can contain a date, a time or both! You can also use the shorthand form of combining both switches under --from by separating the two with a hyphen. In other words you can do something like: --from 12:00-13:00, noting you can use any combination of dates/times on either side of the hyphen.

  • the default from and thru dates are 20000101 and 20380101 respectively
  • if no times are specified, each file will be processed from beginning to end
  • if no dates are specified, the time(s) apply to each file being processed. In other words the switches --from 10:00 --thru 11:00 will result in data being reported from 10:00 to 11:00 for each day! If only --from 10:00 is specified, data will be played back every day starting at 10:00. If you want the first day to start at 10:00 and subsequent days to be reported in full, specify a complete set of dates/times.
  • if one or both dates are specified, things become a little more complicated
    • if one date is specified, use the default for the one not specified
    • if both dates are specified, times only apply to the first/last dates
    • ignore any files created after the thru date/time
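As an illustration of the rule for times given without dates, the following Python sketch filters a list of sample timestamps by time of day. It is only a model of that one rule under stated assumptions: the function name is hypothetical, both endpoints are treated as inclusive, and nothing here reflects how collectl parses its switches.

```python
from datetime import datetime, time


def select(timestamps, from_time=None, thru_time=None):
    """Keep timestamps whose time of day lies within [from_time, thru_time].

    With no dates involved, the same window applies to every day. A missing
    bound defaults to the start or end of the day.
    """
    lo = time.min if from_time is None else from_time
    hi = time.max if thru_time is None else thru_time
    return [ts for ts in timestamps if lo <= ts.time() <= hi]


samples = [
    datetime(2011, 5, 1, 9, 30),
    datetime(2011, 5, 1, 10, 30),
    datetime(2011, 5, 2, 10, 15),
    datetime(2011, 5, 2, 11, 30),
]

# Equivalent in spirit to: --from 10:00 --thru 11:00
for ts in select(samples, time(10, 0), time(11, 0)):
    print(ts)
```

With both times given, one sample from each day falls inside the window; with only the from time, every day is reported starting at 10:00.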

Important Tip
Perhaps the most important thing to keep in mind is that when you play back a file, collectl will by default use the same switches as were specified during collection. In other words, if you collect cpu, disk and network data using -scdn, when you play it back you will get cpu, disk and network summary data either displayed on the terminal or written to a file. However, you could just as easily have chosen a different subsystem specification such as -scND, in which case you'd still get CPU summary data but now you'd get network and disk detail data. This feature can be extremely useful, especially when combined with different output formatting switches such as -o and/or --verbose.
updated May 04, 2011
collectl-4.3.1/docs/Hello.html

Hello World

Introduction

Included with collectl is the example file hello.ph which is a collectlized version of hello world. It simulates a hw subsystem consisting of 3 hw instances, which in turn report a single counter. Here is an example of its simulated /proc data, which is shown by using -d4. Also notice in this case the discriminator is hw but -n is also included in the calls to record() to further identify individual devices:

collectl --imp hello -d4
>>> 1238167880.003 <<<
hw-0 HelloWorld 0
hw-1 HelloWorld 10
hw-2 HelloWorld 40
You can use this example module with virtually any combinations of switches and any other collectl subsystems as well as exporting the output over a socket, writing to a raw file or playing it back. As you should realize by now the combinations are far too extensive to list so below is only the simplest one, showing this data combined with cpu stats in brief format with timestamps in msecs.
collectl --imp hello -sc -oTm
#             <--------CPU--------><-Hello->
#Time         cpu sys inter  ctxsw   Total
11:40:29.002    0   0  1027    126     140
11:40:30.002    0   0  1012    138     230
For further information on using this capability see hello.ph, which has been heavily annotated and should make a good starting template for developing your own custom modules.
updated Feb 21, 2011
collectl-4.3.1/docs/Sockets.html

Socket Interface

Introduction

Collectl actually provides 2 different mechanisms for socket communications and depending on which style you intend to use, you should pick the appropriate one. Regardless of which style you choose, collectl will send all data that might normally go to a terminal to that interface, prepended with the hostname for easy recognition at the other end. One should also note that you can simultaneously record the data locally in a raw file, a file in plot format or even both depending on your needs.

Short-lived communications

The model for this type of communication is one in which some other tool such as colmux starts collectl on one or more remote systems, instructing it to send its output back to the initiator until the initiator terminates the connection. At that point collectl will exit. Using this style, the initiator of the communications opens the socket and then typically starts collectl using ssh, including its address/port in the command string using collectl's -A switch. The key things to note when using this style are:
  • This mechanism will never survive a reboot without extra work on the initiator's part which must include restarting collectl
  • As soon as it starts, collectl will open a socket to the specified address and immediately start sending its output back to it
  • If the other end of the socket goes away, collectl will exit
  • If collectl is unexpectedly terminated (such as sigint or reboot), it will close the socket without warning the other end and therefore the program at the other end of the communication path must be prepared to deal with it
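From the initiator's side, the points above amount to accepting one connection and reading lines until the peer disappears. The Python sketch below shows that shape only. It assumes, purely for illustration, that the hostname prefix is separated from the rest of each line by a single space; the source says only that lines are prepended with the hostname. The function names are hypothetical.

```python
import socket


def split_host(line):
    """Split 'hostname rest-of-line' into (hostname, rest).

    The single-space separator is an assumption made for this sketch.
    """
    host, _, rest = line.partition(" ")
    return host, rest


def receive_lines(listener, max_lines):
    """Accept one connection on an already-listening socket and read lines.

    Reading stops after max_lines or when the sender closes the socket,
    which is how an unexpected termination of the sender shows up here.
    """
    conn, _ = listener.accept()
    lines = []
    with conn, conn.makefile("r", encoding="utf-8", errors="replace") as stream:
        for line in stream:
            lines.append(split_host(line.rstrip("\n")))
            if len(lines) >= max_lines:
                break
    return lines


print(split_host("node1 some output line"))
```

In practice the initiator would create the listening socket first, then start collectl remotely with -A pointing back at that address and port, and only then call something like `receive_lines`.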

Long-lived communications

This mechanism has collectl create the initial socket and listen for a remote connection. In this mode, collectl continues to collect data and at the beginning of each monitoring interval looks for a new socket connection request. If it receives a request, collectl then creates a connection back to the requestor and immediately starts sending data to it. If that connection terminates collectl closes down its side and then goes back to listening for a new connection. The key points about this style are:
  • This style of communications will survive a reboot if configured in collectl.conf
  • The switch to put collectl into this mode is also -A but rather than supplying an address as the argument use the text server
  • The other end of the connection needs to periodically try to connect and if it fails must try again. If that connection exits, it must clean up its end and go back to trying to reconnect again.
  • If you want collectl to terminate you must do so manually, noting that after a reboot it will restart.
  • To verify collectl is functioning as expected, run the utility /usr/share/collectl/util/client.pl passing it the address of the system collectl is running on. It will display anything collectl sends to it over the socket.
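
The reconnect behavior required of the other end can be sketched as follows. This is a hypothetical Python client, not client.pl; the retry counts and delays are arbitrary, and the port must be whatever collectl was told to listen on.

```python
import socket
import time


def connect_with_retry(host, port, attempts=5, delay=1.0, timeout=5.0):
    """Return a connected socket, or None after `attempts` failed tries."""
    for attempt in range(attempts):
        try:
            return socket.create_connection((host, port), timeout=timeout)
        except OSError:
            # Nothing is listening yet, e.g. collectl is still restarting.
            if attempt + 1 < attempts:
                time.sleep(delay)
    return None


def follow(host, port, handle_line=print):
    """Reconnect forever, handing every received line to `handle_line`."""
    while True:
        sock = connect_with_retry(host, port)
        if sock is None:
            continue  # keep trying to connect
        try:
            sock.settimeout(None)  # block while waiting for the next interval
            with sock.makefile("r", encoding="utf-8", errors="replace") as stream:
                for line in stream:
                    handle_line(line.rstrip("\n"))
        except OSError:
            pass  # connection dropped; fall through and reconnect
        finally:
            sock.close()  # clean up this end before trying again
```

Because collectl only looks for connection requests at the start of each monitoring interval, a client like this may wait up to one interval before the first data arrives.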
updated Feb 21, 2011
collectl - Network Info

Network Monitoring

Introduction

As with other subsystems which contain instance data, you can monitor both summary (in brief and verbose modes) and detail data. Like disk data, the key brief mode values are bytes and packets (rather than iops). The actual data comes from /proc/net/dev.
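
To make the source of these numbers concrete, here is a minimal Python sketch that pulls bytes, packets and errors out of /proc/net/dev text. It is not collectl's own parser; the function and key names are invented for illustration. The column positions follow the usual layout of that file: eight receive fields followed by eight transmit fields.

```python
def parse_net_dev(text):
    """Return {interface: counters} from the contents of /proc/net/dev."""
    stats = {}
    for line in text.splitlines()[2:]:  # the first two lines are headers
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if not sep or len(fields) < 11:
            continue
        stats[name.strip()] = {
            "rx_bytes": int(fields[0]),
            "rx_packets": int(fields[1]),
            "rx_errs": int(fields[2]),
            "tx_bytes": int(fields[8]),
            "tx_packets": int(fields[9]),
            "tx_errs": int(fields[10]),
        }
    return stats


def per_second(prev, curr, interval):
    """Turn two snapshots of one interface into per-second rates."""
    return {key: (curr[key] - prev[key]) / interval for key in curr}
```

Since the counters are cumulative, rates come from subtracting two snapshots and dividing by the interval, which is what per_second() does.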

The one key thing to keep in mind with network data is that not all networks are the same. Just as there are device mapper disks that shouldn't be included in the summary data, the same is true for network devices. Those that are not included in the summary are the loopback, sit, bond and vmnet devices.

Since most LAN networks run fairly cleanly and errors are rare, one is usually not interested in seeing long columns of zeros that never change and so by default brief mode does not include any error information. Adding --netopts e will add an additional column with a total error count. To see specific errors one would have to run in summary mode, and to identify the specific networks on which those errors were occurring you would have to run in detail mode.

What about IB over IP?
Good question. When using Infiniband networking you typically get an IB network device created. So does this mean IB traffic gets counted twice when you monitor both it and network data? As they say, it depends. Some Infiniband data will indeed go over the native IB interface and never show up as network data. This includes MPI traffic or lustre which uses the native IB transport. However, other uses of Infiniband may in fact be counted as network traffic. BUT this is actually a good thing because if you're a heavy user of IB/IP and want to be able to differentiate the native IB traffic from it, simply look at the network detail data and subtract any IB network numbers from the native values.

Tips and Tricks
Ever try looking for a needle in a haystack, in this case maybe it's network errors? --netopts E works just like its lowercase cousin except it tells collectl to only report intervals that have network errors in them. While this can be extremely boring in real-time mode, consider what happens during playback. During the course of a day at the default 10 second interval you'll have 8640 samples but this switch will allow you to see the one that recorded the network error!

Filtering
Before describing how filtering works, let's quickly review the difference between summary and detail data. When looking at summary data collectl reports the totals across all network interfaces except for those it knows have already been accounted for. For example, if your machine has eth0 and eth1 bonded together, the bond shows up as a third interface and if collectl were to report the totals across all three, the numbers would be twice that expected. For this reason, collectl only looks at specific types of interfaces, which at time of this writing are limited to eth, ib, em and p1p, though no doubt there will be others in the future.

You can change the set of interfaces collectl includes in its summary totals with --netfilt, the target of which is actually one or more comma separated perl expressions, but if you don't know perl all you really need to know is these are strings that are compared to each network name and only those that match will be included in the summary. This makes it possible not only to reduce the set of networks you wish to summarize (say you want to summarize ethernet networks but not infiniband ones) but also to include new ones that are not currently known by collectl.

A second form of the switch, in which the first character is a ^, causes all names that match the expression to be excluded from the summary data. Whether to use the first or second form may be more a matter of which is less complex for the particular form of summarization you're looking for, but if you're trying to extend the list of those interfaces recognized as being summarizable, your only choice is the first form.
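
The two forms of the filter can be illustrated with a short Python sketch. It is a guess at the behavior described here, not collectl's perl code, and it makes several assumptions: Python regular expressions stand in for perl ones, the default types are matched at the start of the name, user patterns may match anywhere in the name, and a leading ^ negates the whole comma separated list.

```python
import re

# Interface types the text names as summarized by default.
DEFAULT_SUMMARY_PATTERNS = ["eth", "ib", "em", "p1p"]


def summarized(names, netfilt=None):
    """Return the interface names that would count toward summary totals."""
    if not netfilt:
        return [n for n in names
                if any(re.match(p, n) for p in DEFAULT_SUMMARY_PATTERNS)]
    exclude = netfilt.startswith("^")  # second form: drop whatever matches
    patterns = (netfilt[1:] if exclude else netfilt).split(",")
    kept = []
    for name in names:
        matched = any(re.search(p, name) for p in patterns)
        if matched != exclude:
            kept.append(name)
    return kept
```

With names of lo, eth0, eth1, bond0, ib0 and vnet3, a filter of eth,vnet keeps eth0, eth1 and vnet3 in this sketch, showing how the first form can add an interface type the defaults do not cover.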

In addition to controlling how network data is summarized, this switch also controls which interfaces are reported as detail data. However, be careful when using this switch during playback because you may not wish to see detailed and summary data filtered the same way. If this is the case, you will need to play back the data twice, once with -sn using the --netfilt to control the summary data and a second time with -sN and a different value for --netfilt to control the detail listing. It would certainly be possible to introduce separate filtering switches for summary and detail data, but it is felt that this situation is uncommon enough to not make things more confusing than they already are.

Dynamic Network Discovery
Network devices typically don't change unless you install a new network card and reboot the machine, in which case the list of known networks is simply correct when collectl creates it. However, especially when running on a host that creates virtual machines, networks can come and go. The way collectl handles this is to simply maintain a dynamic list of the active networks and make sure the statistics get associated with the correct network. As a way to preserve network details in plot format, collectl further retains a list of all the networks ever seen since system boot and uses that list to record the statistics, thereby keeping the columnar data consistent. In many cases where hypervisors reuse the virtual network names, the list of unique names remains relatively short.

However it was recently discovered that the OpenStack Quantum service for handling dynamic networks actually uses a new network name every time a VM is created. If a particular host has dozens of VMs and they come and go many times, the result can be a very long list of network names which not only show up in all the headers of all the files collectl generates, they will also show up as unique sets of data in the network detail file if one is generated. On one long running host, over 500 unique network names were generated!

The way collectl has chosen to deal with this is with --netopts o, which tells collectl to not preserve the ordered list of all the networks that have been seen and will in fact cause collectl to prune that list whenever a network is found to have been removed. While this solves the immediate need to keep only the active networks in the headers of the output files, it creates a different problem with the detail files in that the columns will no longer be consistent from one sample to the next. In fact, it is probably best to not even try to generate detail files for dynamic networks when using this switch. As of this writing I'm still trying to think of ways to improve the situation.
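
The tradeoff between the two behaviors is easy to see in a small Python model. The class below is purely illustrative (its name and the prune flag are invented); it only mimics the bookkeeping described above, not collectl's internals.

```python
class NetworkList:
    """Ordered list of network names, optionally pruned like --netopts o."""

    def __init__(self, prune=False):
        self.prune = prune
        self.names = []

    def update(self, active):
        """Record the currently active names and return the column list."""
        for name in active:
            if name not in self.names:
                self.names.append(name)  # new networks go at the end
        if self.prune:
            # Drop names that disappeared, so later columns shift left.
            self.names = [n for n in self.names if n in active]
        return list(self.names)
```

Without pruning, a network that has gone away keeps its column, so plot data lines up but the list only ever grows. With pruning the list stays short, but the same column position can mean different networks in different samples.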
updated April 21, 2014
collectl - Infiniband

Infiniband

Monitoring

Collectl V3.7.3 now supports monitoring infiniband by looking at 64 bit counters, when the HCA supports them and virtually all of them do. This means several things:
  • collectl no longer has to read/clear the counters to read them and so is non-destructive
  • there is no longer a restriction on multiple instances of collectl monitoring infiniband statistics
  • you can now read these counters at greater intervals (if you really want to) with no fear of them latching when they hit their 32 bit maximum

The easiest way to tell if your HCA supports 64 bit counters is to run perfquery -x and if it works, you have 64 bit counters. Alternatively you could also run:

collectl -sx --showheader

and if it displays an X in the flag field, you have them. If you do have 64 bit counters but collectl doesn't report the X, you have an older version installed. The code to deal with 32 bit counters will be left in place for awhile but eventually removed. The rest of this documentation talks about monitoring the narrower counters and is largely unchanged from before.

32 Bit Counters

The most important thing you should know about 32 bit monitoring is that it is destructive. What is meant by this is that every time collectl reads the counters from the HCA it immediately resets them to zero, thereby destroying their previous contents. You should also note this does not apply to error counters, which are never reset.

The obvious question is why? and perhaps the less than obvious answer is because when the hardware specifications were written for the Infiniband HCAs it was decided that performance counters would not wrap, probably because nobody thought someone might want to do continuous sampling. In any event, at even modest traffic rates HCAs with 32-bit counters quickly reach their maximum values and stop incrementing, rendering them useless for performance monitors like collectl. Collectl's solution to this problem is to read the counters and immediately reset them to 0. As long as the next sampling period occurs before the counters fill up, this methodology comes reasonably close to reflecting the traffic rates (some counts are lost between the read and reset).
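
A toy model makes the problem and the workaround clearer. The Python below simulates a counter that sticks at its 32 bit maximum instead of wrapping; the class and function names are invented, and it says nothing about the real units the HCA counts in.

```python
MAX32 = 2**32 - 1


class SaturatingCounter:
    """A counter that latches at its 32 bit maximum rather than wrapping."""

    def __init__(self):
        self.value = 0

    def add(self, amount):
        self.value = min(self.value + amount, MAX32)

    def read(self):
        return self.value

    def read_and_reset(self):
        """Destructive read: return the count and clear it."""
        value, self.value = self.value, 0
        return value


def seconds_to_saturate(units_per_second):
    """How long a cleared counter lasts at a steady rate."""
    return MAX32 / units_per_second
```

Two bursts of 3,000,000,000 units leave a plain read() stuck at 4,294,967,295, while a read_and_reset() after each burst reports both correctly. The catch is that any second reader sees only what has arrived since the last reset.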

However, this methodology has a downside in that while collectl is monitoring the Infiniband stats, nobody else can (including other copies of collectl). Unfortunately there is no solution to this problem short of redesigning the HCA and that's simply not going to happen. A second alternative would be to come up with a mechanism in which the read/reset of the counters is moved into an OFED module which exports these to /proc or /sys as rolling counters. This was in fact done in a pre-OFED version of Voltaire's IB stack which is currently supported by collectl. If someone would like to hear more details on how this was done, feel free to contact me or to post something in a collectl forum or to the mailing list.

If you want to run collectl but also prevent it from doing destructive monitoring, simply comment out the line in /etc/collectl.conf that begins with PQuery = and you will be informed by collectl that Infiniband monitoring has been disabled whenever someone tries to monitor it.

Monitoring Mechanics

The main purpose of this section is to help you understand how monitoring works so when it doesn't you might be able to figure out what went wrong. There are 2 different ways collectl can monitor Infiniband, one for the OFED stack, which is the Infiniband Stack of choice these days and the other for pre-OFED.

OFED

The OFED stack can be identified by the presence of the /sys/class/infiniband directory. If there, collectl looks inside to find which HCAs are present and which ports are active. This information is then used to query the HCA via the perfquery utility.
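
The discovery step can be approximated in Python as shown below. This is a sketch rather than collectl's logic: it assumes each port has a state file under <hca>/ports/<port>/ whose contents look like "4: ACTIVE", and the root parameter exists only so the function can be pointed at a test directory.

```python
import os


def active_ports(root="/sys/class/infiniband"):
    """Return {hca: [active port names]}, or None if the directory is absent."""
    if not os.path.isdir(root):
        return None  # no OFED stack detected
    result = {}
    for hca in sorted(os.listdir(root)):
        ports_dir = os.path.join(root, hca, "ports")
        active = []
        if os.path.isdir(ports_dir):
            for port in sorted(os.listdir(ports_dir)):
                try:
                    with open(os.path.join(ports_dir, port, "state")) as fh:
                        state = fh.read()
                except OSError:
                    continue  # unreadable or missing state file
                # Assumed layout: "<number>: <STATE NAME>"
                if state.split(":")[-1].strip() == "ACTIVE":
                    active.append(port)
        result[hca] = active
    return result
```

A result of None means the directory is missing, while an HCA with an empty list is present but has no active ports.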

Unfortunately, with each release of OFED that utility seems to move to another location and collectl tries to react by using a search path in /etc/collectl.conf. As of the 2.5.1 release of collectl, if it still can't find the utility it will try to find its location with rpm and then add its path to collectl.conf. If a future OFED release eliminates or replaces perfquery collectl will break.
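
The search path idea is the same one the shell uses for PATH. A minimal Python sketch, assuming a colon separated list of full paths like the PQuery entry in collectl.conf and a made-up function name:

```python
import os


def find_in_search_path(search_path):
    """Return the first existing executable in a colon separated list."""
    for candidate in search_path.split(":"):
        if candidate and os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None  # nothing found; the caller needs a fallback
```

For example, find_in_search_path("/usr/sbin/perfquery:/usr/bin/perfquery") returns whichever of the two exists first, or None if neither does. The rpm fallback described above is not modeled here.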

Pre-OFED

All pre-OFED monitoring code has been removed.

Debugging

Collectl has a variety of debugging capabilities built into it, the main one being the debug switch -d. To use this switch you specify a bit mask which is then applied against a variety of settings which tell collectl what to display. For debugging interconnect problems simply use -d2. All possible bit settings and their meanings are listed in the beginning of collectl itself.

If collectl runs without errors but you're not seeing IB traffic being reported when you think you should, you can always use -d4 or even -d6, which show the values of the counters returned by both perfquery and get_pcounter. If they don't change something outside of collectl must be wrong.
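
Since -d takes a bit mask, -d6 is just the 2 and 4 bits combined. The Python helper below (a hypothetical name, and deliberately silent on what each bit means since those meanings live in collectl itself) breaks a mask into its individual bits:

```python
def set_bits(mask):
    """Return the individual bit values that make up a debug mask."""
    return [1 << i for i in range(mask.bit_length()) if mask & (1 << i)]
```

Here set_bits(6) gives [2, 4], so any combination of debug settings can be requested by adding their bit values together.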

One example of a non-collectl problem was a system that had IB configured and started, which could be verified by seeing an ib0 interface show up with ifconfig. However, when running collectl -sN, which will show the traffic over all the network interfaces, there was never any traffic on the ib interface but there was unexpected traffic on one of the eth interfaces. Clearly something was wrong and looking at the routing showed the routes were set such that all traffic to the infiniband address was being routed over the eth interface.
updated Feb 04, 2014
collectl - Memory Info

Memory Info Monitoring

Introduction
Collectl reports the standard values with respect to memory which are fully documented under the data definitions. Trying to rationalize the way these values relate to each other can be a frustrating experience because they rarely add up to what you expect. Part of the reason for this is that not every byte is accounted for in every category. This can be further complicated because at boot time, some devices will actually grab some memory that the kernel will never even see and so the total memory will not always equal the physical amount of installed memory.

If one looks at /proc/meminfo, which shows a lot more types of memory than collectl reports, it begs the question "why not report it all?" and the simple answer is there is just too much. Further, collectl uses a second file /proc/vmstat to gather virtual memory stats which further adds to the volume of possible candidates to report. Again, collectl tries to report the values of most use.

Brief, verbose and detail
Like other subsystems, memory reports its values in these 3 categories. However there are a few important caveats to note:

  • Brief mode combines mapped and anonymous memory into a single value of Map primarily as a mechanism to try and reduce the width of the memory columns since including too many defeats the whole purpose of brief data reporting, which is to allow multiple types of data on a single line.
  • Unlike other subsystems that report summary and detail data, memory summary data is not the result of collecting detail data and adding it up. This means that unlike subsystems like disks or CPUs in which you can record either summary or detail and still be able to play back either, when playing back memory you will only get values for what you chose to record, otherwise you will see values of all zeros.
  • There is no buffer or cache values reported as detail data because that information is not reported
Don't get fooled by cache memory vs used and free memory
One of the most confusing things for people who aren't familiar with linux memory management is the interpretation of cache memory and its relationship to used and free memory. As one might expect, as cache memory increases so does used and free decreases.

What fools people is that the first (or many) times they see low free memory they think their system is running out of memory when in fact it is not. If they reboot, the memory frees up, but then starts to fill again. So what's going on? It turns out that whenever you read/write a file, unless you explicitly tell linux not to, it passes the file through the cache and this will cause an increase in the amount of cache memory used and a drop in free memory. What many people do not realize is, until that file is deleted or the cache explicitly cleared, all files remain in cache and as a result if the system accesses a lot of files cache will eventually fill up and reduce the amount of free memory.

Naturally linux can't allow the cache to grow unchecked, and so when it reaches a maximum set by the kernel, older entries will start to age out. In other words, reading a file will be extremely fast when it is in cache but slowed to disk speeds when it is not.

The only real way to tell what is going on is to look at the disk subsystem while accessing a file. If a complete or partial file is in cache, read I/O rates will be much higher than normal. If a file is written that will completely fit in cache, again the I/O rates will be very high because the rate at which cache is being filled is what is actually being reported. It is only when a file is a lot larger than cache that the I/O rates slow down, operating only as fast as dirty data in cache can be written to disk and is in fact the only real way to measure how fast your disk subsystem actually is.
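
To see why a low free value is not the whole story, it helps to look at free, buffers and cache together. The Python sketch below parses /proc/meminfo text and adds the three up. The function names are made up and the sum is only a rough upper bound on what could be made available, since not all cache can actually be reclaimed.

```python
def parse_meminfo(text):
    """Return {field: value} from /proc/meminfo contents (mostly in kB)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            values[key.strip()] = int(parts[0])
    return values


def free_plus_cache(info):
    """Rough estimate: free memory plus buffers and page cache."""
    return info["MemFree"] + info.get("Buffers", 0) + info.get("Cached", 0)
```

On a system that has been reading lots of files, MemFree alone can look alarmingly small while this combined figure is still most of MemTotal.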

tip: --vmstat
Some people find the way vmstat reports virtual memory information to be very handy in some cases. The only problem with vmstat is it doesn't write its output to a file and so even if you wrap it in a script to write its output to a file you're now stuck with memory information in a specific format and if you do want to plot it that takes a little more effort too.

Collectl's --vmstat switch is actually internally turned into --export vmstat and so reports data the same way as vmstat does but now you get some added bonuses:

  • This data is always available when you record memory stats, you don't have to do anything extra to make this information available. However note if you haven't recorded CPU stats those fields will be reported as zero.
  • You can easily include the timestamp format of your choice, at least the 3 different ones collectl allows, and even include msec if you need that precision
  • You can playback data in vmstat format and then play it back again in other formats.
  • You can use this with colmux which now means you can do a cluster-wide vmstat and show the top users sorted by the column of your choice
  • And of course you can still use any of colplot's memory plots
  • This is an excellent example of a very simple export module which can be a guide to help you write others of your own choosing.

tip: don't forget about --grep
As previously mentioned, there is a lot more data contained in the memory /proc structures than collectl reports. So does that mean you're out of luck if you want to see the value of say Committed_AS? Absolutely not, at least not during playback. Since collectl actually records the contents of /proc data in its original formats in the raw files, you could always use linux's grep (or zgrep) commands to search them for a particular pattern like this:
zgrep Committed_AS /var/log/collectl/poker-20110928-000000.raw.gz
Committed_AS:   889272 kB
Committed_AS:   889272 kB
Committed_AS:   889272 kB
Committed_AS:   889272 kB
Committed_AS:   889272 kB
Committed_AS:   889272 kB
Committed_AS:   889272 kB
Committed_AS:   891664 kB
Committed_AS:   891664 kB
Committed_AS:   891664 kB
Committed_AS:   891664 kB
But unfortunately this does nothing to tell you what time the values correspond to. You could have included the reporting of fields with >>> in them and you'll see the UTC timestamps, but those aren't easily mapped to conventional time formats. Now look at this:
collectl -p /var/log/collectl/poker-20110928-000000.raw.gz --grep Committed_AS -oT
00:00:00 Committed_AS:   889272 kB
00:00:10 Committed_AS:   889272 kB
00:00:20 Committed_AS:   889272 kB
00:00:30 Committed_AS:   889272 kB
00:00:40 Committed_AS:   889272 kB
00:00:50 Committed_AS:   889272 kB
00:01:00 Committed_AS:   889272 kB
00:01:10 Committed_AS:   891664 kB
00:01:20 Committed_AS:   891664 kB
Pretty slick! And since this is collectl/playback, you can use other switches like from/thru or even change the timestamp format and/or see msec too. Also remember this trick can be applied to any data collectl records, though memory tends to be the most interesting.
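
For the curious, the pairing of matches with timestamps can be imitated outside of collectl. The Python sketch below is not how --grep is implemented; it assumes each sample in an uncompressed raw file starts with a marker line of the form ">>> <seconds since the epoch> <<<", and it prints times in UTC whereas collectl's -oT output uses local time.

```python
import re
from datetime import datetime, timezone

# Assumed marker format: ">>> 1317168000.001 <<<"
MARKER = re.compile(r"^>>>\s+(\d+(?:\.\d+)?)")


def grep_with_times(lines, pattern):
    """Yield (HH:MM:SS, line) for lines containing `pattern`."""
    current = None
    for line in lines:
        line = line.rstrip("\n")
        marker = MARKER.match(line)
        if marker:
            # Remember the timestamp of the sample we are now inside.
            current = datetime.fromtimestamp(float(marker.group(1)), tz=timezone.utc)
            continue
        if pattern in line and current is not None:
            yield current.strftime("%H:%M:%S"), line
```

For a compressed file the lines could come from gzip.open(path, "rt") instead of a plain open().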

updated September 29, 2011
Gb:šnµ{–ÖÏ ”Í€ qèAöÉé“SèO;ÚÊf¹Yÿy€VQ&w#±9 v«jãP}R4±¿Šöw>TŒ1œ@ëžœãŒ}i1¢Ž-§ö–˜Z:fÙö:I2ImÞsÉã'Þº-Be·Ón¦mÛR&'km=;ßZÉÓ&½’êÀ=õ£Åä?™J^F9ÀbyÎ1ëÇ5T›u•Ô÷kÒFâ@¬‹ž§$`y⇰-ÎrKË ëkÉ^¯¢Iä%ØÆÈN1ÁG®Gnkª°Ž8´ëd‡>Pv䓯=ñü…s-ñuoí;g‰,ÐÊRp»ŽFHô' ?þªé,DÂÂpâI¶ Ìsø÷íÍ1–µýšuMI浕æ‡crÇû§ùöÅt:MôW·‚8'Œ¡Œ–˜7Ì ñŒÀÇõïY‹êpÜß”Õ H–Dc²…eN{ƒò¯ÝôÎ?=‹ ¤—V¾ÍÂIØÊF%ÜTrqØ=>”ØšÍïØÍkw™$Aø9ÈÇÅ×8ö¬GûÌèf†@Òêe\‰þë…è¼äc§NíZºµÃÊ–“XêP©såÈæEÛž2§žO^=ùõªÂ[ßíXµ U‹íãI6é ñ”Î;ñü¨_×à?¯3£=j+Ÿd›ÌŒÈž[n@9aŽ@©i’ï0H#`´ífè8&‡°ºÇ&œÍ<Êw9—ÌÞp9 ž0:v­ZÍÐÞY4íòN³çi‰0= ¼çó­*b2"’⻈̬'©„óÔ‚»1‘ƒÏ^õ¯Y‰,«âI¡fKÛ©EiW!9!:à‚}«N½ÌÝmÑ4ÒïD‹µüÏ/ËlðŰp?Ö­XyÙÖÞTm~Ríº¨ÇCPk.ñéŽé0ˆ+)vód®yCVm|ϱÃçH²Kå®÷^ŒqÉ„MÞ²´wGŸQØþ`<á&O§@F:`äzÕ¬½id[ÈçpïË€|åv zŽ˜ô¦>†¥bëê"X‹0I·xÞ¸\cæÁÃcŽ­ªÍ»’uÖôô‰ÁŒ¬ždfe\ŒpÁO-jÒ=j½èSapO)<¶Üû¶íäç‘«[QfM6å•ü¶±¼&ÓŽ»ñ¤ö¹ŒÁ´kB¹Ú#s sÀù€ã‡½_ªÚ{JÚm±œ)‰wáÃqÔÁÏ_ƬÓ{ˆÆ‰à>'•Mæcþ³€v®[f8`g<퇎9­>Ù¬írGNýÜ‚2dQæyÂ=£×'ƒô=kF€u•¡õz£€»A‘69Q¹wƒQÖ¤¦#Åà>!¼D‰ƒ¨n²go#wË—qÁœãµlVv,ï©,Ž%˜Ïœ®SåRÝõÁõ­*Ã{™ܤúXžf‰Zì+:džÁÁÁpíŽ:Ö¹ëYº¹‹H¡“d’N"`ÉÀ?Óñ­#Ö€cd8Èm¤)9È÷Éà}k3à š ³[Èφ”HTäängòïZ‡¡úVv„ï&˜I|Æ27üµíçîîqïÍÐÒ¬Ÿ2âÓLÂqi•O=pË»‘³¹ÍkVd®ÿð‘A“jy JùÃç?î{zåGP4ë+ÄrEƒpÓ3¾Pûf3Ùˆ?–+V¨ë´©™YUΖ›ÊÁ'˜ôþ½)ܸ˜(»dó9Ýï‘ëN¨áWH"I .¨Œ“ŽsŽ*Jb2´·FÔ5@ŽdÄÀ3ùÁðyùq€WœCÖ­_Û#?é1ãŸzƒNyÿQß&BÊÇæ‡Ú=}W>žßZ±{¶øm¿éçœdg¥õ,Ó'Ûöyw¡tØÛuaŽEIM“y‰Älö¬zØšOa£/Ãï–Ñ©ÿYó9—ÌÝòŒa°3…éÁ­YšÊúyygYí"Q&‡^r}§LF,¯ñD(bo0¨?ë8-µ°û1È+»<8­ªÊ‘®?á#‰ÒÌ{¼¯0W?'RsƒŸ@kVž)keѺ…äO9q±¶ìn~bpp1‘Ó½k÷ÈfJìsœãuçóªú£Ó'&o'€<Ï3fÞzæ§€©¶ˆ¤¦dØ6ÈNKŒuϽ46JH‰áë‹{£¨K rDÍpK£žAú`}¹>¦¶ê…Ž å÷úQ‘¼Ïõ>nñúv'ž=©z²µ;HäÕ4ë•„=¾ÁûÒŸ/Rx rzNõ­To¾k»(ÅÓ@L„²í.ÎÝ¿ÄAL Õ‡¶•Yž1ŽüäjZŽwHíäw`¨’Ŷã^ßZO`EM3‘l†?,ùWÌ/ò烒QÎ01Wꦚ¥tèsr׆ã!}ùÏ8¸? 
·LF*<ð”ºÛÌ OúÎÚ¹}˜à…Ýž ñV5çHôiÚI|µùyó|½Ü—8=zc4ÄkøI>Ô†1ï+ÌÛ“¨;²sèEK­»Ç¤LѸFà2ù{rzîþétÜÁ¸k‡WSl34I*ý¤;9#œRI==3Åu«Â¨Æ8g5ÎßÈÉi,Z–×Q.n@Dld±ì¤úÿJèPæ59 ކ˜‡µ‹á÷þÕäÄë†QóI¿ ΠÁ^~^qž¦¶‡ZÊÑ^áþÕç]$ûeÛòÊ ü]>èp]Ãû»OÞÏéTlä‘o¬6jBhœÈ6¹Ü3Ø/v#Èõ4!½Ž†¹íVáVΤY¦9¶ã'´ì#Œ¶zv8­‹òãN¹13,‚6ÚÊáH8ë“À¬E¼½uŠV»´(l˜Vãk»€>}ãŒd{zP×ÒdYt«r‹… ´|ÁÁ# €22Eaëw:°ró††#2}¤€0ÇË·Ž ?Å‘ÁŠßÓ¶™lÍ/œÆ1™7ÝøŽ cjÒÏö»äŠñmÙbŒFåT3A ®zq×?Jáú vº–¡ ¬MUCóJ[p#;±Ž9ÏSŸ`+ ïYV’Ëý½uÎ¥V&7qݱÔ~)ÚÔÓÛÛÛË ìÛp…ó2Æ¥sÈ$õÈÏÀÁÔ&³šâã}ëï½Dr_c·å*¥yûÇ× 9WcŒ6:×/5ñ’óÈÔaŒ ¨Õ\€Sƒ‘׌œqõãרFzûR[Üå-­ ¸º¶º…~Ñ2]º,¿kÎáÁ'!2@烎˜É¶4(„:qEŒ$~k”"MÁ†z”`gÒ¡¦TVRÁÏSœqí@ÜYKh—Jò×;¤“xV` “ÔÀOØ·CÎ8ëé\Ó6£$¤[_E°j nFBãî~'<~•Òž‡ºÜã –Åå³ó,þѸM²èÞn 0Ù,p>^NK ŒŽµÑhMº<1a:‰YšA!ÜG °ÆqÓ éX1Ë©·’ñj6ȲÇ*¢ËtkóÜ}ð>^½9éÅt:3JÚM»LêîAùÕ÷îäà繦îfêð[ÜjÏ Î_}™-¸Tî®Þ›‡\ž1PhY˪Ba°’)M˜ÄÞy|¯Ë÷‡NØŽ*ƹ%мd¶ºŽÝ…±l¼à矔ôÀÏÍëéŠv–oNª<ëØÞ#j¥¡où°0G9ÇS“ÉÏqI_ˆ=¿¯"ö¶èš5ËI/”F[ÍòóÈãv3Ó§zæïþÂñê2\Z¿™‚­À%pUvޤg“Ç®›X‘£ÒnV Ã|¼r9ÝÚ¹ù'¿D¾‰õ8 þéâ?h Jäp@8‚2G$ž3@úU¼‚[hdPBº+Nq‘\­Ø³ù¥ÿJŽäf p\ð@2ÀÈq×­u£•ô®bêkï·I$7Ç wН—#íÏUÏ?/Ó¯4ú‰lhhÜ @CÅÓ,Ÿ6âHèxtì=yæ¬ù/>Ÿ ͸K1_'ÎÙ¼‚pA €Op}9©ôý¾}î.Œ§Î#Êówˆ‡§¶zãü)š¤Ž—eòÍó¿œÛ†ÏéK°uf=ŸNŠ+[¥òîdHÜ9e\®~f ßw w9®¿ñ5m$ñ_iò%ãMmrÒ– v¬§<(\òÄcõ5Ð~túÜä4÷ÓMø -˦‚Îe:ù}0…ã9#¦sSß=šOh™”XƒnÂU]£+‚ß.ðœäƒÅKbú—Û¢–à5€Ù’<ïÇ$…<‚Þœ`õÍ>õ¯È ðÄE–n¼kŽFv‚N8Ü2~^œÑ×úóëò*¸ÓäÑîJÉ,Lmc2o‘W{cTò[w8çŒc5oJ{c®\°…ÅÌÑìÌ™ÚNU@ÆK›œàÒÈ׫¢Êó¬LŸfM’¬‘®[qêçåÀ{`ó޵6“,æþò)œHDQ·EœmçåC3ÈÍ$oëÈØ¬< g™d?3R@œl;†H#•È©=­ŠË× ÂÙÆ`‘÷€΋ÎÞ[½ŒŽ¸éCIq±v‚Ò•±µ· ŒÞœ ß{ãÖƒœuíŸZlHÇðëÀö³VAó(%äÆÁ´d8\:ƒÔš‹Q{dñ-‘š <Ñå ©¹n3€3É8èjÖˆóý’f¸–7C‚®‚Ï’¼cvqܵËÍ&·g$0›Ia`Ãr|ãîäž°1Iš2éñêÖmº[‹ˆ¤lù›£` ÁÈPA$1ÆFjÖ¤öãW½I.õ²ReŒ7ñÆ7œ“ŒvÅ:ÄÝ-Õ”PßG4;Kº¬Ñ°-¸`rqò€G¡ÍIznF§{ä„ ö@w³D6òyù†WŒ·ö¦ÌiWO’Þwu™±n%“eÊçx“‚@^rÙÁäc Õ鲤ú]¬©»kD¸ÞÛNç¹÷¬X¾Ó±žgUìgyÐáo—'¶<©9ö«ú‰¹µÐPZÜ3ξXYwÆ›ùÛ Ï·áIh˜·z…®¯¨<¶÷K+ºDÿ¼Äl¾ä) a:sÁíšÒÐä´ºÕu;›5di6º»u‘œ``drxôª÷ï© ïþÏ5” aÙn%@8ÛÀ†ê nj¿¤›£}©y×Kš6+¸uë´œq€3Ï f4²Ú [¦w™Ûí²¬Š'‹—ç…Ç̼ë‘ÁÈ`ÓS]B|øæ†0ÿjP2WxÀ';qëÛ&¶5Mص}:{H”\…”Ÿ,#ô'=G< žjœfç΃tÖïAÁÌ‘e”ãß‚=IÇ8¡n6tÕ ÖϲOæ«4~[oUêF@©©’ùžLžQM§an€öÍaºÆöÑï$Êw³HqÀèÀxÀéÚµk7Di[N ,Šà¹ØC£qîSåëž•¥LFYu>&TÞÎÂØžb‘Ï]¸ÈϨ<ã¥jVaf>$ T[|‘½ n»÷†=zÓ lÎÖÙOÜÛƒ c 
µ³Á,A}A«”tÛ_%"ò—b·P1À5²d]2FGT䲩۞p[å§Zµmæý’=•¦Ø»ÙzÇ$RB%¬½Hçût«#<Ÿjt<ˆäà ¨éŽ€ç©YzA&kÿºNB xÛn=צ}"˜ú•“©]cJ1¼Í쫺) ŽN ’ÃpF=ÍkVeô’&µ¦€ÄÛÃhÇ8à€ß1 ÿv€FUÔˆ].è’@6pʧõnãVª¶ û6çgÞòÛw¯ü åüø¤ö¸Í(Æt{3­$fØÌÊN1À%@? ¹U4·2iV®ÊŠÆ%Ü©±Éxë“Å[¦÷“lU|Kz«#;<(Ρ º …OsŸÊµ«.ÖI?·ïã)MˆÑ’Ñî^¹6{Ò´èÜÌ×d- I¤xâ–â8Ù–DN Æàwy§Yºé#Ov‡2(WgEÛžÿ?éÞ´½(sŽqÒ²ô#šÂI’V‘žy<Âò#°`Åyeœ{bµÉ¬Í±·¸'hS;m@ѶÑÓNãÈé@t4ë&vþ‹Ei]$û<…:a†W ©½sÛëZÕ™vOöí€]«•bÇre‡+?A‘äÑ 2„óFC”d`ĹO—Ò€èiV\®¿ð’Û¦övò„óÿµ·é{tëZ•™#7ü$P(!T@IùЖÿ€ýáQÇjÓ¬¯´i¡\™Ö/—yGD8Èè\úfµk7^2®;C´H¥H,P½Ëü P sE~èùüÎ>ÿ7¿~T´ÔehÕ•B©PBŒp=8ãò§P$eé¦7Õ5GBYüÅFa(eà0ÁóŸ­Y¿ -¾sÿ1ãZƒO2iêYd0‡P€2–9Ç#·_ʬ^çm¾éç§LÒì©j£ŸgÙ¥ó²lmʽHÇ T”Ù7ùoå$Úv–éžÙ¡ì4eø}â{0ùó>viî;F0@Ó±ëZÕ™¡™›O&YÆó°«£`qœ”ãïgðÅiÓ‹+ÛÿÂQ“Ì »ka¶ã$Ü3œdŽ+j²ä7?ð‘D¢HÄ^^voLíÁÏ{;¶ò8ÆkR57òôÙÛ€¼&ÍÃ=3ƒ×§Nõb"Æ$/”ÅFc;}¿ ¯©¶Ý>\L!c€®d ƒž9àMÌ~Sà1žü~uc½2OõO×§¯jOa­ÌïÉš¹¶£ŒnY·äç'¯^ëR©éJM„ Äü½YXuè àtü*Ý1ÈÖÿð”:“ÌÚHùÆl\¶Üd¸Î2g\mš-Ë—‘UW-å¸RFy õôÇ4Ä7?ð‘ȦXü¯+%7¦vàc½Û¹àÆKcé+ª ™+•˜jR%äq´ 2Íîv€¥@ù°A »$ddçÞº¿­0`:Ö/‡ÞÝÅ×’² ;†Â`í^ÁäžzšÚk+E7$]yòÆø—#£aÿ‹îô0>´„&­,qÞéþc¹]ì|¥qÎÞÚT–DZgŒ“YšDpcNk!rЙ¥&9dQ´©Ú\®Îqޠ޽óZÚ‹0ÔtÕV ™–/à›[“õ^•R¹i…/¼á,²4›§Oî‚9b8è=zP†ö6.›mœíæùX™¿nÞ:䃮+–ómd[q4s“I¯Í–ää.c“޹#­u7GmœíŒâ68Ü\ÜJ5ÆeӆŒªE ïóð~è`jao5æøç/¢eá30/–ÏSèF+±éÚ¹‹Ã©Å©>Ç·†í2’˳gÿ¯½ÓŠéÿŠ’qÉq§¶©j.<Ùœ\;­Ç˜®‚1ó*e°x ÇN§ŠÙðصþÌv´óÄm;‚&ûÀƒAè+: îÓPµ“Í·[F¹pQç‹p<~÷‡98=niešÐÈnÆùƒæ,›F~îáÁÿBØæoˆg‚â7åAŒ`VCÆ>þÆSŸLçŒô<Õ]&KHo´ï³‚¼=žb‘–fcÁPÍó)ÉŽ*ö´×_k…lÞ(æòdei<ƒÆÖçß#ß½2ÂK—¹Ò³$XòÊ«*6WœŽzã§üPšš“Ó.XÌa2L‚M›}÷qõÁ®fõí„ KùþÎQ%Aµ7.rãÐ÷ã^) ‘G_Xf»x¦ÎѤÇ2‚¸9 ´®A$œóŒc¤ÑRÑ5€`3–Í$ ó–YÎ6ŒãsaRë-7ÛdX# 7#÷üÇœpü€;ž•-…ÃÜêpº²„6€¸IbpÍÓ¢œ€0y=©/ëñÛúò-ë.#ÑîXÌbyeFzŽ7qž9ÍbÝÝ[Ü‹»"A$pÂã÷ʤ·˜Rå‡Nç·¹«ºMÉ ªÛ>VfPíË`~u…$º‚ ¤7P*ƈ&džÇ €¤ýÓÙ-Áí@Ž¡7yi¹v¶Ñ•Îp}3Þ¹›™m乸¶žWh美—í1ü§’eAzsŒõë]©‘­a2º<…»'ÝcŽHö®rE¿¥4*ñßòÒù-ò ƒž1“‘O¨-‹º ·7z‹@²†y7¶ùu-Ø´äN1ÍM¬ym.Ÿ’Kܪ²*±#¸ÔŽ?”i-t×z¸xÙ|ÿ#©+׃·Ûyö¥ÕÌ‚K IJyãýc({ýîOáÏó¥Ø:³.ÈF·eŒæ@«<²Û¹¸Œï+Ï@ªHÉl`~uÔ~UÌA%Ù—Oyn m× ÓDw&`ëóghŽø®›4Á³’ÓþÀ5«utX髼±P|¾8„ôã#ëS_ý•n-5Ãd>ÎÀ¨Ï+‚As´óÇ^ÔQG_ëÌooëÈr}•t«—·¸ž9>Ê›òQp7·F#'pü±Ö®i¬‹¬Iû$m:¨§hdsÔsü¨¢”¯Äõø5â#³‹ÎýæÿVT|»ÿ½ÇÝÎ;ç¥PöÜ×Ll]¹ÛŒúR¶66î˜9Ç¥S—QDÅм“auä 
7qþ³cq°léòýÜg=úÖ)ŸOµÒôù¥’éà[pÌWËÜqØ0$6~ïãÅReGTZÐm,Q³hŒ¦Qnò¡g\»i’3ÜzÖ–­d©ö­Ežg†(׳d6q“ƒÏ9ãµSz+‰ÿ—¦\ï5–HEv“j¹.¤}Õ˜{u­½FXgðšHZq åÎÆq†$·Èê? (¡ÖKäfjâÌÏz“–Ëດ˽ §ž=+gHHmµ=BÖ6”°òä;˜ ŒdÜã“E¢#&Qj¶·>cÎGÛf€D»~_ŸãqâÏJCsjžU̯re²/a¸\cðïEuûŠg^zš†ëËûþvã–ÛöõÛŽqïŠ(¡ì%¹C@1›)¿wš|ÂåOÍÓo1Òµh¢˜‘–Ç>%@YÜ‹rUr¤GÏ'xg×¾+RŠ(3µ²ƒO·ïóWË*T|ùã;¸ÇÖ¬iþWöm¯Cå.ÍýqŽ3E‹5—¤œÜê$³»yø21R¢‚¾0z~tQL} JËÔüL4Å,ÅLÇ÷jW“Ùˆ<=GOÊŠ(R«ÞÈÑX\H¥Ã,lAM»‡Fî3õ¢ŠO`[‘éKé6¾SFŒ8b'w<ãŒóÍ[¢Šoq‘|Iq–w"ŽT¬c#Ž9õç®>•§E ™ºñƘAg dPÈ…s ÏÝÃuϧSZ]‡Q@ :Ö^ˆA†èîyÜ6éX©Þ}Šð@éíŒQEjVEüèší‚»HBô@hf$+sÉ<ÇAõ¢Š]PºõWQ%tÛ’%0‘ýà`¥}òxPöÜ4УKµ F¾P°¨Uª(ªb2ìÈ:æ¡–w`¨7¤ þèÇ#××5©Eº îeë˜ò-g*nÔ®döÃuõÀô­CÔÑEÄ=n+7A é™Ë±2¹i©2ýì¯>´Q@t4ë.SŸ[ÎØŠ¨*BìGÞéŸÃÖŠ(jVv»(ƒF¸rò¢àhІPHÏÞãúÑEÑz(ÖR$Ûµ(ÀÀÀãú( Kc+O1ÿmj¸2yÙvHÛ·ÇãþsV¯ñ¶Û9ÿ˜ñ­QØo©n¢¸Ùöi¼Ð|½¿qŽqEžÀ·3¼>b62y{÷yŸ9r§«Œmãîí­j(¦#Soÿ <@‰|Ì…vïÚÛxûØÛ»§ÅmQE$WˆÒÑ&7!ÌjTá1¸œãCÍ^³xä±·xF"1)@Np1ÀÍP6OXz)¶þÖÕü“17çÞF'‘|Š(£¨ºu‰­ýŽ=SJšå]ŸÍÂl c‘‚ÇÐ8¢Š}Fnw¨nö}ŠãÌ%SÊmÄ@ÁÏZ(¤ö¹Ÿá³ö>G™³-Ĥn>Õ«EØ‘Œ†ßþ‡½ó0qÊíß±sÇÞû»}³šµ®`h—{·ìòÎí„nÛߢŠOa­Îvæ6[k‰dûg—±Ç…™W`uWÞJç#§^žÕÙh¢Ÿ[Ö±|>mÏÚ¼/UǘTü˜;~ïãœóëE€]u"–[8®O!ˇD+ÓoÞÁ¾Q“‘Óò¬Ý5l¦Y‚[™‘ÔnuÝò9##¨PGoP SCkC¥º$ZLVO,ìl>@ÛÇ\žãÅrkŸw²ÔÉp“µ‰vMÊ‚¸`Xô=sž˜'=h¢—õù‚:}3wö]¶õ•[ËJÁŸñ#­cjè’ÜêBC,-•†ÂŸºƒÆU‰ç9ÇÒŠ)±D±¢Kš– ÑÉ/ï)6ÈdaÏIɬë¿ò å˜)‘AU+™~î®}:š(¡‚ÜÀ7ZI.ãm/ö„gÙ;ËOÝG< ì¿‹š(¤¶þ¼œº_[ZÍÇ7™zù|¡ @q*9 >£•£áÖ‹û4Û¶'é#€À0‚8íE {•|Jm÷ ¼išßÉ´qÈãïàóížœûÕmØ.±dbººk©­X²3åp¼`ûŒž;bŠ)­ÿ¯1½¿¯#¥¸‰§¶’$™ágR‰Ëî3Ås–Ï{nÂU’+DË<¡QÔÊ)áHé‘ÜôæŠ)0GKd¨–Ëo”*?ÞQ€}ëšÕ£¶“Undº$²HŽ‹å«ƒƒŒ’{œóEþУ±­¤¾ßªª †3ýà;×=éuÜ}žÜ9r†u ÌƒÓ ×מ(¢—@[˜#O–ÎÎ8¤º0IrÑ#ƒ±HRäq» Œsמk³nsÍQÐç!gsaõ£Ì×Fæ!,¾o˜6ÉÃ8_•ÉÁ玂ºhæ°\l›«†`Ä·Rr89Îh¢š hÊzœË«æHòŽÒF /ƒ–Áäнj‡î4ûfW³ûD.Ѐð»ü»€PxÇ$ ~$ÑE%¸ÚÓúò6õ‚WH¹"SËË+*œdq–㞟s‰5¬š‹}’ê7ÙȘ¢2yè0zŸJ( :‚ãjí.éŠå¯–ؽô÷_i”CwßÊ»pxBú™ëœÑE>¢Z£CAûÏ©Íg,²fä‰<Ìzð?à]ý)úá~Äg.`3…tJ±< ÀŽ@?ç¥R{/ÖìÌÒnôÈ%²±Q0•çgUr¤dܬHã ÀQÐõ®«ÔQLGÿÙcollectl-4.3.1/docs/Disks.html0000664000175000017500000002135413366602004014366 0ustar mjsmjs collectl - Disk Info

Disk Monitoring

Introduction

As with other subsystems that have specific devices, collectl can report disk summary as well as detail data. Like other summary data, disk summary data represents the total activity across all disks or to be more precise, those enumerated in /proc/diskstats, with the following caveats:
  • Partition data is skipped by default for both summary and detail data, because the default filter patterns end with a trailing space
  • Only those disk names that explicitly match the pattern in DiskFilter (see collectl.conf) are collected and subsequently included in the results. You can also override this on demand by specifying --rawdskfilt. Note that in both cases any disks not selected will not have any data recorded for them.
  • You can force collectl to collect/report partition level details by specifying a disk filter without the trailing space. If you do so, the partition data will NOT be included in the summary stats since that would result in double counting.
  • If you specify a filter with --dskfilt, it is only applied to output, which lets you report on a subset of the disks for which data has been collected.
  • Device mapper disks, while listed in the detail data, are NOT included in the summary data to avoid double counting.
The three key counters for disk activity are bytes, iops and merges, though only bytes and iops are reported in brief summary mode. The average I/O size, which is also reported in verbose and detail modes, may optionally be included in brief mode by specifying --iosize or --dskopts i. If you're not sure why/when you'd care about summary data, be sure to read this.
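
As a rough illustration of the caveats above, the following sketch totals the counters of whole disks from /proc/diskstats-style input while skipping partitions and device mapper entries. It is not collectl's own code: the function names, the sample numbers and the simplified name pattern are assumptions made for this example. The totals it prints are cumulative since boot, whereas collectl reports per-second rates derived from successive samples.

```shell
#!/bin/sh
# Hypothetical sample in /proc/diskstats layout:
# major minor name reads rd_merged rd_sectors rd_ms writes wr_merged wr_sectors ...
sample_diskstats() {
cat <<'EOF'
   8       0 sda 100 10 2000 50 200 20 4000 80 0 100 130
   8       1 sda1 90 10 1800 45 180 20 3600 70 0 90 115
   8      16 sdb 50 0 1000 20 0 0 0 0 0 20 20
 253       0 dm-0 100 0 2000 50 220 0 4000 90 0 100 140
EOF
}

# Sum whole sd disks only. sda1 (a partition) and dm-0 (device mapper) are
# skipped so their I/O is not counted twice. Sectors are 512 bytes, so
# sectors/2 gives KB.
disk_summary() {
    awk '$3 ~ /^sd[a-z]+$/ {
             rkb += $6 / 2; wkb += $10 / 2; rio += $4; wio += $8
         }
         END {
             printf "ReadKB=%d WriteKB=%d ReadIOs=%d WriteIOs=%d\n", rkb, wkb, rio, wio
         }'
}

sample_diskstats | disk_summary
```

Dropping the `$` anchor from the pattern would let sda1 through, which is the double counting the summary rules are meant to avoid.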

Disk detail goes a step further and in addition to including the same information that's reported for summary data also includes key device specific metrics relating to queue lengths, wait and service times as well as utilization. For those familiar with iostat, this is the same data it reports. These numbers will help you determine if your individual disks are operating properly since high wait and/or service times are a bad thing and indicate something is causing an undesired delay somewhere.

Basic Filtering
If you'd like to limit the disks included in either the detail output or the summary totals, you can explicitly include or exclude them using --dskfilt. The target of this switch is actually one or more perl expressions, but if you don't know perl all you really need to know is that these are strings that are compared to each disk name. If the first (or only) name is preceded by a ^, disks that match the string(s) will be excluded.

No filtering...

collectl -sD
# DISK STATISTICS (/sec)
#          <---------reads---------><---------writes---------><--------averages--------> Pct
#Name       KBytes Merged  IOs Size  KBytes Merged  IOs Size  RWSize  QLen  Wait SvcTim Util
sda              0      0    0    0     291     67    5   58      58     0     0      0    0
sdb              0      0    0    0       0      0    0    0       0     0     0      0    0
dm-0             0      0    0    0     291      0   73    4       4     0     1      0    0
dm-1             0      0    0    0       0      0    0    0       0     0     0      0    0
hda              0      0    0    0       0      0    0    0       0     0     0      0    0

Only include sd disks...

collectl -sD --dskfilt sd
# DISK STATISTICS (/sec)
#          <---------reads---------><---------writes---------><--------averages--------> Pct
#Name       KBytes Merged  IOs Size  KBytes Merged  IOs Size  RWSize  QLen  Wait SvcTim Util
sda              0      0    0    0       0      0    0    0       0     0     0      0    0
sdb              0      0    0    0       0      0    0    0       0     0     0      0    0

Exclude sd and dm disks...

collectl -sD --dskfilt ^sd,dm
# DISK STATISTICS (/sec)
#          <---------reads---------><---------writes---------><--------averages--------> Pct
#Name       KBytes Merged  IOs Size  KBytes Merged  IOs Size  RWSize  QLen  Wait SvcTim Util
hda              0      0    0    0       0      0    0    0       0     0     0      0    0

Exclude disks with the letter 'a' in their name...

collectl -sD --dskfilt ^a
# DISK STATISTICS (/sec)
#          <---------reads---------><---------writes---------><--------averages--------> Pct
#Name       KBytes Merged  IOs Size  KBytes Merged  IOs Size  RWSize  QLen  Wait SvcTim Util
sdb              0      0    0    0       0      0    0    0       0     0     0      0    0
dm-0             0      0    0    0       0      0    0    0       0     0     0      0    0
dm-1             0      0    0    0       0      0    0    0       0     0     0      0    0
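
The include/exclude behavior in the examples above can be imitated with a few lines of shell. This is only a sketch of the semantics as described here, not how collectl implements them: `dskfilt` is a hypothetical helper, and it uses grep's extended regular expressions rather than perl expressions, so patterns beyond plain strings may behave differently.

```shell
#!/bin/sh
# dskfilt FILTER: read disk names on stdin, print those that survive FILTER.
# (Hypothetical helper that mimics the documented --dskfilt behavior.)
dskfilt() {
    filt=$1
    invert=
    case $filt in
        '^'*) invert=-v; filt=${filt#?} ;;   # leading ^ means "exclude"
    esac
    # comma-separated strings become regex alternatives: sd,dm -> sd|dm
    pattern=$(printf '%s' "$filt" | tr ',' '|')
    grep -E $invert -- "$pattern"
}

printf '%s\n' sda sdb dm-0 dm-1 hda | dskfilt '^sd,dm'
```

With the five disk names used in the examples, the filter `^sd,dm` leaves only hda, matching the output shown for the third example.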

Raw Filtering
As mentioned in the previous section, basic disk filtering is applied after the data is collected, so if you don't collect it in the first place there's nothing to filter. So what about special situations where you may have a disk collectl doesn't know about in its default filtering string, which is specified in /etc/collectl.conf as DiskFilter (and commented out since that is the default)? Or what if you'd like to see partition level data, which is also filtered out?

If you want to override this filtering you actually have 2 choices available to you. Either edit the collectl.conf file or simply use --rawdskfilt which essentially redefines DiskFilter. It can be a handy way to specify filtering via a switch in case you don't want to have to modify the conf file.

Show stats for unknown disk named nvme1n1 - the wrong way

Since we only specified a partial name, we see everything that matches, including partitions.

collectl.pl -sD --rawdskfilt nvme -c1
# DISK STATISTICS (/sec)
#          <---------reads---------------><---------writes--------------><--------averages--------> Pct
#Name       KBytes Merged  IOs Size  Wait  KBytes Merged  IOs Size  Wait  RWSize  QLen  Wait SvcTim Util
nvme1n1          0      0    0    0     0       0      0    0    0     0       0     0     0      0    0
nvme1n1p1        0      0    0    0     0       0      0    0    0     0       0     0     0      0    0
nvme1n1p2        0      0    0    0     0       0      0    0    0     0       0     0     0      0    0
nvme0n1          0      0    0    0     0       0      0    0    0     0       0     0     0      0    0

Show stats for unknown disk named nvme1n1 - the right way

If we want to just see nvme0n1 and nvme1n1, we need to be more specific and be sure to include a space at the end of the pattern which will also require the string to be quoted.

collectl.pl -sD --rawdskfilt 'nvme\dn\d '
# DISK STATISTICS (/sec)
#          <---------reads---------------><---------writes--------------><--------averages--------> Pct
#Name       KBytes Merged  IOs Size  Wait  KBytes Merged  IOs Size  Wait  RWSize  QLen  Wait SvcTim Util
nvme1n1          0      0    0    0     0       0      0    0    0     0       0     0     0      0    0
nvme0n1          0      0    0    0     0       0      0    0    0     0       0     0     0      0    0

Show stats for specific partition(s)

collectl.pl -sD --rawdskfilt sda1
# DISK STATISTICS (/sec)
#          <---------reads---------------><---------writes--------------><--------averages--------> Pct
#Name       KBytes Merged  IOs Size  Wait  KBytes Merged  IOs Size  Wait  RWSize  QLen  Wait SvcTim Util
sda1             0      0    0    0     0       0      0    0    0     0       0     0     0      0    0
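
The effect of the trailing space is easy to see outside of collectl. The sketch below assumes the raw filter is matched against lines in which the device name is followed by a space, as in /proc/diskstats; the sample lines and the `rawfilt` helper are made up for illustration, and grep's `[0-9]` stands in for perl's `\d`.

```shell
#!/bin/sh
# Hypothetical /proc/diskstats-style lines for two NVMe disks and two partitions
sample_lines() {
cat <<'EOF'
 259       0 nvme1n1 0 0 0 0 0 0 0 0 0 0 0
 259       1 nvme1n1p1 0 0 0 0 0 0 0 0 0 0 0
 259       2 nvme1n1p2 0 0 0 0 0 0 0 0 0 0 0
 259       3 nvme0n1 0 0 0 0 0 0 0 0 0 0 0
EOF
}

# rawfilt PATTERN: keep the raw lines matching PATTERN, print the device name
rawfilt() {
    grep -E -- "$1" | awk '{ print $3 }'
}

echo "partial name:"
sample_lines | rawfilt 'nvme'
echo "with trailing space:"
sample_lines | rawfilt 'nvme[0-9]n[0-9] '
```

The first call lists all four names, partitions included. In the second, the required space after the final digit cannot match the `p` that begins a partition suffix, so only the two whole disks remain.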

Dynamic Disk Discovery
Dynamic disks are handled by the exact same algorithms that are applied to dynamic networks. While disks have not yet been found to have the same problem networks do, with potentially hundreds of orphaned names no longer in use, the same logic for dealing with stale network names has been added to disk data processing. Rather than repeat it here, read the description of dynamic network processing to learn how the new disk option --dskopts o would be applied.

updated Nov 8, 2016
collectl-4.3.1/docs/NfsInfo.html

NFS Monitoring

Introduction

As of version 3.2.1, nfs monitoring has undergone a major change. Unlike the old behavior which only reported on the client or server for a specific version, by default nfs reporting now collects data for all types of nfs data including nfs version 4. This also means when reporting nfs summary data, all statistics are aggregated in both brief and summary formats.

A system typically runs one version of NFS and usually acts as a client or server and so by aggregating the data the numbers being reported in brief mode will already be for a single type of data. Furthermore, in verbose mode the client/server data are broken out so even if the system is acting as both a client and server you will be able to differentiate the data being reported.

As an optimization as well as a convenience, collectl looks at the raw read/write fields for each set of nfs data, which represent the totals since boot. If those fields are both 0, it is assumed there is no nfs activity and the rest of the statistics will also be assumed to be zero. With the exception of the detail format described later, this all happens behind the scenes. However, just because those fields are non-zero does not mean there is currently nfs activity. In fact if you mount a filesystem, write to it and dismount it, those counters will remain non-zero until the system is rebooted. In other words, this approach is not perfect but is simply provided as a mechanism to help reduce the need for user-specified filters to focus the detail output.
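
The activity check described above amounts to a simple test per type. The sketch below is only an illustration of that idea: the three-column input (type, read total, write total) is a made-up format rather than what the kernel or collectl actually uses, and `active_nfs_types` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical totals since boot: type, read count, write count
sample_totals() {
cat <<'EOF'
Clt2 0 0
Svr2 0 0
Clt3 51234 8812
Svr3 0 0
Clt4 0 17
Svr4 0 0
EOF
}

# A type is treated as active when its read or write total is non-zero
active_nfs_types() {
    awk '$2 > 0 || $3 > 0 { print $1 }'
}

sample_totals | active_nfs_types
```

Clt4 illustrates the limitation mentioned above: a handful of writes from a filesystem that has since been dismounted is enough to keep the type listed until the next reboot.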

Data Filtering

As they say, your mileage may vary and if your system is running mixed nfs versions and/or acting as a client and server and you really only want a subset of the activity included in the summary or detail reports, you can modify the behavior by specifying one or more filters with the --nfsfilt switch. If you just select clients, only data for those clients will be included in brief, summary and detail formats. However, if recording to a raw file, collectl will record data for all NFS client versions even if you've only selected one or two. The same holds true for servers. When you do use --nfsfilt, those values are displayed in the brief and verbose headers both during collection and playback.

Detail Data

The data collected for NFS V2 and V3 is quite similar in that V3 reports a superset of what V2 reports, with the exception of root and wrcache. V4 reports a lot more counters than either, but many of the same key ones as V3, and so the detail format has been standardized on the V3 counters. In this mode, one line is reported for each of the 6 types, with blank entries for the non-common fields as shown here. Note that only rows for those types determined to be active by checking the read/write fields, or explicitly selected through filters, will be included:
# NFS SERVER/CLIENT DETAILS (/sec)
#Type Read Writ Comm Look Accs Gttr Sttr Rdir Cre8 Rmov Rnam Link Rlnk Null Syml Mkdr Rmdr Fsta Finf Path Mknd Rdr+
 Clt2    0    0         0         0    0    0    0    0    0    0    0    0    0    0    0    0
 Svr2    0    0         0         0    0    0    0    0    0    0    0    0    0    0    0    0
 Clt3    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
 Svr3    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
 Clt4    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0                   0    0
 Svr4    0    0    0    0    0    0    0    0    0    0    0    0    0

TIP
Looking at detail data for more than one type of data can be difficult to watch. Consider using --home which can give the feel of a real-time display in top format.

Playing back data generated by older versions of collectl

Collectl is smart enough to do the right thing. In other words if you're playing back data generated by a pre-3.2.1 version of collectl, collectl figures out what type of data the file contains and actually sets --nfsfilt for you (in fact it won't let you select it yourself) and only displays the type of data in the file.

Playing back newer data with older versions of collectl

Any raw file created by Version 3.2.1 or greater records nfs data in a format that will not be recognized by earlier versions of collectl, and any attempt to read it will result in fields of all zeros. It is not expected that this would typically happen, but it is stated here for completeness.
updated June 26, 2009
collectl-4.3.1/service/0000775000175000017500000000000013366602004013126 5ustar mjsmjscollectl-4.3.1/service/collectl.service0000775000175000017500000000031513366602004016313 0ustar mjsmjs[Unit] Description=collectl metric collection [Service] Type=forking PIDFile=/var/run/collectl.pid ExecStart=/usr/bin/collectl -D ExecReload=/bin/kill -USR1 $MAINPID [Install] WantedBy=multi-user.target collectl-4.3.1/initd/0000775000175000017500000000000013366602004012575 5ustar mjsmjscollectl-4.3.1/initd/collectl-suse0000775000175000017500000000712413366602004015305 0ustar mjsmjs#!/bin/sh # Startup script for collectl on SuSE based distributions # # description: Run data collection for a number of subsystems # see /etc/collectl.conf for startup options # $2: process name, if other than collectl OR "" to use $3 # $3: parameters, in quotes if more than one, to pass to daemon command # ### BEGIN INIT INFO # Provides: collectl # Required-Start: $ALL # Required-Stop: $ALL # Default-Start: 2 3 5 # Default-Stop: 0 1 6 # Short-Description: Collectl monitors system performance. # Description: Collectl is a light-weight performance monitoring # tool capable of reporting interactively as well # as logging to disk. It reports statistics on # cpu, disk, infiniband, lustre, memory, network, # nfs, process, quadrics, slabs and more in easy # to read format. ### END INIT INFO # description: Run data collection for a number of subsystems # see /etc/collectl.conf for startup options # $2: process name extension, if other than collectl [optional for start/restart/reload] # $3: parameters, in quotes if more than one, to pass to daemon command # # EXAMPLES: # run at 1 second interval and only collect cpu/disk data # /etc/init.d/collectl start "-i1 -scd" # run a second instance with instance name of 'int5', with interval of 5 seconds # /etc/init.d/collectl start int5 "-i5" [ -r /etc/rc.status ] && . /etc/rc.status rc_reset COLLECTL=/usr/bin/collectl if [ ! 
-f $COLLECTL ]; then echo -n "Cannot find $COLLECTL" rc_status -s rc_exit fi # Just to make sure nothing is different when running 'collectl', we # won't use --check even though it's probably ok to use all the time. PNAME=collectl if [ "$2" != "" ]; then EXT=$2 if [ "$1" = "start" ] || [ "$1" = "restart" ] ; then if [ "$3" = "" ]; then SWITCHES=$2 EXT="" else SWITCHES=$3 fi fi # Just to make sure nothing is different when running 'collectl', we # won't use --check even though it's probably ok to use all the time. if [ "$EXT" != "" ]; then PNAME="collectl-$EXT" PSWITCH="--pname $EXT" CHECK="--check $PNAME " fi fi PROCNAME=$PNAME PIDFILE="/var/run/$PNAME.pid" # If a pidfile, make sure it's not stale and if it is, collectl not running if [ -f $PIDFILE ]; then pid=`cat $PIDFILE` pid=`ps ax opid,cmd | grep $PROCNAME | grep $pid | grep -v grep | awk '{ print $1 }'` fi case "$1" in start) if [ "$pid" != "" ]; then echo $PNAME already running exit fi # we used to start with 'startproc', but if an instance of collectl already running it # won't start the next one so we'll just start it this way echo -n "Starting $PNAME:" $COLLECTL -D $SWITCHES $PSWITCH rc_status -v ;; stop) # Note that we need to use a pid file to identify which instance we want to stop if [ -f $PIDFILE ]; then echo -n "Shutting down $PNAME: " killproc -p $PIDFILE collectl rc_status -v else echo "$PNAME not running" fi ;; flush) if [ -f $PIDFILE ]; then pid=`cat $PIDFILE` echo Flushing buffers for $PNAME kill -s USR1 $pid else echo "$PNAME not running" fi rc_status ;; status) if [ "$pid" = "" ]; then echo "$PNAME not running" else echo "$PNAME is running" fi ;; restart|reload) $0 stop $EXT $0 start "$2" "$3" rc_status ;; *) echo "Usage: $0 {start|stop|flush|restart|status}" exit 1 esac rc_exit collectl-4.3.1/initd/collectl-debian0000775000175000017500000000602413366602004015546 0ustar mjsmjs#!/bin/sh # Startup script for collectl on distros that support update-rc.d such # as debian & ubuintu # ### BEGIN 
INIT INFO # Provides: collectl # Required-Start: $all # Required-Stop: $all # Default-Start: 2 3 4 5 # Default-Stop: 0 1 6 # Short-Description: Collectl monitors system performance. # Description: Collectl is a light-weight performance monitoring # tool capable of reporting interactively as well # as logging to disk. It reports statistics on # cpu, disk, infiniband, lustre, memory, network, # nfs, process, quadrics, slabs and more in easy # to read format. ### END INIT INFO # description: Run data collection for a number of subsystems # see /etc/collectl.conf for startup options # # EXAMPLES: # run at 1 second interval and only collect cpu/disk data # /etc/init.d/collectl start "-i1 -scd" # run a second instance with instance name of 'int5', with interval of 5 seconds # /etc/init.d/collectl start int5 "-i5" PERL=/usr/bin/perl COLLECTL=/usr/bin/collectl if [ ! -f $PERL ]; then echo -n "Cannot find $PERL" exit 0 fi if [ ! -f $COLLECTL ]; then echo -n "Cannot find $COLLECTL" exit 0 fi PNAME=collectl if [ "$2" != "" ]; then EXT=$2 if [ "$1" = "start" ] || [ "$1" = "restart" ] || [ "$1" = "force-reload" ]; then if [ "$3" = "" ]; then SWITCHES=$2 EXT="" else SWITCHES=$3 fi fi # Just to make sure nothing is different when running 'collectl', we # won't use --check even though it's probably ok to use all the time. if [ "$EXT" != "" ]; then PNAME="collectl-$EXT" PSWITCH="--pname $EXT" CHECK="--check $PNAME " fi fi PIDFILE="/var/run/$PNAME.pid" case "$1" in start) echo -n "Starting collectl: $PNAME" start-stop-daemon --quiet --stop --exec $PERL --pidfile $PIDFILE --test >/dev/null if [ $? -eq 1 ]; then start-stop-daemon --quiet --start --exec $COLLECTL -- -D $SWITCHES $PSWITCH echo "." else echo " [already running]" fi ;; stop) echo -n "Stopping collectl: $PNAME" start-stop-daemon --quiet --stop --retry 2 --exec $PERL --pidfile $PIDFILE if [ $? -eq 0 ]; then echo "." 
else echo " [not running]" fi ;; flush) start-stop-daemon --quiet --stop --exec $PERL --pidfile $PIDFILE --test >/dev/null if [ $? -eq 0 ]; then echo "Flushing buffers for $PNAME" kill -s USR1 `cat $PIDFILE` else echo "$PNAME is not running" fi ;; status) start-stop-daemon --quiet --stop --exec $PERL --pidfile $PIDFILE --test >/dev/null if [ $? -eq 0 ]; then echo "$PNAME is running..." else echo "$PNAME is not running" exit 1 fi ;; restart|force-reload) $0 stop $EXT sleep 1 $0 start "$2" "$3" ;; *) echo "Usage: $0 {start|stop|flush|restart|force-reload|status}" exit 1 esac exit 0 collectl-4.3.1/initd/collectl-generic0000775000175000017500000000646113366602004015745 0ustar mjsmjs#!/bin/sh # generic Startup script for collectl, in case nothing else seems to work! # ### BEGIN INIT INFO # Provides: collectl # Required-Start: $all # Required-Stop: $all # Default-Start: 2 3 4 5 # Default-Stop: 0 1 6 # Short-Description: Collectl monitors system performance. # Description: Collectl is a light-weight performance monitoring # tool capable of reporting interactively as well # as logging to disk. It reports statistics on # cpu, disk, infiniband, lustre, memory, network, # nfs, process, quadrics, slabs and more in easy # to read format. ### END INIT INFO # description: Run data collection for a number of subsystems # see /etc/collectl.conf for startup options # $2: process name extension, if other than collectl [optional for start/restart/reload] # $3: parameters, in quotes if more than one, to pass to daemon command # # EXAMPLES: # run at 1 second interval and only collect cpu/disk data # /etc/init.d/collectl start "-i1 -scd" # run a second instance with instance name of 'int5', with interval of 5 seconds # /etc/init.d/collectl start int5 "-i5" COLLECTL=/usr/bin/collectl if [ ! 
-f $COLLECTL ]; then echo -n "Cannot find $COLLECTL" exit 0 fi PNAME=collectl if [ "$2" != "" ]; then EXT=$2 if [ "$1" = "start" ] || [ "$1" = "restart" ] || [ "$1" = "force-reload" ]; then if [ "$3" = "" ]; then SWITCHES=$2 EXT="" else SWITCHES=$3 fi fi # Just to make sure nothing is different when running 'collectl', we # won't use --check even though it's probably ok to use all the time. if [ "$EXT" != "" ]; then PNAME="collectl-$EXT" PSWITCH="--pname $EXT" CHECK="--check $PNAME " fi fi PROCNAME=$PNAME PIDFILE="/var/run/$PNAME.pid" # If a pidfile, make sure it's not stale and if it is, collectl not running if [ -f $PIDFILE ]; then pid=`cat $PIDFILE` pid=`ps ax opid,cmd | grep $PROCNAME | grep $pid | grep -v grep | awk '{ print $1 }'` fi case "$1" in start) if [ "$pid" != "" ]; then echo "[already running]" else $COLLECTL -D $SWITCHES $PSWITCH echo "." fi ;; stop) if [ "$pid" = "" ]; then echo "[not running]" else count=1 while [ $count -le 5 ]; do if [ -f $PIDFILE ]; then kill $pid else count=5; break fi sleep 1 count=$(( $count + 1 )) done if [ -f $PIDFILE ]; then echo -n " pid $pid not responding to TERM signal. sending sigkill" kill -9 $pid fi echo "." fi ;; flush) if [ "$pid" != "" ]; then echo "Flushing buffers for $PNAME" kill -s USR1 $pid else echo "$PNAME is not running" fi ;; status) if [ "$pid" != "" ]; then echo "$PNAME is running..." else echo "$PNAME is not running" exit 1 fi ;; restart|force-reload) if [ "$pid" == "" ]; then echo "$PNAME does not appear to be running so will not be shut down" else $0 stop $EXT fi $0 start "$2" "$3" ;; *) echo "Usage: $0 {start|stop|flush|restart|force-reload|status}" exit 1 esac exit 0 collectl-4.3.1/initd/collectl0000775000175000017500000000631713366602004014333 0ustar mjsmjs#!/bin/sh # Startup script for collectl # ### BEGIN INIT INFO # Provides: collectl # Required-Start: $all # Required-Stop: $all # Default-Start: 2 3 4 5 # Default-Stop: 0 1 6 # Short-Description: Collectl monitors system performance. 
# Description: Collectl is a light-weight performance monitoring # tool capable of reporting interactively as well # as logging to disk. It reports statistics on # cpu, disk, infiniband, lustre, memory, network, # nfs, process, quadrics, slabs and more in easy # to read format. ### END INIT INFO # chkconfig: 345 99 99 # description: Run data collection for a number of subsystems # see /etc/collectl.conf for startup options # $2: process name extension, if other than collectl [optional for start/restart/reload] # $3: parameters, in quotes if more than one, to pass to daemon command # # EXAMPLES: # run at 1 second interval and only collect cpu/disk data # /etc/init.d/collectl start "-i1 -scd" # run a second instance with instance name of 'int5', with interval of 5 seconds # /etc/init.d/collectl start int5 "-i5" RETVAL=0 COLLECTL=/usr/bin/collectl . /etc/rc.d/init.d/functions # older distros do not support "status -p" and we need to know if this one does STATUS=`status` if [[ $STATUS = *-p* ]]; then IFLAG=1; else IFLAG=0; fi if [ $IFLAG -eq 0 ] && [ "$3" != '' ]; then echo "starting multiple instances with this distro is not supported"; exit fi if [ ! -f $COLLECTL ]; then echo -n "Cannot find $COLLECTL" exit 1 fi PNAME=collectl if [ "$2" != "" ]; then EXT=$2 if [ "$1" = "start" ] || [ "$1" = "restart" ] || [ "$1" = "reload" ]; then if [ "$3" = "" ]; then SWITCHES=$2 EXT="" else SWITCHES=$3 fi fi # Just to make sure nothing is different when running 'collectl', we # won't use --check even though it's probably ok to use all the time. if [ "$EXT" != "" ]; then PNAME="collectl-$EXT" PSWITCH="--pname $EXT" CHECK="--check $PNAME " fi fi PIDFILE="/var/run/$PNAME.pid" case "$1" in start) COMMAND="$COLLECTL -D $SWITCHES $PSWITCH" echo -n "Starting $PNAME:" daemon $CHECK$COMMAND RETVAL=$? echo [ $RETVAL -eq 0 ] && touch /var/lock/subsys/$PNAME ;; stop) if [ -f $PIDFILE ]; then echo -n "Shutting down $PNAME: " killproc $PNAME RETVAL=$? 
            echo
            [ $RETVAL -eq 0 ] && rm -f /var/lock/subsys/$PNAME
        else
            echo "$PNAME does not appear to be running so will not be shut down"
        fi
        ;;

    flush)
        if [ -f $PIDFILE ]; then
            pid=`cat $PIDFILE`
            echo Flushing buffers for $PNAME
            kill -s USR1 $pid
        else
            echo "Can't find pid file $PIDFILE"
        fi
        ;;

    restart|reload)
        $0 stop $EXT
        $0 start "$2" "$3"
        RETVAL=$?
        ;;

    status)
        # need to use pid file since there can be multiple instances, but only if
        # this distro supports it
        if [ $IFLAG -eq 1 ]; then
            status -p $PIDFILE
        else
            status $PNAME
        fi
        RETVAL=$?
        ;;

    *)
        echo "Usage: $0 {start|stop|flush|restart|status}"
        exit 1
esac

exit $RETVAL
collectl-4.3.1/misc.ph0000775000175000017500000001574113366602004012765 0ustar mjsmjs
# copyright, 2003-2009 Hewlett-Packard Development Company, LP

# NOTE - by default, this module only collects data once a minute, which you can change
# with the i=x parameter (eg --import misc,i=x). Regardless of the collection interval,
# data will be reported every interval in brief/verbose formats to provide a consistent
# set of output each monitoring cycle. However, --export lexpr will report light-weight
# counters every interval but heavy-weight ones (currently only logins) based on i=.
# sexpr and gexpr will report all 4 counters every interval independent of when sampled.
# To report --export data in ALL lexpr samples, in case the listener expects it, include
# the a switch (eg misc,a).
# M i s c e l l a n e o u s C o u n t e r

use strict;

# Allow reference to collectl variables, but be CAREFUL as these should be treated as readonly
our ($miniFiller, $rate, $SEP, $datetime, $miniInstances, $interval, $showColFlag);

my (%miscNotOpened, $miscUptime, $miscMHz, $miscMounts, $miscLogins);
my ($miscUptimeTOT, $miscMHzTOT, $miscMountsTOT, $miscLoginsTOT);
my ($miscInterval, $miscImportCount, $miscSampleCounter, $miscAllFlag);

sub miscInit
{
  my $impOptsref=shift;
  my $impKeyref= shift;

  # If we ever run with a ':' in the interval, we need to be sure we're
  # only looking at the main one. NOTE - if --showcolflag, collectl
  # sets $interval to 0 and we need to make sure our division doesn't bomb
  my $miscInterval1=(split(/:/, $interval))[0];
  $miscInterval1=1 if $showColFlag;

  # For now, only options are a, 'i=' and s
  $miscInterval=60;
  $miscAllFlag=0;
  if (defined($$impOptsref))
  {
    foreach my $option (split(/,/,$$impOptsref))
    {
      my ($name, $value)=split(/=/, $option);
      error("invalid misc option: '$name'") if $name ne 'a' && $name ne 'i' && $name ne 's';
      $miscInterval=$value if $name eq 'i';
      $miscAllFlag=1 if $name eq 'a';
    }
  }

  $miscImportCount=int($miscInterval/$miscInterval1);
  error("misc interval option not a multiple of '$miscInterval1' seconds")
      if $miscInterval1*$miscImportCount != $miscInterval;

  $$impOptsref='s'; # only one collectl cares about
  $$impKeyref='misc';

  $miscSampleCounter=-1;
  $miscLogins=0;
  return(1);
}

# Nothing to add to header
sub miscUpdateHeader
{
}

sub miscGetData
{
  getProc(0, '/proc/uptime', 'misc-uptime');
  grepData(1, '/proc/cpuinfo', 'MHz', 'misc-mhz');
  grepData(2, '/proc/mounts', ' nfs ', 'misc-mounts');

  # we only retrieve heavy-weight counters at the misc sampling interval
  # as specified by "i=" or the default value of 60.
  return if ($miscSampleCounter++ % $miscImportCount)!=0;

  getExec(4, '/usr/bin/who -s -u', 'misc-logins');
}

sub miscInitInterval
{
}

sub miscAnalyze
{
  my $type= shift;
  my $dataref=shift;

  $type=~/^misc-(.*)/;
  $type=$1;
  my @fields=split(/\s+/, $$dataref);

  if ($type eq 'uptime')
  {
    $miscUptime=$fields[0];
  }
  elsif ($type eq 'mhz')
  {
    $miscMHz=$fields[3];
  }
  elsif ($type eq 'mounts')
  {
    $miscMounts=$fields[0];
  }
  elsif ($type eq 'logins:') # getExec adds on the ':'
  {
    $miscLogins=$fields[0];
  }
}

sub miscPrintBrief
{
  my $type=shift;
  my $lineref=shift;

  if ($type==1) # header line 1
  {
    $$lineref.="<------Misc------>";
  }
  elsif ($type==2) # header line 2
  {
    $$lineref.=" UTim MHz MT Log ";
  }
  elsif ($type==3) # data
  {
    $$lineref.=sprintf(" %s %4d %2d %3d ",
        cvt(1.0*$miscUptime/86400), $miscMHz, $miscMounts, $miscLogins);
  }
  elsif ($type==4) # reset 'total' counters
  {
    $miscUptimeTOT=$miscMHzTOT=$miscMountsTOT=$miscLoginsTOT=0;
  }
  elsif ($type==5) # increment 'total' counters
  {
    $miscUptimeTOT+= int($miscUptime/86400+.5); # otherwise we get round off error
    $miscMHzTOT+= $miscMHz;
    $miscMountsTOT+= $miscMounts;
    $miscLoginsTOT+= $miscLogins;
  }
  elsif ($type==6) # print 'total' counters
  {
    printf " %4d %4d %2d %3d ",
        $miscUptimeTOT/$miniInstances, $miscMHzTOT/$miniInstances,
        $miscMountsTOT/$miniInstances, $miscLoginsTOT/$miniInstances;
  }
}

sub miscPrintVerbose
{
  my $printHeader=shift;
  my $homeFlag= shift;
  my $lineref= shift;

  my $line='';
  if ($printHeader)
  {
    $line.="\n" if !$homeFlag;
    $line.="# MISC STATISTICS\n";
    $line.="#$miniFiller UpTime CPU-MHz Mounts Logins\n";
  }
  $$lineref.=$line;
  return if $showColFlag;

  $$lineref.=sprintf("$datetime %6s %6d %6d %6d \n",
      cvt($miscUptime/86400), $miscMHz, $miscMounts, $miscLogins);
}

sub miscPrintPlot
{
  my $type= shift;
  my $ref1= shift;

  # Headers - note we end with $SEP but that's ok because writeData() removes it
  $$ref1.="[MISC]Uptime${SEP}[MISC]MHz${SEP}[MISC]Mounts${SEP}[MISC]Logins${SEP}" if $type==1;

  # Summary Data Only - and here we start with $SEP
  $$ref1.=sprintf("$SEP%d$SEP%d$SEP%d$SEP%d", $miscUptime/86400, $miscMHz, $miscMounts, $miscLogins)
      if $type==3;
}

sub miscPrintExport
{
  my $type= shift;
  my $ref1= shift;
  my $ref2= shift;
  my $ref3= shift;
  my $ref4= shift;
  my $ref5= shift;
  my $ref6= shift;
  my $ref7= shift;
  my $ref8= shift;

  # see lexpr for how the above parameters are interpreted and why we have
  # so many that are not used

  # The light-weight counters are reported every sampling interval but since I think sexpr
  # needs to be constant, we'll always report all even if some only sampled periodically.
  # Same thing for gexpr, at least for now.
  if ($type eq 'l')
  {
    push @$ref1, "misc.uptime";
    push @$ref2, sprintf("%.3f", $miscUptime/86400);
    push @$ref7, '%.3f';
    push @$ref1, "misc.cpuMHz";
    push @$ref2, sprintf("%d", $miscMHz);
    push @$ref1, "misc.mounts";
    push @$ref2, sprintf("%d", $miscMounts);
  }
  elsif ($type eq 'g')
  {
    push @$ref2, 'num', 'num', 'num', 'num', 'num';
    push @$ref1, "misc.uptime";
    push @$ref3, sprintf("%d", $miscUptime/86400);
    push @$ref1, "misc.cpuMHz";
    push @$ref3, sprintf("%d", $miscMHz);
    push @$ref1, "misc.mounts";
    push @$ref3, sprintf("%d", $miscMounts);
    push @$ref1, "misc.logins";
    push @$ref3, sprintf("%d", $miscLogins);
  }

  # Heavy-weight lexpr counters are only returned based on "i=" or default of 60 seconds
  return if !$miscAllFlag && (($miscSampleCounter-1) % $miscImportCount)!=0;

  if ($type eq 'l')
  {
    push @$ref1, "misc.logins";
    push @$ref2, sprintf("%d", $miscLogins);
  }
}

# Type 1: return contents of first match
# Type 2: return count of all matches
sub grepData
{
  my $type= shift;
  my $proc= shift;
  my $string=shift;
  my $tag= shift;

  # From getProc()
  if (!open PROC, "<$proc")
  {
    # but just report it once, but not for nfs or proc data
    logmsg("W", "Couldn't open '$proc'") if !defined($miscNotOpened{$proc});
    $miscNotOpened{$proc}=1;
    return(0);
  }

  my $count=0;
  foreach my $line (<PROC>)
  {
    next if $line!~/$string/;

    if ($type==1)
    {
      record(2, "$tag $line");
      return;
    }
    $count++;
  }
  record(2, "$tag $count\n");
}
1;
collectl-4.3.1/gexpr.ph0000775000175000017500000006763713366602004013172 0ustar mjsmjs
# copyright, 2003-2009 Hewlett-Packard Development Company, LP

# debug
#   1 - print Var, Units and Values
#   2 - only print sent 'changed' Var/Units/Values
#   4 - dump packet
#   8 - do not open/use socket (typically used with other flags)
#  16 - print socket open/close info

# the 'magic' g/G flag
#   -g ONLY report well-known ganglia variables
#   -G report ALL variables but replace those known by ganglia with their ganglia names

our $gexInterval;

my ($gexSubsys, $gexDebug, $gexCOFlag, $getSendCount, $gexTTL, $gexSocket, $gexPaddr);
my ($gexHost, $gexPort);
my (%gexDataLast, %gexDataMin, %gexDataMax, %gexDataTot, %gexTTL);
my ($gexMinFlag, $gexMaxFlag, $gexAvgFlag, $gexTotFlag)=(0,0,0,0);

my $gexPktSize=1024;
my $gexOneTB=1024*1024*1024*1024;
my $gexCounter=0;
my $gexFlags;
my $gexGFlag=0;
my $gexMcast;
my $gexMcastFlag=0;
my $gexOutputFlag=1;
my $gexColInt;

# This sets a flag as soon as we 'require' the module and tells collectl this
# module does socket communications w/o -A and so is ok to run as daemon
# without requiring -f or -A.
$exportComm |= 1;

sub gexprInit
{
  my $hostport=shift;

  help() if $hostport eq 'h';
  error('--showcolheader not supported by gexpr') if $showColFlag;

  # Just like vmstat
  error("-f requires either --rawtoo or -P") if $filename ne '' && !$rawtooFlag && !$plotFlag;
  error("-P or --rawtoo require -f") if $filename eq '' && ($rawtooFlag || $plotFlag);

  # If we ever run with a ':' in the interval, we need to be sure we're
  # only looking at the main one.
  my $gexInterval1=(split(/:/, $interval))[0];

  # Options processing.
must be combo of co, d, i and s (for now) $gexDebug=$gexCOFlag=0; $gexInterval=''; $gexSubsys=$subsys; $gexTTL=5; foreach my $option (@_) { my ($name, $value)=split(/=/, $option); error("invalid gexpr option '$name'") if $name!~/^[dgGhis]?$|^align|^co$|^ttl$|^min$|^max$|^avg$|^tot$/; $gexAlignFlag=1 if $name eq 'align'; $gexCOFlag=1 if $name eq 'co'; $gexDebug=$value if $name eq 'd'; $gexInterval=$value if $name eq 'i'; $gexGFlag+=1 if $name eq 'g'; $gexGFlag+=2 if $name eq 'G'; $gexSubsys=$value if $name eq 's'; $gexTTL=$value if $name eq 'ttl'; $gexMinFlag=1 if $name eq 'min'; $gexMaxFlag=1 if $name eq 'max'; $gexAvgFlag=1 if $name eq 'avg'; $gexTotFlag=1 if $name eq 'tot'; help() if $name eq 'h'; } error("only 1 of 'g' or 'G' with 'gexpr'") if $gexGFlag>2; error("gexpr does not support standard collectl socket I/O via -A") if $sockFlag; error("host:port must be specified as first parameter") if !defined($hostport) || $hostport eq ''; ($gexHost, $gexPort)=split(/:/, $hostport); error("the port number must be specified") if !defined($gexPort) || $gexPort eq ''; $gexMcastFlag=1 if $gexHost=~/^(\d+)/ && $1>=225 && $1<=239; error("gexpr subsys options '$gexSubsys' not a proper subset of '$subsys'") if $subsys ne '' && $gexSubsys ne '' && $gexSubsys!~/^[$subsys]+$/; $gexColInt=(split(/:/, $interval))[0]; $gexInterval=$gexColInt if $gexInterval eq ''; # convert to the number of samples we want to send $gexSendCount=int($gexInterval/$gexColInt); error("gexpr interval of '$gexInterval' is not a multiple of '$gexColInt' seconds") if $gexColInt*$gexSendCount != $gexInterval; $gexFlags=$gexMinFlag+$gexMaxFlag+$gexAvgFlag+$gexTotFlag; error("only 1 of 'min', 'max', 'avg' or 'tot' with 'gexpr'") if $gexFlags>1; error("'min', 'max', 'avg' & 'tot' require gexpr 'i' that is > collectl's -i") if $gexFlags && $gexSendCount==1; if ($gexAlignFlag) { my $div1=int(60/$gexColInt); my $div2=int($gexColInt/60); error("'align' requires collectl interval be a factor or multiple of 60 
seconds")
        if ($gexColInt<=60 && $div1*$gexColInt!=60) || ($gexColInt>60 && $div2*60!=$gexColInt);
    error("'align' only makes sense when multiple samples/interval")
        if $gexInterval<=$gexColInt;
    error("'gexpr,align' requires -D or --align")
        if !$gexAlignFlag && !$daemonFlag;
  }

  # Since gexpr DOES write over a socket but does not use -A, make sure the default
  # behavior for -f logs matches that of -A
  $rawtooFlag=1 if $filename ne '' && !$plotFlag;

  # O p e n S o c k e t

  if (!$gexMcastFlag)
  {
    openSocket($gexHost, $gexPort);
  }
  else
  {
    error("must install IO::Socket::Multicast to use multicast feature")
        if !eval {require "IO/Socket/Multicast.pm"};
    $gexMcast = IO::Socket::Multicast->new() or die "create group";
  }
}

sub gexpr
{
  # if not time to print and we're not doing min/max/avg/tot, there's nothing to do.
  # BUT always make sure time aligns to top of minute based on i=
  $gexCounter++;
  $gexOutputFlag=(($gexCounter % $gexSendCount) == 0) ? 1 : 0 if !$gexAlignFlag;
  $gexOutputFlag=(!(int($lastSecs[$rawPFlag]) % $gexInterval)) ?
1 : 0 if $gexAlignFlag; return if (!$gexOutputFlag && $gexFlags==0); if ($gexSubsys=~/c/i) { if ($gexSubsys=~/c/) { # CPU utilization is a % and we don't want to report fractions my $i=$NumCpus; if ($gexGFlag) # for both 'g' OR 'G' { sendData('cpu_user', 'percent', $userP[$i], 'cpu'); sendData('cpu_nice', 'percent', $niceP[$i], 'cpu'); sendData('cpu_system', 'percent', $sysP[$i], 'cpu'); sendData('cpu_wio', 'percent', $waitP[$i], 'cpu'); sendData('cpu_idle', 'percent', $idleP[$i], 'cpu'); sendData('cpu_num', 'CPUs', $NumCpus, 'cpu'); sendData('proc_total', 'Load/Procs', $loadQue, 'cpu'); sendData('proc_run', 'Load/Procs', $loadRun, 'cpu'); sendData('load_one', 'Load/Procs', $loadAvg1, 'cpu'); sendData('load_five', 'Load/Procs', $loadAvg5, 'cpu'); sendData('load_fifteen', 'Load/Procs', $loadAvg15, 'cpu'); } if (!$gexGFlag) # if not 'g' use standard collectl names { sendData('cputotals.user', 'percent', $userP[$i], 'cpu'); sendData('cputotals.nice', 'percent', $niceP[$i], 'cpu'); sendData('cputotals.sys', 'percent', $sysP[$i], 'cpu'); sendData('cputotals.wait', 'percent', $waitP[$i], 'cpu'); sendData('cputotals.idle', 'percent', $idleP[$i], 'cpu'); } if ($gexGFlag!=1) # 'G' or nothing { sendData('cputotals.irq', 'percent', $irqP[$i], 'cpu'); sendData('cputotals.soft', 'percent', $softP[$i], 'cpu'); sendData('cputotals.steal','percent', $stealP[$i], 'cpu'); sendData('ctxint.ctx', 'switches/sec', $ctxt/$intSecs, 'cpu'); sendData('ctxint.int', 'intrpts/sec', $intrpt/$intSecs, 'cpu'); sendData('ctxint.proc', 'pcreates/sec', $proc/$intSecs, 'cpu'); sendData('ctxint.runq', 'runqSize', $loadQue, 'cpu'); } if (!$gexGFlag) # do it again so that we report ALL cpu %s together { sendData('cpuload.avg1', 'loadAvg1', $loadAvg1, 'cpu'); sendData('cpuload.avg5', 'loadAvg5', $loadAvg5, 'cpu'); sendData('cpuload.avg15', 'loadAvg15', $loadAvg15, 'cpu'); } } if ($gexSubsys=~/C/) { for (my $i=0; $i<$NumCpus; $i++) { sendData("cpuinfo.user.cpu$i", 'percent', $userP[$i], 'cpu'); 
        sendData("cpuinfo.nice.cpu$i", 'percent', $niceP[$i], 'cpu');
        sendData("cpuinfo.sys.cpu$i", 'percent', $sysP[$i], 'cpu');
        sendData("cpuinfo.wait.cpu$i", 'percent', $waitP[$i], 'cpu');
        sendData("cpuinfo.irq.cpu$i", 'percent', $irqP[$i], 'cpu');
        sendData("cpuinfo.soft.cpu$i", 'percent', $softP[$i], 'cpu');
        sendData("cpuinfo.steal.cpu$i", 'percent', $stealP[$i], 'cpu');
        sendData("cpuinfo.idle.cpu$i", 'percent', $idleP[$i], 'cpu');
        sendData("cpuinfo.intrpt.cpu$i",'percent', $intrptTot[$i], 'cpu');
      }
    }
  }

  if ($gexSubsys=~/d/i && $gexGFlag!=1)
  {
    if ($gexSubsys=~/d/)
    {
      sendData('disktotals.reads', 'reads/sec', $dskReadTot/$intSecs, 'disk');
      sendData('disktotals.readkbs', 'readkbs/sec', $dskReadKBTot/$intSecs, 'disk');
      sendData('disktotals.writes', 'writes/sec', $dskWriteTot/$intSecs, 'disk');
      sendData('disktotals.writekbs', 'writekbs/sec', $dskWriteKBTot/$intSecs, 'disk');
    }

    if ($gexSubsys=~/D/)
    {
      for (my $i=0; $i<@dskOrder; $i++)
      {
        # preserve display order but skip any disks not seen this interval
        $dskName=$dskOrder[$i];
        next if !defined($dskSeen[$i]);
        next if ($dskFiltKeep eq '' && $dskName=~/$dskFiltIgnore/) || ($dskFiltKeep ne '' && $dskName!~/$dskFiltKeep/);

        sendData("diskinfo.reads.$dskName", 'reads/sec', $dskRead[$i]/$intSecs, 'disk');
        sendData("diskinfo.readkbs.$dskName", 'readkbs/sec', $dskReadKB[$i]/$intSecs, 'disk');
        sendData("diskinfo.writes.$dskName", 'writes/sec', $dskWrite[$i]/$intSecs, 'disk');
        sendData("diskinfo.writekbs.$dskName", 'writekbs/sec', $dskWriteKB[$i]/$intSecs, 'disk');
      }
    }
  }

  if ($gexSubsys=~/f/ && $gexGFlag!=1)
  {
    if ($nfsSFlag)
    {
      sendData('nfsinfo.SRead', 'SvrReads/sec', $nfsSReadsTot/$intSecs, 'NFS server');
      sendData('nfsinfo.SWrite', 'SvrWrites/sec', $nfsSWritesTot/$intSecs, 'NFS server');
      sendData('nfsinfo.Smeta', 'SvrMeta/sec', $nfsSMetaTot/$intSecs, 'NFS server');
      sendData('nfsinfo.Scommit', 'SvrCommt/sec' , $nfsSCommitTot/$intSecs, 'NFS server');
    }
    if ($nfsCFlag)
    {
      sendData('nfsinfo.CRead', 'CltReads/sec', $nfsCReadsTot/$intSecs, 'NFS
client'); sendData('nfsinfo.CWrite', 'CltWrites/sec', $nfsCWritesTot/$intSecs, 'NFS client'); sendData('nfsinfo.Cmeta', 'CltMeta/sec', $nfsCMetaTot/$intSecs, 'NFS client'); sendData('nfsinfo.Ccommit', 'CltCommt/sec' , $nfsCCommitTot/$intSecs, 'NFS client'); } } if ($gexSubsys=~/i/ && $gexGFlag!=1) { sendData('inodeinfo.dentnum', 'dentrynum', $dentryNum, 'inode'); sendData('inodeinfo.dentunused', 'dentryunused', $dentryUnused, 'inode'); sendData('inodeinfo.fhandalloc', 'filesalloc', $filesAlloc, 'inode'); sendData('inodeinfo.fhandmpct', 'filesmax', $filesMax, 'inode'); sendData('inodeinfo.inodenum', 'inodeused', $inodeUsed, 'inode'); } if ($gexSubsys=~/l/ && $gexGFlag!=1) { if ($CltFlag) { sendData('lusclt.reads', 'reads/sec', $lustreCltReadTot/$intSecs, 'Lustre client'); sendData('lusclt.readkbs', 'readkbs/sec', $lustreCltReadKBTot/$intSecs, 'Lustre client'); sendData('lusclt.writes', 'writes/sec', $lustreCltWriteTot/$intSecs, 'Lustre client'); sendData('lusclt.writekbs', 'writekbs/sec', $lustreCltWriteKBTot/$intSecs, 'Lustre client'); sendData('lusclt.numfs', 'filesystems', $NumLustreFS, 'Lustre client'); } if ($MdsFlag) { my $getattrPlus=$lustreMdsGetattr+$lustreMdsGetattrLock+$lustreMdsGetxattr; my $setattrPlus=$lustreMdsReintSetattr+$lustreMdsSetxattr; my $varName=($cfsVersion lt '1.6.5') ? 'reint' : 'unlink'; my $varVal= ($cfsVersion lt '1.6.5') ? $lustreMdsReint : $lustreMdsReintUnlink; my $varTitle = ($cfsVersion lt '1.6.5') ? 'Delete/Set Attr.' 
          : 'File/Dir Deletes';

      sendData('lusmds.gattrP', 'gattrP/sec', $getattrPlus/$intSecs, 'Lustre MDS', 'Get Attributes');
      sendData('lusmds.sattrP', 'sattrP/sec', $setattrPlus/$intSecs, 'Lustre MDS', 'Set Attributes');
      sendData('lusmds.sync', 'sync/sec', $lustreMdsSync/$intSecs, 'Lustre MDS', 'File Syncs');
      # pass the title by value; single quotes would send the literal text '$varTitle'
      sendData("lusmds.$varName", "$varName/sec", $varVal/$intSecs, 'Lustre MDS', $varTitle);
    }

    if ($OstFlag)
    {
      sendData('lusost.reads', 'reads/sec', $lustreReadOpsTot/$intSecs, 'Lustre OST');
      sendData('lusost.readkbs', 'readkbs/sec', $lustreReadKBytesTot/$intSecs, 'Lustre OST');
      sendData('lusost.writes', 'writes/sec', $lustreWriteOpsTot/$intSecs, 'Lustre OST');
      sendData('lusost.writekbs', 'writekbs/sec', $lustreWriteKBytesTot/$intSecs, 'Lustre OST');
    }
  }

  if ($gexSubsys=~/L/ && $gexGFlag!=1)
  {
    if ($CltFlag)
    {
      # Either report details by filesystem OR OST
      if ($lustOpts!~/O/)
      {
        for (my $i=0; $i<$NumLustreFS; $i++)
        {
          sendData("lusost.reads.$lustreCltFS[$i]", 'reads/sec', $lustreCltRead[$i]/$intSecs, 'Lustre client');
          sendData("lusost.readkbs.$lustreCltFS[$i]", 'readkbs/sec', $lustreCltReadKB[$i]/$intSecs, 'Lustre client');
          sendData("lusost.writes.$lustreCltFS[$i]", 'writes/sec', $lustreCltWrite[$i]/$intSecs, 'Lustre client');
          sendData("lusost.writekbs.$lustreCltFS[$i]", 'writekbs/sec', $lustreCltWriteKB[$i]/$intSecs, 'Lustre client');
        }
      }
      else
      {
        for (my $i=0; $i<$NumLustreCltOsts; $i++)
        {
          sendData("lusost.reads.$lustreCltOsts[$i]", 'reads/sec', $lustreCltLunRead[$i]/$intSecs, 'Lustre client');
          sendData("lusost.readkbs.$lustreCltOsts[$i]", 'readkbs/sec', $lustreCltLunReadKB[$i]/$intSecs, 'Lustre client');
          sendData("lusost.writes.$lustreCltOsts[$i]", 'writes/sec', $lustreCltLunWrite[$i]/$intSecs, 'Lustre client');
          sendData("lusost.writekbs.$lustreCltOsts[$i]", 'writekbs/sec', $lustreCltLunWriteKB[$i]/$intSecs, 'Lustre client');
        }
      }
    }

    if ($OstFlag)
    {
      for ($i=0; $i<$NumOst; $i++)
      {
        sendData("lusost.reads.$lustreOsts[$i]", 'reads/sec', $lustreReadOps[$i]/$intSecs, 'Lustre OST');
sendData("lusost.readkbs.$lustreOsts[$i]", 'readkbs/sec', $lustreReadKBytes[$i]/$intSecs, 'Lustre OST'); sendData("lusost.writes.$lustreOsts[$i]", 'writes/sec', $lustreWriteOps[$i]/$intSecs, 'Lustre OST'); sendData("lusost.writekbs.$lustreOsts[$i]", 'writekbs/sec', $lustreWriteKBytes[$i]/$intSecs, 'Lustre OST'); } } } if ($gexSubsys=~/m/) { if ($gexGFlag) # 'g' or 'G' { sendData('mem_total', 'Bytes', $memTot, 'memory'); sendData('mem_free', 'Bytes', $memFree, 'memory'); sendData('mem_shared', 'Bytes', $memShared, 'memory'); sendData('mem_buffers', 'Bytes', $memBuf, 'memory'); sendData('mem_cached', 'Bytes', $memCached, 'memory'); sendData('swap_total', 'Bytes', $swapTotal, 'memory'); sendData('swap_free', 'Bytes', $swapFree, 'memory'); } if (!$gexGFlag) # neither { sendData('meminfo.tot', 'kb', $memTot, 'memory'); sendData('meminfo.free', 'kb', $memFree, 'memory'); sendData('meminfo.shared', 'kb', $memShared, 'memory'); sendData('meminfo.buf', 'kb', $memBuf, 'memory'); sendData('meminfo.cached', 'kb', $memCached, 'memory'); sendData('swapinfo.total', 'kb', $swapTotal, 'memory'); sendData('swapinfo.free', 'kb', $swapFree, 'memory'); } if ($gexGFlag!=1) # nothing or 'G' { sendData('meminfo.used', 'kb', $memUsed, 'memory'); sendData('meminfo.slab', 'kb', $memSlab, 'memory'); sendData('meminfo.map', 'kb', $memMap, 'memory'); sendData('meminfo.hugetot', 'kb', $memHugeTot, 'memory'); sendData('meminfo.hugefree', 'kb', $memHugeFree, 'memory'); sendData('meminfo.hugersvd', 'kb', $memHugeRsvd, 'memory'); sendData('swapinfo.used', 'kb', $swapUsed, 'memory'); sendData('swapinfo.in', 'swaps/sec', $swapin/$intSecs, 'memory'); sendData('swapinfo.out', 'swaps/sec', $swapout/$intSecs, 'memory'); sendData('pageinfo.fault', 'faults/sec', $pagefault/$intSecs, 'memory'); sendData('pageinfo.majfault', 'majflt/sec', $pagemajfault/$intSecs, 'memory'); sendData('pageinfo.in', 'pages/sec', $pagein/$intSecs, 'memory'); sendData('pageinfo.out', 'pages/sec', $pageout/$intSecs, 'memory'); } } 
# gexFlag doesn't apply if ($gexSubsys=~/M/) { for (my $i=0; $i<$CpuNodes; $i++) { foreach my $field ('used', 'free', 'slab', 'map', 'anon', 'lock', 'act', 'inact') { sendData("numainfo.$field.$i", 'kb', $numaMem[$i]->{$field}, 'memory'); } } } if ($gexSubsys=~/n/i) { if ($gexSubsys=~/n/) { if ($gexGFlag) # 'g' or 'G' { sendData('bytes_in', 'Bytes/sec', $netRxKBTot*1024/$intSecs, 'network'); sendData('bytes_out', 'Bytes/sec', $netTxKBTot*1024/$intSecs, 'network'); sendData('pkts_in', 'pkts/sec', $netRxPktTot/$intSecs, 'network'); sendData('pkts_out', 'pkts/sec', $netTxPktTot/$intSecs, 'network'); } else # neither { sendData('nettotals.kbin', 'kb/sec', $netRxKBTot/$intSecs, 'network'); sendData('nettotals.pktin', 'pkts/sec', $netRxPktTot/$intSecs, 'network'); sendData('nettotals.kbout', 'kb/sec', $netTxKBTot/$intSecs, 'network'); sendData('nettotals.pktout', 'pkts/sec', $netTxPktTot/$intSecs, 'network'); } } if ($gexSubsys=~/N/) { for ($i=0; $i<@netOrder; $i++) { $netName=$netOrder[$i]; next if !defined($netSeen[$i]); next if ($netFiltKeep eq '' && $netName=~/$netFiltIgnore/) || ($netFiltKeep ne '' && $netName!~/$netFiltKeep/); next if $netName=~/lo|sit/; sendData("nettotals.kbin.$netName", 'kb/sec', $netRxKB[$i]/$intSecs, 'network'); sendData("nettotals.pktin.$netName", 'pkts/sec', $netRxPkt[$i]/$intSecs, 'network'); sendData("nettotals.kbout.$netName", 'kb/sec', $netTxKB[$i]/$intSecs, 'network'); sendData("nettotals.pktout.$netName", 'pkts/sec', $netTxPkt[$i]/$intSecs, 'network'); } } } if ($gexSubsys=~/s/ && $gexGFlag!=1) { sendData("sockinfo.used", 'sockets', $sockUsed, 'socket'); sendData("sockinfo.tcp", 'sockets', $sockTcp, 'socket'); sendData("sockinfo.orphan",'sockets', $sockOrphan, 'socket'); sendData("sockinfo.tw", 'sockets', $sockTw, 'socket'); sendData("sockinfo.alloc", 'sockets', $sockAlloc, 'socket'); sendData("sockinfo.mem", 'sockets', $sockMem, 'socket'); sendData("sockinfo.udp", 'sockets', $sockUdp, 'socket'); sendData("sockinfo.raw", 'sockets', 
$sockRaw, 'socket'); sendData("sockinfo.frag", 'sockets', $sockFrag, 'socket'); sendData("sockinfo.fragm", 'sockets', $sockFragM, 'socket'); } if ($gexSubsys=~/t/ && $gexGFlag!=1) { sendData("tcpinfo.iperrs", 'num/sec', $ipErrors/$intSecs, 'tcp') if $tcpFilt=~/i/; sendData("tcpinfo.tcperrs", 'num/sec', $tcpErrors/$intSecs, 'tcp') if $tcpFilt=~/t/; sendData("tcpinfo.udperrs", 'num/sec', $udpErrors/$intSecs, 'tcp') if $tcpFilt=~/u/; sendData("tcpinfo.icmperrs", 'num/sec', $icmpErrors/$intSecs, 'tcp') if $tcpFilt=~/c/; sendData("tcpinfo.tcpxerrs", 'num/sec', $tcpExErrors/$intSecs, 'tcp') if $tcpFilt=~/T/; } if ($gexSubsys=~/x/i && $gexGFlag!=1) { if ($NumXRails) { $kbInT= $elanRxKBTot; $pktInT= $elanRxTot; $kbOutT= $elanTxKBTot; $pktOutT=$elanTxTot; } if ($NumHCAs) { $kbInT= $ibRxKBTot; $pktInT= $ibRxTot; $kbOutT= $ibTxKBTot; $pktOutT=$ibTxTot; } sendData("iconnect.kbin", 'kb/sec', $kbInT/$intSecs, 'infiniband', 'Data Received'); sendData("iconnect.pktin", 'pkt/sec', $pktInT/$intSecs, 'infiniband', 'Packets Received'); sendData("iconnect.kbout", 'kb/sec', $kbOutT/$intSecs, 'infiniband', 'Data Transmitted'); sendData("iconnect.pktout", 'pkt/sec', $pktOutT/$intSecs, 'infiniband', 'Packets Transmitted'); } if ($gexSubsys=~/E/i && $gexGFlag!=1) { foreach $key (sort keys %$ipmiData) { for (my $i=0; $i{$key}}); $i++) { my $name=$ipmiData->{$key}->[$i]->{name}; my $inst=($key!~/power/ && $ipmiData->{$key}->[$i]->{inst} ne '-1') ? $ipmiData->{$key}->[$i]->{inst} : ''; sendData("env.$name$inst", $name, $ipmiData->{$key}->[$i]->{value}, 'IPMI'); } } } # if any imported data, it may want to include gexpr output. However this means getting a list of # 3-tuples to call OUR formatting routines with so the import module doesn't have to. # NOTE - the assumption is no ganglia specific counters. 
If there ever are, we'll need to remove # restriction and ALL imports will have to deal with $gexFlag if called from here if ($gexGFlag!=1) { my (@names, @units, @vals, @groups, @titles); for (my $i=0; $i<$impNumMods; $i++) { &{$impPrintExport[$i]}('g', \@names, \@units, \@vals, \@groups, \@titles); } foreach (my $i=0; $i$gexDataMax{$name}; $gexDataTot{$name}+=$value if $gexAvgFlag || $gexTotFlag; } return('') if !$gexOutputFlag; # A c t u a l S e n d H a p p e n s H e r e # If doing min/max/avg, reset $value if ($gexFlags) { $value=$gexDataMin{$name} if $gexMinFlag; $value=$gexDataMax{$name} if $gexMaxFlag; $value=$gexDataTot{$name} if $gexTotFlag; $value=($gexDataTot{$name}/$gexCounter) if $gexAvgFlag; } # Always send send data if not CO mode,but if so only send when it has # indeed changed OR TTL about to expire my $valSentFlag=0; if (!$gexCOFlag || $value!=$gexDataLast{$name} || $gexTTL{$name}==1) { $valSentFlag=1; sendMetaPacket($name, $units, $group, $title); sendDataPacket($name, $value); $gexDataLast{$name}=$value; } # A fair chunk of work, but worth it if ($gexDebug & 3) { my ($intSeconds, $intUsecs); if ($hiResFlag) { # we have to fully qualify name because or 'require' vs 'use' ($intSeconds, $intUsecs)=Time::HiRes::gettimeofday(); } else { $intSeconds=time; $intUsecs=0; } $intUsecs=sprintf("%06d", $intUsecs); my ($sec, $min, $hour)=localtime($intSeconds); my $timestamp=sprintf("%02d:%02d:%02d.%s", $hour, $min, $sec, substr($intUsecs, 0, 3)); printf "$timestamp Name: %-25s Units: %-12s Val: %8d Group: %-10s TTL: %3d Title: %-20s %s\n", $name, $units, $value, defined($group) ? $group : '-', $gexTTL{$name}, defined($title) ? $title : '-', ($valSentFlag) ? 
'sent' : '' if $gexDebug & 1 || $valSentFlag; } # TTL only applies when in 'CO' mode, noting we already made expiration # decision above when we saw counter of 1 if ($gexCOFlag) { $gexTTL{$name}-- if !$valSentFlag; $gexTTL{$name}=$gexTTL if $valSentFlag || $gexTTL{$name}==0; } } sub sendMetaPacket { my $name= shift; my $units=shift; my $group=shift; my $title=shift; my $numOptArgs=0; $numOptArgs++ if defined($group); $numOptArgs++ if defined($title); my $string=''; $string.=pack('N', 0x80); $string.=pack('N', length($myHost)); $string.=packString($myHost); $string.=pack('N', length($name)); $string.=packString($name); $string.=pack('N', 0); # spoof $string.=pack('N', length('double')); $string.=packString('double'); $string.=pack('N', length($name)); $string.=packString($name); $string.=pack('N', length($units)); $string.=packString($units); $string.=pack('N', 3); # slope $string.=pack('N', 2*$gexTTL*$gexInterval); # time to live $string.=pack('N', 4*$gexTTL*$gexInterval); # dmax $string.=pack('N', $numOptArgs); if (defined($group)) { $string.=pack('N', length('GROUP')); $string.=packString('GROUP'); $string.=pack('N', length($group)); $string.=packString($group) } if (defined($title)) { $string.=pack('N', length('TITLE')); $string.=packString('TITLE'); $string.=pack('N', length($title)); $string.=packString($title); } sendUDP($string); } sub sendDataPacket { my $name= shift; my $value=shift; my $string=''; $string.=pack('N', 0x85); $string.=pack('N', length($myHost)); $string.=packString($myHost); $string.=pack('N', length($name)); $string.=packString($name); $string.=pack('N', 0); $string.=pack('N', 2); $string.=packString("%s"); $string.=pack('N', length($value)); $string.=packString($value); sendUDP($string); } sub sendUDP { my $data=shift; dumpUDP($data) if $gexDebug & 4; return if $gexDebug & 8; my $length=length($data); for (my $offset=0; $length>0; ) { # Either send as regular UDP packet(s) OR send to the multicast address my $bytes=(!$gexMcastFlag) ? 
send($gexSocket, substr($data, $offset, $gexPktSize), 0, $gexPaddr) : $gexMcast->mcast_send($data, "$gexHost:$gexPort"); if (!defined($bytes)) { print "Error: '$!' writing to socket"; last; } $offset+=$bytes; $length-=$bytes; } } sub packString { my $string=shift; my $pad=4-(length($string) % 4); $pad=0 if $pad==4; for (my $i=0; $i<$pad; $i++) { $string.=pack('c', 0); } return($string); } sub dumpUDP { my $output=shift; for (my $i=0; $i{this}->[$i]=0; } } if ($versionFlag) { print "statsd V$version -s$server -t$stype -p$ptype\n"; exit(0); } $$impOptsref='s'; # only summary data $$impKeyref='statsd'; return(1); } # Anything you might want to add to collectl's header. sub statsdUpdateHeader { } sub statsdGetData { # rare that the log isn't aleady open other than duing initial passs, # perhaps at system startup if (!$logfileOpen) { $logfileOpen=1 if open LOG, "<$logfile"; } # but only if successfully opened if ($logfileOpen) { seek(LOG, 0, 0); while (my $line=) { if (defined($line)) { next if $line=~/^#/; record(2, "statsd $line"); } } } } sub statsdInitInterval { } sub statsdAnalyze { my $type=shift; # not used my $dataref=shift; if ($$dataref!~/^#/ && $$dataref!~/V/) { my ($statsType, @fields)=split(/\s+/, $$dataref); return if ($statsType=~/^a/ && $server!~/a/) || ($statsType=~/^c/ && $server!~/c/) || ($statsType=~/^o/ && $server!~/o/) || ($statsType=~/^p/ && $server!~/p/); if (!defined($data{$statsType}->{last})) { for (my $i=0; $i<@fields; $i++) { $data{$statsType}->{last}->[$i]=0; } } for (my $i=0; $i<@fields; $i++) { # normally fields just contain raw counters, but in some cases they may contain a # return status code (such as with prxycon). 
so let's be more general and when we # find a ':', treat the left-side as a subtype if ($fields[$i]!~/:/) { $data{$statsType}->{this}->[$i]=$fields[$i]-$data{$statsType}->{last}->[$i]; $data{$statsType}->{last}->[$i]=$fields[$i]; } else { # since subtypes are dynamic, we need to make sure if we're seeing for # the first time we properly initialize it. Also note we want to use # a string for the subtype (to be more flexible) and so need a slightly # different structure since {last} and {this} point to arrays my ($subtype, $value)=split(/:/, $fields[$i]); $data{$statsType}->{sub}->{last}->{$subtype}=$value if !defined($data{$statsType}->{sub}->{last}->{$subtype}); $data{$statsType}->{sub}->{this}->{$subtype}=$value-$data{$statsType}->{sub}->{last}->{$subtype}; $data{$statsType}->{sub}->{last}->{$subtype}=$value; } } } } sub statsdPrintBrief { my $type=shift; my $lineref=shift; if ($type==1) # header line 1 { $$lineref.="<--------Ops--------|------Errors-------|-----Misc----->"; } elsif ($type==2) # header line 2 { $$lineref.=" Acct Cont Objs Prxy Acct Cont Objs Prxy HOff Asyn Ulnk "; } elsif ($type==3) # data { # first 5 ops ALWAYS put, get, post, delete and head my ($accOps, $conOps, $objOps, $prxOps)=(0,0,0,0); for (my $i=0; $i<6; $i++) { $accOps+=$data{accsrvr}->{this}->[$i] if defined($data{accsrvr}); $conOps+=$data{consrvr}->{this}->[$i] if defined($data{consrvr}); $objOps+=$data{objsrvr}->{this}->[$i] if defined($data{objsrvr}); # these are actually broken out by server type in the proxy $prxOps+=$data{prxyacc}->{this}->[$i] if defined($data{prxyacc}); $prxOps+=$data{prxycon}->{this}->[$i] if defined($data{prxycon}); $prxOps+=$data{prxyobj}->{this}->[$i] if defined($data{prxyobj}); } my ($accErrs, $conErrs, $objErrs, $prxErrs)=(0,0,0,0); $accErrs=$data{accsrvr}->{this}->[6] if defined($data{accsrvr}); $conErrs=$data{consrvr}->{this}->[6] if defined($data{consrvr}); $objErrs=$data{objsrvr}->{this}->[6] if defined($data{objsrvr}); # these are also by server type 
    $prxErrs+=$data{prxyacc}->{this}->[8]    if defined($data{prxyacc});
    $prxErrs+=$data{prxycon}->{this}->[8]    if defined($data{prxycon});
    $prxErrs+=$data{prxyobj}->{this}->[8]    if defined($data{prxyobj});

    $handoff=(defined($data{prxsrvr})) ? $data{prxsrvr}->{this}->[9] : 0;
    $async=  (defined($data{objsrvr})) ? $data{objsrvr}->{this}->[8] : 0;
    $unlink= (defined($data{objupdt})) ? $data{objupdt}->{this}->[4] : 0;

    $$lineref.=sprintf(" %4d %4d %4d %4d %4d %4d %4d %4d %4d %4d %4d",
        $accOps/$intSecs,  $conOps/$intSecs,  $objOps/$intSecs,  $prxOps/$intSecs,
        $accErrs/$intSecs, $conErrs/$intSecs, $objErrs/$intSecs, $prxErrs/$intSecs,
        $handoff/$intSecs, $async/$intSecs,   $unlink/$intSecs);
  }
  elsif ($type==4)    # reset 'total' counters
  {
  }
  elsif ($type==5)    # increment 'total' counters
  {
  }
  elsif ($type==6)    # print 'total' counters
  {
  }
}

sub statsdPrintVerbose
{
  my $printHeader=shift;
  my $homeFlag=   shift;
  my $lineref=    shift;

  # when mixing multiple server types the column spacing can get messed up
  my $has_acc=($server=~/a/ && $stype=~/[arls]/)  ? 1 : 0;
  my $has_con=($server=~/c/ && $stype=~/[arsyu]/) ? 1 : 0;
  my $has_obj=($server=~/o/ && $stype=~/[axlsu]/) ?
1 : 0; # P r i n t 1 s t H e a d e r L i n e # Note that last line of verbose data (if any) still sitting in $$lineref my $line=$temp=''; if ($printHeader) { $line.="#$miniFiller"; my ($acc_width, $con_width, $obj_width)=(0,0,0); if ($server=~/a/) { $temp=''; $temp.="---Auditor----|" if defined($selTypes{aa}); $temp.="---------------------Reaper---------------------|" if defined($selTypes{ap}); $temp.="----------------------Replicator----------------------|" if defined($selTypes{ar}); $temp.="--------------Server--------------|" if defined($selTypes{as}); $temp=~s/\|$/>/; $line.="<$temp" if $temp ne ''; $acc_width=length($temp); } if ($server=~/c/) { $temp=''; $temp.="---Auditor----|" if defined($selTypes{ca}); $temp.="----------------------Replicator----------------------|" if defined($selTypes{cr}); $temp.="--------------Server--------------|" if defined($selTypes{cs}); $temp.="----------Sync----------|" if defined($selTypes{cy}); $temp.="---Updater----|" if defined($selTypes{cu}); $temp=~s/\|$/>/; $line.="<$temp" if $temp ne ''; $con_width=length($temp); } if ($server=~/o/) { $temp=''; $temp.="-Auditor-|" if defined($selTypes{oa}); $temp.="-Expirer-|" if defined($selTypes{ox}); $temp.="-----Replicator------|" if defined($selTypes{or}); $temp.="-----------------------Server-----------------------|" if defined($selTypes{os}); $temp.="---------Updater---------|" if defined($selTypes{ou}); $temp=~s/\|$/>/; $line.="<$temp" if $temp ne ''; $obj_width=length($temp); } # if including html status codes we need to calculate extra padding # which though a pain in the butt is worth it my ($pre, $post)=('',''); if (defined($selTypes{pa})) { if ($returnFlag) { my $pad=0; foreach my $key (sort keys %{$data{prxyacc}->{sub}->{this}}) { $pad+=5; } $pre=$post='-' x int($pad/2); $post.='-' if $pad % 2 != 0; # if odd... 
} $line.="<$pre-------------------------Proxy Acc Server$post------------------------->"; } if (defined($selTypes{pc})) { if ($returnFlag) { my $pad=0; foreach my $key (sort keys %{$data{prxycon}->{sub}->{this}}) { $pad+=5; } $pre=$post='-' x int($pad/2); $post.='-' if $pad % 2 != 0; # if odd... } $line.="<$pre-------------------------Proxy Con Server$post------------------------->"; } if (defined($selTypes{po})) { if ($returnFlag) { my $pad=0; foreach my $key (sort keys %{$data{prxyobj}->{sub}->{this}}) { $pad+=5; } $pre=$post='-' x int($pad/2); $post.='-' if $pad % 2 != 0; # if odd... } $line.="<$pre-------------------------Proxy Obj Server$post------------------------->"; } $line.="\n"; # real ugly, but once we know the lenghts of each section, we can # preface the headers with the section names. it's still not # perfect in all cases but it's close enough! my $pre_header=''; if ($acc_width) { my ($pre, $post)=getpad($acc_width, 'Account'); $temp="${pre}Account$post"; $pre_header.=" $temp"; } if ($con_width) { my ($pre, $post)=getpad($con_width, 'Container'); $temp="${pre}Container$post"; $pre_header.=" $temp"; } if ($obj_width) { my ($pre, $post)=getpad($obj_width, 'Object'); $temp="${pre}Object$post"; $pre_header.=" $temp"; } $line="#$miniFiller$pre_header\n$line"; # P r i n t 2 n d H e a d e r L i n e $line.="#$miniDateTime"; if ($server=~/a/) { $temp=''; $temp.="$headers{'accaudt'} " if defined($selTypes{aa}); $temp.="$headers{'accreap'} " if defined($selTypes{ap}); $temp.="$headers{'accrepl'} " if defined($selTypes{ar}); $temp.="$headers{'accsrvr'} " if defined($selTypes{as}); $temp=~s/\|$/ /; $line.=" $temp" if $temp ne ''; } if ($server=~/c/) { $temp=''; $temp.="$headers{'conaudt'} " if defined($selTypes{ca}); $temp.="$headers{'conrepl'} " if defined($selTypes{cr}); $temp.="$headers{'consrvr'} " if defined($selTypes{cs}); $temp.="$headers{'consync'} " if defined($selTypes{cy}); $temp.="$headers{'conupdt'} " if defined($selTypes{cu}); $temp=~s/\|$/ /; 
$line.=" $temp" if $temp ne ''; } if ($server=~/o/) { $temp=''; $temp.="$headers{'objaudt'} " if defined($selTypes{oa}); $temp.="$headers{'objexpr'} " if defined($selTypes{ox}); $temp.="$headers{'objrepl'} " if defined($selTypes{or}); $temp.="$headers{'objsrvr'} " if defined($selTypes{os}); $temp.="$headers{'objupdt'} " if defined($selTypes{ou}); $line.=" $temp" if $temp ne ''; } # when ONLY reporting on proxies we need an extra space to start thing out right $line.=' ' if $server eq 'p'; if (defined($selTypes{pa})) { $line.="$headers{'prxyacc'} "; if ($returnFlag) { foreach my $key (sort keys %{$data{prxyacc}->{sub}->{this}}) { $line.=sprintf("%4s ", $key); } } } if (defined($selTypes{pc})) { $line.="$headers{'prxycon'} "; if ($returnFlag) { foreach my $key (sort keys %{$data{prxycon}->{sub}->{this}}) { $line.=sprintf("%4s ", $key); } } } if (defined($selTypes{po})) { $line.="$headers{'prxyobj'} "; if ($returnFlag) { foreach my $key (sort keys %{$data{prxyobj}->{sub}->{this}}) { $line.=sprintf("%4s ", $key); } } } $line.="\n"; } $$lineref=$line; return if $showColFlag; # P r i n t D a t a $line=''; foreach my $serverType ('acc', 'con', 'obj', 'prxy') { my $serverShort=substr($serverType, 0, 1); next if $server!~/$serverShort/; # if we printed any data for acc/con, we may need to shift over a space $line.=' ' if $serverType eq 'con' && $has_acc; $line.=' ' if $serverType eq 'obj' && ($has_acc || $has_con); for my $dataType (sort keys %headers) { next if $dataType!~/^$serverType/; my $subType=($serverShort ne 'p') ? 
substr($dataType, 3, 1) : substr($dataType, 4, 1); $subType='p' if $dataType=~/reap$/; $subType='x' if $dataType=~/expr$/; $subType='y' if $dataType=~/sync$/; next if !defined($selTypes{"$serverShort$subType"}); # we need to trim leading whitespace or first field will be empty my $trimmed=$headers{$dataType}; $trimmed=~s/^\s+//; @fields=split(/\s+/, $trimmed); #printf "DType: $dataType Trimmed: $trimmed NUM: %d\n", scalar(@fields); # last field for objsrvr is special because it's a composite of 9 and 10 my $type="$serverType$subType"; for (my $i=0; $i<@{$data{$dataType}->{this}}; $i++) { if ($type ne 'objs' || $i<9) { $width=length($fields[$i]); $width=4 if $width==3; $line.=sprintf(" %${width}d", $data{$dataType}->{this}->[$i]/$intSecs); } elsif ($type eq 'objs' && $i==10) { # make sure there's a non-zero value to compute $secs=($data{$dataType}->{this}->[9]) ? $data{$dataType}->{this}->[10]/$data{$dataType}->{this}->[9] : 0; $line.=sprintf(" %7.3f", $secs); } } # only proxy data has return codes if ($serverShort eq 'p' && $returnFlag) { foreach my $key (sort keys %{$data{$dataType}->{sub}->{this}}) { $line.=sprintf(" %4d", $data{$dataType}->{sub}->{this}->{$key}/$intSecs); } } } } $$lineref.="$datetime $line\n"; } # NOTE - only summary data collected sub statsdPrintPlot { my $type= shift; my $ref1= shift; foreach my $serverType ('acc', 'con', 'obj', 'prxy') { my $serverShort=substr($serverType, 0, 1); next if $server!~/$serverShort/; for my $dataType (sort keys %headers) { next if $dataType!~/^$serverType/; { # the single char subtype is the 4th char in the datatype # except for proxy servers in which case it's the 5th. # Also for 'reap', 'expr' and 'sync' we need to reset the subtype my $subType=($serverShort ne 'p') ? 
substr($dataType, 3, 1) : substr($dataType, 4, 1); $subType='p' if $dataType=~/reap$/; $subType='x' if $dataType=~/expr$/; $subType='y' if $dataType=~/sync$/; next if !defined($selTypes{"$serverShort$subType"}); #print "DataType: $dataType Short: $serverShort SvcType: $stype SubType: $subType\n"; if ($type==1) { # need to get rid of leading whitespace before split my $trimmed=$headers{$dataType}; $trimmed=~s/^\s+//; for my $header (split(/\s+/, $trimmed)) { $$ref1.=sprintf("[SW-%s]$header${SEP}", uc($dataType)); } } else { my $type="$serverType$subType"; for (my $i=0; $i<@{$data{$dataType}->{this}}; $i++) { if ($type ne 'objs' || $i<9) { $$ref1.=sprintf("$SEP%d", $data{$dataType}->{this}->[$i]/$intSecs); } elsif ($type eq 'objs' && $i==10) { # make sure there's a non-zero value to compute $secs=($data{$dataType}->{this}->[9]) ? $data{$dataType}->{this}->[10]/$data{$dataType}->{this}->[9] : 0; $$ref1.=sprintf("$SEP%.3f", $secs); } } } } } } } # REMEMBER - only summary data collected sub statsdPrintExport { my $type=shift; my $ref1=shift; my $ref2=shift; my $ref3=shift; my $ref4=shift; my $ref5=shift; my $ref6=shift; if ($type eq 'l') { } elsif ($type eq 'g') { } } sub getpad { my $width=shift; my $title=shift; # here's the problem, with width we're being passed include the terminatin '>' # in the header but we want to centers over the ----s. So, if the pad is an # even number we want to shift one to the left. But if odd, we come up sort # in our pad chars so add one more to the right because that's how the uneven # headers centering is done. 
my $totpad=$width-length($title); my $pad=int($totpad/2); my $pre=$post=' ' x $pad; if ($totpad % 2 == 0) { $pre=' ' x ($pad-1); $post=' ' x ($pad+1); } else { $pre=' ' x $pad; $post=' ' x ($pad+1); } return($pre, $post); } sub statsdHelp { my $help=<ProLiant BL460c G1,ProLiant BL460c G6,ProLiant BL490c G6< >ProLiant BL685c G1,ProLiant BL2x220c G5< >ProLiant DL145 G2< >ProLiant DL360 G3,ProLiant DL380 G3,ProLiant DL380 G5< >ProLiant DL385 G5,ProLiant DL585 G5,ProLiant DL585 G7< >ProLiant ML370 G6< # These are the real rules and the headers much EXACTLY match everything to the # right of the ": " returned by the command "dmidecode|grep -m1 'Product Name'" # for example: "Product Name: ProLiant BL460c G1" >ProLiant DL185 G5< [pre] /CPU(\d) Diode/CPUTemp$1/ /Power Ambient/ATemp/ /Front Panel Temp/FPTemp/ >ProLiant DL160 G5< [pre] / ROTOR/R/ /Rear Ambient/RTemp/ /PCI Ambient/ATemp/ /CPU(\d) Dmn (\d) Temp/Temp$1-$2/ >ProLiant DL785< [pre] /Zone /Z/ >SE4208< [pre] /^CPU/CTemp/ /^System/STemp/ /^FAN6CPU/Fan6C/ # There is a lot of remappings to get SL170h temperatures into fixed format as follows # CPU1 Temp->CTemp1 CPU2 Temp->CTemp2 CPU1 DIMM->CTemp1D CPU2 DIMM->CTemp2D # CPU2_DIMM6G->CTemp6G CPU2_DIMM5B->CTemp5B CPU2_DIMM4E->CTemp4E # CPU2_DIMM3H->CTemp3H CPU2_DIMM2C->CTemp2C CPU2_DIMM1F->CTemp1F # Node Exit Amb->ETemp Front I/O Amb->FTemp HDD BP Amb->HTemp # Node Inlet Amb->ITemp PCIE local Temp->LTemp Northbridge Amb->NTemp PSU BP->PTemp >ProLiant SL170h G6,ProLiant SL2x170z G6< [pre] /^CPU(\d) DIMM Amb/CTemp$1D/ /^Node // /Amb/Temp/ /DIMM(.*)/Temp$1/ /.*Internal/Temp/ /^PCIE l/L/ # This on also has a lot of remapping >ProLiant DL160 G6,ProLiant DL160se G6< [pre] /_INLET/I/ /_OUTLET/O/ /Rear Board/xTemp/ /^Front Board/BTemp/ /^Front dimm/FTemp/ /^Rear dimm/RTemp/ /^PCI outlet/PTemp/ /^IOH outlet/ITemp1/ /^Front IOH/ITemp2/ /^Inlet ambient/ATemp/ / sensor/ Temp/ [post] /^BTemp/BTempF/ /^xTemp/BTempR/ [ignore] /^Fan Redundant/ >ProLiant SL230s Gen8,ProLiant SL250s Gen8< 
[pre] /^\d*-// /Inlet Ambient/ATemp/ /CPU/CTemp/ /Chipset/CTemp/ /.*P(\d) DIMM (\d).*/$1Temp$2D/ /HD Max/mTemp/ /HD C.*/cTemp/ /HD z.*/zTemp/ /VR P(\d)Mem Zone/VTemp$1MZ/ /VR P(\d) ([MZ]).*/VTemp$1$2/ /VR P(\d)/VTemp$1/ /SuperCap.*/STemp/ /LOM Card/lTemp/ /LOM/LTemp/ /PCI 2 Zone/pTemp/ /PCI 2/PTemp/ /PS Board/uTemp/ /Sys Ex.*/STemp/ /Coprocessor (\d)/GTemp$1/ [post] /^STemp/STempE/ /^cTemp/HTempC/ /^mTemp/HTempM/ /^zTemp/HTempZ/ /^lTemp/LTempC/ /^pTemp/PTempZ/ /^uTemp/PTempS/ >SE1170s< [pre] /Fan(\d)a/F$1a/ /CPU(\d)_DIMM(.*)/$1Temp$2/ /HDD BP(\d).*/HTemp$1/ /PCIE.*/PTemp/ /Front.*/ATemp/ collectl-4.3.1/hello.ph0000775000175000017500000001364713366602004013140 0ustar mjsmjs# copyright, 2003-2009 Hewlett-Packard Development Company, LP # H e l l o W o r l d # Though not required, it is especially useful to use strict to force all variable to be # declare and minimize the risk of stepping on any that collectl itself uses use strict; # Allow reference to collectl variables, but be CAREFUL as these should be treated as readonly our ($miniFiller, $rate, $SEP, $datetime, $intSecs, $showColFlag); # Global to this module my $counter=0; my ($hwOpts, $hwTot, @hwNow, $hwTotTOT); # support for 's','d', and 'sd, assuming 's' is the default. Since collectl doesn't # restrict which options are valid, we must to it ourself and call errror() accordingly # For an additional example of error handling, if this module required CPU data to be # collected as well, you could incldue an error message based on the condition $subsys!~/c/i; sub helloInit { my $impOptsref=shift; my $impKeyref= shift; $hwOpts=$$impOptsref; error('valid hw options are: s,d and sd') if defined($hwOpts) && $hwOpts!~/^[sd]*$/; $hwOpts='s' if !defined($hwOpts); $$impOptsref=$hwOpts; $$impKeyref='hw'; return(1); } # Anything you might want to add to collectl's header. 
# Try the command 'collectl --import hello --showheader'
sub helloUpdateHeader
{
  my $lineref=shift;

  $$lineref.="# HelloWorld: Version 1.0\n";
}

# Simulate 3 lines of data being read from /proc and include a further qualifier to act
# as a device number. See how this is used in Analyze().
sub helloGetData
{
  for (my $i=0; $i<3; $i++)
  {
    my $string=sprintf("HelloWorld %d\n", $i*10*$counter++);
    record(2, "hw-$i $string");
  }
}

# Reset running total for the 3 'devices' for the current interval, which will be
# displayed in both brief and summary formats.
sub helloInitInterval
{
  $hwTot=0;
}

# We could get fancier and look at how much each counter changed between intervals by
# subtracting the last value from the current one, but we're only going to look at
# explicit values to keep things simple.
sub helloAnalyze
{
  my $type=   shift;
  my $dataref=shift;

  $type=~/^hw-(.*)/;
  my $index=$1;
  my @fields=split(/\s+/, $$dataref);
  $hwNow[$index]=$fields[1];
  $hwTot+=$fields[1];
}

# This and the 'print' routines should be self-explanatory as they pretty much simply
# return a string in the appropriate format for collectl to dispose of.
sub helloPrintBrief
{
  my $type=shift;
  my $lineref=shift;

  if ($type==1)       # header line 1
  {
    $$lineref.="<-Hello->";
  }
  elsif ($type==2)    # header line 2
  {
    $$lineref.=" Total ";
  }
  elsif ($type==3)    # data
  {
    $$lineref.=sprintf(" %4s ", cvt($hwTot/$intSecs));
  }
  elsif ($type==4)    # reset 'total' counters
  {
    $hwTotTOT=0;
  }
  elsif ($type==5)    # increment 'total' counters
  {
    $hwTotTOT+=$hwTot;
  }
  elsif ($type==6)    # print 'total' counters
  {
    # Since this never goes over a socket we can just do a simple print.
    printf " %4s ", cvt($hwTotTOT);
  }
}

# The only magic here is knowing when to print a header. Note the use of $rate, which
# you can see change when you use -on. Since all -on does is set $intSecs to 1, there's
# no custom coding required. Also note how $datetime and $miniFiller are used together to
# allow the actual timestamps to align with the header correctly.
sub helloPrintVerbose
{
  my $printHeader=shift;
  my $homeFlag=   shift;
  my $lineref=    shift;

  # Note that last line of verbose data (if any) still sitting in $$lineref
  my $line=$$lineref='';

  if ($hwOpts=~/s/)
  {
    if ($printHeader)
    {
      $line.="\n"    if !$homeFlag;
      $line.="# HELLO STATISTICS ($rate)\n";
      $line.="#$miniFiller Total\n";
    }
    $$lineref.=$line;
    return    if $showColFlag;

    $$lineref.=sprintf("$datetime %7s\n", cvt($hwTot/$intSecs,7));
  }

  $line='';
  if ($hwOpts=~/d/)
  {
    if ($printHeader)
    {
      $line.="\n"    if !$homeFlag;
      $line.="# HELLO DETAIL ($rate)\n";
      $line.="#$miniFiller HW Value\n";
    }
    $$lineref.=$line;
    return    if $showColFlag;

    $line='';
    for (my $i=0; $i<3; $i++)
    {
      $line.=sprintf("$datetime %2d %7s\n", $i, cvt($hwNow[$i],7));
    }
  }
  $$lineref.=$line;
}

# Just be sure to use $SEP in the right places. A simple trick to make sure you've done it
# correctly is to generate a small plot file and load it into a spreadsheet, making sure each
# column of data has a header and that they align 1:1.
sub helloPrintPlot
{
  my $type= shift;
  my $ref1= shift;

  # H e a d e r s

  # Summary
  if ($type==1 && $hwOpts=~/s/)
  {
    $$ref1.="[HW]Tot${SEP}";
  }

  # Detail - these typically have :devname inside the []s
  if ($type==2 && $hwOpts=~/d/)
  {
    for (my $i=0; $i<3; $i++)
    {
      $$ref1.="[HW:$i]Val$SEP";
    }
  }

  # D a t a

  # Summary
  if ($type==3 && $hwOpts=~/s/)
  {
    $$ref1.=sprintf("$SEP%d", int($hwTot/$intSecs));
  }

  # Detail
  if ($type==4 && $hwOpts=~/d/)
  {
    for (my $i=0; $i<3; $i++)
    {
      $$ref1.=sprintf("$SEP%d$SEP%d", $i, int($hwNow[$i]/$intSecs));
    }
  }
}

sub helloPrintExport
{
  my $type=shift;
  my $ref1=shift;
  my $ref2=shift;
  my $ref3=shift;
  my $ref4=shift;
  my $ref5=shift;
  my $ref6=shift;

  if ($hwOpts=~/s/)
  {
    if ($type eq 'l')
    {
      push @$ref1, "hwtotals.val";
      push @$ref2, int($hwTot/$intSecs);
      push @$ref5, 1;    # makes it a gauge and so an avg for 'tot'
    }
    elsif ($type eq 'g')
    {
      push @$ref1, "hwtotals.hw";
      push @$ref2, 'num/sec';
      push @$ref3, int($hwTot/$intSecs);
      push @$ref4, 'hello';
    }
  }

  if ($hwOpts=~/d/)
  {
    if ($type=~/[gl]/)
    {
      for (my
$i=0; $i<3; $i++) { if ($type eq 'l') { push @$ref3, "hwinfo.hw$i.val"; push @$ref4, int($hwNow[$i]/$intSecs); } else { push @$ref1, "hwinfo.hw$i.val"; push @$ref2, 'num/sec'; push @$ref3, int($hwNow[$i]/$intSecs); } } } } } 1; collectl-4.3.1/vmsum.ph0000775000175000017500000003502713366602004013200 0ustar mjsmjs# copyright, 2012 Hewlett-Packard Development Company, LP # Debug # 1 - print linux commands # 2 - show most command output # 4 - show instance, tap device and net index # Restrictions # - the design of export only allows you to call a single module and in most cases that's # 'lexpr' BUT that interface allows lexpr to call another export module so you can do this: # --export lexpr,x=vmsum # External globals our %virtMacs; # from vnet.ph our $lexOutputFlag; # tells us when lexpr output interval reached # Internal globals my $program='vmsum V1.0'; my %instances; my $oneMB=1024*1024; my ($debug, $helpFlag, $instMin, $versionFlag, $zeroFlag); my $Ssh= '/usr/bin/ssh'; my $Ping='/bin/ping'; my $PingTimeout=1; # these control writing the vm text file my $today; my $lastDate=0; my $printHeader; my $textDir=''; my $lexprFlag=0; my $noNetMsg=''; # if not null, problem with n/w stats (very rare) my $hostname=`hostname`; chomp $hostname; sub vmsumInit { # To keep things obvious, the first set of filters are passed to collectl # to just get what we want and the second set are used to verify its output. $colFilt='ckvm$,clibvirtd$,cqemu'; $cmdFilt='kvm$|libvirtd$|qemu'; if ($playback eq '') { error2("this is an --export module") if $import=~/vmsum/; error2("vmsum requires --import vnet") if $import!~/vnet/; print "warning: no disk stats unless you're root\n" if !$rootFlag; # since this can be called via lexpr and we only want to do our processing during interval2, the easiest # thing to do is use the same values for both intervals. Otherwise we need to figure out how to keep the # I1 stats correct when sending them to lexpr. this is far easier, at least for now. 
    # As long as I2 is in the 5 second range that should be fine, but if we want finer
    # granularity we may need to rethink things.
    my ($int1, $int2)=split(/:/, $interval);
    $int2=$int1    if !defined($int2);
    $interval="$int2:$int2";

    # NOTE - for its first few seconds of life a kvm VM is named libvirtd and so
    #        we need to check for both names or we might miss it.
    $procFilt=$colFilt;
  }
  else
  {
    error2("-s not allowed in playback mode")              if $userSubsys ne '';
    error2("this file does not contain process data!")    if $subsys!~/Z/;
    $noNetMsg="this file recorded without network stats"  if $noNetMsg eq '' && $subsys!~/n/i;

    my $options;
    my $daemonFlag=($header=~/Options: -D/) ? 1 : 0;
    $options=$1    if  $daemonFlag && $header=~/DaemonOpts:(.*)/m;    # we want leading space
    $options=$1    if !$daemonFlag && $header=~/Options:(.*)/m;
    my $temp=$1    if $options=~/--im\S+\s+(\S+)/;
  }

  # for now, if called via lexpr, we report data a different way!
  $lexprFlag=1    if $export=~/lexpr/;

  $instMin='';
  $startFlag=0;
  $uuidFlag=0;
  $debug=$helpFlag=$versionFlag=$zeroFlag=0;
  foreach my $option (@_)
  {
    # if called by lexpr,x=...
    # $option is passed as a null string, so we can't split it
    last    if $option eq '';

    my ($name, $value)=split(/=/, $option);
    error2("valid options are: [adhmsStuvz]")    if $name!~/^[adhmsStuvz]$/;

    $addrFlag=1        if $name eq 'a';
    $debug=$value      if $name eq 'd';
    $helpFlag=1        if $name eq 'h';
    $instMin=$value    if $name eq 'm';
    $startFlag|=1      if $name eq 's';
    $startFlag|=2      if $name eq 'S';
    $textDir=$value    if $name eq 't';
    $uuidFlag=1        if $name eq 'u';
    $versionFlag=1     if $name eq 'v';
    $zeroFlag=1        if $name eq 'z';
  }

  vmsumHelp()       if $helpFlag;
  vmsumVersion()    if $versionFlag;

  # make sure that, if not specified by the user, we collect process and n/w data
  $tempsys=$subsys;
  $tempsys.='Z'    if $subsys!~/Z/;
  $tempsys.='n'    if $subsys!~/n/i;
  $subsys=$userSubsys=$tempsys;

  error2("-f requires --rawtoo to get a raw file")            if $filename ne '' && !$rawtooFlag;
  error2("z only makes sense with sd")                        if $zeroFlag;
  error2("you can only specify s or S not both!")             if $startFlag==3;
  error2("--procopts s OR s/S flags but not both")            if $startFlag && $procOpts=~/s/i;
  error2("t= only with lexpr")                                if $textDir ne '' && !$lexprFlag;
  error2("'$textDir' doesn't exist or is not a directory")    if $textDir ne '' && (!-e $textDir || !-d $textDir);
  error2("instance specified by -m must be exactly 8 chars")  if $instMin ne '' && length($instMin) != 8;

  # set up some things in collectl itself (requires DEEP knowledge)
  setOutputFormat();
  loadPids($procFilt);
  $interval2Secs=0;
  $DefNetSpeed=-1;    # disable checking for bogus network speeds on vlans

  if ($procOpts!~/s/i)
  {
    $procOpts.='s'    if $startFlag & 1;
    $procOpts.='S'    if $startFlag & 2;
  }
}

sub vmsum
{
  my $lineref=shift;    # lexpr the only one who passes this to us

  # when there is nothing to report these variables aren't set and so nothing is sent back
  # if called from colmux. Then, when colmux exits, the socket write won't
  # happen and writeData() will never see a SIGPIPE and collectl won't exit.
# so, since int1/int2 always the same during collection, this will force # socket activity if ($playback eq '') { $interval2Print=1; $interval2Secs=$intSecs; } # only if time to print, noting colmux can call us with --showcolflag. # in realtime this is every interval because we've force i2=i return if !$interval2Print && !$showColFlag; # if a process is discovered AFTER we start, this routine gets called called the first # time a process is seen and '$interval2Secs' will be 0! In that one special case # we need to wait for the next interval before printing. return if !$interval2Secs; $seconds=time; # needed if printSeparator ever used # F o r m a t t e d T e x t O u t p u t my $lines=''; my $lexpr=''; if (!$lexprFlag || $textDir ne '') { $datetime=''; $tempFiller=''; $separatorHeaderPrinted=1; # suppress separator if ($options=~/[dDTm]/ || $textDir ne '') { my ($ss, $mm, $hh, $mday, $mon, $year)=localtime($lastSecs[0]); $today=sprintf('%d%02d%02d', $year+1900, $mon+1, $mday); $datetime=sprintf("%02d:%02d:%02d", $hh, $mm, $ss); $datetime=sprintf("%02d/%02d %s", $mon+1, $mday, $datetime) if $options=~/d/; $datetime=sprintf("%04d%02d%02d %s", $year+1900, $mon+1, $mday, $datetime) if $options=~/D/; $datetime.=".$usecs" if ($options=~/m/); $datetime.=" "; $tempFiller=' ' x length($miniDateTime); } # see if we need a new VM log file if ($textDir ne '' && $today!=$lastDate) { my $filename="$textDir/$hostname-$today.vm"; logmsg('I', "Opening $filename"); open VMLOG, ">>$filename" or logmsg('E', "Couldn't open '$filename'"); $lastDate=$today; $printHeader=1; } # we only print headers in the text file after opening it if (!$lexprFlag || $printHeader) { $lines.="\n" if !$homeFlag; $temp1=($procOpts=~/f/) ? 
"(counters are cumulative)" : "(counters are $rate)"; $temp2=''; $lines.="# PROCESS SUMMARY $temp1$temp2$cpuDisabledMsg\n"; $tempHdr=''; $tempHdr.="#${tempFiller} PID THRD S VSZ RSS CP SysT UsrT Pct N AccumTim "; # using 'Time' breaks colmux $tempHdr.=sprintf("%s ", $procOpts=~/s/ ? 'StrtTime' : 'StartTime ') if $procOpts=~/s/i; $tempHdr.=sprintf("%-14s ", $addrFlag ? 'NetworkAddr' : '') if $addrFlag; $tempHdr.="DskI DskO NetI NetO Instance"; $tempHdr.=" UUID" if $uuidFlag; $lines.="$tempHdr\n"; $printHeader=0; if ($showColFlag) { printText($lines); exit; } } } # Process in PID order my %procSort; foreach $pid (keys %procIndexes) { $procSort{sprintf("%06d", $pid)}=$pid; } my $eol=''; my $procCount=0; foreach $key (sort keys %procSort) { # if screen already full last if $numTop && ++$procCount>$numTop; # if we had partial data for this pid don't try to print! $i=$procIndexes{$procSort{$key}}; next if !defined($procSTimeTot[$i]); # even though we're looking for libvirtd initially so its pid doesn't get ignored # later, we DON'T want any stats on processes of that name. next if $procName[$i] eq 'libvirtd'; # If wide mode we include the command arguments AND chop trailing spaces ($cmd0, $cmd1)=(defined($procCmd[$i])) ? split(/\s+/,$procCmd[$i],2) : ($procName[$i],''); next if $cmd0!~/$cmdFilt/; # can get anything in playback and eveb libvirt in real time $qemuFlag=($cmd0=~/qemu/) ? 1 : 0; # the number cpus occurs here for both types of VMS (for at least now...) $cmd1=~/sockets=(\d+)/; my $numCPUs=$1; my $i2Secs=$interval2Secs; $cmd1=~/uuid (\S+)/; my $uuid=$1; #print "UUID: $uuid Flag: $uuidFlag\n"; # it looks like a process can show up w/o complete set of args, so if this happens # we'll probably catch the full string in the next cycle (or two). also the command # itself looks different for qemu and kvm $cmd1=~/instance-(\w+)/; my $instance=(defined($1)) ? 
$1 : ''; next if $instMin ne '' && $instance lt $instMin; next if !defined($uuid) || !defined($instance); # only if no network problems, noting problems are rare, we need to find the index # to this VM's network stats my $netIndex=-1; if ($noNetMsg eq '') { # for now, it's either qemu or assume it's kvm if ($qemuFlag) { if ($cmd1=~/,mac=(.*?),/) { my $mac=$1; $mac=~s/^.{3}//; # always ignore first octet since vnet does too AND they're different! $netIndex=$networks{$virtMacs{$mac}} if $noNetMsg eq ''; print "Inst: $instance VIRT: $virtMacs{$mac} MAC: $mac NetIndex: $netIndex\n" if $debug & 4; } else { print "Inst: $instance No Net!\n" if $debug & 4; } } else { # so far haven't see any instances w/o ifname in them... $cmd1=~/,ifname=(.*?),/; my $tapdev=$1; $netIndex=$networks{$tapdev}; print "Inst: $instance Tap: $tapdev NetIndex: $netIndex\n" if $debug & 4; } } # Write to terminal OR vm text file if (!$lexprFlag || $textDir) { $line=sprintf("$datetime%5d%s %4d %1s %5s %5s %2d %s %s %s %2d %10s ", $procPid[$i], $procThread[$i] ? '+' : ' ', $procTCount[$i], $procState[$i], defined($procVmSize[$i]) ? cvt($procVmSize[$i],4,1,1) : 0, defined($procVmRSS[$i]) ? cvt($procVmRSS[$i],4,1,1) : 0, $procCPU[$i], cvtT1($procSTime[$i]), cvtT1($procUTime[$i]), cvtP(($procSTime[$i]+$procUTime[$i])/$numCPUs), $numCPUs, cvtT2($procSTimeTot[$i]+$procUTimeTot[$i])); $line.=sprintf("%s ", cvtT5($procSTTime[$i])) if $startFlag || $procOpts=~/s/i; $line.=sprintf("%-14s ", (defined($instances{$instance}->{address})) ? $instances{$instance}->{address} : '') if $addrFlag; if ($rootFlag || $lexprFlag) { $line.=sprintf("%4s %4s ", cvt($procRKB[$i]/$i2Secs,4,0,1), cvt($procWKB[$i]/$i2Secs,4,0,1)); } else { $line.=sprintf("%4s %4s ", '-1', '-1'); } # we'll virtually always have network stats but we want to differentiate between an uninitt network index, # which is a bug, and somehow not having recorded things with -sn. 
      # There are also cases where there is no network (seen during testing for qemu)
      # and for those report -1
      if ($subsys=~/n/i)
      {
        # this is the normal case
        if (defined($netIndex) && $netIndex != -1)
        {
          $line.=sprintf("%4s %4s ",
              defined($netRxKB[$netIndex]) ? cvt($netRxKB[$netIndex]/$i2Secs) : '???',
              defined($netTxKB[$netIndex]) ? cvt($netTxKB[$netIndex]/$i2Secs) : '???');
        }

        # these next 2 cases typically shouldn't happen but during testing I've seen
        # transient cases where $netIndex wasn't defined, perhaps a network was just coming up?
        # I could have combined with the -1 case but want to differentiate, at least for now.
        elsif (!defined($netIndex))
        {
          $line.=sprintf("%4s %4s ", '!!!', '!!!');
        }
        else
        {
          $line.=sprintf("%4s %4s ", '-1', '-1');
        }
      }
      else
      {
        $line.=sprintf("%4s %4s ", 0, 0);
      }

      $line.=sprintf("%s", $instance);
      $line.=sprintf(" %s", $uuid)    if $uuidFlag;
      $line.=$eol                     if $playback eq '' && $numTop;
      $line.="\n"                     if $playback ne '' || !$numTop || $procCount<$numTop;
      $lines.=$line;
    }

    # we might end up writing to 2 places...
    if ($lexprFlag)
    {
      # remember, even with i=60,tot, lexpr calls us every time and we need to call sendData() so
      # the totals are correctly calculated.
Further, when passing rates that's ok too because # lexpr totals them up and divides by the number of samples preserving the correct average $lexpr.=sendData("vm.$instance.dskrkb", $procRKB[$i]/$i2Secs); $lexpr.=sendData("vm.$instance.dskwkb", $procWKB[$i]/$i2Secs); if ($noNetMsg eq '') { $lexpr.=sendData("vm.$instance.netrx", $netRxKB[$netIndex]/$i2Secs); $lexpr.=sendData("vm.$instance.nettx", $netTxKB[$netIndex]/$i2Secs); } } } # A c t u a l O u t p u t H a p p e n s H e r e if (!$lexprFlag) { # only time we go to terminal, which is probably most of the time printText($lines) if $filename eq ''; # clear to the end of the display in case doing --procopts z, since the process list # length changes dynamically print $clr if $numTop && $playback eq ''; } else { $$lineref.=$lexpr; print VMLOG $lines if $textDir ne ''; } } # The main point of this routine is for cases where we might be run from colmux, there's no way to tell which # node error messages may have come from! sub error2 { error("$hostname: $_[0]"); } sub vmsumVersion { print "$program\n"; exit; } sub vmsumHelp { my $help=<[$kids]=$pidZ; #printf "PID: $pidZ I: $i PPID: $ppidZ KIDS: $kids\n"; } if ($debug & 1) { # Look for orphans. Probably only useful for debugging and then not even sure... foreach my $pid (keys %procIndexes) { my $i=$procIndexes{$pid}; my $ppid=(defined($procTgid[$i]) && $procTgid[$i]!=$procPid[$i]) ? 
$procTgid[$i] : $procPpid[$i]; my $ppidZ=sprintf("%05d", $ppid); print "*** Pid $pid is an orphan ***\n" if !defined($procChild{$ppidZ}); } foreach my $ppid (sort keys %procChild) { printf "Parent: %5d Kids: %5d PIDS:", $ppid, scalar(@{$procChild{$ppid}}); foreach my $pid (@{$procChild{$ppid}}) { printf " %d", $pid; } print "\n"; } } printTreeHeader(); aggregate('00000') if $aggregateFlag; $procCount=0; $level{'00000'}=-1; push @stack, '00000'; my $i2=$interval2Secs; while (scalar(@stack)) { my $ppidZ=pop(@stack); my $i=$procIndexes{$ppidZ*1}; #print "POPPED: $ppidZ I: $i LEVEL: $level{$ppidZ}\n"; my $level=$level{$ppidZ}+1; foreach my $pidZ (@{$procChild{$ppidZ}}) { $level{$pidZ}=$level; push @stack, $pidZ; } next if $level==0 || $level>$depth; next if !$threadsFlag && $procThread[$i]; if ($skipFlag) { next if $topType eq 'syst' && $procSTime[$i]==0; next if $topType eq 'usrt' && $procUTime[$i]==0; next if $topType eq 'time' && $procSTime[$i]+$procUTime[$i]==0; if ($procOpts!~/f/) { next if $topType eq 'majf' && $procMajFlt[$i]==0; next if $topType eq 'minf' && $procMinFlt[$i]==0; next if $topType eq 'flt' && $procMajFlt[$i]+$procMinFlt[$i]==0; } else { next if $topType eq 'majf' && $procMajFltTot[$i]==0; next if $topType eq 'minf' && $procMinFltTot[$i]==0; next if $topType eq 'flt' && $procMajFltTot[$i]+$procMinFltTot[$i]==0; } # I/O KBs are special... 
next if $topType eq 'rkb' && $procRKB[$i]/$i2<=$ioSkip; next if $topType eq 'wkb' && $procWKB[$i]/$i2<=$ioSkip; next if $topType eq 'iokb' && ($procRKB[$i]+$procWKB[$i])/$i2<=$ioSkip; next if $topType eq 'rkbc' && $procRKBC[$i]/$i2<=$ioSkip; next if $topType eq 'wkbc' && $procWKBC[$i]/$i2<=$ioSkip; next if $topType eq 'iokbc' && ($procRKBC[$i]+$procWKBC[$i])/$i2<=$ioSkip; next if $topType eq 'ioall' && ($procRKB[$i]+$procWKB[$i]+$procRKBC[$i]+$procWKBC[$i])/$i2<=$ioSkip; next if $topType eq 'rsys' && $procRSys[$i]==0; next if $topType eq 'wsys' && $procWSys[$i]==0; next if $topType eq 'iosys' && $procRSys[$i]+$procWSys[$i]==0; } next if defined($pidSelect) && ($ppidZ ne $pidSelect) && !$pidPrinted; $pidPrinted=1; my $parent=defined($procTgid[$i]) && $procTgid[$i]!=$procPid[$i] ? $procTgid[$i] : $procPpid[$i]; my $pIndex=$procIndexes{$parent}; printPid($pIndex, $parent) if $procThread[$i] && !defined($procPrinted{$parent}); printPid($i, $ppidZ*1); last if $numTop && $procCount>=$numTop; } $clscr=$home; print $clr; } sub printTreeHeader { my $tempTime=''; $tempTime= " ".(split(/\s+/,localtime($lastInt2Secs)))[3]; $tempTime.=sprintf(".%03d", $usecs) if $options=~/m/; my $line="Process Tree$tempTime "; $line.=sprintf("[skip when '$expSkip'<=%d%s is '%s' ", $ioSkip, $ioSkip ? 'KB' : '', $skipFlag ? 'on' : 'off'); $line.=sprintf("aggr: '%s' x1024: '%s' depth $depth", $aggregateFlag ? 'on' : 'off', $kFlag ? 'on' : 'off'); $line.=sprintf(" threads: %s", $threadsFlag ? 
'on' : 'off') if $procOpts=~/t/; $line.="]$eol\n"; $line.=$eol if $playback eq '' && $numTop; printText("\n") if !$homeFlag; printText($line); my $filler=' 'x$depth; if ($procOpts!~/[im]/) { $tempHdr= "# PID$filler PPID User PR S VSZ RSS CP SysT UsrT Pct AccuTime "; $tempHdr.=" RKB WKB " if $processIOFlag; $tempHdr.="MajF MinF Command\n"; } elsif ($procOpts=~/i/) { $tempHdr= "# PID$filler PPID User S SysT UsrT AccuTime RKB WKB RKBC WKBC RSys WSys Cncl Command\n"; } elsif ($procOpts=~/m/) { $tempHdr= "# PID$filler PPID User S VmSize VmLck VmRSS VmData VmStk VmExe VmLib MajF MinF Command\n"; } printText($tempHdr); } sub printPid { my $i= shift; my $ppid=shift; my $ppidZ=sprintf("%05d", $ppid); my $pad=($level{$ppidZ}); my $padL=' 'x$pad; my $padR=' 'x($depth-$pad); my $parent=defined($procTgid[$i]) && $procTgid[$i]!=$procPid[$i] ? $procTgid[$i] : $procPpid[$i]; my $line=sprintf("$padL%05d%s$padR %5d ", $ppid, $procThread[$i] ? '+' : ' ', $parent); # Handle --procopts f if ($procOpts=~/f/) { $majFlt=$procMajFltTot[$i]; $minFlt=$procMinFltTot[$i]; } else { $majFlt=$procMajFlt[$i]/$interval2Secs; $minFlt=$procMinFlt[$i]/$interval2Secs; } my ($cmd0, $cmd1)=(defined($procCmd[$i])) ? split(/\s+/,$procCmd[$i],2) : ($procName[$i],''); $cmd0=basename($cmd0) if $procOpts=~/r/ && $cmd0=~/^\//; $cmd1='' if $procOpts!~/w/ || !defined($cmd1); $cmd1=~s/\s+$// if $procOpts=~/w/; $cmd1=substr($cmd1, 0, $cmd1Width); if ($procOpts!~/[im]/) { # Note we only started fetching Tgid in V3.0.0 $line.=sprintf("%-8s %2s %1s %5s %5s %2d %s %s %s %s ", substr($procUser[$i],0,8), $procPri[$i], $procState[$i], defined($procVmSize[$i]) ? cvt($procVmSize[$i],4,1,1) : 0, defined($procVmRSS[$i]) ? 
cvt($procVmRSS[$i],4,1,1) : 0, $procCPU[$i], cvtT1($procSTime[$i]), cvtT1($procUTime[$i]), cvtP($procSTime[$i]+$procUTime[$i]), cvtT2($procSTimeTot[$i]+$procUTimeTot[$i])); $line.=sprintf("%4s %4s ", cvt($procRKB[$i]*$mult/$interval2Secs,4,0,1), cvt($procWKB[$i]*$mult/$interval2Secs,4,0,1)) if $processIOFlag; $line.=sprintf("%4s %4s %s %s", cvt($majFlt), cvt($minFlt), "$padL$cmd0", $cmd1); } elsif ($procOpts=~/i/) { $line.=sprintf("%-8s %1s %s %s %s ", substr($procUser[$i],0,8), $procState[$i], cvtT1($procSTime[$i]), cvtT1($procUTime[$i]), cvtT2($procSTimeTot[$i]+$procUTimeTot[$i])); $line.=sprintf("%5s %5s %5s %5s %5s %5s %5s %s %s", cvt($procRKB[$i]*$mult/$interval2Secs,5,0,1), cvt($procWKB[$i]*$mult/$interval2Secs,5,0,1), cvt($procRKBC[$i]*$mult/$interval2Secs,5,0,1), cvt($procWKBC[$i]*$mult/$interval2Secs,5,0,1), cvt($procRSys[$i]/$interval2Secs,5,0,1), cvt($procWSys[$i]/$interval2Secs,5,0,1), cvt($procCKB[$i]*$mult/$interval2Secs,5,0,1), "$padL$cmd0", $cmd1); } elsif ($procOpts=~/m/) { $line.=sprintf("%-8s %1s %6s %6s %6s %6s %6s %6s %6s %4s %4s %s %s", $procUser[$i], $procState[$i], defined($procVmSize[$i]) ? cvt($procVmSize[$i],6,1,1) : 0, defined($procVmLck[$i]) ? cvt($procVmLck[$i],6,1,1) : 0, defined($procVmRSS[$i]) ? cvt($procVmRSS[$i],6,1,1) : 0, defined($procVmData[$i]) ? cvt($procVmData[$i],6,1,1) : 0, defined($procVmStk[$i]) ? cvt($procVmStk[$i],6,1,1) : 0, defined($procVmExe[$i]) ? cvt($procVmExe[$i],6,1,1) : 0, defined($procVmLib[$i]) ? cvt($procVmLib[$i],6,1,1) : 0, cvt($majFlt), cvt($minFlt), "$padL$cmd0", $cmd1); } $line.=$eol if $playback eq '' && $numTop; $procCount++; $line.="\n" if $playback ne '' || !$numTop || $procCount<$numTop; printText($line); $procPrinted{$ppid}=1; # string leading 0s } sub aggregate { my $pidZ=shift; my $kidArray=$procChild{$pidZ}; foreach my $kidZ (@$kidArray) { my $kidI=aggregate($kidZ); } if ($pidZ ne '00000') { my $i=$procIndexes{$pidZ*1}; # Aggregate everything that makes sense... 
    foreach my $kidZ (@$kidArray) {
      $kidI=$procIndexes{$kidZ*1};
      $procSTime[$i]+= $procSTime[$kidI];
      $procUTime[$i]+= $procUTime[$kidI];
      $procSTimeTot[$i]+=$procSTimeTot[$kidI];
      $procUTimeTot[$i]+=$procUTimeTot[$kidI];
      $procMinFlt[$i]+= $procMinFlt[$kidI];
      $procMajFlt[$i]+= $procMajFlt[$kidI];
      $procMinFltTot[$i]+=$procMinFltTot[$kidI];
      $procMajFltTot[$i]+=$procMajFltTot[$kidI];

      if ($processIOFlag) {
        $procRKB[$i]+= $procRKB[$kidI];
        $procWKB[$i]+= $procWKB[$kidI];
        $procRKBC[$i]+=$procRKBC[$kidI];
        $procWKBC[$i]+=$procWKBC[$kidI];
        $procRSys[$i]+=$procRSys[$kidI];
        $procWSys[$i]+=$procWSys[$kidI];
        $procCKB[$i]+= $procCKB[$kidI];
      }

      # If one is not defined for this process, none are defined, and the same goes for the parent
      $procVmSize[$kidI]=$procVmLck[$kidI]=$procVmRSS[$kidI]=$procVmData[$kidI]=
        $procVmStk[$kidI]=$procVmExe[$kidI]=$procVmLib[$kidI]=0 if !defined($procVmSize[$kidI]);
      $procVmSize[$i]=$procVmLck[$i]=$procVmRSS[$i]=$procVmData[$i]=
        $procVmStk[$i]=$procVmExe[$i]=$procVmLib[$i]=0 if !defined($procVmSize[$i]);

      $procVmSize[$i]+=$procVmSize[$kidI];
      $procVmLck[$i]+= $procVmLck[$kidI];
      $procVmRSS[$i]+= $procVmRSS[$kidI];
      $procVmData[$i]+=$procVmData[$kidI];
      $procVmStk[$i]+= $procVmStk[$kidI];
      $procVmExe[$i]+= $procVmExe[$kidI];
      $procVmLib[$i]+= $procVmLib[$kidI];
    }
  }
}

sub commandCheck {
  # see if user entered a command
  my @ready=$proctreeSelect->can_read(0);
  if (scalar(@ready)) {
    my $command=<STDIN>;
    chomp $command;
    if ($command=~/^(\d+)/) {
      $pidSelect=sprintf("%05d", $1);
      $pidPrinted=0;
    } elsif ($command=~/^a/) {
      $aggregateFlag=($aggregateFlag+1) % 2;
    } elsif ($command=~/^d(\d+)/) {
      $depth=$1;
    } elsif ($command=~/^([imp])/) {
      my $format=$1;
      $procOpts=~s/[imp]//;
      $procOpts.=$format;
    } elsif ($command=~/^h/ || $command eq '') {
      helpMenu();
    } elsif ($command=~/^k/) {
      $kFlag=($kFlag+1) % 2;
      $mult=($kFlag) ? 
1024 : 1;
    } elsif ($command=~/^s(\S+)/) {
      my $skip=$1;
      if (defined($TopProcTypes{$skip})) {
        $topType=$skip;
      } else {
        treeError("Invalid process sorting field '$skip'");
      }
    } elsif ($command=~/^t/) {
      if ($procOpts=~/t/) {
        $threadsFlag=($threadsFlag + 1) %2;
      } else {
        treeError("threads must be selected with --procopts to use 't' command")
      }
    } elsif ($command=~/^w(\d+)/) {
      $cmd1Width=$1;
    } elsif ($command=~/^z/) {
      my $saveOpts=$procOpts;
      $skipFlag=($skipFlag+1) % 2;
      $procOpts.=($saveOpts!~/z/) ? 'z' : '';
    } elsif ($command=~/^Z(\d+)/) {
      if ($processIOFlag) {
        $ioSkip=$1;
        $expSkip=$ioSkip;    # for reporting name in header
      } else {
        treeError("'Z' only applies to kernels that track process I/O")
      }
    } else {
      helpMenu("Invalid command: $command");
    }
  }
}

sub treeError {
  print "$clscr$clr";
  print "$_[0]\n" if defined($_[0]);
  print "Press RETURN to go back to display mode...\n";
  <STDIN>;
}

sub helpMenu {
  print "$clscr$clr";
  print "$_[0]\n" if defined($_[0]);
  print "Enter a command and RETURN while in display mode:\n";
  print " pid only display this pid and its children\n";
  print " a toggle aggregation between 'on' and 'off'\n";
  print " dxx change display hierarchy depth to xx\n";
  print " i change display format to 'I/O'\n";
  print " k toggle multiplication of I/O numbers by 1024 between 'on' and 'off'\n";
  print " m change display format to 'memory'\n";
  print " p change display format to 'process'\n";
  print " h show this menu\n";
  print " stype where 'type' is a valid sorting type (see --showtopopts)\n";
  print " entries with 0s in those field(s) will be skipped\n";
  print " wxx max width for display of command arguments\n";
  print " z toggle 'skip' logic between 'on' and 'off'\n";
  print " Zxx when skipping, only keep entries with I/O fields > xxKB\n";
  print "Press RETURN to go back to display mode...\n";
  <STDIN>;
}

1;
collectl-4.3.1/README-WINDOWS0000664000175000017500000000072313366602004013500 0ustar mjsmjs
collectl only runs in playback mode on Windows so you will have to copy over one or more raw
files to use it.  You will also have to have Perl installed.

To Install
- unpack the src tarball, which can be done by most utilities, into a temporary directory
- create a directory to install collectl into, say \collectl
- copy the following files into \collectl
  - collectl
  - collectl.conf
  - *ph

You're done!  To verify the installation run
  \collectl\collectl.pl -v
collectl-4.3.1/formatit.ph0000775000175000017500000132066413366602004013663 0ustar mjsmjs
# copyright, 2003-2016 Hewlett-Packard Development Company, LP
#
# collectl may be copied only under the terms of either the Artistic License
# or the GNU General Public License, which may be found in the source kit

# local flags not needed/used by mainline.  probably others in this category
my $printTermFirst=0;

# shared with main collectl
our %netSpeeds;

# these are only init'd when in 'record' mode, one of the reasons being that
# many of these variables may be different on the system on which the data
# is being played back
sub initRecord {
  print "initRecord() - Subsys: $subsys\n" if $debug & 1;
  initDay();

  $rawPFlag=0;    # always 0 when no files involved

  # In some cases, we need to know if we're root.
  $rootFlag=`whoami`;
  $rootFlag=($rootFlag=~/root/) ? 1 : 0;

  # be sure to remove domain portion if present.  also note we keep the hostname in
  # two formats, one in its unaltered form (at least needed by lustre directory
  # parsing) as well as all lc because it displays nicer.
  $Host=`hostname`;
  chomp $Host;
  $Host=(split(/\./, $Host))[0];
  $HostLC=lc($Host);

  # when was system booted?
$uptime=(split(/\s+/, `cat /proc/uptime`))[0]; $boottime=time-$uptime; $Distro=cat('/etc/redhat-release') if -e '/etc/redhat-release'; chomp $Distro; if (-e '/etc/redhat-release') { $Distro=cat('/etc/redhat-release'); chomp $Distro; } elsif (-e '/etc/SuSE-release') { my @temp=split(/\n/, cat('/etc/SuSE-release', 1)); $Distro="$temp[0]"; # distro/version $Distro.="SP:$1" if $temp[2]=~/PATCHLEVEL = (\d+$)/; # if patchlevel defined } elsif (-e '/etc/debian_version') { # Both debian and ubuntu have 2 files $Distro='debian '.cat('/etc/debian_version'); chomp $Distro; # append distro to base release, if there if (-e '/etc/lsb-release') { my $temp=cat('/etc/lsb-release',1); $temp=~/DESCRIPTION=(.*)/; $temp=$1; $temp=~s/\"//g; $Distro.=", $temp"; } } # for jiffy based calculations, we need the HZ of the system $HZ=POSIX::sysconf(&POSIX::_SC_CLK_TCK); $PageSize=POSIX::sysconf(_SC_PAGESIZE); # If we have process IO everyone must. This was added in 2.6.23, # but then only if someone builds the kernel with it enabled, though # that, will probably change with future kernels. $processIOFlag=(-e '/proc/self/io') ? 1 : 0; $slabinfoFlag= (-e '/proc/slabinfo') ? 1 : 0; $slubinfoFlag= (-e '/sys/slab') ? 1 : 0; $processCtxFlag=($subsys=~/Z/ && `$Grep ctxt /proc/self/status` ne '') ? 1 : 0; # just because slab structures there, are they readable? A chunk of extra work, but worth it. 
if ($subsys=~/y/i && $slabinfoFlag || $slubinfoFlag) { $message=''; $message='/proc/slabinfo' if $slabinfoFlag && !(eval {`cat /proc/slabinfo 2>/dev/null` or die}); $message='/sys/slab' if $slubinfoFlag && !(eval {`cat /proc/slubinfo 2>/dev/null` or die}); if ($message ne '') { my $whoami=`whoami`; chomp $whoami; disableSubsys('y', "/proc/slabinfo is not readable by $whoami"); $interval=~s/(^\d*):\d+/$1:/ if $subsys!~/z/i; # remove int2 if not needed or we'll get error } } # Get number of ACTIVE CPUs from /proc/stat and in case we're not running on a # kernel that will set the CPU states (or let us change them), enable CPU flag $NumCpus=`$Grep cpu /proc/stat | wc -l`; $NumCpus=~/(\d+)/; $NumCpus=$1-1; for (my $i=0; $i<$NumCpus; $i++) { $cpuEnabled[$i]=1; } $cpusEnabled=$NumCpus; # Now get the total number the system sees, and if different, reset the # number as well as a flag for the header $cpuDisabledFlag=0; $cpuDisabledMsg=''; $cpusDisabled=0; if (-e '/sys') { my $totalCpus=`ls /sys/devices/system/cpu/|$Grep '^cpu[0-9]'|wc -l`; chomp $totalCpus; if ($totalCpus!=$NumCpus) { $NumCpus=$totalCpus; $cpusDisabled++; # for use in header $cpuDisabledFlag=1; # we really only have to worry about lower level details of WHO is disabled when # doing cpu or interrupt stats. Furthermore, if doing cpu, the dynamic processing # will figure out who's disabled but its too much overhead for interrupts alone so # we'll do that here one time only. This may have to be do dynamically if a problem. if ($subsys=~/j/i && $subsys!~/c/i) { # CPU0 always online AND no 'online' entry exists! $cpusEnabled=1; $cpuEnabled[0]=1; for (my $i=1; $i<$NumCpus; $i++) { my $online=`cat /sys/devices/system/cpu/cpu$i/online`; chomp $online; $cpuEnabled[$i]=$online; $cpusEnabled++ if $online; $intrptTot[$i]=0; } } } } $temp=`$Grep vendor_id /proc/cpuinfo`; $CpuVendor=($temp=~/: (.*)/) ? $1 : '???'; $temp=`$Grep siblings /proc/cpuinfo`; $CpuSiblings=($temp=~/: (\d+)/) ? 
$1 : 1;  # if not there assume 1
  $temp=`$Grep "cpu cores" /proc/cpuinfo`;
  $CpuCores=($temp=~/: (\d+)/) ? $1 : 1;  # if not there assume 1
  $temp=`$Grep "cpu MHz" /proc/cpuinfo`;
  $CpuMHz=($temp=~/: (.*)/) ? $1 : '???';
  $Hyper=($CpuSiblings/$CpuCores==2) ? "[HYPER]" : "";

  if (-e "/sys/devices/system/node") {
    $CpuNodes=`ls /sys/devices/system/node |$Grep '^node[0-9]'|wc -l`;
  } else {
    # if it doesn't exist, set nodes to 1 and disable '-sM' if specified
    $CpuNodes=1;
    disableSubsys('M', "/sys/devices/system/node doesn't exist", 1) if $subsys=~/M/;
  }
  chomp $CpuNodes;

  # /proc read speed test, note various reasons to skip it
  # These tests should be updated as we learn more about other distros
  $ProcReadTest='no' if $NumCpus<32 || $Kernel lt '2.6.32';
  $ProcReadTest='no' if $Distro=~/Red Hat.*release (\S+)/ && $1>=6.2;
  $ProcReadTest='no' if $Distro=~/SUSE.*Server (\d+).*SP(\d+)/ && ($1!=11 || $2>=1);

  if ($ProcReadTest=~/yes/i) {
    $procReadTested=1;    # can call this routine twice!
    my $strace=`strace -c cat /proc/stat 2>&1`;
    $strace=~/^\s*\S+\s+(\S+).*read$/m;
    my $speed=$1;
    print "ProcReadSpeed: $speed\n" if $debug & 1;
    if ($speed>0.01) {
      # may be going to a lot of effort here but I want to make sure these messages aren't
      # recorded as errors and you don't have to include -m to see them, as it is pretty
      # important for users to know about this.
      my $line1="Slow /proc/stat read speed of $speed seconds";
      my $line2="Consider a kernel patch/upgrade. 
See http://collectl.sourceforge.net/FAQ for more"; my $line3="Change 'ProcReadSpeed' in /etc/collectl.conf to suppress this message in the future"; if ($DaemonFlag) { logmsg('W', $line1); logmsg('I', $line2); logmsg('I', $line3); } else { print "$line1\n$line2\n$line3\n"; } } } $Memory=`$Grep MemTotal /proc/meminfo`; $Memory=(split(/\s+/, $Memory, 2))[1]; chomp $Memory; $Swap=`$Grep SwapTotal /proc/meminfo`; $Swap=(split(/\s+/, $Swap, 2))[1]; chomp $Swap; # B u d d y i n f o if ($subsys=~/b/i) { if (!open BUD, ') { $NumBud++ } close BUD; } } # D i s k C h e c k s undef @dskOrder; $dskIndexNext=0; my @temp=`$Cat /proc/diskstats`; foreach my $line (@temp) { next if $line!~/$DiskFilter/; my @fields=split(/\s+/, $line); my $diskName=$fields[3]; $diskName=diskRemapName($diskName); push @dskOrder, $diskName; $disks{$diskName}=$dskIndexNext++; } $dskSeenLast=$dskIndexNext; logmsg("I", "initDisk initialized $dskIndexNext disks") if $debug & 1; # I n o d e s if ($subsys=~/i/) { $dentryFlag= (-e '/proc/sys/fs/dentry-state') ? 1 : 0; $inodeFlag= (-e '/proc/sys/fs/inode-state') ? 1 : 0; $filenrFlag= (-e '/proc/sys/fs/file-nr') ? 1 : 0; if ($debug & 1) { print "/proc/sys/fs/dentry-state missing\n" if !$dentryFlag; print "/proc/sys/fs/dentry-state missing\n" if !$inodeFlag; print "/proc/sys/fs/dentry-state missing\n" if !$filenrFlag; } } # I n t e r c o n n e c t C h e c k s # Set IB speeds non-conditionally (even if not running IB) and then only for ofed. # Furthermore assume if mulitple IB interfaces they're all the same speed. $ibSpeed='??'; if (-e '/sys/class/infiniband') { $line=`cat /sys/class/infiniband/*/ports/1/rate 2>&1`; if ($line=~/\s*(\d+)\s+(\S)/) { $ibSpeed=$1; $ibSpeed*=1000 if $2 eq 'G'; } } # if doing interconnect, the first thing to do is see what interconnect # hardware is present via lspci. $NumHCAs=$mellanoxFlag=$opaFlag=0; if ($subsys=~/x/i) { my $lspciVer=`$Lspci --version`; $lspciVer=~/ (\d+\.\d+)/; $lspciVer=$1; my $lspciVendorField=($lspciVer<2.2) ? 
3 : 2; # Turns out SuSE put 'Class' string back into V2.4.4 without changing # version number in SLES 10. It also looks like they got it right in # SLES 11, but who know what will happen in SLES 12! $lspciVendorField=3 if $Distro=~/SUSE.*10/; print "lspci -- Version: $lspciVer Vendor Field: $lspciVendorField\n" if $debug & 1; $command="$Lspci -n | $Egrep '15b3|0c06|14c1|14fc|1077|8086:24f0'"; print "Command: $command\n" if $debug & 1; @pci=`$command`; $HCANames=''; foreach $temp (@pci) { ($vendorID, $type)=split(/:/,(split(/\s+/, $temp))[$lspciVendorField]); if ($vendorID=~/15b3|0c06|1077/) { next if $type eq '5a46'; # ignore pci bridge print "Found Infiniband Interconnect\n" if $debug & 1; $mellanoxFlag=1; ibCheck(''); } elsif ($vendorID=~/8086/) { print "Found OPA Interconnect\n" if $debug & 1; $opaFlag=1; ibCheck(''); } } # OPA V4 exposes 64 it counters in /sys, bypassing the need for perfquery so we need to know # since we can have a mix of montitoring methods for different HCAs we want to at least say once # if V4 counters present. 
my $ibV4Flag=0; for (my $i=0; $i<$NumHCAs; $i++) { $ibV4Flag=1 if $HCAOpaV4[$i][1]; } print "Found OPA 64 bit counters in /sys\n" if $debug & 1 && $ibV4Flag; disableSubsys('x', 'no interconnect or opa hardware/drivers found') if $mellanoxFlag+$opaFlag==0; # User had ability to turn off in case they don't want destructive monitoring my $firstHCA=''; if ($mellanoxFlag) { # use name for first non-opa hca port 1 and see if extended counters supported, # noting we can force perfquery usage during debug, which itself might use -x for (my $i=0; $i<$NumHCAs; $i++) { if ($HCAName[$i]!~/hfi/) { $firstHCA="$SysIB/${HCAName[$i]}0/ports/1"; last; } } $PQopt = '-r'; $PQopt = 'sys' if -e "$firstHCA/counters_ext" && !($debug & 16384); # We usually only care about perfquery for non-extended counters if ($PQopt eq '-r') { # no monitoring if disabled in config file if ($PQuery eq '') { logmsg("W", "Open Fabric IB Stats disabled in $configFile"); $subsys=~s/x//ig; $mellanoxFlag=0; } else { print "Looking for 'perfquery' and 'ofed_info'\n" if $debug & 2; $PQuery=getOfedPath($PQuery, 'perfquery', 'PQuery'); if ($PQuery eq '') { disableSubsys('x', "couldn't find perfquery!"); $mellanoxFlag=0; } } # I hate support questions and this is the place to catch perfquery problems! # so, if perfquery IS there, since it generates warnings on stderr in V1.5 and # we don't know the version yet, always ignore them if ($mellanoxFlag) { my $message=''; my $temp=`$PQuery 2>/dev/null`; $message="Permission denied" if $temp=~/Permission denied/; $message="Failed to open IB device" if $temp=~/Failed to open/; $message="Required module missing" if $temp=~/required by/; $message="No such file or directory" if $temp=~/No such file/; if ($message ne '') { disableSubsys('x', "perfquery error: $message!"); $mellanoxFlag=0; $PQuery=''; last; } # perfquery IS there and we can execute it w/o error... # Can you believe it? PQuery writes its version output to stderr! 
$temp=`$PQuery -V 2>&1`; $temp=~/VERSION: (\d+\.\d+\.\d+)/; $PQVersion=$1; # perfquery there, but what is ofed's version? # NOTE - looks like RedHat is no longer shipping ofed if (!-e $OfedInfo) { # comment out the warning, at least for now. we WILL still see '???' # in header $OfedInfo=getOfedPath($OfedInfo, 'ofed_info', 'OfedInfo'); #logmsg('W', "Couldn't find 'ofed_info'. Won't be able to determine OFED version") # if $OfedInfo eq ''; } # Unfortunately the ofed_info that ships with voltaire adds 5 extra # line at front end so let's look at first 10 lines for version. $IBVersion=($OfedInfo ne '' && `$OfedInfo|head -n10`=~/OFED-(.*)/) ? $1 : '???'; } # last possibility is even though extended stats not in /sys they may still be # available with perfquery so let's see $PQopt = '-x' if !($debug & 16384) && `$PQuery -h 2>&1`=~/--extended/m; } print "PQopt: $PQopt\n" if $debug & 1; print "reading extended IB stats from $SysIB\n" if $debug & 2 && $PQopt eq 'sys'; print "OFED V: $IBVersion PQ V:$PQVersion PQOpt: $PQopt\n" if $debug & 2 && $PQopt ne 'sys'; } # One last check and this is a doozie! Because we do destructive counter access # with perfquery -r, multiple copies of collectl will step on each other. # Therefore we can only allow one instance to actually monitor the IB and the # first one wins, unless we're trying to start a daemon in which case we let # step on the other [hopefully temporary] instance. Since there are odd cases # where it may not always catch exception, one can override checking in .conf if ($PQopt eq '-r') { my $myppid=getppid(); $command="$Ps axo pid,cmd | $Grep collectl | $Grep -vE 'grep|ssh'"; foreach my $line (`$command`) { $line=~s/^\s+//; # some pids have leading white space my ($pid, $procCmd)=split(/ /, $line, 2); next if $pid==$$ || $pid==$myppid; # check ppid in case started by a script # If not running as a daemon, '$procCmd' has the command invocation string # from the 'ps' above. 
If a daemon, we need to pull it out of collectl.cont. my $tempDaemonFlag=($procCmd=~/-D/) ? 1 : 0; if ($tempDaemonFlag) { # This is getting even uglier, but if someone chose to duplicate # 'DaemonCommands' and comment one out, we really need to look for # the last uncommented one. foreach my $cmd (`$Grep 'DaemonCommands =' $configFile`) { next if $cmd=~/^#/; $procCmd=$cmd; } } # Now that we have the full command passed to collectl, pull out -s (if any) # which may be surrounded by optional white space. chomp $procCmd; $procSubsys=($procCmd=~/-s\s*(\S+)\s*/) ? $1 : ''; # The default subsys is different for daemon and interactive use # if no -s, we use default and if there, assume we're overriding $tempSubsys=($tempDaemonFlag) ? $SubsysDefDaemon : $SubsysDefInt; # So now we need to figure out what actual subsystems are in use # by that instance in case it was started with either +/- OR # a fixed set if ($procSubsys=~/^[\+\-]/) { # the stolen from main collectl switch validation code noting # we don't need to validate the switches since done when # daemon started if ($procSubsys=~/-(.*)/) { my $pat=$1; $pat=~s/\+.*//; # if followed by '+' string $tempSubsys=~s/[$pat]//g; } if ($procSubsys=~/\+(.*)/) { my $pat=$1; $pat=~s/-.*//; # if followed by '-' string $tempSubsys.=$pat; } } elsif ($procSubsys ne '') { $tempSubsys=$procSubsys; } # At this point if there IS an instance of collectl running with -sx, # we need to disable it here, unless we're a daemon in which case we # just log a warning. if ($tempSubsys=~/x/i) { if (!$daemonFlag) { disableSubsys('x', 'another instance already monitoring Infiniband'); } else { logmsg("W", "another instance is monitoring IB and the stats will be in error until it is stopped"); } last; } } } } # Let's always get the platform name if dmidecode is there if ($Dmidecode ne '') { $ProductName=($rootFlag) ? 
`$Dmidecode | grep -m1 'Product Name'` : '';
    $ProductName=~s/\s*Product Name: //;
    chomp $ProductName;
    $ProductName=~s/\s*$//;    # some have trailing whitespace
  }

  # E n v i r o n m e n t a l C h e c k s
  if ($subsys=~/E/ && $envTestFile eq '') {
    # Note that these tests are in the reverse order since the last value of $message
    # is the one reported AND only if not using a 'test' file for data source.
    my $message='';
    $message="'IpmiCache' not defined or specifies a directory" if $IpmiCache eq '' || -d $IpmiCache;
    $message="cannot find /dev/ipmi* (is ipmi_si loaded?)" if !-e '/dev/ipmi0' && !-e '/dev/ipmi/0' && !-e '/dev/ipmidev/0';
    $message="cannot find 'ipmitool' in '$ipmitoolPath'" if $Ipmitool eq '';
    $message="you must be 'root' to do environmental monitoring" if !$rootFlag;

    if ($message eq '') {
      # If specified by --envopts, set -d for ipmitool
      $Ipmitool.=" -d $1" if $envOpts=~/(\d+)/;

      logmsg('I', "Initialized ipmitool cache file '$IpmiCache'");
      my $command="$Ipmitool sdr dump $IpmiCache";

      # If we can't dump the cache, something is wrong so make sure we pass along
      # error and disable E monitoring.  Ok to create 'exec' below since we'll
      # never execute it
      $message=`$command 2>&1`;
      if ($message=~/^Dumping/) {
        # Create 'exec' option file in the same directory as the cache, but only for
        # those options that actually return data
        my $cacheDir=dirname($IpmiCache);
        $ipmiExec="$cacheDir/collectl-ipmiexec";
        if (open EXEC, ">$ipmiExec") {
          $message='';    # indicates no errors for test below
          foreach my $type (split(/,/, $IpmiTypes)) {
            my $command="$Ipmitool -S $IpmiCache sdr type $type";
            next if `$command` eq '';
            print EXEC "sdr type $type\n";
          }
          close EXEC;
        } else {
          $message="couldn't create '$ipmiExec'";
        }
      }
    }
    disableSubsys('E', $message) if $message ne '';
  }

  # find all the networks and when possible include their speeds
  undef @temp;
  $netIndexNext=0;
  $NetWidth=$netOptsW;    # Minimum size
  $null=($debug & 1) ? 
'' : '2>/dev/null';
  my $interval1=(split(/:/, $interval))[0];

  # but first look up all the network speeds in /sys/devices and load them into a hash
  # for easier access in the loop below
  my $command="find /sys/devices/ 2>&1 | grep net | grep speed";
  open FIND, "$command|" or logmsg('E', "couldn't execute '$command'");
  while (my $line=<FIND>) {
    chomp $line;
    $line=~/.*\/(\S+)\/speed/;
    my $netName=$1;
    my $mode=$line;
    $mode=~s/speed/operstate/;
    $mode=cat($mode);
    next if $mode!~/up|unknown/;

    # get speed, noting the kernel hardcodes vnets and tap devices to 10, which is wrong!
    # and if problems reading (which can happen) we get ''
    my $speed=cat($line);
    chomp $speed;
    $netSpeeds{$netName}=(defined($speed) && $speed ne '' && $netName!~/^vnet|^tap/) ? $speed : '??';
    print "set netSpeeds{$netName}=>$netSpeeds{$netName}<\n" if $debug & 1;
  }
  close FIND;

  # Since this routine can get called multiple times during
  # initialization, we need to make sure @netOrder gets a clean start.
  undef @netOrder;

  @temp=`$Grep -v -E "Inter|face" /proc/net/dev`;
  foreach my $temp (@temp) {
    next if $rawNetFilter ne '' && $temp!~/$rawNetFilter/;

    $temp=~/^\s*(\S+)/;    # most names have leading whitespace
    $netName=$1;
    $netName=~s/:.*//;     # get rid of : AND possible stats if no whitespace
    $NetWidth=length($netName) if length($netName)>$NetWidth;

    $speed=($netName=~/^ib/) ? $ibSpeed : $netSpeeds{$netName};
    $speed='??' if !defined($speed);

    push @netOrder, $netName;
    $netIndex=$netIndexNext;
    $networks{$netName}=$netIndexNext++;

    # Since speeds are in Mb we really need to multiply by 125 to convert to KB
    $NetMaxTraffic[$netIndex]=($speed ne '' && $speed ne '??') ? 2*$interval1*$speed*125 : 2*$interval1*$DefNetSpeed*125;
  }
  $netSeenLast=$netIndexNext;
  $NetWidth++;    # make room for trailing colon

  # S C S I C h e c k s
  # not entirely sure what to do with SCSI info, but it feels like a good
  # thing to have. 
also, if no scsi present deal accordingly undef @temp; $ScsiInfo=''; if (-e "/proc/scsi/scsi") { @temp=`$Grep -E "Host|Type" /proc/scsi/scsi`; foreach $temp (@temp) { if ($temp=~/^Host: scsi(\d+) Channel: (\d+) Id: (\d+) Lun: (\d+)/) { $scsiHost=$1; $channel=$2; $id=$3; $lun=$4; } if ($temp=~/Type:\s+(\S+)/) { $scsiType=$1; $type="??"; $type="SC" if $scsiType=~/scanner/i; $type="DA" if $scsiType=~/Direct-Access/i; $type="SA" if $scsiType=~/Sequential-Access/i; $type="CD" if $scsiType=~/CD-ROM/i; $type="PR" if $scsiType=~/Processor/i; $ScsiInfo.="$type:$scsiHost:$channel:$id:$lun "; } } $ScsiInfo=~s/ $//; } # L u s t r e C h e c k s $CltFlag=$MdsFlag=$OstFlag=0; $NumLustreFS=$numBrwBuckets=0; if ($subsys=~/l/i) { if ((`ls /lib/modules/*/kernel/net/lustre 2>/dev/null|wc -l`==0) && (`ls /lib/modules/*/*/kernel/net/lustre 2>/dev/null|wc -l`==0) && (`ls /lib/modules/*/*/lustre-client 2>/dev/null|wc -l`==0)) { disableSubsys('l', 'this system does not have lustre modules installed'); } else { # Get Luster and SFS Versions before looking at any data structures in the # 'lustreCheck' routines because things change over time $temp=`cat /proc/fs/lustre/version | grep lustre 2>/dev/null`; $temp=~/lustre: (\d+.*)/; $cfsVersion=$1; $sfsVersion=''; if (-e '/etc/sfs-release') { $temp=cat('/etc/sfs-release'); $temp=~/(\d.*)/; $sfsVersion=$1; } elsif (-e "/usr/sbin/sfsmount" && -e $Rpm) { # XC and client enabler $llite=`$Rpm -qa | $Grep lustre-client`; $llite=~/lustre-client-(.*)/; $sfsVersion=$1; } $OstWidth=$FSWidth=0; $NumMds=$NumOst=0; $MdsNames=$OstNames=$lustreCltInfo=''; $inactiveOstFlag=0; lustreCheckClt(); lustreCheckMds(); lustreCheckOst(); print "Lustre -- CltFlag: $CltFlag NumMds: $NumMds NumOst: $NumOst\n" if $debug & 8; disableSubsys('l', "no lustre services running and I don't know its type. 
You will need to use --lustsvc to force type.") if $CltFlag+$NumMds+$NumOst==0 && $lustreSvcs eq ''; # Global to count how many buckets there are for brw_stats @brwBuckets=(1,2,4,8,16,32,64,128,256); push @brwBuckets, (512,1024) if $sfsVersion ge '2.2'; $numBrwBuckets=scalar(@brwBuckets); # if we're doing lustre DISK stats, figure out what kinds of disks # and then build up a list of them for collection to use. To keep switch # error processing clean, only try to open the file if an MDS or OSS. # Since services may not be up, we also need to look at '$lustreSvcs', # though ultimately we'll only set the disk types and the maximum buckets if ($subsys=~/l/i && $lustOpts=~/D/ && ($MdsFlag || $OstFlag || $lustreSvcs=~/[mo]/i)) { # The first step is to build up a hash of the sizes of all the # existing partitions. Since we're only doing this once, a 'cat's # overhead should be minimal @partitions=`cat /proc/partitions`; foreach $part (@partitions) { # ignore blank lines and header next if $part=~/^\s*$|^major/; # now for the magic. Get the partition size and name, but ignore # cciss devices on controller 0 OR any devices with partitions # noting cciss device partitions end in 'p-digit' and sd partitions # always end in a digit. ($size, $name)=(split(/\s+/, $part))[3,4]; $name=~s/cciss\///; next if $name=~/^c0|^c.*p\d$|^sd.*\d$/; $partitionSize{$name}=$size; } # Determine which directory to look in based on whether or not there # is an EVA present. If so, we look at 'sd' stats; otherwize 'cciss' $LusDiskNames=''; $LusDiskDir=(-e '/proc/scsi/sd_iostats') ? '/proc/scsi/sd_iostats' : '/proc/driver/cciss/cciss_iostats'; # Now find all the stat files, noting that in the case of cciss, we # always skip c0 disks since they're local ones... Also note that # if we're doing a showHeader with -Lm or -Lo on a client, the file # isn't there AND we don't want to report an error either. $openFlag=(opendir(DIR, $LusDiskDir)) ? 
1 : 0;
        logmsg('F', "Disk stats requested but couldn't open '$LusDiskDir'") if !$openFlag && !$showHeaderFlag;
        while ($diskname=readdir(DIR)) {
          next if $diskname=~/^\.|^c0/;

          # if this has a partition within the range of a service lun,
          # ignore it.
          if ($partitionSize{$diskname}/(1024*1024)<$LustreSvcLunMax) {
            print "Ignoring $diskname because its size of ".
                  "$partitionSize{$diskname} is less than ${LustreSvcLunMax}GB\n" if $debug & 1;
            next;
          }
          push @LusDiskNames, $diskname;
          $LusDiskNames.="$diskname ";
        }
        $LusDiskNames=~s/ $//;
        $NumLusDisks=scalar(@LusDiskNames);
        $LusMaxIndex=($LusDiskNames=~/sd/) ? 16 : 24;
      }
    }
  }

  # S L A B C h e c k s
  # Header for /proc/slabinfo changed in 2.6
  if ($slabinfoFlag && $subsys=~/y/i) {
    $SlabGetProc=($slabFilt eq '') ? 99 : 14;
    $temp=`head -n 1 /proc/slabinfo`;
    $temp=~/(\d+\.\d+)/;
    $SlabVersion=$1;
    $NumSlabs=`cat /proc/slabinfo | wc -l`*1;
    chomp $NumSlabs;
    $NumSlabs-=2;
    if ($SlabVersion!~/^1\.1|^2/) {
      # since 'W' will echo on terminal, we only use it when writing to files
      $severity=(defined($opt_s)) ? "E" : "I";
      $severity="W" if $logToFileFlag;
      logmsg($severity, "unsupported /proc/slabinfo version: $SlabVersion");
      $subsys=~s/y//gi;
      $yFlag=$YFlag=0;
    }
  }
}

sub diskRemapName {
  my $diskName=shift;

  foreach my $key (keys %diskRemap) {
    if ($diskName=~/$key/) {
      my $temp=$diskName;
      $diskName=~s/$key/$diskRemap{$key}/;
      $remapped{$temp}=$diskName;    # save, just in case we want to use some day
      #print "$temp RENAMED via REMAP: $key TO $diskName\n";
    }
  }
  return($diskName);
}

# Why is initFormat() so damn big?
#
# Since logs can be analyzed on a system on which they were not generated
# and to avoid having to read the actual data to determine things like how
# many cpus or disks there are, this info is written into the log file
# header.  initFormat() then reads this out of the header and initializes the
# corresponding variables.
#
# Counters are always incrementing (until they wrap) and therefore to get the
# value for the current interval one needs to subtract the sample from
# the previous interval.  Therefore, there are 3 different types of
# variables to deal with:
# - current sample: some 'root', ends in 'Now'
# - last sample:    some 'root', ends in 'Last'
# - true value:     'root' only, i.e. rootNow-rootLast
#
# To make all this work the very first time through, all 'Last' variables
# need to be initialized to 0 both to suppress -w initialization warnings AND
# because it's good coding practice.  Furthermore, life is a lot cleaner just
# to initialize everything whether we've selected the corresponding subsystem
# or not.  Furthermore, since it is possible to select a subsystem in plot
# mode for which we never gathered any data, we need to initialize all the
# printable values to 0s as well.  That's why there is so much crap in
# initFormat().
sub initFormat
{
  my $playfile=shift;
  my ($day, $mon, $year, $i, $recsys, $host);
  my ($version, $datestamp, $timestamp, $interval);

  $temp=(defined($playfile)) ? $playfile : '';
  print "initFormat($temp)\n"    if $debug & 1;

  # Constants local to formatting
  $OneKB=1024;
  $OneMB=1024*1024;
  $OneGB=1024*1024*1024;
  $TenGB=$OneGB*10;

  # in normal mode we report "/sec", but with -on we report "/int", noting
  # this is also appended to plot format headers
  $rate=$options!~/n/ ? "/sec" : "/int";

  if (defined($playfile))
  {
    $header=getHeader($playfile);
    return undef    if $header eq '';

    # save the first two lines of the header for writing into the new header.
    # since the Daemon Options have been renamed in V1.5.3 we need to get a
    # little trickier to handle both.  Since they are so specific I'm leaving
    # them global.
    $header=~/(Collectl.*)/;
    $recHdr1=$1;
    $recHdr2=(($header=~/(Daemon Options: )(.*)/ || $header=~/(DaemonOpts: )(.*)/) && $2 ne '') ?
"$1$2" : ""; $header=~/Collectl:\s+V(\S+)/; $version=$1; $hiResFlag=$1 if $header=~/HiRes:\s+(\d+)/; # only after V1.5.3 $boottime=($header=~/Booted:\s+(\S+)/) ? $1 : 0; $Distro=''; if ($header=~/Distro:\s+(.+)/) # was optional before 'Platform' added { $Distro=$1; $ProductName=$1 if $Distro=~s/Platform: (.*)//; } # Prior to collect V3.2.1-4, use the header to determine the type of nfs data in the # file noting very old versions used SubOpts. $recNfsFilt=$1 if $header=~/NfsFilt: (\S*) \S/; $subOpts=($header=~/SubOpts:\s+(\S*)\s*Options/) ? $1 : ''; # pre V3.2.1-4 if ($version lt '3.2.1-4') { $nfsOpts=($header=~/NfsOpts: (\S*)\s*Interval/) ? $1 : $subOpts; $nfsOpts=~s/[BDMORcom]//g; # in case it came from SubOpts remove lustre stuff if ($version lt '3.2.1-3') { $recNfsFilt=($nfsOpts=~/C/) ? 'c' : 's'; $recNfsFilt.=($nfsOpts=~/([234])/) ? $1 : 3; } else { # very limited release $recNfsFilt=($nfsOpts=~/C/) ? 'c3,c4' : 's3,s4'; } } if ($header=~/TcpFilt:\s+(\S+)/) { # remember, even if an option is not recorded we still report on it my $recOpts=(defined($1)) ? $1 : $tcpFiltDefault; $tcpFilt=$recOpts if $tcpFilt eq ''; } # Users CAN overrider LustOpts so we need to do it this way, again accounting for # older versions of collectl storing them as part of SubOpts if ($lustOpts eq '') { $lustOpts=($header=~/LustOpts: (\S*)\s*Services/) ? $1 : $subOpts; $lustOpts=~s/[23C]//g; # remove nfs options } # we want to preserve original subsys from the header, but we # also want to override it if user did a -s. If user specified a # +/- we also need to deal with as in collectl.pl, but in this # case without the error checking since it already passed through. # NOTE - rare, but if not subsys, set to ' ' also noting '' won't work # in regx in collectl after call to this routine $header=~/SubSys:\s+(\S*) /; $recSubsys=$subsys=($1!~/Options/) ? 
$1 : ' '; $recHdr1.=" Subsys: $subsys"; $recSubsys=$subsys='Y' if $topSlabFlag && $userSubsys eq ''; $recSubsys=$subsys='Z' if $topProcFlag && $userSubsys eq ''; # reset subsys based on what was recorded and -s $subsys=mergeSubsys($recSubsys); $subsys.='Y' if $subsys!~/Y/ && $topSlabFlag; # if --top need to include Y or Z if not in -s $subsys.='Z' if $subsys!~/Z/ && $topProcFlag; # I'm not sure the Mds/Ost/Clt names still need to be initialized # but it can't hurt. Clearly the 'lustre' variables do. $MdsNames=$OstNames=$lustreClts=''; $lustreMdss=$lustreOsts=$lustreClts=''; # This can only happen with pre 3.0.0 version of collectl if ($subsys=~/LL/) { $subsys=~s//L/; $lustOpts.='O'; } # We ONLY override the settings for the raw file, never any others. # Even though currently only 'rawp' files, we're doing pattern match below # with [p] to make easier to add others if we ever need to. $playfile=~/(.*-\d{8})-\d{6}\.raw([p]*)/; if (defined($playbackSettings{$1}) && $2 eq '') { # NOTE - when -L not specified for lustre, $lustreSvcs will end up being # set to the combined values of all files for this prefix ($subsys, $lustreSvcs, $lustreMdss, $lustreOsts, $lustreClts)= split(/\|/, $playbackSettings{$1}); print "OVERRIDES - Subsys: $subsys LustreSvc: $lustreSvcs ". "MDSs: $lustreMdss Osts: $lustreOsts Clts: $lustreClts\n" if $debug & 2048; } print "Playfile: $playfile Subsys: $subsys\n" if $debug & 1; setFlags($subsys); # In case not in current file header but defined within set for prefix/date $CltFlag=$MdsFlag=$OstFlag=$NumMds=$NumOst=$OstWidth=$FSWidth=0; $MdsNames=$lustreMdss if $lustreMdss ne ''; $OstNames=$lustreOsts if $lustreOsts ne ''; # Maybe some day we can get rid of pre 1.5.0 support? 
$numBrwBuckets=0; if ($header=~/Lustre/ && $version ge '1.5.0') { # Remember, we could have cfs without sfs so need 2 separate pattern tests $cfsVersion=$sfsVersion=''; if ($version ge '2.1') { $header=~/CfsVersion:\s+(\S+)/; $cfsVersion=$1; $header=~/SfsVersion:\s+(\S+)/; $sfsVersion=$1; } # In case not already defined (for single or consistent files, these are # not specified as overrides), get them from the file header. Note that # when no osts, this will grab the next line it I include \s* after # OstNames:, so for now I'm doing it this way and chopping leading space. $MdsHdrNames=$OstHdrNames=''; if ($header=~/MdsNames:\s+(.*)\s*NumOst:\s+\d+\s+OstNames:(.*)$/m) { $MdsHdrNames=$1; $OstHdrNames=$2; $OstHdrNames=~s/\s+//; $MdsNames=($lustreMdss ne '') ? $lustreMdss : $MdsHdrNames; $OstNames=($lustreOsts ne '') ? $lustreOsts : $OstHdrNames; } if ($MdsNames ne '') { @MdsMap=remapLustreNames($MdsHdrNames, $MdsNames, 0) if $MdsHdrNames ne ''; foreach $name (split(/ /, $MdsNames)) { $NumMds++; $MdsFlag=1; } } if ($OstNames ne '') { # This build list for interpretting input from 'raw' file if there is any @OstMap=remapLustreNames($OstHdrNames, $OstNames, 0) if $OstHdrNames ne ''; # This builds data needed for display foreach $name (split(/ /, $OstNames)) { $lustreOstName[$NumOst]=$name; $lustreOsts[$NumOst++]=$name; $OstWidth=length($name) if length($name)>$OstWidth; $OstFlag=1; } } if ($header=~/CltInfo:\s+(.*)$/m) { $CltHdrNames=$1; $lustreCltInfo=($lustreCltInfo ne '') ? 
$lustreCltInfo : $CltHdrNames; } undef %fsNames; $CltFlag=$NumLustreFS=$NumLustreCltOsts=0; $lustreCltInfo=$lustreClts if $lustreClts ne ''; if ($lustreCltInfo ne "") { $CltFlag=1; foreach $name (split(/ /, $lustreCltInfo)) { ($fsName, $ostName)=split(/:/, $name); $lustreCltFS[$NumLustreFS++]=$fsName if !defined($fsNames{$fsName}); $fsNames{$fsName}=1; $FSWidth=length($fsName) if length($fsName)>$FSWidth; # if osts defined, we just overwrite anything with did for the non-ost if ($ostName ne '') { $lustreCltOsts[$NumLustreCltOsts]=$ostName; $lustreCltOstFS[$NumLustreCltOsts]=$fsName; $OstWidth=length($ostName) if length($ostName)>$OstWidth; $NumLustreCltOsts++; } } @CltFSMap= remapLustreNames($CltHdrNames, $lustreCltInfo, 1) if defined($CltHdrNames); @CltOstMap=remapLustreNames($CltHdrNames, $lustreCltInfo, 2) if defined($CltHdrNames); } print "CLT: $CltFlag OST: $OstFlag MDS: $MdsFlag\n" if $debug & 1; # if disk I/O stats specified in header, init appropriate variables if ($header=~/LustreDisks.*Names:\s+(.*)/) { @lusDiskDirs=split(/\s+/, $1); $NumLusDisks=scalar(@lusDiskDirs); $LusDiskNames=$1; @LusDiskNames=split(/\s+/, $LusDiskNames); } } else # PRE 1.5.0 lustre stuff goes here... 
{ if ($header=~/NumOsts:\s+(\d+)\s+NumMds:\s+(\d+)/) { $NumOst=$1; $NumMds=$2; $OstNames=$MdsNames=''; for ($i=0; $i<$NumOst; $i++) { $OstMap[$i]=$i; $OstNames.="Ost$i "; $lustreOsts[$i]="Ost$i"; $OstWidth=length("Ost$i") if length("ost$i")>$OstWidth; $OstFlag=1; } $OstNames=~s/ $//; for ($i=0; $i<$NumMds; $i++) { $MdsMap[$i]=$i; $MdsNames.="Mds$i "; $MdsFlag=1; } $MdsNames=~s/ $//; } $NumLustreFS=$NumLustreCltOsts=0; if ($header=~/FS:\s+(.*)\s+Luns:\s+(.*)\s+LunNames:\s+(.*)$/m) { $CltFlag=1; $tempFS=$1; $tempLuns=$2; $tempFSNames=$3; foreach $fsName (split(/ /, $tempFS)) { $CltFSMap[$NumLustreFS]=$NumLustreFS; $lustreCltFS[$NumLustreFS]=$fsName; $FSWidth=length($fsName) if length($fsName)>$FSWidth; $NumLustreFS++; } # If defined, user did --lustopts O and need to reset FS info # Also note that since these numbers appear in raw data, we can't use a # simple index but rather need lun number if ($tempLuns ne '') { # The lun numbers will be mapped into OSTs foreach $lunNum (split(/ /, $tempLuns)) { $CltFSMap[$lunNum]=$NumLustreCltOsts; $CltOstMap[$lunNum]=$NumLustreCltOsts; $lustreCltOsts[$NumLustreCltOsts]=$lunNum; $OstWidth=length($lunNum) if length($lunNum)>$FSWidth; $NumLustreCltOsts++; } $NumLustreFS=0; foreach $fsName (split(/ /, $tempFSNames)) { $lustreCltOstFS[$NumLustreFS]=$fsName; $FSWidth=length($fsName) if length($fsName)>$FSWidth; $NumLustreFS++; } } } } $header=~/Host:\s+(\S+)/; $Host=$1; $HostLC=lc($Host); # we need this for timezone conversions... 
    $header=~/Date:\s+(\d+)-(\d+)/;
    $datestamp=$1;
    $timestamp=$2;

    $timesecs=$timezone='';    # for logs generated with older versions
    if ($header=~/Secs:\s+(\d+)\s+TZ:\s+(.*)/)
    {
      $timesecs=$1;
      $timezone=$2;
    }

    # Allows us to move its location in the header
    $header=~/Interval: (\S+)/;
    $interval=$1;

    # save HZ and architecture for later use
    $header=~/HZ:\s+(\d+)\s+Arch:\s+(\S+)/;
    $HZ=$1;
    $SrcArch=$2;

    # In case pagesize is not defined in the header (for earlier versions
    # of collectl) pick a default based on architecture.
    $PageSize=($SrcArch=~/ia64/) ? 16384 : 4096;
    $PageSize=$1    if $header=~/PageSize:\s+(\d+)/;

    # Even though we don't do anything with CPU, Speed, Cores and Siblings we need
    # to put them in the new header.
    $header=~/Cpu:\s+(.*) Speed/;
    $CpuVendor=$1;
    $header=~/Speed\(MHz\): (\S+)/;
    $CpuMHz=$1;
    $header=~/Cores: (\d+)/;
    $CpuCores=$1;
    $header=~/Siblings: (\d+)/;
    $CpuSiblings=$1;
    $header=~/Nodes: (\d+)/;
    $CpuNodes=$1;

    # when playing back from a file we need to make sure the KERNEL is that of
    # the file and not that of the system the data is being played back on.
    $header=~/Kernel:\s+(\S+)/;
    $Kernel=$1;
    error("collectl no longer supports 2.4 kernels")    if $Kernel=~/^2\.4/;

    $header=~/NumCPUs:\s+(\d+)/;
    $NumCpus=$1;
    $Hyper=($header=~/HYPER/) ? "[HYPER]" : "";

    $header=~/NumBud:\s+(\d+)/;
    $NumBud=$1;

    $flags=($header=~/Flags:\s+(\S+)/) ? $1 : '';
    $tworawFlag=     ($flags=~/[g2]/) ? 1 : 0;
    $processIOFlag=  ($flags=~/i/)    ? 1 : 0;
    $slubinfoFlag=   ($flags=~/s/)    ? 1 : 0;
    $processCtxFlag= ($flags=~/x/)    ? 1 : 0;
    $cpuDisabledFlag=($flags=~/D/)    ? 1 : 0;

    # IB flags are a little different because of the various combinations, noting
    # we can have 2 different types of extended counters, from /sys and from perfquery
    if ($flags=~/[PX]/)
    {
      if ($flags=~/PX/)
      {
        $PQopt='-x';
      }
      elsif ($flags=~/X/)
      {
        $PQopt='sys';
      }
      else
      {
        $PQopt='-r';
      }
    }

    # If we're not processing CPU data, this message will never be set so
    # just initialize it for all cases.
$cpuDisabledMsg=''; $header=~/Memory:\s+(\d+)/; $Memory=$1; # Since disks are discovered dynamically all we need to init a few pointers. $dskIndex=$dskSeenLast=0; # networks are dynamic too but also messier because while we can't get speeds in playback mode # we do need those that have been recorded so our 'bogus' checks. $header=~/NumNets:\s+(\d+)\s+NetNames:\s+(.*)/; $numNets=$1; $netNames=$2; $NetWidth=$netOptsW; undef(@netOrder); my $netIndex=0; my $interval1=(split(/:/, $interval))[0]; foreach my $netName (split(/ /, $netNames)) { my $speed=($netName=~/:(\d+)/) ? $1 : $DefNetSpeed; $netName=~s/(\S+):.*/$1/; $NetMaxTraffic[$netIndex]=2*$interval1*$speed*125; $NetWidth=length($netName) if $NetWidth2; } # Now get OFED/Perqquery versions which for earlier versions were not in header # Not clear if we really need these in playback mode but since we may some day... $IBVersion=($header=~/IBVersion:\s+(\S+)/) ? $1 : ''; $PQVersion=($header=~/PQVersion:\s+(\S+)/) ? $1 : ''; } # Scsi info is optional $ScsiInfo=($header=~/SCSI:\s+(.*)/) ? $1 : ''; # Pass header to import routines BUT only if they have a callback defined for (my $i=0; $i<$impNumMods; $i++) { &{$impGetHeader[$i]}(\$header) if defined(&{$impGetHeader[$i]});} } # Initialize global arrays with sizes of buckets for lustre brw stats and # not to worry if lustre not there. @brwBuckets=(1,2,4,8,16,32,64,128,256); push @brwBuckets, (512,1024) if defined($sfsVersion) && $sfsVersion ge '2.2'; $numBrwBuckets=scalar(@brwBuckets); # same thing for lustre disk state though these are a little tricker. if ($LusDiskNames=~/sd/) { @diskBuckets=(.5,1,2,4,8,16,32,64,128,256,512,1024,2048,4096,8192,16384); } else { @diskBuckets=(.5,1,2,4,8,16,32,63,64,65,80,96,112,124,128,129,144,252,255,256,257,512,1024,2048); } $LusMaxIndex=scalar(@diskBuckets); # this inits lustre variables in both playback and collection modes. 
initLustre('o', 0, $NumOst); initLustre('m', 0, $NumMds); initLustre('c', 0, $NumLustreFS); initLustre('c2', 0, $NumLustreCltOsts) if $NumLustreCltOsts ne '-'; # I n i t ' C o r e ' V a r i a b l e s # when we're generating plot data and we're either not collecting # everything or we're in playback mode and it's not all in raw file, make # sure all the core variables that get printed have been initialized to 0s. # for disks, nets and pars the core variables are the totals and so get # initialized in the initInterval() routine every cycle $i=$NumCpus; $userP[$i]=$niceP[$i]=$sysP[$i]=$idleP[$i]=$totlP[$i]=0; $irqP[$i]=$softP[$i]=$stealP[$i]=$waitP[$i]=0; $guestP[$i]=$guestNP[$i]=0; for (my $i=0; $i<$CpuNodes; $i++) { foreach my $numa ('used', 'free', 'slab', 'map', 'anon', 'anonH', 'lock', 'inact',) { $numaMem[$i]->{$numa}=$numaMem[$i]->{$numa.'C'}=0; } foreach my $hits ('for', 'miss', 'hits') { $numaStat[$i]->{$hits}=0; } } $dentryNum=$dentryUnused=$filesAlloc=$filesMax=$inodeUsed=$inodeMax=0; $loadAvg1=$loadAvg5=$loadAvg15=$loadRun=$loadQue=$ctxt=$intrpt=$proc=0; $memDirty=$clean=$target=$laundry=$memAct=$memInact=0; $procsRun=$procsBlock=0; $pagein=$pageout=$swapin=$swapout=$swapTotal=$swapUsed=$swapFree=0; $pagefault=$pagemajfault=0; $memTot=$memUsed=$memFree=$memShared=$memBuf=$memCached=$memSlab=0; $memAnon=$memAnonH=$memMap=$memCommit=$memLocked=0; $memHugeTot=$memHugeFree=$memHugeRsvd=$memSUnreclaim=0; $sockUsed=$sockTcp=$sockOrphan=$sockTw=$sockAlloc=0; $sockMem=$sockUdp=$sockRaw=$sockFrag=$sockFragM=0; # extended memory stats, just in case some are missing $pageFree=$pageActivate=0; $pageAllocDma=$pageAllocDma32=$pageAllocNormal=$pageAllocMove=0; $pageRefillDma=$pageRefillDma32=$pageRefillNormal=$pageRefillMove=0; $pageStealDma=$pageStealDma32=$pageStealNormal=$pageStealMove=0; $pageKSwapDma=$pageKSwapDma32=$pageKSwapNormal=$pageKSwapMove=0; $pageDirectDma=$pageDirectDma32=$pageDirectNormal=$pageDirectMove=0; # Lustre MDS stuff - in case no data 
$lustreMdsReintCreate=$lustreMdsReintLink=$lustreMdsReintSetattr=0; $lustreMdsReintRename=$lustreMdsReintUnlink=$lustreMdsReint=0; $lustreMdsGetattr=$lustreMdsGetattrLock=$lustreMdsStatfs=0; $lustreMdsGetxattr=$lustreMdsSetxattr=$lustreMdsSync=0; $lustreMdsConnect=$lustreMdsDisconnect=0; # Common nfs stats $rpcCCalls=$rpcSCalls=$rpcBadAuth=$rpcBadClnt=$rpcRetrans=$rpcCredRef=0; $nfsPkts=$nfsUdp=$nfsTcp=$nfsTcpConn=0; # V2 $nfs2CNull=$nfs2CGetattr=$nfs2CSetattr=$nfs2CRoot=$nfs2CLookup=$nfs2CReadlink= $nfs2CRead=$nfs2CWrcache=$nfs2CWrite=$nfs2CCreate=$nfs2CRemove=$nfs2CRename= $nfs2CLink=$nfs2CSymlink=$nfs2CMkdir=$nfs2CRmdir=$nfs2CReaddir=$nfs2CFsstat=$nfs2CMeta=0; $nfs2SNull=$nfs2SGetattr=$nfs2SSetattr=$nfs2SRoot=$nfs2SLookup=$nfs2SReadlink= $nfs2SRead=$nfs2SWrcache=$nfs2SWrite=$nfs2SCreate=$nfs2SRemove=$nfs2SRename= $nfs2SLink=$nfs2SSymlink=$nfs2SMkdir=$nfs2SRmdir=$nfs2SReaddir=$nfs2SFsstat=$nfs2SMeta=0; # V3 $nfs3CNull=$nfs3CGetattr=$nfs3CSetattr=$nfs3CLookup=$nfs3CAccess=$nfs3CReadlink=0; $nfs3CRead=$nfs3CWrite=$nfs3CCreate=$nfs3CMkdir=$nfs3CSymlink=$nfs3CMknod=$nfs3CRemove=0; $nfs3CRmdir=$nfs3CRename=$nfs3CLink=$nfs3CReaddir=$nfs3CReaddirplus=$nfs3CFsstat=0; $nfs3CFsinfo=$nfs3CPathconf=$nfs3CCommit=$nfs3CMeta=0; $nfs3SNull=$nfs3SGetattr=$nfs3SSetattr=$nfs3SLookup=$nfs3SAccess=$nfs3SReadlink=0; $nfs3SRead=$nfs3SWrite=$nfs3SCreate=$nfs3SMkdir=$nfs3SSymlink=$nfs3SMknod=$nfs3SRemove=0; $nfs3SRmdir=$nfs3SRename=$nfs3SLink=$nfs3SReaddir=$nfs3SReaddirplus=$nfs3SFsstat=0; $nfs3SFsinfo=$nfs3SPathconf=$nfs3SCommit=$nfs3SMeta=0; # V4 $nfs4CNull=$nfs4CRead=$nfs4CWrite=$nfs4CCommit=$nfs4CSetattr=$nfs4CFsinfo=0; $nfs4CAccess=$nfs4CGetattr=$nfs4CLookup=$nfs4CRemove=$nfs4CRename=$nfs4CLink=0; $nfs4CSymlink=$nfs4CCreate=$nfs4CPathconf=$nfs4CReadlink=$nfs4CReaddir=$nfs4CMeta=0; $nfs4SAccess=$nfs4SCommit=$nfs4SCreate=$nfs4SGetattr=$nfs4SLink=$nfs4SLookup=0; $nfs4SRead=$nfs4SReaddir=$nfs4SReadlink=$nfs4SRemove=$nfs4SRename=$nfs4SSetattr=0; $nfs4SWrite=$nfs4SMeta=0; # tcp - this is 
sooo ugly. not all variable are part of all kernels and these are at least # some of the ones I've found to be missing in some. This list may need to be augmented over # time. The alternative it to have conditional tests on all the printing and there is just # too much of that. Also, if you don't collect ANY tcp stats but try playing back in verbose # they need to be initialized too $tcpData{Ip}->{InReceives}= $tcpData{Ip}->{InDelivers}=0; $tcpData{Ip}->{ForwDatagrams}= $tcpData{Ip}->{InDiscards}=0; $tcpData{Ip}->{InAddrErrors}= $tcpData{Ip}->{OutRequests}=0; $tcpData{Ip}->{OutDiscards}= $tcpData{Ip}->{ReasmReqds}=0; $tcpData{Ip}->{ReasmOKs}= $tcpData{Ip}->{FragOKs}=$tcpData{Ip}->{FragCreates}=0; $tcpData{Tcp}->{ActiveOpens}= $tcpData{Tcp}->{PassiveOpens}=0; $tcpData{Tcp}->{AttemptFails}= $tcpData{Tcp}->{EstabResets}=0; $tcpData{Tcp}->{CurrEstab}= $tcpData{Tcp}->{InSegs}=0; $tcpData{Tcp}->{OutSegs}= $tcpData{Tcp}->{RetransSegs}=0; $tcpData{Tcp}->{InErrs}= $tcpData{Tcp}->{OutRsts}=0; $tcpData{Udp}->{InDatagrams}= $tcpData{Udp}->{OutDatagrams}=0; $tcpData{Udp}->{NoPorts}= $tcpData{Udp}->{InErrors}=0; $tcpData{Icmp}->{InMsgs}= $tcpData{Icmp}->{InErrors}=0; $tcpData{Icmp}->{InDestUnreachs}=$tcpData{Icmp}->{InEchos}=0; $tcpData{Icmp}->{InEchoReps}= $tcpData{Icmp}->{OutMsgs}=0; $tcpData{Icmp}->{OutErrors}= $tcpData{Icmp}->{OutDestUnreachs}=0; $tcpData{Icmp}->{OutEchos}= $tcpData{Icmp}->{OutEchoReps}=0; $tcpData{IpExt}->{InMcastPkts}=$tcpData{IpExt}->{InBcastPkts}=0; $tcpData{IpExt}->{InOctets}= $tcpData{IpExt}->{InMcastOctets}=0; $tcpData{IpExt}->{InBcastOctets}=$tcpData{IpExt}->{OutMcastPkts}=0; $tcpData{IpExt}->{OutOctets}= $tcpData{IpExt}->{OutMcastOctets}=0; $tcpData{TcpExt}->{TW}= $tcpData{TcpExt}->{PAWSEstab}=0; $tcpData{TcpExt}->{DelayedACKs}=$tcpData{TcpExt}->{DelayedACKLost}=0; $tcpData{TcpExt}->{TCPPrequeued}=$tcpData{TcpExt}->{TCPDirectCopyFromPrequeue}=0; $tcpData{TcpExt}->{TCPHPHits}= $tcpData{TcpExt}->{TCPPureAcks}=0; $tcpData{TcpExt}->{TCPHPAcks}= 
$tcpData{TcpExt}->{TCPDSACKOldSent}=0; $tcpData{TcpExt}->{TCPAbortOnData}=$tcpData{TcpExt}->{TCPAbortOnClose}=0; $tcpData{TcpExt}->{TCPSackShiftFallback}=0; $tcpData{TcpExt}->{TCPLoss}=$tcpData{TcpExt}->{TCPFastRetrans}=0; $ipErrors=$icmpErrors=$tcpErrors=$udpErrors=$ipExErrors=$tcpExErrors=0; # this is here strictly for compatibility with older raw files $NumTcpFields=65; for ($i=0; $i<$NumTcpFields; $i++) { $tcpValue[$i]=$tcpLast[$i]=0; } # get ready to process first interval noting '$lastSecs' gets initialized # when the data file is read in playback mode $lastSecs[0]=$lastSecs[1]=0 if $playback eq ''; $intFirstSeen=0; initInterval(); # I n i t ' E x t e n d e d ' V a r i a b l e s # The current thinking is if someone wants to plot extended variables and # they haven't been collected (remember the rule that when you report for # plotting, you always produce what's in -s) we better intialize the results # variables to all zeros. for ($i=0; $i<$NumCpus; $i++) { $userP[$i]=$niceP[$i]=$sysP[$i]=$idleP[$i]=$totlP[$i]=0; $irqP[$i]=$softP[$i]=$stealP[$i]=$waitP[$i]=0; $guestP[$i]=$guestNP[$i]=0; } # these all need to be initialized in case we use /proc/stats since not all variables # supplied by that for ($i=0; $i<$dskIndexNext; $i++) { $dskOps[$i]=$dskTicks[$i]=0; $dskRead[$i]=$dskReadKB[$i]=$dskReadMrg[$i]=0; $dskWrite[$i]=$dskWriteKB[$i]=$dskWriteMrg[$i]=0; $dskRqst[$i]=$dskQueLen[$i]=$dskWait[$i]=$dskSvcTime[$i]=$dskUtil[$i]=0; $dskWaitR[$i]=$dskWaitW[$i]=0; } for ($i=0; $i<$netIndexNext; $i++) { $netName[$i]=""; $netRxPkt[$i]=$netTxPkt[$i]= $netRxKB[$i]= $netTxKB[$i]= $netRxErr[$i]= $netRxDrp[$i]=$netRxFifo[$i]=$netRxFra[$i]= $netRxCmp[$i]= $netRxMlt[$i]= $netTxErr[$i]=$netTxDrp[$i]= $netTxFifo[$i]=$netTxColl[$i]=$netTxCar[$i]= $netTxCmp[$i]=$netRxErrs[$i]=$netTxErrs[$i]=0; } # Don't forget infiniband for ($i=0; $i<$NumHCAs; $i++) { $ibTxKB[$i]=$ibTx[$i]=$ibRxKB[$i]=$ibRx[$i]=$ibErrorsTot[$i]=0; } # if we ever want to map scsi devices to their host/channel/etc, 
this does it # for partitions undef @scsi; $scsiIndex=0; foreach $device (split(/\s+/, $ScsiInfo)) { $scsi[$scsiIndex++]=(split(/:/, $device, 2))[1] if $device=~/DA/; } # C o n s t a n t H e a d e r S t u f f # I suppose for performance it would be good to build all headers once, # but for now at least do a few pieces. # get mini date/time header string according to $options but also note these # don't apply to --top mode $miniDateTime=""; # so we don't get 'undef' down below $miniDateTime="Time " if $miniTimeFlag; $miniDateTime="Date Time " if $miniDateFlag && $options=~/d/; $miniDateTime="Date Time " if $miniDateFlag && $options=~/D/; $miniDateTime.=" " if $options=~/m/; $miniFiller=' ' x length($miniDateTime); # sometimes we want to shift things 1 space to the left. $miniFiller1=substr($miniFiller, 0, length($miniFiller)-1); # If we need two lines, we need to align $len=length($miniDateTime); $miniBlanks=sprintf("%${len}s", ''); $interval1Counter=0; # S l a b S t u f f $slabIndexNext=0; $slabDataFlag=0; undef %slabIndex; # P r o c e s s S t u f f $procIndexNext=0; # I n t e r v a l 2 S t u f f $interval2Counter=0; # I n t e r v a l 3 S t u f f $interval3Counter=0; $ipmiFile->{pre}=[]; # in case no --envrules specified $ipmiFile->{post}=[]; $ipmiFile->{ignore}=[]; loadEnvRules() if $subsys=~/E/ || $envTestFile ne ''; # Wasn't sure if this should have been buried in 'loadEnvRules()' # since they're not actualy 'rules' if ($envRemap ne '') { @envRemaps=split(/,/,$envRemap); for (my $i=0; $i<@envRemaps; $i++) { $envRemaps[$i]=~/\/(.*?)\/(.*?)\//; $ipmiRemap->[$i]->[1]=$1; $ipmiRemap->[$i]->[2]=$2; } } # A r c h i t e c t u r e S t u f f $word32=2**32; $maxword= ($SrcArch=~/ia64|x86_64/) ? 
2**64 : $word32; return(($version, $datestamp, $timestamp, $timesecs, $timezone, $interval, $recSubsys, $recNfsFilt, $recHeader)) if defined($playfile); } # I n i t i a l i z e ' L a s t ' V a r i a b l e s sub initLast { # 0=raw 1=rawp my $rawType=shift; # just init slab variables because process ones are all dynamic if (!defined($rawType) || $rawType) { for ($i=0; $i<$NumSlabs; $i++) { $slabObjActLast[$i]=$slabObjAllLast[$i]=0; $slabSlabActLast[$i]=$slabSlabAllLast[$i]=0; } return if defined($rawType); } # Since dynamically defined need to start clean. undef(%intrptType); $ctxtLast=$intrptLast=$procLast=0; $rpcCCallsLast=$rpcSCallsLast=$rpcBadAuthLast=$rpcBadClntLast=0; $rpcRetransLast=$rpcCredRefLast=0; $nfsPktsLast=$nfsUdpLast=$nfsTcpLast=$nfsTcpConnLast=0; $pageinLast=$pageoutLast=$swapinLast=$swapoutLast=0; $pagefaultLast=$pagemajfaultLast=0; $opsLast=$readLast=$readKBLast=$writeLast=$writeKBLast=0; $memFreeLast=$memUsedLast=$memBufLast=$memCachedLast=0; $memInactLast=$memSlabLast=$memMapLast=0; $memAnonLast=$memAnonHLast=$memCommitLast=$memLockedLast=0; $swapFreeLast=$swapUsedLast=0; for ($i=0; $i<18; $i++) { $nfs2CValuesLast[$i]=0; $nfs2SValuesLast[$i]=0; } for ($i=0; $i<22; $i++) { $nfs3CValuesLast[$i]=0; $nfs3SValuesLast[$i]=0; } for ($i=0; $i<59; $i++) { $nfs4CValuesLast[$i]=0; $nfs4SValuesLast[$i]=0; } for ($i=0; $i<=$NumCpus; $i++) { $userLast[$i]=$niceLast[$i]=$sysLast[$i]=$idleLast[$i]=0; $waitLast[$i]=$irqLast[$i]=$softLast[$i]=$stealLast[$i]=0; $guestLast[$i]=$guestNLast[$i]=0; } for (my $i=0; $i<$CpuNodes; $i++) { $numaStat[$i]->{hitsLast}=$numaStat[$i]->{missLast}=$numaStat[$i]->{forLast}=0; $numaMem[$i]->{freeLast}= $numaMem[$i]->{usedLast}=$numaMem[$i]->{actLast}=0; $numaMem[$i]->{inactLast}=$numaMem[$i]->{mapLast}= $numaMem[$i]->{anonLast}=0; $numaMem[$i]->{anonHLast}=$numaMem[$i]->{lockLast}= $numaMem[$i]->{slabLast}=0; } # ...and disks for ($i=0; $i<$dskIndexNext; $i++) { $dskOpsLast[$i]=0; 
    $dskReadLast[$i]=$dskReadKBLast[$i]=$dskReadMrgLast[$i]=$dskReadTicksLast[$i]=0;
    $dskWriteLast[$i]=$dskWriteKBLast[$i]=$dskWriteMrgLast[$i]=$dskWriteTicksLast[$i]=0;
    $dskInProgLast[$i]=$dskTicksLast[$i]=$dskWeightedLast[$i]=0;
    for ($j=0; $j<11; $j++)
    {
      $dskFieldsLast[$i][$j]=0;
    }
  }

  for ($i=0; $i<$netIndexNext; $i++)
  {
    $netRxKBLast[$i]=$netRxPktLast[$i]=$netTxKBLast[$i]=$netTxPktLast[$i]=0;
    $netRxErrLast[$i]=$netRxDrpLast[$i]=$netRxFifoLast[$i]=$netRxFraLast[$i]=0;
    $netRxCmpLast[$i]=$netRxMltLast[$i]=$netTxErrLast[$i]=$netTxDrpLast[$i]=0;
    $netTxFifoLast[$i]=$netTxCollLast[$i]=$netTxCarLast[$i]=$netTxCmpLast[$i]=0;
  }

  # IB - we only need 16 for 32bit counters, but 20 for OPA!
  for ($i=0; $i<$NumHCAs; $i++)
  {
    # in almost all cases we only have 64 bit counters
    $ibRxLast[$i][1]=$ibRxLast[$i][2]=0;
    $ibTxLast[$i][1]=$ibTxLast[$i][2]=0;
    $ibRxKBLast[$i][1]=$ibRxKBLast[$i][2]=0;
    $ibTxKBLast[$i][1]=$ibTxKBLast[$i][2]=0;

    # maybe some day we can just get rid of these...
    for ($j=0; $j<20; $j++)
    {
      # There are 2 ports on an hca, numbered 1 and 2
      $ibFieldsLast[$i][1][$j]=$ibFieldsLast[$i][2][$j]=0;
    }
  }
}

# When a subsys is selected for which there is no possibility of collecting
# data, we must disable it in subsys as well as in any --export modules which
# explicitly select that subsys too
sub disableSubsys
{
  my $type=  shift;
  my $why=   shift;
  my $unique=shift;

  # If user specified --all, they shouldn't see these messages
  logmsg("W", "-s$type disabled because $why")    if !$allFlag && !$AllFlag;
  $subsys=~s/$type//ig    if !defined($unique) || !$unique;   # disable using /i if unique not set
  $subsys=~s/$type//g     if defined($unique) && $unique;     # otherwise just disable the one specified

  # Not really sure if need to do this but it certainly can't hurt.
  $EFlag=0          if $type=~/E/;
  $bFlag=$BFlag=0   if $type=~/b/;
  $lFlag=$LFlag=0   if $type=~/l/;
  $xFlag=$XFlag=0   if $type=~/x/;

  # Now make sure any occurrences in s= of an export are disabled too.
for (my $i=0; $i<@expOpts; $i++) { if ($expOpts[$i]=~/s=.*$type/i) { logmsg('W', "found 's=$type' in lexpr so disabled there too") if !$allFlag || $AllFlag; $expOpts[$i]=~s/$type//ig; } } } # when playing back lustre data, the indexes on the detail stats may be shifted # relative to collectl logs in which other OSTs existed. In other words in one # file one may have "ostY ostZ", in a second "ostX ostZ" and in a third "ostY". # We need to generate index mappings such that ost1 will always map to 0, ost2 # to 1 and so on. sub remapLustreNames { my $hdrNames=shift; my $allNames=shift; my $cltType= shift; my ($i, $j, $uuid, @hdrTemp, @allTemp, @maps); # the names as contained in the header are always unique, including ':ost' for # --lustopt O. However, for --lustopts O reporting, we only want the ost part # and hence the special treatment. Type=1 used to be meaningful before I realized # stripping off the ':ost' lead to non-unique names and incorrect remapping. if ($cltType==2) { $hdrNames=~s/\S+:(\S+)/$1/g; $allNames=~s/\S+:(\S+)/$1/g; } print "remapLustrenames() -- Type: $cltType HDR: $hdrNames ALL: $allNames\nREMAPPED: " if $debug & 8; if ($hdrNames ne '') { @hdrTemp=split(/ /, $hdrNames); @allTemp=split(/ /, $allNames); for ($i=0; $i=$NumSlabs) { $NumSlabs++; $slabObjActLast[$i]=$slabObjAllLast[$i]=0; $slabSlabActLast[$i]=$slabSlabAllLast[$i]=0; logmsg("W", "New slab created after logging started") } # since these are NOT counters, the values are actually totals from which we # can derive changes from individual entries. if ($SlabVersion eq '1.1') { ($slabObjActTot[$i], $slabObjAllTot[$i], $slabObjSize[$i], $slabSlabActTot[$i], $slabSlabAllTot[$i], $slabPagesPerSlab[$i])=(split(/\s+/, $data))[1..6]; $slabObjPerSlab[$i]=($slabSlabAllTot[$i]) ? 
$slabObjAllTot[$i]/$slabSlabAllTot[$i] : 0; } elsif ($SlabVersion=~/^2/) { ($slabObjActTot[$i], $slabObjAllTot[$i], $slabObjSize[$i], $slabObjPerSlab[$i], $slabPagesPerSlab[$i], $slabSlabActTot[$i], $slabSlabAllTot[$i])=(split(/\s+/, $data))[1..5,13,14]; } # Total Sizes of objects and slabs $slabObjActTotB[$i]=$slabObjActTot[$i]*$slabObjSize[$i]; $slabObjAllTotB[$i]=$slabObjAllTot[$i]*$slabObjSize[$i]; $slabSlabActTotB[$i]=$slabSlabActTot[$i]*$slabPagesPerSlab[$i]*$PageSize; $slabSlabAllTotB[$i]=$slabSlabAllTot[$i]*$slabPagesPerSlab[$i]*$PageSize; $slabObjAct[$i]= $slabObjActTot[$i]- $slabObjActLast[$i]; $slabObjAll[$i]= $slabObjAllTot[$i]- $slabObjAllLast[$i]; $slabSlabAct[$i]=$slabSlabActTot[$i]-$slabSlabActLast[$i]; $slabSlabAll[$i]=$slabSlabAllTot[$i]-$slabSlabAllLast[$i]; $slabObjActLast[$i]= $slabObjActTot[$i]; $slabObjAllLast[$i]= $slabObjAllTot[$i]; $slabSlabActLast[$i]=$slabSlabActTot[$i]; $slabSlabAllLast[$i]=$slabSlabAllTot[$i]; # Changes in total allocation since last one, noting on first pass it's always 0 my $slabTotMemNow=$slabSlabAllTotB[$i]; my $slabTotMemLast=(defined($slabTotalMemLast{$name})) ? $slabTotalMemLast{$name} : $slabTotMemNow; $slabTotMemChg[$i]=$slabTotMemNow-$slabTotMemLast; $slabTotMemPct[$i]=($slabTotMemLast!=0) ? 100*$slabTotMemChg[$i]/$slabTotMemLast : 0; $slabTotalMemLast{$name}=$slabTotMemNow; # if --slabopt S, only count slabs whose objects or sizes have changed # since last interval. 
# note -- this is only if !S and the slabs themselves change if ($slabOpts!~/S/ || $slabSlabAct[$i]!=0 || $slabSlabAll[$i]!=0) { $slabObjActTotal+= $slabObjActTot[$i]; $slabObjAllTotal+= $slabObjAllTot[$i]; $slabObjActTotalB+= $slabObjActTot[$i]*$slabObjSize[$i]; $slabObjAllTotalB+= $slabObjAllTot[$i]*$slabObjSize[$i]; $slabSlabActTotal+= $slabSlabActTot[$i]; $slabSlabAllTotal+= $slabSlabAllTot[$i]; $slabSlabActTotalB+=$slabSlabActTot[$i]*$slabPagesPerSlab[$i]*$PageSize; $slabSlabAllTotalB+=$slabSlabAllTot[$i]*$slabPagesPerSlab[$i]*$PageSize; $slabNumAct++ if $slabSlabAllTot[$i]; $slabNumTot++; } } else { # Note as efficient as if..then..elsif..elsif... but a lot more readable # and more important, no appreciable difference in processing time my ($slabname, $datatype, $value)=split(/\s+/, $data); $slabdata{$slabname}->{objsize}=$value if $datatype=~/^object_/; # object_size $slabdata{$slabname}->{slabsize}=$value if $datatype=~/^slab_/; # slab_size $slabdata{$slabname}->{order}=$value if $datatype=~/^or/; # order $slabdata{$slabname}->{objper}=$value if $datatype=~/^objs/; # objs_per_slab $slabdata{$slabname}->{objects}=$value if $datatype=~/^objects/; # This is the second of the ('objects','slabs') tuple if ($datatype=~/^slabs/) { my $numSlabs=$slabdata{$slabname}->{slabs}=$value; $interval2Print=1; $slabdata{$slabname}->{avail}=$slabdata{$slabname}->{objper}*$numSlabs; $slabNumTot+= $numSlabs; $slabObjAvailTot+=$slabdata{$slabname}->{objper}*$numSlabs; $slabNumObjTot+= $slabdata{$slabname}->{objects}; $slabUsedTot+= $slabdata{$slabname}->{used}=$slabdata{$slabname}->{slabsize}*$slabdata{$slabname}->{objects}; $slabTotalTot+= $slabdata{$slabname}->{total}=$value*($PageSize<<$slabdata{$slabname}->{order}); # Changes in total allocation since last one, noting on first pass it's always 0 my $slabTotMemNow=$slabdata{$slabname}->{total}; my $slabTotMemLast=(defined($slabTotalMemLast{$slabname})) ? 
$slabTotalMemLast{$slabname} : $slabTotMemNow; $slabdata{$slabname}->{memchg}=$slabTotMemNow-$slabTotMemLast; $slabdata{$slabname}->{mempct}=($slabTotMemLast!=0) ? 100*$slabdata{$slabname}->{memchg}/$slabTotMemLast : 0; $slabTotalMemLast{$slabname}=$slabTotMemNow; } } } } elsif ($subsys=~/b/i && $type=~/^buddy/) { my @fields=split(/\s+/, $data); $buddyNode[$budIndex]=$fields[1]; $buddyZone[$budIndex]=$fields[3]; $buddyNode[$budIndex]=~s/,$//; for (my $i=0; $i=0; $i--) { # if this CPU disabled, just set its count to 0 and move on to next one $vals[$index--]=0 if !$cpuEnabled[$index]; $vals[$index]=$vals[$i]; $index--; } } # I n i t i a l i z e ' l a s t ' v a l u e s # Since I'm not sure if new entries can show up dynamically AND because we # have to find non-numeric entries so we can initialize them, let's just # always do our initialization dynamically instead of in initRecord(). $type=~s/:$//; my $typeSort=($type=~/^\d/) ? sprintf("%03d", $type) : $type; if (!defined($intrptType{$typeSort})) { $intrptType{$typeSort}=1; if ($type!~/ERR|MIS/) { # Pull devicename/time BUT note on earlier kernels for non-numeric types # these fields aren't always filled in my ($intType, $intDevices)=split(/\s+/, $vals[$NumCpus], 2); $intType='' if !defined($intType); $intDevices='' if !defined($intDevices); chomp $intDevices; $intDevices=~s/\s+//g; # remove whitespace $intName{$typeSort}=sprintf("%-15s %s", $intType, $intDevices); if ($type!~/^\d/) { $intName{$typeSort}="$intType$intDevices"; $intName{$typeSort}=~s/interrupts$//; } } if ($type=~/^\d/) { # We use array for numeric values and a hash for strings as the array # access is a little faster expecially as the number of entries grows # We're also reformatting the modifier so the devices line up... 
for (my $i=0; $i<$NumCpus; $i++) { $intrptLast[$type]->[$i]=0; } } else { for (my $i=0; $i<$NumCpus; $i++) { $intrptLast{$type}->[$i]=0; } } } # M a t h h a p p e n s h e r e for (my $i=0; $i<$NumCpus; $i++) { # If a CPU is disabled, just set it's count to zero. if ($subsys=~/c/i && !$cpuEnabled[$i]) { $intrpt[$type]->[$i]=0 if $type=~/^\d/; $intrpt{$type}->[$i]=0 if $type!~/^\d/; next; } if ($type=~/^\d/) { $intrpt[$type]->[$i]=$vals[$i]-$intrptLast[$type]->[$i]; $intrptLast[$type]->[$i]=$vals[$i]; $intrptTot[$i]+=$intrpt[$type]->[$i]; } # Not sure if other types that only hit cpu0 elsif ($i==0 || ($type ne 'ERR' && $type ne 'MIS')) { $intrpt{$type}->[$i]=$vals[$i]-$intrptLast{$type}->[$i]; $intrptLast{$type}->[$i]=$vals[$i]; $intrptTot[$i]+=$intrpt{$type}->[$i]; } } } elsif ($subsys=~/l/i && $type=~/OST_(\d+)/) { chomp $data; $index=$1; ($lustreType, $lustreOps, $lustreBytes)=(split(/\s+/, $data))[0,1,6]; $index=$OstMap[$index] if $playback ne ''; # handles remapping is OSTs change position #print "IDX: $index, $lustreType, $lustreOps, $lustreBytes\n"; $lustreBytes=0 if $lustreOps==0; if ($lustreType=~/read/) { $lustreReadOpsNow= $lustreOps; $lustreReadKBytesNow= $lustreBytes/$OneKB; $lustreReadOps[$index]= fix($lustreReadOpsNow-$lustreReadOpsLast[$index]); $lustreReadKBytes[$index]= fix($lustreReadKBytesNow-$lustreReadKBytesLast[$index]); $lustreReadOpsLast[$index]= $lustreReadOpsNow; $lustreReadKBytesLast[$index]=$lustreReadKBytesNow; $lustreReadOpsTot+= $lustreReadOps[$index]; $lustreReadKBytesTot+= $lustreReadKBytes[$index]; } else { $lustreWriteOpsNow= $lustreOps; $lustreWriteKBytesNow= $lustreBytes/$OneKB; $lustreWriteOps[$index]= fix($lustreWriteOpsNow-$lustreWriteOpsLast[$index]); $lustreWriteKBytes[$index]= fix($lustreWriteKBytesNow-$lustreWriteKBytesLast[$index]); $lustreWriteOpsLast[$index]= $lustreWriteOpsNow; $lustreWriteKBytesLast[$index]=$lustreWriteKBytesNow; $lustreWriteOpsTot+= $lustreWriteOps[$index]; $lustreWriteKBytesTot+= 
$lustreWriteKBytes[$index]; } } elsif ($subsys=~/l/i && $type=~/OST-b_(\d+):(\d+)/) { chomp $data; $index=$1; $bufNum=$2; ($lustreBufReadNow, $lustreBufWriteNow)=(split(/\s+/, $data))[1,5]; $index=$OstMap[$index] if $playback ne ''; $lustreBufRead[$index][$bufNum]=fix($lustreBufReadNow-$lustreBufReadLast[$index][$bufNum]); $lustreBufWrite[$index][$bufNum]=fix($lustreBufWriteNow-$lustreBufWriteLast[$index][$bufNum]); $lustreBufReadTot[$bufNum]+=$lustreBufRead[$index][$bufNum]; $lustreBufWriteTot[$bufNum]+=$lustreBufWrite[$index][$bufNum]; $lustreBufReadLast[$index][$bufNum]= $lustreBufReadNow; $lustreBufWriteLast[$index][$bufNum]=$lustreBufWriteNow; } elsif ($subsys=~/l/ && $type=~/MDS/) { chomp $data; ($name, $value)=(split(/\s+/, $data))[0,1]; # if we ever do mds detail, this goes here! #$index=$MdsMap[$index] if $playback ne ''; if ($name=~/^mds_getattr$/) { $lustreMdsGetattr=fix($value-$lustreMdsGetattrLast); $lustreMdsGetattrLast=$value; } elsif ($name=~/^mds_getattr_lock/) { $lustreMdsGetattrLock=fix($value-$lustreMdsGetattrLockLast); $lustreMdsGetattrLockLast=$value; } elsif ($name=~/^mds_statfs/) { $lustreMdsStatfs=fix($value-$lustreMdsStatfsLast); $lustreMdsStatfsLast=$value; } elsif ($name=~/^mds_getxattr/) { $lustreMdsGetxattr=fix($value-$lustreMdsGetxattrLast); $lustreMdsGetxattrLast=$value; } elsif ($name=~/^mds_setxattr/) { $lustreMdsSetxattr=fix($value-$lustreMdsSetxattrLast); $lustreMdsSetxattrLast=$value; } elsif ($name=~/^mds_sync/) { $lustreMdsSync=fix($value-$lustreMdsSyncLast); $lustreMdsSyncLast=$value; } elsif ($name=~/^mds_connect/) { $lustreMdsConnect=fix($value-$lustreMdsConnectLast); $lustreMdsConnectLast=$value; } elsif ($name=~/^mds_disconnect/) { $lustreMdsDisconnect=fix($value-$lustreMdsDisconnectLast); $lustreMdsDisconnectLast=$value; } elsif ($name=~/^mds_reint$/) { $lustreMdsReint=fix($value-$lustreMdsReintLast); $lustreMdsReintLast=$value; } # These 5 were added in 1.6.5.1 and are mutually exclusive with mds_reint elsif 
($name=~/^mds_reint_create/) { $lustreMdsReintCreate=fix($value-$lustreMdsReintCreateLast); $lustreMdsReintCreateLast=$value; } elsif ($name=~/^mds_reint_link/) { $lustreMdsReintLink=fix($value-$lustreMdsReintLinkLast); $lustreMdsReintLinkLast=$value; } elsif ($name=~/^mds_reint_setattr/) { $lustreMdsReintSetattr=fix($value-$lustreMdsReintSetattrLast); $lustreMdsReintSetattrLast=$value; } elsif ($name=~/^mds_reint_rename/) { $lustreMdsReintRename=fix($value-$lustreMdsReintRenameLast); $lustreMdsReintRenameLast=$value; } elsif ($name=~/^mds_reint_unlink/) { $lustreMdsReintUnlink=fix($value-$lustreMdsReintUnlinkLast); $lustreMdsReintUnlinkLast=$value; } } elsif ($subsys=~/l/i && $type=~/LLITE:(\d+)/) { $fs=$1; chomp $data; ($name, $ops, $value)=(split(/\s+/, $data))[0,1,6]; $fs=$CltFSMap[$fs] if $playback ne ''; if ($name=~/dirty_pages_hits/) { $lustreCltDirtyHits[$fs]=fix($ops-$lustreCltDirtyHitsLast[$fs]); $lustreCltDirtyHitsLast[$fs]=$ops; $lustreCltDirtyHitsTot+=$lustreCltDirtyHits[$fs]; } elsif ($name=~/dirty_pages_misses/) { $lustreCltDirtyMiss[$fs]=fix($ops-$lustreCltDirtyMissLast[$fs]); $lustreCltDirtyMissLast[$fs]=$ops; $lustreCltDirtyMissTot+=$lustreCltDirtyMiss[$fs]; } elsif ($name=~/read/) { # if brand new fs and no I/0, this field isn't defined. 
$value=0 if !defined($value); $lustreCltRead[$fs]=fix($ops-$lustreCltReadLast[$fs]); $lustreCltReadLast[$fs]=$ops; $lustreCltReadTot+=$lustreCltRead[$fs]; $lustreCltReadKB[$fs]=fix(($value-$lustreCltReadKBLast[$fs])/$OneKB); $lustreCltReadKBLast[$fs]=$value; $lustreCltReadKBTot+=$lustreCltReadKB[$fs]; } elsif ($name=~/write/) { $value=0 if !defined($value); # same as 'read' $lustreCltWrite[$fs]=fix($ops-$lustreCltWriteLast[$fs]); $lustreCltWriteLast[$fs]=$ops; $lustreCltWriteTot+=$lustreCltWrite[$fs]; $lustreCltWriteKB[$fs]=fix(($value-$lustreCltWriteKBLast[$fs])/$OneKB); $lustreCltWriteKBLast[$fs]=$value; $lustreCltWriteKBTot+=$lustreCltWriteKB[$fs]; } elsif ($name=~/open/) { $lustreCltOpen[$fs]=fix($ops-$lustreCltOpenLast[$fs]); $lustreCltOpenLast[$fs]=$ops; $lustreCltOpenTot+=$lustreCltOpen[$fs]; } elsif ($name=~/close/) { $lustreCltClose[$fs]=fix($ops-$lustreCltCloseLast[$fs]); $lustreCltCloseLast[$fs]=$ops; $lustreCltCloseTot+=$lustreCltClose[$fs]; } elsif ($name=~/seek/) { $lustreCltSeek[$fs]=fix($ops-$lustreCltSeekLast[$fs]); $lustreCltSeekLast[$fs]=$ops; $lustreCltSeekTot+=$lustreCltSeek[$fs]; } elsif ($name=~/fsync/) { $lustreCltFsync[$fs]=fix($ops-$lustreCltFsyncLast[$fs]); $lustreCltFsyncLast[$fs]=$ops; $lustreCltFsyncTot+=$lustreCltFsync[$fs]; } elsif ($name=~/setattr/) { $lustreCltSetattr[$fs]=fix($ops-$lustreCltSetattrLast[$fs]); $lustreCltSetattrLast[$fs]=$ops; $lustreCltSetattrTot+=$lustreCltSetattr[$fs]; } elsif ($name=~/getattr/) { $lustreCltGetattr[$fs]=fix($ops-$lustreCltGetattrLast[$fs]); $lustreCltGetattrLast[$fs]=$ops; $lustreCltGetattrTot+=$lustreCltGetattr[$fs]; } } elsif ($subsys=~/l/i && $type=~/LLITE_RA:(\d+)/) { $fs=$1; chomp $data; $fs=$CltFSMap[$fs] if $playback ne ''; if ($data=~/^pending.* (\d+)/) { # This is NOT a counter but a meter $ops=$1; $lustreCltRAPending[$fs]=$ops; $lustreCltRAPendingTot+=$lustreCltRAPending[$fs]; } elsif ($data=~/^hits.* (\d+)/) { $ops=$1; $lustreCltRAHits[$fs]=fix($ops-$lustreCltRAHitsLast[$fs]); 
      $lustreCltRAHitsLast[$fs]=$ops;
      $lustreCltRAHitsTot+=$lustreCltRAHits[$fs];
    }
    elsif ($data=~/^misses.* (\d+)/)
    {
      $ops=$1;
      $lustreCltRAMisses[$fs]=fix($ops-$lustreCltRAMissesLast[$fs]);
      $lustreCltRAMissesLast[$fs]=$ops;
      $lustreCltRAMissesTot+=$lustreCltRAMisses[$fs];
    }
    elsif ($data=~/^readpage.* (\d+)/)
    {
      $ops=$1;
      $lustreCltRANotCon[$fs]=fix($ops-$lustreCltRANotConLast[$fs]);
      $lustreCltRANotConLast[$fs]=$ops;
      $lustreCltRANotConTot+=$lustreCltRANotCon[$fs];
    }
    elsif ($data=~/^miss inside.* (\d+)/)
    {
      $ops=$1;
      $lustreCltRAMisWin[$fs]=fix($ops-$lustreCltRAMisWinLast[$fs]);
      $lustreCltRAMisWinLast[$fs]=$ops;
      $lustreCltRAMisWinTot+=$lustreCltRAMisWin[$fs];
    }
    elsif ($data=~/^failed grab.* (\d+)/)
    {
      $ops=$1;
      $lustreCltRAFalGrab[$fs]=fix($ops-$lustreCltRAFalGrabLast[$fs]);
      $lustreCltRAFalGrabLast[$fs]=$ops;
      $lustreCltRAFalGrabTot+=$lustreCltRAFalGrab[$fs];
    }
    elsif ($data=~/^failed lock.* (\d+)/)
    {
      $ops=$1;
      $lustreCltRALckFail[$fs]=fix($ops-$lustreCltRALckFailLast[$fs]);
      $lustreCltRALckFailLast[$fs]=$ops;
      $lustreCltRALckFailTot+=$lustreCltRALckFail[$fs];
    }
    elsif ($data=~/^read but.* (\d+)/)
    {
      $ops=$1;
      $lustreCltRAReadDisc[$fs]=fix($ops-$lustreCltRAReadDiscLast[$fs]);
      $lustreCltRAReadDiscLast[$fs]=$ops;
      $lustreCltRAReadDiscTot+=$lustreCltRAReadDisc[$fs];
    }
    elsif ($data=~/^zero length.* (\d+)/)
    {
      $ops=$1;
      $lustreCltRAZeroLen[$fs]=fix($ops-$lustreCltRAZeroLenLast[$fs]);
      $lustreCltRAZeroLenLast[$fs]=$ops;
      $lustreCltRAZeroLenTot+=$lustreCltRAZeroLen[$fs];
    }
    elsif ($data=~/^zero size.* (\d+)/)
    {
      $ops=$1;
      $lustreCltRAZeroWin[$fs]=fix($ops-$lustreCltRAZeroWinLast[$fs]);
      $lustreCltRAZeroWinLast[$fs]=$ops;
      $lustreCltRAZeroWinTot+=$lustreCltRAZeroWin[$fs];
    }
    elsif ($data=~/^read-ahead.* (\d+)/)
    {
      $ops=$1;
      $lustreCltRA2Eof[$fs]=fix($ops-$lustreCltRA2EofLast[$fs]);
      $lustreCltRA2EofLast[$fs]=$ops;
      $lustreCltRA2EofTot+=$lustreCltRA2Eof[$fs];
    }
    elsif ($data=~/^hit max.* (\d+)/)
    {
      $ops=$1;
      $lustreCltRAHitMax[$fs]=fix($ops-$lustreCltRAHitMaxLast[$fs]);
      $lustreCltRAHitMaxLast[$fs]=$ops;
      $lustreCltRAHitMaxTot+=$lustreCltRAHitMax[$fs];
    }
    elsif ($data=~/^wrong.* (\d+)/)
    {
      $ops=$1;
      $lustreCltRAWrong[$fs]=fix($ops-$lustreCltRAWrongLast[$fs]);
      $lustreCltRAWrongLast[$fs]=$ops;
      $lustreCltRAWrongTot+=$lustreCltRAWrong[$fs];
    }
  }
  elsif ($subsys=~/l/i && $type=~/LLITE_RPC:(\d+):(\d+)/)
  {
    chomp $data;
    $index=$1;
    $bufNum=$2;
    ($lustreCltRpcReadNow, $lustreCltRpcWriteNow)=(split(/\s+/, $data))[1,5];
    $index=$CltOstMap[$index] if $playback ne '';
    $lustreCltRpcRead[$index][$bufNum]= fix($lustreCltRpcReadNow-$lustreCltRpcReadLast[$index][$bufNum]);
    $lustreCltRpcWrite[$index][$bufNum]=fix($lustreCltRpcWriteNow-$lustreCltRpcWriteLast[$index][$bufNum]);
    $lustreCltRpcReadTot[$bufNum]+= $lustreCltRpcRead[$index][$bufNum];
    $lustreCltRpcWriteTot[$bufNum]+=$lustreCltRpcWrite[$index][$bufNum];
    $lustreCltRpcReadLast[$index][$bufNum]= $lustreCltRpcReadNow;
    $lustreCltRpcWriteLast[$index][$bufNum]=$lustreCltRpcWriteNow;
  }
  elsif ($subsys=~/l/i && $type=~/LLDET:(\d+)/)
  {
    $ost=$1;
    chomp $data;
    ($name, $ops, $value)=(split(/\s+/, $data))[0,1,6];
    $ost=$CltOstMap[$ost] if $playback ne '';
    if ($name=~/^read_bytes|ost_r/)
    {
      $lustreCltLunRead[$ost]=fix($ops-$lustreCltLunReadLast[$ost]);
      $lustreCltLunReadLast[$ost]=$ops;
      if (defined($value)) # not always defined
      {
        $lustreCltLunReadKB[$ost]=fix(($value-$lustreCltLunReadKBLast[$ost])/$OneKB);
        $lustreCltLunReadKBLast[$ost]=$value;
      }
    }
    elsif ($name=~/^write_bytes|ost_w/)
    {
      $lustreCltLunWrite[$ost]=fix($ops-$lustreCltLunWriteLast[$ost]);
      $lustreCltLunWriteLast[$ost]=$ops;
      if (defined($value)) # not always defined
      {
        $lustreCltLunWriteKB[$ost]=(fix($value-$lustreCltLunWriteKBLast[$ost])/$OneKB);
        $lustreCltLunWriteKBLast[$ost]=$value;
      }
    }
  }
  # disk stats apply to both MDS and OSTs
  elsif ($subsys=~/l/i && $type=~/LUS-d_(\d+):(\d+)/)
  {
    $lusDisk=$1;
    $bufNum= $2;
    # The units of 'readB/writeB' are number of 512 byte blocks
    # in case partial table [rare], make sure totals go in last bucket.
    chomp $data;
    ($size, $reads, $readB, $writes, $writeB)=split(/\s+/, $data);
    $bufNum=$LusMaxIndex if $size=~/^total/;
    # Numbers for individual disks
    $lusDiskReads[$lusDisk][$bufNum]= fix($reads-$lusDiskReadsLast[$lusDisk][$bufNum]);
    $lusDiskReadB[$lusDisk][$bufNum]= fix($readB-$lusDiskReadBLast[$lusDisk][$bufNum]);
    $lusDiskWrites[$lusDisk][$bufNum]=fix($writes-$lusDiskWritesLast[$lusDisk][$bufNum]);
    $lusDiskWriteB[$lusDisk][$bufNum]=fix($writeB-$lusDiskWriteBLast[$lusDisk][$bufNum]);
    #print "BEF DISKTOT[$bufNum] R: $lusDiskReadsTot[$bufNum] W: $lusDiskWritesTot[$bufNum]\n";
    # Numbers for ALL disks
    $lusDiskReadsTot[$bufNum]+= $lusDiskReads[$lusDisk][$bufNum];
    $lusDiskReadBTot[$bufNum]+= $lusDiskReadB[$lusDisk][$bufNum];
    $lusDiskWritesTot[$bufNum]+=$lusDiskWrites[$lusDisk][$bufNum];
    $lusDiskWriteBTot[$bufNum]+=$lusDiskWriteB[$lusDisk][$bufNum];
    #print "AFT DISKTOT[$bufNum] R: $lusDiskReadsTot[$bufNum] W: $lusDiskWritesTot[$bufNum]\n";
    $lusDiskReadsLast[$lusDisk][$bufNum]= $reads;
    $lusDiskReadBLast[$lusDisk][$bufNum]= $readB;
    $lusDiskWritesLast[$lusDisk][$bufNum]=$writes;
    $lusDiskWriteBLast[$lusDisk][$bufNum]=$writeB;
    #print "DISK[$lusDisk][$bufNum] R: $lusDiskReads[$lusDisk][$bufNum] W: $lusDiskWrites[$lusDisk][$bufNum]\n";
  }
  elsif ($subsys=~/c/ && $type=~/^intr/)
  {
    $intrptNow=$data;
    $intrpt=fix($intrptNow-$intrptLast);
    $intrptLast=$intrptNow;
  }
  elsif ($subsys=~/c/ && $type=~/^ctx/)
  {
    $ctxtNow=$data;
    $ctxt=fix($ctxtNow-$ctxtLast);
    $ctxtLast=$ctxtNow;
  }
  elsif ($subsys=~/c/ && $type=~/^proce/)
  {
    $procNow=$data;
    $proc=fix($procNow-$procLast);
    $procLast=$procNow;
  }
  elsif ($subsys=~/E/ && $type=~/^ipmi/)
  {
    $interval3Print=1;
    my @fields=split(/,/, $data);
    # This very first set removes any entries that are to be ignored, even if valid
    for (my $i=0; $i < scalar(@{$ipmiFile->{ignore}}); $i++)
    {
      my $f1=$ipmiFile->{ignore}->[$i]->{f1};
      if ($data=~/$f1/)
      {
        print "Ignore: $data\n" if $envDebug;
        return;
      }
    }
    # These are applied BEFORE the pattern match below
    print "$data\n" if $envDebug;
    my
    $premap=$fields[0];
    $fields[0]=~s/\.|\///g; # get rid of any '.'s or '/'s
    for (my $i=0; $i < scalar(@{$ipmiFile->{pre}}); $i++)
    {
      my $f1=$ipmiFile->{pre}->[$i]->{f1};
      my $f2=$ipmiFile->{pre}->[$i]->{f2};
      print "/$f1/$f2/\n" if $envDebug;
      # No need to pay the price of an eval if there are no symbols to interpret
      if ($f2!~/\$/)
      {
        $fields[0]=~s/$f1/$f2/;
      }
      else
      {
        eval "\$fields[0]=~s/$f1/$f2/";
      }
      print " Pre-Remapped '$premap' to '$fields[0]'\n" if $premap ne $fields[0] && $envDebug;
    }
    # matches: Virtual Fan | Fan n | Fans | xxx FANn | Power Meter
    # Not really sure why I need the '\s*' but it won't work without it!
    if ($fields[0]=~/^(.*)(fan.*?|temp.*?|power meter.*?)\s*(\d*)(.*)$/i)
    {
      $prefix= defined($1) ? $1 : '';
      $name=$2;
      $instance=defined($3) ? $3 : '';
      $suffix= defined($4) ? $4 : '';
      printf " Prefix: %s Name: %s Instance: %s Suffix: %s\n", $prefix, $name, $instance, $suffix if $envDebug;
      $name=~s/Power Meter/Power/;
      $type='fan' if $name=~/fan/i;
      $type='temp' if $name=~/temp/i;
      $type='power' if $name=~/power/i;
      $name=~s/\s+$//;
      # If a pattern such as 'Fan1A (xxx)', the suffix will actually be set to '1 (xxx)' so
      # make 'xxx' the prefix and everything after the '1' will be dropped later anyway
      # Power doesn't have a prefix, at least I haven't found any that do yet.
      $prefix=$1 if $fields[0]=~/(^fan|^temp)/i && $suffix=~/\((.*)\)/;
      # If an instance, append the first 'word' of the suffix as a modifier
      if ($instance ne '')
      {
        $instance.=$suffix;
        $instance=~s/\s.*//;
      }
      # If a pattern like 'Fan xxx' (note the check for NOT starting with a digit),
      # there is no prefix or instance so make it start with 'xxx Fan' for which
      # we already have logic for checking it for an instance later on.
      $prefix=$1 if $prefix eq '' && $instance eq '' && $suffix=~/(^\D+\S+)/;
      # If a prefix, typically something like cpu, sys, virtual, etc.,
      # prepend the first letter to the name. If it contains any digits
      # and we don't yet have an instance, use that as well.
      if ($prefix ne '')
      {
        $prefix=~/(.{1})[a-z]*(\d*)/i;
        $name="$1$name";
        $instance=$2 if $instance eq '';
      }
      # Remove all whitespace
      $name=~s/\s+//g;
      my $postmap=$fields[0];
      for (my $i=0; $i < scalar(@{$ipmiFile->{post}}); $i++)
      {
        my $f1=$ipmiFile->{post}->[$i]->{f1};
        my $f2=$ipmiFile->{post}->[$i]->{f2};
        print " Post-Remapped '$postmap' to '$name'\n" if $name=~s/$f1/$f2/ && $envDebug;
      }
      my $index;
      $index=$envFanIndex++ if $type eq 'fan';
      $index=$envTempIndex++ if $type eq 'temp';
      $index=0 if $type eq 'power';
      $fields[1]=-1 if $fields[1] eq '' || $fields[1] eq 'no reading';
      # If any last minute name remapping, this is the place for it
      for (my $i=0; defined($ipmiRemap) && $i<@{$ipmiRemap}; $i++)
      {
        my $p1=$ipmiRemap->[$i]->[1];
        my $p2=$ipmiRemap->[$i]->[2];
        $name=~s/$p1/$p2/;
      }
      $ipmiData->{$type}->[$index]->{name}= $name;
      $ipmiData->{$type}->[$index]->{inst}= $instance;
      $ipmiData->{$type}->[$index]->{value}= ($fields[1]!~/h$/) ? $fields[1] : $fields[3];
      $ipmiData->{$type}->[$index]->{status}=$fields[3];
      # we may need to convert temperatures, but be sure to ignore negative values
      if ($name=~/Temp/ && $envOpts=~/[CF]/ && ($ipmiData->{$type}->[$index]->{value} != -1))
      {
        $ipmiData->{$type}->[$index]->{value}= $ipmiData->{$type}->[$index]->{value}*1.8+32 if $envOpts=~/F/ && $fields[2]=~/C$/;
        $ipmiData->{$type}->[$index]->{value}= ($ipmiData->{$type}->[$index]->{value}-32)*5/9 if $envOpts=~/C/ && $fields[2]=~/F$/;
      }
      # finally, if 'T', truncate final value
      $ipmiData->{$type}->[$index]->{value}=int($ipmiData->{$type}->[$index]->{value}) if $envOpts=~/T/;
    }
  }
  elsif ($subsys=~/d/i && $type=~/^disk/)
  {
    ($major, $minor, $diskName, @dskFields)=split(/\s+/, $data);
    # if using --dskremap, remap disk name
    $diskName=diskRemapName($diskName);
    if (!defined($disks{$diskName}))
    {
      $dskChangeFlag|=1; # new disk found
      # if available indexes use one of them otherwise generate a new one.
if (@dskIndexAvail>0) { $dskIndex=pop @dskIndexAvail;} else { $dskIndex=$dskIndexNext++; } $disks{$diskName}=$dskIndex; print "new disk $diskName [$major,$minor] with index $dskIndex\n" if !$firstPass && $debug & 1; # add to ordered list of disks if seen for first time my $newDisk=1; foreach my $dsk (@dskOrder) { $newDisk=0 if $diskName eq $dsk; } push @dskOrder, $diskName if $newDisk; # by initializing the 'last' variable to the current value, we're assured to report 0s for the first # interval while teeing up the correct last value for the next interval. for (my $i=0; $i<11; $i++) { $dskFieldsLast[$dskIndex][$i]=$dskFields[$i]; } } $dskIndex=$disks{$diskName}; $dskSeen[$dskIndex]=$diskName; $dskSeenCount++; # faster than looping through to count # Clarification of field definitions: # Excellent reference: http://cvs.sourceforge.net/viewcvs.py/linux-vax # /kernel-2.5/Documentation/iostats.txt?rev=1.1.1.2 # ticks - time in jiffies doing I/O (some utils call it 'r/w-use') # inprog - I/O's in progress (some utils call it 'running') # ticks - time actually spent doing I/O (some utils call it 'use') # aveque - average time in queue (some utils call it 'aveq' or even 'ticks') $dskRead[$dskIndex]= fix($dskFields[0]-$dskFieldsLast[$dskIndex][0]); $dskReadMrg[$dskIndex]= fix($dskFields[1]-$dskFieldsLast[$dskIndex][1]); $dskReadKB[$dskIndex]= fix($dskFields[2]-$dskFieldsLast[$dskIndex][2])/2; $dskReadTicks[$dskIndex]= fix($dskFields[3]-$dskFieldsLast[$dskIndex][3]); $dskWrite[$dskIndex]= fix($dskFields[4]-$dskFieldsLast[$dskIndex][4]); $dskWriteMrg[$dskIndex]= fix($dskFields[5]-$dskFieldsLast[$dskIndex][5]); $dskWriteKB[$dskIndex]= fix($dskFields[6]-$dskFieldsLast[$dskIndex][6])/2; $dskWriteTicks[$dskIndex]=fix($dskFields[7]-$dskFieldsLast[$dskIndex][7]); $dskInProg[$dskIndex]= $dskFieldsLast[$dskIndex][8]; $dskTicks[$dskIndex]= fix($dskFields[9]-$dskFieldsLast[$dskIndex][9]); # according to the author of iostat this field can sometimes be negative # so handle the same way 
    # he does
    $dskWeighted[$dskIndex]=($dskFields[10]>=$dskFieldsLast[$dskIndex][10]) ?
        fix($dskFields[10]-$dskFieldsLast[$dskIndex][10]) :
        fix($dskFieldsLast[$dskIndex][10]-$dskFields[10]);
    # If read/write had bogus value, reset ALL current values for this disk to 0, noting that 1st pass
    # is initialization and numbers NOT valid so don't generate message
    if ($DiskMaxValue>0 && ($dskReadKB[$dskIndex]>$DiskMaxValue || $dskWriteKB[$dskIndex]>$DiskMaxValue))
    {
      logmsg('E', "One of ReadKB/WriteKB of '$dskReadKB[$dskIndex]/$dskWriteKB[$dskIndex]' > '$DiskMaxValue' for '$diskName'") if !$firstPass;
      logmsg('W', "Resetting all current performance values for this disk to 0");
      $dskOps[$dskIndex]=$dskRead[$dskIndex]=$dskReadKB[$dskIndex]=$dskWrite[$dskIndex]=$dskWriteKB[$dskIndex]=0;
      $dskReadMrg[$dskIndex]=$dskWriteMrg[$dskIndex]=$dskWriteTicks[$dskIndex]=0;
      $dskInProg[$dskIndex]=$dskTicks[$dskIndex]=$dskWeighted[$dskIndex]=0;
    }
    # Apply filters to summary totals, explicitly ignoring dm and psv devices which we know contain
    # duplicates and should never be considered as summary data. We also ignore other devices
    # explicitly told to do so with '--dskfilt ^' which means to ignore
    if ($diskName!~/^dm-|^psv/ && ($dskFilt eq '' || $diskName!~/$dskFiltIgnore/))
    {
      # Never include partitions in summary stats and we're using lookbehind to deal with nvme
      # perl lookbehind makes my head explode but what we're doing is to ignore nvme partitions
      # which is any device name that ends in a digit AND not preceded with 'nvme\dn' in which
      # case the digit is part of the device name. whew...
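      # Illustrative sketch only, NOT part of collectl: the helper name
      # exampleIsWholeDisk() and the exact pattern are assumptions, chosen to
      # demonstrate the lookbehind idea described in the comment above. A name
      # that ends in a digit is treated as a partition unless that digit
      # directly follows 'nvme\dn', in which case it belongs to the device name.
      sub exampleIsWholeDisk
      {
        my $name=shift;
        return(($name!~/(?<!nvme\dn)\d$/) ? 1 : 0);
      }
      # exampleIsWholeDisk('sda')        whole disk: no trailing digit
      # exampleIsWholeDisk('sda1')       partition: trailing digit, no nvme prefix
      # exampleIsWholeDisk('nvme0n1')    whole disk: the digit is part of the name
      # exampleIsWholeDisk('nvme0n1p1')  partition: the digit follows 'p'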
if ($diskName!~/(?{free}=$value; } elsif ($name=~/^MemUsed/) { $numaMem[$node]->{used}=$value; } elsif ($name=~/^Active$/) # equal to Active(anon) + Active(file) { $numaMem[$node]->{act}=$value; } elsif ($name=~/^Inactive$/) # equal to Inactive(anon) + Inactive(file) { $numaMem[$node]->{inact}=$value; } elsif ($name=~/^Mapped/) { $numaMem[$node]->{map}=$value; } elsif ($name=~/^AnonPages/) { $numaMem[$node]->{anon}=$value; } elsif ($name=~/^AnonHugePages/) { $numaMem[$node]->{anonH}=$value; } elsif ($name=~/^Mlock/) { $numaMem[$node]->{lock}=$value; } # currently the last entry read... elsif ($name=~/^Slab/) { $numaMem[$node]->{slab}=$value; # these are changed since all last seen if ($memOpts=~/R/) { $numaMem[$node]->{freeC}= $numaMem[$node]->{free}- $numaMem[$node]->{freeLast}; $numaMem[$node]->{usedC}= $numaMem[$node]->{used}- $numaMem[$node]->{usedLast}; $numaMem[$node]->{actC}= $numaMem[$node]->{act}- $numaMem[$node]->{actLast}; $numaMem[$node]->{inactC}=$numaMem[$node]->{inact}-$numaMem[$node]->{inactLast}; $numaMem[$node]->{mapC}= $numaMem[$node]->{map}- $numaMem[$node]->{mapLast}; $numaMem[$node]->{anonC}= $numaMem[$node]->{anon}- $numaMem[$node]->{anonLast}; $numaMem[$node]->{anonHC}=$numaMem[$node]->{anonH}-$numaMem[$node]->{anonHLast}; $numaMem[$node]->{lockC}= $numaMem[$node]->{lock}- $numaMem[$node]->{lockLast}; $numaMem[$node]->{slabC}= $numaMem[$node]->{slab}- $numaMem[$node]->{slabLast}; $numaMem[$node]->{freeLast}= $numaMem[$node]->{free}; $numaMem[$node]->{usedLast}= $numaMem[$node]->{used}; $numaMem[$node]->{actLast}= $numaMem[$node]->{act}; $numaMem[$node]->{inactLast}=$numaMem[$node]->{inact}; $numaMem[$node]->{mapLast}= $numaMem[$node]->{map}; $numaMem[$node]->{anonLast}= $numaMem[$node]->{anon}; $numaMem[$node]->{anonHLast}=$numaMem[$node]->{anonH}; $numaMem[$node]->{lockLast}= $numaMem[$node]->{lock}; $numaMem[$node]->{slabLast}= $numaMem[$node]->{slab}; } } } else { if ($name=~/^numa_hit/) { $numaStat[$node]->{hitsNow}=$value; } elsif 
      ($name=~/^numa_miss/)
      {
        $numaStat[$node]->{missNow}=$value;
      }
      # currently last entry processed
      elsif ($name=~/^numa_foreign/)
      {
        $numaStat[$node]->{forNow}=$value;
        $numaStat[$node]->{hits}=$numaStat[$node]->{hitsNow}-$numaStat[$node]->{hitsLast};
        $numaStat[$node]->{miss}=$numaStat[$node]->{missNow}-$numaStat[$node]->{missLast};
        $numaStat[$node]->{for}=$numaStat[$node]->{forNow}-$numaStat[$node]->{forLast};
        # These MUST be caused by a kernel bug as counters shouldn't go backwards!!!
        if ($numaStat[$node]->{miss}<0)
        {
          logmsg('E', "Possible kernel metric bug, miss counter went backwards from $numaStat[$node]->{missLast} to $numaStat[$node]->{missNow}");
          $numaStat[$node]->{miss}=0;
        }
        if ($numaStat[$node]->{for}<0)
        {
          logmsg('E', "Possible kernel metric bug, foreign counter went backwards from $numaStat[$node]->{forLast} to $numaStat[$node]->{forNow}");
          $numaStat[$node]->{for}=0;
        }
        $numaStat[$node]->{hitsLast}=$numaStat[$node]->{hitsNow};
        $numaStat[$node]->{missLast}=$numaStat[$node]->{missNow};
        $numaStat[$node]->{forLast}=$numaStat[$node]->{forNow};
      }
    }
  }
  # S o c k e t S t a t s
  elsif ($subsys=~/s/ && $type=~/^sock/)
  {
    if ($data=~/^sock/)
    {
      $data=~/(\d+)$/;
      $sockUsed=$1;
    }
    elsif ($data=~/^TCP/)
    {
      ($sockTcp, $sockOrphan, $sockTw, $sockAlloc, $sockMem)=
          (split(/\s+/, $data))[2,4,6,8,10];
    }
    elsif ($data=~/^UDP/)
    {
      $data=~/(\d+)$/;
      $sockUdp=$1;
    }
    elsif ($data=~/^RAW/)
    {
      $data=~/(\d+)$/;
      $sockRaw=$1;
    }
    elsif ($data=~/^FRAG/)
    {
      # first number is fragments in use, last number is fragment memory
      $data=~/(\d+)\D+(\d+)$/;
      $sockFrag=$1;
      $sockFragM=$2;
    }
  }
  # N e t w o r k S t a t s
  # a few design notes...
# - %networks is the name of all current networks # - @netOrder is the discovery order # - @netIndexAvail is a stack of available, previously used indexes # - $netIndex is the index assigned the current network being processed # - $netIndexNext is next available index NOT on @netIndexAvail # - @netSeen is list of networks seen this interval # - $netSeenCount is the number of entries in @netSeen # - $netSeenLast save number seen in last interval elsif ($subsys=~/n/i && $type=~/^Net/) { # insert space after interface if none already there $data=~s/:(\d)/: $1/; undef @fields; @fields=split(/\s+/, $data); if (@fields<17) { incomplete("NET:".$fields[0], $lastSecs[$rawPFlag]); return; } # N e w N e t S e e n my $netName=$fields[0]; $netName=~s/://; if (!defined($networks{$netName})) { $netChangeFlag|=1; # could be useful to external modules print "new network found: $netName\n" if !$firstPass && $debug & 1; # if available indexes use one of them otherwise generate a new one if (@netIndexAvail>0) { $netIndex=pop @netIndexAvail;} else { $netIndex=$netIndexNext++; } $networks{$netName}=$netIndex; print "new network $netName with index $netIndex\n" if $debug & 1; # add to ordered list of networks if seen for first time my $newNet=1; foreach my $net (@netOrder) { $net=~s/:.*//; $newNet=0 if $netName eq $net; } push @netOrder, $netName if $newNet; # by initializing the 'last' variable to the current value, we're assured to report 0s for the first # interval while teeing up the correct last value for the next interval. 
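      # Illustrative sketch only, NOT part of collectl: exampleFirstDelta() is a
      # hypothetical helper showing why seeding 'last' with the current counter
      # reports 0 for the first interval and a real delta afterwards. Counter
      # wrap handling, which the surrounding code delegates to fix(), is
      # deliberately left out of this sketch.
      sub exampleFirstDelta
      {
        my @samples=@_;          # successive raw counter readings
        my $last=$samples[0];    # seed 'last' with the very first reading
        my @deltas;
        foreach my $now (@samples)
        {
          push @deltas, $now-$last;
          $last=$now;
        }
        return(@deltas);
      }
      # exampleFirstDelta(1000, 1500, 1800) returns (0, 500, 300)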
      $netRxKBLast[$netIndex]= $fields[1];
      $netRxPktLast[$netIndex]= $fields[2];
      $netRxErrLast[$netIndex]= $fields[3];
      $netRxDrpLast[$netIndex]= $fields[4];
      $netRxFifoLast[$netIndex]=$fields[5];
      $netRxFraLast[$netIndex]= $fields[6];
      $netRxCmpLast[$netIndex]= $fields[7];
      $netRxMltLast[$netIndex]= $fields[8];
      $netTxKBLast[$netIndex]= $fields[9];
      $netTxPktLast[$netIndex]= $fields[10];
      $netTxErrLast[$netIndex]= $fields[11];
      $netTxDrpLast[$netIndex]= $fields[12];
      $netTxFifoLast[$netIndex]=$fields[13];
      $netTxCollLast[$netIndex]=$fields[14];
      $netTxCarLast[$netIndex]= $fields[15];
      $netTxCmpLast[$netIndex]= $fields[16];
      # won't do anything with speed until we create a new file, but then we'll get a new header
      my $line=`find /sys/devices/ 2>&1 | grep net | grep $netName | grep speed`;
      $netSpeeds{$netName}='??';
      if ($line ne '')
      {
        $speed=`cat $line 2>&1`;
        chomp $speed;
        $line=~/.*\/(\S+)\/speed/;
        my $netName=$1;
        # only record the speed when reading it did not fail with 'Invalid argument'
        $netSpeeds{$netName}=$speed if $speed!~/Invalid/;
      }
      # used for bogus speed checks
      my $netspeed=($netSpeeds{$netName} ne '??') ?
$netSpeeds{$netName} : $DefNetSpeed; $NetMaxTraffic[$netIndex]=2*$interval*$netspeed*125; } $netIndex=$networks{$netName}; $netSeen[$netIndex]=$netName; $netSeenCount++; $netNameNow= $fields[0]; $netRxKBNow= $fields[1]; $netRxPktNow= $fields[2]; $netRxErrNow= $fields[3]; $netRxDrpNow= $fields[4]; $netRxFifoNow=$fields[5]; $netRxFraNow= $fields[6]; $netRxCmpNow= $fields[7]; $netRxMltNow= $fields[8]; $netTxKBNow= $fields[9]; $netTxPktNow= $fields[10]; $netTxErrNow= $fields[11]; $netTxDrpNow= $fields[12]; $netTxFifoNow=$fields[13]; $netTxCollNow=$fields[14]; $netTxCarNow= $fields[15]; $netTxCmpNow= $fields[16]; $netRxKB[$netIndex]= fix($netRxKBNow-$netRxKBLast[$netIndex])/1024; $netTxKB[$netIndex]= fix($netTxKBNow-$netTxKBLast[$netIndex])/1024; $netRxPkt[$netIndex]=fix($netRxPktNow-$netRxPktLast[$netIndex]); $netTxPkt[$netIndex]=fix($netTxPktNow-$netTxPktLast[$netIndex]); # extended/errors $netRxErr[$netIndex]= fix($netRxErrNow- $netRxErrLast[$netIndex]); $netRxDrp[$netIndex]= fix($netRxDrpNow- $netRxDrpLast[$netIndex]); $netRxFifo[$netIndex]=fix($netRxFifoNow-$netRxFifoLast[$netIndex]); $netRxFra[$netIndex]= fix($netRxFraNow- $netRxFraLast[$netIndex]); $netRxCmp[$netIndex]= fix($netRxCmpNow- $netRxCmpLast[$netIndex]); $netRxMlt[$netIndex]= fix($netRxMltNow- $netRxMltLast[$netIndex]); $netTxErr[$netIndex]= fix($netTxErrNow- $netTxErrLast[$netIndex]); $netTxDrp[$netIndex]= fix($netTxDrpNow- $netTxDrpLast[$netIndex]); $netTxFifo[$netIndex]=fix($netTxFifoNow-$netTxFifoLast[$netIndex]); $netTxColl[$netIndex]=fix($netTxCollNow-$netTxCollLast[$netIndex]); $netTxCar[$netIndex]= fix($netTxCarNow- $netTxCarLast[$netIndex]); $netTxCmp[$netIndex]= fix($netTxCmpNow- $netTxCmpLast[$netIndex]); # It has occasionally been observed that bogus data is returned for some networks. # If we see anything that looks like twice the typical speed, ignore it but remember # that during the very first interval this data should be bogus! 
Also, set ALL data # points to 0 since we can't trust any of them. Note that the bogus value is now in # the 'last' variable and so the next valid value will be bogus relative to it, but # then its value will become 'last' and the following values should be 'happy'. if ($DefNetSpeed>0 && $intFirstSeen && ($netRxKB[$netIndex]>$NetMaxTraffic[$netIndex] || $netTxKB[$netIndex]>$NetMaxTraffic[$netIndex])) { # we're going through some extra pain to make error messages very explicit. we also can't use # int() because some bogus values are too big, especially if data collectl on 64 bit machine # and processed on 32 bit one. $netTxKB[$netIndex]=~s/\..*//; $netRxKB[$netIndex]=~s/\..*//; incomplete("NET:".$netNameNow, $lastSecs[$rawPFlag], 'Bogus'); logmsg('I', "Network speed threshhold: $NetMaxTraffic[$netIndex] Bogus Value(s) - TX: $netTxKB[$netIndex]KB RX: $netRxKB[$netIndex]KB"); my $i=$netIndex; $netRxKB[$i]=$netTxKB[$i]=$netRxPkt[$i]=$netTxPkt[$i]=0; $netRxErr[$i]=$netRxDrp[$i]=$netRxFifo[$i]=$netRxFra[$i]=$netRxCmp[$i]=$netRxMlt[$i]=0; $netTxErr[$i]=$netTxDrp[$i]=$netTxFifo[$i]=$netTxColl[$i]=$netTxCar[$i]=$netTxCmp[$i]=0; } # these are derived for simplicity of plotting $netRxErrs[$netIndex]=$netRxErr[$netIndex]+$netRxDrp[$netIndex]+ $netRxFifo[$netIndex]+$netRxFra[$netIndex]; $netTxErrs[$netIndex]=$netTxErr[$netIndex]+$netTxDrp[$netIndex]+ $netTxFifo[$netIndex]+$netTxColl[$netIndex]+ $netTxCar[$netIndex]; # Ethernet totals only, but no longer using anywhere if ($netNameNow=~/eth/) { $netEthRxKBTot+= $netRxKB[$netIndex]; $netEthRxPktTot+=$netRxPkt[$netIndex]; $netEthTxKBTot+= $netTxKB[$netIndex]; $netEthTxPktTot+=$netTxPkt[$netIndex]; } # at least for now, we're only worrying about totals on real network # first, always ignore those in ignore list if (($netFilt eq '' && $netNameNow=~/^eth|^hed|^ib|^em|^en|^p\dp/) || ($netFiltKeep ne '' && $netNameNow=~/$netFiltKeep/) || ($netFiltIgnore ne '' && $netNameNow!~/$netFiltIgnore/)) { # NOTE - we >>>never<<< include aliased 
networks in the summary calculations if ($netNameNow!~/\./) { $netRxKBTot+= $netRxKB[$netIndex]; $netRxPktTot+=$netRxPkt[$netIndex]; $netTxKBTot+= $netTxKB[$netIndex]; $netTxPktTot+=$netTxPkt[$netIndex]; $netRxErrTot+= $netRxErr[$netIndex]; $netRxDrpTot+= $netRxDrp[$netIndex]; $netRxFifoTot+=$netRxFifo[$netIndex]; $netRxFraTot+= $netRxFra[$netIndex]; $netRxCmpTot+= $netRxCmp[$netIndex]; $netRxMltTot+= $netRxMlt[$netIndex]; $netTxErrTot+= $netTxErr[$netIndex]; $netTxDrpTot+= $netTxDrp[$netIndex]; $netTxFifoTot+=$netTxFifo[$netIndex]; $netTxCollTot+=$netTxColl[$netIndex]; $netTxCarTot+= $netTxCar[$netIndex]; $netTxCmpTot+= $netTxCmp[$netIndex]; $netRxErrsTot+=$netRxErrs[$netIndex]; $netTxErrsTot+=$netTxErrs[$netIndex]; } } $netName[$netIndex]= $netNameNow; $netRxKBLast[$netIndex]= $netRxKBNow; $netRxPktLast[$netIndex]=$netRxPktNow; $netTxKBLast[$netIndex]= $netTxKBNow; $netTxPktLast[$netIndex]=$netTxPktNow; $netRxErrLast[$netIndex]=$netRxErrNow; $netRxDrpLast[$netIndex]=$netRxDrpNow; $netRxFifoLast[$netIndex]=$netRxFifoNow; $netRxFraLast[$netIndex]=$netRxFraNow; $netRxCmpLast[$netIndex]=$netRxCmpNow; $netRxMltLast[$netIndex]=$netRxMltNow; $netTxErrLast[$netIndex]=$netTxErrNow; $netTxDrpLast[$netIndex]=$netTxDrpNow; $netTxFifoLast[$netIndex]=$netTxFifoNow; $netTxCollLast[$netIndex]=$netTxCollNow; $netTxCarLast[$netIndex]=$netTxCarNow; $netTxCmpLast[$netIndex]=$netTxCmpNow; } # N e t w o r k S t a c k S t a t s # note that even though each line type IS already unique, by including our own type # we get to skip a bunch of compares when not doing -st # also note the older versions ignored the IpExt data even though collected so I am too. elsif ($subsys=~/t/i && $type=~/^tcp-|^Tcp/) { # Data comes in pairs, the first line being the headers and the second the data. # if 'tcp-' present, this is V3.6.4 or more and by removing, we assure old/new data looks the same # but also, the earlier versions didn't write headers which can change from kernel to kernel!!! 
    $type=~s/^tcp-//;
    $type=~s/:$//;

    # NEW TCP STATS
    if ($playback eq '' || $recVersion ge '3.6.4')
    {
      # type always precedes data
      if ($data=~/^\d/)
      {
        my @vals=split(/\s+/, $data);

        # init 'last' variables here because we don't know how many there are in the normal init section of code
        # and since $intFirstSeen doesn't get cleared until second pass we also need to see if already defined
        for (my $i=0; !$intFirstSeen && $i<@vals; $i++)
        {
          $tcpData{$type}->{last}->[$i]=0 if !defined($tcpData{$type}->{last}->[$i]);
        }

        for (my $i=0; $i<@vals; $i++)
        {
          my $name=$tcpData{$type}->{hdr}->[$i];
          my $value=$vals[$i]-$tcpData{$type}->{last}->[$i];
          #print "Seen: $intFirstSeen Type: $type Name: $name I: $i Val: $vals[$i] Last: $tcpData{$type}->{last}->[$i]\n" if $i==0 && $type=~/Icmp/;
          $tcpData{$type}->{$name}=$value;
          $tcpData{$type}->{last}->[$i]=$vals[$i];
        }

        # Error summaries for brief/plot data, nothing for IpExt.
        if ($briefFlag || $plotFlag ne '')
        {
          $ipErrors= $tcpData{Ip}->{InHdrErrors}+$tcpData{Ip}->{InAddrErrors}+
                     $tcpData{Ip}->{InUnknownProtos}+$tcpData{Ip}->{InDiscards}+
                     $tcpData{Ip}->{OutDiscards}+
                     $tcpData{Ip}->{ReasmFails}+
                     $tcpData{Ip}->{FragFails} if $type eq 'Ip';
          $tcpErrors= $tcpData{Tcp}->{AttemptFails}+$tcpData{Tcp}->{InErrs} if $type eq 'Tcp';
          $udpErrors= $tcpData{Udp}->{NoPorts}+$tcpData{Udp}->{InErrors} if $type eq 'Udp';
          $icmpErrors= $tcpData{Icmp}->{InErrors}+$tcpData{Icmp}->{InDestUnreachs}+
                       $tcpData{Icmp}->{OutErrors} if $type eq 'Icmp';
          $tcpExErrors=$tcpData{TcpExt}->{TCPLoss}+$tcpData{TcpExt}->{TCPFastRetrans} if $type eq 'TcpExt';
        }
      }

      # header: only need to grab on the first interval we see
      elsif (!$intFirstSeen)
      {
        my @headers=split(/\s+/, $data);
        for (my $i=0; $i<@headers; $i++)
        {
          $tcpData{$type}->{hdr}->[$i]=$headers[$i];
        }
      }
    }

    # OLD TCP STATS
    elsif ($type=~/^TcpExt/) # this is the old way, IP header, but no TCP one
    {
      chomp $data;
      @tcpFields=split(/ /, $data);
      for ($i=0; $i<$NumTcpFields; $i++)
      {
        $tcpValue[$i]=fix($tcpFields[$i]-$tcpLast[$i]);
$tcpLast[$i]=$tcpFields[$i]; #print "$i: $tcpValue[$i] "; } # store old version data in new version structures even though the positions # may be wrong for some kernels. $tcpData{TcpExt}->{TCPPureAcks}= $tcpValue[27]; $tcpData{TcpExt}->{TCPHPAcks}= $tcpValue[28]; $tcpData{TcpExt}->{TCPLoss}= $tcpValue[40]; $tcpData{TcpExt}->{TCPFastRetrans}=$tcpValue[45]; } } # I n f i n i b a n d S t a t s # these stats can come from multiple sources depending on values of' # $PQopt and/or whether or not we're getting them from opa elsif ($subsys=~/x/i && $type=~/^ib(\d+)-(\d):(\S*)/) { my $i=$1; my $port=$2; my $name=$3; ####################### # ib stats from /sys ####################### # as a optimization don't even look for these unless OPA V4 or /sys if ($HCAOpaV4[$i][$port] || $PQopt eq 'sys') { if ($name eq 'rcvd') { $ibRxKB[$i]=fix($data-$ibRxKBLast[$i][$port])/256; $ibRxKBLast[$i][$port]=$data; $ibRxKBTot+=$ibRxKB[$i]; } elsif ($name eq 'xmtd') { $ibTxKB[$i]=fix($data-$ibTxKBLast[$i][$port])/256; $ibTxKBLast[$i][$port]=$data; $ibTxKBTot+=$ibTxKB[$i]; } elsif ($name eq 'rcvp') { $ibRx[$i]=fix($data-$ibRxLast[$i][$port]); $ibRxLast[$i][$port]=$data; $ibRxTot+=$ibRx[$i]; } elsif ($name eq 'xmtp') { $ibTx[$i]=fix($data-$ibTxLast[$i][$port]); $ibTxLast[$i][$port]=$data; $ibTxTot+=$ibTx[$i]; } } ############################### # ib stats from pquery ############################### # note that these can either be extended or 32 bit counters elsif ($name eq 'pquery:') { # extended status if ($PQopt eq '-x') { my ($port, @fieldsNow)=(split(/\s+/, $data))[0,4..7]; for ($j=0; $j<4; $j++) { $fields[$j]=fix($fieldsNow[$j]-$ibFieldsLast[$i][$port][$j]); $ibFieldsLast[$i][$port][$j]=$fieldsNow[$j]; } $ibTxKB[$i]=$fields[0]/256; $ibTx[$i]= $fields[2]; $ibRxKB[$i]=$fields[1]/256; $ibRx[$i]= $fields[3]; $ibTxKBTot+=$ibTxKB[$i]; $ibTxTot+= $ibTx[$i]; $ibRxKBTot+=$ibRxKB[$i]; $ibRxTot+= $ibRx[$i]; $ibErrorsTotTot+=$ibErrorsTot[$i]; } # regular elsif ($PQopt eq '-r') { my ($port, 
            @fieldsNow)=(split(/\s+/, $data))[0,4..19];

        # Only one of the two ports is actually active at any one time
        if ($HCAPorts[$i][$port])
        {
          $ibErrorsTot[$i]=0;
          for ($j=0; $j<16; $j++)
          {
            $fields[$j]=fix($fieldsNow[$j]-$ibFieldsLast[$i][$port][$j]);
            $ibFieldsLast[$i][$port][$j]=$fieldsNow[$j];

            # the first 12 are accumulated as a single error count and ultimately
            # reported as an absolute number and NOT a rate so don't use 'last'
            $ibErrorsTot[$i]+=$fieldsNow[$j] if $j<12;
          }

          # these are already absolute since they're reset after reading
          $ibTxKB[$i]=$fieldsNow[12]/256;
          $ibTx[$i]= $fieldsNow[14];
          $ibRxKB[$i]=$fieldsNow[13]/256;
          $ibRx[$i]= $fieldsNow[15];
        }
        $ibTxKBTot+=$ibTxKB[$i];
        $ibTxTot+= $ibTx[$i];
        $ibRxKBTot+=$ibRxKB[$i];
        $ibRxTot+= $ibRx[$i];
        $ibErrorsTotTot+=$ibErrorsTot[$i];
      }
    }

    ################################
    # opastats from opapmsquery
    ################################

    elsif ($name eq 'opa:')
    {
      my ($port, @fieldsNow)=(split(/\s+/, $data))[4,5..24];
      for ($j=0; $j<4; $j++)
      {
        $fields[$j]=fix($fieldsNow[$j]-$ibFieldsLast[$i][$port][$j]);
        $ibFieldsLast[$i][$port][$j]=$fieldsNow[$j];
      }
      $ibErrorsTot[$i]=$fieldsNow[19]-$ibFieldsLast[$i][$port][19];
      $ibFieldsLast[$i][$port][19]=$fieldsNow[19];

      $ibTxKB[$i]=$fields[0]*976.5625; # need to express as KB
      $ibTx[$i]= $fields[2];
      $ibRxKB[$i]=$fields[1]*976.5625;
      $ibRx[$i]= $fields[3];

      $ibTxKBTot+=$ibTxKB[$i];
      $ibTxTot+= $ibTx[$i];
      $ibRxKBTot+=$ibRxKB[$i];
      $ibRxTot+= $ibRx[$i];
      $ibErrorsTotTot+=$ibErrorsTot[$i];
    }
  }
}

# headers for plot formatted data
sub printPlotHeaders
{
  my $i;

  ##############################
  # Core Plot Format Headers
  ##############################

  $headersAll='';
  $datetime=(!$utcFlag) ? "#Date${SEP}Time${SEP}" : "#UTC${SEP}";
  $headers=($filename ne '') ?
"$commonHeader$datetime" : $datetime; if ($subsys=~/c/) { $headers.="[CPU]User%${SEP}[CPU]Nice%${SEP}[CPU]Sys%${SEP}[CPU]Wait%${SEP}"; $headers.="[CPU]Irq%${SEP}[CPU]Soft%${SEP}[CPU]Steal%${SEP}[CPU]Idle%${SEP}[CPU]Totl%${SEP}"; $headers.="[CPU]Guest%${SEP}[CPU]GuestN%${SEP}"; $headers.="[CPU]Intrpt$rate${SEP}[CPU]Ctx$rate${SEP}[CPU]Proc$rate${SEP}"; $headers.="[CPU]ProcQue${SEP}[CPU]ProcRun${SEP}[CPU]L-Avg1${SEP}[CPU]L-Avg5${SEP}[CPU]L-Avg15${SEP}"; $headers.="[CPU]RunTot${SEP}[CPU]BlkTot${SEP}"; } if ($subsys=~/m/) { $headers.="[MEM]Tot${SEP}[MEM]Used${SEP}[MEM]Free${SEP}[MEM]Shared${SEP}[MEM]Buf${SEP}[MEM]Cached${SEP}"; $headers.="[MEM]Slab${SEP}[MEM]Map${SEP}[MEM]Anon${SEP}[MEM]AnonH${SEP}[MEM]Commit${SEP}[MEM]Locked${SEP}"; $headers.="[MEM]SwapTot${SEP}[MEM]SwapUsed${SEP}[MEM]SwapFree${SEP}[MEM]SwapIn${SEP}[MEM]SwapOut${SEP}"; $headers.="[MEM]Dirty${SEP}[MEM]Clean${SEP}[MEM]Laundry${SEP}[MEM]Inactive${SEP}"; $headers.="[MEM]PageIn${SEP}[MEM]PageOut${SEP}[MEM]PageFaults${SEP}[MEM]PageMajFaults${SEP}"; $headers.="[MEM]HugeTotal${SEP}[MEM]HugeFree${SEP}[MEM]HugeRsvd${SEP}[MEM]SUnreclaim${SEP}"; } if ($subsys=~/s/) { $headers.="[SOCK]Used${SEP}[SOCK]Tcp${SEP}[SOCK]Orph${SEP}[SOCK]Tw${SEP}[SOCK]Alloc${SEP}"; $headers.="[SOCK]Mem${SEP}[SOCK]Udp${SEP}[SOCK]Raw${SEP}[SOCK]Frag${SEP}[SOCK]FragMem${SEP}"; } if ($subsys=~/n/) { $headers.="[NET]RxPktTot${SEP}[NET]TxPktTot${SEP}[NET]RxKBTot${SEP}[NET]TxKBTot${SEP}"; $headers.="[NET]RxCmpTot${SEP}[NET]RxMltTot${SEP}[NET]TxCmpTot${SEP}"; $headers.="[NET]RxErrsTot${SEP}[NET]TxErrsTot${SEP}"; } if ($subsys=~/d/) { $headers.="[DSK]ReadTot${SEP}[DSK]WriteTot${SEP}[DSK]OpsTot${SEP}"; $headers.="[DSK]ReadKBTot${SEP}[DSK]WriteKBTot${SEP}[DSK]KbTot${SEP}"; $headers.="[DSK]ReadMrgTot${SEP}[DSK]WriteMrgTot${SEP}[DSK]MrgTot${SEP}"; } if ($subsys=~/i/) { $headers.="[INODE]NumDentry${SEP}[INODE]openFiles${SEP}[INODE]MaxFile%${SEP}[INODE]used${SEP}"; } if ($subsys=~/f/) { # Alway write client/server fields 
$headers.="[NFS]ReadsS${SEP}[NFS]WritesS${SEP}[NFS]MetaS${SEP}[NFS]CommitS${SEP}"; $headers.="[NFS]Udp${SEP}[NFS]Tcp${SEP}[NFS]TcpConn${SEP}[NFS]BadAuth${SEP}[NFS]BadClient${SEP}"; $headers.="[NFS]ReadsC${SEP}[NFS]WritesC${SEP}[NFS]MetaC${SEP}[NFS]CommitC${SEP}"; $headers.="[NFS]Retrans${SEP}[NFS]AuthRef${SEP}"; } if ($subsys=~/l/) { if ($reportMdsFlag) { $headers.="[MDS]Getattr${SEP}[MDS]GetattrLock${SEP}[MDS]Statfs${SEP}[MDS]Sync${SEP}"; $headers.="[MDS]Getxattr${SEP}[MDS]Setxattr${SEP}[MDS]Connect${SEP}[MDS]Disconnect${SEP}"; $headers.="[MDS]Reint${SEP}[MDS]Create${SEP}[MDS]Link${SEP}[MDS]Setattr${SEP}"; $headers.="[MDS]Rename${SEP}[MDS]Unlink${SEP}"; } if ($reportOstFlag) { # We always report basic I/O independent of what user selects with --lustopts $headers.="[OST]Read${SEP}[OST]ReadKB${SEP}[OST]Write${SEP}[OST]WriteKB${SEP}"; if ($lustOpts=~/B/) { foreach my $i (@brwBuckets) { $headers.="[OSTB]r${i}P${SEP}"; } foreach my $i (@brwBuckets) { $headers.="[OSTB]w${i}P${SEP}"; } } } if ($lustOpts=~/D/) { $headers.="[OSTD]Rds${SEP}[OSTD]Rdk${SEP}[OSTD]Wrts${SEP}[OSTD]Wrtk${SEP}"; foreach my $i (@diskBuckets) { $headers.="[OSTD]r${i}K${SEP}"; } foreach my $i (@diskBuckets) { $headers.="[OSTD]w${i}K${SEP}"; } } if ($reportCltFlag) { # 4 different sizes based on whether which value for --lustopts chosen # NOTE - order IS critical $headers.="[CLT]Reads${SEP}[CLT]ReadKB${SEP}[CLT]Writes${SEP}[CLT]WriteKB${SEP}"; $headers.="[CLTM]Open${SEP}[CLTM]Close${SEP}[CLTM]GAttr${SEP}[CLTM]SAttr${SEP}[CLTM]Seek${SEP}[CLTM]FSync${SEP}[CLTM]DrtHit${SEP}[CLTM]DrtMis${SEP}" if $lustOpts=~/M/; $headers.="[CLTR]Pend${SEP}[CLTR]Hits${SEP}[CLTR]Misses${SEP}[CLTR]NotCon${SEP}[CLTR]MisWin${SEP}[CLTR]FalGrab${SEP}[CLTR]LckFal${SEP}[CLTR]Discrd${SEP}[CLTR]ZFile${SEP}[CLTR]ZerWin${SEP}[CLTR]RA2Eof${SEP}[CLTR]HitMax${SEP}[CLTR]Wrong${SEP}" if $lustOpts=~/R/; if ($lustOpts=~/B/) { foreach my $i (@brwBuckets) { $headers.="[CLTB]r${i}P${SEP}"; } foreach my $i (@brwBuckets) { 
          $headers.="[CLTB]w${i}P${SEP}";
        }
      }
    }
  }

  if ($subsys=~/x/)
  {
    my $int='IB';
    $headers.="[$int]InPkt${SEP}[$int]OutPkt${SEP}[$int]InKB${SEP}[$int]OutKB${SEP}[$int]Err${SEP}";
  }

  if ($subsys=~/t/)
  {
    # fixed size easier for plotting, keeping Loss & FTrans for historical reasons...
    $headers.="[TCP]IpErr${SEP}[TCP]TcpErr${SEP}[TCP]UdpErr${SEP}[TCP]IcmpErr${SEP}[TCP]Loss${SEP}[TCP]FTrans${SEP}";
  }

  if ($subsys=~/y/)
  {
    $headers.="[SLAB]ObjInUse${SEP}[SLAB]ObjInUseB${SEP}[SLAB]ObjAll${SEP}[SLAB]ObjAllB${SEP}";
    $headers.="[SLAB]InUse${SEP}[SLAB]InUseB${SEP}[SLAB]All${SEP}[SLAB]AllB${SEP}[SLAB]CacheInUse${SEP}[SLAB]CacheTotal${SEP}";
  }

  if ($subsys=~/b/)
  {
    for (my $i=0; $i<11; $i++)
    {
      $headers.=sprintf("[BUD]%dPage%s$SEP", 2**$i, $i==0 ? '' : 's');
    }
  }

  # custom import headers get appended here if doing summary data.
  for (my $i=0; $impSummaryFlag && $i<$impNumMods; $i++)
  {
    &{$impPrintPlot[$i]}(1, \$headers) if $impOpts[$i]=~/s/;
  }

  # only if at least one core subsystem selected. if not, make sure
  # $headersAll contains the date/time in case writing to the terminal
  writeData(0, '', \$headers, $LOG, $ZLOG, 'log', \$headersAll) if $coreFlag || $impSummaryFlag;
  $headersAll=$headers if !$coreFlag;

  #################################
  # Non-Core Plot Format Headers
  #################################

  # here's the deal with these. if writing to files, each file always gets
  # its own headers. However, if writing to the terminal we want one long
  # string beginning with a single date/time AND we don't bother with the
  # common header.
  $cpuHeaders=$dskHeaders=$envHeaders=$nfsHeaders=$netHeaders='';
  $ostHeaders=$mdsHeaders=$cltHeaders=$tcpHeaders='';

  # Whenever we print a header to a file, we do both the common header
  # and date/time. Remember, if we're printing to the terminal, this is
  # completely ignored by writeData().
  $ch=($filename ne '') ? "$commonHeader$datetime" : $datetime;

  if ($subsys=~/C/)
  {
    for ($i=0; $i<$NumCpus; $i++)
    {
      next if (@cpuFiltKeep && !defined($cpuFiltKeep[$i])) || (@cpuFiltIgnore && defined($cpuFiltIgnore[$i]));

      $cpuHeaders.="[CPU:$i]User%${SEP}[CPU:$i]Nice%${SEP}[CPU:$i]Sys%${SEP}";
      $cpuHeaders.="[CPU:$i]Wait%${SEP}[CPU:$i]Irq%${SEP}[CPU:$i]Soft%${SEP}";
      $cpuHeaders.="[CPU:$i]Steal%${SEP}[CPU:$i]Idle%${SEP}[CPU:$i]Totl%${SEP}";
      $cpuHeaders.="[CPU:$i]Guest%${SEP}[CPU:$i]GuestN%${SEP}";
      $cpuHeaders.="[CPU:$i]Intrpt${SEP}";
    }
    writeData(0, $ch, \$cpuHeaders, CPU, $ZCPU, 'cpu', \$headersAll);
  }

  if ($subsys=~/D/ && $options!~/x/)
  {
    for (my $i=0; $i<@dskOrder; $i++)
    {
      $dskName=$dskOrder[$i];
      next if ($dskFiltKeep eq '' && $dskName=~/$dskFiltIgnore/) || ($dskFiltKeep ne '' && $dskName!~/$dskFiltKeep/);

      $temp= "[DSK]Name${SEP}[DSK]Reads${SEP}[DSK]RMerge${SEP}[DSK]RKBytes${SEP}[DSK]WaitR${SEP}";
      $temp.="[DSK]Writes${SEP}[DSK]WMerge${SEP}[DSK]WKBytes${SEP}[DSK]WaitW${SEP}[DSK]Request${SEP}";
      $temp.="[DSK]QueLen${SEP}[DSK]Wait${SEP}[DSK]SvcTim${SEP}[DSK]Util${SEP}";
      $temp=~s/DSK/DSK:$dskName/g;
      $dskHeaders.=$temp;
    }
    writeData(0, $ch, \$dskHeaders, DSK, $ZDSK, 'dsk', \$headersAll);
  }

  if ($subsys=~/E/)
  {
    foreach $key (sort keys %$ipmiData)
    {
      for (my $i=0; $i < scalar(@{$ipmiData->{$key}}); $i++)
      {
        my $name=$ipmiData->{$key}->[$i]->{name};
        my $inst=($key!~/power/ && $ipmiData->{$key}->[$i]->{inst} ne '-1') ?
$ipmiData->{$key}->[$i]->{inst} : ''; $envHeaders.=sprintf("[ENV:$name$inst]Speed$SEP") if $key=~/fan/; $envHeaders.=sprintf("[ENV:$name$inst]Temp$SEP") if $key=~/temp/; $envHeaders.=sprintf("[ENV:$name]Watts$SEP") if $key=~/power/; } } writeData(0, $ch, \$envHeaders, ENV, $ZENV, 'env', \$headersAll); } if ($subsys=~/M/) { $numaHeaders=''; for ($i=0; $i<$CpuNodes; $i++) { $numaHeaders.="[NUMA:$i]Used${SEP}[NUMA:$i]Free${SEP}[NUMA:$i]Slab${SEP}[NUMA:$i]Mapped${SEP}"; $numaHeaders.="[NUMA:$i]Anon${SEP}[NUMA:$i]AnonH${SEP}[NUMA:$i]Inactive${SEP}[NUMA:$i]Hits${SEP}"; } writeData(0, $ch, \$numaHeaders, NUMA, $ZNUMA, 'numa', \$headersAll); } if ($subsys=~/F/) { if ($nfs2CFlag) { my $type='NFS:2cd'; $nfsHeaders.="[$type]Read${SEP}[$type]Write${SEP}[$type]Lookup${SEP}[$type]Getattr${SEP}[$type]Setattr${SEP}[$type]Readdir${SEP}"; $nfsHeaders.="[$type]Create${SEP}[$type]Remove${SEP}[$type]Rename${SEP}[$type]Link${SEP}[$type]ReadLink${SEP}[$type]Null${SEP}"; $nfsHeaders.="[$type]Symlink${SEP}[$type]Mkdir${SEP}[$type]Rmdir${SEP}[$type]Fsstat${SEP}"; } if ($nfs2SFlag) { my $type='NFS:2sd'; $nfsHeaders.="[$type]Read${SEP}[$type]Write${SEP}[$type]Lookup${SEP}[$type]Getattr${SEP}[$type]Setattr${SEP}[$type]Readdir${SEP}"; $nfsHeaders.="[$type]Create${SEP}[$type]Remove${SEP}[$type]Rename${SEP}[$type]Link${SEP}[$type]ReadLink${SEP}[$type]Null${SEP}"; $nfsHeaders.="[$type]Symlink${SEP}[$type]Mkdir${SEP}[$type]Rmdir${SEP}[$type]Fsstat${SEP}"; } if ($nfs3CFlag) { my $type='NFS:3cd'; $nfsHeaders.="[$type]Read${SEP}[$type]Write${SEP}[$type]Commit${SEP}[$type]Lookup${SEP}"; $nfsHeaders.="[$type]Access${SEP}[$type]Getattr${SEP}[$type]Setattr${SEP}[$type]Readdir${SEP}"; $nfsHeaders.="[$type]Create${SEP}[$type]Remove${SEP}[$type]Rename${SEP}[$type]Link${SEP}[$type]ReadLink${SEP}[$type]Null${SEP}"; $nfsHeaders.="[$type]Symlink${SEP}[$type]Mkdir${SEP}[$type]Rmdir${SEP}[$type]Fsstat${SEP}"; $nfsHeaders.="[$type]Fsinfo${SEP}[$type]Pathconf${SEP}[$type]Mknod${SEP}[$type]Readdirplus${SEP}"; } if 
($nfs3SFlag) { my $type='NFS:3sd'; $nfsHeaders.="[$type]Read${SEP}[$type]Write${SEP}[$type]Commit${SEP}[$type]Lookup${SEP}"; $nfsHeaders.="[$type]Access${SEP}[$type]Getattr${SEP}[$type]Setattr${SEP}[$type]Readdir${SEP}"; $nfsHeaders.="[$type]Create${SEP}[$type]Remove${SEP}[$type]Rename${SEP}[$type]Link${SEP}[$type]ReadLink${SEP}[$type]Null${SEP}"; $nfsHeaders.="[$type]Symlink${SEP}[$type]Mkdir${SEP}[$type]Rmdir${SEP}[$type]Fsstat${SEP}"; $nfsHeaders.="[$type]Fsinfo${SEP}[$type]Pathconf${SEP}[$type]Mknod${SEP}[$type]Readdirplus${SEP}"; } if ($nfs4CFlag) { my $type='NFS:4cd'; $nfsHeaders.="[$type]Read${SEP}[$type]Write${SEP}[$type]Commit${SEP}[$type]Lookup${SEP}"; $nfsHeaders.="[$type]Access${SEP}[$type]Getattr${SEP}[$type]Setattr${SEP}[$type]Readdir${SEP}"; $nfsHeaders.="[$type]Create${SEP}[$type]Remove${SEP}[$type]Rename${SEP}[$type]Link${SEP}[$type]ReadLink${SEP}[$type]Null${SEP}"; $nfsHeaders.="[$type]Symlink${SEP}[$type]Fsinfo${SEP}[$type]Pathconf${SEP}"; } if ($nfs4SFlag) { my $type='NFS:4sd'; $nfsHeaders.="[$type]Read${SEP}[$type]Write${SEP}[$type]Commit${SEP}[$type]Lookup${SEP}"; $nfsHeaders.="[$type]Access${SEP}[$type]Getattr${SEP}[$type]Setattr${SEP}[$type]Readdir${SEP}"; $nfsHeaders.="[$type]Create${SEP}[$type]Remove${SEP}[$type]Rename${SEP}[$type]Link${SEP}[$type]ReadLink${SEP}"; } writeData(0, $ch, \$nfsHeaders, NFS, $ZNFS, 'nfs', \$headersAll); } if ($subsys=~/N/) { for (my $i=0; $i<@netOrder; $i++) { # remember, order include net speed $netName=$netOrder[$i]; $netName=~s/:.*//; next if ($netFiltKeep eq '' && $netName=~/$netFiltIgnore/) || ($netFiltKeep ne '' && $netName!~/$netFiltKeep/); $temp= "[NET]Name${SEP}[NET]RxPkt${SEP}[NET]TxPkt${SEP}[NET]RxKB${SEP}[NET]TxKB${SEP}"; $temp.="[NET]RxErr${SEP}[NET]RxDrp${SEP}[NET]RxFifo${SEP}[NET]RxFra${SEP}[NET]RxCmp${SEP}[NET]RxMlt${SEP}"; $temp.="[NET]TxErr${SEP}[NET]TxDrp${SEP}[NET]TxFifo${SEP}[NET]TxColl${SEP}[NET]TxCar${SEP}"; $temp.="[NET]TxCmp${SEP}[NET]RxErrs${SEP}[NET]TxErrs${SEP}"; 
$temp=~s/NET/NET:$netName/g; $temp=~s/:]/]/g; $netHeaders.=$temp; } writeData(0, $ch, \$netHeaders, NET, $ZNET, 'net', \$headersAll); } if ($subsys=~/L/) { if ($reportOstFlag) { # We always start with this section # BRW stats are optional, but if there group them together separately. for ($i=0; $i<$NumOst; $i++) { $inst=$lustreOsts[$i]; $ostHeaders.="[OST:$inst]Ost${SEP}[OST:$inst]Read${SEP}[OST:$inst]ReadKB${SEP}[OST:$inst]Write${SEP}[OST:$inst]WriteKB${SEP}"; } for ($i=0; $lustOpts=~/B/ && $i<$NumOst; $i++) { $inst=$lustreOsts[$i]; foreach my $j (@brwBuckets) { $ostHeaders.="[OSTB:$inst]r$j${SEP}"; } foreach my $j (@brwBuckets) { $ostHeaders.="[OSTB:$inst]w$j${SEP}"; } } writeData(0, $ch, \$ostHeaders, OST, $ZOST, 'ost', \$headersAll); } if ($reportCltFlag) { $temp=''; if ($lustOpts=~/O/) # client OST details { # we always record I/O in one chunk for ($i=0; $i<$NumLustreCltOsts; $i++) { $inst=$lustreCltOsts[$i]; $temp.="[CLT:$inst]FileSys${SEP}[CLT:$inst]Ost${SEP}[CLT:$inst]Reads${SEP}[CLT:$inst]ReadKB${SEP}[CLT:$inst]Writes${SEP}[CLT:$inst]WriteKB${SEP}"; } # and if specified, brw stats follow if ($lustOpts=~/B/) { for ($i=0; $i<$NumLustreCltOsts; $i++) { $inst=$lustreCltOsts[$i]; foreach my $j (@brwBuckets) { $temp.="[CLTB:$inst]r${j}P${SEP}"; } foreach my $j (@brwBuckets) { $temp.="[CLTB:$inst]w${j}P${SEP}"; } } } } else # just fs details { # just like with --lustopts O, these three follow each other in groups for ($i=0; $i<$NumLustreFS; $i++) { $inst=$lustreCltFS[$i]; $temp.="[CLT:$inst]FileSys${SEP}[CLT:$inst]Reads${SEP}[CLT:$inst]ReadKB${SEP}[CLT:$inst]Writes${SEP}[CLT:$inst]WriteKB${SEP}"; } for ($i=0; $lustOpts=~/M/ && $i<$NumLustreFS; $i++) { $inst=$lustreCltFS[$i]; $temp.="[CLTM:$inst]Open${SEP}[CLTM:$inst]Close${SEP}[CLTM:$inst]GAttr${SEP}[CLTM:$inst]SAttr${SEP}"; $temp.="[CLTM:$inst]Seek${SEP}[CLTM:$inst]Fsync${SEP}[CLTM:$inst]DrtHit${SEP}[CLTM:$inst]DrtMis${SEP}"; } for ($i=0; $lustOpts=~/R/ && $i<$NumLustreFS; $i++) { $inst=$lustreCltFS[$i]; 
$temp.="[CLTR:$inst]Pend${SEP}[CLTR:$inst]Hits${SEP}[CLTR:$inst]Misses${SEP}[CLTR:$inst]NotCon${SEP}[CLTR:$inst]MisWin${SEP}[CLTR:$inst]FalGrab${SEP}[CLTR:$inst]LckFal${SEP}"; $temp.="[CLTR:$inst]Discrd${SEP}[CLTR:$inst]ZFile${SEP}[CLTR:$inst]ZerWin${SEP}[CLTR:$inst]RA2Eof${SEP}[CLTR:$inst]HitMax${SEP}[CLTR:$inst]WrongMax${SEP}"; } } $cltHeaders.=$temp; writeData(0, $ch, \$cltHeaders, CLT, $ZCLT, 'clt', \$headersAll); } if ($lustOpts=~/D/) { $rdHeader="[OSTD]rds${SEP}[OSTD]rdkb${SEP}"; $wrHeader="[OSTD]wrs${SEP}[OSTD]wrkb${SEP}"; foreach my $i (@diskBuckets) { $rdHeader.="[OSTD]r${i}K${SEP}"; } foreach my $i (@diskBuckets) { $wrHeader.="[OSTD]w${i}K${SEP}"; } for ($i=0; $i<$NumLusDisks; $i++) { $temp="[OSTD]Disk${SEP}$rdHeader${SEP}$wrHeader"; $temp=~s/OSTD/OSTD:$LusDiskNames[$i]/g; $blkHeaders.="$temp${SEP}"; } writeData(0, $ch, \$blkHeaders, BLK, $ZBLK, 'blk', \$headersAll); } } if ($subsys=~/T/) { # This is going to be big!!! for my $type ('Ip', 'Tcp', 'Udp', 'Icmp', 'IpExt', 'TcpExt') { next if $type eq 'Ip' && $tcpFilt!~/i/; next if $type eq 'Tcp' && $tcpFilt!~/t/; next if $type eq 'Udp' && $tcpFilt!~/u/; next if $type eq 'Icmp' && $tcpFilt!~/c/; next if $type eq 'IpExt' && $tcpFilt!~/I/; next if $type eq 'TcpExt' && $tcpFilt!~/T/; foreach my $header (@{$tcpData{$type}->{hdr}}) { $tcpHeaders.="[TCPD]$header$SEP"; } } writeData(0, $ch, \$tcpHeaders, TCP, $ZTCP, 'tcp', \$headersAll); } if ($subsys=~/X/ && $NumHCAs) { for ($i=0; $i<$NumHCAs; $i++) { $HCAName[$i]=~/(\S+?)_*$/; $ibHeaders.="[IB:$1]HCA${SEP}[IB:$1]InPkt${SEP}[IB:$1]OutPkt${SEP}[IB:$1]InKB${SEP}[IB:$1]OutKB${SEP}[IB:$1]Err${SEP}"; } writeData(0, $ch, \$ibHeaders, IB, $ZIB, 'ib', \$headersAll); } $budHeaders=''; if ($subsys=~/B/) { for (my $i=0; $i<$NumBud; $i++) { my $buddyName="$buddyZone[$i]-$buddyNode[$i]"; $budHeaders.="[BUD:$buddyName]Node${SEP}[BUD:$buddyName]Zone${SEP}"; for (my $j=0; $j<11; $j++) { $budHeaders.=sprintf("[BUD:$buddyName]%dPage%s$SEP", 2**$j, $j==0 ? 
                                '' : 's');
      }
    }
    writeData(0, $ch, \$budHeaders, BUD, $ZBUD, 'bud', \$headersAll);
  }

  # only make call(s) to the respective modules if detail reporting has been requested
  for (my $i=0; $impDetailFlag && $i<$impNumMods; $i++)
  {
    if ($impOpts[$i]=~/d/)
    {
      my $impHeaders='';
      &{$impPrintPlot[$i]}(2, \$impHeaders);
      writeData(0, $ch, \$impHeaders, $impText[$i], $impGz[$i], 'imp-$i', \$headersAll);
    }
  }

  # When going to the terminal OR socket we need a final call with no 'data'
  # to write. Also note that there is a final separator that needs to be removed.
  # It also turns out if doing --export -P, THAT module is responsible for sending
  # data over the socket and the plot data ONLY gets written locally.
  # Finally, if there is an error writing to a socket, stop trying to record anything else
  # as it's probably a broken socket and '!$doneFlag' has been set and we'll exit cleanly
  $headersAll=~s/$SEP$//;
  if (!$logToFileFlag || ($sockFlag && $export eq ''))
  {
    return if writeData(1, '', undef, $LOG, undef, undef, \$headersAll)==0;
  }

  #################################
  # Exception File Headers
  #################################

  if ($options=~/x/i)
  {
    if ($subsys=~/D/)
    {
      $dskHeaders="Num${SEP}";
      $dskHeaders.="[DISKX]Name${SEP}[DISKX]Reads${SEP}[DISKX]Merged${SEP}[DISKX]KBytes${SEP}[DISKX]Writes${SEP}[DISKX]Merged${SEP}";
      $dskHeaders.="[DISKX]KBytes${SEP}[DISKX]Request${SEP}[DISKX]QueLen${SEP}[DISKX]Wait${SEP}[DISKX]SvcTim${SEP}[DISKX]Util\n";

      # Since we never write exception data over a socket the last parameter is undef.
      writeData(0, $ch, \$dskHeaders, DSKX, $ZDSKX, 'dskx', undef);
    }
  }
  $headersPrinted=1;
}

sub intervalPrint
{
  my $seconds=shift;

  # If seconds end in .000, $seconds comes across as integer with no $usecs!
($seconds, $usecs)=split(/\./, $seconds); $usecs='000' if !defined($usecs); # in case user specifies -om if ($hiResFlag) { $usecs=substr("${usecs}00", 0, 3); $seconds.=".$usecs"; } # This is causing confusion because this ALWAYS gets incremented even if no # output, such as when we only interval2 data $totalCounter++; my $tempSubsys=$subsys; $tempSubsys=~s/Y// if $slabAnalOnlyFlag; $tempSubsys=~s/Z// if $procAnalOnlyFlag; printPlot($seconds, $usecs) if $plotFlag && ($tempSubsys ne '' || $import ne ''); printTerm($seconds, $usecs) if !$plotFlag && $expName eq ''; procAnalyze($seconds, $usecs) if $procAnalFlag && $interval2Print; slabAnalyze($seconds, $usecs) if $slabAnalFlag && $interval2Print; if ($expName ne '') { logdiag('export data') if $utimeMask & 1; &$expName($expOpts); exit(0) if $showColFlag; } } # anything that needs to be derived should be done only once and this is the place sub derived { $swapUsed=$swapTotal-$swapFree; $swapUsedC=$swapUsed-$swapUsedLast; $swapUsedLast=$swapUsed; $memUsed=$memTot-$memFree; $memUsedC=$memUsed-$memUsedLast; $memUsedLast=$memUsed; } ########################### # P l o t F o r m a t ########################### sub printPlot { my $seconds=shift; my $usecs= shift; my ($datestamp, $time, $hh, $mm, $ss, $mday, $mon, $year, $i, $j); # We always print some form of date and time in plot format and in the case of # --utc, it's a single value. Now that I'm pulling out usecs for utc we # probably don't have to pass it as the second parameter. $utcSecs=(split(/\./, $seconds))[0]; ($ss, $mm, $hh, $mday, $mon, $year)=localtime($seconds); $date=($options=~/d/) ? sprintf("%02d/%02d", $mon+1, $mday) : sprintf("%d%02d%02d", $year+1900, $mon+1, $mday); $time= sprintf("%02d:%02d:%02d", $hh, $mm, $ss); my $datetime=(!$utcFlag) ? "$date$SEP$time": $utcSecs; $datetime.=".$usecs" if $options=~/m/; # slab detail and processes have their own print routines because they # do multiple lines of output and can't be mixed with anything else. 
# Furthermore, if we're doing -rawtoo, we DON'T generate these files since # the data is already being recorded in the raw file and we don't want to do # both if (!$rawtooFlag && $subsys=~/[YZ]/ && $interval2Print && !$firstTime2) { printPlotSlab($date, $time) if $subsys=~/Y/ && !$slabAnalOnlyFlag; printPlotProc($date, $time) if $subsys=~/Z/ && !$procAnalOnlyFlag; return if $subsys=~/^[YZ]$/; # we're done if ONLY printing slabs or processes } # Print headers noting that by default $headerRepeat set to 0 for -P. Also note we have to # get more elaborate for terminal/file-based plot data. On the terminal when HR is 0, we only # want one header but when going to files we ALWAYS want a new header each day when # $headersPrinted gets reset to 0. $interval1Counter++; printPlotHeaders() if ($headerRepeat==0 && $filename eq '' && $interval1Counter==1) || ($headerRepeat==0 && $filename ne '' && !$headersPrinted) || ($headerRepeat>0 && ($interval1Counter % $headerRepeat)==1); exit(0) if $showColFlag; ####################### # C O R E D A T A ####################### my $netErrors=0; $plot=$oneline=''; if ($coreFlag || $impSummaryFlag) { # CPU Data cols if ($subsys=~/c/) { $i=$NumCpus; $plot.=sprintf("$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS", $userP[$i], $niceP[$i], $sysP[$i], $waitP[$i], $irqP[$i], $softP[$i], $stealP[$i], $idleP[$i], $totlP[$i], $guestP[$i], $guestNP[$i]); $plot.=sprintf("$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%4.2f$SEP%4.2f$SEP%4.2f$SEP%d$SEP%d", $intrpt/$intSecs, $ctxt/$intSecs, $proc/$intSecs, $loadQue, $loadRun, $loadAvg1, $loadAvg5, $loadAvg15, $procsRun, $procsBlock); } # MEM if ($subsys=~/m/) { $plot.=sprintf("$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS", $memTot, $memUsed, $memFree, $memShared, $memBuf, $memCached); $plot.=sprintf("$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS", $memSlab, $memMap, $memAnon, $memAnonH, $memCommit, $memLocked); # Always from V1.7.5 forward 
$plot.=sprintf("$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS", $swapTotal, $swapUsed, $swapFree, $swapin/$intSecs, $swapout/$intSecs, $memDirty, $clean, $laundry, $memInact, $pagein/$intSecs, $pageout/$intSecs, $pagefault/$intSecs, $pagemajfault/$intSecs); $plot.=sprintf("$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS", $memHugeTot, $memHugeFree, $memHugeRsvd, $memSUnreclaim); } # SOCKETS if ($subsys=~/s/) { $plot.="$SEP$sockUsed$SEP$sockTcp$SEP$sockOrphan$SEP$sockTw$SEP$sockAlloc"; $plot.="$SEP$sockMem$SEP$sockUdp$SEP$sockRaw$SEP$sockFrag$SEP$sockFragM"; } # NETWORKS if ($subsys=~/n/) { # NOTE - rx/tx errs are the totals of all error counters $plot.=sprintf("$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS", $netRxPktTot/$intSecs, $netTxPktTot/$intSecs, $netRxKBTot/$intSecs, $netTxKBTot/$intSecs, $netRxCmpTot/$intSecs, $netRxMltTot/$intSecs, $netTxCmpTot/$intSecs, $netRxErrsTot/$intSecs, $netTxErrsTot/$intSecs); $netErrors=$netRxErrsTot+$netTxErrsTot; } # DISKS if ($subsys=~/d/) { $plot.=sprintf("$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS", $dskReadTot/$intSecs, $dskWriteTot/$intSecs, $dskOpsTot/$intSecs, $dskReadKBTot/$intSecs, $dskWriteKBTot/$intSecs, ($dskReadKBTot+$dskWriteKBTot)/$intSecs, $dskReadMrgTot/$intSecs, $dskWriteMrgTot/$intSecs, ($dskReadMrgTot+$dskWriteMrgTot)/$intSecs); } # INODES if ($subsys=~/i/) { $plot.=sprintf("$SEP%d$SEP%d$SEP%$FS$SEP%d", $dentryNum, $filesAlloc, $filesMax ? 
$filesAlloc*100/$filesMax : 0, $inodeUsed); } # NFS if ($subsys=~/f/) { $plot.=sprintf("$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS", $nfsSReadsTot/$intSecs, $nfsSWritesTot/$intSecs, $nfsSMetaTot/$intSecs, $nfsSCommitTot/$intSecs, $nfsUdpTot/$intSecs, $nfsTcpTot/$intSecs, $nfsTcpConnTot/$intSecs, $rpcBadAuthTot/$intSecs, $rpcBadClntTot/$intSecs); $plot.=sprintf("$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS", $nfsCReadsTot/$intSecs, $nfsCWritesTot/$intSecs, $nfsCMetaTot/$intSecs, $nfsCCommitTot/$intSecs, $rpcRetransTot/$intSecs, $rpcCredRefTot/$intSecs); } # Lustre if ($subsys=~/l/) { # MDS goes first since for detail, the OST is variable and if we ever # do both we want consistency of order. Also note that by reporting all 6 # reints we assure consisency across lustre versions if ($reportMdsFlag) { $mdsReint=$lustreMdsReintCreate+$lustreMdsReintLink+ $lustreMdsReintSetattr+$lustreMdsReintRename+$lustreMdsReintUnlink if $cfsVersion lt '1.6.5'; $plot.=sprintf("$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS", $lustreMdsGetattr/$intSecs, $lustreMdsGetattrLock/$intSecs, $lustreMdsStatfs/$intSecs, $lustreMdsSync/$intSecs, $lustreMdsGetxattr/$intSecs, $lustreMdsSetxattr/$intSecs, $lustreMdsConnect/$intSecs, $lustreMdsDisconnect/$intSecs, $lustreMdsReint/$intSecs, $lustreMdsReintCreate/$intSecs, $lustreMdsReintLink/$intSecs, $lustreMdsReintSetattr/$intSecs, $lustreMdsReintRename/$intSecs, $lustreMdsReintUnlink/$intSecs); } if ($reportOstFlag) { # We always do this... 
$plot.=sprintf("$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS", $lustreReadOpsTot/$intSecs, $lustreReadKBytesTot/$intSecs, $lustreWriteOpsTot/$intSecs, $lustreWriteKBytesTot/$intSecs); if ($lustOpts=~/B/) { for ($j=0; $j<$numBrwBuckets; $j++) { $plot.=sprintf("$SEP%$FS", $lustreBufReadTot[$j]/$intSecs); } for ($j=0; $j<$numBrwBuckets; $j++) { $plot.=sprintf("$SEP%$FS", $lustreBufWriteTot[$j]/$intSecs); } } } # Disk Block Level Stats can apply to both MDS and OST if ($lustOpts=~/D/) { $plot.=sprintf("$SEP%d$SEP%d$SEP%d$SEP%d", $lusDiskReadsTot[$LusMaxIndex]/$intSecs, $lusDiskReadBTot[$LusMaxIndex]*0.5/$intSecs, $lusDiskWritesTot[$LusMaxIndex]/$intSecs, $lusDiskWriteBTot[$LusMaxIndex]*0.5/$intSecs); for ($i=0; $i<$LusMaxIndex; $i++) { $plot.=sprintf("$SEP%d", $lusDiskReadsTot[$i]/$intSecs); } for ($i=0; $i<$LusMaxIndex; $i++) { $plot.=sprintf("$SEP%d", $lusDiskWritesTot[$i]/$intSecs); } } if ($reportCltFlag) { # There are actually 3 different formats depending on --lustopts $plot.=sprintf("$SEP%d$SEP%d$SEP%d$SEP%d", $lustreCltReadTot/$intSecs, $lustreCltReadKBTot/$intSecs, $lustreCltWriteTot/$intSecs, $lustreCltWriteKBTot/$intSecs); $plot.=sprintf("$SEP%d$SEP%d$SEP%d$SEP%d$SEP%d$SEP%d$SEP%d$SEP%d", $lustreCltOpenTot/$intSecs, $lustreCltCloseTot/$intSecs, $lustreCltGetattrTot/$intSecs, $lustreCltSetattrTot/$intSecs, $lustreCltSeekTot/$intSecs, $lustreCltFsyncTot/$intSecs, $lustreCltDirtyHitsTot/$intSecs, $lustreCltDirtyMissTot/$intSecs) if $lustOpts=~/M/; $plot.=sprintf("$SEP%d$SEP%d$SEP%d$SEP%d$SEP%d$SEP%d$SEP%d$SEP%d$SEP%d$SEP%d$SEP%d$SEP%d$SEP%d", $lustreCltRAPendingTot, $lustreCltRAHitsTot, $lustreCltRAMissesTot, $lustreCltRANotConTot, $lustreCltRAMisWinTot, $lustreCltRAFalGrabTot, $lustreCltRALckFailTot, $lustreCltRAReadDiscTot, $lustreCltRAZeroLenTot, $lustreCltRAZeroWinTot, $lustreCltRA2EofTot, $lustreCltRAHitMaxTot, $lustreCltRAWrongTot) if $lustOpts=~/R/; if ($lustOpts=~/B/) { for ($i=0; $i<$numBrwBuckets; $i++) { $plot.=sprintf("$SEP%d", 
$lustreCltRpcReadTot[$i]/$intSecs); } for ($i=0; $i<$numBrwBuckets; $i++) { $plot.=sprintf("$SEP%d", $lustreCltRpcWriteTot[$i]/$intSecs); } } } } # INFINIBAND # Now if 'x' specified and no IB, we still want to print all 0s so lets # do it here if ($subsys=~/x/ && $NumHCAs) { $plot.=sprintf("$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS", $ibRxTot/$intSecs, $ibTxTot/$intSecs, $ibRxKBTot/$intSecs, $ibTxKBTot/$intSecs, $ibErrorsTotTot); } # TCP if ($subsys=~/t/) { # while tempted to control printing via $tcpFilt, by doing them all, we have a more # consistent file that is easier to plot and not much more expensive in size $plot.=sprintf("$SEP%$FS", $ipErrors/$intSecs); $plot.=sprintf("$SEP%$FS", $tcpErrors/$intSecs); $plot.=sprintf("$SEP%$FS", $udpErrors/$intSecs); $plot.=sprintf("$SEP%$FS", $icmpErrors/$intSecs); $plot.=sprintf("$SEP%$FS", $tcpData{TcpExt}->{TCPLoss}/$intSecs); $plot.=sprintf("$SEP%$FS", $tcpData{TcpExt}->{TCPFastRetrans}/$intSecs); } # SLAB if ($subsys=~/y/) { $plot.=sprintf("$SEP%d$SEP%d$SEP%d$SEP%d$SEP%d$SEP%d$SEP%d$SEP%d$SEP%d$SEP%d", $slabObjActTotal, $slabObjActTotalB, $slabObjAllTotal, $slabObjAllTotalB, $slabSlabActTotal, $slabSlabActTotalB, $slabSlabAllTotal, $slabSlabAllTotalB, $slabNumAct, $slabNumTot,6); } # BUDDYINFO if ($subsys=~/b/) { for (my $i=0; $i<11; $i++) { $plot.=sprintf("$SEP%d", $buddyInfoTot[$i]); } } # only if summary data for (my $i=0; $impSummaryFlag && $i<$impNumMods; $i++) { &{$impPrintPlot[$i]}(3, \$plot) if $impOpts[$i]=~/s/; } writeData(0, $datetime, \$plot, $LOG, $ZLOG, 'log', \$oneline) if $netOpts!~/E/ || $netErrors; } ############################### # N O N - C O R E D A T A ############################### if ($subsys=~/C/) { $cpuPlot=''; for ($i=0; $i<$NumCpus; $i++) { next if (@cpuFiltKeep && !defined($cpuFiltKeep[$i])) || (@cpuFiltIgnore && defined($cpuFiltIgnore[$i])); $cpuPlot.=sprintf("$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS", $userP[$i], $niceP[$i], 
          $sysP[$i], $waitP[$i], $irqP[$i], $softP[$i], $stealP[$i], $idleP[$i], $totlP[$i],
          $intrptTot[$i]/$intSecs, $guestP[$i], $guestNP[$i]);
    }
    writeData(0, $datetime, \$cpuPlot, CPU, $ZCPU, 'cpu', \$oneline);
  }

  #####################
  # D S K F i l e
  #####################

  if ($subsys=~/D/)
  {
    $dskPlot='';
    for (my $i=0; $i<@dskOrder; $i++)
    {
      $dskName=$dskOrder[$i];
      next if ($dskFiltKeep eq '' && $dskName=~/$dskFiltIgnore/) ||
              ($dskFiltKeep ne '' && $dskName!~/$dskFiltKeep/);

      if (defined($disks{$dskName}))
      {
        my $i=$disks{$dskName};
        $dskRecord=sprintf("%s$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS",
            $dskName,
            $dskRead[$i]/$intSecs, $dskReadMrg[$i]/$intSecs, $dskReadKB[$i]/$intSecs, $dskWaitR[$i],
            $dskWrite[$i]/$intSecs, $dskWriteMrg[$i]/$intSecs, $dskWriteKB[$i]/$intSecs, $dskWaitW[$i],
            $dskRqst[$i], $dskQueLen[$i], $dskWait[$i], $dskSvcTime[$i], $dskUtil[$i]);
      }
      else
      {
        $dskRecord=sprintf("%s$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS",
            $dskName, 0,0,0,0,0,0,0,0,0,0,0,0,0);
      }

      # If exception processing is in effect and we are writing to a file,
      # make sure this entry qualifies
      if ($options=~/x/i)
      {
        # All we care about for I/O rates is if one is greater than exception.
        $ios=$dskRead[$i]/$intSecs>=$limIOS || $dskWrite[$i]/$intSecs>=$limIOS;
        $svc=$dskSvcTime[$i]*100;

        # Either both tests are > limits or just one, depending on whether AND or OR
        writeData(0, $datetime, \$dskRecord, DSKX, $ZDSKX, 'dskx', undef)
            if ($limBool && $ios && $svc>=$limSVC) || (!$limBool && ($ios || $svc>=$limSVC));
      }

      # If not doing x-exception reporting, just build one long string
      $dskPlot.="$SEP$dskRecord" if $options!~/x/;
    }

    # we only write DSK data when NOT doing x type exception processing
    writeData(0, $datetime, \$dskPlot, DSK, $ZDSK, 'dsk', \$oneline) if $options!~/x/;
  }

  ###############################
  # E N V I R O N M E N T A L
  ###############################

  if ($subsys=~/E/ && $interval3Print)
  {
    $envPlot='';
    foreach $key (sort keys %$ipmiData)
    {
      for (my $i=0; $i<scalar(@{$ipmiData->{$key}}); $i++)
      {
        my $name=  $ipmiData->{$key}->[$i]->{name};
        my $inst=  $ipmiData->{$key}->[$i]->{inst};
        my $value= $ipmiData->{$key}->[$i]->{value};
        my $status=$ipmiData->{$key}->[$i]->{status};

        $value=0 if $value eq '';
        $envPlot.="$SEP$value";
      }
    }
    writeData(0, $datetime, \$envPlot, ENV, $ZENV, 'env', \$oneline);
  }

  ##########################################
  # L U S T R E D E T A I L F i l e
  ##########################################

  if ($subsys=~/L/)
  {
    if ($reportOstFlag)
    {
      # Basic I/O always there and grouped together
      $ostPlot='';
      for ($i=0; $i<$NumOst; $i++)
      {
        $ostPlot.=sprintf("$SEP%s$SEP%d$SEP%d$SEP%d$SEP%d",
            $lustreOsts[$i],
            $lustreReadOps[$i]/$intSecs, $lustreReadKBytes[$i]/$intSecs,
            $lustreWriteOps[$i]/$intSecs, $lustreWriteKBytes[$i]/$intSecs);
      }

      # These guys are optional and follow ALL the basic stuff
      for ($i=0; $lustOpts=~/B/ && $i<$NumOst; $i++)
      {
        for ($j=0; $j<$numBrwBuckets; $j++)
        {
          $ostPlot.=sprintf("$SEP%d", $lustreBufRead[$i][$j]/$intSecs);
        }
        for ($j=0; $j<$numBrwBuckets; $j++)
        {
          $ostPlot.=sprintf("$SEP%d", $lustreBufWrite[$i][$j]/$intSecs);
        }
      }
      writeData(0, $datetime, \$ostPlot, OST, $ZOST, 'ost', \$oneline);
    }

    if ($lustOpts=~/D/)
    {
      $blkPlot='';
      for ($i=0; $i<$NumLusDisks;
           $i++)
      {
        $blkPlot.=sprintf("$SEP%s$SEP%d$SEP%d",
            $LusDiskNames[$i],
            $lusDiskReads[$i][$LusMaxIndex]/$intSecs,
            $lusDiskReadB[$i][$LusMaxIndex]*0.5/$intSecs);
        for ($j=0; $j<$LusMaxIndex; $j++)
        {
          $temp=(defined($lusDiskReads[$i][$j])) ? $lusDiskReads[$i][$j]/$intSecs : 0;
          $blkPlot.=sprintf("$SEP%d", $temp);
        }

        $blkPlot.=sprintf("$SEP%d$SEP%d",
            $lusDiskWrites[$i][$LusMaxIndex]/$intSecs,
            $lusDiskWriteB[$i][$LusMaxIndex]*0.5/$intSecs);
        for ($j=0; $j<$LusMaxIndex; $j++)
        {
          $temp=(defined($lusDiskWrites[$i][$j])) ? $lusDiskWrites[$i][$j]/$intSecs : 0;
          $blkPlot.=sprintf("$SEP%d", $temp);
        }
      }
      # pass the shared terminal/socket buffer, as every other detail write does
      writeData(0, $datetime, \$blkPlot, BLK, $ZBLK, 'blk', \$oneline);
    }

    if ($reportCltFlag)
    {
      $cltPlot='';
      if ($lustOpts=~/O/)    # either OST details or FS details but not both
      {
        for ($i=0; $i<$NumLustreCltOsts; $i++)
        {
          # when lustre first starts up none of these have values
          $cltPlot.=sprintf("$SEP%s$SEP%s$SEP%d$SEP%d$SEP%d$SEP%d",
              $lustreCltOstFS[$i], $lustreCltOsts[$i],
              defined($lustreCltLunRead[$i]) ? $lustreCltLunRead[$i]/$intSecs : 0,
              defined($lustreCltLunReadKB[$i]) ? $lustreCltLunReadKB[$i]/$intSecs : 0,
              defined($lustreCltLunWrite[$i]) ? $lustreCltLunWrite[$i]/$intSecs : 0,
              defined($lustreCltLunWriteKB[$i]) ?
$lustreCltLunWriteKB[$i]/$intSecs : 0); } for ($i=0; $lustOpts=~/B/ && $i<$NumLustreCltOsts; $i++) { for ($j=0; $j<$numBrwBuckets; $j++) { $cltPlot.=sprintf("$SEP%3d", $lustreCltRpcRead[$i][$j]/$intSecs); } for ($j=0; $j<$numBrwBuckets; $j++) { $cltPlot.=sprintf("$SEP%3d", $lustreCltRpcWrite[$i][$j]/$intSecs); } } } else # must be FS { for ($i=0; $i<$NumLustreFS; $i++) { $cltPlot.=sprintf("$SEP%s$SEP%d$SEP%d$SEP%d$SEP%d", $lustreCltFS[$i], $lustreCltRead[$i]/$intSecs, $lustreCltReadKB[$i]/$intSecs, $lustreCltWrite[$i]/$intSecs, $lustreCltWriteKB[$i]/$intSecs); } for ($i=0; $lustOpts=~/M/ && $i<$NumLustreFS; $i++) { $cltPlot.=sprintf("$SEP%d$SEP%d$SEP%d$SEP%d$SEP%d$SEP%d$SEP%d$SEP%d", $lustreCltOpen[$i]/$intSecs, $lustreCltClose[$i]/$intSecs, $lustreCltGetattr[$i]/$intSecs, $lustreCltSetattr[$i]/$intSecs, $lustreCltSeek[$i]/$intSecs, $lustreCltFsync[$i]/$intSecs, $lustreCltDirtyHits[$i]/$intSecs, $lustreCltDirtyMiss[$i]/$intSecs); } for ($i=0; $lustOpts=~/R/ && $i<$NumLustreFS; $i++) { $cltPlot.=sprintf("$SEP%d$SEP%d$SEP%d$SEP%d$SEP%d$SEP%d$SEP%d$SEP%d$SEP%d$SEP%d$SEP%d$SEP%d$SEP%d", $lustreCltRAPendingTot, $lustreCltRAHitsTot, $lustreCltRAMissesTot, $lustreCltRANotConTot, $lustreCltRAMisWinTot, $lustreCltRAFalGrabTot, $lustreCltRALckFailTot, $lustreCltRAReadDiscTot, $lustreCltRAZeroLenTot, $lustreCltRAZeroWinTot, $lustreCltRA2EofTot, $lustreCltRAHitMaxTot, $lustreCltRAWrongTot); } } writeData(0, $datetime, \$cltPlot, CLT, $ZCLT, 'clt', \$oneline); } } ######################### # N U M A F i l e ######################### if ($subsys=~/M/) { my $numaPlot=''; for (my $i=0; $i<$CpuNodes; $i++) { # don't see how total can ever be 0, but let's be careful anyways my $hitsplusmisses=$numaStat[$i]->{hits}+$numaStat[$i]->{for}+$numaStat[$i]->{miss}; my $hitrate=($hitsplusmisses) ? 
$numaStat[$i]->{hits}/$hitsplusmisses*100 : 100; $numaPlot.=sprintf("$SEP%d$SEP%d$SEP%d$SEP%d$SEP%d$SEP%d$SEP%d$SEP%.2f", $numaMem[$i]->{used}, $numaMem[$i]->{free}, $numaMem[$i]->{slab}, $numaMem[$i]->{map}, $numaMem[$i]->{anon}, $numaMem[$i]->{anonH},$numaMem[$i]->{inact}, $hitrate); } writeData(0, $datetime, \$numaPlot, NUMA, $ZNUMA, 'numa', \$oneline); } ##################### # N F S F i l e ##################### if ($subsys=~/F/) { $nfsPlot=''; if ($nfs2CFlag) { $nfsPlot.=sprintf("$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS", $nfs2CRead/$intSecs, $nfs2CWrite/$intSecs, $nfs2CLookup/$intSecs, $nfs2CGetattr/$intSecs, $nfs2CSetattr/$intSecs, $nfs2CReaddir/$intSecs, $nfs2CCreate/$intSecs, $nfs2CRemove/$intSecs,); $nfsPlot.=sprintf("$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS", $nfs2CRename/$intSecs, $nfs2CLink/$intSecs, $nfs2CReadlink/$intSecs, $nfs2CNull/$intSecs, $nfs2CSymlink/$intSecs, $nfs2CMkdir/$intSecs, $nfs2CRmdir/$intSecs, $nfs2CFsstat/$intSecs); } if ($nfs2SFlag) { $nfsPlot.=sprintf("$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS", $nfs2SRead/$intSecs, $nfs2SWrite/$intSecs, $nfs2SLookup/$intSecs, $nfs2SGetattr/$intSecs, $nfs2SSetattr/$intSecs, $nfs2SReaddir/$intSecs, $nfs2SCreate/$intSecs, $nfs2SRemove/$intSecs); $nfsPlot.=sprintf("$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS", $nfs2SRename/$intSecs, $nfs2SLink/$intSecs, $nfs2SReadlink/$intSecs, $nfs2SNull/$intSecs, $nfs2SSymlink/$intSecs, $nfs2SMkdir/$intSecs, $nfs2SRmdir/$intSecs, $nfs2SFsstat/$intSecs); } if ($nfs3CFlag) { $nfsPlot.=sprintf("$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS", $nfs3CRead/$intSecs, $nfs3CWrite/$intSecs, $nfs3CCommit/$intSecs, $nfs3CLookup/$intSecs, $nfs3CAccess/$intSecs, $nfs3CGetattr/$intSecs, $nfs3CSetattr/$intSecs, $nfs3CReaddir/$intSecs, $nfs3CCreate/$intSecs, $nfs3CRemove/$intSecs, $nfs3CRename/$intSecs, $nfs3CLink/$intSecs); 
$nfsPlot.=sprintf("$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS", $nfs3CReadlink/$intSecs, $nfs3CNull/$intSecs, $nfs3CSymlink/$intSecs, $nfs3CMkdir/$intSecs, $nfs3CRmdir/$intSecs, $nfs3CFsstat/$intSecs, $nfs3CFsinfo/$intSecs, $nfs3CPathconf/$intSecs, $nfs3CMknod/$intSecs, $nfs3CReaddirplus/$intSecs); } if ($nfs3SFlag) { $nfsPlot.=sprintf("$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS", $nfs3SRead/$intSecs, $nfs3SWrite/$intSecs, $nfs3SCommit/$intSecs, $nfs3SLookup/$intSecs, $nfs3SAccess/$intSecs, $nfs3SGetattr/$intSecs, $nfs3SSetattr/$intSecs, $nfs3SReaddir/$intSecs, $nfs3SCreate/$intSecs, $nfs3SRemove/$intSecs, $nfs3SRename/$intSecs, $nfs3SLink/$intSecs); $nfsPlot.=sprintf("$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS", $nfs3SReadlink/$intSecs, $nfs3SNull/$intSecs, $nfs3SSymlink/$intSecs, $nfs3SMkdir/$intSecs, $nfs3SRmdir/$intSecs, $nfs3SFsstat/$intSecs, $nfs3SFsinfo/$intSecs, $nfs3SPathconf/$intSecs, $nfs3SMknod/$intSecs, $nfs3SReaddirplus/$intSecs); } if ($nfs4CFlag) { $nfsPlot.=sprintf("$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS", $nfs4CRead/$intSecs, $nfs4CWrite/$intSecs, $nfs4CCommit/$intSecs, $nfs4CLookup/$intSecs, $nfs4CAccess/$intSecs, $nfs4CGetattr/$intSecs, $nfs4CSetattr/$intSecs, $nfs4CReaddir/$intSecs, $nfs4CCreate/$intSecs, $nfs4CRemove/$intSecs, $nfs4CRename/$intSecs, $nfs4CLink/$intSecs); $nfsPlot.=sprintf("$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS", $nfs4CReadlink/$intSecs, $nfs4CNull/$intSecs, $nfs4CSymlink/$intSecs, $nfs4CFsinfo/$intSecs, $nfs4CPathconf/$intSecs); } if ($nfs4SFlag) { $nfsPlot.=sprintf("$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS", $nfs4SRead/$intSecs, $nfs4SWrite/$intSecs, $nfs4SCommit/$intSecs, $nfs4SLookup/$intSecs, $nfs4SAccess/$intSecs, $nfs4SGetattr/$intSecs, $nfs4SSetattr/$intSecs, 
$nfs4SReaddir/$intSecs, $nfs4SCreate/$intSecs, $nfs4SRemove/$intSecs, $nfs4SRename/$intSecs, $nfs4SLink/$intSecs, $nfs4SReadlink/$intSecs); } writeData(0, $datetime, \$nfsPlot, NFS, $ZNFS, 'nfs', \$oneline); } ##################### # N E T F i l e ##################### if ($subsys=~/N/) { $netPlot=''; for (my $i=0; $i<@netOrder; $i++) { # remember the order includes the speed $netName=$netOrder[$i]; $netName=~s/:.*//; next if ($netFiltKeep eq '' && $netName=~/$netFiltIgnore/) || ($netFiltKeep ne '' && $netName!~/$netFiltKeep/); # remember 'err' is a single error counter and 'errs' is the total of those counters # we also have to be sure to preseve network order if (defined($networks{$netName})) { my $i=$networks{$netName}; $netPlot.=sprintf("$SEP%s$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS", $netName, $netRxPkt[$i]/$intSecs, $netTxPkt[$i]/$intSecs, $netRxKB[$i]/$intSecs, $netTxKB[$i]/$intSecs, $netRxErr[$i]/$intSecs, $netRxDrp[$i]/$intSecs, $netRxFifo[$i]/$intSecs,$netRxFra[$i]/$intSecs, $netRxCmp[$i]/$intSecs, $netRxMlt[$i]/$intSecs, $netTxErr[$i]/$intSecs, $netTxDrp[$i]/$intSecs, $netTxFifo[$i]/$intSecs,$netTxColl[$i]/$intSecs, $netTxCar[$i]/$intSecs, $netTxCmp[$i]/$intSecs, $netRxErrs[$i]/$intSecs,$netTxErrs[$i]/$intSecs); $netErrors+=$netRxErrs[$i]+$netTxErrs[$i]; } else { $netPlot.=sprintf("$SEP%s$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS", $netName, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0); } } # since we can't have holes in a line, with --netopts E we print ALL interfaces in the offending interval writeData(0, $datetime, \$netPlot, NET, $ZNET, 'net', \$oneline) if $netOpts!~/E/ || $netErrors; } ############################ # I n t e r c o n n e c t ############################ # INFINIBAND if ($subsys=~/X/ && $NumHCAs) { $ibPlot=''; for ($i=0; $i<$NumHCAs; $i++) 
    {
      $ibPlot.=sprintf("$SEP%d$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS$SEP%$FS",
          $i,
          $ibRx[$i]/$intSecs, $ibTx[$i]/$intSecs,
          $ibRxKB[$i]/$intSecs, $ibTxKB[$i]/$intSecs,
          $ibErrorsTot[$i]);
    }
    writeData(0, $datetime, \$ibPlot, IB, $ZIB, 'ib', \$oneline);
  }

  #######################
  # T C P F i l e
  #######################

  if ($subsys=~/T/)
  {
    # This is going to be big!!!
    $tcpPlot='';
    for my $type ('Ip', 'Tcp', 'Udp', 'Icmp', 'IpExt', 'TcpExt')
    {
      next if $type eq 'Ip'     && $tcpFilt!~/i/;
      next if $type eq 'Tcp'    && $tcpFilt!~/t/;
      next if $type eq 'Udp'    && $tcpFilt!~/u/;
      next if $type eq 'Icmp'   && $tcpFilt!~/c/;
      next if $type eq 'IpExt'  && $tcpFilt!~/I/;
      next if $type eq 'TcpExt' && $tcpFilt!~/T/;

      # unfortunately the data is indexed by header name so we need an extra hop to get to it
      foreach my $header (@{$tcpData{$type}->{hdr}})
      {
        $tcpPlot.=sprintf("$SEP%d", $tcpData{$type}->{$header}/$intSecs);
      }
    }
    writeData(0, $datetime, \$tcpPlot, TCP, $ZTCP, 'tcp', \$oneline);
  }

  #########################
  # B U D D Y I N F O
  #########################

  if ($subsys=~/B/)
  {
    $budPlot='';
    for (my $i=0; $i<$NumBud; $i++)
    {
      $budPlot.="$SEP$buddyNode[$i]$SEP$buddyZone[$i]";
      for (my $j=0; $j<11; $j++)
      {
        $budPlot.=sprintf("$SEP%d", $buddyInfo[$i][$j]);
      }
    }
    writeData(0, $datetime, \$budPlot, BUD, $ZBUD, 'bud', \$oneline);
  }

  #####################
  # I M P O R T S
  #####################

  # only if detail data
  for (my $i=0; $impDetailFlag && $i<$impNumMods; $i++)
  {
    if ($impOpts[$i]=~/d/)
    {
      $impPlot='';
      &{$impPrintPlot[$i]}(4, \$impPlot);
      if ($impPlot ne '')
      {
        $impDetFlag[$i]++;
        writeData(0, $datetime, \$impPlot, $impText[$i], $impGz[$i], $impKey[$i], \$oneline);
      }
    }
  }

  # F i n a l w r i t e

  # we can't have holes in line so if --netopts E and no errors, do NOT print line to terminal or file.
  return if $netOpts=~/E/ && !$netErrors;

  # This write is necessary to write complete record to terminal or socket.
  # Note if there is a socket error we're returning to the caller anyway;
  writeData(1, $datetime, undef, $LOG, undef, undef, \$oneline)
      if !$logToFileFlag || ($sockFlag && $export eq '');
}

# First and foremost, this is ONLY used to plot data. It will send it to the terminal,
# a socket, a data file or a combination of socket and data file.
# Secondly, we only call after processing a complete subsystem so in the case of
# core ones there's a single call but for detail subsystems one per.
# Therefore, when writing to a file, we write the whole string we're passed, but when
# writing to a terminal or socket, we build up one long string and write it on the
# last call. Since we can write to any combinations we need to handle them all.
sub writeData
{
  my $eolFlag= shift;
  my $datetime=shift;
  my $string=  shift;
  my $file=    shift;
  my $zfile=   shift;
  my $errtxt=  shift;
  my $strall=  shift;

  # The very last call is special so handle it elsewhere
  if (!$eolFlag)
  {
    # If writing to the terminal or a socket, just concatenate
    # the strings together until the last call.
    if (!$logToFileFlag || $sockFlag)
    {
      $$strall.=$$string;
    }

    # However, we might also be writing to a file as well as a socket
    # and so need second test like this.
    if ($logToFileFlag)
    {
      # Since we get called with !$eolFlag with partial lines, we always
      # have a separator at the end of the line, so remove it before write.
      my $localCopy=$$string;
      $localCopy=~s/$SEP$//;

      # Each record gets a timestamp and a newline. In the case of a file
      # header, this will be null and the data will be the header!
      $zfile->gzwrite("$datetime$localCopy\n") or writeError($errtxt, $zfile) if $zFlag;
      print {$file} "$datetime$localCopy\n" if !$zFlag;
    }
    return(1);
  }

  # Final Write!!!
  # Doing these two writes this way will allow writing to the
  # terminal AND a socket if we ever want to.

  # NOTE - in virtually all cases there will be data to write.
  # However when collecting
  # data at different intervals, say with a custom import, there may not be data every
  # time and we don't want to write empty records.
  if ($$strall ne '')
  {
    if (!$sockFlag)
    {
      # final write to terminal
      print "$datetime$$strall\n";
    }

    # write to socket but ONLY if we're not shutting down
    if ($sockFlag && scalar(@sockets) && !$doneFlag)
    {
      # If a data line, preface with timestamp. $strall is a reference, so it
      # must be dereferenced before testing for a leading '#'.
      $$strall="$datetime$$strall" if $$strall!~/^#/;

      # If we're not running a server, make sure each line begins
      # with hostname and write to socket
      $$strall=~s/^(.*)$/$Host $1/mg if !$serverFlag;

      # we need to write to each listening socket, though there are probably rarely
      # more than 1
      $$strall.="\n";
      foreach my $socket (@sockets)
      {
        # $length counts the bytes still to send and $offset where they start,
        # so after a partial write we keep looping until nothing remains
        my $length=length($$strall);
        for (my $offset=0; $length>0;)
        {
          # Note - if there is a socket write error, writeData returns 0, but we're
          # exiting this routine anyway and since '$doneFlag' is hopefully set because
          # of a broken socket, the calling routines should exit cleanly.
          # BUT only log error if in server mode, since normal as client
          my $bytes=syswrite($socket, $$strall, $length, $offset);
          if (!defined($bytes))
          {
            logmsg('E', "Error '$!' writing to socket") if $serverFlag;
            return(0);
          }
          $offset+=$bytes;
          $length-=$bytes;
        }
      }
    }
  }
  return(1);
}

######################################
# T e r m i n a l F o r m a t s
######################################

sub printTerm
{
  local $seconds=shift;
  local $usecs=  shift;

  my ($ss, $mm, $hh, $mday, $mon, $year, $line, $i, $j);

  # if someone wants to look at procs with --home and NOT --top, let them!
print "$clscr" if !$numTop && $homeFlag; # There are a couple of things we want to do in interactive --top mode regardless # of --brief or --verbose if ($numTop && $playback eq '') { print $clscr if !$printTermFirst; if ($printTermFirst) # --brief OR single subsys --verbose { # only on the first pass move the cursor to the correct location for ALL cases if ($subsys!~/^[YZ]+$/) { my $lineNum=$totalCounter+2; $lineNum=$scrollEnd if $lineNum>$scrollEnd; $lineNum=0 if !$sameColsFlag || $detailFlag; printf "%c[%d;H", 27, $lineNum; } # We only want to clear the screen once and print the header once # the first time through and then just overpaint starting with data, # unless of course we have details in which case we always print it. $clscr=$home; $headerRepeat=0 if !$detailFlag; } $printTermFirst=1; } # if we're including date and/or time, do once for whole interval $line=$datetime=''; if ($miniDateFlag || $miniTimeFlag) { ($ss, $mm, $hh, $mday, $mon, $year)=localtime($seconds); $datetime=sprintf("%02d:%02d:%02d", $hh, $mm, $ss); $datetime=sprintf("%02d/%02d %s", $mon+1, $mday, $datetime) if $options=~/d/; $datetime=sprintf("%04d%02d%02d %s", $year+1900, $mon+1, $mday, $datetime) if $options=~/D/; $datetime.=".$usecs" if ($options=~/m/); $datetime.=" "; } ################ # B r i e f ################ if ($briefFlag) { # This always goes to terminal or socket and is never compressed so we don't need # all the options of writeData() [yet]. 
printBrief(); # --top mode requires process data too but only if interactive OR we're in playback # mode and processing a file with process data in it if ($numTop && ($playback eq '' || (($playback{$prefix}->{flags} & 1)==0) || $rawPFlag)) { printTermProc() if $topProcFlag; printTermSlab() if $topSlabFlag; $headerRepeat=-1 && $playback eq ''; # only print header once interactively } return; } ############################ # V e r b o s e ############################ # These interval counters will always match the interval we're about to print $interval1Counter++ if $i1DataFlag; $interval2Counter++ if $i2DataFlag && $interval2Print; $interval3Counter++ if $i3DataFlag && $interval3Print; # we usually want record break separators (with timestamps) except in a few cases which $separatorHeaderPrinted=0; if ($subsys=~/c/) { $i=$NumCpus; if (printHeader()) { printText("\n") if !$homeFlag; printText("# CPU$Hyper SUMMARY (INTR, CTXSW & PROC $rate)$cpuDisabledMsg\n"); printText("#$miniDateTime User Nice Sys Wait IRQ Soft Steal Guest NiceG Idle CPUs Intr Ctxsw Proc RunQ Run Avg1 Avg5 Avg15 RunT BlkT\n"); exit(0) if $showColFlag; } $line=sprintf("$datetime %4d %4d %4d %4d %4d %4d %4d %4d %4d %4d %4d %4s %4s %4d %4d %4d %5.2f %5.2f %5.2f %4d %4d\n", $userP[$i], $niceP[$i], $sysP[$i], $waitP[$i], $irqP[$i], $softP[$i], $stealP[$i], $guestP[$i], $guestNP[$i], $idleP[$i], $cpusEnabled, cvt($intrpt/$intSecs), cvt($ctxt/$intSecs), $proc/$intSecs, $loadQue, $loadRun, $loadAvg1, $loadAvg5, $loadAvg15, $procsRun, $procsBlock); printText($line); } if ($subsys=~/C/) { if (printHeader()) { printText("\n") if !$homeFlag; printText("# SINGLE CPU$Hyper STATISTICS$cpuDisabledMsg\n"); my $intrptText=($subsys=~/j/i) ? 
' Intrpt' : ''; printText("#$miniDateTime Cpu User Nice Sys Wait IRQ Soft Steal Guest NiceG Idle$intrptText\n"); exit(0) if $showColFlag; } # if not recorded and user chose -s C don't print line items if (defined($userP[0])) { for ($i=0; $i<$NumCpus; $i++) { # skip idle CPUs if --cpuopts z specified. I'd rather check for idle==100% but some kernels don't # always increment counts and there are actually idle cpus with values of 0 here. next if $cpuOpts=~/z/ && $userP[$i]+$niceP[$i]+$sysP[$i]+$waitP[$i]+$irqP[$i]+ $softP[$i]+$stealP[$i]==0; # apply filters if specified next if (@cpuFiltKeep && !defined($cpuFiltKeep[$i])) || (@cpuFiltIgnore && defined($cpuFiltIgnore[$i])); $line=sprintf("$datetime %4d %3d %3d %3d %3d %3d %3d %3d %3d %3d %3d", $i, $userP[$i], $niceP[$i], $sysP[$i], $waitP[$i], $irqP[$i], $softP[$i], $stealP[$i], $guestP[$i], $guestNP[$i], $idleP[$i]); $line.=sprintf(" %6d", $intrptTot[$i]/$intSecs) if $subsys=~/j/i; printText("$line\n"); } } } # Only meaningful when Interrupts not combined with -sC if ($subsys=~/j/ && !$CFlag) { # note we skip cpu when filtering if (printHeader()) { printText("\n") if !$homeFlag; printText("# INTERRUPT SUMMARY$cpuDisabledMsg\n"); my $oneline="#$miniDateTime "; for (my $i=0; $i<$NumCpus; $i++) { next if (@cpuFiltKeep && !defined($cpuFiltKeep[$i])) || (@cpuFiltIgnore && defined($cpuFiltIgnore[$i])); my $cpuname=($cpuEnabled[$i]) ? 
"Cpu$i" : "CpuX"; $oneline.=sprintf(" %6s", $cpuname); } printText("$oneline\n"); exit(0) if $showColFlag; } my $oneline="$datetime "; for (my $i=0; $i<$NumCpus; $i++) { next if (@cpuFiltKeep && !defined($cpuFiltKeep[$i])) || (@cpuFiltIgnore && defined($cpuFiltIgnore[$i])); $oneline.=sprintf(" %6d", $intrptTot[$i]); } printText("$oneline\n"); exit(0) if $showColFlag; } if ($subsys=~/J/) { if (printHeader()) { printText("\n") if !$homeFlag; printText("# INTERRUPT DETAILS$cpuDisabledMsg\n"); my $oneline="#$miniDateTime Int "; for (my $i=0; $i<$NumCpus; $i++) { next if (@cpuFiltKeep && !defined($cpuFiltKeep[$i])) || (@cpuFiltIgnore && defined($cpuFiltIgnore[$i])); my $cpuname=($cpuEnabled[$i] || $subsys!~/c/i) ? "Cpu$i" : "CpuX"; $oneline.=sprintf(" %6s", $cpuname); } $oneline.=sprintf(" %-15s %s\n", 'Type', 'Device(s)'); printText($oneline); exit(0) if $showColFlag; } foreach my $key (sort keys %intrptType) { my $linetot=0; my $oneline="$datetime $key "; for (my $i=0; $i<$NumCpus; $i++) { next if $key eq 'ERR' || $key eq 'MIS'; next if (@cpuFiltKeep && !defined($cpuFiltKeep[$i])) || (@cpuFiltIgnore && defined($cpuFiltIgnore[$i])); my $ints=($key=~/^\d/) ? $intrpt[$key]->[$i]/$intSecs : $intrpt{$key}->[$i]/$intSecs; $oneline.=sprintf("%6d ", $ints); $linetot+=$ints; } $oneline.=sprintf(" %s", $intName{$key}) if $key!~/ERR|MIS/; printText("$oneline\n") if $linetot; } } if ($subsys=~/d/) { if (printHeader()) { printText("\n") if !$homeFlag; printText("# DISK SUMMARY ($rate)\n"); printText("#${miniDateTime}KBRead RMerged Reads SizeKB KBWrite WMerged Writes SizeKB\n"); exit(0) if $showColFlag; } $line=sprintf("$datetime %6d %6d %6d %6d %6d %6d %6d %6d\n", $dskReadKBTot/$intSecs, $dskReadMrgTot/$intSecs, $dskReadTot/$intSecs, $dskReadTot ? $dskReadKBTot/$dskReadTot : 0, $dskWriteKBTot/$intSecs, $dskWriteMrgTot/$intSecs, $dskWriteTot/$intSecs, $dskWriteTot ? 
$dskWriteKBTot/$dskWriteTot : 0); printText($line); } if ($subsys=~/D/) { # deal with --dskopts f format here if (!defined($dskhdr1Format)) { if ($dskOpts!~/f/) { $dskhdr1Format="<---------reads---------------><---------writes--------------><--------averages--------> Pct\n"; $dskhdr2Format=" KBytes Merged IOs Size Wait KBytes Merged IOs Size Wait RWSize QLen Wait SvcTim Util\n"; $dskdetFormat="%s%-11s %6d %6d %4s %4s %5d %6d %6d %4s %4s %5d %5d %5d %4d %4d %3d\n"; } else { $dskhdr1Format="<------------reads--------------><-------------writes------------><---------averages----------> Pct\n"; $dskhdr2Format=" KBytes Merged IOs Size Wait KBytes Merged IOs Size Wait RWSize QLen Wait SvcTim Util\n"; $dskdetFormat="%s%-11s %7.1f %6.0f %4s %4s %6.1f %7.1f %6.0f %4s %4s %6.1f %6.1f %6.1f %6.1f %6.1f %5.2f\n"; } } if (printHeader()) { printText("\n") if !$homeFlag; printText("# DISK STATISTICS ($rate)\n"); printText("#$miniFiller $dskhdr1Format"); printText("#${miniDateTime}Name $dskhdr2Format"); exit(0) if $showColFlag; } for (my $o=0; $o<@dskOrder; $o++) { # preserve display order but skip any disks not seen this interval $dskName=$dskOrder[$o]; $i=$disks{$dskName}; next if !defined($i); next if ($dskFiltKeep eq '' && $dskName=~/$dskFiltIgnore/) || ($dskFiltKeep ne '' && $dskName!~/$dskFiltKeep/); # Filter out lines of all zeros when requested next if $dskOpts=~/z/ && ($dskReadKB[$i]+$dskReadMrg[$i]+$dskRead[$i]+ $dskWriteKB[$i]+$dskWriteMrg[$i]+$dskWrite[$i]+ $dskRqst[$i]+$dskQueLen[$i]+$dskWait[$i]+$dskSvcTime[$i]+$dskUtil[$i]==0); # If exception processing in effect, make sure this entry qualities next if $options=~/x/ && $dskRead[$i]/$intSecs<$limIOS && $dskWrite[$i]/$intSecs<$limIOS; $line=sprintf($dskdetFormat, $datetime, $dskName, $dskReadKB[$i]/$intSecs, $dskReadMrg[$i]/$intSecs, cvt($dskRead[$i]/$intSecs), $dskRead[$i] ? 
cvt($dskReadKB[$i]/$dskRead[$i],4,0,1) : 0, $dskWaitR[$i], $dskWriteKB[$i]/$intSecs, $dskWriteMrg[$i]/$intSecs, cvt($dskWrite[$i]/$intSecs), $dskWrite[$i] ? cvt($dskWriteKB[$i]/$dskWrite[$i],4,0,1) : 0, $dskWaitW[$i], $dskRqst[$i], $dskQueLen[$i], $dskWait[$i], $dskSvcTime[$i], $dskUtil[$i]); printText($line); } } if ($subsys=~/f/) { if (printHeader()) { my $temp=($nfsFilt ne '') ? "Filters: $nfsFilt" : ''; printText("\n") if !$homeFlag; printText("# NFS SUMMARY ($rate) $temp\n"); $temp="#$miniFiller"; $temp.="<---------------------------server--------------------------->" if $nfsSFlag; $temp.="<----------------client---------------->" if $nfsCFlag; printText("$temp\n"); $temp="#$miniDateTime"; $temp.=" Reads Writes Meta Comm UDP TCP TCPConn BadAuth BadClnt " if $nfsSFlag; $temp.=" Reads Writes Meta Comm Retrans Authref" if $nfsCFlag; printText("$temp\n"); exit(0) if $showColFlag; } $line=$datetime; $line.=sprintf(" %6s %6s %4s %4s %4s %4s %4s %4s %4s", cvt($nfsSReadsTot/$intSecs,6), cvt($nfsSWritesTot/$intSecs,6), cvt($nfsSMetaTot/$intSecs), cvt($nfsSCommitTot/$intSecs), cvt($nfsUdpTot/$intSecs), cvt($nfsTcpTot/$intSecs), cvt($nfsTcpConnTot/$intSecs), cvt($rpcBadAuthTot/$intSecs), cvt($rpcBadClntTot/$intSecs)) if $nfsSFlag; $line.=sprintf(" %6s %6s %4s %4s %4s %4s", cvt($nfsCReadsTot/$intSecs,6), cvt($nfsCWritesTot/$intSecs,6), cvt($nfsCMetaTot/$intSecs), cvt($nfsCCommitTot/$intSecs), cvt($rpcRetransTot/$intSecs), cvt($rpcCredRefTot/$intSecs)) if $nfsCFlag; $line.="\n"; printText($line); } if ($subsys=~/F/) { if (printHeader()) { printText("\n") if !$homeFlag; printText("# NFS SERVER/CLIENT DETAILS ($rate)\n"); # NOTE - we're not including V2 root/wrcache printText("#${miniDateTime}Type Read Writ Comm Look Accs Gttr Sttr Rdir Cre8 Rmov Rnam Link Rlnk Null Syml Mkdr Rmdr Fsta Finf Path Mknd Rdr+\n"); exit(0) if $showColFlag; } # As an optimization, only show data where the filesystem is actually active but if --nfsopts z, only show # entries with non-zero data. 
Currently only valid value for $nfsOpts is 'z' if ($nfs2CFlag && $nfs2CSeen && ($nfsOpts!~/z/ || $nfs2CRead+$nfs2CWrite+$nfs2CMeta)) { $line =sprintf("$datetime Clt2 %4s %4s %4s %4s %4s %4s %4s %4s %4s %4s %4s %4s %4s %4s %4s %4s %4s %4s\n", cvt($nfs2CRead/$intSecs), cvt($nfs2CWrite/$intSecs), '', cvt($nfs2CLookup/$intSecs), '', cvt($nfs2CGetattr/$intSecs), cvt($nfs2CSetattr/$intSecs), cvt($nfs2CReaddir/$intSecs), cvt($nfs2CCreate/$intSecs), cvt($nfs2CRemove/$intSecs), cvt($nfs2CRename/$intSecs), cvt($nfs2CLink/$intSecs), cvt($nfs2CReadlink/$intSecs),cvt($nfs2CNull/$intSecs), cvt($nfs2CSymlink/$intSecs), cvt($nfs2CMkdir/$intSecs), cvt($nfs2CRmdir/$intSecs), cvt($nfs2CFsstat/$intSecs)); printText($line); } if ($nfs2SFlag && $nfs2SSeen && ($nfsOpts!~/z/ || $nfs2SRead+$nfs2SWrite+$nfs2SMeta)) { $line =sprintf("$datetime Svr2 %4s %4s %4s %4s %4s %4s %4s %4s %4s %4s %4s %4s %4s %4s %4s %4s %4s %4s\n", cvt($nfs2SRead/$intSecs), cvt($nfs2SWrite/$intSecs), '', cvt($nfs2SLookup/$intSecs), '', cvt($nfs2SGetattr/$intSecs), cvt($nfs2SSetattr/$intSecs), cvt($nfs2SReaddir/$intSecs), cvt($nfs2SCreate/$intSecs), cvt($nfs2SRemove/$intSecs), cvt($nfs2SRename/$intSecs), cvt($nfs2SLink/$intSecs), cvt($nfs2SReadlink/$intSecs),cvt($nfs2SNull/$intSecs), cvt($nfs2SSymlink/$intSecs), cvt($nfs2SMkdir/$intSecs), cvt($nfs2SRmdir/$intSecs), cvt($nfs2SFsstat/$intSecs)); printText($line); } if ($nfs3CFlag && $nfs3CSeen && ($nfsOpts!~/z/ || $nfs3CRead+$nfs3CWrite+$nfs3CMeta)) { $line =sprintf("$datetime Clt3 %4s %4s %4s %4s %4s %4s %4s %4s %4s %4s %4s %4s %4s %4s %4s %4s %4s %4s %4s %4s %4s %4s\n", cvt($nfs3CRead/$intSecs), cvt($nfs3CWrite/$intSecs), cvt($nfs3CCommit/$intSecs), cvt($nfs3CLookup/$intSecs), cvt($nfs3CAccess/$intSecs), cvt($nfs3CGetattr/$intSecs), cvt($nfs3CSetattr/$intSecs), cvt($nfs3CReaddir/$intSecs), cvt($nfs3CCreate/$intSecs), cvt($nfs3CRemove/$intSecs), cvt($nfs3CRename/$intSecs), cvt($nfs3CLink/$intSecs), cvt($nfs3CReadlink/$intSecs),cvt($nfs3CNull/$intSecs), 
cvt($nfs3CSymlink/$intSecs), cvt($nfs3CMkdir/$intSecs), cvt($nfs3CRmdir/$intSecs), cvt($nfs3CFsstat/$intSecs), cvt($nfs3CFsinfo/$intSecs), cvt($nfs3CPathconf/$intSecs),cvt($nfs3CMknod/$intSecs), cvt($nfs3CReaddirplus/$intSecs)); printText($line); } if ($nfs3SFlag && $nfs3SSeen && ($nfsOpts!~/z/ || $nfs3SRead+$nfs3SWrite+$nfs3SMeta)) { $line =sprintf("$datetime Svr3 %4s %4s %4s %4s %4s %4s %4s %4s %4s %4s %4s %4s %4s %4s %4s %4s %4s %4s %4s %4s %4s %4s\n", cvt($nfs3SRead/$intSecs), cvt($nfs3SWrite/$intSecs), cvt($nfs3SCommit/$intSecs), cvt($nfs3SLookup/$intSecs), cvt($nfs3SAccess/$intSecs), cvt($nfs3SGetattr/$intSecs), cvt($nfs3SSetattr/$intSecs), cvt($nfs3SReaddir/$intSecs), cvt($nfs3SCreate/$intSecs), cvt($nfs3SRemove/$intSecs), cvt($nfs3SRename/$intSecs), cvt($nfs3SLink/$intSecs), cvt($nfs3SReadlink/$intSecs),cvt($nfs3SNull/$intSecs), cvt($nfs3SSymlink/$intSecs), cvt($nfs3SMkdir/$intSecs), cvt($nfs3SRmdir/$intSecs), cvt($nfs3SFsstat/$intSecs), cvt($nfs3SFsinfo/$intSecs), cvt($nfs3SPathconf/$intSecs),cvt($nfs3SMknod/$intSecs), cvt($nfs3SReaddirplus/$intSecs)); printText($line); } # Not Used: Mkdir Mknod Readdirplus Fsstat Rmdir if ($nfs4CFlag && $nfs4CSeen && ($nfsOpts!~/z/ || $nfs4CRead+$nfs4CWrite+$nfs4CMeta)) { $line =sprintf("$datetime Clt4 %4s %4s %4s %4s %4s %4s %4s %4s %4s %4s %4s %4s %4s %4s %4s %4s %4s %4s %4s %4s\n", cvt($nfs4CRead/$intSecs), cvt($nfs4CWrite/$intSecs), cvt($nfs4CCommit/$intSecs), cvt($nfs4CLookup/$intSecs), cvt($nfs4CAccess/$intSecs), cvt($nfs4CGetattr/$intSecs), cvt($nfs4CSetattr/$intSecs), cvt($nfs4CReaddir/$intSecs), cvt($nfs4CCreate/$intSecs), cvt($nfs4CRemove/$intSecs), cvt($nfs4CRename/$intSecs), cvt($nfs4CLink/$intSecs), cvt($nfs4CReadlink/$intSecs),cvt($nfs4CNull/$intSecs), cvt($nfs4CSymlink/$intSecs), '', '', '', cvt($nfs4CFsinfo/$intSecs), cvt($nfs4CPathconf/$intSecs)); printText($line); } if ($nfs4SFlag && $nfs4SSeen && ($nfsOpts!~/z/ || $nfs4SRead+$nfs4SWrite+$nfs4SMeta)) { # Not Used: Null Pathconf Mkdir Mknod Readdirplus 
Fsinfo Fsstat Symlink Rmdir $line =sprintf("$datetime Svr4 %4s %4s %4s %4s %4s %4s %4s %4s %4s %4s %4s %4s %4s\n", cvt($nfs4SRead/$intSecs), cvt($nfs4SWrite/$intSecs), cvt($nfs4SCommit/$intSecs), cvt($nfs4SLookup/$intSecs), cvt($nfs4SAccess/$intSecs), cvt($nfs4SGetattr/$intSecs), cvt($nfs4SSetattr/$intSecs), cvt($nfs4SReaddir/$intSecs), cvt($nfs4SCreate/$intSecs), cvt($nfs4SRemove/$intSecs), cvt($nfs4SRename/$intSecs), cvt($nfs4SLink/$intSecs), cvt($nfs4SReadlink/$intSecs)); printText($line); } } if ($subsys=~/i/) { if (printHeader()) { printText("\n") if !$homeFlag; printText("# INODE SUMMARY\n"); printText("#${miniFiller} Dentries File Handles Inodes\n"); printText("#${miniDateTime} Number Unused Alloc MaxPct Number\n"); exit(0) if $showColFlag; } $line=sprintf("$datetime %7s %7s %6s %5.2f %6s\n", cvt($dentryNum,7), cvt($dentryUnused,7), cvt($filesAlloc,6), $filesMax ? $filesAlloc*100/$filesMax : 0, cvt($inodeUsed,6)); printText($line); } # This is the normal output for an MDS and only skip if --lustopts D and only D # noting D output (which itself is only for hp-sfs), is handled elsewhere if ($subsys=~/l/ && $reportMdsFlag && $lustOpts ne 'D') { if (printHeader()) { printText("\n") if !$homeFlag; printText("# LUSTRE MDS SUMMARY ($rate)\n"); printText("#${miniDateTime} Getattr GttrLck StatFS Sync Gxattr Sxattr Connect Disconn"); printText(" Reint") if $cfsVersion lt '1.6.5'; printText(" Create Link Setattr Rename Unlink") if $cfsVersion ge '1.6.5'; printText("\n"); exit(0) if $showColFlag; } # Don't report if exception processing in effect and we're below limit # NOTE - exception processing only for versions < 1.6.5 if ($options!~/x/ || $cfsVersion ge '1.6.5' || $lustreMdsReint/$intSecs>=$limLusReints) { $line.=sprintf("$datetime %7d %7d %7d %7d %7d %7d %7d %7d", $lustreMdsGetattr/$intSecs, $lustreMdsGetattrLock/$intSecs, $lustreMdsStatfs/$intSecs, $lustreMdsSync/$intSecs, $lustreMdsGetxattr/$intSecs, $lustreMdsSetxattr/$intSecs, $lustreMdsConnect/$intSecs, 
$lustreMdsDisconnect/$intSecs); if ($cfsVersion lt '1.6.5') { $line.=sprintf(" %5d", $lustreMdsReint/$intSecs); } else { $line.=sprintf(" %6d %6d %7d %6d %6d", $lustreMdsReintCreate/$intSecs, $lustreMdsReintLink/$intSecs, $lustreMdsReintSetattr/$intSecs, $lustreMdsReintRename/$intSecs, $lustreMdsReintUnlink/$intSecs); } } $line.="\n"; printText($line); } # This is the normal output for an OST and only skip if --lustopts D and only D # noting D output (which itself is only for hp-sfs), is handled elsewhere if ($subsys=~/l/ && $reportOstFlag && $lustOpts ne 'D') { if (printHeader()) { printText("\n") if !$homeFlag; printText("# LUSTRE OST SUMMARY ($rate)\n"); if ($lustOpts!~/B/) { printText("#${miniDateTime} KBRead Reads SizeKB KBWrite Writes SizeKB\n"); } else { printText("#${miniFiller}<----------------------reads-------------------------|"); printText("-----------------------writes------------------------->\n"); $temp=''; foreach my $i (@brwBuckets) { $temp.=sprintf(" %3dP", $i); } printText("#${miniDateTime}RdK Rds$temp WrtK Wrts$temp\n"); } exit(0) if $showColFlag; } $line=$datetime; if ($lustOpts!~/B/) { $line.=sprintf(" %7d %6d %6s %7d %6d %6s", $lustreReadKBytesTot/$intSecs, $lustreReadOpsTot/$intSecs, $lustreReadOpsTot ? cvt($lustreReadKBytesTot/$lustreReadOpsTot,6,0,1) : 0, $lustreWriteKBytesTot/$intSecs, $lustreWriteOpsTot/$intSecs, $lustreWriteOpsTot ? 
cvt($lustreWriteKBytesTot/$lustreWriteOpsTot,6,0,1) : 0); } else { $line.=sprintf("%4s %4s", cvt($lustreReadKBytesTot/$intSecs,4,0,1), cvt($lustreReadOpsTot/$intSecs)); for ($i=0; $i<$numBrwBuckets; $i++) { $line.=sprintf(" %4s", cvt($lustreBufReadTot[$i]/$intSecs)); } $line.=sprintf(" %4s %4s", cvt($lustreWriteKBytesTot/$intSecs,4,0,1), cvt($lustreWriteOpsTot/$intSecs)); for ($i=0; $i<$numBrwBuckets; $i++) { $line.=sprintf(" %4s", cvt($lustreBufWriteTot[$i]/$intSecs)); } } $line.="\n"; printText($line); } # NOTE - this only applies to hp-sfs if ($subsys=~/l/ && ($reportMdsFlag || $reportOstFlag) && $lustOpts=~/D/) { if (printHeader()) { printText("\n") if !$homeFlag; printText("# LUSTRE DISK BLOCK LEVEL SUMMARY ($rate)\n#$miniFiller"); $temp=''; # not even room to preceed sizes with r/w's. foreach my $i (@diskBuckets) { #last if $i>$LustreMaxBlkSize; if ($i<1000) { $temp.=sprintf(" %3sK", $i) } else { $temp.=sprintf(" %3dM", $i/1024); } } printText("RdK Rds$temp WrtK Wrts$temp\n"); exit(0) if $showColFlag; } # Now do the data $line=$datetime; $line.=sprintf("%4s %4s", cvt($lusDiskReadBTot[$LusMaxIndex]*0.5/$intSecs), cvt($lusDiskReadsTot[$LusMaxIndex]/$intSecs)); for ($i=0; $i<$LusMaxIndex; $i++) { $line.=sprintf(" %4s", cvt($lusDiskReadsTot[$i]/$intSecs)); } $line.=sprintf(" %4s %4s", cvt($lusDiskWriteBTot[$LusMaxIndex]*0.5/$intSecs), cvt($lusDiskWritesTot[$LusMaxIndex]/$intSecs)); for ($i=0; $i<$LusMaxIndex; $i++) { $line.=sprintf(" %4s", cvt($lusDiskWritesTot[$i]/$intSecs)); } printText("$line\n"); } if ($subsys=~/L/ && $reportOstFlag && ($lustOpts=~/B/ || $lustOpts!~/D/)) { if (printHeader()) { # build ost header, and when no date/time make it even 1 char less. $temp="Ost". ' 'x$OstWidth; $temp=substr($temp, 0, $OstWidth); $temp=substr($temp, 0, $OstWidth-2).' 
' if $miniFiller eq ''; # When doing dates/time shift first field over 1 to the left; $fill1=''; if ($miniFiller ne '') { $fill1=substr($miniDateTime, 0, length($miniFiller)-1); } printText("\n") if !$homeFlag; printText("# LUSTRE FILESYSTEM SINGLE OST STATISTICS ($rate)\n"); if ($lustOpts!~/B/) { printText("#$fill1$temp KBRead Reads SizeKB KBWrite Writes SizeKB\n"); } else { $temp2=''; foreach my $i (@brwBuckets) { $temp2.=sprintf(" %3dP", $i); } printText("#$fill1$temp RdK Rds$temp2 WrtK Wrts$temp2\n"); } exit(0) if $showColFlag; } for ($i=0; $i<$NumOst; $i++) { # If exception processing in effect, make sure this entry qualities next if $options=~/x/ && $lustreReadKBytes[$i]/$intSecs<$limLusKBS && $lustreWriteKBytes[$i]/$intSecs<$limLusKBS; $line=''; if ($lustOpts!~/B/) { $line.=sprintf("$datetime%-${OstWidth}s %7d %6d %6d %7d %6d %6d\n", $lustreOsts[$i], $lustreReadKBytes[$i]/$intSecs, $lustreReadOps[$i]/$intSecs, $lustreReadOps[$i] ? $lustreReadKBytes[$i]/$lustreReadOps[$i] : 0, $lustreWriteKBytes[$i]/$intSecs, $lustreWriteOps[$i]/$intSecs, $lustreWriteOps[$i] ? 
$lustreWriteKBytes[$i]/$lustreWriteOps[$i] : 0); } else { $line.=sprintf("$datetime%-${OstWidth}s %4s %4s", $lustreOsts[$i], cvt($lustreReadKBytes[$i]/$intSecs,4,0,1), cvt($lustreReadOps[$i]/$intSecs)); for ($j=0; $j<$numBrwBuckets; $j++) { $line.=sprintf(" %4s", cvt($lustreBufRead[$i][$j]/$intSecs)); } $line.=sprintf(" %4s %4s", cvt($lustreWriteKBytes[$i]/$intSecs,4,0,1), cvt($lustreWriteOps[$i]/$intSecs)); for ($j=0; $j<$numBrwBuckets; $j++) { $line.=sprintf(" %4s", cvt($lustreBufWrite[$i][$j]/$intSecs)); } $line.="\n"; } printText($line); } } if ($subsys=~/L/ && $lustOpts=~/D/) { if (printHeader()) { printText("\n") if !$homeFlag; printText("# LUSTRE DISK BLOCK LEVEL DETAIL ($rate, units are 512 bytes)\n#$miniFiller"); $temp=''; foreach my $i (@diskBuckets) { #last if $i>$LustreMaxBlkSize; if ($i<1000) { $temp.=sprintf(" %3sK", $i) } else { $temp.=sprintf(" %3dM", $i/1024); } } printText("DISK RdK Rds$temp WrtK Wrts$temp\n"); exit(0) if $showColFlag; } # Now do the data for ($i=0; $i<$NumLusDisks; $i++) { $line=$datetime; $line.=sprintf("%4s %4s %4s", $LusDiskNames[$i], cvt($lusDiskReadB[$i][$LusMaxIndex]*0.5/$intSecs), cvt($lusDiskReads[$i][$LusMaxIndex]/$intSecs)); for ($j=0; $j<$LusMaxIndex; $j++) { $temp=(defined($lusDiskReads[$i][$j])) ? cvt($lusDiskReads[$i][$j]/$intSecs) : 0; $line.=sprintf(" %4s", $temp); } $line.=sprintf(" %4s %4s", cvt($lusDiskWriteB[$i][$LusMaxIndex]*0.5/$intSecs), cvt($lusDiskWrites[$i][$LusMaxIndex]/$intSecs)); for ($j=0; $j<$LusMaxIndex; $j++) { $temp=(defined($lusDiskWrites[$i][$j])) ? cvt($lusDiskWrites[$i][$j]/$intSecs) : 0; $line.=sprintf(" %4s", $temp); } printText("$line\n"); } } # NOTE - there are a number of different types of formats here and we're always going # to include reads/writes with all of them! if ($subsys=~/l/ && $reportCltFlag) { # If time for common header, do it... 
if (printHeader()) { printText("\n") if !$homeFlag; printText("# LUSTRE CLIENT SUMMARY ($rate)"); printText(":") if $lustOpts=~/[BMR]/; printText(" RPC-BUFFERS (pages)") if $lustOpts=~/B/; printText(" METADATA") if $lustOpts=~/M/; printText(" READAHEAD") if $lustOpts=~/R/; printText("\n"); } # If exception processing must be above minimum if ($options!~/x/ || $lustreCltReadKBTot/$intSecs>=$limLusKBS || $lustreCltWriteKBTot/$intSecs>=$limLusKBS) { if ($lustOpts!~/[BMR]/) { printText("#$miniDateTime KBRead Reads SizeKB KBWrite Writes SizeKB\n") if printHeader(); exit(0) if $showColFlag; $line=sprintf("$datetime %7d %6d %6d %7d %6d %6d\n", $lustreCltReadKBTot/$intSecs, $lustreCltReadTot/$intSecs, $lustreCltReadTot ? int($lustreCltReadKBTot/$lustreCltReadTot) : 0, $lustreCltWriteKBTot/$intSecs, $lustreCltWriteTot/$intSecs, $lustreCltWriteTot ? int($lustreCltWriteKBTot/$lustreCltWriteTot) : 0); printText($line); } if ($lustOpts=~/B/) { if (printHeader()) { $temp=''; foreach my $i (@brwBuckets) { $temp.=sprintf(" %3dP", $i); } printText("#${miniDateTime}RdK Rds$temp WrtK Wrts$temp\n"); exit(0) if $showColFlag; } $line="$datetime"; $line.=sprintf("%4s %4s", cvt($lustreCltReadKBTot/$intSecs,4,0,1), cvt($lustreCltReadTot/$intSecs)); for ($i=0; $i<$numBrwBuckets; $i++) { $line.=sprintf(" %4s", cvt($lustreCltRpcReadTot[$i]/$intSecs)); } $line.=sprintf(" %4s %4s", cvt($lustreCltWriteKBTot/$intSecs,4,0,1), cvt($lustreCltWriteTot/$intSecs)); for ($i=0; $i<$numBrwBuckets; $i++) { $line.=sprintf(" %4s", cvt($lustreCltRpcWriteTot[$i]/$intSecs)); } printText("$line\n"); } if ($lustOpts=~/M/) { printText("#$miniDateTime KBRead Reads KBWrite Writes Open Close GAttr SAttr Seek Fsynk DrtHit DrtMis\n") if printHeader(); exit(0) if $showColFlag; $line=sprintf("$datetime %7d %6d %7d %6d %5d %5d %5d %5d %5d %5d %6d %6d\n", $lustreCltReadKBTot/$intSecs, $lustreCltReadTot/$intSecs, $lustreCltWriteKBTot/$intSecs, $lustreCltWriteTot/$intSecs, $lustreCltOpenTot/$intSecs, 
$lustreCltCloseTot/$intSecs, $lustreCltGetattrTot/$intSecs, $lustreCltSetattrTot/$intSecs, $lustreCltSeekTot/$intSecs, $lustreCltFsyncTot/$intSecs, $lustreCltDirtyHitsTot/$intSecs, $lustreCltDirtyMissTot/$intSecs); printText($line); } if ($lustOpts=~/R/) { printText("#$miniDateTime KBRead Reads KBWrite Writes Pend Hits Misses NotCon MisWin FalGrb LckFal Discrd ZFile ZerWin RA2Eof HitMax Wrong\n") if printHeader(); exit(0) if $showColFlag; $line=sprintf("$datetime %7d %6d %7d %6d %5d %5d %6d %6d %6d %6d %6d %6d %6d %6d %6d %6d %6d\n", $lustreCltReadKBTot/$intSecs, $lustreCltReadTot/$intSecs, $lustreCltWriteKBTot/$intSecs, $lustreCltWriteTot/$intSecs, $lustreCltRAPendingTot/$intSecs, $lustreCltRAHitsTot/$intSecs, $lustreCltRAMissesTot/$intSecs, $lustreCltRANotConTot/$intSecs, $lustreCltRAMisWinTot/$intSecs, $lustreCltRAFalGrabTot/$intSecs, $lustreCltRALckFailTot/$intSecs, $lustreCltRAReadDiscTot/$intSecs, $lustreCltRAZeroLenTot/$intSecs, $lustreCltRAZeroWinTot/$intSecs, $lustreCltRA2EofTot/$intSecs, $lustreCltRAHitMaxTot/$intSecs, $lustreCltRAWrongTot/$intSecs); printText($line); } } } # NOTE -- there are 2 levels of details, both with and without --lustopts O if ($subsys=~/L/ && $reportCltFlag) { if (printHeader()) { # we need to build filesystem header, and when no date/time make it even 1 # char less. $temp="Filsys". ' 'x$FSWidth; $temp=substr($temp, 0, $FSWidth); $temp=substr($temp, 0, $FSWidth-2).' 
' if $miniFiller eq ''; # When doing dates/time, we also need to shift first field over 1 to the left; $fill1=''; if ($miniFiller ne '') { $fill1=substr($miniDateTime, 0, length($miniFiller)-1); } printText("\n") if !$homeFlag; printText("# LUSTRE CLIENT DETAIL ($rate)"); printText(":") if $lustOpts=~/[BMR]/; printText(" RPC-BUFFERS (pages)") if $lustOpts=~/B/; printText(" METADATA") if $lustOpts=~/M/; printText(" READAHEAD") if $lustOpts=~/R/; printText("\n"); } if ($lustOpts=~/O/) { # Never for M or R if ($lustOpts!~/B/) { $fill2=' 'x($OstWidth-3); printText("#$fill1$temp Ost$fill2 KBRead Reads SizeKB KBWrite Writes SizeKB\n") if printHeader(); exit(0) if $showColFlag; for ($i=0; $i<$NumLustreCltOsts; $i++) { $line=sprintf("$datetime%-${FSWidth}s %-${OstWidth}s %7d %6d %6d %7d %6d %6d\n", $lustreCltOstFS[$i], $lustreCltOsts[$i], defined($lustreCltLunReadKB[$i]) ? $lustreCltLunReadKB[$i]/$intSecs : 0, $lustreCltLunRead[$i]/$intSecs, (defined($lustreCltLunReadKB[$i]) && $lustreCltLunRead[$i]) ? $lustreCltLunReadKB[$i]/$lustreCltLunRead[$i] : 0, defined($lustreCltLunWriteKB[$i]) ? $lustreCltLunWriteKB[$i]/$intSecs : 0, $lustreCltLunWrite[$i]/$intSecs, (defined($lustreCltLunWriteKB[$i]) && $lustreCltLunWrite[$i]) ? 
$lustreCltLunWriteKB[$i]/$lustreCltLunWrite[$i] : 0); printText($line); } } if ($lustOpts=~/B/) { $fill2=' 'x($OstWidth-3); if (printHeader()) { $temp2=' 'x(length("$fill1$temp Ost$fill2 ")); $temp3=''; foreach my $i (@brwBuckets) { $temp3.=sprintf(" %3dP", $i); } printText("#$fill1$temp Ost$fill2 RdK Rds$temp3 WrtK Wrts$temp3\n"); } for ($clt=0; $clt<$NumLustreCltOsts; $clt++) { $line=sprintf("$datetime%-${FSWidth}s %-${OstWidth}s", $lustreCltOstFS[$clt], $lustreCltOsts[$clt]); $line.=sprintf("%4s %4s", cvt($lustreCltLunReadKB[$clt]/$intSecs,4,0,1), cvt($lustreCltLunRead[$clt]/$intSecs)); for ($i=0; $i<$numBrwBuckets; $i++) { $line.=sprintf(" %4s", cvt($lustreCltRpcRead[$clt][$i]/$intSecs)); } $line.=sprintf(" %4s %4s", cvt($lustreCltLunWriteKB[$clt]/$intSecs,4,0,1), cvt($lustreCltLunWrite[$clt]/$intSecs)); for ($i=0; $i<$numBrwBuckets; $i++) { $line.=sprintf(" %4s", cvt($lustreCltRpcWrite[$clt][$i]/$intSecs)); } printText("$line\n"); } } } else { $commonLine= "#$fill1$temp KBRead Reads SizeKB KBWrite Writes SizeKB"; if ($lustOpts!~/[MR]/) { printText("$commonLine\n") if printHeader(); exit(0) if $showColFlag; for ($i=0; $i<$NumLustreFS; $i++) { $line=sprintf("$datetime%-${FSWidth}s %7d %6d %6d %7d %6d %6d\n", $lustreCltFS[$i], $lustreCltReadKB[$i]/$intSecs, $lustreCltRead[$i]/$intSecs, $lustreCltRead[$i] ? $lustreCltReadKB[$i]/$lustreCltRead[$i] : 0, $lustreCltWriteKB[$i]/$intSecs, $lustreCltWrite[$i]/$intSecs, $lustreCltWrite[$i] ? $lustreCltWriteKB[$i]/$lustreCltWrite[$i] : 0); printText($line); } } if ($lustOpts=~/M/) { printText("$commonLine Open Close GAttr SAttr Seek Fsync DrtHit DrtMis\n") if printHeader(); exit(0) if $showColFlag; { for ($i=0; $i<$NumLustreFS; $i++) { $line=sprintf("$datetime%-${FSWidth}s %7d %6d %6d %7d %6d %6d %5d %5d %5d %5d %5d %5d %6d %6d\n", $lustreCltFS[$i], $lustreCltReadKB[$i]/$intSecs, $lustreCltRead[$i]/$intSecs, $lustreCltRead[$i] ? 
$lustreCltReadKB[$i]/$lustreCltRead[$i] : 0, $lustreCltWriteKB[$i]/$intSecs, $lustreCltWrite[$i]/$intSecs, $lustreCltWrite[$i] ? $lustreCltWriteKB[$i]/$lustreCltWrite[$i] : 0, $lustreCltOpen[$i]/$intSecs, $lustreCltClose[$i]/$intSecs, $lustreCltGetattr[$i]/$intSecs, $lustreCltSetattr[$i]/$intSecs, $lustreCltSeek[$i]/$intSecs, $lustreCltFsync[$i]/$intSecs, $lustreCltDirtyHits[$i]/$intSecs, $lustreCltDirtyMiss[$i]/$intSecs); printText($line); } } } if ($lustOpts=~/R/) { printText("$commonLine Pend Hits Misses NotCon MisWin FalGrb LckFal Discrd ZFile ZerWin RA2Eof HitMax Wrong\n") if printHeader(); exit(0) if $showColFlag; { for ($i=0; $i<$NumLustreFS; $i++) { $line=sprintf("$datetime%-${FSWidth}s %7d %6d %6d %7d %6d %6d %5d %5d %6d %6d %6d %6d %6d %6d %6d %6d %6d %6d %6d\n", $lustreCltFS[$i], $lustreCltReadKBTot/$intSecs, $lustreCltReadTot/$intSecs, $lustreCltRead[$i] ? $lustreCltReadKB[$i]/$lustreCltRead[$i] : 0, $lustreCltWriteKBTot/$intSecs, $lustreCltWriteTot/$intSecs, $lustreCltWrite[$i] ? $lustreCltWriteKB[$i]/$lustreCltWrite[$i] : 0, $lustreCltRAPendingTot/$intSecs, $lustreCltRAHitsTot/$intSecs, $lustreCltRAMissesTot/$intSecs, $lustreCltRANotConTot/$intSecs, $lustreCltRAMisWinTot/$intSecs, $lustreCltRAFalGrabTot/$intSecs, $lustreCltRALckFailTot/$intSecs, $lustreCltRAReadDiscTot/$intSecs, $lustreCltRAZeroLenTot/$intSecs, $lustreCltRAZeroWinTot/$intSecs, $lustreCltRA2EofTot/$intSecs, $lustreCltRAHitMaxTot/$intSecs, $lustreCltRAWrongTot/$intSecs); printText($line); } } } } } if ($subsys=~/m/) { if (printHeader()) { # Note that sar does page sizes in numbers of pages, not bytes printText("\n") if !$homeFlag; my $type=($memOpts!~/R/) ? 
'' : ' changes/int'; printText("# MEMORY SUMMARY$type\n"); if ($memOpts!~/R/) { $line="#$miniFiller"; $line.="<------------------------------------Physical Memory------------------------------------------>" if $memOpts eq '' || $memOpts=~/P/; $line.="<-----------Swap------------><-------Paging------>" if $memOpts eq '' || $memOpts=~/V/; $line.="<---Other---|-------Page Alloc------|------Page Refill----->" if $memOpts=~/p/; $line.="<------Page Steal-------|-------Scan KSwap------|------Scan Direct----->" if $memOpts=~/s/; printText("$line\n"); $line="#$miniFiller"; $line.=" Total Used Free Buff Cached Slab Mapped Anon AnonH Commit Locked Inact" if $memOpts eq '' || $memOpts=~/P/; $line.=" Total Used Free In Out Fault MajFt In Out" if $memOpts eq '' || $memOpts=~/V/; $line.=" Free Activ Dma Dma32 Norm Move Dma Dma32 Norm Move" if $memOpts=~/p/; $line.=" Dma Dma32 Norm Move Dma Dma32 Norm Move Dma Dma32 Norm Move" if $memOpts=~/s/; printText("$line\n"); } else { $line=sprintf("#$miniFiller<---------------------------------------Physical Memory-----------------------------------------------><------------Swap-------------><-------Paging------>\n"); printText($line); printText("#$miniDateTime Total Used Free Buff Cached Slab Mapped Anon AnonH Commit Locked Inact Total Used Free In Out Fault MajFt In Out\n"); } exit(0) if $showColFlag; } if ($memOpts!~/R/) { $line="$datetime "; $line.=sprintf(" %7s %7s %7s %7s %7s %7s %7s %7s %7s %7s %7s %5s", cvt($memTot,7,1,1), cvt($memUsed,7,1,1), cvt($memFree,7,1,1), cvt($memBuf,7,1,1), cvt($memCached,7,1,1), cvt($memSlab,7,1,1), cvt($memMap,7,1,1), cvt($memAnon,7,1,1), cvt($memAnonH,7,1,1), cvt($memCommit,7,1,1), cvt($memLocked,7,1,1), cvt($memInact,5,1,1)) if $memOpts eq '' || $memOpts=~/P/; $line.=sprintf(" %5s %5s %5s %4s %4s %5s %5s %4s %4s", cvt($swapTotal,5,1,1), cvt($swapUsed,5,1,1), cvt($swapFree,5,1,1), cvt($swapin/$intSecs,5,1,1), cvt($swapout/$intSecs,5,1,1), cvt($pagefault/$intSecs,5), cvt($pagemajfault/$intSecs,5), 
cvt($pagein/$intSecs,4), cvt($pageout/$intSecs,4)) if $memOpts eq '' || $memOpts=~/V/; $line.=sprintf(" %5s %5s %5s %5s %5s %5s %5s %5s %5s %5s", cvt($pageFree,5), cvt($pageActivate,5), cvt($pageAllocDma,5), cvt($pageAllocDma32,5), cvt($pageAllocNormal,5), cvt($pageAllocMove,5), cvt($pageRefillDma,5), cvt($pageRefillDma32,5), cvt($pageRefillNormal,5), cvt($pageRefillMove,5)) if $memOpts=~/p/; $line.=sprintf(" %5s %5s %5s %5s %5s %5s %5s %5s %5s %5s %5s %5s", cvt($pageStealDma,5), cvt($pageStealDma32,5), cvt($pageStealNormal,5), cvt($pageStealMove,5), cvt($pageKSwapDma,5), cvt($pageKSwapDma32,5), cvt($pageKSwapNormal,5), cvt($pageKSwapMove,5), cvt($pageDirectDma,5), cvt($pageDirectDma32,5), cvt($pageDirectNormal,5), cvt($pageDirectMove,5)) if $memOpts=~/s/; $line.="\n"; } else { $line=sprintf("$datetime %7s %8s %8s %8s %8s %7s %7s %7s %7s %7s %7s %6s %5s %6s %6s %4s %4s %5s %5s %4s %4s\n", cvt($memTot/$intSecs,7,1,1), cvt($memUsedC/$intSecs,7,1,1), cvt($memFreeC/$intSecs,7,1,1), cvt($memBufC/$intSecs,7,1,1), cvt($memCachedC/$intSecs,7,1,1), cvt($memSlabC/$intSecs,7,1,1), cvt($memMapC/$intSecs,7,1,1), cvt($memAnonC/$intSecs,7,1,1), cvt($memAnonHC/$intSecs,7,1,1), cvt($memCommitC/$intSecs,7,1,1), cvt($memLockedC/$intSecs,7,1,1), cvt($memInactC/$intSecs,5,1,1), cvt($swapTotal,5,1,1), cvt($swapUsedC/$intSecs,5,1,1), cvt($swapFreeC/$intSecs,5,1,1), cvt($swapin/$intSecs,5,1,1), cvt($swapout/$intSecs,5,1,1), cvt($pagefault/$intSecs,5), cvt($pagemajfault/$intSecs,5), cvt($pagein/$intSecs,4), cvt($pageout/$intSecs,4)); } printText($line); } if ($subsys=~/M/) { if (printHeader()) { printText("\n") if !$homeFlag; my $type=($memOpts!~/R/) ? '' : " change$type"; printText("# MEMORY STATISTICS $type\n"); { # we've got the room so let's use an extra column for each and have the same # headers for 'R' and because I'm lazy. 
printText("#$miniFiller Node Total Used Free Slab Mapped Anon AnonH Locked Inact"); printText(" HitPct") if $memOpts!~/R/; printText("\n"); } exit(0) if $showColFlag; } $line=''; for (my $i=0; $i<$CpuNodes; $i++) { if ($memOpts!~/R/) { # total hits can be 0 if no data collected my $hitsplusmisses=$numaStat[$i]->{hits}+$numaStat[$i]->{for}+$numaStat[$i]->{miss}; my $hitrate=($hitsplusmisses) ? $numaStat[$i]->{hits}/$hitsplusmisses*100 : 100; $line.=sprintf("$datetime %4d %8s %8s %8s %8s %8s %8s %8s %8s %8s %6.2f\n", $i, cvt($numaMem[$i]->{used}+$numaMem[$i]->{free},7,1,1), cvt($numaMem[$i]->{used},7,1,1), cvt($numaMem[$i]->{free},7,1,1), cvt($numaMem[$i]->{slab},7,1,1), cvt($numaMem[$i]->{map},7,1,1), cvt($numaMem[$i]->{anon},7,1,1), cvt($numaMem[$i]->{anonH},7,1,1), cvt($numaMem[$i]->{lock},7,1,1), cvt($numaMem[$i]->{inact},7,1,1), $hitrate); } else { $line.=sprintf("$datetime %4d %8s %8s %8s %8s %8s %8s %8s %8s\n", $i, cvt($numaMem[$i]->{usedC}+$numaMem[$i]->{freeC},7,1,1), cvt($numaMem[$i]->{usedC},7,1,1), cvt($numaMem[$i]->{freeC},7,1,1), cvt($numaMem[$i]->{slabC},7,1,1), cvt($numaMem[$i]->{mapC},7,1,1), cvt($numaMem[$i]->{anonC},7,1,1), cvt($numaMem[$i]->{anonH},7,1,1), cvt($numaMem[$i]->{lockC},7,1,1), cvt($numaMem[$i]->{inactC},7,1,1)); } } printText($line); } if ($subsys=~/b/) { if (printHeader()) { my $k=$PageSize/1024; my $headers=''; for (my $i=0; $i<11; $i++) { my $header=sprintf("%dPg%s", 2**$i, $i==0 ? '': 's'); $headers.=sprintf("%8s", $header); } printText("\n") if !$homeFlag; printText("# MEMORY FRAGMENTATION SUMMARY (${k}K pages)\n"); printText("#${miniDateTime}$headers\n"); exit(0) if $showColFlag; } my $line="$datetime "; for (my $i=0; $i<11; $i++) { $line.=sprintf("%8d", $buddyInfoTot[$i]); } printText("$line\n"); } if ($subsys=~/B/) { if (printHeader()) { my $k=$PageSize/1024; my $headers=''; for (my $i=0; $i<11; $i++) { my $header=sprintf("%dPg%s", 2**$i, $i==0 ? 
'': 's'); $headers.=sprintf("%8s", $header); } printText("\n") if !$homeFlag; printText("# MEMORY FRAGMENTATION (${k}K pages)\n"); printText("#${miniDateTime}Node Zone $headers\n"); exit(0) if $showColFlag; } for (my $i=0; $i<$NumBud; $i++) { my $line="$datetime "; $line.=sprintf("%4d %6s ", $buddyNode[$i], $buddyZone[$i]); for (my $j=0; $j<11; $j++) { $line.=sprintf("%8d", $buddyInfo[$i][$j]); } printText("$line\n"); } } if ($subsys=~/n/) { my $netErrors=$netRxErrsTot+$netTxErrsTot; if ($netOpts!~/E/ || $netErrors || $showColFlag) { if (printHeader()) { my $errors=($netOpts=~/e/) ? 'ERRORS ' : ''; printText("\n") if !$homeFlag; printText("# NETWORK ${errors}SUMMARY ($rate)\n"); printText("#${miniDateTime} KBIn PktIn SizeIn MultI CmpI ErrsI KBOut PktOut SizeO CmpO ErrsO\n") if $netOpts!~/e/; printText("#${miniDateTime} ErrIn DropIn FifoIn FrameIn ErrOut DropOut FifoOut CollOut CarrOut\n") if $netOpts=~/e/; exit(0) if $showColFlag; } # if --netopts E, only print lines when there are errors # remember 'errs' is the totals of all the rx/tx counters, 'err' is a single counter if ($netOpts!~/e/) { $line=sprintf("$datetime%6d %6d %6d %6d %6d %6d %6d %6d %6d %6d %6d\n", $netRxKBTot/$intSecs, $netRxPktTot/$intSecs, $netRxPktTot ? $netRxKBTot*1024/$netRxPktTot : 0, $netRxMltTot/$intSecs, $netRxCmpTot/$intSecs, $netRxErrsTot/$intSecs, $netTxKBTot/$intSecs, $netTxPktTot/$intSecs, $netTxPktTot ? 
            $netTxKBTot*1024/$netTxPktTot : 0,
            $netTxCmpTot/$intSecs, $netTxErrsTot/$intSecs);
      } else {
        $line=sprintf("$datetime %7d %7d %7d %7d %7d %7d %7d %7d %7d\n",
            $netRxErrTot/$intSecs, $netRxDrpTot/$intSecs, $netRxFifoTot/$intSecs,
            $netRxFraTot/$intSecs, $netTxErrTot/$intSecs, $netTxDrpTot/$intSecs,
            $netTxFifoTot/$intSecs, $netTxCollTot/$intSecs, $netTxCarTot/$intSecs);
      }
      printText($line);
    }
    # When we skip printing an interval when a single subsystem, our header counter
    # is off because it's been incremented, so back it up
    elsif ($subsys eq 'n') {
      $interval1Counter--;
    }
  }

  if ($subsys=~/N/) {
    # NOTE - header processing for detail data has always been ugly so let's not even
    # deal with error exception processing.
    if (printHeader()) {
      my $errors=($netOpts=~/e/) ? 'ERRORS ' : '';
      my $tempName=' 'x($NetWidth-5).'Name';
      printText("\n") if !$homeFlag;
      printText("# NETWORK ${errors}STATISTICS ($rate)\n");
      printText("#${miniDateTime}Num $tempName KBIn PktIn SizeIn MultI CmpI ErrsI KBOut PktOut SizeO CmpO ErrsO\n")
          if $netOpts!~/e/;
      printText("#${miniDateTime}Num $tempName ErrIn DropIn FifoIn FrameIn ErrOut DropOut FifoOut CollOut CarrOut\n")
          if $netOpts=~/e/;
      exit(0) if $showColFlag;
    }

    my $idx=0;
    for ($o=0; $o<@netOrder; $o++) {
      # since we want to print in the discovery order, we need to turn the next
      # name into the right index
      $netName=$netOrder[$o];
      $i=$networks{$netName};
      next if !defined($i);
      next if ($netFiltKeep eq '' && $netName=~/$netFiltIgnore/) ||
              ($netFiltKeep ne '' && $netName!~/$netFiltKeep/);

      my $netErrors=$netRxErrs[$i]+$netTxErrs[$i];
      if ($netOpts!~/e/) {
        $line=sprintf("$datetime %3d %${NetWidth}s %6d %6d %6d %6d %6d %6d %6d %6d %6d %6d %6d\n",
            $idx, $netName,
            $netRxKB[$i]/$intSecs, $netRxPkt[$i]/$intSecs,
            $netRxPkt[$i] ? $netRxKB[$i]*1024/$netRxPkt[$i] : 0,
            $netRxMlt[$i]/$intSecs, $netRxCmp[$i]/$intSecs, $netRxErrs[$i]/$intSecs,
            $netTxKB[$i]/$intSecs, $netTxPkt[$i]/$intSecs,
            $netTxPkt[$i] ?
            $netTxKB[$i]*1024/$netTxPkt[$i] : 0,
            $netTxCmp[$i]/$intSecs, $netTxErrs[$i]/$intSecs);
      } else {
        $line=sprintf("$datetime %3d %${NetWidth}s %7d %7d %7d %7d %7d %7d %7d %7d %7d\n",
            $idx, $netName,
            $netRxErr[$i]/$intSecs, $netRxDrp[$i]/$intSecs, $netRxFifo[$i]/$intSecs,
            $netRxFra[$i]/$intSecs, $netTxErr[$i]/$intSecs, $netTxDrp[$i]/$intSecs,
            $netTxFifo[$i]/$intSecs, $netTxColl[$i]/$intSecs, $netTxCar[$i]/$intSecs);
      }
      printText($line) if $netOpts!~/E/ || $netErrors;
      $idx++;
    }
  }

  if ($subsys=~/s/) {
    if (printHeader()) {
      printText("\n") if !$homeFlag;
      printText("# SOCKET STATISTICS\n");
      printText("#${miniFiller} <-------------Tcp-------------> Udp Raw <---Frag-->\n");
      printText("#${miniDateTime}Used Inuse Orphan Tw Alloc Mem Inuse Inuse Inuse Mem\n");
      exit(0) if $showColFlag;
    }

    $line=sprintf("$datetime%5d %5d %5d %5d %5d %5d %5d %5d %5d %5d\n",
        $sockUsed, $sockTcp, $sockOrphan, $sockTw, $sockAlloc, $sockMem,
        $sockUdp, $sockRaw, $sockFrag, $sockFragM);
    printText($line);
  }

  if ($subsys=~/t/) {
    if (printHeader()) {
      printText("\n") if !$homeFlag;
      printText("# TCP STACK SUMMARY ($rate)\n");
      $line= "#${miniFiller}";
      $line.="<----------------------------------IpPkts----------------------------------->" if $tcpFilt=~/i/;
      $line.="<---------------------------------Tcp--------------------------------->" if $tcpFilt=~/t/;
      $line.="<------------Udp----------->" if $tcpFilt=~/u/;
      $line.="<----------------------------Icmp--------------------------->" if $tcpFilt=~/c/;
      $line.="<-------------------------IpExt------------------------>" if $tcpFilt=~/I/;
      $line.="<-------------------------------------------------TcpExt------------------------------------------------>" if $tcpFilt=~/T/;
      $line.="\n";
      $line.="#$miniFiller";
      $line.=" Receiv Delivr Forwrd DiscdI InvAdd Sent DiscrO ReasRq ReasOK FragOK FragCr" if $tcpFilt=~/i/;
      $line.=" ActOpn PasOpn Failed ResetR Estab SegIn SegOut SegRtn SegBad SegRes" if $tcpFilt=~/t/;
      $line.=" InDgm OutDgm NoPort Errors" if
$tcpFilt=~/u/; $line.=" Recvd FailI UnreI EchoI ReplI Trans FailO UnreO EchoO ReplO" if $tcpFilt=~/c/; $line.=" MPktsI BPktsI OctetI MOctsI BOctsI MPktsI OctetI MOctsI" if $tcpFilt=~/I/; $line.=" FasTim Reject DelAck QikAck PktQue PreQuB HdPdct PurAck HPAcks DsAcks RUData REClos SackS PkLoss FTrans" if $tcpFilt=~/T/; $line.="\n"; printText($line); exit(0) if $showColFlag; } $line="$datetime "; $line.=sprintf(" %6d %6d %6d %6d %6d %6d %6d %6d %6d %6d %6d", $tcpData{Ip}->{InReceives}/$intSecs, $tcpData{Ip}->{InDelivers}/$intSecs, $tcpData{Ip}->{ForwDatagrams}, $tcpData{Ip}->{InDiscards}, $tcpData{Ip}->{InAddrErrors}, $tcpData{Ip}->{OutRequests}/$intSecs, $tcpData{Ip}->{OutDiscards}, $tcpData{Ip}->{ReasmReqds}, $tcpData{Ip}->{ReasmOKs}, $tcpData{Ip}->{FragOKs}, $tcpData{Ip}->{FragCreates}) if $tcpFilt=~/i/; $line.=sprintf(" %6d %6d %6d %6d %6d %6d %6d %6d %6d %6d", $tcpData{Tcp}->{ActiveOpens}/$intSecs, $tcpData{Tcp}->{PassiveOpens}/$intSecs, $tcpData{Tcp}->{AttemptFails}, $tcpData{Tcp}->{EstabResets}, $tcpData{Tcp}->{CurrEstab}, $tcpData{Tcp}->{InSegs}/$intSecs, $tcpData{Tcp}->{OutSegs}/$intSecs, $tcpData{Tcp}->{RetransSegs}, $tcpData{Tcp}->{InErrs}, $tcpData{Tcp}->{OutRsts}) if $tcpFilt=~/t/; $line.=sprintf(" %6d %6d %6d %6d", $tcpData{Udp}->{InDatagrams}/$intSecs, $tcpData{Udp}->{OutDatagrams}/$intSecs, $tcpData{Udp}->{NoPorts}, $tcpData{Udp}->{InErrors}) if $tcpFilt=~/u/; $line.=sprintf(" %5d %5d %5d %5d %5d %5d %5d %5d %5d %5d", $tcpData{Icmp}->{InMsgs}, $tcpData{Icmp}->{InErrors}, $tcpData{Icmp}->{InDestUnreachs}, $tcpData{Icmp}->{InEchos}, $tcpData{Icmp}->{InEchoReps}, $tcpData{Icmp}->{OutMsgs}, $tcpData{Icmp}->{OutErrors}, $tcpData{Icmp}->{OutDestUnreachs}, $tcpData{Icmp}->{OutEchos}, $tcpData{Icmp}->{OutEchoReps}) if $tcpFilt=~/c/; $line.=sprintf(" %6d %6d %6d %6d %6d %6d %6d %6d", $tcpData{IpExt}->{InMcastPkts}, $tcpData{IpExt}->{InBcastPkts}, $tcpData{IpExt}->{InOctets}, $tcpData{IpExt}->{InMcastOctets}, $tcpData{IpExt}->{InBcastOctets}, 
        $tcpData{IpExt}->{OutMcastPkts}, $tcpData{IpExt}->{OutOctets}, $tcpData{IpExt}->{OutMcastOctets})
        if $tcpFilt=~/I/;
    $line.=sprintf(" %6d %6d %6d %6d %6d %6d %6d %6d %6d %6d %6d %6d %6d %6d %6d",
        $tcpData{TcpExt}->{TW}, $tcpData{TcpExt}->{PAWSEstab},
        $tcpData{TcpExt}->{DelayedACKs}, $tcpData{TcpExt}->{DelayedACKLost},
        $tcpData{TcpExt}->{TCPPrequeued}, $tcpData{TcpExt}->{TCPDirectCopyFromPrequeue},
        $tcpData{TcpExt}->{TCPHPHits}, $tcpData{TcpExt}->{TCPPureAcks},
        $tcpData{TcpExt}->{TCPHPAcks}, $tcpData{TcpExt}->{TCPDSACKOldSent},
        $tcpData{TcpExt}->{TCPAbortOnData}, $tcpData{TcpExt}->{TCPAbortOnClose},
        $tcpData{TcpExt}->{TCPSackShiftFallback}, $tcpData{TcpExt}->{TCPLoss},
        $tcpData{TcpExt}->{TCPFastRetrans})
        if $tcpFilt=~/T/;
    $line.="\n";
    printText($line);
  }

  if ($subsys=~/E/ && $interval3Print) {
    if (printHeader()) {
      printText("\n") if !$homeFlag;
      printText("# ENVIRONMENTAL STATISTICS\n");
      $envNewHeader=1;
    }

    my $keyCounter=0;
    foreach $key (sort keys %$ipmiData) {
      next if $key=~/fan/ && $envOpts!~/f/;
      next if $key=~/power/ && $envOpts!~/p/;
      next if $key=~/temp/ && $envOpts!~/t/;

      $keyCounter++;
      if ($keyCounter==1 || $envOpts=~/M/) {
        $envHeader="#$miniDateTime";
        $line="$datetime ";
      }

      for (my $i=0; $i<scalar(@{$ipmiData->{$key}}); $i++) {
        # we only do these when a main header printed
        if ($envNewHeader) {
          my $name=$ipmiData->{$key}->[$i]->{name};
          my $inst=$ipmiData->{$key}->[$i]->{inst};
          $name=sprintf("$name%s", $inst ne '-1' ? $inst : '');
          $envHeader.=sprintf(" %7s", $name);
        }

        # Not sure if I should be reporting 0 but that's why this is experimental!
        my $value= $ipmiData->{$key}->[$i]->{value};
        my $status=$ipmiData->{$key}->[$i]->{status};
        $line.=sprintf(" %7s", ($value ne '') ?
            $value : 0);
      }

      # a multi-line print is done for each unique type (currently just fan & temp)
      if ($envOpts=~/M/) {
        printText("$envHeader\n") if $envNewHeader;
        printText("$line\n");
      }
    }

    # Non-multi-line prints only done once
    if ($envOpts!~/M/) {
      printText("$envHeader\n") if $envNewHeader;
      exit(0) if $showColFlag;
      printText("$line\n");
    }
  }

  if ($subsys=~/x/) {
    if ($NumHCAs) {
      if (printHeader()) {
        printText("\n") if !$homeFlag;
        printText("# INFINIBAND SUMMARY ($rate)\n");
        printText("#${miniDateTime} KBIn PktIn SizeIn KBOut PktOut SizeOut Errors\n");
        exit(0) if $showColFlag;
      }

      $line=sprintf("$datetime%7d %7d %7d %7d %7d %7s %7s\n",
          $ibRxKBTot/$intSecs, $ibRxTot/$intSecs,
          $ibRxTot ? cvt($ibRxKBTot*1024/$ibRxTot,7,0,1) : 0,
          $ibTxKBTot/$intSecs, $ibTxTot/$intSecs,
          $ibTxTot ? cvt($ibTxKBTot*1024/$ibTxTot,7,0,1) : 0,
          $ibErrorsTotTot);
      printText($line);
    }
  }

  if ($subsys=~/X/) {
    if ($NumHCAs) {
      if (printHeader()) {
        printText("\n") if !$homeFlag;
        printText("# INFINIBAND STATISTICS ($rate)\n");
        printText("#${miniDateTime}HCA KBIn PktIn SizeIn KBOut PktOut SizeOut Errors\n");
        exit(0) if $showColFlag;
      }

      for ($i=0; $i<$NumHCAs; $i++) {
        # this is messy. some HCAs end with _ which we don't want to print BUT we
        # need to preserve the full name in the array so do a non-greedy match so
        # we see everything except the optional _ at the end.
        $HCAName[$i]=~/(\S+?)_*$/;
        $line=sprintf("$datetime %-6s %7s %7s %7s %7s %7s %7s %7s\n",
            $1,
            cvt($ibRxKB[$i]/$intSecs,7,0,1), cvt($ibRx[$i]/$intSecs,6),
            $ibRx[$i] ? cvt($ibRxKB[$i]*1024/$ibRx[$i],4,0,1) : 0,
            cvt($ibTxKB[$i]/$intSecs,7,0,1), cvt($ibTx[$i]/$intSecs,6),
            $ibTx[$i] ?
cvt($ibTxKB[$i]*1024/$ibTx[$i],4,0,1) : 0, cvt($ibErrorsTot[$i],4)); printText($line); } } } if ($subsys=~/y/ && $interval2Print) { if ($slabinfoFlag) { if (printHeader()) { printText("\n") if !$homeFlag; printText("# SLAB SUMMARY\n"); printText("#${miniFiller}<------------Objects------------><--------Slab Allocation-------><--Caches--->\n"); printText("#${miniDateTime} InUse Bytes Alloc Bytes InUse Bytes Total Bytes InUse Total\n"); exit(0) if $showColFlag; } $line=sprintf("$datetime %7s %7s %7s %7s %6s %7s %6s %7s %6s %6s\n", cvt($slabObjActTotal,7), cvt($slabObjActTotalB,7,0,1), cvt($slabObjAllTotal,7), cvt($slabObjAllTotalB,7,0,1), cvt($slabSlabActTotal,6), cvt($slabSlabActTotalB,7,0,1), cvt($slabSlabAllTotal,6), cvt($slabSlabAllTotalB,7,0,1), cvt($slabNumAct,6), cvt($slabNumTot,6)); printText($line); } else { if (printHeader()) { printText("\n") if !$homeFlag; printText("# SLAB SUMMARY\n"); printText("#${miniFiller}<---Objects---><-Slabs-><-----memory----->\n"); printText("#${miniDateTime} In Use Avail Number Used Total\n"); exit(0) if $showColFlag; } $line=sprintf("$datetime %7s %7s %7s %7s %7s\n", cvt($slabNumObjTot,7), cvt($slabObjAvailTot,7), cvt($slabNumTot,7), cvt($slabUsedTot,7,0,1), cvt($slabTotalTot,7,0,1)); printText($line); } } # tricky - by definitio --showcolheaders only shows single lines headers, SO if multiple # imports and verbose, you only get the first! for (my $i=0; $i<$impNumMods; $i++) { &{$impPrintVerbose[$i]}(printHeader(), $homeFlag, \$line); printText($line) if $line ne ''; # rare, but it can happen when no instances of a component (screws up colmux!) 
exit(0) if $showColFlag; } # Since slabs/processes both report rates, we need to skip first printable interval # unless we're doing consecutive files printTermSlab() if $subsys=~/Y/ && $interval2Print && (!$firstTime2 || $consecutiveFlag); printTermProc() if $subsys=~/Z/ && $interval2Print && (!$firstTime2 || $consecutiveFlag); # if running with --home in --top mode we might have junk in the rest of the display when # items come and go, which they can when doing things like disk filtering or displaying # processes so clear from the current location to the end of the display and reset $clscr # so we never clear the screen more than once but rather just overwrite what's there # also note in --top vertical mode, $clscr and $home are '' printText($clr) if $homeFlag || ($numTop && $playback eq ''); $clscr=$home; } sub printTermSlab { # the use of $topVertFlag should be consistent with that of printTermProc() but the code # is NOT identical so there are differences. # Much of the top-slab methodology stolen from printTermProc() my %slabSort; my $slabCount=0; my $eol=sprintf("%c[K", 27); printf "%c[%d;H", 27, $scrollEnd ? $scrollEnd+1 : 0 if $numTop && $playback eq '' && !$topVertFlag; # if someone wants to look at slabs with --home and NOT --top, let them! print "$clscr" if !$numTop && $homeFlag && !$topVertFlag; if (printHeader() || $numTop) { if ($numTop) { $temp2=(split(/\s+/,localtime($seconds)))[3]; $temp2.=sprintf(".%03d", $usecs) if $options=~/m/; } printText("\n") if !$homeFlag; my $temp=(!$topSlabFlag) ? 
'SLAB DETAIL' : "TOP SLABS $temp2"; printText("# $temp\n"); if ($topSlabFlag) { print "#NumObj ActObj ObjSize NumSlab Obj/Slab TotSize TotChg TotPct Name\n"; } elsif ($slabinfoFlag) { printText("#${miniFiller} <-----------Objects----------><---------Slab Allocation------><---Change-->\n"); printText("#${miniDateTime}Name InUse Bytes Alloc Bytes InUse Bytes Total Bytes Diff Pct\n"); } else { printText("#${miniFiller} <----------- objects --------><--- slabs ---><---------allocated memory-------->\n"); printText("#${miniDateTime}Slab Name Size /slab In Use Avail SizeK Number UsedK TotalK Change Pct\n"); } exit(0) if $showColFlag; } if ($slabinfoFlag) { for ($i=0; $i<$slabIndexNext; $i++) { if (!$topSlabFlag || $topType eq 'name') { $key=$slabName[$i]; } elsif ($topType eq 'numobj') { $key=sprintf('%9d', 999999999-$slabObjAllTot[$i]); } elsif ($topType eq 'actobj') { $key=sprintf('%9d', 999999999-$slabObjActTot[$i]); } elsif ($topType eq 'objsize') { $key=sprintf('%9d', 999999999-$slabObjSize[$i]); } elsif ($topType eq 'numslab') { $key=sprintf('%9d', 999999999-$slabSlabActTot[$i]); } elsif ($topType eq 'objslab') { $key=sprintf('%9d', 999999999-$slabObjPerSlab[$i]); } elsif ($topType eq 'totsize') { $key=sprintf('%9d', 999999999-$slabSlabAllTotB[$i]); } elsif ($topType eq 'totchg') { $key=sprintf('%9d', 999999999-abs($slabTotMemChg[$i])); } elsif ($topType eq 'totpct') { $key=sprintf('%9d', 999999999-abs($slabTotMemPct[$i])); } $slabSort{"$key-$i"}=$i; # need to include '-$i' to allow duplicates } foreach $key (sort keys %slabSort) { $i=$slabSort{$key}; # the first test is for filtering out zero-size slabs and the # second for slabs that didn't change this during this interval next if (($slabSlabAllTot[$i]==0 && ($topSlabFlag || $slabOpts=~/s/)) || ($slabOpts=~/S/ && $slabSlabAct[$i]==0 && $slabSlabAll[$i]==0)); if ($topSlabFlag) { last if ++$slabCount>$numTop; $line=sprintf("%7s %7s %7s %7s %7s %7s %7s %6.1f %s", cvt($slabObjAllTot[$i],6), cvt($slabObjActTot[$i],6), 
cvt($slabObjSize[$i],6), cvt($slabSlabActTot[$i],6), cvt($slabObjPerSlab[$i],6), cvt($slabSlabAllTotB[$i],4,0,1), cvt($slabTotMemChg[$i],4,0,1),$slabTotMemPct[$i], $slabName[$i]); $line.=$eol if $playback eq '' && $numTop && !$topVertFlag; $line.="\n" if $playback ne '' || !$numTop || $slabCount<$numTop || $topVertFlag; printText($line); next; } $line=sprintf("$datetime%-25s %7s %7s %6s %7s %6s %7s %6s %7s %6s %6.1f\n", substr($slabName[$i],0,25), cvt($slabObjActTot[$i],6), cvt($slabObjActTotB[$i],7,0,1), cvt($slabObjAllTot[$i],6), cvt($slabObjAllTotB[$i],7,0,1), cvt($slabSlabActTot[$i],6), cvt($slabSlabActTotB[$i],7,0,1), cvt($slabSlabAllTot[$i],6), cvt($slabSlabAllTotB[$i],7,0,1), cvt($slabTotMemChg[$i],7,0,1),$slabTotMemPct[$i]); printText($line); } } else { foreach my $first (sort keys %slabfirst) { my $slab=$slabfirst{$first}; if (!$topSlabFlag || $topType eq 'name') { $key=lc($first); # otherwise all upper-case names will come first } elsif ($topType eq 'numobj') { $key=sprintf('%9d', 999999999-$slabdata{$slab}->{slabsize}*$slabdata{$slab}->{avail}); } elsif ($topType eq 'actobj') { $key=sprintf('%9d', 999999999-$slabdata{$slab}->{slabsize}*$slabdata{$slab}->{objects}); } elsif ($topType eq 'objsize') { $key=sprintf('%9d', 999999999-$slabdata{$slab}->{slabsize}); } elsif ($topType eq 'numslab') { $key=sprintf('%9d', 999999999-$slabdata{$slab}->{slabs}); } elsif ($topType eq 'objslab') { $key=sprintf('%9d', 999999999-$slabdata{$slab}->{objper}); } elsif ($topType eq 'totsize') { $key=sprintf('%9d', 999999999-$slabdata{$slab}->{total}); } elsif ($topType eq 'totchg') { $key=sprintf('%9d', 999999999-abs($slabdata{$slab}->{memchg})); } elsif ($topType eq 'totpct') { $key=sprintf('%9d', 999999999-abs($slabdata{$slab}->{mempct})); } $slabSort{"$key-$first"}=$first; # need to include '-$first' to allow duplicates } foreach my $key (sort keys %slabSort) { my $first=$slabSort{$key}; my $slab=$slabfirst{$first}; # as for regular slabs, the first test is for filtering 
out zero-size # slabs and the second for slabs that didn't change this during this interval my $numObjects=$slabdata{$slab}->{objects}; my $numSlabs= $slabdata{$slab}->{slabs}; next if (($slabdata{$slab}->{objects}==0 && ($topSlabFlag || $slabOpts=~/s/)) || ($slabOpts=~/S/ && $slabdata{$slab}->{lastobj}==$numObjects && $slabdata{$slab}->{lastslabs}==$numSlabs)); if ($topSlabFlag) { last if ++$slabCount>$numTop; $line=sprintf("%7s %7s %7s %7s %7s %7s %7s %6.1f %s", cvt($slabdata{$slab}->{slabsize}*$slabdata{$slab}->{avail},6), cvt($slabdata{$slab}->{slabsize}*$numObjects,6), cvt($slabdata{$slab}->{slabsize},6), cvt($numSlabs,6), cvt($slabdata{$slab}->{objper},6), cvt($slabdata{$slab}->{total},4,0,1), cvt($slabdata{$slab}->{memchg},4,0,1), $slabdata{$slab}->{mempct}, $first); $line.=$eol if $playback eq '' && $numTop && !$topVertFlag; $line.="\n" if $playback ne '' || !$numTop || $slabCount<$numTop || $topVertFlag; printText($line); next; } printf "$datetime%-25s %7d %5d %7d %7d %5d %7d %8d %8d %7s %6.1f\n", substr($first,0,25), $slabdata{$slab}->{slabsize}, $slabdata{$slab}->{objper}, $numObjects, $slabdata{$slab}->{avail}, ($PageSize<<$slabdata{$slab}->{order})/1024, $numSlabs, $slabdata{$slab}->{used}/1024, $slabdata{$slab}->{total}/1024, cvt($slabdata{$slab}->{memchg},7,0,1), $slabdata{$slab}->{mempct}; # So we can tell when something changes $slabdata{$slab}->{lastobj}= $numObjects; $slabdata{$slab}->{lastslabs}=$numSlabs; } } } sub printTermProc { # if a process is discovered AFTER we start, this routine gets called called the first # time a process is seen and '$interval2Secs' will be 0! In that one special case # we need to wait for the next interval before printing. return if !$interval2Secs; # if we get here interactively, our cursor has already been set at home, but if # --top and -s also specified ($scrollEnd!=0) we need to move past the scroll area # but only if NOT in vertical mode printf "%c[%d;H", 27, $scrollEnd ? 
$scrollEnd+1 : 0 if $numTop && $playback eq '' && !$topVertFlag; # Never report timestamps in --top format. my $tempFiller=(!$numTop) ? $miniDateTime : ''; my $tempTStamp=(!$numTop) ? $datetime : ''; # shorter name $uw=$procUsrWidth; # Since printHeader() is used by everyone, we need to force header printing for # processes when in top mode since we ALWAYS want them if (printHeader() || $numTop) { printText("\n") if !$homeFlag; $temp1=($procOpts=~/f/) ? "(counters are cumulative)" : "(counters are $rate)"; $temp2=''; if ($numTop) { $temp2= " ".(split(/\s+/,localtime($seconds)))[3]; $temp2.=sprintf(".%03d", $usecs) if $options=~/m/; printText("# TOP PROCESSES sorted by $topType $temp1$temp2\n"); } else { printText("# PROCESS SUMMARY $temp1$temp2$cpuDisabledMsg\n"); } $tempHdr=''; if ($procOpts!~/[im]/) { if ($procOpts!~/R/) { $prHeader='PR'; $prFormat='%2s'; } else { $prHeader='PRIO'; $prFormat='%4d'; } $tempHdr.="#${tempFiller} PID User $prHeader PPID THRD S VSZ RSS CP SysT UsrT Pct AccuTime "; $tempHdr.=sprintf("%s ", $procOpts=~/s/ ? 
'StrtTime' : 'StartTime ') if $procOpts=~/s/i; $tempHdr.=" RKB WKB " if $processIOFlag; $tempHdr.="VCtx NCtx " if $procOpts=~/x/; $tempHdr.="MajF MinF Command\n"; } elsif ($procOpts=~/i/) { $tempHdr.="#${tempFiller} PID User PPID S SysT UsrT Pct AccuTime RKB WKB RKBC WKBC RSys WSys Cncl Command\n"; } elsif ($procOpts=~/m/) { $tempHdr.="#${tempFiller} PID User S VmSize VmLck VmRSS VmData VmStk VmExe VmLib VmSwp MajF MinF Command\n"; } if ($procOpts=~/u/) { $user=sprintf("%-${uw}s", 'User'); $tempHdr=~s/User /$user/; } printText($tempHdr); exit(0) if $showColFlag; } # When doing --top, we sort by time, io or faults my %procSort; my $eol=''; if ($numTop) { # only in non-vertical mode, clear from current position to the end of line since # there could be junk there $eol=sprintf("%c[K", 27) if $playback eq '' && !$topVertFlag; foreach my $pid (keys %procIndexes) { # While I could do this at print time, it's more efficient to not even consider the # during the sort. next if $procState ne '' && $procState[$procIndexes{$pid}]!~/[$procState]/; my $accum=0; my $ipid=$procIndexes{$pid}; if ($topType eq 'vsz') { $accum=defined($procVmSize[$ipid]) ? $procVmSize[$ipid] : 0; } elsif ($topType eq 'rss') { $accum=defined($procVmRSS[$ipid]) ? 
$procVmRSS[$ipid] : 0; } elsif ($topType eq 'pid') { $accum=32767-$pid; # to sort ascending } elsif ($topType eq 'cpu') { $accum=$NumCpus-$procCPU[$ipid]; # to sort ascending } elsif ($topType eq 'syst') { $accum=$procSTime[$ipid]; } elsif ($topType eq 'usrt') { $accum=$procUTime[$ipid]; } elsif ($topType eq 'time') { $accum=$procSTime[$ipid]+$procUTime[$ipid]; } elsif ($topType eq 'accum') { $accum=$procSTimeTot[$ipid]+$procUTimeTot[$ipid]; } elsif ($topType eq 'thread') { $accum=$procTCount[$ipid]; } elsif ($topType eq 'rkb') { $accum=$procRKB[$ipid]; } elsif ($topType eq 'wkb') { $accum=$procWKB[$ipid]; } elsif ($topType eq 'iokb') { $accum=$procRKB[$ipid]+$procWKB[$ipid]; } elsif ($topType eq 'rbkc') { $accum=$procRKBC[$ipid]; } elsif ($topType eq 'wkbc') { $accum=$procWKBC[$ipid]; } elsif ($topType eq 'iokbc') { $accum=$procRKBC[$ipid]+$procWKBC[$ipid]; } elsif ($topType eq 'ioall') { $accum=$procRKB[$ipid]+ $procWKB[$ipid]+ $procRKBC[$ipid]+$procWKBC[$ipid]; } elsif ($topType eq 'rsys') { $accum=$procRSys[$ipid]; } elsif ($topType eq 'wsys') { $accum=$procWSys[$ipid]; } elsif ($topType eq 'iosys') { $accum=$procRSys[$ipid]+$procWSys[$ipid]; } elsif ($topType eq 'iocncl') { $accum=$procCKB[$ipid]; } elsif ($topType eq 'vctx') { $accum=$procVCtx[$ipid]; } elsif ($topType eq 'nctx') { $accum=$procNCtx[$ipid]; } elsif ($topType eq 'minf') { $accum=$procMinFlt[$ipid]; } elsif ($topType eq 'flt') { $accum=$procMajFlt[$ipid]+$procMinFlt[$ipid]; } my $key=sprintf("%09d:%06d", 999999999-$accum, $pid); $procSort{$key}=$pid if $procOpts!~/z/ || $accum!=0; } } # otherwise we print in order of ascending pid else { foreach $pid (keys %procIndexes) { next if $procState ne '' && $procState[$procIndexes{$pid}]!~/[$procState]/; $procSort{sprintf("%06d", $pid)}=$pid; } } my $procCount=0; foreach $key (sort keys %procSort) { # if we had partial data for this pid don't try to print! 
$i=$procIndexes{$procSort{$key}}; #print ">>>SKIP PRINTING DATA for pid $key i: $i" # if (!defined($procSTimeTot[$i])); next if (!defined($procSTimeTot[$i])); last if $numTop && ++$procCount>$numTop; # Handle -oF if ($procOpts=~/f/) { $majFlt=$procMajFltTot[$i]; $minFlt=$procMinFltTot[$i]; } else { $majFlt=$procMajFlt[$i]/$interval2Secs; $minFlt=$procMinFlt[$i]/$interval2Secs; } # If wide mode OR when removing known shells (in which case we need to look at cmd1), # we include the command arguments AND chop trailing spaces ($cmd0, $cmd1)=(defined($procCmd[$i])) ? split(/\s+/,$procCmd[$i],2) : ($procName[$i],''); $cmd1='' if $procOpts!~/[kw]/ || !defined($cmd1); # Since a program CAN modify its definition in /proc/pid/cmdline, it can # end up without a trailing null and ultimately the split below results # in an undefined $cmd1, which is why we need to test/init it if need be if ($cmd1 ne '') { $cmd1=~s/\s+$//; $cmd1=substr($cmd1, 0, $procCmdWidth); } # EXPERIMENTAL # if told to do so, remove some of the known/standard shells from the command string in cmd0; if ($procOpts=~/k/) { if ($cmd0=~m[/bin/sh|/bin/bash|/usr/bin/perl|/bin/python\d*|^python]) { $cmd1=~s/^-\S+\s+//; # remove optional switch some shells have # now move 1st field in $cmd2 to $cmd1, which we no longer need, only keeping # the cmd2 if procopts 'w' is set $cmd1=~s/^(\S+)\s*//; # need '*' in case no args following command $cmd0=$1; } $cmd1='' if $procOpts!~/w/; } # If only keeping the root of the command name, do it after dealing with known # shells in case we want the root of arg1 $cmd0=basename($cmd0) if $procOpts=~/r/ && $cmd0=~/^\//; # This is the standard format if ($procOpts!~/[im]/) { # Note we only started fetching Tgid in V3.0.0 $line=sprintf("$tempTStamp%5d%s %-${uw}s $prFormat %5d %4d %1s %5s %5s %2d %s %s %s %s ", $procPid[$i], $procThread[$i] ? '+' : ' ', substr($procUser[$i],0,$uw), $procPri[$i], defined($procTgid[$i]) && $procTgid[$i]!=$procPid[$i] ? 
$procTgid[$i] : $procPpid[$i], $procTCount[$i], $procState[$i], defined($procVmSize[$i]) ? cvt($procVmSize[$i],4,1,1) : 0, defined($procVmRSS[$i]) ? cvt($procVmRSS[$i],4,1,1) : 0, $procCPU[$i], cvtT1($procSTime[$i]), cvtT1($procUTime[$i]), cvtP($procSTime[$i]+$procUTime[$i]), cvtT2($procSTimeTot[$i]+$procUTimeTot[$i])); $line.=sprintf("%s ", cvtT5($procSTTime[$i])) if $procOpts=~/s/i; $line.=sprintf("%4s %4s ", cvt($procRKB[$i]/$interval2Secs,4,0,1), cvt($procWKB[$i]/$interval2Secs,4,0,1)) if $processIOFlag; $line.=sprintf("%4s %4s ", cvt($procVCtx[$i]/$interval2Secs,4,0,1), cvt($procNCtx[$i]/$interval2Secs,4,0,1)) if $procOpts=~/x/; $line.=sprintf("%4s %4s %s %s", cvt($majFlt), cvt($minFlt), $cmd0, $cmd1); } elsif ($procOpts=~/i/) { $line=sprintf("%s%5d%s %-${uw}s %5d %1s %s %s %3d %s ", $tempTStamp, $procPid[$i], $procThread[$i] ? '+' : ' ', substr($procUser[$i],0,$uw), defined($procTgid[$i]) && $procTgid[$i]!=$procPid[$i] ? $procTgid[$i] : $procPpid[$i], $procState[$i], cvtT1($procSTime[$i]), cvtT1($procUTime[$i]), cvtP($procSTime[$i]+$procUTime[$i]), cvtT2($procSTimeTot[$i]+$procUTimeTot[$i])); $line.=sprintf("%5s %5s %5s %5s %5s %5s %5s %s %s", cvt($procRKB[$i]/$interval2Secs,5,0,1), cvt($procWKB[$i]/$interval2Secs,5,0,1), cvt($procRKBC[$i]/$interval2Secs,5,0,1), cvt($procWKBC[$i]/$interval2Secs,5,0,1), cvt($procRSys[$i]/$interval2Secs,5,0,1), cvt($procWSys[$i]/$interval2Secs,5,0,1), cvt($procCKB[$i]/$interval2Secs,5,0,1), $cmd0, $cmd1); } elsif ($procOpts=~/m/) { $line=sprintf("%s%5d%s %-${uw}s %1s %6s %6s %6s %6s %6s %6s %6s %6s %4s %4s %s %s", $tempTStamp, $procPid[$i], $procThread[$i] ? '+' : ' ', substr($procUser[$i],0,$uw), $procState[$i], defined($procVmSize[$i]) ? cvt($procVmSize[$i],6,1,1) : 0, defined($procVmLck[$i]) ? cvt($procVmLck[$i],6,1,1) : 0, defined($procVmRSS[$i]) ? cvt($procVmRSS[$i],6,1,1) : 0, defined($procVmData[$i]) ? cvt($procVmData[$i],6,1,1) : 0, defined($procVmStk[$i]) ? cvt($procVmStk[$i],6,1,1) : 0, defined($procVmExe[$i]) ? 
cvt($procVmExe[$i],6,1,1) : 0, defined($procVmLib[$i]) ? cvt($procVmLib[$i],6,1,1) : 0, defined($procVmSwap[$i]) ? cvt($procVmSwap[$i],6,1,1) : 0, cvt($majFlt), cvt($minFlt), $cmd0, $cmd1); } $line.=$eol if $playback eq '' && $numTop && !$topVertFlag; $line.="\n" if $playback ne '' || !$numTop || $procCount<$numTop || $topVertFlag; printText($line); } # clear to the end of the display in case doing --procopts z, since the process list # length changes dynamically print $clr if $numTop && $playback eq '' && !$topVertFlag; } # this routine detects and 'fixes' counters that have wrapped # *** warning *** It appears that partition 'use' counters wrap at wordsize/100 # on an ia32 (these are pretty pesky to actually catch). There may be more and # they may behave differently on different architectures (though I tend to doubt # it) so the best we can do is deal with them when we see them. sub fix { my $counter=shift; # if we're a smaller architecture than the number itself, we should still be # ok because perl isn't restricted by word size. if ($counter<0) { my $divisor=shift; my $maxSize=shift; # if param3 exists (rare), we use this as the max counter size; otherwidse 32 bit my $wordsize=defined($maxSize) ? $maxSize : $word32; # only adjust divisor when we're told to do so in param2. my $add=defined($divisor) ? $wordsize/$divisor : $wordsize; $counter+=$add; } return($counter); } # unitCounter 0 -> none, 1 -> K, etc (divide by $divisor this # times) # divisor 0 -> /1000 1 -> /1024 sub cvt { my $field=shift; my $width=shift; my $unitCounter=shift; my $divisorType=shift; $width=4 if !defined($width); $unitCounter=0 if !defined($unitCounter); $divisorType=0 if !defined($divisorType); $negative=0 if !defined($negative); $field=int($field+.5) if $field>0; # round up in case <1 # This is tricky, because if the value fits within the width, we # must also be sure the unit counter is 0 otherwise both may not # fit. 
Naturally in 'wide' mode we always report the complete value # and we never print units with values of 0. return($field) if ($field==0) || ($unitCounter<1 && length($field)<=$width) || $wideFlag; # At least with slabs you can get negative numbers since we're tracking changes my $sign=($field<0) ? -1 : 1; $field=abs($field); my $last=0; my $divisor=($divisorType==0) ? 1000 : $OneKB; while (length($field)>=$width) { $last=$field; $field=int($field/$divisor); $unitCounter++; } $field*=$sign; my $units=substr(" KMGTP", $unitCounter, 1); my $result=(abs($field)>0) ? "$field$units" : "1$units"; # Messy, but I hope reasonably efficient. We're only applying this to # fields >= 'G' and options g/G! Furthermore, for -oG we only reformat # when single digit because no room for 2. if ($units=~/[GTP]/ && $options=~/g/i && (length($field))!=3) { # This one's a mouthful... we need to figure out what the remainder of the # previous division was, by subtracting the field*divisor. Then we need # to round up and pad with leading 0s. Note cases where we've rounded # something like 9.999 which really needs to become 10.000 my $round=($options=~/g/) ? 5 : 50; my $fraction=sprintf("%03d", $last-$field*$divisor+$round); if ($fraction>=$divisor) { $field++; $fraction='000'; } # For 'G' we almost always print the first form but if we just rounded from 9.9 to # 10, we no longer have room for the fraction if ($options=~/G/) { $result=(length($field)==1) ? "$field.".substr($fraction, 0, 1).'G' : "$field$units"; } elsif ($options=~/g/) { # since the fraction follows the 'g', just chop the thing to 4 chars $result=substr("${field}g".$fraction, 0, 4); } } return($result); } # Time Format1 - convert time in jiffies to something ps-esque # Seconds.hsec only (not suitable for longer times such as accumulated cpu) sub cvtT1 { my $jiffies=shift; my $nsFlag= shift; my ($secs, $hsec); # set formatting for minutes according to 'no space' flag $MF=(!$nsFlag) ?
'%2d' : '%d'; $secs=int($jiffies/$HZ); $jiffies=$jiffies-$secs*$HZ; $hsec=$jiffies/$HZ*100; return(sprintf("$MF.%02d", $secs, $hsec)); } # Time Format2 - convert time in jiffies to something ps-esque # we're not doing hours to save a couple of columns sub cvtT2 { my $jiffies=shift; my $nsFlag= shift; my ($hour, $mins, $secs, $time, $hsec); $secs=int($jiffies/$HZ); $jiffies=$jiffies-$secs*$HZ; $hsec=$jiffies/$HZ*100; $mins=int($secs/60); $secs=$secs-$mins*60; $time=($mins<60) ? sprintf("%02d:%02d.%02d", $mins % 60, $secs, $hsec) : sprintf("%02d:%02d:%02d", int($mins/60), $mins % 60, $secs); $time=" $time" if !$nsFlag && length($time)==8; # usually 8, but room for 3 digit mins return($time); } sub cvtT3 { my $secs=shift; $secs/=100; # $secs really is msec my $hours=int($secs/3600); my $mins= int(($secs-$hours*3600)/60); return(sprintf("%d:%02d:%02d", $hours, $mins, $secs-$hours*3600-$mins*60)); } # convert time in seconds to date/time sub cvtT4 { my $seconds=shift; my $msec=($options=~/m/) ? sprintf(".%s", (split(/\./, $seconds))[1]) : ''; my ($ss, $mm, $hh, $mday, $mon, $year)=localtime($seconds); my $date=($options=~/d/) ?
sprintf("%02d/%02d", $mon+1, $mday) : sprintf("%d%02d%02d", $year+1900, $mon+1, $mday); my $time= sprintf("%02d:%02d:%02d%s", $hh, $mm, $ss, $msec); return($date, $time); } sub cvtT5 { my $time=shift; my $realTime=$boottime+$time/100; # time in jiffies my ($ss, $mm, $hh, $day, $mon)=localtime($realTime); my $timestr; if ($procOpts=~/s/) { $timestr=sprintf("%02d:%02d:%02d", $hh, $mm, $ss); } else { my $month=substr("JanFebMarAprMayJunJulAugSepOctNovDec", $mon*3, 3); $timestr=sprintf("%s%02d-%02d:%02d:%02d", $month, $day, $hh, $mm, $ss); } return($timestr); } sub cvtP { my $jiffies=shift; my ($secs, $percent); # when using --from, we sometimes have not set $interval2SecsReal for the # first sample so use i2 which is a good approximation $secs=$jiffies/$HZ; $interval2SecsReal=$interval2 if $interval2SecsReal==0; $percent=sprintf("%3d", 100*$secs/$interval2SecsReal); return($percent); } # Like printInterval, this is also used for terminal/socket output and therefore # not something we need to worry about for logging! sub printText { my $text=shift; my $eol= shift; print $text if !$sockFlag; # just like in writeData, we need to make sure each line preceed # with host name if not in server mode BUT only if not shutting down. if ($sockFlag && scalar(@sockets) && !$doneFlag) { $text=~s/^(.*)$/$Host $1/mg if !$serverFlag; $text.=">>><<<\n" if defined($eol); foreach my $socket (@sockets) { my $length=length($text); for (my $offset=0; $offset<$length;) { # When in client mode this WILL generate an error when the process who # started us terminates. my $bytes=syswrite($socket, $text, $length, $offset); if (!defined($bytes)) { logmsg('E', "Error '$!' 
writing to socket") if $serverFlag; last; } $offset+=$bytes; $length-=$bytes; } } } } # see if time to print header sub printHeader { # It might also be time to print a separator printSeparator($seconds, $usecs) if !$separatorHeaderPrinted; $separatorHeaderPrinted=1; # S p e c i a l C a s e s # Unless we say so explicitly we won't print a header and since we never do so under the # following specific case of --top, let's get it out of the way first. return(0) if $numTop && $headerRepeat==0 && $sameColsFlag; return(1) if $subsys=~/[YZ]/ && $procFilt eq '' && $slabFilt eq '' && $slabOpts!~/S/; return(1) if $numTop && $playback eq ''; return(1) if $headerRepeat==1; # brute force! # S t a n d a r d P r o c e s s i n g # The most common is when different column names and we simply do a new header every # interval or when using --home to look at top-ish output. Not sure why $totalCounter... return(1) if $headerRepeat>-1 && (!$sameColsFlag || $totalCounter==1 || $homeFlag); # Note that in detail mode (and that includes processes/slabs with filters) there's no # real easy way to tell when to redo the header so rather we'll just repeat them every # --hr set of intervals rather than lines. return(1) if ($headerRepeat>0 && ( ($interval1Counter % $headerRepeat)==1 || (($interval2Counter % $headerRepeat)==1 && $interval2Print) || (($interval3Counter % $headerRepeat)==1 && $interval3Print)) ); # do NOT print a header... return(0); } # This routine gets called when it MIGHT be time to print a record separator since we've # not printed one yet and are printing data for a new interval. sub printSeparator { my $seconds=shift; my $usecs= shift; # here's where we decide whether or not we really want the interval headers, but # only if not --full. This is also where all the special cases come in.
if (!$fullFlag) { return if !$numTop && $sameColsFlag && $subsys!~/[YZ]/ && !$homeFlag; return if $numTop && $playback eq '' && !$detailFlag; return if $subsys eq 'Y' && ($slabFilt ne '' || $slabOpts=~/S/); return if $subsys eq 'Z' && $procFilt ne ''; } my $date=localtime($seconds); if ($options=~/m/) { my ($dow, $mon, $day, $time, $year)=split(/ /, $date); $date="$dow $mon $day $time.$usecs $year"; } # Remember that -A with logging never writes to terminals. my $temp=sprintf("%s", $homeFlag ? $clscr : "\n"); # we want to include the types of data being reported in this interval, # when --full but since $interval2Print hasn't been set yet, we need # to find out this way assume everything will print and then remove # procs, slabs and env if not my $which=''; if ($fullFlag) { $which=":$subsys"; $which=~s/[YZ]+//g if $i2DataFlag!=$interval2Print; $which=~s/[E]+//g if $i3DataFlag!=$interval3Print; } $temp.=sprintf("### RECORD %4d >>> $HostLC <<< ($seconds$which) ($date) ###\n", ++$separatorCounter); printText($temp); } sub getHeader { my $file=shift; my ($gzFlag, $header, $TEMP, $line); $gzFlag=$file=~/gz$/ ? 1 : 0; if ($gzFlag) { $TEMP=Compress::Zlib::gzopen($file, "rb") or logmsg("F", "Couldn't open '$file'"); } else { open TEMP, "<$file" or logmsg("F", "Couldn't open '$file'"); } $header=""; while (1) { $TEMP->gzreadline($line) if $gzFlag; $line=<TEMP> if !$gzFlag; last if $line!~/^#/; $header.=$line; } close TEMP; print "*** Header For: $file ***\n$header" if $debug & 16; return($header); } sub incomplete { my $type=shift; my $secs=shift; my $special=shift; my ($seconds, $ss, $mm, $hh, $mday, $mon, $year, $date, $time); $seconds=(split(/\./, $secs))[0]; ($ss, $mm, $hh, $mday, $mon, $year)=localtime($seconds); $date=sprintf("%d%02d%02d", $year+1900, $mon+1, $mday); $time=sprintf("%02d:%02d:%02d", $hh, $mm, $ss); my $message=(!defined($special)) ? "Incomplete" : $special; my $where=($playback eq '') ?
"on $date" : "in $playbackFile"; logmsg("W", "$message data record skipped for $type data $where at $time"); } # Handy for debugging sub getTime { my $seconds=shift; my ($ss, $mm, $hh, $mday, $mon, $year); ($ss, $mm, $hh, $mday, $mon, $year)=localtime($seconds); return(sprintf("%02d:%02d:%02d", $hh, $mm, $ss)); } ######################################## # Brief Mode is VERY Special ######################################## sub printBrief { my ($command, $pad, $i); my $line=''; # We want to track elapsed time. This is only looked at in interactive mode. $miniStart=$seconds if !defined($miniStart) || $miniStart==0; if ( $headerRepeat==1 || ($headerRepeat==0 && !$headersPrinted) || ($headerRepeat>0 && ($totalCounter % $headerRepeat)==1)) { $cpuDisabledMsg=~s/^://; # just in case non-null $pad=' ' x length($miniDateTime); $fill1=($Hyper eq '') ? "----" : ""; $fill2=($Hyper eq '') ? "----" : "-"; $line.="$clscr"; $line.="#$cpuDisabledMsg\n" if $cpuDisabledMsg ne ''; $line.="#$pad"; $line.="<----${fill1}CPU$Hyper$fill2---->" if $subsys=~/c/; if ($subsys=~/j/) { my $numCpus=$NumCpus; # number of cpus to display usually all of them # if doing CPU filtering, we only want those of interest in the header if (@cpuFiltIgnore || @cpuFiltKeep) { $numCpus=0; for (my $i=0; $i<$NumCpus; $i++) { next if (@cpuFiltKeep && !defined($cpuFiltKeep[$i])) || (@cpuFiltIgnore && defined($cpuFiltIgnore[$i])); $numCpus++; } } my $num=int(($numCpus-1)*5/2); my $pad1='-'x$num; my $pad2=$pad1; $line.="<${pad1}Int$pad2->"; } # sooo ugly... my ($tcp1,$tcp2); if ($subsys=~/t/) { $tcp2=''; $tcp2.=' IP ' if $tcpFilt=~/i/; $tcp2.=' Tcp ' if $tcpFilt=~/t/; $tcp2.=' Udp ' if $tcpFilt=~/u/; $tcp2.='Icmp ' if $tcpFilt=~/c/; $tcp2.='TcpX ' if $tcpFilt=~/T/; my $num=int((length($tcp2)-5)/2); my $num2=((length($tcp2) % 2)==0) ? 
$num+1 : $num; my $pre= '-' x $num; my $post='-' x $num2; $tcp1="<${pre}TCP$post>"; $tcp1="" if length($tcp2)==5; } $line.="<--Memory-->" if $subsys!~/m/ && $subsys=~/b/; if ($memOpts!~/R/) { $line.="<-----------Memory----------->" if $subsys=~/m/ && $subsys!~/b/; $line.="<-----------------Memory----------------->" if $subsys=~/m/ && $subsys=~/b/; } else { $line.="<--------------Memory-------------->" if $subsys=~/m/ && $subsys!~/b/; $line.="<--------------------Memory-------------------->" if $subsys=~/m/ && $subsys=~/b/; } $line.="<-----slab---->" if $subsys=~/y/; $line.="<----------Disks----------->" if $subsys=~/d/ && !$ioSizeFlag && $dskOpts!~/i/; $line.="<---------------Disks---------------->" if $subsys=~/d/ && ($ioSizeFlag || $dskOpts=~/i/); $line.="<----------Network---------->" if $subsys=~/n/ && !$ioSizeFlag && $netOpts!~/i/; $line.="<---------------Network--------------->" if $subsys=~/n/ && ($ioSizeFlag || $netOpts=~/i/); $line.=$tcp1 if $subsys=~/t/; $line.="<------Sockets----->" if $subsys=~/s/; $line.="<----Files--->" if $subsys=~/i/; $line.="<-----------InfiniBand----------->" if $subsys=~/x/ && $NumHCAs && (!$ioSizeFlag && $xOpts!~/i/); $line.="<----------------InfiniBand---------------->" if $subsys=~/x/ && $NumHCAs && ($ioSizeFlag || $xOpts=~/i/); # probably a better way to handle iosize too $line=~s/Network/---Network---/ if $netOpts=~/e/; # a bunch of extra work but worth it! if ($subsys=~/f/) { # If all filters specified, no room! 
if ($nfsFilt eq '' || length($nfsFilt)==17) { $line.="<------NFS Totals------>"; } else { my $padL=$padR=int((14-length($nfsFilt))/2); $padL++ if length($nfsFilt) & 1; # handle odd number of -'s $padL='-'x$padL; $padR='-'x$padR; $line.="<$padL-NFS [$nfsFilt]-$padR>"; } } $line.="<--------Lustre MDS-------->" if $subsys=~/l/ && $reportMdsFlag; $line.="<---------Lustre OST--------->" if $subsys=~/l/ && $reportOstFlag && !$ioSizeFlag; $line.="<--------------Lustre OST-------------->" if $subsys=~/l/ && $reportOstFlag && $ioSizeFlag; if ($subsys=~/l/ && $reportCltFlag) { $line.="<--------Lustre Client-------->" if !$ioSizeFlag && $lustOpts!~/R/; $line.="<---------------Lustre Client--------------->" if !$ioSizeFlag && $lustOpts=~/R/; $line.="<-------------Lustre Client------------->" if $ioSizeFlag && $lustOpts!~/R/; $line.="<--------------------Lustre Client-------------------->" if $ioSizeFlag && $lustOpts=~/R/; } for (my $i=0; $i<$impNumMods; $i++) { &{$impPrintBrief[$i]}(1, \$line); } $line.="\n"; $line.="#$miniDateTime"; $line.="cpu sys inter ctxsw " if $subsys=~/c/; if ($subsys=~/j/) { # more ugliness caused by cpu filtering for (my $i=0; $i<$NumCpus; $i++) { next if (@cpuFiltKeep && !defined($cpuFiltKeep[$i])) || (@cpuFiltIgnore && defined($cpuFiltIgnore[$i])); $line.=sprintf("Cpu%d ", $i) if $i<10; $line.=sprintf("Cp%d ", $i) if $i>9 && $i<100; $line.=sprintf("C%d ", $i) if $i>99 && $i<1000; $line.=sprintf("%d ", $i) if $i>999; } # Rare, but if a cpu is offline, change its name in the header if ($cpusDisabled) { for (my $i=0; $i<$NumCpus; $i++) { next if (@cpuFiltKeep && !defined($cpuFiltKeep[$i])) || (@cpuFiltIgnore && defined($cpuFiltIgnore[$i])); $line=~s/Cpu$i/CpuX/ if !$cpuEnabled[$i]; $line=~s/Cp$i/CpXX/ if !$cpuEnabled[$i] && length($i)==2; $line=~s/C$i/CXXX/ if !$cpuEnabled[$i] && length($i)==3; $line=~s/$i/XXXX/ if !$cpuEnabled[$i] && length($i)==4; } } } if ($memOpts!~/R/) { $line.="Free Buff Cach Inac Slab Map " if $subsys=~/m/; } else { $line.=" 
Free Buff Cach Inac Slab Map " if $subsys=~/m/; } $line.=" Fragments " if $subsys=~/b/; $line.=" Alloc Bytes " if $subsys=~/y/ && $slabinfoFlag; $line.=" InUse Total " if $subsys=~/y/ && $slubinfoFlag; $line.="KBRead Reads KBWrit Writes " if $subsys=~/d/ && !$ioSizeFlag && $dskOpts!~/i/; $line.="KBRead Reads Size KBWrit Writes Size " if $subsys=~/d/ && ($ioSizeFlag || $dskOpts=~/i/); $line.=" KBIn PktIn KBOut PktOut " if $subsys=~/n/ && !$ioSizeFlag && $netOpts!~/i/; $line.=" KBIn PktIn Size KBOut PktOut Size " if $subsys=~/n/ && ($ioSizeFlag || $netOpts=~/i/); $line.="Error " if $netOpts=~/e/; $line.=$tcp2 if $subsys=~/t/; $line.=" Tcp Udp Raw Frag " if $subsys=~/s/; $line.="Handle Inodes " if $subsys=~/i/; $line.=" KBIn PktIn KBOut PktOut Errs " if $subsys=~/x/ && $NumHCAs && (!$ioSizeFlag && $xOpts!~/i/); $line.=" KBIn PktIn Size KBOut PktOut Size Errs " if $subsys=~/x/ && $NumHCAs && ($ioSizeFlag || $xOpts=~/i/); $line.=" Reads Writes Meta Comm " if $subsys=~/f/; if ($subsys=~/l/ && $reportMdsFlag) { $line.="Gattr+ Sattr+ Sync "; $line.=($cfsVersion lt '1.6.5') ? 'Reint ' : 'Unlnk '; } $line.=" KBRead Reads KBWrit Writes " if $subsys=~/l/ && $reportOstFlag && !$ioSizeFlag; $line.=" KBRead Reads Size KBWrit Writes Size " if $subsys=~/l/ && $reportOstFlag && $ioSizeFlag; if ($subsys=~/l/ && $reportCltFlag) { $line.=" KBRead Reads KBWrite Writes" if !$ioSizeFlag; $line.=" KBRead Reads Size KBWrite Writes Size" if $ioSizeFlag; $line.=" Hits Misses" if $lustOpts=~/R/; } for (my $i=0; $i<$impNumMods; $i++) { &{$impPrintBrief[$i]}(2, \$line); } $line.="\n"; $headersPrinted=1; if ($showColFlag) { printText($line); exit(0); } } goto statsSummary if $statsFlag && $statOpts!~/i/i; # leading space not needed for date/time $line.=sprintf(' ') if !$miniDateFlag && !$miniTimeFlag; # First part always the same... 
$line.=sprintf("%s ", $datetime) if $miniDateFlag || $miniTimeFlag; if ($subsys=~/c/) { $i=$NumCpus; $sysTot=$sysP[$i]+$irqP[$i]+$softP[$i]+$stealP[$i]; $cpuTot=$userP[$i]+$niceP[$i]+$sysTot; $line.=sprintf("%3d %3d %5s %6s ", $cpuTot, $sysTot, cvt($intrpt/$intSecs,5), cvt($ctxt/$intSecs,6)); } if ($subsys=~/j/) { for (my $i=0; $i<$NumCpus; $i++) { next if (@cpuFiltKeep && !defined($cpuFiltKeep[$i])) || (@cpuFiltIgnore && defined($cpuFiltIgnore[$i])); $line.=sprintf("%4s ", cvt($intrptTot[$i]/$intSecs,4,0,0)); } } if ($subsys=~/m/) { if ($memOpts!~/R/) { $line.=sprintf("%4s %4s %4s %4s %4s %4s ", cvt($memFree,4,1,1), cvt($memBuf,4,1,1), cvt($memCached,4,1,1), cvt($memInact,4,1,1), cvt($memSlab,4,1,1), cvt($memMap+$memAnon,4,1,1)); } else { $line.=sprintf("%5s %5s %5s %5s %5s %5s ", cvt($memFreeC/$intSecs,4,1,1), cvt($memBufC/$intSecs,4,1,1), cvt($memCachedC/$intSecs,4,1,1), cvt($memInactC/$intSecs,4,1,1), cvt($memSlabC/$intSecs,4,1,1), cvt(($memMapC+$memAnonC)/$intSecs,4,1,1)); } } if ($subsys=~/b/) { $line.=sprintf("%s ", base36(@buddyInfoTot)); } if ($subsys=~/y/) { if ($slabinfoFlag) { $line.=sprintf("%6s %7s ", cvt($slabSlabAllTotal,6), cvt($slabSlabAllTotalB,7,0,1)); } else { $line.=sprintf("%6s %7s ", cvt($slabNumObjTot,7), cvt($slabTotalTot,7,0,1)); } } if ($subsys=~/d/) { if (!$ioSizeFlag && $dskOpts!~/i/) { $line.=sprintf("%6s %6s %6s %6s ", cvt($dskReadKBTot/$intSecs,6,0,1), cvt($dskReadTot/$intSecs,6), cvt($dskWriteKBTot/$intSecs,6,0,1), cvt($dskWriteTot/$intSecs,6)); } else { $dskReadSizeTot= ($dskReadTot) ? $dskReadKBTot/$dskReadTot : 0; $dskWriteSizeTot=($dskWriteTot) ? 
$dskWriteKBTot/$dskWriteTot : 0; $line.=sprintf("%6s %6s %4s %6s %6s %4s ", cvt($dskReadKBTot/$intSecs,6,0,1), cvt($dskReadTot/$intSecs,6), cvt($dskReadSizeTot, 4), cvt($dskWriteKBTot/$intSecs,6,0,1), cvt($dskWriteTot/$intSecs,6), cvt($dskWriteSizeTot, 4)); } } # Network always the same my $netErrors=$netRxErrsTot+$netTxErrsTot; if ($subsys=~/n/) { if (!$ioSizeFlag && $netOpts!~/i/) { $line.=sprintf("%6s %6s %6s %6s ", cvt($netRxKBTot/$intSecs,6,0,1), cvt($netRxPktTot/$intSecs,6), cvt($netTxKBTot/$intSecs,6,0,1), cvt($netTxPktTot/$intSecs,6)); } else { $netRxSizeTot=($netRxPktTot) ? $netRxKBTot*1024/$netRxPktTot : 0; $netTxSizeTot=($netTxPktTot) ? $netTxKBTot*1024/$netTxPktTot : 0; $line.=sprintf("%6s %6s %4s %6s %6s %4s ", cvt($netRxKBTot/$intSecs,6,0,1), cvt($netRxPktTot/$intSecs,6), cvt($netRxSizeTot,4,0,1), cvt($netTxKBTot/$intSecs,6,0,1), cvt($netTxPktTot/$intSecs,6), cvt($netTxSizeTot,4,0,1)); } # if --netops E and no errors, don't print ANYTHING!!! $line.=sprintf("%5s ", cvt($netErrors/$intSecs,5)) if $netOpts=~/e/; } # TCP Stack if ($subsys=~/t/) { $line.=sprintf("%4s ", cvt($ipErrors, 4)) if $tcpFilt=~/i/; $line.=sprintf("%4s ", cvt($tcpErrors, 4)) if $tcpFilt=~/t/; $line.=sprintf("%4s ", cvt($udpErrors, 4)) if $tcpFilt=~/u/; $line.=sprintf("%4s ", cvt($icmpErrors, 4)) if $tcpFilt=~/c/; $line.=sprintf("%4s ", cvt($tcpExErrors,4)) if $tcpFilt=~/T/; } if ($subsys=~/s/) { $line.=sprintf("%4s %4s %4s %4s ", cvt($sockUsed,4), cvt($sockUdp,4), cvt($sockRaw,4), cvt($sockFrag,4)); } if ($subsys=~/i/) { $line.=sprintf("%6s %6s ", cvt($filesAlloc, 6), cvt($inodeUsed, 6)); } if ($subsys=~/x/) { if ($NumHCAs) { if (!$ioSizeFlag && $xOpts!~/i/) { $line.=sprintf("%7s %6s %7s %6s %4s ", cvt($ibRxKBTot/$intSecs,7,0,1), cvt($ibRxTot/$intSecs,6), cvt($ibTxKBTot/$intSecs,7,0,1), cvt($ibTxTot/$intSecs,6), cvt($ibErrorsTotTot,4)); } else { $line.=sprintf("%7s %6s %4s %7s %6s %4s %4s ", cvt($ibRxKBTot/$intSecs,7,0,1), cvt($ibRxTot/$intSecs,6), $ibRxTot ? 
cvt($ibRxKBTot*1024/$ibRxTot,4,0,1) : 0, cvt($ibTxKBTot/$intSecs,7,0,1), cvt($ibTxTot/$intSecs,6), $ibTxTot ? cvt($ibTxKBTot*1024/$ibTxTot,4,0,1) : 0, cvt($ibErrorsTotTot,4)); } } } if ($subsys=~/f/) { $line.=sprintf("%6s %6s %4s %4s ", cvt($nfsReadsTot/$intSecs,6), cvt($nfsWritesTot/$intSecs,6), cvt($nfsMetaTot/$intSecs), cvt($nfsCommitTot/$intSecs)); } # MDS if ($subsys=~/l/ && $reportMdsFlag) { my $setattrPlus=$lustreMdsReintSetattr+$lustreMdsSetxattr; my $getattrPlus=$lustreMdsGetattr+$lustreMdsGetattrLock+$lustreMdsGetxattr; my $variableParam=($cfsVersion lt '1.6.5') ? $lustreMdsReint : $lustreMdsReintUnlink; $line.=sprintf("%6s %6s %6s %6s ", cvt($getattrPlus/$intSecs,6), cvt($setattrPlus/$intSecs,6), cvt($lustreMdsSync/$intSecs,6), cvt($variableParam/$intSecs,6)); } # OST if ($subsys=~/l/ && $reportOstFlag) { if (!$ioSizeFlag) { $line.=sprintf("%7s %6s %7s %6s ", cvt($lustreReadKBytesTot/$intSecs,7,0,1), cvt($lustreReadOpsTot/$intSecs,6), cvt($lustreWriteKBytesTot/$intSecs,7,0,1), cvt($lustreWriteOpsTot/$intSecs,6)); } else { $line.=sprintf("%7s %6s %4s %7s %6s %4s ", cvt($lustreReadKBytesTot/$intSecs,7,0,1), cvt($lustreReadOpsTot/$intSecs,6), $lustreReadOpsTot ? cvt($lustreReadKBytesTot/$lustreReadOpsTot,4,0,1) : 0, cvt($lustreWriteKBytesTot/$intSecs,7,0,1), cvt($lustreWriteOpsTot/$intSecs,6), $lustreWriteOpsTot ? cvt($lustreWriteKBytesTot/$lustreWriteOpsTot,4,0,1) : 0); } } #Lustre Client if ($subsys=~/l/ && $reportCltFlag) { if (!$ioSizeFlag) { $line.=sprintf("%7s %6s %7s %6s", cvt($lustreCltReadKBTot/$intSecs,7,0,1), cvt($lustreCltReadTot/$intSecs), cvt($lustreCltWriteKBTot/$intSecs,7,0,1), cvt($lustreCltWriteTot/$intSecs,6)); } else { $line.=sprintf("%7s %6s %4s %7s %6s %4s", cvt($lustreCltReadKBTot/$intSecs,7,0,1), cvt($lustreCltReadTot/$intSecs), $lustreCltReadTot ? cvt($lustreCltReadKBTot/$lustreCltReadTot,4,0,1) : 0, cvt($lustreCltWriteKBTot/$intSecs,7,0,1), cvt($lustreCltWriteTot/$intSecs,6), $lustreCltWriteTot ? 
cvt($lustreCltWriteKBTot/$lustreCltWriteTot,4,0,1) : 0); } # Add in cache hits/misses if --lustopts R $line.=sprintf(" %6d %6d", $lustreCltRAHitsTot, $lustreCltRAMissesTot) if $lustOpts=~/R/; } for (my $i=0; $i<$impNumMods; $i++) { &{$impPrintBrief[$i]}(3, \$line); } $line.="\n"; # S p e c i a l ' h o t ' K e y P r o c e s s i n g # First time through when an attached terminal if ($termFlag && !defined($mini1select)) { $mini1select=new IO::Select(STDIN); resetBriefCounters(); `stty -echo` if !$PcFlag && !$backFlag && $termFlag && $playback eq ''; } # See if user entered a command. If not, @ready will never be # non-zero so the 'if' below will never fire. Also, if we haven't # done one interval, ignore because $miniInstances will be 0 @ready=$mini1select->can_read(0) if $termFlag; if (scalar(@ready)) { $command=<STDIN>; if ($miniInstances) { $resetType='T'; $resetType=$command if $command=~/a|t|z/i; printBriefCounters($resetType); resetBriefCounters() if $resetType=~/Z/i; } } # come here from collectl's ONLY goto statement! statsSummary: # Minor subtlety - we want to print the totals as soon as the hot-key # is entered and so we print the sub-total so far which DOESN'T # include this latest line! Then we count the data. countBriefCounters(); $miniInstances++; # The only time we don't print the line is if it doesn't contain any data, which should only happen # when data was imported at a different interval and played back with -s-all, OR we're only # doing network error reporting and this interval is clean. In that case reset '$totalCounter' # so header printing works correctly. $empty=0; $empty=1 if $import ne '' && $subsys eq '' && length($line) == length($datetime)+1; printText($line) if !$empty && ($netOpts!~/E/ || $netErrors); $totalCounter-- if $netOpts=~/E/ && !$netErrors } sub resetBriefCounters { # talk about a mouthful! 
$miniStart=0; $miniInstances=0; $cpuTOT=$sysPTOT=$intrptTOT=$ctxtTOT=0; $memFreeTOT=$memBufTOT=$memCachedTOT=$memInactTOT=$memSlabTOT=$memMapTOT=0; $memFreeCTOT=$memBufCTOT=$memCachedCTOT=$memInactCTOT=$memSlabCTOT=$memMapCTOT=0; $slabSlabAllTotalTOT=$slabSlabAllTotalBTOT=0; $dskReadKBTOT=$dskReadTOT=$dskWriteKBTOT=$dskWriteTOT=0; $netRxKBTOT=$netRxPktTOT=$netTxKBTOT=$netTxPktTOT=$netErrTOT=0; $tcpIpErrTOT=$tcpIcmpErrTOT=$tcpTcpErrTOT=$tcpUdpErrTOT=$tcpTcpExErrTOT=0; $sockUsedTOT=$sockUdpTOT=$sockRawTOT=$sockFragTOT=0; $filesAllocTOT=$inodeUsedTOT=0; $ibRxKBTOT=$ibRxTOT=$ibTxKBTOT=$ibTxTOT=$ibErrorsTOT=0; $nfsReadsTOT=$nfsWritesTOT=$nfsMetaTOT=$nfsCommitTOT=0; $lustreMdsGetattrPlusTOT=$lustreMdsSetattrPlusTOT=$lustreMdsSyncTOT=0; $lustreMdsReintTOT=$lustreMdsReintUnlinkTOT=0; $lustreReadKBytesTOT=$lustreReadOpsTOT=$lustreWriteKBytesTOT=$lustreWriteOpsTOT=0; $lustreCltReadTOT=$lustreCltReadKBTOT=$lustreCltWriteTOT=$lustreCltWriteKBTOT=0; $lustreCltRAHitsTOT=$lustreCltRAMissesTOT=0; for (my $i=0; $i<$numBrwBuckets; $i++) { $lustreBufReadTOT[$i]=$lustreBufWriteTOT[$i]=0; } for (my $i=0; $i<$NumCpus; $i++) { $intrptTOT[$i]=0; } for (my $i=0; $i<11; $i++) { $buddyInfoTOT[$i]=0; } for (my $i=0; $i<$impNumMods; $i++) { &{$impPrintBrief[$i]}(4); } } sub countBriefCounters { my $i=$NumCpus; $cpuTOT+= $userP[$i]+$niceP[$i]+$sysP[$i]; $sysPTOT+= $sysP[$i]; $intrptTOT+=$intrpt; $ctxtTOT+= $ctxt; for ($i=0; $i<$NumCpus; $i++) { $intrptTOT[$i]+=$intrptTot[$i]; } # the default, it so add up the amount of actual memory used # could have reused the TOT counter names, but let's not if ($memOpts!~/R/) { $memFreeTOT+= $memFree; $memBufTOT+= $memBuf; $memCachedTOT+=$memCached; $memInactTOT+= $memInact; $memSlabTOT+= $memSlab; $memMapTOT+= $memMap+$memAnon; } else # in this case we add up the changes { $memFreeCTOT+= $memFreeC; $memBufCTOT+= $memBufC; $memCachedCTOT+=$memCachedC; $memInactCTOT+= $memInactC; $memSlabCTOT+= $memSlabC; $memMapCTOT+= $memMapC+$memAnonC; } 
$slabSlabAllTotalTOT+= $slabSlabAllTotal; $slabSlabAllTotalBTOT+=$slabSlabAllTotalB; $dskReadKBTOT+= $dskReadKBTot; $dskReadTOT+= $dskReadTot; $dskWriteKBTOT+= $dskWriteKBTot; $dskWriteTOT+= $dskWriteTot; $netRxKBTOT+= $netRxKBTot; $netRxPktTOT+= $netRxPktTot; $netTxKBTOT+= $netTxKBTot; $netTxPktTOT+= $netTxPktTot; $netErrTOT+= $netRxErrsTot+$netTxErrsTot; $tcpIpErrTOT+= $ipErrors; $tcpIcmpErrTOT+= $icmpErrors; $tcpTcpErrTOT+= $tcpErrors; $tcpUdpErrTOT+= $udpErrors; $tcpTcpExErrTOT+=$tcpExErrors; $sockUsedTOT+= $sockUsed; $sockUdpTOT+= $sockUdp; $sockRawTOT+= $sockRaw; $sockFragTOT+= $sockFrag; $filesAllocTOT+= $filesAlloc; $inodeUsedTOT+= $inodeUsed; $ibRxKBTOT+= $ibRxKBTot; $ibRxTOT+= $ibRxTot; $ibTxKBTOT+= $ibTxKBTot; $ibTxTOT+= $ibTxTot; $ibErrorsTOT+= $ibErrorsTotTot; $nfsReadsTOT+= $nfsReadsTot; $nfsWritesTOT+= $nfsWritesTot; $nfsMetaTOT+= $nfsMetaTot; $nfsCommitTOT+= $nfsCommitTot; if ($NumMds) { # Although some apply to versions < 1.6.5, easier to just count everything $lustreMdsGetattrPlusTOT+=$lustreMdsGetattr+$lustreMdsGetattrLock+$lustreMdsGetxattr; $lustreMdsSetattrPlusTOT+=$lustreMdsReintSetattr+$lustreMdsSetxattr; $lustreMdsSyncTOT+= $lustreMdsSync; $lustreMdsReintTOT+= $lustreMdsReint; $lustreMdsReintUnlinkTOT+=$lustreMdsReintUnlink; } if ($NumOst) { $lustreReadKBytesTOT+= $lustreReadKBytesTot; $lustreReadOpsTOT+= $lustreReadOpsTot; $lustreWriteKBytesTOT+=$lustreWriteKBytesTot; $lustreWriteOpsTOT+= $lustreWriteOpsTot; } if ($reportCltFlag) { $lustreCltReadTOT+= $lustreCltReadTot; $lustreCltReadKBTOT+= $lustreCltReadKBTot; $lustreCltWriteTOT+= $lustreCltWriteTot; $lustreCltWriteKBTOT+=$lustreCltWriteKBTot; $lustreCltRAHitsTOT+= $lustreCltRAHitsTot; $lustreCltRAMissesTOT+=$lustreCltRAMissesTot; } if ($NumBud) { for ($i=0; $i<11; $i++) { $buddyInfoTOT[$i]+=$buddyInfoTot[$i]; } } for (my $i=0; $i<$impNumMods; $i++) { &{$impPrintBrief[$i]}(5); } } sub printBriefCounters { my $type=shift; my $i; # For things that totals don't make sense, like CPUs or 
sockets, just do averages all the time # by using the number of instances my $mi=$miniInstances; my $totSecs=$interval; # makes calculation of total easy if ($type=~/a/i) { # Totals are NOT normalized so for averages we need to divide by total seconds. $totSecs=($playback eq '') ? $seconds-$miniStart+$interval : $elapsedSecs; $datetime=' ' x length($datetime) if $statOpts!~/s/i; # when not in summary mode, include date/time stamps } chomp $type; printf "%s", $datetime if $miniDateFlag || $miniTimeFlag; printf "%s", uc($type); printf "%3d %3d %5s %6s ", $cpuTOT/$mi, $sysPTOT/$mi, cvt($intrptTOT/$totSecs,5), cvt($ctxtTOT/$totSecs,6) if $subsys=~/c/; if ($subsys=~/j/) { for (my $i=0; $i<$NumCpus; $i++) { next if (@cpuFiltKeep && !defined($cpuFiltKeep[$i])) || (@cpuFiltIgnore && defined($cpuFiltIgnore[$i])); printf "%4s ", cvt($intrptTOT[$i]/$totSecs,4,0,0); } } if ($subsys=~/m/) { if ($memOpts!~/R/) { printf "%4s %4s %4s %4s %4s %4s ", cvt($memFreeTOT/$mi,4,1,1), cvt($memBufTOT/$mi,4,1,1), cvt($memCachedTOT/$mi,4,1,1), cvt($memInactTOT/$mi,4,1,1), cvt($memSlabTOT/$mi,4,1,1), cvt($memMapTOT/$mi,4,1,1); } else { printf "%5s %5s %5s %5s %5s %5s ", cvt($memFreeCTOT/$mi,4,1,1), cvt($memBufCTOT/$mi,4,1,1), cvt($memCachedCTOT/$mi,4,1,1), cvt($memInactCTOT/$mi,4,1,1), cvt($memSlabCTOT/$mi,4,1,1), cvt($memMapCTOT/$mi,4,1,1); } } # Need to average each field before converting if ($subsys=~/b/) { for ($i=0; $i<11; $i++) { $buddyInfoAVG[$i]=$buddyInfoTOT[$i]/$mi; } printf "%s ", base36(@buddyInfoAVG); } # Will probably never be used again printf "%6s %7s ", cvt($slabSlabAllTotalTOT/$mi,6,0,1), cvt($slabSlabAllTotalBTOT/$mi,7,0,1) if $subsys=~/y/; if ($subsys=~/d/) { if (!$ioSizeFlag && $dskOpts!~/i/) { printf "%6s %6s %6s %6s ", cvt($dskReadKBTOT/$totSecs,6,0,1), cvt($dskReadTOT/$totSecs,6), cvt($dskWriteKBTOT/$totSecs,6,0,1), cvt($dskWriteTOT/$totSecs,6); } else { printf "%6s %6s %4s %6s %6s %4s ", cvt($dskReadKBTOT/$totSecs,6,0,1), cvt($dskReadTOT/$totSecs,6), $dskReadTOT ? 
cvt($dskReadKBTOT/$dskReadTOT,4,0,1) : 0, cvt($dskWriteKBTOT/$totSecs,6,0,1), cvt($dskWriteTOT/$totSecs,6), $dskWriteTOT ? cvt($dskWriteKBTOT/$dskWriteTOT,4,0,1) : 0; } } if ($subsys=~/n/) { if (!$ioSizeFlag && $netOpts!~/i/) { printf "%6s %6s %6s %6s ", cvt($netRxKBTOT/$totSecs,6,0,1), cvt($netRxPktTOT/$totSecs,6), cvt($netTxKBTOT/$totSecs,6,0,1), cvt($netTxPktTOT/$totSecs,6); } else { printf "%6s %6s %4s %6s %6s %4s ", cvt($netRxKBTOT/$totSecs,6,0,1), cvt($netRxPktTOT/$totSecs,6), $netRxPktTOT ? cvt($netRxKBTOT*1024/$netRxPktTOT,4,0,1) : 0, cvt($netTxKBTOT/$totSecs,6,0,1), cvt($netTxPktTOT/$totSecs,6), $netTxPktTOT ? cvt($netTxKBTOT*1024/$netTxPktTOT,4,0,1) : 0; } printf "%5s ", cvt($netErrTOT/$totSecs,5) if $netOpts=~/e/; } if ($subsys=~/t/) { printf "%4s ", cvt($tcpIpErrTOT/$totSecs,4) if $tcpFilt=~/i/; printf "%4s ", cvt($tcpTcpErrTOT/$totSecs,4) if $tcpFilt=~/t/; printf "%4s ", cvt($tcpUdpErrTOT/$totSecs,4) if $tcpFilt=~/u/; printf "%4s ", cvt($tcpIcmpErrTOT/$totSecs,4) if $tcpFilt=~/c/; printf "%4s ", cvt($tcpTcpExErrTOT/$totSecs,4) if $tcpFilt=~/T/; } printf "%4d %4d %4d %4d ", cvt(int($sockUsedTOT/$mi),6), cvt(int($sockUdpTOT/$mi),6), cvt(int($sockRawTOT/$mi),6), cvt(int($sockFragTOT/$mi),6) if $subsys=~/s/; printf "%6s %6s ", cvt($filesAllocTOT/$mi, 6), cvt($inodeUsedTOT/$mi, 6) if $subsys=~/i/; if ($subsys=~/x/ && $NumHCAs) { if (!$ioSizeFlag && $xOpts!~/i/) { printf "%7s %6s %7s %6s %4s ", cvt($ibRxKBTOT/$totSecs,7,0,1), cvt($ibRxTOT/$totSecs,6), cvt($ibTxKBTOT/$totSecs,7,0,1), cvt($ibTxTOT/$totSecs,6), cvt($ibErrorsTOT,4); } else { printf "%7s %6s %4s %7s %6s %4s %4s ", cvt($ibRxKBTOT/$totSecs,7,0,1), cvt($ibRxTOT/$totSecs,6), $ibRxTOT ? cvt($ibRxKBTOT*1024/$ibRxTOT,4,0,1) : 0, cvt($ibTxKBTOT/$totSecs,7,0,1), cvt($ibTxTOT/$totSecs,6), $ibTxTOT ? 
cvt($ibTxKBTOT*1024/$ibTxTOT,4,0,1) : 0, cvt($ibErrorsTOT,4); } } printf "%6s %6s %4s %4s ", cvt($nfsReadsTOT/$totSecs,6), cvt($nfsWritesTOT/$totSecs,6), cvt($nfsMetaTOT/$totSecs), cvt($nfsCommitTOT/$totSecs) if $subsys=~/f/; if ($subsys=~/l/ && $reportMdsFlag) { my $variableParam=($cfsVersion lt '1.6.5') ? $lustreMdsReintTOT : $lustreMdsReintUnlinkTOT; printf "%6s %6s %6s %6s ", cvt($lustreMdsGetattrPlusTOT/$totSecs,6), cvt($lustreMdsSetattrPlusTOT/$totSecs,6), cvt($lustreMdsSyncTOT/$totSecs,6), cvt($variableParam/$totSecs,6); } if ($subsys=~/l/ && $reportOstFlag) { if (!$ioSizeFlag) { printf "%7s %6s %7s %6s ", cvt($lustreReadKBytesTOT/$totSecs,7,0,1), cvt($lustreReadOpsTOT/$totSecs,6), cvt($lustreWriteKBytesTOT/$totSecs,7,0,1), cvt($lustreWriteOpsTOT/$totSecs,6); } else { printf "%7s %6s %4s %7s %6s %4s ", cvt($lustreReadKBytesTOT/$totSecs,7,0,1), cvt($lustreReadOpsTOT/$totSecs,6), $lustreReadOpsTOT ? cvt($lustreReadKBytesTOT/$lustreReadOpsTOT,4,0,1) : 0, cvt($lustreWriteKBytesTOT/$totSecs,7,0,1), cvt($lustreWriteOpsTOT/$totSecs,6), $lustreWriteOpsTOT ? cvt($lustreWriteKBytesTOT/$lustreWriteOpsTOT,4,0,1) : 0; } } if ($subsys=~/l/ && $reportCltFlag) { if (!$ioSizeFlag) { printf "%7s %6s %7s %6s", cvt($lustreCltReadKBTOT/$totSecs,7,0,1), cvt($lustreCltReadTOT/$totSecs,6), cvt($lustreCltWriteKBTOT/$totSecs,7,0,1), cvt($lustreCltWriteTOT/$totSecs,6); } else { printf "%7s %6s %4s %7s %6s %4s", cvt($lustreCltReadKBTOT/$totSecs,7,0,1), cvt($lustreCltReadTOT/$totSecs,6), $lustreCltReadTOT ? cvt($lustreCltReadKBTOT/$lustreCltReadTOT,4,0,1) : 0, cvt($lustreCltWriteKBTOT/$totSecs,7,0,1), cvt($lustreCltWriteTOT/$totSecs,6), $lustreCltWriteTOT ? 
cvt($lustreCltWriteKBTOT/$lustreCltWriteTOT,4,0,1) : 0; } printf " %6s %6s", cvt($lustreCltRAHitsTOT/$totSecs,6),cvt($lustreCltRAMissesTOT/$totSecs,6) if $lustOpts=~/R/; } for (my $i=0; $i<$impNumMods; $i++) { &{$impPrintBrief[$i]}(6); } print "\n"; } sub base36 { my @buddies=@_; my $frags; for (my $i=0; $i<scalar(@buddies); $i++) { my $num=$buddies[$i]; if ($num>=1000) { # 1000->the rest => 30->36 $map=int(log($num)/log(10))-3; $map=8 if $map>8; $frag=substr('stuvwxyz', $map, 1); } elsif ($num>=100) { # 100->999 => 20->29 $map=int($num)/100-1; $frag=substr('jklmnopqr', $map, 1); } elsif ($num>=10) { # 10->99 => 10->19 $map=int($num)/10-1; $frag=substr('abcdefghi', $map, 1); } else { # 0->9 => 0->9 $frag=int($num); } $frags.=$frag; } return($frags); } #################################################### # T a s k P r o c e s s i n g S u p p o r t #################################################### sub nextAvailProcIndex { my $next; if (scalar(@procIndexesFree)>0) { $next=pop @procIndexesFree; } else { $next=$procNextIndex++; } printf "### Index allocated: $next NextIndex: $procNextIndex IndexesFree: %d\n", scalar(@procIndexesFree) if $debug & 256; return($next); } # If we're not processing by pid-only, the processes we're reporting on come # and go. Therefore right before we print we need to see if a process we # were reporting on disappeared by noticing its pid went away and therefore # need to remove it from the $procIndexes{} hash. Is there a better/more # efficient way to do this? If so, fix 'cleanStalePids()' too. 
sub cleanStaleTasks { my ($removeFlag, %indexesTemp, $pid); if ($debug & 512) { print "### CleanStaleTasks()\n"; foreach $pid (sort keys %procSeen) { print "### PIDPROC: $pid\n"; } } # make a list of only those pids we've seen during last cycle $removeFlag=0; foreach $pid (sort keys %procIndexes) { if (defined($procSeen{$pid})) { $indexesTemp{$pid}=$procIndexes{$pid}; print "### indexesTemp[$pid] set to $indexesTemp{$pid}\n" if $debug & 256; } else { push @procIndexesFree, $procIndexes{$pid}; $removeFlag=1; print "### added $pid with index of $procIndexes{$pid} to free list\n" if $debug & 256; } } # only need to do a swap if we need to remove a pid. if ($removeFlag) { undef %procIndexes; %procIndexes=%indexesTemp; if ($debug & 512) { print "### Indexes Swapped! NEW procIndexes{}\n"; foreach $key (sort keys %procIndexes) { print "procIndexes{$key}=$procIndexes{$key}\n"; } } } undef %procSeen; } # This output goes to the .prc file if -f specified sub printPlotProc { my $date=shift; my $time=shift; my ($procHeaders, $procPlot, $pid, $i); $procHeaders=''; if (!$headersPrintedProc) { $procHeaders=$commonHeader if $logToFileFlag; $procHeaders.=(!$utcFlag) ? "#Date${SEP}Time" : '#UTC';; $procHeaders.="${SEP}PID${SEP}User${SEP}PR${SEP}PPID${SEP}THRD${SEP}S${SEP}VmSize${SEP}"; $procHeaders.="VmLck${SEP}VmRSS${SEP}VmData${SEP}VmStk${SEP}VmExe${SEP}VmLib${SEP}"; $procHeaders.="CPU${SEP}SysT${SEP}UsrT${SEP}PCT${SEP}AccumT${SEP}"; $procHeaders.="RKB${SEP}WKB${SEP}RKBC${SEP}WKBC${SEP}RSYS${SEP}WSYS${SEP}CNCL${SEP}"; $procHeaders.="MajF${SEP}MinF${SEP}Command\n"; $headersPrintedProc=1; } $procPlot=$procHeaders; foreach $pid (sort keys %procIndexes) { $i=$procIndexes{$pid}; next if (!defined($procSTimeTot[$i])); next if $procState ne '' && $procState[$i]!~/[$procState]/; # Handle -oF if ($procOpts=~/f/) { $majFlt=$procMajFltTot[$i]; $minFlt=$procMinFltTot[$i]; } else { $majFlt=$procMajFlt[$i]/$interval2Secs; $minFlt=$procMinFlt[$i]/$interval2Secs; } my $datetime=(!$utcFlag) ? 
"$date$SEP$time": time; $datetime.=".$usecs" if $options=~/m/; # Username comes from translation hash OR we just print the UID $procPlot.=sprintf("%s${SEP}%d${SEP}%s${SEP}%s${SEP}%s${SEP}%d${SEP}%s${SEP}%s${SEP}%s${SEP}%s${SEP}%s${SEP}%s${SEP}%s${SEP}%s${SEP}%s${SEP}%s${SEP}%s${SEP}%d${SEP}%s${SEP}%d${SEP}%d${SEP}%d${SEP}%d${SEP}%d${SEP}%d${SEP}%d${SEP}%s${SEP}%s${SEP}%s", $datetime, $procPid[$i], $procUser[$i], $procPri[$i], $procPpid[$i], $procTCount[$i], $procState[$i], defined($procVmSize[$i]) ? $procVmSize[$i] : 0, defined($procVmLck[$i]) ? $procVmLck[$i] : 0, defined($procVmRSS[$i]) ? $procVmRSS[$i] : 0, defined($procVmData[$i]) ? $procVmData[$i] : 0, defined($procVmStk[$i]) ? $procVmStk[$i] : 0, defined($procVmExe[$i]) ? $procVmExe[$i] : 0, defined($procVmLib[$i]) ? $procVmLib[$i] : 0, $procCPU[$i], cvtT1($procSTime[$i],1), cvtT1($procUTime[$i],1), ($procSTime[$i]+$procUTime[$i])/$interval2SecsReal, cvtT3($procSTimeTot[$i]+$procUTimeTot[$i],1), defined($procRKB[$i]) ? $procRKB[$i]/$interval2Secs : 0, defined($procWKB[$i]) ? $procWKB[$i]/$interval2Secs : 0, defined($procRKBC[$i]) ? $procRKBC[$i]/$interval2Secs : 0, defined($procWKBC[$i]) ? $procWKBC[$i]/$interval2Secs : 0, defined($procRSys[$i]) ? $procRSys[$i]/$interval2Secs : 0, defined($procWSys[$i]) ? $procWSys[$i]/$interval2Secs : 0, defined($procCKB[$i]) ? $procCKB[$i]/$interval2Secs : 0, cvt($majFlt), cvt($minFlt), defined($procCmd[$i]) ? $procCmd[$i] : $procName[$i]); # This is a little messy (sorry about that). The way writeData works is that # on writeData(0) calls, it builds up a string in $oneline which can be appended # to the current string (for displaying multiple subsystems in plot format on # the terminal and the final call writes it out. In order for all the paths # to work with sockets, etc we need to do it this way. And since writeData takes # care of \n be sure to leave OFF each line being written. 
$oneline=''; writeData(0, '', \$procPlot, PRC, $ZPRC, 'proc', \$oneline); if (!$logToFileFlag || ($sockFlag && $export eq '')) { last if writeData(1, '', undef, $LOG, undef, undef, \$oneline)==0; } $procPlot=''; } } sub procAnalyze { my $seconds=shift; my $usecs= shift; my ($vmSize, $vmLck, $vmRSS, $vmData, $vmStk, $vmLib, $vmExe); my ($rkb, $wkb, $rkbc, $wkbc, $rsys, $wsys, $cncl, $threads); # Would have been nice to use $interval2Counter, but that only increments # during terminal output. $procAnalCounter++; # loops through all processes for this interval and copy data to simpler variables foreach my $pid (keys %procIndexes) { # Global which indicates at least 1 piece of process data recorded. # we also need to save pids so we'll know what to print to file $procAnalyzed=1; $analyzed{$pid}=1; my $i=$procIndexes{$pid}; my $user=$procUser[$i]; my $ppid=$procPpid[$i]; my $threads=$procTCount[$i]; my $cpu=$procCPU[$i]; my $sysT=$procSTime[$i]; my $usrT=$procUTime[$i]; my $accum=cvtT3($procSTimeTot[$i]+$procUTimeTot[$i]); my $majF=$procMajFlt[$i]; my $minF=$procMinFlt[$i]; my $command=(defined($procCmd[$i])) ? $procCmd[$i] : $procName[$i]; $accum=~s/^\s*//g; $command=~s/\s+$//g; if (defined($procVmSize[$i])) { $vmSize=$procVmSize[$i]; $vmLck=$procVmLck[$i]; $vmRSS=$procVmRSS[$i]; $vmData=$procVmData[$i]; $vmStk=$procVmStk[$i]; $vmLib=$procVmLib[$i]; $vmExe=$procVmExe[$i]; } else { $vmSize=$vmLck=$vmRSS=$vmData=$vmStk=$vmLib=$vmExe=0; } if ($processIOFlag) { $rkb=$procRKB[$i]; $wkb=$procWKB[$i]; $rkbc=$procRKBC[$i]; $wkbc=$procWKBC[$i]; $rsys=$procRSys[$i]; $wsys=$procWSys[$i]; $cncl=$procCKB[$i]; } # Here's what's going on. We're identifying a unique command by its pid and # name. That way if pids are reused the probability of the same pid showing # up for the same command are slim. BUT, when processing multiple logs for # the same day it CAN happen, so we're adding a filename discriminator as well. 
my $unique="$fileRoot:$pid:$command"; if (!defined($summary[$pid])) { $summary[$pid]={ date=>$date, timefrom=>$seconds, threadsMin=>$threads, threadsMax=>$threads, pid=>$pid, user=>$user, ppid=>$ppid, vmExe=>$vmExe, vmSizeMin=>$vmSize, vmSizeMax=>$vmSize, vmLckMin=>$vmLck, vmLckMax=>$vmLck, vmRSSMin=>$vmRSS, vmRSSMax=>$vmRSS, vmDataMin=>$vmData, vmDataMax=>$vmData, vmStkMin=>$vmStk, vmStkMax=>$vmStk, vmLibMin=>$vmLib, vmLibMax=>$vmLib, sysT=>0, usrT=>0, majF=>0, minF=>0, RKB=>0, WKB=>0, RKBC=>0, WKBC=>0, RSYS=>0, WSYS=>0, CNCL=>0, command=>$command, timethru=>$seconds, accumT=>0 } } next if $procAnalCounter==1; # U p d a t e S u m m a r y # note - we also initialized timethru above in case only a single sample $summary[$pid]->{timethru}=$seconds; # thread counts not necessarily included in raw file $summary[$pid]->{threadsMin}=$threads if defined($threads) && $threads<$summary[$pid]->{threadsMin}; $summary[$pid]->{threadsMax}=$threads if defined($threads) && $threads>$summary[$pid]->{threadsMax}; $summary[$pid]->{vmSizeMin}=$vmSize if $vmSize<$summary[$pid]->{vmSizeMin}; $summary[$pid]->{vmSizeMax}=$vmSize if $vmSize>$summary[$pid]->{vmSizeMax}; $summary[$pid]->{vmLckMin}=$vmLck if $vmLck< $summary[$pid]->{vmLckMin}; $summary[$pid]->{vmLckMax}=$vmLck if $vmLck> $summary[$pid]->{vmLckMax}; $summary[$pid]->{vmRSSMin}=$vmRSS if $vmRSS< $summary[$pid]->{vmRSSMin}; $summary[$pid]->{vmRSSMax}=$vmRSS if $vmRSS> $summary[$pid]->{vmRSSMax}; $summary[$pid]->{vmDataMin}=$vmData if $vmData<$summary[$pid]->{vmDataMin}; $summary[$pid]->{vmDataMax}=$vmData if $vmData>$summary[$pid]->{vmDataMax}; $summary[$pid]->{vmStkMin}=$vmStk if $vmStk< $summary[$pid]->{vmStkMin}; $summary[$pid]->{vmStkMax}=$vmStk if $vmStk> $summary[$pid]->{vmStkMax}; $summary[$pid]->{vmLibMin}=$vmLib if $vmLib< $summary[$pid]->{vmLibMin}; $summary[$pid]->{vmLibMax}=$vmLib if $vmLib> $summary[$pid]->{vmLibMax}; $summary[$pid]->{sysT}+=$sysT; $summary[$pid]->{usrT}+=$usrT; $summary[$pid]->{accumT}=$accum; 
$summary[$pid]->{majF}+=$majF; $summary[$pid]->{minF}+=$minF; if ($processIOFlag) { $summary[$pid]->{RKB}+=$rkb; $summary[$pid]->{WKB}+=$wkb; $summary[$pid]->{RKBC}+=$rkbc; $summary[$pid]->{WKBC}+=$wkbc; $summary[$pid]->{RSYS}+=$rsys; $summary[$pid]->{WSYS}+=$wsys; $summary[$pid]->{CNCL}+=$cncl; } } } # This gets called twice! Once when we're ready to process a NEW file and # again to write out the process summary data for the LAST log we processed sub printProcAnalyze { print "Write process summary data to: $lastLogPrefix\n" if $debug & 8192; # Note that since this is the only place we write to these files, lets open # them here instead of trying to do it in newlog especially since newlog has # no way of knowing if there will even be any data to write to them! open PRCS, ">$lastLogPrefix.prcs" or logmsg("F", "Couldn't create '$lastLogPrefix.prcs'") if !$zFlag; $ZPRCS=Compress::Zlib::gzopen("$lastLogPrefix.prcs.gz", 'wb') or logmsg("F", "Couldn't create '$lastLogPrefix.prcs.gz'") if $zFlag; # NOTE - we're not printing the CPU since processes can migrate and it's not really # meaningful yet. Perhaps someday I'll do more with it. 
my $header; $header= "Date${SEP}From${SEP}Thru${SEP}Pid${SEP}User${SEP}PPid${SEP}ExeSize${SEP}SizeMin${SEP}"; $header.="SizeMax${SEP}LckMin${SEP}LckMax${SEP}RSSMin${SEP}RSSMax${SEP}DataMin${SEP}DataMax${SEP}"; $header.="StkMin${SEP}StkMax${SEP}LibMin${SEP}LibMax${SEP}sysT${SEP}usrT${SEP}PCT${SEP}accumT${SEP}"; $header.="RKB${SEP}WKB${SEP}RKBC${SEP}WKBC${SEP}RSYS${SEP}WSYS${SEP}CNCL${SEP}" if $processIOFlag; $header.="threadsMin${SEP}threadsMax${SEP}"; $header.="majF${SEP}minF${SEP}Command\n"; print PRCS $header if !$zFlag; $ZPRCS->gzwrite($header) or logmsg("E", "Error writing PCRS header") if $zFlag; my $line; my ($date, $timefrom, $timethru); foreach my $pid (keys %analyzed) { # Date always come from 'from' field ($date,$timefrom)=cvtT4($summary[$pid]->{timefrom}); # if process only ran for one interval the duration would be 0 and we can't allow that to be a divisor below. my $pidDuration=$summary[$pid]->{timethru}-$summary[$pid]->{timefrom}; $pidDuration=1 if $pidDuration==0; # NOTE - since sys/usr times in jiffies DON'T multiply by 100 $line=sprintf("%s$SEP%s$SEP%s$SEP%d$SEP%s$SEP%d$SEP%d$SEP%d$SEP%d$SEP%d$SEP%d$SEP%d$SEP%d$SEP%d$SEP%d$SEP%d$SEP%d$SEP%d$SEP%d$SEP%s$SEP%s$SEP%6.2f$SEP%s$SEP", $date, $timefrom, (cvtT4($summary[$pid]->{timethru}))[1], $summary[$pid]->{pid}, $summary[$pid]->{user}, $summary[$pid]->{ppid}, $summary[$pid]->{vmExe}, $summary[$pid]->{vmSizeMin}, $summary[$pid]->{vmSizeMax}, $summary[$pid]->{vmLckMin}, $summary[$pid]->{vmLckMax}, $summary[$pid]->{vmRSSMin}, $summary[$pid]->{vmRSSMax}, $summary[$pid]->{vmDataMin}, $summary[$pid]->{vmDataMax}, $summary[$pid]->{vmStkMin}, $summary[$pid]->{vmStkMax}, $summary[$pid]->{vmLibMin}, $summary[$pid]->{vmLibMax}, cvtT3($summary[$pid]->{sysT}), cvtT3($summary[$pid]->{usrT}), ($summary[$pid]->{sysT}+$summary[$pid]->{usrT})/$pidDuration/$procAnalCounter, $summary[$pid]->{accumT}); $line.=sprintf("%d$SEP%d$SEP%d$SEP%d$SEP%d$SEP%d$SEP%d$SEP", $summary[$pid]->{RKB}, $summary[$pid]->{WKB}, 
$summary[$pid]->{RKBC}, $summary[$pid]->{WKBC}, $summary[$pid]->{RSYS}, $summary[$pid]->{WSYS}, $summary[$pid]->{CNCL}) if $processIOFlag; $line.=sprintf("%d$SEP%d$SEP", $summary[$pid]->{threadsMin}, $summary[$pid]->{threadsMax}); $line.=sprintf("%d$SEP%d$SEP%s\n", $summary[$pid]->{majF}, $summary[$pid]->{minF}, $summary[$pid]->{command}); print PRCS $line if !$zFlag; $ZPRCS->gzwrite($line) or logmsg('E', "Error writing to prcs") if $zFlag; } # reset for next pass undef @summary; undef %analyzed; $procAnalyzed=0; close PRCS; $ZPRCS->gzclose() if $zFlag; $procAnalCounter=0; } sub slabAnalyze { $slabAnalCounter++; if ($slabinfoFlag) { for (my $i=0; $i<$slabIndexNext; $i++) { slabAnalyze2($slabName[$i], $slabSlabAllTotB[$i]); } } else { foreach my $first (sort keys %slabfirst) { my $slab=$slabfirst{$first}; slabAnalyze2($slab, $slabdata{$slab}->{total}); } } } sub slabAnalyze2 { my $name=shift; my $size=shift; if (!defined($slabMemTotMin{$name})) { $slabMemTotMin{$name}=1024*1024*1024*1024; # 1TB $slabMemTotMax{$name}=0; $slabMemTotFirst{$name}=$size; } $slabMemTotMin{$name}=$size if $size<$slabMemTotMin{$name}; $slabMemTotMax{$name}=$size if $size>$slabMemTotMax{$name}; $slabMemTotLast{$name}=$size; } sub printSlabAnalyze { print "Write slab summary data to: $lastLogPrefix\n" if $debug & 8192; open SLBS, ">$lastLogPrefix.slbs" or logmsg("F", "Couldn't create '$lastLogPrefix.slbs'") if !$zFlag; $ZSLBS=Compress::Zlib::gzopen("$lastLogPrefix.slbs.gz", 'wb') or logmsg("F", "Couldn't create '$lastLogPrefix.slbs.gz'") if $zFlag; my $header=sprintf("%-20s %10s %10s %10s %10s %8s %8s\n", 'Slab Name', 'Start', 'End', 'Minimum', 'Maximum', 'Change', 'Pct'); print SLBS $header if !$zFlag; $ZSLBS->gzwrite($header) or logmsg("E", "Error writing SLBS header") if $zFlag; foreach my $name (sort keys %slabMemTotMin) { next if $slabMemTotMax{$name}==0; my $diff=$slabMemTotMax{$name}-$slabMemTotMin{$name}; my $line=sprintf("%-20s %10d %10d %10d %10d %8d %8.2f\n", $name, $slabMemTotFirst{$name}, 
$slabMemTotLast{$name}, $slabMemTotMin{$name}, $slabMemTotMax{$name}, $diff, $slabMemTotMin{$name} ? 100*$diff/$slabMemTotMin{$name} : 0); print SLBS $line if !$zFlag; $ZSLBS->gzwrite($line) or logmsg('E', "Error writing to slbs") if $zFlag; } close SLBS; $ZSLBS->gzclose() if $zFlag; # Reset for next time $slabAnalCounter=0; undef %slabTotalMemLast; undef %slabMemTotMin; undef %slabMemTotMax; undef %slabMemTotLast; } # like printPlotProc(), this only goes to .slb and we don't care about --logtoo sub printPlotSlab { my $date=shift; my $time=shift; my ($slabHeaders, $slabPlot); $slabHeaders=''; if (!$headersPrintedSlab) { $slabHeaders=$commonHeader if $logToFileFlag; $slabHeaders.=$slubHeader if $logToFileFlag && $slubinfoFlag; $slabHeaders.=(!$utcFlag) ? "#Date${SEP}Time" : '#UTC'; if ($slabinfoFlag) { $slabHeaders.="${SEP}SlabName${SEP}ObjInUse${SEP}ObjInUseB${SEP}ObjAll${SEP}ObjAllB${SEP}"; $slabHeaders.="SlabInUse${SEP}SlabInUseB${SEP}SlabAll${SEP}SlabAllB${SEP}SlabChg${SEP}SlabPct\n"; } else { $slabHeaders.="${SEP}SlabName${SEP}ObjSize${SEP}ObjPerSlab${SEP}ObjInUse${SEP}ObjAvail${SEP}"; $slabHeaders.="SlabSize${SEP}SlabNumber${SEP}MemUsed${SEP}MemTotal${SEP}SlabChg${SEP}SlabPct\n"; } $headersPrintedSlab=1; } my $datetime=(!$utcFlag) ? 
"$date$SEP$time": time; $datetime.=".$usecs" if $options=~/m/; $slabPlot=$slabHeaders; # O l d S l a b F o r m a t if ($slabinfoFlag) { for (my $i=0; $i{objects}; my $numSlabs= $slabdata{$slab}->{slabs}; next if ($slabOpts=~/s/ && $slabdata{$slab}->{objects}==0) || ($slabOpts=~/S/ && $slabdata{$slab}->{lastobj}==$numObjects && $slabdata{$slab}->{lastslabs}==$numSlabs); $slabPlot.=sprintf("$datetime$SEP%s$SEP%d$SEP%d$SEP%d$SEP%d$SEP%d$SEP%d$SEP%d$SEP%d$SEP%d$SEP%d\n", $first, $slabdata{$slab}->{slabsize}, $slabdata{$slab}->{objper}, $numObjects, $slabdata{$slab}->{avail}, ($PageSize<<$slabdata{$slab}->{order})/1024, $numSlabs, $slabdata{$slab}->{used}/1024, $slabdata{$slab}->{total}/1024, $slabdata{$slab}->{memchg}, $slabdata{$slab}->{mempct}); # So we can tell when something changes $slabdata{$slab}->{lastobj}= $numObjects; $slabdata{$slab}->{lastslabs}=$numSlabs; } } # See printPlotProc() for details on this... # Also note we're printing the whole thing in one call vs 1 call/line and we # only want to print when there's data since filtering can result in blank # lines. Finally, since writeData() appends a find \n, we need to strip it. if ($slabPlot ne '') { $oneline=''; $slabPlot=~s/\n$//; writeData(0, '', \$slabPlot, SLB, $ZSLB, 'slb', \$oneline); writeData(1, '', undef, $LOG, undef, undef, \$oneline) if !$logToFileFlag || ($sockFlag && $export eq ''); } } # NOTE - although called ibCheck it also picks up opa info sub ibCheck { my $saveHCANames=$HCANames; my $activePorts=0; my ($line, @lines, $port); # Just because we have hardware doesn't mean any drivers installed and # the assumption for now is that's the case if you can't find vstat. 
# Since VStat can be a list, reset to the first that is found (if any) $NumHCAs=0; my $found=0; # This error can only happen when NOT open fabric if (!-e $SysIB && !$found) { logmsg('E', "Found HCA(s) but no software OR monitoring disabled in $configFile") if $inactiveIBFlag==0; $mellanoxFlag=0; $inactiveIBFlag=1; return(0); } # We need the names of the interfaces and port info. $HCANames=''; my ($maxPorts, $numPorts)=(0,0); my (@ports, $state, $file, $lid); @lines=ls($SysIB); foreach $line (@lines) { $line=~/(.*)(\d+)$/; my $devname=$1; my $devnum=$2; # While this should work for any ofed compliant adaptor, doing it this # way at least makes it more explicit which ones have been found to work. # also note hfi is the guy who speaks to opa if ($devname=~/mthca|mlx4_|mlx5_|qib|hfi1_/) { $HCAName[$NumHCAs]=$devname; $HCAId[$NumHCAs]=($devname=~/hfi/) ? $devnum+1 : "$devname$devnum"; $HCANames.=" $devname"; $file=$SysIB; $file.="/$devname"; $file.=$devnum; $file.="/ports"; @ports=ls($file); $maxPorts=scalar(@ports) if scalar(@ports)>$maxPorts; foreach $port (@ports) { $port=~/(\d+)/; $port=$1; $link=cat("$file/$1/link_layer"); $state=cat("$file/$1/state"); $state=~/.*: *(.+)/; $portState=($link="InfiniBand" && $1 eq 'ACTIVE') ? 1 : 0; $HCAPorts[$NumHCAs][$port]=$portState; $HCAOpaV4[$NumHCAs][$port]=(-e "$file/$port/counters") ? 1 : 0; if ($portState) { print " IB Device: $devname$devnum OFED Port: $port ID: $HCAId[$NumHCAs] OpaV4: $HCAOpaV4[$NumHCAs][$port]\n" if $debug & 2; $HCANames.=":$port"; $activePorts++; } } } $NumHCAs++; } $HCANames=~s/^ //; # Now we need to know port states for header. $HCAPortStates=''; for ($i=0; $i<$NumHCAs; $i++) { for (my $j=1; $j<=scalar($maxPorts); $j++) { # The expectation is the number of ports is contant on all HCAs # but just is case they're not, set extras to 0. 
$HCAPorts[$i][$j]=0 if !defined($HCAPorts[$i][$j]); $HCAPortStates.=$HCAPorts[$i][$j]; } $HCAPortStates.=':'; } $HCAPortStates=~s/:$//; # only report inactive status once per day OR after something changed if ($activePorts==0) { logmsg('E', "Found $NumHCAs HCA(s) but none had any active ports") if $inactiveIBFlag==0; $inactiveIBFlag=1; } # The names include active ports too so changes can be detected. $changeFlag=($HCANames ne $saveHCANames) ? 1 : 0; print "IB Change -- OldHCAs: $saveHCANames NewHCAs: $HCANames\n" if $debug & 2 && $HCANames ne $saveHCANames; return ($activePorts && $HCANames ne $saveHCANames) ? 1 : 0; } sub lustreCheckClt { # don't bother checking if specific services were specified and not this one return 0 if $lustreSvcs ne '' && $lustreSvcs!~/c/i; my ($saveFS, $saveOsts, $saveInfo, @lustreFS, @lustreDirs); my ($dir, $dirname, $inactiveFlag); # We're saving the info because as unlikely as it is, if the ost or fs state # changes without their numbers changing, we need to know! $saveFS= $NumLustreFS; $saveOsts= $NumLustreCltOsts; $saveInfo= $lustreCltInfo; undef @lustreCltDirs; undef @lustreCltFS; undef @lustreCltFSCommon; undef @lustreCltOsts; undef @lustreCltOstFS; undef @lustreCltOstDirs; # G e t F i l e s y s t e m N a m e s $FSWidth=0; @lustreFS=glob("/proc/fs/lustre/llite/*"); $lustreCltInfo=''; foreach my $dir (@lustreFS) { # in newer versions of lustre, the fs name was dropped from uuid, so look here instead # which does exist in earlier versions too, but we didn't look there sooner because # uuid is still used in other cases and I wanted to be consistent. my $commonName=cat("$dir/lov/common_name"); chomp $commonName; my $fsName=(split(/-/, $commonName))[0]; # we use the dirname for finding 'stats' and fsname for printing. 
# we may need the common name to make osts back to filesystems my $dirname=basename($dir); push @lustreCltDirs, $dirname; push @lustreCltFS, $fsName; push @lustreCltFSCommon, $commonName; $lustreCltInfo.="$fsName: "; $FSWidth=length($fsName) if $FSWidth$saveFS; # O n l y F o r ' - - l u s t o p t s B / O ' G e t O S T N a m e s undef %lustreCltOstMappings; $inactiveFlag=0; $NumLustreCltOsts='-'; # only meaningful for --lustopts O if ($CltFlag && $lustOpts=~/[BO]/) { # we first need to get a list of all the OST uuids for all the filesystems, noting # the 1 passed to cat() tells it to read until EOF foreach my $commonName (@lustreCltFSCommon) { my $fsName=(split(/-/, $commonName))[0]; my $obds=cat("/proc/fs/lustre/lov/$commonName/target_obd", 1); foreach my $obd (split(/\n/, $obds)) { my ($uuid, $state)=(split(/\s+/, $obd))[1,2]; next if $state ne 'ACTIVE'; $lustreCltOstMappings{$uuid}=$fsName; } } $lustreCltInfo=''; # reset by adding in OSTs $NumLustreCltOsts=0; @lustreDirs=glob("/proc/fs/lustre/osc/*"); foreach $dir (@lustreDirs) { # Since we're looking for OST subdirectories, ignore anything not a directory # which for now is limted to 'num_refs', but who knows what the future will # hold. As for the 'MNT' test, I think that only applied to older versions # of lustre, certainlu tp HP-SFS. next if !-d $dir; # currently only the 'num_refs' file next if $cfsVersion lt '1.6.0' && $dir!~/\d+_MNT/; # Looks like if you're on a 1.6.4.3 system (and perhaps earlier) that is both # a client as well as an MDS, you'll see MDS specific directories with names # like - lustre-OST0000-osc, whereas lustre-OST0000-osc-000001012e950400 is the # client directory we want, so... next if $dir=~/\-osc$/; # if ost closed (this happens when new filesystems get created), ignore it. 
# note that newer versions of lustre added a sstate and sets it to DEACTIVATED my ($uuid, $state,$sstate)=split(/\s+/, cat("$dir/ost_server_uuid")); next if $state=~/CLOSED|DISCONN/ || $sstate=~/DEACT/; # uuids look something like 'xxx-ost_UUID' and you can actully have a - or _ # following the xxx so drop the beginning/end this way in case an embedded _ # in ost name itself. $ostName=$uuid; $ostName=~s/.*?[-_](.*)_UUID/$1/; $fsName=$lustreCltOstMappings{$uuid}; $OstWidth=length($ostName) if $OstWidth$saveOsts; } $lustreCltInfo=~s/ $//; # Change info is important even when not logging except during initialization if ($lustreCltInfo ne $saveInfo) { my $comment=($filename eq '') ? '#' : ''; my $text="Lustre CLT OSTs Changed -- Old: $saveInfo New: $lustreCltInfo"; logmsg('W', "${comment}$text") if !$firstPass; print "$text\n" if $firstPass && $debug & 8; } return ($lustreCltInfo ne $saveInfo) ? 1 : 0; } sub lustreCheckMds { # don't bother checking if specific services were specified and not this one return 0 if $lustreSvcs ne '' && $lustreSvcs!~/m/i; # if this wasn't an MDS and still isn't, nothing has changed my $type=($cfsVersion lt '1.6.0') ? 'MDT' : 'MDS'; return 0 if !$NumMds && !-e "/proc/fs/lustre/mdt/$type/mds/stats"; my ($saveMdsNames, @mdsDirs, $mdsName); $saveMdsNames=$MdsNames; $MdsNames=''; $NumMds=$MdsFlag=0; @mdsDirs=glob("/proc/fs/lustre/mds/*"); foreach $mdsName (@mdsDirs) { next if $mdsName=~/num_refs/; $mdsName=basename($mdsName); $MdsNames.="$mdsName "; $NumMds++; $MdsFlag=1; # for consistency with CltFlag and OstFlag } $MdsNames=~s/ $//; # Change info is important even when not logging except during initialization if ($MdsNames ne $saveMdsNames) { my $comment=($filename eq '') ? '#' : ''; my $text="Lustre MDS FS Changed -- Old: $saveMdsNames New: $MdsNames"; logmsg('W', "${comment}$text") if !$firstPass; print "$text\n" if $firstPass && $debug & 8; } return ($MdsNames ne $saveMdsNames) ? 
1 : 0; } sub lustreCheckOst { # don't bother checking if specific services were specified and not this one return 0 if $lustreSvcs ne '' && $lustreSvcs!~/o/i; # if this wasn't an OST and still isn't, nothing has changed. return 0 if !$NumOst && !-e "/proc/fs/lustre/obdfilter"; my ($saveOst, $saveOstNames, @ostFiles, $file, $ostName, $subdir); $saveOst=$NumOst; $saveOstNames=$OstNames; undef @lustreOstSubdirs; # check for OST files $OstNames=''; $NumOst=$OstFlag=0; @ostFiles=glob("/proc/fs/lustre/obdfilter/*/stats"); foreach $file (@ostFiles) { $file=~m[/proc/fs/lustre/obdfilter/(.*)/stats]; $subdir=$1; push @lustreOstSubdirs, $subdir; $temp=cat("/proc/fs/lustre/obdfilter/$subdir/uuid"); $ostName=transLustreUUID($temp); $OstWidth=length($ostName) if $OstWidth$saveOst; # Change info is important even when not logging except during initialization if ($OstNames ne $saveOstNames) { my $comment=($filename eq '') ? '#' : ''; my $text="Lustre OSS OSTs Changed -- Old: $saveOstNames New: $OstNames"; logmsg('W', "${comment}$text") if !$firstPass; print "$text\n" if $firstPass && $debug & 8; } return ($OstNames ne $saveOstNames) ? 1 : 0; } sub transLustreUUID { my $name=shift; my $hostRoot; # This handles names like OST_Lustre9_2_UUID or OST_Lustre9_UUID or in # the case of SFS something like ost123_UUID, changing them to just 0,9 # or ost123. 
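  # Worked examples of the translation below.  These assume, purely for
  # illustration, that $Host is 'Lustre9', so $hostRoot becomes 'Lustre':
  #   OST_Lustre9_2_UUID  ->  2
  #   OST_Lustre9_UUID    ->  0        (nothing left, so default to 0)
  #   ost123_UUID         ->  ost123   (no 'OST_' prefix to strip)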
chomp $name; $hostRoot=$Host; $hostRoot=~s/\d+$//; $name=~s/OST_$hostRoot\d+//; $name=~s/_UUID//; $name=~s/_//; $name=0 if $name eq ''; return($name); } # since it seems OFED changes the locations of perfquery and ofed_info # with each release, we're gonna check for them here and if we can't find # them, do an 'rpm -qal' and look for them there and on finding them, # update /etc/collectl.conf (if we can) sub getOfedPath { my $list= shift; my $name= shift; my $label=shift; my $found=''; foreach my $path (split(/:/, $list)) { if (-e $path) { $found=$path; last; } } # RHEL54 stopped shipping it so we need to know RH version first my $RHVersion=($Distro=~/Red Hat.*(\d+\.\d+)/) ? $1 : ''; # Can't find in standard places so ask rpm, but only if it's there if ($found eq '' && -e $Rpm && $RHVersion ne '' && $RHVersion<5.4) { # This is something we really don't want to have to be doing logmsg('W', "Cannot find '$name' in ${configFile}'s OFED search list, checking with rpm"); $command="$Rpm -qal | $Grep $name | $Grep -v man"; print "Command: $command\n" if $debug & 2; $found=`$command`; if ($found ne '') { if (-w $configFile) { chomp($found); logmsg('I', "Adding '$found' to '$label' in $configFile"); my $conf=`$Cat $configFile`; $conf=~s/($label\s+=\s+)(.*)$/$1$found:$2/m; open CONF, ">$configFile" or logmsg("F", "Couldn't write to $configFile so do it manually!"); print CONF $conf; close CONF; } else { logmsg('W', "found '$name' in rpm but $configFile not writeable so not updated"); } } } return($found); } # While tempted to put this in collectl main line, this is really only used during formatting sub loadEnvRules { my $envStdFlag=($envRules eq '') ? 1 : 0; my $ruleFile=($envStdFlag) ? 
"$ReqDir${Sep}envrules.std" : $envRules; open TMP, "<$ruleFile" or logmsg('F', "Cannot open '$ruleFile'"); my $skipFlag=1 if $envStdFlag; # if 'std', need to find right stanza my ($index, $type); while (my $line=) { next if $line=~/^#|^\s*$/; chomp $line; if ($line=~/>(.*){$type}->[$index]->{f1}=$f1; $ipmiFile->{$type}->[$index]->{f2}=$f2; $index++; } close TMP; } ################################################## # These are MUCH faster than the linux commands # since we don't have to start a new process! ################################################## sub cat { my $file=shift; my $eof= shift; my $temp=''; if (!open CAT, "<$file") { logmsg("W", "Can't open '$file'"); } else { # if 'eof' set, return entire file, otherwise just 1st line. # also note if empty file return default value which is '' while (my $line=) { $temp.=$line; last if !defined($eof); } close CAT; } return($temp); } sub ls { my @dirs; opendir DIR, $_[0]; while (my $line=readdir(DIR)) { next if $line=~/^\./; push @dirs, $line; } close DIR; return(@dirs); } 1; collectl-4.3.1/colmux0000775000175000017500000023753513366602004012742 0ustar mjsmjs#!/usr/bin/perl -w # problems # pattern match for -i recognition/removal (in -test) not robust enough # Copyright 2005-2017 Hewlett-Packard Development Company, LP # # colmux may be copied only under the terms of either the Artistic License # or the GNU General Public License, which may be found in the source kit # Debug Values # 1 - print interesting stuff # 2 - print more instersting stuff # 4 - do NOT start collectl. real handy for debugging collectl side # 8 - replaced functionality with -noescape # 16 - show selected hostname/addresses and exit # 32 - echo chars received on STDIN # 64 - echo comments from collectl. 
useful when inserting debugging comments # 128 - echo everything from collectl # 256 - async double-buffering # 512 - echo collectl version checking commands # 1024 - print remote host's datetime as yyyymmddhhmmss # KNOWN PROBLEMS # # pdsh format is not handled correctly by csh and you will need to quote expressions # # The format of the process data may vary based on whether or not a system provides # I/O stats as well. If not all systems provide a consistent format, you will get # some unaligned columns, the headers based on the first system configuration. # # Lustre FS and OST names can vary in width so if you're monitoring systems with # different name widths the columns won't line up. However, since one typically # wouldn't monitor mixed lustre environments at the same time, they should all be # consistent in width, unlike netnames whose widths do and you may need to # include --netopts w in your collectl command # # colmux will sort columns as numeric or string. Numeric preserves sort order whereas # string sorts will go by the leftmost digits, giving 10 a higher sort order than 9. # If colmux sees a column that does contain a digit it will do a string sort. This # will affect any number fields, eg process priorities can also be RT use File::Basename; use Getopt::Long; use IO::Socket; use IO::Select; use Net::Ping; use Time::Local; use Cwd 'abs_path'; use strict 'vars'; use threads; use threads::shared; my @dates:shared; my @threadFailure:shared; my $firstHostName:shared; my $firstColVersion:shared; # This construct allows us to run in playback mode w/o readkey # being installed by explicitly declaring the 2 routines below. my $readkeyFlag=(eval {require "Term/ReadKey.pm" or die}) ? 1 : 0; # it was discovered that the threads::join doesn't work with earlier versions my $threadsVersion=threads->VERSION; # Make sure we flush buffers on print. 
$|=1; # If running the '.pl' version of colmux, which I typically do during # development, assume collectl lives in /usr/bin. The rest of the time use # the same directory as colmux since the both might be on a network share my $BinDir=($0=~/pl$/) ? '/usr/bin' : dirname(abs_path($0)); my $Collectl="$BinDir/collectl"; my $Program='colmux'; my $Version='5.0.0'; my $Copyright='Copyright 2005-2018 Hewlett-Packard Development Company, L.P.'; my $License="colmux may be copied only under the terms of either the Artistic License\n"; $License.= "or the GNU General Public License, which may be found in the source kit"; my $Ping='/bin/ping'; my $ResizePath='/usr/bin/resize:/usr/X11R6/bin/resize'; my $Route='/sbin/route'; my $Ifconfig='/sbin/ifconfig'; my $Grep='/bin/grep'; my $DefPort=2655; my $K=1024; my $M=1024*1024; my $G=1024*1024*1024; my $ESC=sprintf("%c", 27); my $Home=sprintf("%c[H", 27); # top of display my $Bol=sprintf("%c[99;0f", 27); # beginning of current line my $Clr=sprintf("%c[J", 27); # clear to end of display my $Clscr="$Home$Clr"; # clear screen my $Cleol=sprintf("%c[K", 27); # clear to end of line my $bold=sprintf("%c[7m", 27); my $noBold=sprintf("%c[0m", 27); my $bell=sprintf("%c", 7); my $pingTimeout=5; my $hiResFlag=(eval {require "Time/HiRes.pm" or die}) ? 1 : 0; error('this tool requires the perl-time-hires module') if !$hiResFlag; # Let's see if we can find resize my $resize; foreach my $bin (split/:/, $ResizePath) { $resize=$bin if -e $bin; } my $termHeight=24; my $termWidth=80; if (defined($resize)) { `$resize`=~/LINES.*?(\d+)/m; $termHeight=$1; `$resize`=~/COLUMNS.*?(\d+)/m; $termWidth=$1; } # This controls real-time, multi-line sorting. my %sort; # Don't use unless you know all collectl versions support it my $timeout=''; # Default parameter settings. 
my $address=''; my $age=2; my $noboldFlag=0; my $nosortFlag=0; my $noEscapeFlag=0; my $column=1; my $cols=''; my $colwidth=6; my $command=''; my $debug=0; my $delay=0; my $freezeFlag=0; my $homeFlag=0; my $hostFilter=''; my $hostFormat=''; my $hostWidth=8; my $keepalive=''; my $nocheckFlag=0; my $port=$DefPort; my $negdataval; my $nodataval=-1; my $maxLines=$termHeight; my $username=''; my $sudoFlag=0; my $sshkey=''; my $colhelp; my ($helpFlag,$verFlag)=(0,0); my ($revFlag,$zeroFlag)=(0,0); my ($colhelpFlag,$colnodetFlag,$testFlag,$colTotalFlag,$col1Flag, $colKFlag, $reachableFlag, $quietFlag)=(0,0,0,0,0,0,0,0,0); my $colnoinstFlag=0; my $colLogFlag=0; my $colnodiv=''; my $finalCr=0; my $retaddr=''; my $timerange=1; GetOptions("address=s" =>\$address, "age=i" =>\$age, "colbin=s" =>\$Collectl, "colk!" =>\$colKFlag, "collog10!" =>\$colLogFlag, "col1000!" =>\$col1Flag, "column=s" =>\$column, "cols=s" =>\$cols, "colhelp!" =>\$colhelpFlag, "colnodet!" =>\$colnodetFlag, "colnoinst!" =>\$colnoinstFlag, "colnodiv=s" =>\$colnodiv, "coltotal!" =>\$colTotalFlag, "colwidth=i" =>\$colwidth, "command=s" =>\$command, "debug=i" =>\$debug, "delay=s" =>\$delay, "finalcr!" =>\$finalCr, "lines=i" =>\$maxLines, "help!" =>\$helpFlag, "homeFlag!" =>\$homeFlag, "hostfilter=s" =>\$hostFilter, "hostformat=s" =>\$hostFormat, "hostwidth=i" =>\$hostWidth, "keepalive=i" =>\$keepalive, "negdataval=i" =>\$negdataval, "nodataval=i" =>\$nodataval, "nocheck!" =>\$nocheckFlag, "nobold!" =>\$noboldFlag, "noescape!" =>\$noEscapeFlag, "nosort!" =>\$nosortFlag, "port=i" =>\$port, "quiet!" =>\$quietFlag, "reachable!" =>\$reachableFlag, "retaddr=s" =>\$retaddr, "reverse!" =>\$revFlag, "sshkey=s" =>\$sshkey, "timerange=i" =>\$timerange, "sudo!" =>\$sudoFlag, "test!" =>\$testFlag, "timeout=i" =>\$timeout, "username=s" =>\$username, "version!" =>\$verFlag, "zero!" =>\$zeroFlag, ) or error("type -help for help"); help() if $helpFlag; if ($verFlag) { my $readkeyVer=($readkeyFlag) ? 
'V'.Term::ReadKey->VERSION : 'not installed';
  print "$Program: $Version (Term::ReadKey: $readkeyVer Threads: $threadsVersion)\n\n$Copyright\n$License\n";
  exit;
}

if ($noEscapeFlag)
{
  $readkeyFlag=0;
  $Home=$Bol=$Clr=$Clscr=$Cleol=$bold=$noBold=$bell='';
}

# This may evolve over time
my ($hostDelim, $hostPiece)=('','');
if ($hostFormat ne '')
{
  error('only valid format is char:pos')    if $hostFormat!~/^\S{1}:\d+$/;
  ($hostDelim, $hostPiece)=split(':', $hostFormat);
}

# if sudo mode
$Collectl="sudo $Collectl"    if $sudoFlag;

# ok if host not in known_hosts and when not debugging be sure to turn off motd
my $Ssh='/usr/bin/ssh -o StrictHostKeyChecking=no -o BatchMode=yes';
$Ssh.=" -o ServerAliveInterval=$keepalive"    if $keepalive ne '';
$Ssh.=" -q"    unless $debug;

error('-nocheck and -reachable are mutually exclusive')    if $nocheckFlag && $reachableFlag;
error('-nocheck and -quiet are mutually exclusive')        if $nocheckFlag && $quietFlag;

# P a r s e T h e C o m m a n d

error('--top not allowed')    if $command=~/--top/;

# any imports? we HAVE to deal with these before looking at -s because if there
# are and no -s, we have NO subsystems selected. We need to count them and also
# set a flag if ANY of them have specified a 'd' parameter.
# Test the match itself rather than defined($1), which could still hold a
# capture left over from an earlier successful match.
my $imports=($command=~/--imp.*?\s+(\S+)/) ? $1 : '';
my $numImports=0;
my $importDetail=0;
foreach my $import (split(/:/, $imports))
{
  $numImports++;

  # here's where we check for a 'd'
  foreach my $param (split(/,/, $import))
  {
    if ($param=~/^[sd]+$/)
    {
      $importDetail=1    if $param=~/d/;          # see if detail data
      $numImports++      if length($param)==2;    # if both, we have 2 subsys, not 1
    }
  }
}

# default subsys depends on whether any imports
my $defSubsys=($imports ne '') ? '' : 'cdn';
my $subsys=($command=~/-s\s*(\S+)/) ? $1 : $defSubsys;

my $expFlag= ($command=~/--exp/) ? 1 : 0;
my $verbFlag=($command=~/--verb/) ? 1 : 0;
my $plotFlag=($command=~/-P/) ?
1 : 0; my ($fromTime, $thruTime)=split(/-/, $1) if $command=~/--fr\S*\s+(\S+)/; $thruTime=$1 if $command=~/--th\S*\s+(\S+)/; #error("invalid from/thru time") if !checkTime($fromTime) || !checkTime($thruTime); # Get options from command string being sure to IGNORE hostname in playback mode which could # contain within but removing all occurances of {char}-o from original command my $temp=$command; $temp=~s/\S-o//i; my $options=($temp=~/-o\s*(\S+)/) ? $1 : ''; # get today's date as well as building one in the standard format if specified in command # note - $year, $mon and $day must not be changed! my ($date, $day, $mon, $year, $today, $yesterday); ($day, $mon, $year)=(localtime(time-86400))[3..5]; $yesterday=sprintf("%d%02d%02d", $year+1900, $mon+1, $day); ($day, $mon, $year)=(localtime(time))[3..5]; $today=sprintf("%d%02d%02d", $year+1900, $mon+1, $day); $date=($options=~/d/) ? sprintf("%02d/%02d", $mon+1, $day) : sprintf("%d%02d%02d", $year+1900, $mon+1, $day); $command=~s/TODAY/*$today*/i; $command=~s/YESTERDAY/*$yesterday*/i; # Surrounding the command with spaces makes the parsing easier below. We're looking # for playback filenames and then surrounding them with "s $command=" $command "; my ($playbackFile, $playbackFlag); error('-p in collectl command requires an argument') if $command=~/-p\s+\-|--pla\S+\s+-/; $playbackFile=$1 if $command=~s/\s-p\s*(\S+)(.*)/ -p "$1"$2/; $playbackFile=$2 if $command=~s/\s(--pla.*?)\s+(\S+)(.*)/ $1 "$2"$3/; $playbackFlag=(defined($playbackFile)) ? 
1 : 0; $command=~s/^ (.*) $/$1/; # remove leading/trailing spaces error('-P only allowed with -cols') if $plotFlag && $cols eq '' && !$testFlag; error('-colnodiv only applies to -cols') if $colnodiv ne '' && $cols eq ''; error('only valid -o values are mndDT') if $options ne '' && $options!~/^[mndDT]+$/; error('-o only allows 1 of dDT') if $options ne '' && $options=~/([dDT]+)/ && length($1)>1; error('-om requires at least 1 of dDT') if $options eq 'm'; error('-hostfilter only applies to local playback files') if $hostFilter ne '' && (!$playbackFlag || $address ne ''); error('-home only applies to multi-line playback data') if $homeFlag && $cols eq '' && !$playbackFlag; error('cannot mix slab/process data with anything else') if $subsys=~/[YZ]/ && $subsys=~/[a-zA-X]/; # real-time, multi-line default is -home $homeFlag=1 if $cols eq '' && !$playbackFlag; if (!$plotFlag) { # how many subsys, including imports, are being reported? # note that if an uppercase subsys OR an import with a 'd', we have detail data present my $numSubsys=$numImports+(($subsys ne '-all') ? length($subsys) : 0); my $detailFlag=($subsys=~/^[A-X]+$/ || $importDetail) ? 
1 : 0;

  error('--verbose not allowed with multiple subsystems w/o -P')
      if $verbFlag && $numSubsys>1;
  error('cannot mix summary and detail data w/o -P')
      if $subsys=~/[a-x]/ && $subsys=~/[A-X]/;
  error('cannot specify detail data when multiple subsystems w/o -P')
      if $numSubsys>1 && $detailFlag;
}

my $localFlag=1;
my (@hostnames, $firstAddress);
if (!$playbackFlag || $address ne '')
{
  $address='localhost'    if $address eq '';    # use 'localhost' for real-time mode
  $localFlag=0;

  if (-f $address)
  {
    open ADDR, "<$address" or die "Couldn't open '$address'";
    while (my $line=<ADDR>)
    {
      next if $line=~/^#|^\s*$/;
      chomp $line;
      $line=~s/^\s*//;    # strip leading whitespace
      push @hostnames, $line;
    }
    close ADDR;
  }
  else
  {
    @hostnames=pdshFormat($address);
  }
}
my $numHosts=scalar(@hostnames);

# See if any host specs contain 'username@' & reset 'localhost' and
# adjust maximum hostname length if necessary.
my $hostlen=$hostWidth;
my $myhost=`hostname`;
chomp $myhost;
my (%usernames, %sshswitch, %aliases);
for (my $i=0; $i<@hostnames; $i++)
{
  # $hostnames[] is typically just the hostname, but sometimes it's more complex and in those cases
  # we need to pull out the optional ssh prefix, username and aliases.
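  # Entry formats the parsing below appears to accept.  This is a sketch
  # inferred from the regular expressions that follow; 'fred', 'node1', 'n1'
  # and the key path are made-up names used only for illustration:
  #   node1                          plain hostname
  #   fred@node1                     username and hostname
  #   -i /path/to/key fred@node1     ssh switches (the prefix), username, hostname
  #   node1 n1                       hostname followed by an alias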
my $host=$hostnames[$i]; # NOTE - to use sshswitches you MUST use @ as well so strip everything # preceding hostname and save host (ignoring alias if there is one) my ($prefix, $user, $alias)=('','',''); if ($hostnames[$i]=~s/(.*)@(\S+)/$2/) { $user=$1; $host=$2; # if whitespace, the it's really a prefix and username if ($user=~/(.*)\s+(\S+)/) { $prefix=$1; $user=$2; } #print "PREFIX: $prefix USER: $user HOST: $host\n"; } error("-i and/or usernames in addr file conflict with -sshkey") if $prefix ne '' && $sshkey ne ''; if ($hostnames[$i]=~/(\S+)\s+(\S+)/) { $host=$1; $alias=$2; #print "ALIAS[$host]: $alias\n"; } # if -username, initially associate it with ALL hosts $usernames{$host}=$username if $username ne ''; $usernames{$host}=$user if $user ne ''; $sshswitch{$host}=$prefix if $prefix ne ''; $sshswitch{$host}="-i $sshkey" if $sshkey ne ''; $aliases{$alias}=$i if $alias ne ''; # make sure this only contains a hostname $hostnames[$i]=$host; # force local hostname if 'localhost' $hostnames[$i]=$myhost if $hostnames[$i] eq 'localhost'; # determine the maximum host's name and if a real name vs an address, remove # the domain portion as well. 
my $tempname=$host; $tempname=(split(/\./, $tempname))[0] if $tempname=~/^[a-z]/i; $hostlen=length($tempname) if length($tempname)>$hostlen; } ######################################################################################### # C h e c k A l l H o s t s F o r R e a c h a b i l i t y / C o n f i g ######################################################################################### # make sure all remote hosts are reachable and properly configured my @threads; if ($address ne '' && !$nocheckFlag) { # seems that even though a timeout of 1 second if long enough to detect failed pings, # we need longer or else good nodes will get failed trying to connect back to us my $ping=Net::Ping->new(); for (my $i=0; $i<@hostnames; $i++) { $threads[$i]=threads->create('check', $i); } # Wait for ping responses in 10ths of a second for (my $i=0; $i<$pingTimeout*10; $i++) { last if threadsDone($numHosts); Time::HiRes::usleep(100000); } # make sure dates within --timerange secs my $minSecs=9999999999; my $maxSecs=0; for (my $i=0; $i<@hostnames; $i++) { # some checks may have failed so only look at 'good' nodes if (!$threadFailure[$i]) { my $year=substr($dates[$i], 0, 4); my $mon= substr($dates[$i], 4, 2); my $day= substr($dates[$i], 6, 2); my $hour=substr($dates[$i], 8, 2); my $mins=substr($dates[$i], 10, 2); my $secs=substr($dates[$i], 12, 2); my $seconds=(timelocal($secs, $mins, $hour, $day, $mon-1, $year-1900)); $minSecs=$seconds if $seconds<$minSecs; $maxSecs=$seconds if $seconds>$maxSecs; print "$hostnames[$i]: $dates[$i]" if $debug & 1024; if ($maxSecs-$minSecs>$timerange) { my $plural=($timerange>1) ? 
's' : '';
          print "WARNING: $hostnames[$i]'s time differs by more than $timerange second$plural with at least one other\n";
          print " run again with -debug 1024 and/or use -timerange or -quiet to suppress this message\n";
        }
      }
    }

  # Finally go back through hosts list in reverse order so we don't shift things
  # on top of each other, removing any that report unsuitability for use
  my $killSsh=0;
  my $allReachableFlag=1;
  my $printedReturnFlag=0;
  for (my $i=@hostnames-1; $i>=0; $i--)
  {
    if ($threadFailure[$i])
    {
      $allReachableFlag=0;

      # If ping failed, thread already gone but if ssh it's still there so we need to kill
      # the ssh. Set a flag so we can do them all at once.
      $killSsh=1    if $threadFailure[$i]==-1;

      print "\n"    if !$printedReturnFlag;    # because ssh failures don't return the carriage
      my $reason;
      $reason='passwordless ssh failed'             if $threadFailure[$i]==-1;
      $reason='ping failed'                         if $threadFailure[$i]==1;
      $reason='collectl not installed'              if $threadFailure[$i]==2;
      $reason='ssh: connection refused'             if $threadFailure[$i]==4;
      $reason='ssh: permission denied'              if $threadFailure[$i]==8;
      $reason='could not resolve name'              if $threadFailure[$i]==16;
      $reason='timed out during banner exchange'    if $threadFailure[$i]==32;
      $reason='collectl version < 3.5'              if $threadFailure[$i]==64;
      $reason='collectl not installed'              if $threadFailure[$i]==128;

      # plain print: the interpolated text must not be treated as a printf format
      print "$hostnames[$i] removed from list: $reason\n";
      $printedReturnFlag=1;
      splice(@hostnames, $i, 1);
      $numHosts--;
    }
  }

  if ($killSsh)
  {
    # We need to look for a ps command w/o the -q
    my $tempSsh=$Ssh;
    $tempSsh=~s/ -q//;

    print "Killing hung ssh(s)\n"    if $debug & 1;
    open PS, "ps axo pid,command|" or error("couldn't execute 'ps' to find ssh processes");
    while (my $line=<PS>)
    {
      next if $line!~/$tempSsh/;
      $line=~s/^\s+//;    # can have leading space
      my $pid=(split(/\s+/, $line))[0];
      print "Killing ssh with pid: $pid\n"    if $debug & 1;
      `kill $pid`;
    }
    sleep 1;    # wait a tad for ssh in thread to exit
    close PS;
  }

  # for newer threads versions, all must be joined or we'll get
errors when we exit if ($threadsVersion>'1.59') { foreach my $thread (threads->list(threads::joinable)) { $thread->join(); } } # if nobody reachable! error('no accessible addresses in list') if !$numHosts; # a couple of reasons to exit, but only report message if due to # unreachability if (!$allReachableFlag && $reachableFlag) { Term::ReadKey::ReadMode(0) if $readkeyFlag; print "Not all hosts configured correctly or reachable and so exiting...\n"; exit; } } if ($debug & 16) { # the print is over-the-top, but IS useful for verifying usernames parsed correctly print ">>> addresses <<<\n"; printf "%-${hostlen}s %s\n", 'HOST', 'USERNAME'; for (my $i=0; $i<$numHosts; $i++) { printf "%-${hostlen}s %s\n", $hostnames[$i], defined($usernames{$hostnames[$i]}) ? $usernames{$hostnames[$i]} : ''; } exit; } ############################### # C o m m o n S t u f f ############################### error('-lines cannot be negative') if $maxLines<0; # Makes a little easier to reference later. my $timeFlag=($options=~/[dDT]+/) ? 1 : 0; # These switches are common to both real-time and playback modes, but # some of those mode-specific switches not allowed in this mode. my @columns; my $maxColNum=0; my $maxDataAge; my $interval=($command=~/-i\s*:*(\d+)/) ? $1 : 1; # tricky because of --import my @colsNoDiv; if ($cols ne '') { # make sure all data numeric $command.=' -w'; # any data older than this is consider invalid, noting if secondary interval # just use 1. my $ageInterval=($interval=~/:/) ? 
1 : $interval; $maxDataAge=$age*$ageInterval; # We need to set this first so -test will work right @columns=split(/,/,$cols); # Skip ALL cols related validation with -test if (!$testFlag) { error('-nosort not allowed in column mode') if $nosortFlag; error('-delay not allowed in column mode unless -p') if $delay && !$playbackFlag; error('-colnodet requires -coltotal') if $colnodetFlag && !$colTotalFlag; error('detailed data not allowed unless -P') if $subsys=~/[A-X]/ && !$plotFlag; foreach my $col (@columns) { error('you cannot select host column with -cols, verify with -test') if $col==0; error('-cols incorrectly specifies date/time field. verify with --test') if ($col==1 && $timeFlag) || ($col==2 && $options=~/[dD]/) || ($col<3 && $plotFlag); $maxColNum=$col if $col>$maxColNum; } } if ($colnodiv ne '') { my @cols=split(/,/, $colnodiv); foreach my $col (@cols) { error("non-numeric column in -colnodiv: $col") if $col!~/^\d+$/; my $match=0; for my $i (@columns) { $match=1 if $col==$i; } error("specified column $col with -colnodiv but not with -cols") if !$match; $colsNoDiv[$col]=1; } } } else { error('-colk only applies to -columns') if $colKFlag; error('-collog only applies to -columns') if $colLogFlag; error('-col1000 only applies to -columns') if $col1Flag; error('-colnodet only applies to -columns') if $colnodetFlag; error('-coltotal only applies to -columns') if $colTotalFlag; error('-nodataval only applies to -columns') if $nodataval!=-1; error('-negdataval only applies to -columns') if defined($negdataval); error("-o not allowed for 'real-time', non-cols format") if !$playbackFlag && $timeFlag; } # force -oT if time not specified by either appending to command OR adding # to -o if that has been specified if (!$timeFlag) { $command=~s/-o/-oT/ if $options ne ''; $command.=' -oT' if $options eq ''; } # Additional globals, may only be needed by one mode my $input; my $ctrlCFlag=0; my $numCols=0; my $numLines=-1; my $numReporting=0; my $somethingPrintedFlag; my 
$boldFlag=($noboldFlag) ? 0 : 1; my $oldColFlag; my (@printStack, @hostdata); my (@host, @hostVars, @sample, %files); # if in 'local' mode we don't yet know the max host name length for reformHeaders() # so get it here first and while we're at is save the hostnames for later too if ($playbackFlag && $localFlag) { my (%temp, $host); my @hostFilters=pdshFormat($hostFilter) if $hostFilter ne ''; my $globSpec=$playbackFile; $numHosts=0; $globSpec=~s/"//g; foreach my $file (glob($globSpec)) { # When we glob, we expand the string as would the shell. If no wildards it just # returns itself which may NOT be a valid filename so we have to test next if !-f $file; next if $file!~/\d{8}-\d{6}\.raw/; $file=basename($file); $file=~/(.*)-\d{8}-\d{6}\.raw/; $host=$1; next if defined($temp{$host}); # if already seen/kept this hostname # if using host filters, only identify keep those that match if ($hostFilter ne '') { my $filterMatch=0; foreach my $filter (@hostFilters) { $filterMatch=1 if $filter eq $host; } next if !$filterMatch; } # keep this host and add ONCE to list of hosts to be processed $numHosts++; $temp{$host}=''; push @hostnames, $host; $hostlen=length($host) if $hostlen{bufptr}=0; $hostVars[$i]->{maxinst}->[0]=-1; $hostVars[$i]->{lastinst}->[0]=-1; $hostVars[$i]->{lasttime}->[0]=''; $hostVars[$i]->{maxinst}->[1]=-1; $hostVars[$i]->{lastinst}->[1]=-1; $hostVars[$i]->{lasttime}->[1]=''; } # O p e n O u r S o c k e t ( s ) my $myReturnAddr=($retaddr eq '') ? getReturnAddress($firstAddress) : $retaddr; my $mySocket = new IO::Socket::INET(Type=>SOCK_STREAM, Reuse=>1, Listen => 1, LocalPort => $port) or error("couldn't create local socket on port: $port"); my $sel = new IO::Select($mySocket); print "Listening for connections on $port\n" if $debug & 1; # S e t A l a r m # if interval specified in command, use that; otherwise 1 my $interval=(defined($interval)) ? 
    $interval : 1;
  $interval=~s/.*://;    # In case sub-intervals
  $SIG{"ALRM"}=\&alarm;
  my $uInterval=$interval*10**6;
  Time::HiRes::ualarm($uInterval, $uInterval);

  # S t a r t  R e m o t e  c o l l e c t l s

  for (my $i=0; $i<$numHosts; $i++)
  {
    my $switch=(!defined($sshswitch{$hostnames[$i]})) ? '' : $sshswitch{$hostnames[$i]};
    my $uname= (!defined($usernames{$hostnames[$i]})) ? '' : "$usernames{$hostnames[$i]}\@";

    # double quotes so that $Ssh is interpolated, matching the playback-mode code
    my $access=($localFlag) ? "$Ssh -n localhost" : "$Ssh -n $switch $uname$hostnames[$i]";

    # MUST include timestamps
    my $colCommand= "$access $Collectl $command -A $myReturnAddr:$port";
    $colCommand.=":$timeout" if $timeout ne '';
    $colCommand.=" --quiet" if !$debug;
    $colCommand.=" &";
    print "Command: $colCommand\n" if $debug & 1;
    system($colCommand) unless $debug & 4;
  }

  # start with a clear screen
  print "$Home$Clscr" if $homeFlag;

  # add STDIN to list of handles to look for input on.
  $sel->add(STDIN);

  my $Record='';
  my $hostNum=0;
  my $headerHost=-1;
  my $lastHost=-1;
  my ($remoteAddr, %sockHandle);
  while(!$ctrlCFlag)
  {
    # NOTE - since much of collectl's multiline prints are via multiple socket
    #        writes, the data may come back as separate lines here, and not
    #        necessarily all together so we need to track the last one seen
    # NOTE2 - the can_read() below will prematurely wake up when the timer goes
    #        off but that's ok because it will simply fall through the loop and
    #        come right back...
    while(my @ready = $sel->can_read(1))
    {
      foreach my $filehandle (@ready)
      {
        if ($filehandle eq 'STDIN')
        {
          stdin();
          next;
        }

        # NOTE - logic for handling hosts by socket stolen from colgui
        if ($filehandle==$mySocket)
        {
          # Create a new socket
          my $new = $mySocket->accept;
          $remoteAddr=inet_ntoa((sockaddr_in(getpeername($new)))[1]);
          $sockHandle{$new}=$addrhost{$remoteAddr};
          $sockHandle{$new}=(defined($addrhost{$remoteAddr})) ? $addrhost{$remoteAddr} : $aliases{$remoteAddr};
          $sel->add($new);

          # if we do get a connection from an unexpected place, accept it in case we
          # keep getting it, but then ignore it!
if (!defined($addrhost{$remoteAddr})) { print "*** connection from unknown source: $remoteAddr! ***\n" unless $quietFlag; next; } printf "New socket connection from Host: %d Addr: %s\n", $sockHandle{$new}, $remoteAddr if $debug & 1; $numReporting++; } else { my ($host, $time, $therest); $Record=<$filehandle>; if ($Record) { chomp $Record; print ">>> $Record\n" if $debug & 128; ($host, $therest)=split(/ /, $Record, 2); # preserve leading spaces in 'therest' $hostNum=$sockHandle{$filehandle}; if (!defined($hostNum)) { print "Ignoring records from '$host' which is not "; print "recognizable. Is the alias wrong or missing?\n"; $remoteAddr=inet_ntoa((sockaddr_in(getpeername($filehandle)))[1]); $sel->remove($filehandle); $filehandle->close; $numReporting--; next; } $hostNumMap{$filehandle}=$hostNum; $host[$hostNum]=($hostFormat eq '') ? $host : (split(/$hostDelim/, $host))[$hostPiece]; if ($therest=~/^#/) { print "$therest\n" if $debug & 64; # when first starting up, not all hosts necessarily respond during initial # cycle so let's save the header from the first one who does next if ($gotHeadersFlag || ($headerHost!=-1 && $headerHost!=$hostNum)); # We want to skip the first line of the process data header next if $therest=~/^###/ && $subsys=~/Z/; $headerHost=$hostNum; push @headers, $therest if !$gotHeadersFlag; #print "HostNum: $hostNum TheRest: $therest\n"; } else { # Once we see data from the host we got the header from, we're done setting it. # but if an error with -cols (only discoverable at this point with older collectls) # treat as a ^C. next if $therest eq ''; if (!$gotHeadersFlag && $headerHost==$hostNum && scalar(@headers)) { $gotHeadersFlag=1; $ctrlCFlag=1 if !reformatHeaders(); } # Typically the data piece contains a timestamp as first field, but if date is # requested to be displayed as well we'll pull the time out of the first field # in '$therest', later on. 
But if it IS plot format we always start with date/time if (!$plotFlag) { ($time, $therest)=split(/ /, $therest, 2); } else { ($date, $time, $therest)=split(/ /, $therest, 3); } # since we know the instance of the last entry stored for this host, we now want the next one # however, if the times have changed we need to reset to 0 since this is all new data. Also need # to reset 'maxinst' to make sure we don't include any stale data which may also be in different # positions. my $bufptr=$hostVars[$hostNum]->{bufptr}; my $index=$hostVars[$hostNum]->{lastinst}->[$bufptr]+1; #print "BUFPTR: $bufptr INDEX: $index TIME: $time LTIME: $hostVars[$hostNum]->{lasttime}->[$bufptr]\n"; if ($time ne $hostVars[$hostNum]->{lasttime}->[$bufptr]) { $bufptr=($bufptr+1) % 2; $hostVars[$hostNum]->{bufptr}=$bufptr; $index=$hostVars[$hostNum]->{maxinst}->[$bufptr]=0; } $lastHost=$hostNum; $hostVars[$hostNum]->{lasttime}->[$bufptr]=($plotFlag || $options!~/[dD]/) ? $time : (split(/\s+/, $therest))[0]; # Be sure to update sample BEFORE updating pointers my $key=(split(/\s+/, $therest))[0]; $sample[$hostNum]->[$index]->[$bufptr]=($plotFlag || $timeFlag) ? "$time $therest": $therest; #print "SAMPLE[$hostNum][$index][$bufptr]: $sample[$hostNum]->[$index]->[$bufptr]\n"; # when doing plot mode we always reconstruct the original record as we also do in non-plot # mode when the user specifies a time format option. Remember, the ONLY time options can # be set are in cols mode so that's why we don't have to add that to the test below. $hostVars[$hostNum]->{lastinst}->[$bufptr]=$index; $hostVars[$hostNum]->{maxinst}->[$bufptr]=$index if $index>$hostVars[$hostNum]->{maxinst}->[$bufptr]; print "B Host[$hostNum]: $host BUF: $bufptr MAXINST: $hostVars[$hostNum]->{maxinst}->[$bufptr] ". "LASTINST: $hostVars[$hostNum]->{lastinst}->[$bufptr] TIME: $time LASTTIME: ". 
"$hostVars[$hostNum]->{lasttime}->[$bufptr] KEY: $key\n" if $debug & 256 } next; } # Sending socket must have gone away so clean it up $remoteAddr=inet_ntoa((sockaddr_in(getpeername($filehandle)))[1]); $sel->remove($filehandle); $filehandle->close; $numReporting--; printf "Disconnected host #$hostNumMap{$filehandle}: $remoteAddr. %d remaining\n", $numReporting if $debug & 1; # Remove this address from @host which probably should have been built from $remoteAddr # rather than name in record returned by collectl, but it's too late now delete $host[$hostNumMap{$filehandle}]; # when all sockets have been closed, time to exit if ($numReporting==0 && !($debug & 4)) { $ctrlCFlag=1; last; } } } } } $mySocket->close(); # this probably isn't necessary but just to be sure all the ssh sessions are dead, # kill them off, noting since multiple ones would have to have unique ports, there # is no danger of killing the wrong ones. open PS, "ps axo pid,command|"; while (my $line=) { next if $line!~/$myReturnAddr:$port/; # pid can have leading spaces $line=~s/^\s+//; my $pid=(split(/\s+/, $line))[0]; print "Killing ssh with pid: $pid\n" if $debug & 1; `kill $pid`; } } ################################# # P l a y b a c k M o d e ################################# else { # Control-C trap $SIG{"INT"}=\&sigInt; my $sel = new IO::Select(STDIN); # usleep expects usecs $delay*=1000000; error("networked playback file must start with '*'") if $address ne '' && basename($playbackFile)!~/^\*/; error('playback file MUST contain a date') if $playbackFile!~/(\d{8})/; $date=$1; $date=substr($date,4,2).'/'.substr($date,6,2) if $options=~/d/; $fromTime.=':00' if defined($fromTime) && length($fromTime)==5; $thruTime.=':00' if defined($thruTime) && length($thruTime)==5; valTime('from', $fromTime); valTime('thru', $thruTime); # if thru time in collectl command, stop there; otherwise process the whole day my $thruSecs= (defined($thruTime)) ? 
    getSecs($thruTime) : 86400;

  # S t e p  1 - O p e n  c o l l e c t l

  $somethingPrintedFlag=0;
  my ($interval1, $interval2, $interval3);
  my $firstSecs=86400;    # ultimately < 24:00:00
  my $headerState=0;      # don't have
  my $firstHost=1;
  my $activeHosts=$numHosts;
  for (my $i=0; $i<@hostnames; $i++)
  {
    my $host=$hostnames[$i];

    # We already have the hostname(s) for the local playback file(s) so rebuild
    # each name with only host-date so it matches only those we're interested in.
    my $tempCommand=$command;
    if ($localFlag)
    {
      my $playback=$playbackFile;
      my $dirname=dirname($playback);
      my $basename=basename($playback);
      my $filedate=($basename=~/(\d{8})/) ? $1 : '*';
      $playback="$dirname/$host*$filedate*";
      my $metaPlayback=quotemeta($playbackFile);
      $tempCommand=~s/$metaPlayback/$playback/;
    }

    # Either ssh to the local or remote systems
    # NOTE - tried skipping 'ssh' for local host but some kind of
    #        interaction problem w/ stdin and it didn't work correctly
    # The username lookup is keyed on this host ($i), not always the first one.
    my $uname=(!defined($usernames{$hostnames[$i]})) ? '' : "$usernames{$hostnames[$i]}\@";
    my $access=($localFlag) ? "$Ssh -n localhost" : "$Ssh -n $uname$host";

    # First host?
    if ($i==0)
    {
      # When playing back local file(s), if they start with a wildcard, '$localhost' will
      # contain all the hostnames, one at a time. NOTE for some shells/distros/whatever
      # I found I needed to surround the whole command in 's or error msgs got eaten
      my $cmd="$access '$Collectl $tempCommand --showheader' 2>&1";
      print "Command: $cmd\n" if $debug & 1;
      my $header=`$cmd`;
      error($1) if $header=~/(Error.*?)\n/;

      $header=~/Interval: (\S+)/;
      ($interval1, $interval2, $interval3)=split(/:/, $1);
      printf "Intervals: 1: %s 2: %s 3: %s\n",
        $interval1, defined($interval2) ? $interval2 : '', defined($interval3) ? $interval3 : ''
          if $debug & 1;
    }

    # in local mode, we don't use ssh OR surround the command with single quotes
    # also be sure to catch any remote errors written to STDERR
    my $cmd=($localFlag) ?
"$Collectl $tempCommand --hr 0" : "$access '$Collectl $tempCommand --hr 0'"; $cmd.=" 2>&1"; print "Command: $cmd\n" if $debug & 1; $files{$host}->{pid}=open $files{$host}->{fd}, "$cmd |" or error("couldn't execute '$cmd'"); $files{$host}->{hdr}=0; # header not seen yet } # S t e p 2 - P r i m e T h e P u m p for (my $i=0; $i<@hostnames; $i++) { # Note that when we see the first host and have not yet seen a header, that I/O # stream's header will be saved my $secs; my $host=$hostnames[$i]; if (($secs=getNext($i))==-1) { $activeHosts--; delete $files{$host}; next; } # let's take the opportunity of having just processed our first file to make sure -column # specifies a valid number, noting with newer versions of collectl we'll have already read # the header. error("invalid column number. did you forget they start with 0?") if $i==0 && $column>=$numCols; # we want to start our analysis loop at the earliest entry returned by collectl. This will # assure is times aren't synchornized we'll catch all records for all hosts using that as a # starting interval. $firstSecs=int($secs) if $secs<$firstSecs; } # S t e p 3 - L o o p T h r o u g h T i m e R a n g e $numReporting=0; my $interval=($subsys!~/[yYZ]/) ? $interval1 : $interval2; $interval=$interval3 if $subsys=~/E/; for (my $time=$firstSecs; !$ctrlCFlag && $time<=$thruSecs; $time+=$interval) { # if we exhausted the data for all hosts before hitting the thru time, # we need an alternate way out of this loop. 
last if !$activeHosts; print "Time Loop: $time secs Int: $interval Limit: $thruSecs Reported: $numReporting\n" if $debug & 2; @printStack=() if $numReporting; # start empty if something reported last interval @hostdata=() if $numReporting; $numReporting=0; my %reported; my ($pushMin, $pushMax); for (my $i=0; $i<@hostnames; $i++) { my $host=$hostnames[$i]; next if !defined($files{$host}); # We always have read up to the next sample, so if there is something for this host in # this interval (it could be in future), that falls in this timeframe add it to the # output stack. $reported{$host}=0; #print "HOST: $host FILE: $files{$host}->{secs} TIME: $time INT: $interval\n"; # see, we're doing multiple calls to getnext, one per line... while ($files{$host}->{secs}<($time+$interval)) { # remove domain (it there) my $minihost=$host; $minihost=~s/\..*// if $minihost!~/^\d/; my $pushSecs=$files{$host}->{secs}; $pushMin=$pushSecs if !defined($pushMin) || $pushSecs<$pushMin; $pushMax=$pushSecs if !defined($pushMax) || $pushSecs>$pushMax; push @printStack, "$minihost $files{$host}->{line}"; # only need for column data so why waste compute cycles $hostdata[$i]="$minihost $files{$host}->{line}" if $cols ne ''; # If the first line for this host count it as reporting and get next record # remember, if --thru, it will be passed to collectl who will return an EOF # when we hit it. $numReporting++ if !$reported{$host}++; my $seconds=getNext($i); if ($seconds==-1) { $activeHosts--; delete $files{$host}; last; } } } # delay if asked to, check for input and then print contents of stack # (there is an off chance nobody reported anything for this time period) Time::HiRes::usleep($delay); stdin() if $sel->can_read(0); printInterval($pushMin, $pushMax) if $numReporting; } # NOTE - previous bugs in collectl suppressing timestamps have tripped this error in the past. error("No data recorded for any hosts. 
Is your date/timeframe correct?") if !$somethingPrintedFlag && !$ctrlCFlag; } # reset terminal which includes re-enabling echo print "\n"; Term::ReadKey::ReadMode(0) if $readkeyFlag; foreach my $host (keys %files) { print "Killing pid $files{$host}->{pid} for '$host'\n" if $debug & 1; `kill -9 $files{$host}->{pid}`; #close $files{$host}->{fd} or error("Failed to close playback file for '$host'"); } # This runs as a thread!!! sub check { my $i=shift; # We can't use the Net::Ping module because some systems block pings and # /bin/ping should be installed natively with suid privs so we CAN use that. my $out=`$Ping -c1 -w$pingTimeout $hostnames[$i] 2>&1`; $out=~/(\d+)% packet loss/; if ($1) { $threadFailure[$i]=1; return; } # Let's leave here in case it ever gets ressurected. # If ping fails, just return. If it succeeds we'll try for an ssh # my $pingStatus=$ping->ping($hostnames[$i], $pingTimeout); # if (!$pingStatus) # { # $threadFailure[$i]=1; # return; # } # we need at least V1.67 of threads to do this because we need to be able to KILL # don't really care what this returns as long as it doesn't hang. If it does # hang, the loop waiting on the threads will timeout. $threadFailure[$i]=-1; # Assume ssh will fail if ($threadsVersion>=1.67) { my $switch=(!defined($sshswitch{$hostnames[$i]})) ? '' : $sshswitch{$hostnames[$i]}; my $uname= (!defined($usernames{$hostnames[$i]})) ? '' : "$usernames{$hostnames[$i]}\@"; my $command="$Ssh $switch $uname$hostnames[$i] $Collectl -v 2>&1\n"; $command=~s/ -q//; # remove 'quiet' switch so we see 'connection refused' print "Command: $command" if $debug & 512; my $collectl=`$command`; # note if motd printed, 'collectl' may not start on the 1st line, so need /m on regx $threadFailure[$i]=0 if $collectl=~/^collectl V(\S+)/m; # success!!! 
$threadFailure[$i]=64 if $threadFailure[$i]==0 && $1 lt '3.5'; my $thisColVer=$1; my $hostname=$hostnames[$i]; $hostname=(split(/\./, $hostname))[0] if $hostname=~/[a-z]/i; # drop domain name if there if ($threadFailure[$i]==0) { if (!defined($firstHostName)) { $firstHostName=$hostname; $firstColVersion=$thisColVer; } print "***warning*** Collectl V$thisColVer on $hostname != V$firstColVersion on $firstHostName\n" if $thisColVer ne $firstColVersion && !$quietFlag; } # couldn't get collectl version $threadFailure[$i]=2 if $collectl=~/command not found/; $threadFailure[$i]=2 if $collectl=~/not installed/; # sometimes debian reports this instead of 'not found' $threadFailure[$i]=4 if $collectl=~/refused/; $threadFailure[$i]=8 if $collectl=~/Permission denied/s; $threadFailure[$i]=16 if $collectl=~/Could not resolve/s; $threadFailure[$i]=32 if $collectl=~/timed out during banner exchange/s; $threadFailure[$i]=128 if $collectl=~/collectl:\s+No such file/s; $command="$Ssh $switch $uname$hostnames[$i] date +%Y%m%d%H%M%S 2>&1\n"; print "Command: $command" if $debug & 512; $dates[$i]=`$command`; } } sub threadsDone { my $num=shift; for (my $i=0; $i<$num; $i++) { return(0) if $threads[$i]->is_running(); } return(1); } sub getNext { my $hostnum=shift; my $line; my $host=$hostnames[$hostnum]; my $fd=$files{$host}->{fd}; while ($line=<$fd>) { # older versions of collectl that would do 'stty' in error() when not # connected to a terminal can generate 'stty' and that screws up output # also some cases of uninit vars in older versions so ignore them too chomp $line; error("$hostnames[$hostnum]: $1") if $line=~/(Error.*)/; print ">>>$line\n" if ($line=~/^#/ && $debug & 64) || $debug & 128; next if $oldColFlag && $line=~/^stty|^Use of uninit/; # this is a little messy. 
If a MOTD, it precedes the header in this loop so ignore next if !$files{$host}->{hdr} && $line!~/^#/; last if $line!~/^#|^\s*$/; # Only happens with versions of collectl that don't support --showcolheaders push @headers, $line if $hostnum==0 && !$gotHeadersFlag; # header seen so we can exit on next non-# line $files{$host}->{hdr}=1; } return(-1) if !defined($line); # Just in case no data in selected file if ($line=~/(^No files processed)/) { print "$hostnames[$hostnum]: $1\n"; return(-1); } # If the first header was just seen, reformat it and set $numCols as a side effect # note this can only happen with older collectls because we reformat much earlier # If a problem, $crtlCFlag will have been set by reformatHeaders() reformatHeaders() if $hostnum==0 && !$gotHeadersFlag && scalar(@headers)>0; # Remove timestamp from line and save. be sure to do it here since at # print time there's no knowledge of it. my $timestamp; if (!$plotFlag) { $line=~s/^(\S+) //; $timestamp=$1; # If date was specified, we've already pulled that out so now get the time if ($options=~/[dD]/) { $line=~s/^(\S+) //; $timestamp.=" $1"; } } else # for plot format, get time but leave record alone { $line=~/^\S+\s+(\S+)/; $timestamp=$1; $timeFlag=0; } my $seconds=getSecs($timestamp); $files{$host}->{line}=sprintf("%s$line", ($timeFlag) ? 
"$timestamp " : ""); $files{$host}->{secs}=$seconds; return($seconds); } sub stdin { if ($debug & 32) { for (my $i=0; $i))) { if ($readkeyFlag) { my $byte=unpack('C', $char); if ($byte==3) { $ctrlCFlag=1; print "\n"; return; } $input.=$char; } last if !$readkeyFlag; # if not using ReadKey, this gets us out of loop } # Check for string terminated by RETURN if ($input=~/(.*)\n$/) { my $command=$1; $freezeFlag=0 if $command!~/^f$/i; # anything other than 'f' unfreezes display $freezeFlag=($freezeFlag+1) % 2 if $command=~/^f$/i; $revFlag= ($revFlag+1) % 2 if $command=~/^r$/i; $zeroFlag= ($zeroFlag+1) % 2 if $command=~/^z$/i; if ($command=~/^\d+$/) { print $bell if $command>=$numCols; $column=$command if $command<$numCols; } elsif ($command eq 'pu' || $command eq 'u') # page up { print $bell if $startLine==1; $startLine-=$bodyLines; $startLine=1 if $startLine<1; } elsif ($command eq 'pd' || $command eq 'd') # page down { print $bell if $startLine+$bodyLines-1>=$totalLines; $startLine+=$bodyLines; $startLine=$totalLines-$bodyLines+1 if ($startLine+$bodyLines-1)>$totalLines; $startLine=1 if $startLine<1; } $input=''; } if ($input=~/${ESC}\[(.*)/) { $freezeFlag=0; # anything unfreezes display my $key=$1; if ($key eq 'A') # up { $revFlag=1; } elsif ($key eq 'B') # down { $revFlag=0; } elsif ($key eq 'C') # right { print $bell if $column == ($numCols-1); $column++ if $column != ($numCols-1); } elsif ($key eq 'D') # left { print $bell if $column == 0; $column-- if $column != 0; } elsif ($key eq '5~') # page up { print $bell if $startLine==1; $startLine-=$bodyLines; $startLine=1 if $startLine<1; } elsif ($key eq '6~') # page down { print $bell if $startLine+$bodyLines-1>=$totalLines; $startLine+=$bodyLines; $startLine=$totalLines-$bodyLines+1 if ($startLine+$bodyLines-1)>$totalLines; $startLine=1 if $startLine<1; } $input=''; } } sub alarm { my @value; my $index=0; # But first we need to copy the latest values to the print stack @printStack=(); for (my $i=0; $i<$numHosts; 
$i++) { next if !defined($host[$i]); # no connected yet OR already disconnected # NOTE - when we call printInterval(), he'll remove the hostname and print it back # out padded accordingly. # Get double-buffering pointers my $currptr=$hostVars[$i]->{bufptr}; my $prevptr=($currptr+1) % 2; my $maxcurr=$hostVars[$i]->{maxinst}->[$currptr]; my $maxprev=$hostVars[$i]->{maxinst}->[$prevptr]; print "HOST: $i CUR: $currptr PREV: $prevptr MAXCUR: $maxcurr MAXPREV: $maxprev\n" if $debug & 256; for (my $j=0; $maxcurr!=-1 && $j<=$hostVars[$i]->{maxinst}->[$currptr]; $j++) { push @printStack, "$host[$i] $sample[$i]->[$j]->[$currptr]"; } for (my $j=$hostVars[$i]->{maxinst}->[$currptr]+1; $maxprev!=-1 && $j<=$hostVars[$i]->{maxinst}->[$prevptr]; $j++) { push @printStack, "$host[$i] $sample[$i]->[$j]->[$prevptr]"; } # column data needs to get stashed in a different data structure, indexed by # host number in case we're doing single line output. In that case we always have # the last sample in the current buffer. if ($cols ne '' && defined($sample[$i]->[0]->[$currptr])) { $hostdata[$i]="$host[$i] $sample[$i]->[0]->[$currptr]"; } } # When we first start, we may not have even received the header, so wait for it... # also note that the timestamp for real-time counters determined by localtime() # print "PRINT: $gotHeadersFlag\n"; $gotHeadersFlag=1; printInterval() if $gotHeadersFlag; } sub pdshFormat { my $address=shift; # Break out individual address, putting 'pdsh' expressions back # together if they got split my $partial=''; my $addressList=''; foreach my $addr (split(/[ ,]/, $address)) { # This is subtle. The '.*' will match up to the rightmost '['. If a ']' # follows, possibly followed by a string, we're done! We use this same # technique later to determine when we're done. 
if ($addr=~/.*\[(.*)$/ && $1!~/\]/) { $partial.=",$addr"; next; } if ($partial ne '') { $partial.=",$addr"; next if $partial=~/.*\[(.*)$/ && $1!~/\]/; $addr=$partial; } $addr=~s/^,//; $addressList.=($addr!~/\[/) ? "$addr " : expand($addr); $partial=''; } $addressList=~s/ $//; return((split(/[ ,]/, $addressList))); } # Expand a 'pdsh-like' address expression sub expand { my $addr=shift; print "Expand: $addr\n" if $debug & 1; $addr=~/(.*?)(\[.*\])(.*)/; my ($pre, $expr, $post)=($1, $2, $3); #print "PRE: $pre EXPR: $expr POST: $post\n"; my @newStack; my @oldStack=''; # need to prime it foreach my $piece (split(/\[/, $expr)) { next if $piece eq ''; # first piece always blank # get rid of trailing ']' and pull off range $piece=~s/\]$//; my ($from, $thru)=split(/[-,]/, $piece); $from=~/^(0*)(.*)/; #print "PIECE: $piece FROM: $from THRU: $thru 1: $1 2: $2\n"; my $pad=length($1); my $num=length($2); my $len=$pad+$num; my $spec=(!$pad) ? "%d" : "%0${len}d"; $piece=~s/-/../g; $piece=~s/^0*(\d)/$1/; # gets rid of leading 0s $piece=~s/([\[,.-])0*(\d)/$1$2/g; # gets rid of other numbers with them my @numbers=eval("($piece)"); undef @newStack; foreach my $old (@oldStack) { foreach my $number (@numbers) { my $newnum=sprintf("$spec", $number); push @newStack, "$old$newnum"; } } @oldStack=@newStack; } my $results=''; foreach my $spec (@newStack) { $results.="$pre$spec$post "; } return $results; } # Stolen from colgui, but since modified... sub getReturnAddress { my $address=shift; my $myaddr; # If only one network UP, that's the address to use. 
  # Note newer versions of
  # ifconfig, at least on RHEL 7.0 use 'broadcast' instead of 'Bcast'
  my $cmd="$Ifconfig 2>/dev/null | $Grep -E 'Bcast|broadcast'";
  print "Command: $cmd\n" if $debug & 1;
  my @lines=`$cmd`;
  if (@lines==1)
  {
    $myaddr=(split(/\s+/, $lines[0]))[2];
    $myaddr=~s/.*://;    # this is a no-op with newer ifconfig
    print "Got address from ifconfig: $myaddr\n" if $debug & 1;
    return ($myaddr);
  }

  my ($destaddr, $gateway, $mask, $interface, $octet);
  my (@addrOctets, @destOctets, @maskOctets);

  print "Get return address associated with $address from 'route'\n" if $debug & 1;
  @addrOctets=split(/\./, $address);
  open ROUTES, "$Route|" or error("Couldn't execute '$Route'");
  foreach my $line (<ROUTES>)
  {
    next if $line!~/^\d|^default/;
    chomp $line;
    ($destaddr, $gateway, $mask, $interface)=(split(/\s+/, $line))[0,1,2,7];

    # Note if default route we don't have any digits in here, but since the
    # mask is 0.0.0.0 it will kick in on the first test.
    @destOctets=split(/\./, $destaddr);
    @maskOctets=split(/\./, $mask);

    for (my $i=0; $i<4; $i++)
    {
      # we're guaranteed a hit since the default starts with 0.
      if ($maskOctets[$i]==0)
      {
        close ROUTES;
        $myaddr=`$Ifconfig $interface | grep 'inet '`;
        $myaddr=(split(/\s+/, $myaddr))[2];
        $myaddr=~s/addr://;    # for cases where ipaddr is preceded by this
        print "Got address from route for $interface: $myaddr\n" if $debug & 1;
        return ($myaddr);
      }
      last if ($addrOctets[$i] & $maskOctets[$i])!=$destOctets[$i];
    }
  }

  # The only way to make sure this never happens is to put the code
  # in to catch it.
  print "Can't find default Route\n";
  #error("Can't find default route in $Route");
}

sub reformatHeaders
{
  printf "Reformatting headers%s\n", $oldColFlag ? ': *** using OLD collectl ***' : ''
    if $debug & 1;

  # First, get rid of stuff we don't want like blank lines and 'RECORD'
  my @save=@headers;
  @headers=();
  foreach my $line (@save)
  {
    next if $line=~/^\s*$|RECORD/;
    push @headers, $line;
  }

  my $diff=($hostlen>8) ?
$hostlen-8 : 0; my $hostpad=' 'x$diff; $hostpad.= ' 'x9 if $timeFlag; my $padChars=length($hostpad); # most of the time we're dealing with multi-line header and NOT plot format # also note that it's the responsibility of earlier error checking to make # sure switch combos are legit # While it might be possible to combine a lot of this into less cases, it's # easier to test when different types of output are grouped this way. Start # by replacing the Time field with Host before we start shifting things. $headers[-1]=~s/#Time/# /; if (!$plotFlag) { if ($expFlag) { $headers[1]=~s/^#/#$hostpad/; $headers[2]=~s/^#/#$hostpad/ if defined($headers[2]); } elsif (($subsys=~/[a-z]/ || $numImports) && !$verbFlag) { # not sure why I used to skip shifting line 1, but I clearly # need to do it with some --imports for (my $i=0; $i<@headers; $i++) { $headers[$i]=~s/^#/#$hostpad/; } } elsif (($subsys=~/[a-z]/ || $numImports) && $verbFlag) { # with verbose or imports we DON'T shift line 0, though # seeing comment above I'm not sure why this is here $headers[1]=~s/^#/#$hostpad/; $headers[2]=~s/^#/#$hostpad/ if defined($headers[2]); } else { $headers[1]=~s/^#/#$hostpad/ if $subsys=~/[A-Y]/; $headers[2]=~s/ Name/${hostpad}Name / if $subsys=~/[DY]/; $headers[1]=~s/ PID/${hostpad} PID/ if $subsys=~/[Z]/; } # we now have a header line that starts with "# " and # normally we just replace it with '#Host', but in playback # mode we might have to deal with date/time as well. 
if ($playbackFlag) { my $dt=''; $dt.='Date ' if $options=~/d/; $dt.='Date ' if $options=~/D/; $dt.='Time ' if $timeFlag; my $dtlen=length($dt); $headers[-1]=~s/(.{$hostlen}) .{$dtlen}/${1} $dt/; } # now that we're padded in the right number of spaces for the hostname # replace the first 5 with the appropriate text $headers[-1]=~s/.{5}/#Host/; # this is a little tricky because in single line mode while our earlier # trick works with -oD and -od, it doesn't work with -oT in non-plot mode if (!$plotFlag && $options=~/T/) { for (my $i=0; $i<@headers; $i++) { if ($i==scalar(@headers)-1) { $headers[$i]=~s/#Host/#Host Time /; } else { $headers[$i]=~s/#/# /; } } } } else { # since -P is identical everywhere, let's just let the code above replace Date/Time with # 'host' and then we'll put it back. Also, the header is only 1 line. $headers[0]=~s/^.*?\[/#Host Date Time / if $plotFlag; } # Build an array of column positons for first char of each header column for use with bolding my $num=1; my $lastChar='x'; # doestn't really matter $headerPos[0]=1; # first entry always column 1 for (my $i=1; $i=$numCols || ($cols ne '' && $maxColNum>=$numCols)) { printf "%s specifies a column > max, which is %d\n", ($cols eq '') ? '-column': '-cols', $numCols-1; $ctrlCFlag=1; } return(!$ctrlCFlag); # noting the non-error state is 0 } sub printInterval { my $minSecs= shift; my $maxSecs= shift; my @value; my $unique=0; my $numFlag=1; return if $ctrlCFlag; # can't trust sample... ############################################# # S i n g l e L i n e F o r m a t ############################################# # Here we only select specific columns for printing... 
if ($cols ne '') { my $numCols=scalar(@columns); my $wider=$colwidth+2; # extra width for totals columns $somethingPrintedFlag=1; # H e a d e r if ($numLines==-1 || $maxLines!=0 && (++$numLines % $maxLines)==0) { # there's a blank line after the header, but a cr at end of last # line that would scolls header off, so the height is really 1 less $numLines+=2; printf "\n"; my $datetime=''; $datetime.='#Date Time ' if $options=~/D/; $datetime.='#Date Time ' if $options=~/d/; $datetime.='#Time ' if $options=~/T/; $datetime.=' ' if $options=~/m/; my $dtpad=' ' x length($datetime); # write name of column over each set of hostnames print $dtpad; for (my $i=0; $i<@columns; $i++) { for (my $j=0; $j<@hostnames; $j++) { # note that because of the way the header names are stored (which DO include # a timestamp), we need to skip printing date/timestamps when -o not specified my $col=($options=~/[TdD]/ || $plotFlag) ? $columns[$i]-1 : $columns[$i]; if ($j==0) { printf " %-${colwidth}s", @headernames ? $headernames[$col] : '???'; } else { printf " %${colwidth}s", ''; } } print ' '; # account for ' | ' } print "\n"; if (!$colnodetFlag) { print $datetime; for (my $i=0; $i<$numCols; $i++) { for (my $j=0; $j<@hostnames; $j++) { # if hostname contains ANY alpha chars, it's not an IP address so only use hostname piece my $hostname=($hostnames[$j]=~/[a-zA-Z]/) ? (split(/\./, $hostnames[$j]))[0] : $hostnames[$j]; my $len=length($hostname); my $start=($len-$colwidth>0) ? $len-$colwidth : 0; my $hostTrunc=substr($hostname, $start, $colwidth); printf " %${colwidth}s", $hostTrunc; } print ' | ' if $numCols>1 && $i!=$numCols-1; } } if ($colTotalFlag) { print ' | ' if !$colnodetFlag; for (my $i=0; $i<@columns; $i++) { my $col=($plotFlag || $options=~/[TdD]/) ? $columns[$i]-1 : $columns[$i]; printf " %${wider}s", @headernames ? 
$headernames[$col] : '???'; } } print "\n"; } # B o d y # We need the current time outside the timestamp printing section below, # mainly so we can be sure out timestamp tests later on use today's date. my ($seconds, $usecs)=Time::HiRes::gettimeofday(); my ($sec, $min, $hour, $day, $mon, $year)=localtime($seconds); # Preface with date/timestamp? But even if so, we use OUR date/time... if ($timeFlag) { if (!$playbackFlag) { my ($seconds, $usecs)=Time::HiRes::gettimeofday(); my ($sec, $min, $hour, $day, $mon, $year)=localtime($seconds); my $date=($options=~/d/) ? sprintf("%02d/%02d", $mon+1, $day) : sprintf("%d%02d%02d", $year+1900, $mon+1, $day); my $time=sprintf("%02d:%02d:%02d", $hour, $min, $sec); $time.=substr(sprintf(".%06d", $usecs),0,4) if $options=~/m/; printf "%s", ($options=~/[dD]/) ? "$date $time" : $time; } else { # always print time, date optional printf "%s%s", ($options=~/[dD]/)? "$date " : '', putSecs($minSecs); } } my @total; my $timeNow=time; for (my $i=0; $i<$numCols; $i++) { # we may end up adjusting column down in plot format my $col=(!$plotFlag) ? 
$columns[$i] : $columns[$i]-1; $total[$col]=0; for (my $j=0; $j<@hostnames; $j++) { # When running in real-time mode and data exists, make sure it isn't stale if (!$playbackFlag && defined($hostdata[$j])) { my $bufptr=$hostVars[$j]->{bufptr}; my $time=$hostVars[$j]->{lasttime}->[$bufptr]; my $hh=substr($time, 0, 2); my $mm=substr($time, 3, 2); my $ss=substr($time, 6); # NOTE - we're using current day/month/year my $timeSample=timelocal($ss, $mm, $hh, $day, $mon, $year); delete $hostdata[$j] if $maxDataAge<($timeNow-$timeSample); my $diff=$timeNow- $timeSample; #printf "realtime -- MaxAge: $maxDataAge Now: $timeNow Sample: $timeSample AGE: %d\n", $timeNow-$timeSample; } my $data; if (defined($hostdata[$j])) { $data=(split(/\s+/, $hostdata[$j]))[$col]; $data/=1000 if $col1Flag && !defined($colsNoDiv[$col]); $data/=1024 if $colKFlag && !defined($colsNoDiv[$col]); $data=int(10*log($data)/log(10)) if $colLogFlag && !defined($colsNoDiv[$col]) && $data>=1; $data=int($data); $total[$col]+=$data; # pretty rare... if ($data<0 && defined($negdataval)) { $total[$col]-=$data; # do not include in total $data=$negdataval; } } else { $data=$nodataval; } printf " %${colwidth}s", $data if !$colnodetFlag; } print ' | ' if !$colnodetFlag && $numCols>1 && $i!=$numCols-1; } if ($colTotalFlag) { print ' | ' if !$colnodetFlag; foreach my $column (@columns) { # This is clearly something weird! If I decrement $col instead of # doing what I'm doing it clobbers @columns my $col=(!$plotFlag) ? $column : $column-1; my $tot=$total[$col]; printf " %${wider}d", defined($tot) ? 
        $tot : -1;
      }
    }
    print "\n";
    @hostdata=();
    return;
  }

  # B u i l d   C o m m o n   T i m e s t a m p

  # Build timestamp, noting it's different in playback vs real-time mode
  my $timestamp;
  if ($playbackFlag)
  {
    # if more than one time, report as a range
    $timestamp=putSecs($minSecs);
    $timestamp.=sprintf("-%s", putSecs($maxSecs)) if $maxSecs!=$minSecs;
    $timestamp.=" Reporting: $numReporting of $numHosts";
  }
  else
  {
    $timestamp=localtime(time);
    $timestamp.=" Connected: $numReporting of $numHosts";
  }

  #############################
  # N o   S o r t i n g
  #############################

  if ($nosortFlag)
  {
    # Same as when sorting except no-bolding
    printLine("# $timestamp") if ($subsys=~/[a-z]/ || $numImports) && !$verbFlag;
    chomp $headers[0];
    my $line=sprintf("$headers[0] %s", ($subsys=~/[A-Z]/ || $verbFlag) ? $timestamp : '');
    printLine($line);
    printLine($headers[1]);
    printLine($headers[2]) if defined($headers[2]);

    foreach my $line (@printStack)
    {
      # also as below we need to remove hostname and put it back properly sized
      $line=~s/(^\S+)//;
      my $host=$1;
      printf "%-${hostlen}s$line\n", $host;
      $somethingPrintedFlag=1;
    }
    return;
  }

  #############################
  # S o r t   F o r m a t
  #############################

  # when not freezing display, we clear out and repopulate sort hash each cycle
  undef %sort if !$freezeFlag;

  # only go through loop when NOT freezing display
  $totalLines=0;
  for (my $i=0; !$freezeFlag && $i<@printStack; $i++)
  {
    # note in earlier versions of collectl an extra hostname was part of RECORD line
    my $line=$printStack[$i];
    next if $line=~/^#|RECORD/;
    $totalLines++;

    $value[$i]=(split(/\s+/, $line))[$column];
    $value[$i]='' if !defined($value[$i]);    # can happen with optional fields, as in the case of plugins

    # N o n - I n t e g e r   F i e l d s   ( s a v e   a   f e w   n a n o - s e c s )

    # Since pure time stamps are fixed width, they'll sort fine as strings
    if ($subsys=~/E/)
    {
      $value[$i]=$1*100+$2 if $value[$i]=~/(\d+)\.(\S+)/;
    }
    elsif ($subsys=~/Y/ && $column==11)
    { # always a percentage
      $value[$i]=~/(\d+)\.(\d+)/;
      $value[$i]=$1*100+$2;
    }
    elsif ($subsys=~/Z/)
    {
      $value[$i]=$1*3600+$2*60+$3 if $value[$i]=~/(\d+):(\S+):(.*)/;    # Timestamp
      $value[$i]=$1*60+$2 if $value[$i]=~/(\d+):(\S+)/;                 # AccuTime -> seconds
      $value[$i]=$1*100+$2 if $value[$i]=~/(\d+)\.(\S+)/;               # SysT/UsrT -> jiffies
    }

    # handle time, noting we can have a LOT more than 24 hours
    if ($value[$i]=~/:/)
    {
      my ($hour, $mins, $secs)=split(/:/, $value[$i]);
      if ($mins=~/\./)    # if < 1 hour, format is mm:ss.ff
      {
        $secs=$mins;
        $mins=$hour;
        $hour=0;
      }
      $value[$i]=$hour*3600+$mins*60+$secs;
    }

    # handle K, M, G
    if ($value[$i]=~s/^(\d+)([KMG])$/$1/)
    {
      my $mult=$2;
      $value[$i]*=$K if $mult eq 'K';
      $value[$i]*=$M if $mult eq 'M';
      $value[$i]*=$G if $mult eq 'G';
    }
    #print "VAL: $value[$i]\n";

    $numFlag=0 if $value[$i]!~/^[0-9.-]*$/;    # contains non-numeric char
    next if $zeroFlag && $numFlag && $value[$i]==0;

    # Use a hash to sort results, noting the value could still be a string.
    # This is not perfect, but we need a unique discriminator to make sure
    # duplicates are dealt with, so append it as a fraction to retain numeric values.
    my $sortkey=sprintf("%s%s%d", $value[$i], ($value[$i]=~/\./) ? '' : '.', $unique++);    # make all look like numbers
    $sort{$sortkey}=$i;
    #printf ">>$value[$i]<< KEY: $sortkey LINE: $printStack[$i]\n";
  }

  my @keys;
  if ($numFlag)
  {
    @keys=($revFlag) ? (sort{$a <=> $b} keys %sort) : reverse sort{$a <=> $b} keys %sort;
  }
  else
  {
    @keys=($revFlag) ? (sort{$a cmp $b} keys %sort) : reverse sort{$a cmp $b} keys %sort;
  }

  # P r i n t   H e a d e r s

  print "$Home" if $homeFlag;

  # we need a local copy so we can bold it w/o destroying original
  my @temp=@headers;
  $temp[-1]=~s/(.{$headerPos[$column]})(\S+)/$1$bold$2$noBold/ if $boldFlag;

  # no room in summary headers for timestamp so print above
  my $state=($freezeFlag) ? ' >>>column sorting disabled<<<' : '';
  my $endLine=$startLine+$bodyLines-1;
  $endLine=$totalLines if $endLine>$totalLines;
  my $display=($startLine>1 || $endLine<$totalLines) ?
" Displaying: lines $startLine thru $endLine out of $totalLines" : ''; printLine("# $timestamp$state$display") if ($subsys=~/[a-x]/ || $numImports) && !$verbFlag; chomp $temp[0]; my $line=sprintf("$temp[0] %s", ($subsys=~/[yA-Z]/ || $verbFlag) ? "$timestamp$display" : ''); printLine($line); # if colhelp need to insert in different locations if (!defined($temp[2])) { printLine($colhelp) if $colhelpFlag; printLine($temp[1]); } else { printLine($temp[1]); printLine($colhelp) if $colhelpFlag; printLine($temp[2]); } # P r i n t B o d y # always leave room for header and possible column help my $skip=$startLine; my $lineCount=scalar(@headers); $lineCount++ if ($subsys=~/[a-z]/ || $numImports) && !$verbFlag; # this format has 1 extra line $lineCount++ if $colhelpFlag; foreach my $key (@keys) { next if --$skip>0; # skip any lines if $startLine>1 my $i=$sort{$key}; # Remove the hostname from the line $printStack[$i]=~s/(^\S+)//; my $host=$1; # we never terminate last line in screen mode with a CR my $lastLine=(++$lineCount==$maxLines) ? 1 : 0; #print "$lineCount "; # and now print the line with the hostname padded accordingly but no \n yet # can't use printLine() because we don't always do the CR at the end $somethingPrintedFlag=1; $line=sprintf("%-${hostlen}s$printStack[$i]", $host); printLine($line, $lastLine); last if $lastLine; next; } print "\n" if !$homeFlag || $noEscapeFlag || $finalCr; # clear remainder of display and even if NOT in real-time mode there's nothing # below our current position. 
  print "$Clr" if $homeFlag;
}

# in home mode, clean end of each line
sub printLine
{
  my $line=shift;
  my $last=shift;

  print $line;
  print $Cleol if $homeFlag;
  print "\n" if !$last;
}

sub getHeaders
{
  my $access= shift;
  my $command=shift;

  # in case -c or -i included, they conflict with --showcolheaders
  $command=~s/ -c\s*\S+/ /;
  $command=~s/ -i\s*:*\d+/ /;

  my $cmd="$access $Collectl $command --showcolheaders";
  $cmd=~s/--fr\S+\s+\S*//;
  print "Command: '$cmd'\n" if $debug & 1;
  my $headers=`$cmd 2>&1`;

  # if an older version of collectl, we'll get an error that --showcolheaders
  # is an invalid switch so set a flag as a reminder. In the correct version,
  # even in local mode with wildcarded filenames, that's ok too because collectl
  # exits after processing very first one.
  $oldColFlag=($headers=~/showcolheaders/) ? 1 : 0;
  @headers=split(/\n/, $headers) if !$oldColFlag;

  # We can still get the headers but now we're going to get back 1 or more
  # lines of data which we need to ignore and leave out of @headers
  if ($oldColFlag)
  {
    # if -p, we need to remove -p and the filespec so that we can add in -i & -c
    # which are incompatible
    if ($playbackFlag)
    {
      # Since we know the format of the filespec is '-p "filespec"' OR '--pla* "filespec"'
      # look for something that matches either (one or more '-', whitespace and "filename")
      # and remove it all from the command
      my $meta=quotemeta($playbackFile);
      $command=~s/-+\S+\s+"$meta"//;
      $command=~s/-p\s*\S+|-pla\S+\s+\S+//;
      $command=~s/--fr\S+\s+\S+|--th\S+\s+\S+//g;
    }

    # since -i0 is special for secondary intervals, let's just make it small
    # enough to not be noticeable
    my $tempInterval=.01;
    $tempInterval='.01:.01' if $subsys=~/[yYZ]/;
    $tempInterval='.01::.01' if $subsys=~/E/;

    # This will get data as well as the header so need to remove the data below
    $cmd="$access $Collectl $command -i$tempInterval -c1 --quiet";
    print "Command: '$cmd'\n" if $debug & 1;
    $headers=`$cmd 2>&1`;

    foreach my $line (split(/\n/, $headers))
    {
      next if $line=~/^\s*$/;
      last if $line!~/^#/;
push @headers, $line; } } # in some rare cases, collectl can throw errors that ultimately result in showing up in # the header, so let's make a quick pass and remove any lines that don't begin with a '#' for (my $i=@headers-1; $i>=0; $i--) { splice(@headers, $i, 1) if $headers[$i]!~/^#/; } # we only have an error to report if not the missing -showcolheaders switch # since that simply means we need to get the header from the return data # BUT if an ssh problem we need to catch that too error("collectl: $headers") if $headers=~/^Error/m && $headers!~/showcolheaders/; error("ssh: $headers") if $headers=~/Connection refused/; return(@headers); } sub showHeaders { my $header=$headers[-1]; my @fields=split(/\s+/, $header); # assumes single line format but also useful for real-time AND playback mode $columns[0]=$column if $cols eq ''; my $selected=(@columns) ? '' : '(None Selected) '; print "\n>>> Headers $selected<<<\n"; my $maxWidth=0; foreach my $field (@fields) { $maxWidth=length($field) if length($field)>$maxWidth; } # Need to process columns in reverse since bolding shifts them to the right for (my $i=@columns-1; $i>=0; $i--) { my $col=$columns[$i]; if ($col>=@fields) { print "Invalid column number: $col\n"; next; } my $colname=$fields[$col]; my $colmeta=quotemeta($colname); $header=~s/(.{$headerPos[$col]})(\S+)/$1$bold$2$noBold/; } $headers[-1]=$header; foreach my $header (@headers) { print "$header\n"; } print "\n"; my $curcol=0; print ">>> Column Numbering <<<\n"; for (my $i=0; $i<@fields; $i++) { if (($curcol+$maxWidth+4)>$termWidth) { print "\n"; $curcol=0; } printf "%2d %-${maxWidth}s ", $i, $fields[$i]; $curcol+=$maxWidth+4; } print "\n"; } sub valTime { my $name=shift; my $time=shift; if (defined($time)) { error("invalid '$name' time") if $time!~/(\d{2}):(\d{2}):(\d{2})/; error("invalid '$name' hours") if $1>24; error("invalid '$name' mins") if $2>60; error("invalid '$name' secs") if $3>60; } } sub getSecs { my $time= shift; # The timestamp could be just the 
time but includes msec, so if longer that hh:mm:ss.xxx # it must have a date as well so remove it, remembering we only run against a single date. $time=(split(/ /, $time))[1] if length($time)>12; # Note if time contains msec, we preserve it my $secs=substr($time, 0, 2)*3600+substr($time, 3, 2)*60+substr($time, 6); return($secs); } sub checkTime { my $time=shift; my ($hour,$mins,$secs)=split(/:/, $time); $secs=0 if !defined($secs); # to make tests below work; return(0) if $hour>24 || $mins>59 || $secs > 59; return(0) if $hour!~/^\d+$/ || $mins!~/^\d+$/ || $secs!~/^\d+$/; return(1); } sub putSecs { my $seconds=shift; my $hours=int($seconds/3600); my $mins= int(($seconds-3600*$hours)/60); my $secs= $seconds-3600*$hours-60*$mins; my $msec= (split(/\./, $secs))[1]; my $timestamp=sprintf("%02d:%02d:%02d", $hours, $mins, $secs); $timestamp.=sprintf(".%03d", $msec) if defined($msec); return($timestamp); } sub sigInt { $ctrlCFlag=1; print "^C detected...\n" if $debug & 1; } sub error { # Be sure to reset terminal characteristics print "$Program: $_[0]\n"; Term::ReadKey::ReadMode(0) if $readkeyFlag; exit; } sub help { print <1; # If we ever run with a ':' in the interval, we need to be sure we're only looking at the main one. 
$graphiteColInt=(split(/:/, $interval))[0]; $graphiteInterval=$graphiteColInt if $graphiteInterval eq ''; # convert to the number of samples we want to send $graphiteSendCount=int($graphiteInterval/$graphiteColInt); error("graphite interval '$graphiteInterval' is not a multiple of '$graphiteColInt' seconds") if $graphiteColInt*$graphiteSendCount != $graphiteInterval; error("'min', 'max', 'avg' & 'tot' require graphite 'i' that is > collectl's -i") if $graphiteFlags && $graphiteSendCount==1; if ($graphiteAlignFlag) { my $div1=int(60/$graphiteColInt); my $div2=int($graphiteColInt/60); error("'align' requires collectl interval be a factor or multiple of 60 seconds") if ($graphiteColInt<=60 && $div1*$graphiteColInt!=60) || ($graphiteColInt>60 && $div2*60!=$graphiteColInt); error("'align' only makes sense when multiple samples/interval") if $graphiteInterval<=$graphiteColInt; error("'lexpr,align' requires -D or --align") if !$graphiteAlignFlag && !$daemonFlag; } error('randomize options requires a value') if !defined($graphiteRandomize); if ($graphiteRandomize ne '') { error("randomization require hires time module") if !$hiResFlag; error("randomization requires interval of at least 2 seconds") if $graphiteInterval<2; error("randomization value must be less than or equal to '" . ($graphiteInterval-1) . "' seconds") if $graphiteRandomize > $graphiteInterval-1; } # Since graphite DOES write over a socket but does not use -A, make sure the default # behavior for -f logs matches that of -A $rawtooFlag=1 if $filename ne '' && !$plotFlag; $graphiteMyHost=(!$graphiteFqdnFlag) ? `hostname` : `hostname -f`; chomp $graphiteMyHost; $graphiteMyHost =~ s/\./$graphiteEscape/g if $graphiteEscape ne ''; # O p e n S o c k e t $SIG{"PIPE"}=\&graphiteSigPipe; # socket comm errors # set fail count such that if first open fails, we'll report an error $graphiteSocketFailCount=$graphiteSocketFailMax-1; openTcpSocket(1); } # NOTE - this routine is almost an identical copy from gexpr. 
# Being lazy while making it easier to keep the 2 in sync, I left in the # second parameter in the sendData() calls which are ignored in the # modified version of sendData() itself, which prepends a hostname to the # variable name and add a timestamp to the socket call. In fact, I almost # just hacked up gexpr to make it deal with both ganglia and graphite. sub graphite { # if socket not even open and the first try of this interval, try again # NOTE - we're making sure socket is open every interval whether we're # reporting data or not... openTcpSocket() if !defined($graphiteSocket) && $graphiteIntTimeLast!=time; $graphiteIntTimeLast=time; return if !defined($graphiteSocket) && !($graphiteDebug & 8); # still not open? get out! # if not time to print and we're not doing min/max/avg/tot, there's nothing to do. # BUT always make sure time aligns to top of minute based on i= $graphiteCounter++; $graphiteOutputFlag=(($graphiteCounter % $graphiteSendCount) == 0) ? 1 : 0 if !$graphiteAlignFlag; $graphiteOutputFlag=(!(int($lastSecs[$rawPFlag]) % $graphiteInterval)) ? 
1 : 0 if $graphiteAlignFlag; return if (!$graphiteOutputFlag && $graphiteFlags==0); # random sleep when r= option Time::HiRes::usleep(rand($graphiteRandomize)*1000000) if $graphiteRandomize ne ''; if ($graphiteSubsys=~/c/) { # CPU utilization is a % and we don't want to report fractions my $i=$NumCpus; sendData('cputotals.user', 'percent', $userP[$i]); sendData('cputotals.nice', 'percent', $niceP[$i]); sendData('cputotals.sys', 'percent', $sysP[$i]); sendData('cputotals.wait', 'percent', $waitP[$i]); sendData('cputotals.idle', 'percent', $idleP[$i]); sendData('cputotals.irq', 'percent', $irqP[$i]); sendData('cputotals.soft', 'percent', $softP[$i]); sendData('cputotals.steal','percent', $stealP[$i]); sendData('ctxint.ctx', 'switches/sec', $ctxt/$intSecs); sendData('ctxint.int', 'intrpts/sec', $intrpt/$intSecs); sendData('ctxint.proc', 'pcreates/sec', $proc/$intSecs); sendData('ctxint.runq', 'runqSize', $loadQue); # these are the ONLY fraction, noting they will print to 2 decimal places sendData('cpuload.avg1', 'loadAvg1', $loadAvg1, 2); sendData('cpuload.avg5', 'loadAvg5', $loadAvg5, 2); sendData('cpuload.avg15', 'loadAvg15', $loadAvg15, 2); } if ($graphiteSubsys=~/C/) { for (my $i=0; $i<$NumCpus; $i++) { sendData("cpuinfo.user.cpu$i", 'percent', $userP[$i]); sendData("cpuinfo.nice.cpu$i", 'percent', $niceP[$i]); sendData("cpuinfo.sys.cpu$i", 'percent', $sysP[$i]); sendData("cpuinfo.wait.cpu$i", 'percent', $waitP[$i]); sendData("cpuinfo.irq.cpu$i", 'percent', $irqP[$i]); sendData("cpuinfo.soft.cpu$i", 'percent', $softP[$i]); sendData("cpuinfo.steal.cpu$i", 'percent', $stealP[$i]); sendData("cpuinfo.idle.cpu$i", 'percent', $idleP[$i]); sendData("cpuinfo.intrpt.cpu$i",'percent', $intrptTot[$i]); } } if ($graphiteSubsys=~/d/) { sendData('disktotals.reads', 'reads/sec', $dskReadTot/$intSecs); sendData('disktotals.readkbs', 'readkbs/sec', $dskReadKBTot/$intSecs); sendData('disktotals.writes', 'writes/sec', $dskWriteTot/$intSecs); sendData('disktotals.writekbs', 
'writekbs/sec', $dskWriteKBTot/$intSecs); } if ($graphiteSubsys=~/D/) { for (my $i=0; $i<@dskOrder; $i++) { # preserve display order but skip any disks not seen this interval $dskName=$dskOrder[$i]; next if !defined($dskSeen[$i]); next if ($dskFiltKeep eq '' && $dskName=~/$dskFiltIgnore/) || ($dskFiltKeep ne '' && $dskName!~/$dskFiltKeep/); sendData("diskinfo.reads.$dskName", 'reads/sec', $dskRead[$i]/$intSecs); sendData("diskinfo.readkbs.$dskName", 'readkbs/sec', $dskReadKB[$i]/$intSecs); sendData("diskinfo.writes.$dskName", 'writes/sec', $dskWrite[$i]/$intSecs); sendData("diskinfo.writekbs.$dskName", 'writekbs/sec', $dskWriteKB[$i]/$intSecs); sendData("diskinfo.rqst.$dskName", 'requests/sec', $dskRqst[$i]); sendData("diskinfo.qlen.$dskName", 'depth', $dskQueLen[$i]); sendData("diskinfo.wait.$dskName", 'msec', $dskWait[$i]); sendData("diskinfo.time.$dskName", 'msec', $dskSvcTime[$i]); sendData("diskinfo.util.$dskName", 'percent', $dskUtil[$i]); } } if ($graphiteSubsys=~/f/) { if ($nfsSFlag) { sendData('nfsinfo.SRead', 'SvrReads/sec', $nfsSReadsTot/$intSecs); sendData('nfsinfo.SWrite', 'SvrWrites/sec', $nfsSWritesTot/$intSecs); sendData('nfsinfo.Smeta', 'SvrMeta/sec', $nfsSMetaTot/$intSecs); sendData('nfsinfo.Scommit', 'SvrCommt/sec' , $nfsSCommitTot/$intSecs); } if ($nfsCFlag) { sendData('nfsinfo.CRead', 'CltReads/sec', $nfsCReadsTot/$intSecs); sendData('nfsinfo.CWrite', 'CltWrites/sec', $nfsCWritesTot/$intSecs); sendData('nfsinfo.Cmeta', 'CltMeta/sec', $nfsCMetaTot/$intSecs); sendData('nfsinfo.Ccommit', 'CltCommt/sec' , $nfsCCommitTot/$intSecs); } } if ($graphiteSubsys=~/i/) { sendData('inodeinfo.dentnum', 'dentrynum', $dentryNum); sendData('inodeinfo.dentunused', 'dentryunused', $dentryUnused); sendData('inodeinfo.fhandalloc', 'filesalloc', $filesAlloc); sendData('inodeinfo.fhandmpct', 'filesmax', $filesMax); sendData('inodeinfo.inodenum', 'inodeused', $inodeUsed); } if ($graphiteSubsys=~/l/) { if ($CltFlag) { sendData('lusclt.reads', 'reads/sec', 
$lustreCltReadTot/$intSecs); sendData('lusclt.readkbs', 'readkbs/sec', $lustreCltReadKBTot/$intSecs); sendData('lusclt.writes', 'writes/sec', $lustreCltWriteTot/$intSecs); sendData('lusclt.writekbs', 'writekbs/sec', $lustreCltWriteKBTot/$intSecs); sendData('lusclt.numfs', 'filesystems', $NumLustreFS); } if ($MdsFlag) { my $getattrPlus=$lustreMdsGetattr+$lustreMdsGetattrLock+$lustreMdsGetxattr; my $setattrPlus=$lustreMdsReintSetattr+$lustreMdsSetxattr; my $varName=($cfsVersion lt '1.6.5') ? 'reint' : 'unlink'; my $varVal= ($cfsVersion lt '1.6.5') ? $lustreMdsReint : $lustreMdsReintUnlink; sendData('lusmds.gattrP', 'gattrP/sec', $getattrPlus/$intSecs); sendData('lusmds.sattrP', 'sattrP/sec', $setattrPlus/$intSecs); sendData('lusmds.sync', 'sync/sec', $lustreMdsSync/$intSecs); sendData("lusmds.$varName", "$varName/sec", $varVal/$intSecs); } if ($OstFlag) { sendData('lusost.reads', 'reads/sec', $lustreReadOpsTot/$intSecs); sendData('lusost.readkbs', 'readkbs/sec', $lustreReadKBytesTot/$intSecs); sendData('lusost.writes', 'writes/sec', $lustreWriteOpsTot/$intSecs); sendData('lusost.writekbs', 'writekbs/sec', $lustreWriteKBytesTot/$intSecs); } } if ($graphiteSubsys=~/L/) { if ($CltFlag) { # Either report details by filesystem OR OST if ($lustOpts!~/O/) { for (my $i=0; $i<$NumLustreFS; $i++) { sendData("lusost.reads.$lustreCltFS[$i]", 'reads/sec', $lustreCltRead[$i]/$intSecs); sendData("lusost.readkbs.$lustreCltFS[$i]", 'readkbs/sec', $lustreCltReadKB[$i]/$intSecs); sendData("lusost.writes.$lustreCltFS[$i]", 'writes/sec', $lustreCltWrite[$i]/$intSecs); sendData("lusost.writekbs.$lustreCltFS[$i]", 'writekbs/sec', $lustreCltWriteKB[$i]/$intSecs); } } else { for (my $i=0; $i<$NumLustreCltOsts; $i++) { sendData("lusost.reads.$lustreCltOsts[$i]", 'reads/sec', $lustreCltLunRead[$i]/$intSecs); sendData("lusost.readkbs.$lustreCltOsts[$i]", 'readkbs/sec', $lustreCltLunReadKB[$i]/$intSecs); sendData("lusost.writes.$lustreCltOsts[$i]", 'writes/sec', $lustreCltLunWrite[$i]/$intSecs); 
sendData("lusost.writekbs.$lustreCltOsts[$i]", 'writekbs/sec', $lustreCltLunWriteKB[$i]/$intSecs); } } } if ($OstFlag) { for ($i=0; $i<$NumOst; $i++) { sendData("lusost.reads.$lustreOsts[$i]", 'reads/sec', $lustreReadOps[$i]/$intSecs); sendData("lusost.readkbs.$lustreOsts[$i]", 'readkbs/sec', $lustreReadKBytes[$i]/$intSecs); sendData("lusost.writes.$lustreOsts[$i]", 'writes/sec', $lustreWriteOps[$i]/$intSecs); sendData("lusost.writekbs.$lustreOsts[$i]", 'writekbs/sec', $lustreWriteKBytes[$i]/$intSecs); } } } if ($graphiteSubsys=~/m/) { sendData('meminfo.tot', 'kb', $memTot); sendData('meminfo.free', 'kb', $memFree); sendData('meminfo.shared', 'kb', $memShared); sendData('meminfo.buf', 'kb', $memBuf); sendData('meminfo.cached', 'kb', $memCached); sendData('meminfo.used', 'kb', $memUsed); sendData('meminfo.slab', 'kb', $memSlab); sendData('meminfo.map', 'kb', $memMap); sendData('meminfo.hugetot', 'kb', $memHugeTot); sendData('meminfo.hugefree', 'kb', $memHugeFree); sendData('meminfo.hugersvd', 'kb', $memHugeRsvd); sendData('swapinfo.total', 'kb', $swapTotal); sendData('swapinfo.free', 'kb', $swapFree); sendData('swapinfo.used', 'kb', $swapUsed); sendData('swapinfo.in', 'swaps/sec', $swapin/$intSecs); sendData('swapinfo.out', 'swaps/sec', $swapout/$intSecs); sendData('pageinfo.fault', 'faults/sec', $pagefault/$intSecs); sendData('pageinfo.majfault', 'majflt/sec', $pagemajfault/$intSecs); sendData('pageinfo.in', 'pages/sec', $pagein/$intSecs); sendData('pageinfo.out', 'pages/sec', $pageout/$intSecs); } if ($graphiteSubsys=~/M/) { for (my $i=0; $i<$CpuNodes; $i++) { foreach my $field ('used', 'free', 'slab', 'map', 'anon', 'lock', 'act', 'inact') { sendData("numainfo.$field.$i", 'kb', $numaMem[$i]->{$field}); } } } if ($graphiteSubsys=~/n/) { sendData('nettotals.kbin', 'kb/sec', $netRxKBTot/$intSecs); sendData('nettotals.pktin', 'kb/sec', $netRxPktTot/$intSecs); sendData('nettotals.kbout', 'kb/sec', $netTxKBTot/$intSecs); sendData('nettotals.pktout', 'kb/sec', 
$netTxPktTot/$intSecs); } if ($graphiteSubsys=~/N/) { for ($i=0; $i<@netOrder; $i++) { $netName=$netOrder[$i]; next if !defined($netSeen[$i]); next if ($netFiltKeep eq '' && $netName=~/$netFiltIgnore/) || ($netFiltKeep ne '' && $netName!~/$netFiltKeep/); next if $netName=~/lo|sit/; sendData("nettotals.kbin.$netName", 'kb/sec', $netRxKB[$i]/$intSecs); sendData("nettotals.pktin.$netName", 'kb/sec', $netRxPkt[$i]/$intSecs); sendData("nettotals.kbout.$netName", 'kb/sec', $netTxKB[$i]/$intSecs); sendData("nettotals.pktout.$netName", 'kb/sec', $netTxPkt[$i]/$intSecs); } } if ($graphiteSubsys=~/s/) { sendData("sockinfo.used", 'sockets', $sockUsed); sendData("sockinfo.tcp", 'sockets', $sockTcp); sendData("sockinfo.orphan",'sockets', $sockOrphan); sendData("sockinfo.tw", 'sockets', $sockTw); sendData("sockinfo.alloc", 'sockets', $sockAlloc); sendData("sockinfo.mem", 'sockets', $sockMem); sendData("sockinfo.udp", 'sockets', $sockUdp); sendData("sockinfo.raw", 'sockets', $sockRaw); sendData("sockinfo.frag", 'sockets', $sockFrag); sendData("sockinfo.fragm", 'sockets', $sockFragM); } if ($graphiteSubsys=~/t/) { sendData("tcpinfo.iperrs", 'num/sec', $ipErrors/$intSecs) if $tcpFilt=~/i/; sendData("tcpinfo.tcperrs", 'num/sec', $tcpErrors/$intSecs) if $tcpFilt=~/t/; sendData("tcpinfo.udperrs", 'num/sec', $udpErrors/$intSecs) if $tcpFilt=~/u/; sendData("tcpinfo.icmperrs", 'num/sec', $icmpErrors/$intSecs) if $tcpFilt=~/c/; sendData("tcpinfo.tcpxerrs", 'num/sec', $tcpExErrors/$intSecs) if $tcpFilt=~/T/; } if ($graphiteSubsys=~/x/i) { if ($NumXRails) { $kbInT= $elanRxKBTot; $pktInT= $elanRxTot; $kbOutT= $elanTxKBTot; $pktOutT=$elanTxTot; } if ($NumHCAs) { $kbInT= $ibRxKBTot; $pktInT= $ibRxTot; $kbOutT= $ibTxKBTot; $pktOutT=$ibTxTot; } sendData("iconnect.kbin", 'kb/sec', $kbInT/$intSecs); sendData("iconnect.pktin", 'pkt/sec', $pktInT/$intSecs); sendData("iconnect.kbout", 'kb/sec', $kbOutT/$intSecs); sendData("iconnect.pktout", 'pkt/sec', $pktOutT/$intSecs); } if ($graphiteSubsys=~/E/i) 
{ foreach $key (sort keys %$ipmiData) { for (my $i=0; $i{$key}}); $i++) { my $name=$ipmiData->{$key}->[$i]->{name}; my $inst=($key!~/power/ && $ipmiData->{$key}->[$i]->{inst} ne '-1') ? $ipmiData->{$key}->[$i]->{inst} : ''; sendData("env.$name$inst", $name, $ipmiData->{$key}->[$i]->{value}, '%s'); } } } my (@names, @units, @vals); for (my $i=0; $i<$impNumMods; $i++) { &{$impPrintExport[$i]}('g', \@names, \@units, \@vals); } foreach (my $i=0; $i$graphiteDataMax{$name}; $graphiteDataTot{$name}+=$value if $graphiteAvgFlag || $graphiteTotFlag; } return('') if !$graphiteOutputFlag; # A c t u a l S e n d H a p p e n s H e r e # If doing min/max/avg, reset $value if ($graphiteFlags) { $value=$graphiteDataMin{$name} if $graphiteMinFlag; $value=$graphiteDataMax{$name} if $graphiteMaxFlag; $value=$graphiteDataTot{$name} if $graphiteTotFlag; $value=($graphiteDataTot{$name}/$graphiteCounter) if $graphiteAvgFlag; } # Always send send data if not CO mode, but if so only send when it has # indeed changed OR TTL about to expire my $valSentFlag=0; if (!$graphiteCOFlag || $value!=$graphiteDataLast{$name} || $graphiteTTL{$name}==1) { $valSentFlag=1; my $valString=(!defined($numpl)) ? 
sprintf('%d', $value) : sprintf("%.${numpl}f", $value); my $message=sprintf("$graphiteBefore$graphiteMyHost$graphitePost.$name $valString %d\n", $graphiteIntTimeLast); print $message if $graphiteDebug & 1; if (!($graphiteDebug & 8)) { my $bytes=syswrite($graphiteSocket, $message, length($message), 0); } $graphiteDataLast{$name}=$value; } # TTL only applies when in 'CO' mode if ($graphiteCOFlag) { $graphiteTTL{$name}-- if !$valSentFlag; $graphiteTTL{$name}=$graphiteTTL if $valSentFlag || $graphiteTTL{$name}==0; } } sub openTcpSocket { return if $graphiteDebug & 8; # don't open socket print "Opening Socket on $graphiteSockHost:$graphiteSockPort\n" if $graphiteDebug & 16; $graphiteSocket=new IO::Socket::INET( PeerAddr => $graphiteSockHost, PeerPort => $graphiteSockPort, Proto => 'tcp', Timeout => $graphiteTimeout); if (!defined($graphiteSocket)) { if (++$graphiteSocketFailCount==$graphiteSocketFailMax) { logmsg('E', "Could not create socket to $graphiteSockHost:$graphiteSockPort. Reason: $!"); $graphiteSocketFailCount=0; } } else { # we're printing to the term with d=16 because 'I' messages don't go there. my $message="Socket opened to graphite/carbon on $graphiteSockHost:$graphiteSockPort"; print "$message\n" if $graphiteDebug & 16; logmsg('I', $message); $graphiteSocketFailCount=0; } } # This catches the socket failure. Only problem is it doesn't happen until we try write # and as a result when we return the write fails with an undef on the socket variable. # Not really a big deal... sub graphiteSigPipe { undef $graphiteSocket; } sub help { my $text=<- this minimum value 4.3.0 Oct 3, 2017 - disable -sL, should have been done at same time -sl was 4.2.0 Jun 12, 2017 - Updated Plotfile docs to explain why you shouldn't leave off the -f when using -P [thanks Bayard] - added support for InfiniBand OPA V4 to read start from /sys instead of having to rely on perfquery for 64 bit counters. 
[thanks frederic] - removed previos bug introduced in V4.1.2 that was not properly calculating disk summaries. If you do have any raw files collected with this version you WILL be able to play them back properly or create and plot files with this version - although I'm stil leaving the lustre code in place because there is so much of it, I did remove cciss disk types from non-lustre code - finally removed col2tlviz from kit [thanks tom] 4.1.3 Apr 10, 2017 - throws 'unit var' building distro on openSUSE 4.1.2 Feb 27, 2017 - incorrectly requiring a + with --rawdskfilt to be at beginning - when added support for 64bit IB counters it looks like I was only saving 3 of the 4 values (loop only went to 3 instead of 4) around line 4403. [thanks seb] 4.1.1 Nov 2, 2016 - added packet loss and fast restransmissions to TCP Extended versbose output and renamed AkNoPy and PreAck to PurAck and HPAcks to be consistent with earlier versions [thanks Sophie] - add support for nvme disks [thanks fred] - it turns out some people re-enable lustre support for the sake of monitoring clients and to support that I had to add a check for the lustre-client module which is now in a differetn location than others [thanks fred] 4.1.0 Oct 7, 2016 - allow lexpr to pass formatting information for strings and numbers [thanks Guy] - modify the way misc.ph reports uptime to thousandths of a day [thanks, seb] - added OPA interface support for -sx reporting and cleaned up some very old code, like quadrics support! 
    [thanks fred]

4.0.5 Apr 26, 2016
  - rawdskfilt has been enhanced to allow a preceding + which will cause the following string to be appended to the default filter
  - needed to initialize anonH for numa stats [thanks andy]
  - added 'hed' to known ethernet devices, used by HP Helion

4.0.4 Jan 29, 2016
  - if you try to playback a file with --stats and it has recorded processes or slabs, ignore them by removing from $subsys [thanks ghassen]
  - playback of process data with -P was not skipping first interval and so stats for first entry were not rates but rather raw numbers [thanks philippe]
  - change 'yikes' message to something more meaningful [thanks rob and laurence]
  - fixed problem with -sZ -P printing all 0s for thread count [thanks philippe]
  - added /usr/lib/systemd/system/collectl.service, per sourceforge help discussion on 2015-12-28 [thanks george]
  - added disk read/write wait timing for disk detail in terminal, plot and lexpr format [thanks bud]
  - new switch dskremap allows one to change disk names on the fly because in some cases such as etherd disks, the names are messy for use with other tools like ganglia [thanks gabriel]
  - removed access to disk name remapping file

4.0.3 July 2, 2015
  - add AnonHuge memory to memory stats, both verbose and detailed as well as lexpr [thanks, fred]
  - if lexpr called with --import, throw an error
  - tighten divide-by-zero test for -sM because it looks like in some cases when misses >0 we're getting occasional errors. could hits be somehow negative? [thanks Robert]

4.0.2 May 27, 2015
  - add /bin/bash to list of 'known shells' excluded from output with --procopt k
  - generalize ethernet network device name to include ALL names matching type 'p\dp' so we pick up p2p, p3p, p4p...
    [thanks Matt]
  - collect nr_shmem so we can track shared memory, apparently something I thought of but never acted on [thanks Christian]
  - do not include guest cpu metrics in totals since already accounted for in user time [thanks Philippe]

4.0.1
  - change /usr/sbin to /usr/bin in init.d/collectl [thanks Ladislav]
  - pattern match to exclude partitions from disk summary is WRONG and we need to make sure name doesn't match cciss disks like c0d0! [thanks, Laurent]
  - changed help text for -retaddr to NOT use 'use' preceding -deb because rpmbuild gets confused and tries to include '-deb' as a dependency [thanks dan]
  - include 'en' network devices in summary data [thanks homerl]
  - change buddyinfo to deal with fewer fields in /proc/buddyinfo as apparently there are not always 11 of them [thanks greg]
  - remove lustre from --showsubsys
  - removed 'known problem' with older versions of Time::HiRes in these release notes as that was quite a long time ago

4.0.0 Mar 9, 2015
  - rare, but if selecting processes by parent pid or command name, it's possible when a new pid is seen that it's already exited by the time we try to read /proc/pid/stat, and it will return an undef value
  - finally cleaned up code to read speeds from /sys to use internal cat() to avoid misc 'Invalid Arg' errors. also fixed cat() to return null when nothing read.
  - added mlx5 as a new type of IB device name [thanks fred]
  - get lustre version a different way because format changed [thanks Jeff]
    also note that native lustre support in collectl is going away in summer of 2015!
  - lexpr was incorrectly reporting sys/user cpu details in the wrong place and as a result showed up before the timestamp in some cases
  - colmux has now been moved to the collectl package, release notes to be continued here going forward

COLMUX CHANGES

5.0.0
  - getHeader routine, removing -c/-i need to look for leading spaces when stripping switches in case of a UUID in the command string, which can contain -c followed by a hex string

4.9.2
  - if ping fails, it still tries to ssh and fails, generating meaningless uninitialized variable errors
  - include 'ssh' in the error messages when check() fails (thanks KM5)

4.9.1 Mar 29, 2016
  - assume collectl in same directory as colmux so you can install both on network share BUT if colmux ends in 'pl', it's probably me doing development/testing, so use collectl in /usr/bin. [thanks Paul]

4.9.0 Jan 06, 2016
  - header name printing in single line mode not quite right for all combinations of switches
  - not trapping 'collectl not installed' errors and just returning the node isn't reachable
  - new switch -timerange will report warnings for any nodes found to differ from others by more than this number of seconds
  - added COMMUNICATIONS PROBLEMS section to man page and dropped section describing what changed in Version 3

4.8.3 Mar 9, 2015
  - -oT -test wasn't including time column in help output whereas -od and -oD did [thanks, robbin]
  - new switch: -retaddr tells collectl to connect back to this address rather than the one colmux chooses by default which is default interface's addr
  - change in way return address is determined because RHEL 7 changed the format of the ifconfig output, changing Bcast to broadcast and dropping addr: [thanks hank]

PRE-4.0 COLLECTL CHANGES

3.7.4-1 Sep 10, 2014
  - typo in $netFilt (should have been $netFiltIgnore) preventing any network from being included in totals when --netfilt specified, but also made me rethink the way summaries are calculated (see next item)
  - 2 more network types were discovered to be causing
    double counting in summaries, specifically vibr and vnets. since the exceptions occur at a far greater rate it was decided that rather than have a default list of those network types to exclude from the summaries, it makes far more sense to have a list with those that SHOULD be included as well as a mechanism for handling new summary types. This led to a reinterpretation of --netfilt. see the man page and Network.html for more details
  - removed references to XC, which is no longer supported
  - use abs to generate path to exe, simpler and cleaner [thanks Jeff]
  - extended the way formatit is loaded and changed the order that collectl.conf is discovered, noting it should only affect people actually modifying code or moving things to non-standard locations. it IS now documented in Startup and Initialization. [thanks again, Jeff]
  - set max lines to read for diskstats to 20000 for those with real large disk counts where 10000 wasn't enough [thanks jean-marc]
  - very rare, but if doing timing and no hires present, $microInterval gets set to zero and the division by the interval blows up
  - finally remembered to remove -G and --group which were replaced by --tworaw
  - clarified description of -s defaults in manpage as well as adding a pointer to the online documentation on file naming [thanks rob]
  - added additional error message for when files match selection string but none contain -date-time.raw [thanks rob]
  - add support for newer kernel CPU stats: guest, guest_nice
  - now that 2.4 kernels no longer supported, make sure CPU stats contain at least softirq field
  - change headers with % to PCT and remove space, also remove whitespace in interrupt detail output for type and devices columns [thanks rob]
  - new switch --ALL, selects summary and detail data for all subsystems [thanks rob]
  - new switch --full, selects --verbose, always includes RECORD separator and includes which subsystem data is being reported with each interval in the RECORD header to make parsing easier for rob
    [thanks rob]
  - if you DON'T collect tcp data but want to play it back, variables weren't initialized to 0 and you get uninit variable warnings
  - if disk name ends with a digit (can only happen when manually changing disk filtering in either collectl.conf or with --rawdskfilt), don't include in disk summary stats [thanks guy]
  - discovered a place where some numa counters go backwards! This MUST be a kernel bug but inserted code to mitigate and warn if it happens [thanks rob]
  - removed a line of code incorrectly initializing $HCAPosts[] because that is now a doubly indexed array [thanks Jeff]
  - discovered tap devices don't set default network speeds correctly and can cause 'bogus' messages so use default max
  - make 'Intrpt' header mixed case for CPU details, not all upper
  - new 3rd option for --top, allows one to display the top-n processes sorted by any column vertically, similar to playback mode, which in some cases can be very handy
  - if only 1 tcp subtype selected with --tcpfilt, was printing column header of ERR and I've no idea why. Changed it to TCP.
  - I didn't like --tcpfilt I by itself forcing --verbose so changed it to just being in the --tcpfilt string will force it and updated man page as well since --tcpfilt wasn't even documented in it
  - As warned I'm in the process of removing direct support for lustre and you should contact Peter Piela at TeraScala to get a copy of his lustre plugin. Therefore -sl is being removed as a default. To get collectl's native lustre support in daemon mode, you must add it to -s. Native support will be completely removed around the summer of 2015.

3.7.3-1 Apr 1, 2014
  - had to change 'defined(@array)' to remove the 'defined()' which is deprecated on RHEL7

3.7.2-2 Mar 31, 2014
  - deal with process names in /proc/pid/stat that have embedded spaces in them (ugh!!!)
    [thanks, guy]
  - if HCA supports extended InfiniBand counters, read them from /sys if present, otherwise read them with perfquery [thanks fred and roy]
  - NOTE: error counters are not present when looking at extended counters and so will be reported as 0
  - removed IbDupCheck from collectl.conf since perfquery monitoring always checks for dups
  - since extended counters do not need to be cleared, you can now run multiple copies of collectl when used
  - fixed bug in -sX because it was generating wrong stats and more amazingly nobody ever noticed
  - removed quadrics and myrinet code, indicating end of an era for proprietary interconnects, but without them we may not have gotten to 10Gb or IB as quickly
  - new switch: --cpufilt allows filtering on CPU number in the same way as dskfilt and netfilt, primarily for use with high cpu counts. also honored when reporting interrupt stats
  - fixed typo for sorting on 'syst' [thanks stig]

3.7.2-1 Mar 5, 2014
  - added optional groups & titles to ganglia export module [thanks peter]
  - removed extra '%s' in gexpr/senddata call for ipmi
  - an error trying to run dmidecode when it wasn't there was fixed some time after v3.6.0 but never made it into the release notes. [thanks seb]
  - added additional stats for disk details to graphite.ph [thanks bob]
  - changed format for AccumTim reporting for process data in prc file to be a single format.
    [thanks andy]
  - fixed a problem with --procanalyze when processing multiple raw files, it was not clearing the right data structures

3.7.1-1 Jan 7, 2014
  - removed nvidia and sexpr from kit as warned over a year ago
  - lookup of uid:gid via grep needs trailing ':' in search or it will incorrectly match first entry with longer name string
  - changed deprecated use of defined(@$impiRemap) to defined($ipmiRemap) re: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=728760
  - when rearranged logging code for E/F messages for syslog, ended up using a variable that wasn't yet defined
  - during playback of multiple files with different host names, disk/network indexing structures need to be reinitialized
  - when filtering network details, the Num in the output should start at 0 as opposed to its value when not filtered which left holes in numbering
  - was reporting swapped data as bytes when in fact it is reported by the kernel in pages. it now reports swap sizes correctly by multiplying by the correct page size.
    [thanks philippe]

3.6.9-1 Oct 18, 2013
  - typo in network plot header loop resulted in infinite loop [thanks andy]
  - remove $int/secs from numa hit rate calc AND add more precision to its output [thanks stig]
  - need to deal with a new process showing up with an existing pid, though rare it can happen when a high rate of process creations [thanks guy]

3.6.8-1 Jul 20, 2013
  - new flag $exportComm must be set in gexpr/ganglia so that they won't generate an error if run without -f or -A [thanks tom]
  - new switch: --intfilt allows filtering of interrupts
  - always log messages of type F/E to syslog in daemon mode even if -m is not set [thanks again, tom]
  - wasn't dealing correctly with missing whitespace after network name in /proc/dev/net in initRecord() [thanks andy]
  - updated init.d script for suse per the maintainer's instructions [thanks tom]
  - extra spaces were being printed in plot mode for tcp stats
  - added entry to envrules.std to deal with intel Phi Co-Processor
  - debian init.d script now does 'exit 1' if status reports 'not running'
  - rawnetignore switch wasn't working correctly
  - found/fixed some subtle problems with --procanalyze as well as some cleanup
  - need to ignore first sample after initializing summary arrays
  - need to init summary hashes for thrutime and accumT because get uninit var in print routine if only a single process entry
  - found a typo in procAnalyze() to a $usecs which wasn't being used!
  - added error check to make sure --procanalyze with -P requires -s
  - added a little more debugging output for -d128
  - discovered dynamic disk/network detail names for interactive mode were not being reported correctly. sounds a lot worse than it is because this is typically not done very often nor are disks/networks very dynamic except in large, virtualized environments such as clouds
  - add to list of devices to exclude from network summary data: tap, dp and nl, which are associated with openstack cinder.
    remember you can always add more to that list with --netfilt
  - $lastHour was never referenced and dayInit() called every time a log was created so fix logic to update $lastHour correctly AND call initDay() one time and do it before newLog() called.
  - closed a couple of file handles that were left open and reportedly causing some defunct processes with -sx. [thanks brian]
  - fixed bug in lustre stats recording [thanks roland]
  - clarified --showsubopts text about disk and network filters in that they apply to both summary and detail data output
  - fixed problem with --import and --stats
  - --statsopt a didn't work because when changed some internal logic missed changing a test of $timestampFlag to $timestampCounter[$rawPFlag] and so now $timestampCount can be removed entirely
  - clear $firstpass after 1st pass during playback
  - make sure filename initialized before calling loadConfig so if there is an error logsys() doesn't get an undefined var warning
  - to be safe, remove any quotes on net/dsk filters in case included by mistake in DaemonCommands string
  - tightened up tests to see if daemonized collectl already running
  - if no hiRes::Time, fudge the value of $microInterval based on -i [thanks Domi]
  - new --procOpt k, removes known shells from process listing with -sZ, currently set to /bin/sh, /usr/bin/perl, /usr/bin/python and python
  - fixed varname in lexpr: $debug should have been $lexDebug

3.6.7-1 Mar 8, 2013
  - set network speed for vnets to '??'
    so they'll use $DefNetSpeed for bogus checks since the kernel hardcodes them to 10 which makes no sense [thanks rick]
  - code to print brief totals for -st wasn't included in a conditional so you'd always get extra columns of output when -st was NOT included
  - needed to initialize numaMem->{lock} for cases where user selects -sM and no data collected [thanks laurence]
  - added randomize [thanks robert] and align switches to graphite module and align switch only to gexpr.ph since gexpr uses current times in messages
  - added escape switch to graphite to allow one to change the dots in hostname
  - change to suse startup script to look in /usr/sbin instead of /usr/bin
  - added debug mask of 16 to lexpr to help test x= switch
  - can now use commas OR colons with lexpr,x= though commas preferred and colons may go away
  - added disk qlen, wait, svctime and util to lexpr
  - it was pointed out that in getExec() I'm initializing $oneline instead of $oneLine [thanks joe]
  - for debian init script, reverse logic for running start-stop-daemon with -test so it will work with busybox too [thanks chris with help from troy]
  - new switch: --cpuopts z (the only option) which suppresses lines of idle activity from detailed stats

3.6.6-2 Dec 7, 2012
  - when purging imported detail plot data, only do so if file had changed
  - when playing back multiple files, do NOT try to process a new file that has not yet seen the end of the current interval ($timestampCound==1)
  - fix SuSE init.d script, [thanks tom]

3.6.6-1 Nov 25, 2012
  - last version broke lexpr and it wasn't correctly handling intervals other than 1
  - do not set $dskChangeFlag to 4 when maj/min numbers change as it does not mean the stats changed
  - removed checks for major/minor disk numbers changing

3.6.5-2 Sept 27, 2012
  - was not updating new major/minor numbers for a disk when they changed so got stuck in a loop which kept disk maj/min changed every interval
  - new -r option to purge older .log files, def=12 months
  - fixed DaemonCommands
    to preserve order so you can override anything by adding on the right side of it
  - new 'align' switch added to lexpr so default is NOT to align to whole min
  - for -sE do not convert negative temperatures [thanks kevin]
  - add error handling to 'print' in logmsg
  - vmstat needs to set $sameColsFlag to make header pagination work with -p
  - new graphite switch f, use fqdn for host [thanks Bryant]

3.6.5-1 Sept 10, 2012
  - when lexpr called with x= it needs to set summary data flag in case nothing else is being reported, otherwise timestamps print after the data instead of before
  - lexpr typos: $tcpError, $udpError and $icmpError should not be singular
  - timestamp wasn't being updated for -sD because it was specified in $dskdetFormat
  - explicitly close logs before opening new ones in the hope that the occasionally corrupted file problems with gunzip will go away
  - tcp 'last' variables weren't correctly initialized and so was printing bad data on first line of output

3.6.4-2 August 28, 2012
  - modified lexpr, gexpr and graphite such that when i= is used, sending is aligned on whole minute boundaries which is particularly useful with rrd

3.6.4-1 June 25, 2012
  - merged snmp and tcp stats under -st and changed export routines to show summary error counts for -st. removed snmp.ph from kit. summaries (based on --tcpfilt) as does brief format
  - correctly deal with dynamic disks/networks
  - instead of pulling names from header, get them from raw file when discovered
  - simplify code that deals with changed disks, now that more cleanly handled
  - replace runtime calls to 'die' with calls to syslog
  - readS was still left in INSTALL! [thanks gavin]
  - added system boot time to header
  - new values for procopts s/S to show process start times
  - graphite.ph now prints loadavgs to 2 decimal places [thanks brandon]
  - extended lexpr,x= functionality to also call an init routine
  - initFormat now returns entire header!
  - if nothing returned from an import module on a printVerbose or printPlot call for detail data do not call printText() since it will screw up colmux and plot detail file with empty lines
  - new --rawdskignore AND --rawnetignore because sometimes easier to specify a pattern of things to ignore
  - removed restriction for running as root to get network speeds via ethtool by looking in /sys/devices now
  - slight change to way the disk queue depth is being calculated to provide better accuracy [thanks ken]
  - new --dskopts f reports disk details with some fractional values
  - always calculate disk details even when only doing -sd since a plugin might want to get at them
  - new graphite switch b, will cause output to be prefaced by a specified string [thanks justin]
  - slight change to s= functionality for lexpr, gexpr and graphite: no arguments will disable all but imported data, allowing you to log -s data to files sending over socket
  - need to give other routines (specifically --import) access to the lexpr interval by declaring it with 'our'
  - had to change the way lexpr/gexpr/graphite do min/max/avg since they were using a positional index to track intermediate values when clearly a hash is required for cases where not all intervals contain same elements
  - -P and --plotflag had different effects on $headerRepeat because prior to calling getopts I was peeking ahead for an ARG of -P and not including --plo [thanks devilized]
  - gexpr module has wrong units for network packets and with 'g' modes had to multiply kb counts by 1024 to convert to bytes, which is the units for these that ganglia uses [thanks, trevor]
  - clean up handling of missing ipmitool and root access [thanks trevor]

3.6.3-2 May 01, 2012
  - finally remembered to remove readS from the kit [thanks joseba]
  - when filtering a process by the full path with 'f', never include collectl itself
  - documented utime in manpage
  - if -i0 set $DefNetSpeed to 0 so we don't throw any 'bogus' network speed messages
  - new
    switches, --rawdiskfilt and --rawnetfilt, allow one to filter disks/nets at time of data collection so they never appear in raw file
  - added call to IntervalEnd() (if it exists) for --import
  - add option timeout to --address when connecting back to explicit address
  - moved code that deals with fractional intervals and !HiRes closer to other interval processing
  - added 'strict' to snmp module as well as 'help' option: snmp,h
  - fixed problems with --import
  - if --import is used to generate detail data with -f and -P not specified, collectl throws an error trying to close the detail log which clearly hasn't been created
  - when using interval other than the default AND -s-all, blank lines are printed for standard intervals which don't have imported data. this applied to brief, verbose AND detail data
  - added some more systems to envrules: Proliant SL230/SL250 Gen 8 and SE1170s

3.6.3-1 Mar 03, 2012
  - fixed serious bug introduced a number of versions ago, which during playback of multiple files and specifying date/time caused collectl to continue reading first timestamp in each file and generating 'uninit variable' errors. not harmful, but inefficient and ugly!
  - added exit codes of 0/1 to all the exit points
  - moved help text for --stats from basic to extended
  - found $file=~/rawp/ near line 1440 clearing $1, $2 and $3 and so $prefix, $fileDate and $fileTime were not getting set correctly
  - clarified 'No files processed' message to be a little more explicit
  - broaden where collectl looks for lustre modules and also fixed a typo of $lustops to $lustOpts [thanks brian]
  - procAnalize incorrectly totaling fault totals instead of interval values [thanks andy]
  - limit sizes of -procfilt for username/command to 19 and 15 respectively
  - change order of ps command in loadPids() so they return max length fields for user/command
  - remove () from command field from /proc/pid/stat in pidNew()
  - optimize new pid processing with --procfilt
  - add new pids to pidSkip{} as appropriate
  - undef pidSkip{} whenever pids wrap
  - added hello.ph and graphite.ph to INSTALL
  - was incorrectly setting DiskFilterFlag to 1 all the time, even when not overridden in collectl.conf. while not a bug, it does cause a slight increase in overhead

3.6.2-1 Feb 28, 2012
  - changed behavior of --runas to no longer require a change to /etc/init.d/collectl as it now uses /var/run to write collectl.pid into. this means to interact with a non-root daemon, you still need to be root, which makes sense.

3.6.1-4 Feb 20, 2012
  - removed --ssh switch, making detecting the parent going away the default behavior
  - added switch --nohup which will allow collectl to continue running if parent exits, which is more consistent with how --nohup itself works
  - in logmsg ONLY write to STDERR when attached to a terminal
  - serious problem when using --tworaw and a flush interval < that for the process data occurs because newer versions of zlib will fail if you try to flush to a file that has not been updated. since I don't know which version of zlib this started happening in and feel this is a relatively rare case, we're just rejecting this combination regardless of zlib version.
    I do have an email out to the zlib author and if I ever get to the bottom of this will be able to relax this restriction.
  - use gettimeofday() for timestamps in logmsg()
  - enhanced timing parameters when -i0 used. if specified, use 2nd/3rd parameter as ratio to first, making it possible to measure loads of different ratios other than 1:6:30.
  - discovered --import was missing from man pages and so added it
  - when playing back a file, set $verboseFlag if user specified --verbose but NEVER clear it
  - experimental import: snmp, see http://collectl.sourceforge.net/Snmp.html for details
  - printf in record() blows up if formatting chars in command string! [thanks mike]
  - added accumulated time as a --top sort option
  - changed formatting of accumulated time in process output to simply be hh:mm:ss or mm::ss.ss when less than an hour to be more in line with top
  - new switches, --stats and --sumstats report stats in brief mode, the latter only summary data
  - during playback need to check $numProcessed before reporting none were processed
  - stats reporting logic wasn't processing 1st file, checking for $numProcessed>1
  - removed -oA and replaced/extended functionality with --stats/--statopts
  - wasn't allowing --procopts playing back process data unless -sZ which was silly
  - subtle problem found: illegal 'last' in pidNew() because file disappeared between initial -e and trying to open it a few usecs later! can't exit a sub via last so changed to return(0)
  - our friends at OFED slightly changed the output of perfquery again [thanks frederic]

3.6.1-3 Jan 13, 2012
  - added 'Reason: $!'
    to socket open failures
  - was not reporting interrupts in playback mode correctly
  - added $memAnon to lexpr
  - need to initialize $thisConfig when --lustopts set [thanks joe]
  - do not allow -f with gexpr and not one or both of -P/--rawtoo [thanks again, joe]
  - modify misc.ph to honor --showcolheader
  - modify lexpr, sexpr, gexpr to reject --showcolheader
  - if --showcolheader and --export (only works with vmstat for now), exit after first print call
  - remove restriction of not letting someone use --home with proc/slab data since they may want to apply filters and therefore not need more than a terminal full
  - new switch, --comment, allows a user to add a comment to the header
  - only read /proc/slabinfo IF slab monitoring requested AND if slab monitoring requested make sure /proc/slabinfo is readable (some admins only allow root access)
  - added code for slow proc read speed test on all systems >= 32 CPUs except for RHEL6.2 and SLES 11 SP1
  - if /sys/devices/system/node doesn't exist, set CpuNodes to 1 and disable -sM if set
  - fixed a lot of typos in a lot of docs
  - only set a socket failure handler when a socket is explicitly being opened
  - added 'h' option to gexpr, lexpr and sexpr
  - changed the way vmstat.ph decides to print its header
  - added new process option: x, which adds extended data to standard display
  - added Mlocked to verbose memory output as well as numa stats [for fred]
  - changed root name for cpu detail data in gexpr from cputotals to cpuinfo [thanks evan]
  - new export: graphite
  - normalization for CPU load reports jiffies instead of a percentage [thanks guy]
  - removed restriction against using -D as non-root user
  - as per https://bugzilla.redhat.com/show_bug.cgi?id=716825, non-root access to /proc/pid/io is now considered a security hole and so may not have read access! therefore we need to check to see if the io structure is readable before trying.
    if it isn't, zeros will be reported for non-readable structures
  - new procopts option I, disables collection of IOSTATS and reading of /proc/pid/io, a performance optimization at the expense of less process information
  - new switch, --runas will cause collectl to run as a non-root daemon. this WILL require changes to the init.d script to work! be sure and read the man page
  - changed location where $doneFlag was getting cleared because stopping the daemon before initialization was completed was causing the flag to be reset to 0 and not left at 1
  - change sort limit for process counters from 6-9s to 9-9s [thanks stig]
  - added SUSE SP info to header
  - added debian and ubuntu release/distro info to header

3.6.0-3 Oct 17, 2011
  - added dirty memory to lexpr

3.6.0-2
  - support for numa
  - split anon pages into separate field in verbose mode as well as plot format
  - changed the memory header for -sm to SUMMARY rather than STATISTICS as the latter is currently used to indicate detail data, something that didn't exist for memory prior to numa support
  - added --xopts i to be consistent with --dskopts and --netopts. did NOT add such a switch for lustre
  - expanded error checking with perfquery to catch 'Failed to open' errors during initialization
  - discovered and removed reading of /proc/stat during -sm, which was there to support 2.4 kernel fields that have since been moved
  - changed collectl-debian start script to use /bin/sh instead of bash
  - removed ".B collectl" at start of collectl man page for debian/lintian compliance
  - made width of number of dentries in -si --verbose 7 instead of 6 digits wide

3.6.0-1
  - do NOT call derived() when playing back rawp files or you'll get uninit var for $memUsedLast.
  - need to include non-numeric type interrupt counts in interrupt totals
  - fixed a few problems with environmental data and interpretation of --envopts
  - was not allowed to use with -P and only 'M' should have been restricted
  - was only honoring C/F when temp name started with Temp rather than anywhere in string
  - was not correctly overriding default ipmi devices with user defined options
  - fixed formatting/calculations for interactive memory subtotals generations when RETURN is typed in conjunction with --memopts R in brief mode
  - added new section to FAQ called 'gottchas' as a place to describe the perils of round-off error and normalization
  - when printing verbose data in import modules, need to clear $$lineref or the last line that mainline collectl reports (if any) will be repeated. this was fixed in hello.ph and atigpu.ph
  - new switch: --dskopts z, which when specified filters out disk details lines of all 0s
  - added switch examples to start scripts for clarification of use
  - added support for 'vd' disks [thanks gavin]
  - since kernel 2.6 compatible with 3.0 and 2.4 is sooo old, 2.4 support officially dropped! [thanks for the push, tony]
  - dropped support for collectl data generated by versions of collectl older than 2.0
  - need to set $cpusEnabled to 0 when playing back interrupts in plot format w/o -sC, since the code that normally does that has already been executed and 'C' not yet added to $subsys. subtle...
  - filled in some missing ; in nvidia.ph in PrintPlot routine
  - fixed problem writing plot files with --import
  - added 'i' to both dskopts and netopts which will cause i/o sizes to be displayed in brief mode like --iosize except in this case independent of each other
  - do not include virtual networks in network summary [thanks hank]
  - in newLog() need to use gettimeofday for current time when hires::time is used otherwise you'll occasionally get a time 1 second earlier and new file names are wrong!
    [thanks hank]
  - exclude vlan from network totals to avoid duplicate counts [thanks andrey]
  - added 2 new fields to verbose cpu Summary Stats - Run Total and Blocked Total
  - added VmSwap to process/memory display

3.5.1-1 May 23, 2011
  - change expression used to find CPU count in /sys since -P isn't necessarily built into all greps
  - instead of only getting the platform name when -sE, always try to get it
  - forgot to include 'T' as valid --envopts
  - check for failure of 'ipmitool sdr dump' command
  - need to ignore interval checks with --showcolhead and -sE
  - fix bug in checkSubSys() because while it could find newer subsys it couldn't find dropped ones
  - needed to clear nethostflag outside conditional that looks at prefix changed, which was incorrectly preventing consecutive files on the same day from being identified
  - added new routine pushmsg() that allowed one to stack up messages generated BEFORE 'beginning execution' message and then play them afterward, making log easier to read
  - changed several calls from logmsg() to pushmsg()
  - added support for files that cross midnight and ability to play them back in full, see updated Playback.html
  - remove duplicate message in sexpr
  - have found an instance where the number of networks in the header didn't match the ones listed (some were dropped!) and so added a check to take care of this
  - renamed $active, $inactive and $dirty to $memAct, $memInact and $memDirty for better consistency with other memory variable names. Didn't bother with older V2.4 mem variables
  - new switch --memopts R: display memory info as changes/interval, similar to sar's -R switch
  - logic to clear '$sameColsFlag' in verbose mode and --import was wrong
  - --showcolheaders and -sE requires root
  - added support for nvidia driver V270.41.19 which has different output format.
    highly probable other versions will behave differently as well

3.5.0-3 Feb 12, 2011
  - expanded interrupt details to include non-numeric interrupts
  - new import module added for GPUs: nvidia.ph
  - added getExec type 0 to support new import
  - updated version of gexpr, with new switches to control using default ganglia variable names
  - bug fix: wasn't sending E and F type messages to syslog
  - wasn't initializing enough 'last' vars for latest nfs V4
  - only allow -sT with -P or -f
  - added new switch --tworaw as a synonym for --group which makes more sense
  - if an imported module returned -1 in its init routine, disable it. return 1 for success
  - new --procopt: R causes real-time priorities to be displayed rather than RT, at the cost of 2 extra columns in the display [thanks lee]
  - added optional callback GetHeader to --import API, if not defined not called
  - change error handling when playing back files with no selected subsystems to be non-fatal, skip the file and continue processing
  - added dl585-g7 to envrules.txt
  - allow -s-all to remove ALL L subsystems when you only wanted --import data played back.
    I actually forgot to add this to release notes until V3.5.1

3.5.0-2 Jan 09, 2011
  - turned utime into a mask, so we can control the granularity of micro-logging to include /proc time with/without process accesses

3.5.0-1 Jan 09, 2011
  - renamed --showplotheaders to --showcolheader since it now applies to ALL headers for single header line output (will only show cpu for -scd --verbose)
  - fixed ALL verbose and detail output formats to include date/time headers
  - newer kernels added additional files to /sys/devices/system/cpu/ which messed up the way total CPUs were being calculated
  - added 2 new variables to lexpr: cputotals.num and cputotals.total [thanks chris]
  - removed unused switch --pidfile from collectl -x
  - file processing push/pop code wasn't handling data change correctly
  - added new flag to show host changed since THAT was what was needed in 'consecutive' file identification processing
  - found problem with playing back multiple files with --thru for different hosts! needed to 'undef $newSeconds[$rawPFlag]' whenever hostname changed
  - new netopts values
    e - show errors in brief mode and explicit types everywhere else
    E - only print lines that have non-zero network errors in them
  - new diagnostic switch --utime, causes periodic micro-timestamps to be written into raw file at different points in time for finer grained measurements of operation times

3.4.4-3 Dec 9, 2010
  - if -s during playback, at least ONE requested subsys must be in recorded file.
    if c recorded, C would cause error message because pattern match didn't have 'i'
  - add requirement for STDOUT to be connected to a terminal as a condition to call resize
  - change to collectl.conf - roll logs at exactly midnight, not 1 minute past
  - new --envopts value of T to truncate values to integers
  - ignore 'Fan Redundant' in env data for dl160g6
  - if ipmi data field is blank, ignore it
  - fixed filtering of ipmi data AND renamed 'c' option to 'p', for power
  - include THRD in -P format for processes
  - only turn off echo when in brief mode AND not playing back a file
  - if data collected w/o HIRES and display requests msec, set default to '000' instead of 0
  - discovered only --ssh in help so removed -S

3.4.4-2 Nov 10, 2010
  - base36() needs to do an int() on values <10 so their fraction is not included in output string
  - reduced printing of headers for -sf --verbose to one call to printText() per line. otherwise one hostname prepended to each line of socket call.
  - fixed a problem with --procfilt C: it was trying to match whole process name rather than just the beginning of it [thanks gary]

3.4.4-1 Nov 09, 2010
  - vmstat not handling date/time correctly, needed $dateTime[0]
  - need to call export module's init routine in playback mode
  - lustre 1.8.4 module location moved, check expanded [thanks Frederik]
  - new top sort options, pid and cpu, which don't make a lot of sense unless used with filters
  - do NOT include hostname in RECORD printing routine with -A
  - CPU verbose output should not right shift 1st header line with -oT
  - removed printing of extra '$line' at end of NFS DETAIL header
  - incorrectly setting recSubsys to [YZ] if user specifies --top even if -s specified too!
    They should be merged [thanks mats]
  - don't write to a socket if shutting down, in which case $doneFlag set
  - don't report socket errors if not in server mode
  - added 'ProLiant DL160se G6' to envrules.std
  - disableSubsys should ONLY remove subsystems from export option 's='; was also clearing KFlag rather than LFlag [thanks chris]
  - new process sort option 'thread', sorts by thread count
  - changed start/stop in initd scripts from "$network +openibd" to "$all" so collectl will start after everything else

3.4.3-3 August 19, 2010
  - added --netfilt
  - very rare: if playing back CPU data but none collected, be sure to set $cpusEnabled to number of CPUs or else you'll get warning that one or more disabled
  - pattern match wrong for 'emcpower' disks [thanks lewis]
  - changed disk details to use 'cvt()' for reporting number of I/Os since DM numbers can be more than 4 digits
  - change --umask behavior. default is to do nothing unless explicitly set AND user is 'root'
  - 2 new process sort fields: pid and cpu

3.4.3-2 August 16, 2010
  - only look at $cpuDisabledFlag when processing CPU data
  - perfquery in OFED 1.5 can report warnings in its output stream which need to be ignored
  - if you try to playback a file and specify -s with no existing subsystems you'll get an error

3.4.3-1 August 02, 2010
  - perfquery checks problems
    - version finding code not working correctly for ofed 1.5
    - disabling -sl by mistake when perfquery not found
    - when errors detected during initialization not skipping subsequent checks

3.4.2-5 July 21, 2010
  - changed INSTALL to only execute commands like chkconfig OR update-rc when $DESTDIR is / [thanks mike]

3.4.2-4 July 09, 2010
  - added --dskfilt
  - added check for client-side OST uuid status 'DEACTIVATED', which seems to have shown up somewhere in the 1.6 timeframe but not sure when, thanks Heiko

3.4.2-3 June 25, 2010
  - new memory field 'SUnreclaim' ONLY available in plot format and lexpr, just not enough room in terminal based output [thanks seb/fred]
  - misc now
    considers uptime, mhz and mounts as 'lightweight' counters and will sample every standard interval. Only logins, which is heavy-weight, will be sampled based on "i=" or the default of 60 seconds. Further, all lightweight samples will be returned every interval by lexpr whereas the heavy-weight ones will only be returned when sampled. In order to keep sexpr/gexpr formats constant (primarily because I don't know the effect of not doing so), they will report all counters every interval.
  - support for CPUs dynamically changing stats and going off/on-line
    - NOTE -- can't detect this during interrupt processing unless also monitoring CPU data, which people typically do anyways

3.4.2-2 June 15, 2010
  - not correctly handling discovery of new disks during playback
  - new feature: select process by UID range [thanks mark]
  - fixed bug in --procfilt u/U processing while testing
  - added systot and usertot to lexpr to report totals for all system and user counters
  - changed error message processing when trying to play back process or slab data, etc. from a file that doesn't contain any. Rather than only show the message when -m, which could result in only a 'no files processed' message, they will be unconditionally displayed as they should

3.4.2-1 May 21, 2010
  - change default umask to 133 so that colplot can read files since webserver doesn't have privs
  - now that raw files are always compressed, the message about disabling it with -oz when no compression no longer makes sense so the message has been clarified to use --quiet with raw files and -oz with plot files
  - added README-WINDOWS to src tarball
  - cleaned up code that still expected [com] in $lustOpts instead of $lustreSvcs
  - more cleanup and bug fixes to INSTALL for debian support.
    thanks bernd
    - change to /bin/sh
    - do not use ANY explicit paths
  - minor changes to man pages, also for debian restrictions
  - wasn't reading NfsFilter correctly from header on playback
  - save perfquery version and use it to drive the skipping of 'field 13' rather than OFED versions which aren't always available
  - do not issue 'stty' if !PC, running on terminal and !background. missed a couple...

3.4.1-5 Mar 30, 2010
  - new env options F/T converts temps to C or F

3.4.1-4 Mar 29, 2010
  - new switch --whatsnew prints a summary of changes, a mini-release notes

3.4.1-3 Mar 23, 2010
  - added Fusion-IO card to list of valid disks: fio
  - gexpr, lexpr and misc weren't honoring internal interval counter.
  - if a secondary/tertiary interval specified gexpr/lexpr didn't process it correctly
  - new switch: --envfilt allows you to specify filters
  - if you specify a " in DaemonCommands it gets passed along in the variable itself (not a problem for ') so we have to remove them
  - added new section 'Filters' to header. Added EnvFilt and moved NfsFilt to it
  - added new switch --envremap, which allows for renaming one or more output field names
  - added new feature switch to lexpr. if x=file is specified, that file will be loaded via require and a corresponding function name called after every print cycle, allowing one to do modified, custom output
  - new switch, --umask to control output file protections, see man umask. default is 0137
  - new environmental option - if you include a device number with --envopts use THAT as a device number with -d when running ipmitool.
    for some systems the default device is the slower one and this will have an impact on how fast ipmitool will run, possibly slowing down collectl
  - added 'use 5.008000', which should have probably been there years ago

3.4.1-2 Mar 16, 2010
  - do now allow -oA in verbose mode
  - consolidated all code to disable -s subsystems when there is a conflict into disableSubsys, which ALSO disables them in s= clause of --export
  - removed code to disable s= in all the ph export modules since now redundant
  - support for DESTDIR env variable in INSTALL/UNINSTALL [thanks Bernd]
  - Voltaire changed output of ofed_info so we have to process IB version slightly differently
  - change lustre message about needing -L to --lustsvc
  - changes to lexpr to include processes in run queue and to change prefix for proc creates/runs to 'proc'
  - changes to misc.ph to ALWAYS report latest values in --export as well if 'a' parameter, noting the default is to only report them when sampled. collection still defaults to 1 minute, overridable via 'i='.
  - since loading formatit.ph moved in a recent release, any calls to error() before it's loaded fail since it needs a routine internal to formatit. so now only call printText() from error() if formatit loaded.

3.4.1-1 Feb 22, 2010
  - when printing plot data to files, wasn't putting headers on subsequent days' files

3.4.1-0 Jan 10, 2010
  - make sure all major release settings in RELEASE-collectl have dates
  - remove blank line in all collectl start scripts right before 'END INIT INFO' since debian doesn't like it and we should be consistent

3.4.0-4 Jan 04, 2010
  - updated envrules to include additional parsing rules for dl185 [thanks evan]
  - changed envrules header for dl585 G1 to G5
  - if running an ofed >= 1.5, ignore 'CounterSelect2' field, which is right in the middle
  - send errors in getExec() to /dev/null because perfquery for > ofed 1.4 is braindead
  - was incorrectly using 256 to print IB debugging info instead of 2

3.4.0-3 Dec 14, 2009
  - was not clearing right variable for CPU Detail Totals in sexpr.ph
  - fixed typo on QLogic HCA name from qlib to qib

3.4.0-2 Dec 13, 2009
  - fixed typo of HugePages from HughPages [thanks Frederic]
  - fixed typo of 'openib' in start script LSB headers to 'openibd'
  - clarified help and man page for --all to indicate ONLY summary data will be reported, meaning NO process or detail data either

3.4.0-1
  - restructure installation directories to be more standard
  - pid was not properly set for suse flush command

3.3.7-1 Nov 26, 2009
  - added support for psv [polyserve] disks
  - added support for QLogic IB HCA
  - changes to INSTALL/UNINSTALL to handle gentoo and to restructure 'generic' distro processing for more flexibility in the future
  - 3 'standard' tools turned out not to be standard on gentoo and so:
    - limit checking for ethtool to writing to log file OR --showhead
    - if can't find lspci during -sx processing (and -sx IS a daemon default), disable -sx rather than throw a hard error.
    - only use dmidecode if -sE and if not found, set product name to 'Unknown'
  - creating /var/log/collectl in INSTALL so when installed this way the daemon writes logs into that directory instead of /var/log.
    this now matches what an RPM install does
  - if required include files can't be found in same directory as collectl, look in ReqDir which is initially set to /usr/share/collectl. This can be changed in collectl.conf
  - when exiting due to a fatal error, be sure to exit(1) and not just exit.
  - some process I/O counters found to be missing on CentOS 4.8 and so had to initialize to 0 in case not found
  - wasn't catching 'ioall' as invalid --top option

3.3.6-2 Sep 16, 2009
  - if printing interrupts in brief mode, Cpu headers have to be changed as the number of cpus increases to 2 or 3 digits. [thanks Aron]

3.3.6-1 Aug 19, 2009
  - changed error message about missing ethtool or lspci to just ethtool since missing lspci was already caught and reported
  - change location of collectl to /usr/bin in collectl-debian
  - make -P honor --hr which it currently does not [thanks giles]

3.3.5-4 Jul 20, 2009
  - performance optimizations in dataAnalyze()
    - check process/slabs first whenever type is proc/slab. then in a separate clause look at subsys, thereby preventing parsing of type in other checks
    - always include test of subsys and do it first. found to be completely missing in lustre tests

3.3.5-3 Jul 17, 2009
  - expanded meaning of -G to include slabs in 'rawp' files and to add 'g' to the Flags in the header, which also uncovered a number of bugs in the way batches of files for different hosts/dates were selected/handled even before slabs were added
  - drop support for -sy in brief mode since it really doesn't make much sense and if you do specify -sy it now forces verbose mode.
    see Slab documentation for more on playing back files generated with -G
  - if can't find an ofed utility AND rpm isn't on system, don't use it [thanks seb]
  - fixed some problems with -oA processing
  - removed a couple of error checks for switches that don't apply to a particular option since they are silently ignored already, making it easier to recall a command and add switches rather than having to remove those that don't apply
  - flush STDIN at startup in case someone typed extra CRs
  - added col2tlviz to kit
  - changes to --export processing broke --vmstat so moved call to setFlags() from right before playback code (which sets them itself) to right after call to $expName init routine
  - changed start scripts so that you can specify "start/restart {[extension] switches]", making them easier to use/document. the old syntax which put the switches 1st meant you had to use "" if you didn't want to change them AND it didn't work with redhat's 'service' command

3.3.5-2 June 30, 2009
  - added client.pl to examples/ and moved readS to /examples
  - added new switch --procstate, which allows you to limit process displays to only show those processes in one or more explicit states
  - incorrectly looking for 'LustreVersion' in header instead of 'CfsVersion'
  - when dropped SubOpts from header it broke pattern matching for subsys in header during playback
  - only calculate disk detail stats using CPU time when hires not available
  - when reporting a lustre server that is both an MDS and OST in brief mode, the 2nd line column headers are reversed for the types of server
  - removed obsolete switches (and warnings) -b, -e, -oP, -Y, -Z, -O, --subopts and -sLL
  - changed buddyinfo headers in verbose, plot and detail files being sure to include name/zone after : in details [thanks bayard]
  - use mergeSubsys() everywhere $userSubsys is used to reset value of $subsys
  - changed some instances of local variable $file to begin sorting out local variables with the same name as the global one
  - if newlog
    starts and NOT an interval 2 interval, we don't record correct slab data so only clear $newRawSlabFlag (also renamed for clarification) during interval 2

3.3.5-1 June 19, 2009
  - print load averages to 2 decimal places in plot format to match interactive format, which also required adding to lexpr and allowing it to deal with fractions [thanks stevef]
  - when disk order changes, error message was not reporting correct old maj/min numbers [thanks philippe]
  - code for including >ignore< stanza in envrules was causing uninitialized variable errors
  - do not make sure ipmi available when running with --envtest
  - do not include ':' in lexpr network name string
  - re-enable sending startup and E/F messages to syslog

3.3.4-5 June 14, 2009
  - old redhat distros don't recognize the -p switch on the start script so check first before using it

3.3.4-4
  - make sure all LSB headers are the same and only contain "$network +openib" for services so that collectl can run diskless and not require ntp

3.3.4-3
  - fixed a few things with gexpr.ph
    - incorrectly used ' instead of " for detail counters variable names [thanks evan]
    - using wrong variable name for interrupt totals by CPU
  - changed way lustre OST names are parsed so that they handle embedded _s correctly
  - include LSB comments in start script headers
  - make SubsysCore in collectl.conf match real subsys core, even though just a comment

3.3.4-2
  - changed all hardcoded occurrences of /etc/collectl.conf to $configFile even in error messages, in case someone ran with -C [thanks philippe, for this and others]
  - added DiskMaxValue to collectl.conf, with default of -1.
    If >0 and a disk read/write rate is greater, reset all stats for this disk to 0 because something reset them and they're probably all bogus
  - moved code that initialized disk names to separate subroutine and added logic to save disk major/minor numbers so it can also be called later if disks are reordered
  - if DiskFilter specified in collectl.conf, use that string for disk filtering. if not specified continue to use separate if statements for tests in getProc() since they're slightly more efficient
  - if diskremap.ph exists, call internal remapDisk() routine when disk array is being initialized in initDisk()
  - newLog() was clearing $printHeaders instead of $headersPrinted
  - if playing back multiple files for same day with -sD and disk config changes, generate an error if not -ou because mixing the data in the same detail file will make it impossible to interpret
  - remove unused variable '$intFlag'

3.3.4-1
  - added "ProLiant BL490c G6" to envrules as a 'standard' system since there is nothing special to do to parse the data
  - changed lustreMDS data for sexpr, lexpr and gexpr to be consistent with what is being reported.
    this wasn't done when lustre 1.6 support was added and should have been
  - fixed a typo in a lustre ost variable name in gexpr
  - don't just report ETH traffic in -sn brief mode, use same numbers as --verbose
  - added [ignore] stanza to envrules to allow ignoring anything that matches
  - only call loadEnvRule if -sE or debugging with --envtest
  - rewrote formatting code for g/G option because it wasn't working correctly for all situations

3.3.3-1 April 28, 2009
  - forgot to include misc.ph in INSTALL

3.3.2.1 April 28, 2009
  - screwed up $rootFlag and set to 0 after it was initialized correctly
  - fixed a couple of problems in INSTALL: added 'q' to gzip, added gexpr/envrules.std
  - added DL385G5 to envrules.std

3.3.1-10 April 27, 2009
  - If root, add product name from 'dmidecode' to header
  - If !root, don't allow -sE because ipmitool will fail
  - When running -sE and no --envrules, look in 'envrules.std' for matching product rules
  - remove '.' from ipmi device names before applying parsing rules (screws up =~//)
  - change ipmi value of 'no reading' to -1

3.3.1-9 April 24, 2009
  - When splitting off the daemon options, needed to include ',2' in the split or any *expr options get screwed up since they can have their own =
  - removed 'C' from -s in daemon command string since no longer needed

3.3.1-8 April 22, 2009
  - renamed cmuextras to misc and renamed all variables accordingly
  - added inactive memory to lexpr
  - set default interval for 'misc.ph' to 60 seconds
  - a couple sets of data names in gexpr (for cpu and disk detail) were framed in single quotes and needed to use doubles
  - wrong variable name for $intrptTot
  - removed check for CPU data in presence of -sD since always there
  - -sL --lustopts O not properly parsing read/write bytes for CFS/SUN release
  - accidentally left some debugging code in that changed 'sd' disks to 'xvd' disks
  - added support for disk types of 'emcpower'
  - when running with -P and --rawtoo, collectl only writes to the raw file but still created an empty prc
    file. Now it doesn't create that empty file. Also added reason to FAQ

3.3.1-7
  - removed memhuge from cmuextras and added to core memory stats as well as gexpr, lexpr and sexpr
  - cleaned up a couple bugs in gexpr for i= processing
  - silently remove 'x' from 's=' in gexpr, lexpr and sexpr if not part of -s since it could have been disabled. this allows one to specify -sx as well as s=x without fear of getting a hard error from the *expr

3.3.1-6
  - updated collectl-debian
  - added avg/min/max options to gexpr and lexpr
  - added import 'cmuextras.ph' to kit
  - removed line that set $message to 'unexpected perfquey error' which was clearly the wrong thing to be doing
  - in 3.2.1-6 added 'unexpected message' for perfquery failures that was wrong so removed it

3.3.1-5 Apr 09, 2009
  - need to include command switches when changing process name
  - rewrite of all the start scripts (collectl, -generic, -debian and -suse) to support multiple daemons. In the process fixed a bug where debian wouldn't restart correctly. Added --retry 2 to start-stop-daemon and that seemed to fix it.
  - added type 4 to getExec()

3.3.1-4 Apr 06, 2009
  - changed interface to sexpr and lexpr to more closely reflect gexpr dir/file naming, updated documentation and also changed lexpr to include only sending changes and handling TTL, mainly by stealing a lot of code from gexpr.
  - got rid of --expdir since that is now handled with 'f=' option to all 3
  - Had to move calling of ${export}Init to after initRecord()
  - Reporting incorrect variables for -si with all 'expr' routines.
    Had changed inode data a long time ago but apparently nobody uses 'expr' or -si or both
  - Needed to add -sC with -sj in sexpr
  - Added SwapFree to *expr even though it can be derived
  - new switch: --pname name, tells collectl to run as a different process name and use a different pid file with that name, which in conjunction with hacking up another init.d/collectl file will allow you to run a second instance of a daemon with a different name
  - reset $interval2SecsReal to 1 at same time as $interval2Secs when $i2Secs is 0

3.3.1-1
  - when writing to plot files not including new headers on subsequent days
  - typo on major fault display string in lexpr.ph
  - if only logging plot detail data, was getting errors trying to print to unopened tab file
  - API for --import allows custom data collection, includes example hello.ph
  - had to allow for playing of file with blank Subsys field

3.2.1-6 March 03, 2009
  - added --nfsopts z to filter lines of 0 in -sF mode
  - if collectl.conf is not writeable (eg in R/O filesystem), do not try to add IB paths dynamically
  - wrong logic for handling --nfsopts z
  - minor formatting changes to column positions in brief format and slab detail
  - wasn't including CPU type, speed, cores and siblings when converting to plot files
  - dropped inode info from header which was dropped from collectl awhile back
  - don't report open failures on nfs data since not always there
  - add support for XEN xvd disk types [thanks brian]

3.2.1-5
  - incremented $nfsCommit instead of $nfsCommitTot
  - wasn't handling --nfsfilt correctly on playback of 3.2.1-4 files
  - don't set $sockFlag until after socket opened otherwise we can't report socket errors on terminal
  - if read & write fields for an nfs version are both zero assume not active and don't report in detail format
  - make nfs one of the default subsystems to collect data for
  - UNINSTALL wasn't removing link to start script on Debian
  - file selection logic for playback wasn't working correctly for multiple hosts with multiple
    files on same date
  - fixed preprocessPlayback() to deal with +/- when -s specified
  - fixed very subtle bug involving playing back multiple files for same day, the first having -sy and the second having -sY and -s overridden with -s+. caused print on opened filehandle

3.2.1-4
  - always write client/server nfs data, using nfsc- and nfss- as prefix
  - added --nfsfilt to control details output
  - other misc stuff for support of ALL nfs data in raw file at once
  - dropped SubOpts and NfsOpts from header
  - added NfsFilt to header

3.2.1-3
  - do not allow -O any more, must use --nfsopts and --lustopts
  - support for nfs V4. will now collect ALL data in /proc but still only report on 1 type either interactively or during playback, based on --nfsopts
  - only turn echo back on in error() if not a PC
  - only look for passwd file when recording/playing back process data
  - when playing back a file with a prefix in front of the host name and specifying multiple directories the destination was not being correctly resolved.

3.2.1-2
  - only set $nfsOpts from header during playback if -s wasn't specified OR it was and contained an 'f'
  - do not exit on broken pipe if "-A server"
  - --vmstat wasn't respecting --hr 0 or 1

3.2.1-1
  - fixed a couple of bugs in INSTALL - init.d scripts and release notes copied to wrong directory
  - added Passwd to collectl.conf which if defined will point to default passwd file
  - changed the way /proc/vmstat is read to get more data
  - added swap in/out and page faults to verbose memory display
  - added page faults to tab file
  - when running interactively over multiple days with -P, headers were not being included in subsequent files
  - changed some verbose summary headers to mixed case

3.1.3-1 January 23, 2009
  - output for '--procopts i' off by one column near accutim
  - if RETURN entered in brief mode before 1st interval reported, ignore it because we'll get a divide by 0 error
  - add +openibd to sles startup script so collectl will start after IB
  - fixed problem processing data from different time zones with new --from/--thru processing
  - fatal bug in playing back process data was missed before release
  - another fatal bug in --procanalyze. if looking at a process which was only there for a single interval, when calculating the % of cpu which takes into account the process lifetime (in this case 0), you get a divide by 0 error! the fix is to set the duration to 1.
  - not all files were opened if -s specified with + and --procanal/--slabinfo so added restriction against doing so
  - when playing back interrupt data in plot format you have to include -sC and this was too confusing so just silently (unless -m) adding it in and documenting in FAQ.
  - if --slabanal or --procanal but no -sY/Z, don't write to slb or prc file
  - allow --passwd for ALL situations since /etc/passwd not valid for NIS. also add to help output
  - selection of task by UID wasn't working
  - if uid can't be translated to a username, report the UID instead of ???
  - fixed problem with divide by 0 errors if proc/slab analysis on multiple host/days

3.1.2-4 January 20, 2009
  - bug fixes to handling of interval times
  - -sm --verbose needs 1 extra line with --top
  - when exiting from --top, move cursor to bottom of display
  - if playing back files for same host, don't reset header counters between them
  - ignore parent process when looking for duplicate instances of -sx [thanks kaya]

3.1.2-3
  - support for allowing multiple clients to connect when in server mode
  - new documentation page: Generating Plottable Files
  - dropped support for data files generated by pre V1.3 version
  - when rolling logs, write a timestamp onto end of last file
  - in playback mode, if last timestamp of previous file matches first timestamp of new file, treat as contiguous data which results in no 'holes' in output stream

3.1.2-2
  - check for nfsopts/playback in checkSubsysOpts was incorrectly looking at $plotFlag when it should have been looking at $playback
  - added Power Meter (ipmitool sdr type current) to env data when available
  - added all environmental data to lexpr and sexpr
  - added swap total/used to lexpr and sexpr
  - building incorrect symlinks to collectl-suse and collectl-debian in INSTALL - also wrong in collectl.spec
  - for IB monitoring, when couldn't find ofed_info was still trying to run it
  - need to initialize $interval2SecsReal to i2 first time when 0
  - do NOT report process/slab data for the first interval with data in it

3.1.2-1
  - more cleanup to INSTALL to give world read access to ARTISTIC, COPYING and GPL and set a few more protections on other files
  - change to --from/--thru processing since error messages implied you could use dates too, so now you can.
    see man page or web documentation on playback for details

3.1.1-5 November 5, 2008
  - two new fields added to slab data to show changes in total allocation between samples
  - when mixing --procanalyze with other subsystems, the non-process data wasn't getting written
  - new switch: --slabanalyze
  - in header for process data change 'faults are ...' to 'counters are ...' since we're now including I/O counters as well
  - wasn't printing process i/o headers with --procanalyze output. thanks Sven
  - when using -on, cpu % needs to divide by the real interval and not 1. thanks to Sven again!
  - added percent CPU utilization for process I/O format as well as prc and prcs files
  - also --procanalyze now honors -om for msec level times
  - new process option: c. will include cpu times of any child processes (not threads) that have since died

3.1.1-4 October 29, 2008
  - fixed a rounding problem with numbers between 1000M and 1024M that were getting printed as 0G (thanks Marko)
  - found error in conversions to K, M, etc where in some cases dividing by 1000 instead of 1024! Specifically: i/o sizes for disk, networks, lustre and infiniband. Also lustre BRW states and some of the KBs fields processes for I/O and memory usage in the default format - the detailed memory format had it right. brief formats for: disk, network, quadrics, IB, lustre. But also note these only come into play when values being reported exceed the default field widths and so usually aren't tripped.
  - changed -oF to --procopts f
  - limit username in process display to 8 chars
  - make sure terminal echo turned on when falling through error()
  - if processes AccumTime>999 minutes in Process Summary, which is pretty rare, drop fractional seconds resulting in a different format
  - included sort-type in top process display
  - added ioall to --showtopopts menu
  - allow a numeric width to be included with --procopt w
  - removed restriction for considering -sl ambiguous and it will be assumed to mean lustre subsystem rather than a typo for -slab
  - misspelled RSys/WSys as RSYS/WSYS in procanalyze code

3.1.1 October 8, 2008
  - missing leading space before 'sd' when determining disk names during initialization can result in wrong devices being listed with -sD and the header if they contain an embedded 'sd'
  - fixed problem with --slabopts s and/or S with -P or in playback
  - fixed --top checking to verify ALL different I/O related types
  - generate an error message for mixing lustre client option O with M or R
  - allow printing details in --top mode, BUT user needs to control top part of display with --hr
  - playback of environmental data in plot format was printing values every interval rather than just during interval3
  - some ipmi values may be '' and so report them as 0 to make sure gnuplot can handle it
  - make a few changes to INSTALL for debian-based installations
  - added 'AccuTime' to top I/O display format
  - new feature: top slabs! same switch as top processes, --top, but include names of slab column to sort by.
see --showtopopts
- filtering for old slabs now matches beginning of slab name just like slub
- lustre OST/B data wasn't shifting headers when -oT included

3.1.0 September 3, 2008
- fixed 2 problems in INSTALL (thanks sebastien)
  - forgot to copy collectl.conf to BINDIR/etc
  - forgot to set protections on collectl and inet.d script
- cleaned up interval header printing
- new feature: environmental monitoring via ipmitool
- added environments to daemon defaults in collectl.conf
- changed default interval3 monitoring interval to 2 minutes
- 1st line of brief headers were 1 column too narrow for -t,m,h&f
- when reading lustre MDS stats, don't tell getProc to skip over anything and save everything that starts with 'mds_'
- extended lustre MDS data reporting
- added I/O size to lustre Client/OST verbose/detail output and make it honor --iosize in brief mode
- fixed a 1 column formatting shift when using -oT with some lustre client/ost output
- fixed problem in which --hr 1 wasn't causing a new header every interval for detail data of same type
- increased size of KBBytes for lustre/interconnect data to 7 digits
- very minor, but if user specified -s+l and --lustsvc and lustre disabled, only looking at subsys in checkSubsysOpts was generating an error so now it looks at '$userSubsys' too.
- make $filename local in getSys()
- another pair of switches: -X, --helpall lists ALL help making it possible to grep for something if you can't remember where it is
- added --grep which allows printing all entries in raw file as timestamped lines.
may mix with other playback switches
- if filtering processes and no data initially collected, interval2Secs will be 0 first time and flt/sec will generate illegal division error so set i2 to 1
- was calling procAnalyze even if no data processed during an interval and as a result the last pid seen was being credited for that interval when it shouldn't have been
- added parent pid to top i/o display
- when looking for collectl processes with -sx, be sure to ignore those instances where the command is 'ssh'
- discovered '$lastInt2Secs' not getting reset when a new set of prefixes were being played back. This meant the denominator for first line of process/slab rate data would be wrong, but most people probably wouldn't have even seen this
- a couple of fixes to correct --procanalyze reporting errors
- removed extra space from --procopts i header.
- significantly expanded --top sort types
- "waiting for..." message will now honor --quiet
- if more than one file played back with interrupt data AND latter one had more CPUs $intrptLast{}->[$cpu] wasn't getting initialized
- allow commas in addition to spaces to separate files in 'playback' list
- discovered a user app can modify contents of /proc/pid/cmdline and so cannot assume it will always end in null (see test of $cmd1)
- change test of !$slubinfoFlab to $slabinoFlag since both may be missing

3.0.0-4 July 1, 2008
- major switch cleanup
- completed cut-over from -O to xxxopts started in V2.6.4 by creating --nfsopts/--lustopts.
-O kept around for backwards compatibility for nfs and lustre
- a couple of switch changes to reduce complexity of -o and to clarify new meaning of handling time offsets and from/thru times for playback
  - replaced -ot with --home
  - replaced -oP with --passwd
  - replaced -t/--timezone with --offsettime which now takes a time in seconds
  - replaced -b/-e with --from/--thru
- new switch: --procanalyze will produce space separated process summary file (extension = prcs) that summarizes process data for each unique process
- big enhancement for --top. now when -s specified prints a scrolling window showing histories (-oT recommended but not required) if in brief OR verbose and all lines the same. note - this mode does NOT support detail subsystem data
- also now identifying the parent who created the thread correctly
- output format cleanup to make things more concise. no changes to plot format
- changed order of columns for brief lustre client to be consistent with all other brief fields
- changed order of I/O related verbose subsystems (disk, network, infiniband and lustre) to be more consistent with brief mode. in other words, all input stats precede output stats and KBs precede I/Os. NOTE - the order of the fields for plot data have not been touched.
- reformatted help to make more readable (I hope) and fit in 80 columns too!
- nfs got inadvertently dropped as a valid subsystem in V2.6.3 and it's now back
- wrong logic for verifying --procopts Z only allowed in --top mode
- -oA was calling printMini1Counters() instead of printBriefCounters()
- renamed printVerbose() to printTerm() because it makes more sense
- when reading diskstats, make sure leading space before 'sd' as there is with $diskFilter in formatit.ph
- fixed printing process data that got broken in plot format
- made brief fields 1 column wider for lustre/infiniband in brief mode
- lustre client names didn't make it into header when -sLL was specified using old option format
- discovered the cvt() routine wasn't being used everywhere in printBrief()
- found/fixed bug that's been there almost forever! if you play back a file recorded with -sZc but force collectl to only process -sZ it got fatal errors. Just goes to show how many combinations of conditions there really are!
- fixed problem (I hope) where extra 'RECORD' separators were getting printed for empty intervals
- fixed code that checks for another instance using IB since it wasn't dealing with -s using both + and - in it such as a daemon that has -s+YZ-x
- couldn't play back process data on a PC without --passwd since /etc/passwd not there
- wasn't dividing lustre client OST details by 1024
- discovered/fixed file header entry for switch options which only showed switches and not options. since a read-only field it shouldn't have hurt anything.
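
The -s+YZ-x case mentioned above can be pictured with a short sketch. This is not collectl's own code (collectl is written in Perl); it is a hypothetical Python illustration that assumes only the behavior described in collectl.conf: an explicit list such as 'mnp' replaces the core subsystems, while '+' and '-' add to or remove from them. The function name expand_subsys and the default core string (taken from the commented SubsysCore entry) are assumptions made for the example.

```python
def expand_subsys(spec, core="bcdfijlmnstx"):
    """Expand a -s style spec against the core subsystem list (illustrative only)."""
    if not spec:
        return core                  # no -s given: use the core subsystems
    if spec[0] not in "+-":
        return spec                  # explicit list replaces the core, no combining
    selected = list(core)
    mode = None
    for ch in spec:
        if ch in "+-":
            mode = ch                # switch between adding and removing
        elif mode == "+":
            if ch not in selected:
                selected.append(ch)  # add a subsystem not already selected
        elif ch in selected:
            selected.remove(ch)      # remove a subsystem from the selection
    return "".join(selected)


if __name__ == "__main__":
    print(expand_subsys("+YZ-x"))
```

A spec that mixes both signs has to be walked character by character, which is the kind of input the IB duplicate-instance check was not handling.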
2.6.4 June 11, 2008
- fixed references to gzerror() to be in string context and so error text correct
- miscellaneous documentation changes, mainly to support code changes
- do not report /proc/pid open failures since they happen often enough to be a nuisance
- changed order of options for --top to be type,num and if no num use the screen size
- dropped --procio and --procmem replacing them with --procopts i and m
- new options for --procopts: r and z
- broke --vmstat when changed $cls to $clscr
- removed inline code for vmstats since now done via vmstat.ph
- collectl --top generating uninitialized variable message when blank line in /etc/passwd was fixed
- wasn't honoring -ot for a single subsystem in --verbose mode (sheesh) and now it is
- remove special code that removes collectl from --top display unless explicitly requested. this will help make users more aware of collectl overhead
- found at least one system that returned different format from 'resize' and so changed pattern match to make it more general

2.6.3 May 12, 2008
- added a README, INSTALL and UNINSTALL to the tarball to aid in manual installation and removal
- changed --procopts to --procfilt and --slabopt to --slabfilt because I want to differentiate between options and filters.
- enhanced socket error handling
- new I/O output data for disks, networks and interconnect
  - i/o sizes will always be included in verbose output
  - new switch --iosize will add to brief displays
  - NOTE this data is not written to tab file since it can be derived
- changes to -si (inode data)
  - removed info from header and will get it from proc instead
  - changed what is reported as some fields no longer valid and added 'number' of dentry noting that the values for 'unused', which increase as files are created makes no sense to me. also including file handles and inode counts in brief format.
- as a result of adding -si to brief format, --all results in brief output for everything and so you'll need to include --verbose to see verbose form
- several new options for --procopts (thanks for the push Matt)
  s: will add read/write system calls to process stats
  t: will force collectl to look/display threads for ALL processes note that this can be a lot of overhead if there are a lot of threads on your system. All your threads can also be seen via 'ps -eLf'
  w: will make display wider by including arguments to process names
- you can now request what to sort on for --top (cpu, io or page faults)
- you can now include --procfilt with --top and it will only consider those processes that match for display
- you can now use --top in playback mode

2.6.2 Apr 29, 2008
- forgot to rename call to resetMini1Counters() in collectl.pl
- do NOT clear $miniDateTime when --export
- added swapin/sec and swapout/sec to [MEM] data in tab file

2.6.1 Apr 24, 2008
- for perl version checks, use 2 digit minor/patch levels (thanks devzero)
- report zlib and HiRes versions in collectl -v output
- grab ALL of /proc/meminfo for non-2.4 kernels even though we're not processing all of it
- added the number of active lustre file systems seen by the client for lexpr/sexpr
- was incorrectly restricting -A to -P or --export and that was wrong
- allow --export in playback mode, making it possible to use --vmstat as well
- extended --top to allow -s to be included along with proc stats. not that pretty but very useful
- renamed printTerm() to printVerbose(), briefFormat() to printBrief() and other associated printMin1 routines
- when changed syswrite() in writeData in last version, lost trailing /n and so put it back
- ibcheck was redefining global $port so reopening socket in 'server' mode failed!
- when --export added forgot to handle writeData() conditional correctly for process and slab data

2.6.0 Apr 03, 2008
- lustre
  - typo for lustre readahead 'not consecutive' variable!
  - added 2 new readahead variables for 'failed grab...' and 'wrong page...'
- extended meaning of --headerrepeat and added a synonym of --hr for it. a value of -1 means never display a header and 0 means only display it once, eliminating the need for -oH and -oh which are still supported but not shown in help. They will be eliminated in a future release.
- bug in regex prevented gzclose on zipped tab file
- cleaned up code (finally) that deals with displaying headers such as how often and when to skip entirely. this included dropping the -oh option which predates --verbose mode
- if we can find 'resize', use it to get number of lines in display and use for default. This can still be overridden in collectl.conf
- slight change to -ot behavior. only erase screen one time and then just overwrite what's there as it's softer on the eyes
- fixed format error in 's-expr rate' for disk summary stats
- modification to the way --custom is used to make it work with -f, -P, sockets and --rawtoo just like --sexpr. In fact, sexpr and vmstat code has been removed from formatit.ph and are now standalone include files named sexpr.ph and vmstat.ph respectively. See documentation for more details.
- renamed --custom and --custdir to --export and --expdir to better reflect that the main purpose of these is to export data to a file or over a socket
- had to move subsys/interval initialization code around to happen before calling --export
- changed init.d file for SuSE as it couldn't detect collectl running when pids was 5 digits
- added code to handle write of partial data over socket
- based on popular demand, --all has been provided to show all summary stats. be sure to try it with -ot
- added CPU number to process detail report.
since this data has actually been collected all along, you can play back older raw file and now get them
- changed default socket port to 2655

2.5.1 Mar 21, 2008
- added OFED 1.3 location for perfquery to collectl.conf
- added new constant for ofed_info to collectl.conf
- if can't find perfquery and/or ofed_info, ask rpm and if there update collectl.conf
- redefined debug flag of 8 for lustre checks and leave 2 for interconnect only
- adding more debugging details for infiniband initialization
- changed daemon startup switches to include -sC. this will NOT generate any extra load on collectl but will cause CPU details to be generated in plot format which will include interrupts/cpu
- make sure users have privileges to run perfquery
- moved location of --sexpr with -sj check
- for lustre versions < 1.6 don't limit BRW stats to being in directory with MNT in its name, which was certainly the case for HP-SFS
- changed headers for lustre rpc buffers to 'P' rather than 'K'
- changed directory on MDS that we look in for stats from .../MDT/mds/stats for older versions of lustre to ...MDS/mds/stats for versions >= 1.6
- need to check lustre version BEFORE calling lustreCheck() routines
- in lustreCheckClt(), only do OST level tests if really a client

2.5.0 Feb 29, 2008
- if HCA present but IB stack not completely loaded, the cat of /sys/class/infiniband/* fails and reports error.
redirecting STDERR suppresses that error
- added support for reporting interrupts by CPU
- removed all but the collectl and collectl-data man pages, moving their content to the collectl web site at sourceforge AND to /opt/hp/collectl/docs
- when installing in a brand new ROCKS environment /bin/rm not there yet so make conditional in %pre section of spec file [thanks roy]
- modified spec file to add build level to release so I can keep the release number the same [thanks again, roy]

2.4.3 Feb 04, 2008
- cpu percentages calculations need to include iowait in denominator
- memory stats: include AnonPages in mapped memory
- fixed pattern match for IB device number to properly select mlx4_ adapter
- was incorrectly including network bond stats with total network stats
- wasn't printing date/time for --vmstat when requested
- added IbDupCheckFlag to collectl.conf to allow disabling the check for duplicate instances both trying to read IB counters
- removed a couple of spaces from default output so now <80 columns wide
- when someone creates a new logical disk after collectl has been started, we need to add that disk to the list of valid disk names
- changed the algorithm used to check for bogus network data. you can also disable these checks by setting DefNetMax to a negative value in collectl.conf

2.4.2 Jan 16, 2008
- changed purge algorithm to explicitly purge any files in the logging directory that match hostname, contain date/time stamp and do NOT end in 'log'. Before only raw files were purged and this was clearly not the intent.
- on a lustre MDS, the mds_sync counter has moved as well as others added so pull more of them. even though the newer ones won't be reported on, they'll be in the 'raw' file for reference via tools like grep.
- bogus network record processing changed as follows:
  - use double the reported network speed from the raw file header and if not known use the DefNetSpeed in collectl.conf which for now is 10000Mb.
  - was setting 10G network speeds in header wrong (wasn't multiplying by 1000). Therefore if we find a network with speed of '10' on older version multiply by 1000 as this only affects 'bogus' check.
- added IB speeds to network interfaces in header BUT limited to OFED and assuming all devices running at same speed
- looks like I broke old style slab reporting! the data is still collected correctly but won't print. now it will...
- just discovered you couldn't print to terminal in plot format for -sY or -sZ, though I don't know why you would ever want to! in any event, they now call writeData() and so can...
- if writing to non-compressed files and the flush time is less than the interval, just open the files with autoflushing enabled to save flushing overhead
- plot format for [SOCK] data was sticking an extra $SEP in header after [SOCK]Tw

2.4.1 Jan 05, 2008
- corrected calculation for cpu times to include soft, irq and steal which was causing incorrect values to be reported for system with higher values in one or more of these counters. If one replays any existing raw files the correct values will be produced.
- added support for new SLUB slab allocator which results in different output format for slab reporting.
- added 'Flags:' to header and use value of 's' to indicate a raw file contains new slab data and a 'i' to indicate that process data contains I/O counters
- also added slab alias names as a block comment directly below main header because they are needed for playback on a different system or even on the same one in case the slab configuration has changed

2.4.0 Dec 23, 2007
- test for -f filename only checked for existing directory and if ended in / still created it but logfile created started with -
- changed way lustre versions and services formatted in header because when used very old file where neither cfs or sfs version defined we get an extra CR which screws up gnuplot. I'm probably the only one who will even see this.
- typo prevented OST BRW stats headers from printing properly which in turn messed up plotting
- remove arg-list from process command string when displaying in terminal format
- added process i/o stats for systems with that feature built into the kernel
- include new flag --procio which functions similarly to --procmem in that it shows much more detail about stats

2.3.4 Dec 13, 2007
- added IB code of 0c06 for Mellanox IB Infinihost III card
- expanded --sexpr behavior to allow sending over a socket or even to stdout and for consistency, logging to a local file is no longer required, though logging is certainly permitted and as a result more consistent. the collectl-logging man page has been modified to address this
- forgot to add cpu irq, soft and steal to sexpr header and raw routines
- removed -H as it's no longer needed given all the other data export options but preserved its functionality by redefining the meaning of -d4 and -d32

2.3.3 Oct 16, 2007
- added 3 new fields to CPU values -- irq, soft and steal, which resulted in a change of order of the verbose output, the theory being it's more important to have the display in a readable order rather than just append the fields to the end.
While at it the same was done to ALL most CPU output formats for consistency
- incorrectly included !$plotFlag in test to set $zFlag and as a result 'flush' wasn't working for raw files
- extended IB interface support to include ConnectX mlx4
- when printing 'brief' subtotals for infiniband, do not average errors since intermittent error rates may be too small to see so just print increasing totals
- rare case - if doing slab/proc and we only specify int 2 AND less than default interval (say we do -i:.1), we need to force int1 to be int2 so we don't get error that int2), changed context switches and interrupts to show averages rather than totals
- subtotals in brief format were including incorrect units (K,M,G) for some fields including disk, network and lustre
- converted string to search for 'vstat' from a single bin name to multiple ones and updated processing accordingly
- if specifying -l with o/m on lustre client was incorrectly checking for disk stats file which it should only do if -OD specified

1.7.4 Jun 05, 2006
- whether someone requests -sc or -sC, collect both types of stats. this makes it possible to play back either independent of what was specified at collection time making CPU stats consistent with others which also exhibit this property
- -V now shows interactive and daemon defaults separately
- include the name of the host running collectl in the logfile name for uniqueness
- if corrupted file, include name in error message AND if multiple files being processed, skip remainder of corrupted one and go on to the next one
- changed sfs readahead in brief mode from hit percentage to show actual hits/misses as these are so typically close to 05 or 100% you can miss the changes

1.7.3 May 23, 2006
- the plot format headers I thought got released in 1.7.2 didn't. they do in this release
- if user had specified lustre options and lustre wasn't active, the playback fails because the header says there isn't any lustre data present but -s says this is.
the fix is to recognize this condition during playback and simply ignore the options

1.7.2 May 17, 2006
- added qualifiers to plot-format headers to clarify which subsystem the data relates to
- changed headers for lustre -OB output from K to P since the data being reported IS in pages, not KB
- commented out the check that limited the data being reported for disk stats from 512KB to the full range of up to 2MB

1.7.1 May 03, 2006
- make sure -M1 and -oh turned off in -H mode
- support for new /proc format with elan 5.20
- fixed a bug that prevented elan detail files from being written to
- wasn't building correct link in /etc/init.d for suse/debian
- changed rules for determining hyperthreading: true if siblings/cores==2
- added cpu vendor, speed, cores, siblings to common header
- added support for 'official' stats from Voltaire so if /proc/voltaire/adaptor-mlx/stats exists, it looks in there. otherwise it tries to use /proc/voltaire/ib0/stats. note - since voltaire only supports a single HCA, if more than one is found only the first is looked at and an appropriate warning generated

1.7.0 Apr 21, 2006
- fixed bug in thread process reporting. although the thread numbers were getting reported correctly, all lines were reporting the parents stats
- changes to -O to include lustre options broke some nfs options
- also updated process man page to more clearly articulate what gets reported
- need to make sure raw entries that start with 'Slab' are not coming from /proc/memory by making sure no ending ':'
- there were problems running on SFS Admin node when -sl specified because of logic error dealing with monitoring a service not yet up

1.6.9 Apr 02, 2006
- when specifying -i:xxx in interactive mode, default monitoring interval was not changed to 1.
this affected displays for slabs
- verbose mode wasn't working correctly

1.6.8 Mar 28, 2006
- fixed a few minor problems to ensure works with colplot/colgui
- reformat FAQ and include in kit

1.6.7 Mar 20, 2006
- didn't quite get -s right for overriding that in file
- made size of disk block buckets bigger but limiting the size if displayed values based on collectl.conf entry
- OST detail data incorrectly written to CLT detail file (which usually isn't even opened!)
- remove default of -oT, it complicates things...

1.6.6 Mar 07, 2006
- added Subsys to 'RECORDED' section of header (note spelling to preserve pattern patches for second occurrence) and also updated Subsys data to correctly reflect values based on +/- in -s during playback
- removed erroneous check that prevented running -sL -OB on lustre OSSs
- forgot to write headers for OST detail data
- remove test for ignoring -sL for mds data since we now CAN have some, but remember that it goes into the .blk file

1.6.5 Feb 28, 2006
- verify system is an mds/ost before trying to open block iostats file
- when lustre subsystem disabled wasn't removing 'l' from $subsys
- incorrect check for valid -M1 subsystem in setOutputFormat(). was invalidating any uppercase letters and should have only looked at $MiniSubsys
- changed way $BinDir gets built to work with more complicated links
- missing cvt() in print for lustre client summary
- was incorrectly setting reportOstFlag to zero when -oD and suppressing output on playback
- changed -o^h to -o-h
- removed support for -t as it's not really useful and anyone using it (and I doubt there are any) doesn't really understand -P

1.6.4 Feb 25, 2006
- default settings for non-daemon mode are now '-i1 -scdn -M1 -oT'. furthermore if single subsystem -oh is also set. to remove -oT or -oh, preface that switch with a ^ such as -o^T.
- added -S switch, which means collectl was started remotely by something like 'ssh' or 'rsh'.
at the end of each collection interval see if parent daemon went away and shut down...
- when missing HiRes time module and -om or fractional intervals specified, ignore -om and round off interval
- wrong error message about requiring HiRes for -om as it reports -P is required instead
- removed -OB processing code from ministat processing
- produced uninitialized errors if nothing selected during playback due to incorrect settings of -b or -e. now generates an error.
- only write inode info to header if -i

1.6.2 Feb 01, 2006
- fixed bug in ELAN monitoring logic that was causing it to create new logfile every 15 minutes
- fixed bug which generated uninitialized variable in some cases when slab and non-slab data with -M1
- replaced 'cat' command in routines that check lustre/interconnect state to cat() which is more efficient
- fixed bug that prevented raw file with no core data from being played back
- added support for lustre client rpc and readahead stats. see man collectl-lustre for details

1.6.1 Jan 23, 2006
- changed using units for ELAN stats to KB to be consistent with IB stats
- added new option to -M1 mode such that if a user types A, the averages will be displayed
- added new switch (groan) -oA, which when playing back a file in -M1 mode will append Averages/Totals
- only use ethtool to determine network speed if root
- playback error reporting was not explicit enough when wildcarded specification didn't match anything
- moved around code that expands -s (based on +/-) so expanded value seen with -d4096
- infiniband support - requires special internal '/proc' module
- summary data changed for quadrics (including plot format) so that it matches that of IB and any potential future interconnects to only show total errors and not individuals counts which IS available as details. This was necessary so that plotting tools can display interconnect data independent of its type.
- incorrectly determining kernel version if version containing 2.4 in interior of id string

1.5.8 Dec 14, 2005
- add additional copyrights to source and manpages

1.5.7 Dec 08, 2005
- added more conditional execution if not pcs for things like `date` and existence of lspci and ethtool
- add setsid to daemon startup and reset terminal I/O channels to /dev/null in child
- entire chunk of header line and data that followed not getting prepended with hostname when -A and -M1
- only complain about missing lspci or ethtool when not in playback mode
- need to determine path to collectl using readlink() in case defined as a link
- latest version expanded size of buffers to 8 so had to modify display
- added a warning if new network device found after started, which will cause uninitialized variable warning. this may go away when started at S99
- renames release notes so they can co-exist with release notes from other tools
- modifications to 'spec' file
- daemon will now start at S99 to give more devices a chance to initialize and be seen
- install in /opt/hp/collect, link to it from /usr/sbin

1.5.6 Sep 23, 2005
- modified data collection for 2.6 memory to include everything up to Vmalloc
- extended -sm reporting to include slab and mapped memory for all outputs
- was not printing header for first time for -M3
- check for non-existent /proc for inode processing
- remove debug check for open proc error messages
- fixed bug in collectl.conf. setting Interval2 overwrote Interval.
- debug flag of 4096 prints header
- make sure under -i0 sampling that the number of intervals for processes and environmentals are proportional to their default timings
- removed 'cciss/' from disk names in file headers
- moved socket handling code to the front so anyone who calls us and gets an error or does a 'collectl -v' will see the socket open and then close so it can then cleanly exit.
- added Swap size to header
- if ethtool present, record 'eth' speeds in header
- generate an error if lspci not on system.
- warn that no eth speeds in header if no ethtool on system
- inserted inter-file marker into PRC files so can differentiate between processes with same pid/name from different logs
- printing wrong header for lustre client details in plot formatit
- allow printing process data with date/time stamps so when you grep the output you can see them
- fixed erroneous message "Looks like 0 exited so not looking for new threads" which should not be reported when value is 0.

1.5.5 Sep 01, 2005
- added exception processing for lustre client summary data

1.5.4 Aug 31, 2005
- added exception reporting for lustre KB/sec read/write for OSS and Reints/sec on an MDS. For now it's NOT writing to an exception file as I'm thinking of removing that capability since I don't believe it is used very often AND there are too many files written already!

1.5.3 Aug 29, 2005
- serious bug fixed. when playing back any files and producing process or slab files (.prc or .slb), ALL other data was being skipped for that period. The workaround is if you need both process/slab and other data you'll need to do it in 2 batches!
- very minor (but annoying). if interval ends in exactly .000 seconds, $seconds is treated as an integer and splitting on '.' provided an undefined $usecs which in turn generates an uninit var at line 2456
- skip over IB support since not yet fully baked
- the line RECORD... was leaking through without a hostname prefix with -A
- flush mechanism which was incorrectly flushing every interval
- internal coding thing: cleaned up reporting to be driven off $subsys and collection driven off the 'flags'.
this means that doing -sl -L in record mode will not display lustre stats on a non-lustre system as is already done in playback mode
- check for location of lustre modules expanded to support newer releases
- set ALL output files to autoflush on write when printing in plot format on the terminal. not doing so was causing collectl to lock up until the output buffer filled when called from a script with -oh or -oH
- InfiniBand Support
- changed method for determining lustre driver installed. now just looks for anything named 'lustre' in /lib/modules
- updated man page to note that some subsystems, specifically d, l, n, t, x, y though recorded in summary mode CAN be played back in detail mode and vice versa
- added a restriction that you can't play back a file recorded with -sd using -sD if it wasn't originally recorded using -sc as well (need times in jiffies for some calculations)
- added additional parameter to collectl.conf to point to additional library paths (primarily for development but may prove useful later on)
- made some changes to header
- added seconds associated with timestamp of filename timezone
- renamed 'Daemon Options' to 'DaemonOpts'
- moved some fields closer together
- added a preamble for plot format files that show original collectl version and switches
- added HiRes flag state of original collection so playback knows
- Another switch -T to control time zone conversions on playback, which is required for dealing with files that don't have enough info in header to do automatic conversion of times

1.5.2 May 23, 2005
- removed extra comma in printf statement at line 3752
- warning about -sL and MDS was missing 'if ...' modifier
- need to flush buffers before creating new logs
- not declaring $datetime as 'local' was generating uninit vars with -M1
- the pattern match on sd disks wasn't set to pick up disks with 2 alpha chars after the initial 'sd'. it does now.
- print error messages to terminal via STDERR
- moved location of slab/proc initialization to record mode as it was causing problems on PCs.
- when no process data exists, print 0 instead of '-'

1.5.1 May 03, 2005
- updated several man pages and created a new one: collectl-lustre
- added memory size to file headers
- strip any quotes from playback file name that may have leaked in
- if the destination directory doesn't exist, create it
- some Linux specific code was moved/modified to facilitate running on a pc.
- initialization of $MyHost and $OS.
- only call syslog on linux and so we have to do a 'require', not 'use'
- a number of changes to support dynamic identification of lustre configuration changes
- change lustre related information in file headers
- no longer an error to request -sl when no services present
- printing lustre client data in plot format was missing 2 fields
- replaced common '.lus' detail file with specific ones of '.ost' and '.clt'
- KNOWN PROBLEM identified with lustre client data collected using -sLL with older versions
- KNOWN PROBLEM identified with lustre client size read/write I/O counts
- modified recognition of quadrics such that if /proc structures are present but driver not loaded, a warning rather than an error followed by an abort occurs.
- new switch: -V to print operational defaults
- storing kernel version rather than whole o/s name in header
- only include SCSI info in header if non-blank
- do not print header when plot output directed to terminal
- added '[HYPER]' to cpu display header when cpu hyper-threading on
- expanded scope of -m to print playback processing messages on terminal
- problem writing to syslog on some systems and so that function currently disabled
- made width of network name dynamic to account for IB names

1.3.2 Mar 13, 2006
- for M3 reporting, headers weren't printing
- /proc/slabinfo went to V2.1 with no format change, so had to extend the check to include anything in 2.*

1.3.1 Jan 18, 2005
- problem corrected with slabs/pagesize during playback
- when a new slab was created after collectl started, during playback an 'uninitialized variable' message was being generated.
- fixed problem with -b/-e when date specified

1.3.0 Jan 18, 2005
- write startup/shutdown messages to /var/log/messages when writing to files (too much of a nuisance to do for all invocations). Write ALL fatal errors to messages. This is in addition to the normal logging that gets written to collectl's own message log in the logging directory, which is only written to when writing to a file.
- installation no longer saves old startup script since user customizations haven't been there since introduction of /etc/collectl.conf
- support for SuSE distro based installs
- support for Debian installs as long as one converts rpm to deb
- new man page for process monitoring and how the math works
- moved data definitions and examples to their own man pages
- make process sort order ascending numeric
- added '+' option to -Z switches which results in threads being displayed - see man page for restrictions
- -Z enhanced to allow filename to be specified as an alternative. see -h
- added startup switches to log header (don't know why I didn't think of earlier!)
- expected for lowercase hostname in playback file.
this was a bug!
- removed partition specific code and now share 2.6 /proc/diskstats
- removed -spP and collectl now uses /proc/partitions if it contains data
- removed -oP as it was never fully debugged and buggy
- slight format change of disk stats output to make consistent across 2.4/2.6
- fixed bug when specifying a playback file wildcarded spec that didn't start with full hostname
- fixed bug when playing back multiple files from multiple dates with -b/-e switches
- found bug in the way linux handles reading /proc (may read past end of existing structure) and so changed handling of /proc/pid/stat to only read 1 line
- make sure collectl will run on windows in playback mode by changing some linux specific code

1.2.5-4 Dec 14, 2004
- modified processing of /proc/net/netstat to accommodate slightly different format with debian 2.6 kernel (leading blank line not expected!)
- removed 'partition' as valid subsystem for 2.6 kernels as that data now comes from /proc/diskstats
- wasn't recognizing 'sd' devices in /proc/diskstats

1.2.5-3 Dec 14, 2004
- found latent bug - thanks tom - in process processing code. when not using -Z new processes weren't discovered. now they are!
- had to move 'interval' processing code to section before alarm set
- wasn't honoring -t when printing process data
- on systems where lustre fs was created outside of 'sfs' environment.
MDS directories had different name format so pattern match had to change
- not all MDS read/write fields always defined and so conditional prints req'd
- changed single quotes in man page to \` so at least something would print
- removed 'C' as a valid option which was moved to -O a number of versions ago

1.2.5-2 Dec 14, 2004
- fixed a problem in 1.2.5 which wasn't properly handling return status from gzflush()
- enhanced error messages when invalid subsystems are specified with '-'
- removed restriction against adding core subsystems with '+'
- enhanced -sLL processing to deal with /proc with data in different positions
- bumped indexes for getProc() 12/13 to 13/14 to keep Lustre processing together
- if flush timer set AND using -H, only execute command every flush interval
- flush timer was off by 1 second (tested it using > instead of >=)

1.2.5 Nov 03, 2005
- The 'subtotal' feature of -M1 causes cron based scripts to blow up because the 'M1' code wants to check for terminal I/O. This feature has now been disabled for environments with no terminal.
- Found a corrupted compressed raw file! Closer inspection of collectl showed no error handling for gzflush() errors and limited error handling for gzwrite errors. Changed to close/recreate new logs on zlib errors or abort if recovery impossible. As a safety net will kill itself if there are ever more than $ZlibMaxErrors in a single day - currently setting that value to 20 but can be overridden in collectl.conf.
- Added code to make sure valid time marker in raw file and if not to declare file corrupted and exit. This is for the case noted above where a compressed file had bad data in it. Still a mystery how this happened but I'm hoping it was related to gzflush and so now shouldn't happen again.

1.2.4-3 Oct 19, 2004
- _SC_PAGESIZE posix variable not supported on 5.6 releases of perl and results in uninitialized variable warning and errors during SLAB reporting.
Added code to force pagesize to 4096 for IA32 and 16384 for other architectures which is not completely correct for all cases.

1.2.4-2 Oct 13, 2004
- using wrong pagesize for slab calculations on IA64. use sysconf() and added PageSize to logfile headers
- writes slab version into header instead of whole version line and drive format off that number. report errors for unsupported versions.
- removed a line of code that couldn't execute if daemon found to be already running
- found reference to cvt2() which was removed in last version
- ministats with a -c leaving echo turned off!
- changed width of memory reporting fields for slabs and memory stats from 6 to 7 so more significant digits retained when displaying a number like 123456K which in 6 columns displayed as 123M. Not sure if it should be done to other fields as well because it does affect screen real estate. Remember to use -w to get rid of K/M/G
- removed unused c and s from -O
- added code to filter slabs and remove those with no allocations, -Os, or those with no change in slab activity since last interval, -OS. NOTE - slab objects change all the time and so including them in the filter is pointless
- added 't' to core variables to be monitored since it only takes about an extra cpu second per 8640 samples.
- added 'sockets' to ministats. this means one can now simply do 'collectl -M1' and get ALL ministats for default variables (of course you'll need a VERY WIDE window)

1.2.4 Oct 07, 2004
- added SLABS!
- added -oF to tell collectl to use cumulative totals for Maj/Min faults
- fix printing of process data in plot format
- found a couple of inconsistencies when reporting in 'K/M/G' format. made sure bytes /1024 and counts/1000
- fixed uninitialized variable with -sP -p xxx
- echo not turned back on with -M1 and -p
- fixed bug with -p -f -P
- changed $ProcInterval/$EnvInterval names to $Interval2/3 so that slab interval processing could be slipped into $interval2. this is really an internal thing.

1.2.3 Sep 18, 2004
- Added -sZ and -Z to capture process data.
- Added -M3 to report finer detailed memory data for processes
- Added ability for dynamic subtotals with -M1 (see manpage)
- Added -st and -sT for tcp counters
- Made lustre and quadric counters part of default subsystems

1.2.1 Aug 12, 2004
- Support added for lustre clients
- Search for 'collectl.conf' in /etc, collectl bin dir, then current dir if no -C

1.2.0 Jul 16, 2004
- Support added for quadrics and lustre
- Replaced /usr/sbin/collectl.ph with more flexible /etc/collectl.conf
- Added switch to override location of /etc/collectl.conf
- Performance improvements to /proc processing

1.1.13 Jul 16, 2004
- Fixed timer bug (there since version 1) that only shows up on 2.6 kernels
- Fixed formatting bug in NFS detail display introduced by -oT
- Preserve /etc/init.d/collectl so any custom changes are preserved across versions

1.1.12 Jun 07, 2004
- Fixed bug that was causing nfs client data to not be correctly captured
- Moved nfs client option 'C' from -o to -O
- Changed format/use of -M1: more/optional fields
- Made timestamp line formats dDT available for 'standard' output
- Changed column widths for DISK/PARTITIONS fields from 4 to 6 as needed

1.1.11 May 21, 2004
- assured 'wc' used correctly since 2.6 kernels changed format
- renamed kit to 'noarch'

1.1.10 Jul 16, 2004
- Fixed extra space getting printed in plot format by -ss.
- removed -a 0 from init.d file since it would prevent starting on machines that don't have HiRes installed

1.1.9 May 03, 2004
- The biggie - support for 2.6 kernels resulting in changes to -sd & -sp
- Added -a to startup script to align times to minute boundary
- New switches for ministats to control date/time format: -o dDT
- Combined ministats 1 thru 4 to more intelligent -M1
- New ministat M2 mimics vmstat but with date/time stamps
- Clarification of memory statistics in man page
- Minor bug fixes with custom ministat directory name processing

1.1.8e Mar 24, 2004
- Added new subsystems -sE -slL [environmental and lustre]
- Added new output options -otH
- Added 'ministats' which are combined subsystems on single line (see manpage for -M)
- Added ability to send output to a socket for remote monitoring

1.1.7 Feb 06, 2004
- Updated Copyright notice
- Fixed bug when trying to use -P without -f

1.1.6 Dec 05, 2003
- wasn't properly handling -p -f -P for multiple files on same day but different -s values: first one overrides the rest!

1.1.5 Dec 03, 2003
- added error checking to ignore partially read /proc data
- added ability to specify YESTERDAY or TODAY in playback file name

1.1.4 Nov 26, 2003
- bug in playback of nfs client data
- enhance -v to show zlib/compress if present
- fixed bug handling -p -f of multiple files on same date
- fixed bug handling logs from different systems with different subsys values

1.1.3 Nov 26, 2003
- added support for Smart Array devices in partition table reporting

1.1.2 Nov 26, 2003
- added support for disk and partition exception reporting along with several switches to support it.
See -l, -L and -o x/X

COLMUX CHANGES

collectl-4.3.1/vmstat.ph0000775000175000017500000000356613366602004013352 0ustar mjsmjs
# Validate switches and force the cpu and memory subsystems
sub vmstatInit
{
  error("-s not allowed with 'vmstat'")        if $userSubsys ne '';
  error("-f requires either --rawtoo or -P")   if $filename ne '' && !$rawtooFlag && !$plotFlag;
  error("-P or --rawtoo require -f")           if $filename eq '' && ($rawtooFlag || $plotFlag);
  $subsys=$userSubsys='cm';
}

# Print one vmstat-style line for the current interval
sub vmstat
{
  my $line;

  $sameColsFlag=1;
  if (printHeader())
  {
    $line= "${clscr}#${miniBlanks}procs ---------------memory (KB)--------------- --swaps-- -----io---- --system-- ----cpu-----\n";
    $line.="#$miniDateTime r  b   swpd   free   buff  cache  inact active   si   so    bi    bo   in    cs us sy  id wa\n";
    $headersPrinted=1;
  }

  # optional date/time prefix, selected by the d, D, T and m output options
  $datetime='';
  if ($options=~/[dDTm]/)
  {
    ($ss, $mm, $hh, $mday, $mon, $year)=localtime($lastSecs[0]);
    $datetime=sprintf("%02d:%02d:%02d", $hh, $mm, $ss);
    $datetime=sprintf("%02d/%02d %s", $mon+1, $mday, $datetime)                   if $options=~/d/;
    $datetime=sprintf("%04d%02d%02d %s", $year+1900, $mon+1, $mday, $datetime)    if $options=~/D/;
    $datetime.=".$usecs"    if ($options=~/m/);
    $datetime.=" ";
  }

  # currently only happens when called by colmux
  if ($showColFlag)
  {
    printText($line);
    return;
  }

  # index $NumCpus holds the totals across all cpus
  my $i=$NumCpus;
  my $usr=$userP[$i]+$niceP[$i];
  my $sys=$sysP[$i]+$irqP[$i]+$softP[$i]+$stealP[$i];
  $line.=sprintf("%s %2d %2d %6s %6s %6s %6s %6s %6s %4d %4d %5d %5d %4d %5d %2d %2d %3d %2d\n",
        $datetime, $procsRun, $procsBlock,
        cvt($swapUsed,6,1,1), cvt($memFree,6,1,1), cvt($memBuf,6,1,1),
        cvt($memCached,6,1,1), cvt($memInact,6,1,1), cvt($memAct,6,1,1),
        $swapin/$intSecs, $swapout/$intSecs, $pagein/$intSecs, $pageout/$intSecs,
        $intrpt/$intSecs, $ctxt/$intSecs,
        $usr, $sys, $idleP[$i], $waitP[$i]);

  printText($line);
}
1;
collectl-4.3.1/README0000664000175000017500000000131513366602004012346 0ustar mjsmjs
If you're real lazy, just run INSTALL and it will install collectl into the same locations as the rpm.
It will install as /usr/bin/collectl and all the other runtime components will be placed into /usr/share/collectl. If you really care where everything goes, read the script as it's pretty short. There's also an UNINSTALL that will completely remove everything.

If you want to be more creative, you can either hack up the installation script or use it as a guide to move things around to wherever you want them, keeping a couple of things in mind:
- collectl.conf is looked for first in /etc and then in its binary directory
- all ph files must be in the same directory as collectl itself OR /usr/share/collectl
collectl-4.3.1/man1/0000775000175000017500000000000013366602004012322 5ustar mjsmjs
collectl-4.3.1/man1/collectl.10000664000175000017500000013257413366602004014221 0ustar mjsmjs
.TH COLLECTL 1 "APRIL 2003" LOCAL "Collectl" -*- nroff -*-
.SH NAME
collectl - Collects data that describes the current system status.
.SH SYNOPSIS
Record Mode - read data from live system and write to file or display on terminal

.B collectl [\-f file] [options]

Playback Mode \- read data from one or more raw data files and display on terminal

.B collectl \-p file1 [file2 ...] [options]
.SH OPTIONS
Record Mode

In this mode data is taken from a
.BR live
system and either displayed on the terminal or written to one or more files or a socket.

.B "--align"
.RS
If the HiRes module is present,
.BR collectl
sample monitoring will be aligned such that a sample will always be taken at the top of a minute (this does NOT mean the first sample will occur then) so that all instances of collectl running on any systems which have their clocks synchronized will all take samples at the same time. Furthermore, if one is doing process monitoring, those samples will also be taken at the top of the minute and so can delay the start of sampling up to 2 full process monitoring intervals.
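
The arithmetic behind this kind of alignment can be sketched in a few lines of shell. This is only an illustration and not collectl's own code: the function name secs_until_aligned is hypothetical, times are whole seconds, and the interval is assumed to divide 60 evenly so that one of the aligned slots is always the top of a minute.

```shell
#!/bin/sh
# Hypothetical helper, not part of collectl: print how many seconds to wait
# so that the next sample lands on a multiple of the interval counted from
# the top of the current minute. Assumes the interval divides 60 evenly.
secs_until_aligned() {
    now=$1        # current time in whole seconds since the epoch
    interval=$2   # sampling interval in seconds
    past=$(( (now % 60) % interval ))   # seconds past the last aligned slot
    if [ "$past" -eq 0 ]; then
        echo 0
    else
        echo $(( interval - past ))
    fi
}

# 127 seconds is 7 seconds into a minute, so a 10 second interval waits 3
secs_until_aligned 127 10
```

Two hosts with synchronized clocks that both wait this long before their first sample then sample at the same instants, which is the effect the text describes.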
.RE
.B "--all"
.RS
Collect summary data for ALL subsystems except slabs, since slab monitoring requires a different monitoring interval. This also means you won't get any detail data which also includes processes and environmentals. You can use this switch anywhere \-s can be used but not both together. If the system supports lustre and/or interconnect monitoring those statistics will be provided but the warnings produced when they are not available and you try to select them with \-s will not be displayed.
.RE
.B "--ALL"
.RS
This is actually a superset of --all by adding detail statistics as well with the exception of TCP details when displaying to a terminal since those are only available with -P or -f.
.RE
.B "\-A, --address address[:port[:timeout]] | server[:port]"
.RS
In the first form, one specifies an address, optional port and timeout (the first colon is required to specify timeout for default port). All data is then written to that socket prefaced with the current host name at the named address and port until the socket is closed, at which time collectl will exit. In the second form one enters the text "server" and optional port. In this form, collectl runs as a server, waiting for a connection and once established writes data on that socket. The key difference here is if the client exits collectl keeps running and will again look for a new connection, allowing it to survive client restarts or crashes. The default port is set at 2655 but can be changed \- see collectl.conf. In both forms, one can additionally request local data logging by specifying a combination of \-P and \-f. See
.B "man collectl-logging"
for more details.
.RE
.B "\--comment string"
.RS
Add the specified string to the end of the headers in the data files. If it contains any embedded spaces be sure to quote it. This can be very useful when doing characterizations or benchmarking and you're frequently changing system/application parameters and restarting collectl between tests.
.RE
.B "\-C, --config filename"
.RS
Name/location of the collectl configuration file. If not specified,
.BR collectl
searches for
.BR collectl.conf
first in /etc (the default), then in the same directory the
.BR collectl
executable is in, and finally the current working directory.
.RE
.B "\-c, --count Samples"
.RS
The number of samples to record. This is one of 3 ways of describing how long collectl should run (see
.BR \-r
and
.BR \-R ).
Note that these 3 switches are mutually exclusive.
.RE
.B "\-D, --daemon"
.RS
Run
.BR collectl
as a daemon, primarily used when starting as a service. One caveat about this mode is you can only run one copy.
.RE
.B "--export file[,options]"
.RS
This requests that collectl does not print anything on the terminal (or send it to a socket) using the standard brief/verbose/plot formats. Instead it executes a perl "require" on the named file, using an extension of ph if not specified. It first looks in the current directory and if not there the directory the executable is in. It then calls the function "file"Init(options) towards the beginning of collectl and again as simply "file"(@options) to generate the exported formatted output. See the online documentation on Exporting Custom Output and Logging for more details.
.RE
.B "\-f, --filename Filename"
.RS
This is the name of a file to write the output to. For details on how the output files are named, see the
.BR File
.BR Naming
section of the documentation on collectl.sourceforge.net OR /usr/share/doc/collectl/FileNaming.html
.RE
.B \-F, --flush seconds
.RS
Flush output buffers after this number of seconds. This is equivalent to issuing
.B kill \-s USR1
at the same frequency (but a lot easier!). If 0, a flush will occur every data collection interval.
.RE
.B --grep pattern
.RS
The main purpose of this switch is for those users who have discovered there is some data in the raw files that never appears in any display and have taken to displaying it themselves with grep.
Unfortunately this method does not include timestamps and so makes it difficult to interpret the results. Even if you include the timestamp from the file it is in UTC and so needs to be translated to be of any real value. This switch does just that and then some. Specifically, it allows you to play back a file and instead of processing it normally it simply searches for any entries that match the perl pattern and reports those lines prefaced with time stamps. You can optionally change the time format with the usual \-o options and can even select the timeframe with --from and --thru.
.RE
.B --home
.RS
Always start the display for the current interval at the top of the screen also known as the home position (non-plot format only). This generates a real-time, continuously refreshing display when the data fits on a single screen.
.RE
.B --import file1[,options][:file2[,options]...]
.RS
This loads the named files and executes callbacks to them, which is the API mechanism for importing additional metrics into collectl. See the webpage on the API for further detail. Since these files also include instructions for how to report the output in all the various forms, you will also need to include --import during playback. Finally, since the default is to seamlessly include imported data with everything else collectl reports, if you ONLY want to display imported data you must explicitly deselect all other subsystems either by including -s- (note the trailing minus sign) followed by all the subsystems that were recorded OR simply say -s-all.
.RE
.B "\-i, --interval interval[:interval2[:interval3]]"
.RS
This is the sampling interval in seconds. The default is 10 seconds when run as a daemon and 1 second otherwise. The process subsystem and slabs (\-sY and \-sZ) are sampled at the lower rate of
.BR interval2.
Environmentals (\-sE), which only apply to a subset of hardware, are sampled at
.BR interval3.
Both
.BR interval2
and
.BR interval3,
if specified, must be an even multiple of
.BR interval1.
The daemon default is \-i10:60:300 and all other modes are \-i1:60:300. To sample only processes once every 10 seconds use \-i:10.
.RE
.B --nohup
.RS
Whenever collectl finishes a data collection interval, it checks to see if the starting parent has exited. This is to prevent the case in which someone might start a copy of collectl and then the process dies and collectl keeps running. If that is the behavior someone actually intends, they should start collectl with --nohup. NOTE - when running as a daemon, --nohup is implied.
.RE
.B "--quiet"
.RS
Whenever collectl wants to tell the user something, it assigns a category to it such as Informational, Warning, Error or Fatal. When run with \-m, all messages are displayed for the user and if logging data to a file with \-f, these messages are also sent to a log file which is in the data collection directory and has an extension of "log". However, if \-m is not specified Informational messages (such as collectl starting or stopping) are not reported on the terminal but the other 3 are. Sometimes the warnings can be annoying and one can suppress these with --quiet though they will still be written to the message log in \-f. You cannot suppress Error or Fatal errors.
.RE
.B "\-r, --rolllogs time[[,days[:months]][,minutes]]"
.RS
When selected, collectl runs indefinitely (or at least until the system reboots). The maximum number of raw and/or plot files that will be retained (older ones are automatically deleted) is controlled by the
.BR days
field, the default is 7. When -m is also specified to direct collectl to write messages to a log file in the logging directory, the number of months to retain those logs is controlled by the
.BR months
field and its default is 12.
The
.BR increment
field which is also optional (but is position dependent) specifies the duration of an individual collection file in minutes, the default of which is 1440 or 1 day.
.RE
.B "--rawdskfilt"
.RS
This switch overrides the DiskFilter setting in collectl.conf and explicitly defines a perl regular expression against which records from /proc/diskstats are selected for processing. When there are a lot of disks to process, this can be a handy way to reduce the amount of data collected and actually improve performance since there are fewer patterns to match each input record against. Just remember that unlike --dskfilt which only filters during display, records filtered with this switch are never even recorded and so lost forever. You can optionally specify your filter with a leading plus-sign which tells collectl to just add your filter to the default specification. Care should be taken here as longer filters will slightly increase overhead and with a lot of disks and/or shorter monitoring intervals can add up. As a side benefit of this switch, if you really want to look at partition level stats you can do so by leaving off the trailing space in the default pattern. One must also be careful in selecting the correct pattern since it's easy to get it wrong and you may end up collecting the WRONG data! To verify you are collecting what you think you are, make a test run using -d4 to see the raw data being recorded in real-time.
.RE
.B "--rawdskignore"
.RS
This is the opposite of the rawdskfilt switch. When specified any disks listed are completely ignored and will not appear in the raw file. Typically this switch is useful when you're only interested in recording a subset of disk statistics.
.RE
.B "--rawnetfilt"
.RS
This works just like --rawdskfilt except it applies to networks. Unlike disk filtering which has an explicit default pattern, the default for network filtering is to simply record all network data from /proc/net/dev.
The -d4 switch also works here, as well as everywhere, to see the raw data as it is being collected.
.RE
.B "--rawnetignore"
.RS
This is the opposite of the rawnetfilt switch and works just like the rawdskignore switch. When specified any networks listed are ignored and will not appear in the raw file. Typically this switch is useful when you're only interested in recording a subset of network statistics.
.RE
.B "--rawtoo"
.RS
Only available in conjunction with \-P, this switch causes the creation/logging of raw data in addition to plottable data. While this may seem excessive, keep in mind that unlike plottable data, raw data can be played back with different switches potentially providing more details. The overhead to write out this additional data is minimal, the only real cost being that of extra disk space.
.RE
.B "--runas uid[:gid]"
.RS
This switch only works when running in daemon mode and so must be specified in the DaemonCommands line. Its presence will cause collectl to write the collectl.pid file into the same directory as its other output files as specified by -f, since /var/run does not normally grant non-privileged users write access. Furthermore, the ownership of that directory must match the specified ownership since collectl needs to write ALL its files to that directory and can no longer assume global permissions when run as root. This WILL also require manually modifying /etc/init.d/collectl to change the PIDFILE variable to point to the same directory which the -f switch in the DaemonCommands line of collectl.conf points to. As a final note of caution, since this mechanism changes where collectl reads/writes its pid file, once you start using --runas, all calls to run collectl as a daemon must use it or it may be confused and exhibit unpredictable behavior.
.RE
.B "\-R, --runtime duration"
.RS
Specify the duration of data collection where the duration is a number followed by one of
.BR wdhms,
indicating how many weeks, days, hours, minutes or seconds the collection is to be taken for.
.RE
.B "--sep separator"
.RS
Specify the plot format separator \- default is a space. If this is a numeric field it is interpreted as the decimal value of the associated ASCII character code. Otherwise it is interpreted as the character itself. In other words, "--sep :" sets the separator character to a colon and "--sep 9" sets it to a horizontal tab. "--sep 58" would also set it to a colon.
.RE
.B --tworaw
.RS
The switches \-G and --group have been replaced by --tworaw, which is more descriptive of its function. When specified, it tells collectl to treat process and slab data as an entirely separate group of raw files, named with the extension "rawp". These separate files can be played back and processed just like any other collectl raw files and in fact one can even play back both at the same time if that is what is desired. The only real purpose of this switch is that on some systems with many processes, it is possible to generate huge raw files (some have been observed to be >250MB!) and while collectl will happily play back/process these files it can take a long time. By using the --tworaw switch one still gets a huge rawp file, but the normal raw file is a much more manageable size and as a result will be faster to process than when all data is combined into the same file.
.RE
Playback Mode

In this mode, data is read from one or more data files that were generated in Record Mode

.B "--export Filename"
.RS
When playing back a file, use this switch to create an identical raw file differing only in the timeframe being covered, so naturally one must also include --from, --thru or both. Further, since the resultant file will contain the exact same raw data you cannot select a subset using \-s.
This switch is actually intended for a support function for situations where someone is having problems playing back a file and a subset of the original raw file that covers the problem time has been requested, hopefully allowing a significantly smaller file to be posted or emailed.
.RE
.B "--extract filename"
.RS
If specified, rather than actually play back the file specified with \-p, ALL raw data between the date ranges is selected and a subset of that raw file created. The rules for how to interpret the filename are the same as used for \-f.
.RE
.B "\-f, --filename filename"
.RS
If specified, this is the name of a file or directory to write the output to (rather than the terminal). See the description for details on the format of this field. This requires the \-P flag as well.
.RE
.B "--from time range"
.RS
Play back data starting with this time, which may optionally include the ending time as well, which is of the format of [date:]time[-[date:]time]. The leading 0 of the hour is optional and if the seconds field is not specified it is assumed to be 0. If no dates are specified the time(s) apply to each file specified by \-p. Otherwise the time(s) only apply to the first/last dates and any files between those dates will have all their data reported.
.RE
.B "--full"
.RS
Full mode is actually a superset of --verbose and if selected will force --verbose. It will also force the RECORD separator to be printed for every interval even if only a single subsystem was requested and to include the actual subsystems that follow after the UTC timestamp as a parsing aid for those who may wish to parse the text output rather than the plot data.
.RE
.B "--offsettime seconds"
.RS
This field originally was used before collectl reported the timezone in the file headers and allowed one to compensate. Since then it is rarely needed except in two possible cases, one in which data on two systems is to be compared and they weren't synchronized with ntp.
This allows all the times to be reported as shifted by some number of seconds. The other case (and this is very rare) is when a clock had changed in the middle of a sample and will not be converted correctly. When this happens one may have to play back the samples in pieces and manually set the time offset.
.RE
.B "--passwd filename"
.RS
When reporting usernames associated with a UID, use this file for the mapping. This is particularly important on systems running NIS where there are no user names in /etc/passwd.
.RE
.B "\-p, --playback Filename"
.RS
Read data from the specified
.BR playback
file(s), noting that one can use wildcards in the filename if quoted (if playing back multiple files to the terminal you probably want to include \-m to see the filenames as they are processed). The filename must either end in
.BR raw
or
.BR raw.gz.
As an added feature, since people sometimes automate the running of this option and don't want to hard code a date, you can specify the string YESTERDAY or TODAY and they will be replaced in the filename string by the appropriate date.
.RE
.RE
.B "--pname name"
.RS
By default, collectl uses the file /var/run/collectl.pid to indicate the pid of the running instance of collectl and prevent multiple copies from being run. If you DO want to run a second copy, this switch will cause collectl to change its process name to collectl-name and use that name as the associated pid file as well.
.RE
.B --procanalyze
.RS
When specified and there is process data in the raw file, a summary file will be generated with one entry per unique process containing such things as the total cpu consumed for both user and system, min/max utilization of various memory types, total page faults and several others.
.RE
.B --slabanalyze
.RS
When specified and there is slab data in the raw file, a summary file will be generated with one entry per unique slab containing data on physical memory usage by that slab.
.RE
.B "--thru time"
.RS
Time thru which to play back a raw file. See --from for more details.
.RE
Common Switches \- both record and playback modes
.RE
.B "\-d, --debug debug"
.RS
Control the level of debugging information, not typically used. For details see the source code.
.RE
.B \-h, --help, \-x, --helpext, \-X, --helpall
.RS
Display standard, extended help message (which doesn't include the optional displays such as --showoptions, --showsubsys, --showsubopts, --showtopopts) or everything.
.RE
.B --hr, --headerrepeat num
.RS
Sets the number of intervals to display data for before repeating the header. A value of \-1 will prevent any headers from being displayed and a value of 0 will cause only a single header to be displayed and never repeated.
.RE
.B --iosize
.RS
In brief mode, include iosize with disk, infiniband and network data.
.RE
.B \-l, --limits limit
.RS
Override one or more default exception limits. If more than one limit they must be separated by hyphens. Current values are:
.B SVC:value
.RS
Report partition activity with Service times >= 30 msec
.RE
.B IOS:value
.RS
Report device activity with 10 or more reads or writes per second
.RE
.B LusKBS:value
.RS
Report client or OSS activity greater than limit. Only applies to Client Summary or OSS Detail reporting. [default=100000]
.RE
.B LusReints:value
.RS
Report MDS activity with Reint greater than limit. Only applies to MDS Summary reporting. [default=1000]
.RE
.B AND
.RS
Both the IOS and SVC limits must be reached before a device is reported. This is the default value and is only included for completeness.
.RE
.B OR
.RS
Report device activity if either IOS or SVC thresholds are reached.
.RE
.B \-L, --lustsvcs [c|m|o][:seconds]
.RS
This switch limits which services lustre checks for and the frequency of those checks. For more information see the man page collectl\-lustre.
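
The way the AND and OR keywords combine the SVC and IOS exception limits described under \-l can be sketched as a small shell function. This is an illustration only, not collectl's code: the name is_exception is hypothetical, the thresholds are the defaults quoted in the text (30 msec and 10 per second), values are whole numbers, and treating "reads or writes" as either counter reaching the limit is an assumption.

```shell
#!/bin/sh
# Hypothetical helper, not part of collectl: succeed (exit 0) when a sample
# should be reported as an exception.
# Arguments: service time in msec, reads/sec, writes/sec, mode (AND or OR).
is_exception() {
    svc=$1; reads=$2; writes=$3; mode=${4:-AND}   # AND is the documented default
    svc_hit=0
    ios_hit=0
    if [ "$svc" -ge 30 ]; then svc_hit=1; fi
    # assumption: either reads or writes reaching 10/sec satisfies IOS
    if [ "$reads" -ge 10 ] || [ "$writes" -ge 10 ]; then ios_hit=1; fi
    if [ "$mode" = "OR" ]; then
        [ $(( svc_hit + ios_hit )) -ge 1 ]
    else
        [ $(( svc_hit + ios_hit )) -eq 2 ]
    fi
}

# a slow but nearly idle device: reported with OR, not with the default AND
if is_exception 45 3 2 OR;  then echo "OR: report";  else echo "OR: skip";  fi
if is_exception 45 3 2 AND; then echo "AND: report"; else echo "AND: skip"; fi
```

With the default AND a device has to be both busy and slow before it shows up, which keeps idle devices with an occasional long service time out of the exception report.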
.RE .RE .B \-m, --messages .RS Write status to a monthly log file in the same directory as the output file (requires \-f to be specified as well). The name of the file will be .BR collectl\-yyyymm.log and will track various messages that may get generated during every run of .BR collectl. .RE .B \-N, --nice .RS Set priority to a .BR nicer one of 10. .RE .B "\-o, --options Options" .RS These apply to the way output is displayed OR written to a plot file. They do not affect the way data is selected for recording. Most of these switches work in both record as well as playback mode. If you're not sure, just try it. .B 1 .RS Data in plotting format should use 1 decimal place of precision as appropriate. .RE .B 2 .RS Data in plotting format should use 2 decimal places of precision as appropriate. .RE .B a .RS Always append data to an existing plot file. By default if a plot file exists, the playback file will be skipped as a way of assuring it is associated with a single recorded file. This switch overrides that mechanism allowing multiple recorded files to be processed and written to a single plot file. .RE .B c .RS Always open newly named plot files in .BR create mode, overwriting any old ones that may already exist. If one processes multiple files for the same day in .BR append mode multiple times, the same data will be appended to the same file multiple times. This assures a new file is created at the start of the processing. .RE .B d .RS For use with terminal output and brief mode. Precede each line with a date/time stamp, the date being in mm/dd format. This option can also be applied to plot format, which will cause the date portion to also be displayed in this format as opposed to D format. .RE .B D .RS For use with terminal output and brief mode. Precede each line with a date/time stamp, the date being in yyyymmdd format. .RE .B g .RS For use with terminal output and brief mode.
When displaying values of 1G or greater there is limited precision for 1 digit values. This option provides a way to display additional digits for more granularity by substituting a "g" for the decimal point rather than the trailing "G". .RE .B G .RS For use with terminal output and brief mode. This is similar to "g" but preserves the trailing "G" by sacrificing a digit of granularity. .RE .B m .RS Whenever times are reported in plot format, in the normal terminal reporting format at the beginning of each interval or when one of the time reporting options (d, D, T or U) is selected, append the milliseconds to the time. .RE .B n .RS Where appropriate, data such as disk KBs or transfers are normalized to units per second by taking the change in a counter and dividing by the number of seconds in that interval. In the case of CPUs, utilization (calculated in jiffies) is normalized as a percentage of the interval. Normalization can be disabled via this option, the result being the reported values are not divided by the duration of the interval. This can be particularly useful for reporting values that are < 1/2 the sampling, which will be rounded to 0. .RE .B T .RS For use with terminal output and brief mode, precedes each line with a time stamp. .RE .B u .RS Create plot files with unique names by including the starting time of a collection in the name. This forces multiple collections taken the same day to be written to multiple files. .RE .B "\-U or --utc" .RS In plot format only, report timestamps in Coordinated Universal Time, which is more commonly known as UTC. .RE .B x .RS Report only exception records for selected subsystems. Exception reporting also requires --verbose. Currently this only applies to disk detail and Lustre server information so one must select at least -s D, l or L for this to apply. If writing to a detail file, this data will go into a separate file with the extension .BR X appended to the regular detail file name.
.RE .B X .RS Report both exceptions as well as all details for selected subsystems, for -s D, l or L only. .RE .B z .RS If the compression library has been installed, all output files will be compressed by default. This switch tells collectl not to compress any plottable files. If collectl tries to compress but cannot because the library hasn't been installed, it will generate a warning which can be suppressed with this switch. .RE .RE .RE .B \-P, --plot .RS Generate output in plot format. This format is space separated data which consists of a header (prefaced with a # for easy identification by an analysis program as well as identifying it as a comment for programs, such as gnuplot, which honor that convention). When written to disk, which is the typical way this option is used, .BR summary data elements are written to the .BR tab file and the .BR detail elements written to one or more files, one per detail subsystem. If \-f is not specified, all output is sent to the terminal. Output is always one line per sampling interval. .RE .B "--stats" .RS This switch will cause brief data to be reported as both totals and averages after processing one or more files for the same day or in playback mode. .RE .B "--statopts option(s)" .RS This switch controls the way brief stats are reported, the default is to report the totals once, at the end of a day's worth of raw files, if more than one. .br a \- include averages along with totals .br i \- include the interval data itself, which is the equivalent of -oA .br s \- print summary stats at the end of each file processed even if more than one per day .RE .B "\-s, --subsys subsystem" .RS This field controls which subsystem data is to be collected or played back. The default for collecting data is "cdn", which stands for CPU, Disk and Network summary data and the default for playback is everything that was collected. The rules for displaying results vary depending on the type of data selected.
If you write data for CPUs and DISKs to a raw file and play it back with \-sc, you will only see CPU data. If you play it back with \-scm you will still only see CPU data since memory data was not collected. However, when used with \-P, collectl will always honor the subsystems specified with this switch so in the previous example you will see CPU data plus memory data of all 0s. To see the current set of default subsystems, which are a subset of this full list, use \-h. You can also use + or \- to add or subtract subsystems to/from the default values. For example, "\-s\-cdn+N" will remove cpu, disk and network monitoring from the defaults while adding network detail. Refer to data definitions on the sourceforge website OR in /usr/share/collectl/doc/collectl\-xxx to see complete descriptions of the data returned. SUMMARY SUBSYSTEMS .br b \- buddy info (memory fragmentation) .br c \- CPU .br d \- Disk .br f \- NFS V3 Data .br i \- Inode and File System .br j \- Interrupts .br l \- Lustre .br m \- Memory .br n \- Networks .br s \- Sockets .br t \- TCP .br x \- Interconnect .br y \- Slabs (system object caches) DETAIL SUBSYSTEMS This is the set of .BR detail data from which in most cases the corresponding summary data is derived. There are currently 2 types that do not have corresponding summary data and those are "Environmental" and "Process". So, if one has 3 disks and chooses .B \-sd, one will only see a single total taken across all 3 disks. If one chooses .B \-sD, individual disk totals will be reported but no totals. Choosing .B \-sdD will get you both.
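To make the summary/detail distinction above concrete, here is a rough sketch in Python. It is an illustration only and not collectl's own code: the function name, the sample values and the order of the rows are all assumptions made for this example.
.nf
```python
def disk_report(per_disk, subsys):
    """Return (label, value) rows for one disk counter.

    per_disk: dict mapping a disk name to its value for the interval.
    subsys:   the -s string; only 'd' (summary) and 'D' (detail) are examined.
    """
    rows = []
    if "d" in subsys:
        # Summary: a single total taken across all disks.
        rows.append(("total", sum(per_disk.values())))
    if "D" in subsys:
        # Detail: one row per disk and no total.
        rows.extend(sorted(per_disk.items()))
    return rows


disks = {"sda": 10, "sdb": 20, "sdc": 30}  # hypothetical sample values
print(disk_report(disks, "d"))    # one total row
print(disk_report(disks, "D"))    # three per-disk rows
print(disk_report(disks, "dD"))   # the total followed by the per-disk rows
```
.fi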
.br C \- CPU .br D \- Disk .br E \- Environmental data (fan, power, temp), via ipmitool .br F \- NFS Data .br J \- Interrupts .br L \- Lustre OST detail OR client Filesystem detail .br M \- Memory node data, which is also known as numa data .br N \- Networks .br T \- 65 TCP counters only available in plot format .br X \- Interconnect .br Y \- Slabs (system object caches) .br Z \- Processes .RE .B --showheader .RS In collectl mode this command will cause the header that is normally written to a data file to be displayed on the terminal and collectl then exits. This can be a handy way to get a brief overview of the system configuration. .RE .B --showoptions .RS This command shows only the portion of the help text that describes the \-o and --options switches to save the time of wading through the entire help screen. .RE .B --showcolheaders .RS This command shows the first set of headers that will be printed by collectl and exits. Doesn't really make sense for multi-section output like several sets of verbose or detail data. Also note that since it requires one monitoring interval to build up some headers which may be dynamic, it also forces the interval to 0. .RE .B --showsubopts .RS List all the subsystem specific options .RE .B --showtopopts .RS Show all the different values for the --top type field, which specify the field(s) by which to sort the data .RE .B --showrootslabs .RS This command only works on systems using the new slab allocator and will list the root name (these are those entries in /sys/slab which are not soft links) along with all its alias names. If a name doesn't have an alias, it will not appear in this report. .RE .B --showslabaliases .RS This command only works on systems using the new slab allocator. Like --showrootslabs, it will name a slab and all its aliases but rather than show the root slab name it will show one of the aliases to provide a more meaningful name.
If there are any slabs that only have a single (or no) alias they will not be included in this report. .RE .B --showsubopts .RS Similar to --showoptions, this command summarizes just the parameters associated with \-O and --subopts. .RE .B --showsubsys .RS Yet another way to summarize a portion of the help text, this command only shows valid subsystems. .RE .B "--top [type][,num[,v]]" .RS Include the top "num" consumers by resource for this interval. The default number is the height of the window if it can be determined, otherwise 24, and the default resource is the total cpu time which is taken as the sum of SysT and UsrT. See --showtopopts for a list of other types of data you can sort on. This switch can also be used with \-s in which case a portion of the window is reserved at the top to fill in the subsystem data, which is currently in verbose mode though a brief format is contemplated for some time in the future. In interactive mode and if not specified, the process monitoring interval will be set to that for other subsystems. The screen will be cleared for each interval resulting in a display similar to the "top" utility. In playback mode the screen will NOT be cleared. You cannot use this switch in "record" mode. Finally, if v is specified as the 3rd parameter, the output scrolls vertically (like playback mode) rather than clearing the screen between intervals. .RE .B "--umask mask" .RS Sets collectl's umask to control output file permissions. Only root can set the umask. See "man umask" for details. .RE .B "--utime mask" .RS Write periodic micro-timestamps into raw file at different points in time for fine grained measurements of operation times. .br 1 \- write timestamps when entering major sections .br 2 \- write timestamps for all /proc accesses except for process data .br 4 \- write timestamps for /proc data for all processes including threads .RE .B \-v .RS Show version and whether or not Compression and/or HiResTime modules have been installed and exit.
.RE .B \-V .RS Show default parameter and control settings, all of which can be changed in /etc/collectl.conf .RE .B --verbose .RS Display output in verbose mode. This often displays more data than in the default mode. When displaying detail data, verbose mode is forced. Furthermore, if summary data for a single subsystem is to be displayed in verbose mode, the headers are only repeated occasionally whereas if multiple subsystems are involved each needs their own header. .RE .B \-w .RS Display data in .BR wide mode. When displaying data on the terminal, some data is formatted followed by a K, M or G as appropriate. Selecting this switch will cause the full field to be displayed. Note that there is no attempt to align data with the column headings in this mode. .RE .SH SUBSYSTEM OPTIONS The following options are subsystem specific and typically filter data for collection and/or display as well as affect the output format: .B "--cpufilt [^]perl-regx[,perl-regx...]" .RS .br Works the same as dskfilt and netfilt, allows one to select a subset of CPUs. These filters are also honored by interrupt reporting. .RE .B "--cpuopts" .RS .br z \- only applies to cpu details, do not report any CPUs with no load. In other words all entries are zero except for IDLE. .RE .B "--dskfilt [^]perl-regx[,perl-regx...]" .RS NOTE \- this does NOT affect data collection and ALL disk data will always be collected, unless --rawdskfilt is specified too. However, only data for disk names that match the pattern(s) will be included in the summary totals and displayed when details are requested. Alternatively, if you preface the first expression with a caret, all names that match all strings will be excluded from the summary totals and detail displays rather than included. If you don't know perl, a partial string will usually work too.
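The include/exclude behavior described above can be sketched as follows. This is a hedged illustration and not collectl's implementation: Python's re module stands in for perl expressions, the function name is made up, and reading "match all strings" as "match any of the listed patterns" is an assumption.
.nf
```python
import re


def name_selected(name, filt):
    """Return True if a device name passes a [^]regx[,regx...] style filter.

    A caret in front of the first expression turns the whole list into an
    exclusion list (assumed here to mean: drop a name matching ANY pattern).
    """
    exclude = filt.startswith("^")
    patterns = (filt[1:] if exclude else filt).split(",")
    matched = any(re.search(p, name) for p in patterns)
    return not matched if exclude else matched


names = ["sda", "sdb", "dm-0", "hda"]   # hypothetical device names
print([n for n in names if name_selected(n, "sd")])
print([n for n in names if name_selected(n, "^sd,hd")])
```
.fi
A partial string such as "sd" works because an unanchored search matches anywhere in the name.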
Just remember, this only applies to collected data and so if for example you specify a partition, such as sda1, you'll never see the data since it was filtered out at the time of data collection. To see those stats you would need to say --rawdskfilt sda1. .RE .B "--dskopts" .RS .br f \- report some columns as fractions for more precision on detail output .br i \- display the i/o sizes in brief mode just like with --iosize .br o \- exclude unused disks from new file headers and plot data .br z \- only applies to disk details, do not report any lines with values of all zeros. .RE .B "--dskremap aaa:bbb,ccc:ddd..." .RS This will cause disk names matching the perl pattern aaa to be replaced with the string bbb. In some cases, you may simply want to remove the entire string in which case the second string should be left empty. If you want to remove a string containing a /, be sure to escape it with a backslash. .RE .B "--envopts Environmental Options" .RS The default is to display ALL data but the following will cause a subset to be displayed .br f \- display fan data .br p \- display current (power) data .br t \- display temperature data .br C \- convert temperature to Celsius if in Fahrenheit .br F \- convert temperature to Fahrenheit if in Celsius .br M \- display each type of data on separate line .br T \- display data truncated to whole integers (some implementations displayed them with fractional components) .br 9 \- any number, will tell ipmitool to read on this device number .RE .B "--envfilt regx" .RS If specified, this regx is evaluated against each line of data returned by ipmitool and only those that match are retained. All other data is lost. .RE .B "--envremap perl-regx,..." .RS If specified as a comma separated list of perl regular substitution expressions without the =~s portion, each expression is applied to each environmental field name, thereby allowing one to rename the column headers.
This can be most useful when running on heterogeneous systems and you want consistent column names. .RE .B "--intfilt [^]perl-regx[,perl-regx...]" .RS NOTE \- this does NOT affect data collection, ALL interrupt data will always be collected. However, only data for interrupts that match the pattern(s) will be included in the summary totals and displayed when details are requested. Alternatively, if you preface the first expression with a caret, all names that match all strings will be excluded from the summary totals and detail displays rather than included. If you don't know perl, a partial string will usually work too. NOTE - these expressions are applied to the entire line one sees in /proc/interrupts, including the interrupt number, name and even counters so if you do want to include an interrupt number in the pattern be sure to include the trailing colon as well. .RE .B "--lustopts Lustre Options" .RS .br B \- For clients and servers, show buffer stats .br D \- For MDSs and OSTs AND running earlier versions of HPSFS, collect disk block iostats .br M \- For clients, collect metadata .br O \- For OSTs, show detail level stats .br R \- For client, collect readahead stats .RE .B "--memopts Memory Options" .RS R \- show memory values (including swap space) as rates of change as opposed to absolute values. One can also show absolute changes between intervals by including \-on. .RE .B "--netfilt [^]perl-regx[,perl-regx...]" .RS NOTE \- this does NOT affect data collection and ALL network data will always be collected, unless --rawnetfilt is specified too. Also note that by default only eth, ib, em and p1p networks when present are included in the summary. When this switch is specified, only data for network names that match the pattern(s) will be included in the summary and displayed when details are requested. This switch therefore also gives you the ability to add other, possibly new, network devices to the summary totals.
Alternatively, if you preface the first expression with a caret, all names that match all strings will be excluded from the summary totals and detail displays rather than included. If you don't know perl, a partial string will usually work too. .RE .B "--netopts" .RS e \- include network error counts in brief and explicit error types elsewhere .br E \- only include lines with network errors in them .br i \- include i/o sizes in brief mode .br o \- exclude unused networks from new file headers and plot data .br w \- set width of network device name .RE .B "--nfsfilt NFS Filters" .RS Specify one or more comma separated filters as a C/S followed by an nfs version number and only those will have data reported on. For example, C2 says to report data on V2 Clients. As a data collection performance optimization, if one or more client filters are specified, data will actually be collected for all clients as is also done for servers. .RE .B "--nfsopts NFS Options" .RS z \- only display detail lines which have data .RE .B "--procfilt Process Filters" .RS These filters restrict which processes are selected for collection/display. Using this filter will significantly reduce the load on process data collection since collectl creates a blacklist of those existing processes that do not pass the filter and so are permanently excluded from any future processing. The format of a filter is a one character type followed by a match string. Multiple filters may be specified if separated by commas. .br c \- substring of the command being executed as explicitly read from /proc/pid/stat. Note that this can actually be a perl expression, so if you want a command that ends in a particular string all you need to do is append a \$ to the end of the string. Otherwise it would match any commands containing that string. .br C \- any command that starts with the specified string .br f \- full path of the command, including arguments, as read from /proc/pid/cmdline.
Like the c modifier this too can be a perl expression. .br p \- pid .br P \- parent pid .br u \- any process owned by this user's UID or in the range specified by uxxx\-yyy .br U \- any process owned by this username .B "caution:" the process names collectl tries to match with c and C is the second field in /proc/pid/stat which may not necessarily be what you think! eg the name for X emacs is actually emacs-x .RE .B --procopts options .RS These options control the way data is displayed and can also improve data collection performance .br c \- include CPU time of children who have exited (same as ps \-S) .br f \- use cumulative totals for page faults in process data instead of rates .br i \- show process I/O counters in display instead of default format .br I \- disable collection of I/O counters, see note below .br k \- remove known shells from process names, making it possible to see actual command .br m \- show breakdown of memory utilization instead of default format .br p \- never look for new pids or threads during data collection .br r \- show root command name only (no directory) for narrower display.
Note that this is applied AFTER 'k' so if arg1 becomes the new command it will be truncated now, which is very handy when running in a virtual python environment .br R \- show ALL process priorities ('RT' currently displayed if realtime) .br s \- show process start time in hh:mm:ss format .br S \- show process start time in mmmdd-hh:mm:ss format .br t \- include ALL process threads (increases collection overhead) .br u \- report username as 12 chars instead of 8, noting uxx will cause column width to be xx but cannot be less than 8 .br w \- widen display by including whole argument string, with optional max width .br x \- include extended process attributes (currently only for context switches) .br z \- exclude any processes with 0 in sort field (in --top mode) Process data is the most expensive type of data collected, costing as much as 3 times the CPU load as all other types of data combined. Collecting thread data makes this even more expensive. One can significantly reduce this load by over 25 percent by disabling the collection of I/O stats. However, keep in mind that even if you don't try to optimize process data collection, the overall system load by collectl can still be on the order of about 0.2% when running as a daemon with default collection rates. See the online documentation on measuring performance for more information. A security hole was identified that allowed non-privileged users to read /proc/pid/io and guess password lengths and now many distros restrict access to the owner or root. As a result, non-privileged users will see all 0 I/O counts for processes that are not theirs when specifying --procopts i. .RE .RE .B "--slabfilt Slab Filters" .RS One can specify a list of slab names separated by commas and only those slabs whose names start with those strings will be listed or summarized.
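The prefix matching just described amounts to something like the sketch below. It is only an illustration in Python, not collectl's code; the function name and the sample slab names are assumptions.
.nf
```python
def slab_selected(name, slabfilt):
    """Return True if the slab name starts with any entry of a comma separated filter."""
    prefixes = [p for p in slabfilt.split(",") if p]
    return any(name.startswith(p) for p in prefixes)


# Hypothetical slab names, used only to show which ones survive the filter.
names = ["dentry", "ext4_inode_cache", "kmalloc-64", "kmalloc-128"]
print([n for n in names if slab_selected(n, "kmalloc,dentry")])
```
.fi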
.RE .B "--slabopts Slab Options" .RS .B "s \- exclude any slabs with an allocation of 0" .br .B "S \- only show those slabs whose allocations changed since last display" .RE .B "--tcpfilt" .RS These filters actually control both what is collected as well as displayed. If one selects non-collected filters, 0s will be reported. There is one special case and that is if one includes T (tcp extended stats) in the filter string, there are no brief ones and therefore --verbose will be forced. .br .B "i \- ip stats" .br .B "t \- tcp stats" .br .B "u \- udp stats" .br .B "c \- icmp stats" .br .B "I \- ip extended stats" .br .B "T \- tcp extended stats" .RE .RE .B "--xopts" .RS i \- include i/o sizes in brief mode .RE .SH DESCRIPTION The .BR collectl utility is a system monitoring tool that records or displays specific operating system data for one or more sets of subsystems. Any set of the subsystems, such as CPU, Disks, Memory or Sockets can be included in or excluded from data collection. Data can either be displayed back to the terminal, or stored in either a compressed or uncompressed data file. The data files themselves can either be in .BR raw format (essentially a direct copy from the associated /proc structures) or in a space separated .BR plottable format such that it can be easily plotted using tools such as gnuplot or excel. Data files can be read and manipulated from the command line, or through use of command scripts. Upon startup, .BR collectl.conf is read, which sets a number of default parameters and switch values. Collectl searches for this file first in /etc, then in the directory the collectl executable lives in (typically /usr/sbin) and finally the current directory. These locations can be overridden with the .BR \-C switch.
Unless you're doing something really special, this file need never be touched, the only exception perhaps being when choosing to run collectl as a service and you wish to change its default behavior which is set by the DaemonCommands entry. .SH RESTRICTIONS/PROBLEMS Thread reporting currently only works with 2.6 kernels. The pagesize has been hardcoded for perl 5.6 systems to 4096 for IA32 and 16384 for all others. If you are running 5.6 on a system with a different pagesize you will see incorrect SLAB allocation sizes and will need to scale the numbers you're seeing accordingly. I have recently discovered there is a bug in /proc in that an extra line is occasionally read with the end of the previous buffer! When this occurs a message is written (if \-m enabled) and always written to the terminal. Since this happens with a higher frequency with process data I silently ignore those as the output can get pretty noisy. If for any reason this is a problem, be sure to let me know. Since collectl has no control over the frequency at which data gets written to /proc, one can get anomalous statistics as collectl is only reporting a snapshot of what is being recorded. For more information see http://collectl.sourceforge.net/TheMath.html. At least one network card occasionally generates erroneous network stats and to try to keep the data rational, collectl tries to detect this and when it does generates a message that bogus data has been detected. .SH FILES, EXAMPLES AND MORE INFORMATION http://collectl.sourceforge.net OR /opt/hp/collectl/docs .SH ACKNOWLEDGEMENTS I would like to thank Rob Urban for his creation of the Tru64 Unix collect tool, which collectl is based on. .SH AUTHOR This program was written by Mark Seger (mjseger@gmail.com).
.br Copyright 2003-2016 Hewlett-Packard Development Company, LP .br collectl may be copied only under the terms of either the Artistic License or the GNU General Public License, which may be found in the source kit collectl-4.3.1/man1/colmux.10000664000175000017500000005563713366602004013733 0ustar mjsmjs.TH COLMUX 1 "DECEMBER 2010" LOCAL "colmux" -*- nroff -*- .SH NAME .B colmux - multiplex communications to multiple systems running collectl from a single system .SH SYNOPSIS colmux [-command "collectl-switches... [-p filespec]]" [-address addr1[,addr2,...]|-addr filename] [-cols col1[,col2...]] | [-column num] .SH DESCRIPTION This utility gathers up data generated by collectl from multiple systems and multiplexes it into a single consolidated format. It runs in essentially 2 distinct modes, the first is known as real-time, because data is retrieved and displayed in real time. The second is playback mode because data is played back from existing collectl data files. There are also 2 general formats for the data being displayed. The first is a multi-line display in which the data is displayed in the native form that collectl displays it, except it is sorted by a distinct column, essentially allowing one to see the TOP producers of that data. The second format is a single line display in which one or more distinct data elements from each source is displayed on the same line. This latter format is never sorted, but rather positionally organized by the name of the system that generated it. Collectl will then be executed, using any optional switches specified by -command, on each of the systems specified by -address OR read those addresses from a file if the target of that switch is a filename rather than a list of hosts OR on the local system if -address is not specified. See collectl for details of the various switches. In some cases certain collectl switches will not make sense in a colmux environment and if chosen will generate an error.
Further, if hosts are specified with -address, they should be individual addresses or hostnames separated by commas. In turn, any of them can be in what those familiar with pdsh would recognize as -w format. Colmux will then execute the collectl command, gather the results from all sources for a particular interval and display them one result per line, sorted by the specified column OR all on the same line in groups specified by -cols. The number of lines displayed is set to the size of the terminal window by default, but can be changed using -lines. The one exception is the use of -nosort which only applies to the playback of existing collectl raw files. In this mode all records for a particular interval will be displayed and the sorting bypassed, making this a speedy and convenient mechanism for gathering all data from all systems in one place for potential further processing. Colmux will never modify the size of the terminal window so to see more or wider lines either expand the window or override the number of display lines and run it again. If the number of display lines is set greater than the terminal height or 0, colmux will no longer overlay the previous window and simply run in a continuous scrolling mode. Common Switches .B "-address list|pdsh|filename" .RS Specify any combination of addresses as hostnames OR in pdsh -w format OR a filename containing a list of hostnames/addresses, 1 per line. You MUST have passwordless ssh access to these nodes. If a different username is required, be sure to specify addresses in username@host format noting you do not have to have the same username on each host. If specified, these usernames will override those specified with the -username switch. rsh access is not supported. .RE .B "-command switches" .RS One can specify virtually any collectl command here, both in real-time or playback mode.
Some switches may only be used during one mode or the other and colmux will usually let you know if you specify an invalid combination or an otherwise restricted switch. Only those directly affecting colmux are listed below: .B "--from, --thru" .RS Limit the timeframe for data being played back, noting you can include both the from and thru times with the --from switch if you separate them with a hyphen. .RE .B "-o time-format" .RS This is a "magic" switch in that it not only tells collectl how to display dates/times (no other options are permitted using -o other than those from the set [dDTm]), it also tells colmux how to display dates/times too. In single line mode, the timestamp will either come from the host system in real-time mode OR the first host when run in playback mode. This is the most common use/need for this switch. But be careful in choosing column numbers with -cols as the position of the data shifts by 1 when time is included and by 2 if date and time are. Using -test will correctly show the shifted positions but only if you include -o with the command at the same time you use -test. In real-time/top mode this switch is not allowed since colmux simply reports the current time of the system it is running on. When playing back multi-line formatted data from one or more files, a timestamp for each interval is reported, consisting of the time of that interval. When this switch is included, each line will be tagged with an appropriate timestamp since on rare occasions they may not necessarily all be identical. .RE .B "-p playback-file" .RS This switch tells colmux to run in playback mode. The filename should include the directory location and is usually specified with wild cards, limiting the selected file(s) to a specific date. When those files are on the same host (-address is not specified), they may be for multiple hosts, but when the files are on remote hosts they must all be for that unique host.
If the file specification includes the string TODAY or YESTERDAY they will be replaced with *yyyymmdd* for that date. .RE .B "-P" .RS Run collectl in plot-format. This allows one to specify just about any combination of subsystems since all data is always displayed on a single line. However, due to the lack of formatting, this also makes no sense for multi-line displays and is therefore only supported in single-line format. .RE .RE .B "-help" .RS Show a brief help message and exit. .RE .B "-hostwidth n" .RS By default, colmux sets the hostwidth to 8, unless it sees something wider and for most situations this is sufficient. However, if one specifies hostnames that are aliases of the longer hostname, colmux has no way of knowing the real hostlengths until after it starts receiving data from collectl and the formatting will be off if the hostnames are longer than the default. To overcome this problem, use this switch to force the hostname to be wider. .RE .B "-lines" .RS Change the number of lines that are displayed for each interval in multi-line mode. The default will be determined by the terminal size returned by the linux resize command if present. If that command is not present, the size will be initially set to 24. If -lines is greater than the terminal size or 0, top-like behavior will not be used when in real-time mode. Single-line format controls the number of lines displayed between headers. A value of 0 will only display the header one time. .RE .B "-noescape" .RS Colmux uses brute-force screen formatting, that is it generates its own VT100 escape sequences to clear lines and/or move the cursor. On some occasions you may want to disable these sequences if you wish to recode the output and do your own post-processing of it. This switch will do just that. .RE .B "-port" .RS Sometimes a remote version of collectl is already using the default socket. This allows one to start another instance and override that value.
.RE .B "-test" .RS This tells colmux to execute the specified collectl command either locally or on the first remote system specified by -address, print the associated header with the selected column(s) highlighted and also include each column name along with its ordinal number, making it fairly easy to make sure you've selected the right column(s). .RE .B "-username name" .RS Use this username for ALL ssh commands. It can be overridden for specific hosts by specifying them with the -address switch with the desired hostnames. .RE .B "-version" .RS Display the version and exit. It will also report if Term::ReadKey is installed and if so what its version number is. .RE Playback Mode Specific The following additional switches only apply to playback mode. There are no real-time mode specific switches. .B "-delay seconds" .RS Introduce a delay between intervals in seconds. You can specify fractional values. Not using this switch will cause the output to be displayed as fast as it can be rendered. .RE .B "-home" .RS Move the cursor to the home position (upper left-hand corner) of the display to use a top-like display format. This ONLY applies to multi-line mode when in playback mode and provides a mechanism for displaying recorded data in a top-like fashion. .RE .B "-hostfilter addr[,addr]" .RS When playing back files for multiple hosts on the local system, sometimes you do not want to play back ALL the host files. This filter allows you to specify only those hosts which you want to process. The format of the list of addresses is specified in the same way as -address except that you cannot specify a filename. .RE .B "-nosort" .RS Intended primarily for output that would be redirected to a file, do not sort or include any escape sequences in the output. 
.RE

Multi-Line Format
.RS
When there is more output than will fit on the screen, colmux includes the text:
.RS
Displaying: lines xx thru yy out of zz
.RE
on the right side of the top line of the display, where xx is typically 1.
However, once colmux is running, one might want to look at subsequent lines,
i.e. those below the bottom of the screen and therefore invisible. If the ReadKey
module is installed, one can simply use the PageDown key to move down the display and
the PageUp key to move in the other direction. If ReadKey is not installed, typing
the multi-key sequences pd or pu will cause the same thing to happen.
.RE
.B "-colhelp"
.RS
When you wish to change the sort column and the arrow keys aren't available to you,
it may be cumbersome to identify the number of the column to type in followed by RETURN.
This tells colmux to display the numbers over each column, eliminating the need to
manually count them and find the one you want.
.RE
.B "-column num"
.RS
Set the sort column to this number. The column numbering is determined by the columns
returned by collectl for the requested command. Since date/time columns are optional
for non-plot data, their inclusion will change the numbering of the columns, so if you
are not sure you selected the correct column, you should first execute your command
with -test included. You can also change the column number interactively with the
RIGHT/LEFT arrow keys IF the ReadKey module is installed (see colmux -version) OR simply
type it in followed by the RETURN key.
.RE
.B "-finalcr"
.RS
There is a really odd case in which you might want to pipe colmux real-time output to a
script for further processing. However, if you do this you can't read the final line
with a routine that expects a terminating CR, like Python's readline(). Rather, that
last line and the one that follows will be returned as one long string. This switch
tells colmux to insert that final CR, which WILL mess up the screen under normal
operations, so be forewarned.
.RE
.B "-hostformat char:pos"
.RS
There are times one has long hostnames which can either take up valuable screen real
estate or are simply painful to look at. This switch may evolve over time and is
currently targeted at hostnames that have repeating parts along with a unique part,
separated by a character such as a hyphen. This switch allows you to specify a single
character followed by the position of the piece of the hostname you'd like to see
displayed. For example, if you have a hostname like aaa-bbbb-cccc-dddd,
-hostformat -:3 will cause the cccc piece to be displayed.
.RE
.B "-nobold"
.RS
Do not highlight the selected column. This may be useful when redirecting output to a
file and you do not want the associated escape sequences to be written to it.
.RE
.B "-reverse"
.RS
Reverse the default sort order. You can also change the direction of the sort
interactively with the UP/DOWN arrow keys IF the ReadKey module is installed
(see colmux -version) OR simply type the r key and RETURN.
.RE
.B "-zero"
.RS
Do not display any rows with 0 in the sort column. You can also type z followed by
RETURN interactively.
.RE

Single-Line Format

.B "-col1000"
.RS
Divide each column by 1000 before display.
.RE
.B "-colk"
.RS
Divide each column by 1024 before display.
.RE
.B "-collog10"
.RS
Remap large numbers to a smaller number of values by taking the log10 of them and
further transforming by the following mapping: 0,1 to 0, 10 to 10, 100 to 20,
1000 to 30, 10000 to 40, ... 1e9 to 90.
.RE
.B "-cols num,..."
.RS
Group all data together for each host by column number(s). As with -column, you can
confirm the correct column(s) have been selected by first running with -test.
.RE
.B "-colnodet"
.RS
Do not show data for individual hosts, just display the totals.
.RE
.B "-colnodiv num,..."
.RS
Do not divide the specified column numbers by 1000 or 1024 when -col1000 or -colk is
specified, and do not apply the -collog10 transformation to them when that is specified.
A typical usage is when you want to look at cpu loads as well as network or disk stats,
in which case you may want to divide the latter by 1024 but not the cpu.
.RE
.B "-colnoinst"
.RS
Do not include the instance portion (and surrounding brackets) in totals column headers.
.RE
.B "-coltotal"
.RS
Include the totals for each column to the right.
.RE
.B "-colwidth"
.RS
Set the output columns to this width, typically used in conjunction with -col1000 or
-colk to allow more hosts to fit onto the same line. It can also be used if the host
names are too narrow for column headers and you have room to display wider names.
.RE

Exception Reporting Specific

In single-line format, rather than wait for all hosts to report their data, colmux
simply reports the last data seen when the time to generate a line of output has come.
In most cases, these do reflect the most recent data values, but in times of load the
data may be late getting to colmux and so a previous value may be reported. If the age
of that data exceeds a defined number of intervals, the default is currently 2, an
exception value of -1 will be reported. At other times kernel/driver bugs have been
seen to cause incorrect values to be reported as negative numbers, and those values are
also reported as -1. Both the age and exception values can be changed with the
following switches.

.B "-age number"
.RS
When initially starting up and all hosts have not yet reported any data, colmux will
display a -1 to indicate no data has been seen yet. If during processing a host fails
to report in -age intervals, the default is 2, colmux will also report a -1 indicating
the data is stale.
.RE
.B "-negdataval val"
.RS
In some cases, there could be erroneous data reported as negative numbers (though
sometimes negative numbers are valid). When specified, replace any negative numbers
with this value.
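
As a rough illustration of the exception rules described above, and not colmux's actual
code, the following Perl sketch shows one way the substitutions could be applied to a
single value. The function name report_value and its arguments are hypothetical. Only
the default age of 2 and the default missing-data value of -1 come from this page.
Checking for stale data before negative data, and leaving negative values untouched
unless a -negdataval replacement is given, are assumptions made for the sketch.
.nf
```perl
use strict;
use warnings;

# Hypothetical helper, not part of colmux: decide what to print for one host.
#   $value      - last value seen, or undef if the host has not reported yet
#   $age        - number of intervals since that value arrived
#   $max_age    - stale threshold (the -age switch, default 2)
#   $nodataval  - substitute for missing/stale data (-nodataval, default -1)
#   $negdataval - substitute for negative data (-negdataval), undef if not given
sub report_value {
  my ($value, $age, $max_age, $nodataval, $negdataval) = @_;
  $max_age   = 2  unless defined $max_age;
  $nodataval = -1 unless defined $nodataval;

  return $nodataval  if !defined $value;      # nothing reported yet
  return $nodataval  if $age > $max_age;      # data is stale
  return $negdataval if $value < 0 && defined $negdataval;
  return $value;
}

# fresh value, stale value, no value yet, negative value with a replacement of 0
print join(' ', report_value(42, 0), report_value(42, 3), report_value(undef, 0),
                report_value(-7, 0, 2, -1, 0)), $/;
```
.fi
Keeping the stale check ahead of the negative check means a host that has stopped
reporting is always shown with the missing-data value, whatever its last reading was.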
.RE
.B "-nodataval val"
.RS
This switch allows you to change the -1 that is normally reported for missing or stale
data to the specified value, most commonly 0.
.RE

Diagnostics

The following switches are intended more for diagnostic purposes than normal operation,
though they are also worth using on appropriate occasions.

.B "-debug val"
.RS
This switch is for generating diagnostic information at various levels. It is actually
a bit mask, whose values are listed at the beginning of colmux itself. Perhaps the most
useful value is 1, as it will cause colmux to display all the remote commands issued to
each host in the address list and can often reveal problems when things don't seem to
be working correctly.
.RE
.B "-nocheck"
.RS
This switch was initially included in an earlier version when remote host checking was
causing problems in some cases and by skipping those checks, colmux would run more
reliably. While it is felt that as of V3.2.0 these reachability checks are now reliable
and should not be skipped, this switch has been left in place.
.RE
.B "-quiet"
.RS
By default and when -nocheck is not specified, colmux checks the versions of all
collectl instances against that of the first node found to be running collectl and, if
different, reports the mismatch. This switch suppresses that warning. When a connection
is received from an unexpected address, a warning is also reported and the request
promptly ignored. This switch suppresses those messages as well. For more information
on problems connecting, see CONNECTION PROBLEMS.
.RE
.B "-reachable"
.RS
By default, when a node is found to not be reachable, colmux will remove it from its
list of hosts and continue execution. This switch will tell colmux to exit unless all
hosts are reachable.
.RE

Miscellaneous

The following switches have descriptions that don't really fit anywhere else:

.B "-colbin path"
.RS
On rare occasions, such as testing a patch to collectl in a copy NOT in /usr/bin, you
may want to tell colmux to use that copy instead of the standard one. Use this switch to
point to that copy. Naturally that copy must exist in that location on all systems.
.RE
.B "-keepalive secs"
.RS
Colmux uses ssh to start collectl on each remote machine and then communications between
collectl and colmux occur over a socket. Normally, ssh is configured to time out after an
interval of inactivity, such as 30 minutes, which means a long-running colmux session
will begin to lose connections when this interval is reached. By specifying a keepalive
interval, you're telling ssh to send a periodic keepalive to the other end so that
the connection doesn't get dropped.
.RE
.B "-retaddr addr"
.RS
Tell remote collectls to open a socket on this address instead of the preselected one.
For more details on this, see CONNECTION PROBLEMS.
.RE
.B "-timeout secs"
.RS
By default, colmux waits up to 10 seconds for remote instances of collectl to connect
back. On slower networks or when a very large number of instances have been started,
they may fail to connect back in time. This switch will extend that timeout, but it also
requires collectl V3.6.4 or later because earlier versions do not support this feature.
.RE
.B "-timerange secs"
.RS
When colmux starts up and checks the connectivity to all the machines specified by
-addr, it also gets their current date/time and using that computes the range of system
times across all nodes. If that range is found to be more than -timerange seconds,
colmux generates a warning as this difference could cause reporting problems. One can
increase the range to get rid of the message (not recommended unless other factors are
preventing nodes from responding quickly enough to the date command) OR suppress the
warning with -quiet.
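
To make the clock comparison concrete, here is a minimal Perl sketch of the kind of
check described above. It is not taken from colmux. The function name clock_spread, the
sample host times and the threshold of 5 seconds are all made up for illustration, and
this page does not state what the real default for -timerange is.
.nf
```perl
use strict;
use warnings;
use List::Util qw(min max);

# Hypothetical helper, not colmux code: given each host's clock reading in
# epoch seconds, return the spread between the fastest and slowest clock.
sub clock_spread {
  my (%host_time) = @_;
  my @times = values %host_time;
  return 0 unless @times;
  return max(@times) - min(@times);
}

# Sample readings are invented for illustration only.
my %seen = (n1 => 1700000000, n2 => 1700000001, n3 => 1700000007);
my $timerange = 5;    # assumed threshold, not the documented default
my $spread = clock_spread(%seen);
print "warning: clocks differ by $spread seconds", $/ if $spread > $timerange;
```
.fi
A spread computed this way also includes however long each host took to answer, which
is why slow responses alone can push it over the limit.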
.RE .SH PLAYBACK MODE RESTRICTIONS All logs being played back must have been collected using the same interval as colmux only looks at the first file/host to determine the appropriate value. It is assumed all clocks are reasonably well synchronized as colmux uses time to determine which data is to be displayed as a set. All files must be in the same directory on all systems and that directory must be included in the playback file specification All files on a remote host must be for that host only .SH EXAMPLES Run collectl on 3 nodes, showing CPU, Disk and Network statistics once a second and sorted by column 1, which happens to be total cpu. .B "colmux -addr abc,def,xyz Dynamically display top processes on nodes n1-n10 of a cluster once a second, sorted by column 5. .B "colmux -addr n[1-10] -command ""-sZ :1"" -column 5" Do the same for yesterday, between the hours of 5AM and 6AM, being sure to stall for 1/2 second between intervals. Note, if you leave off -addr you could put all the logs into /var/log/collectl on the local host and play them back from there. .B "colmux -addr n[1-10] -command ""-sZ -p/var/log/collectl/YESTERDAY -from 05:00-06:00"" -column 5 -delay .5" Look at the amount of mapped and slab memory consumed on nodes n1-n10 and n15 in real-time, every 2 seconds using single-line format. Include totals and preface each line with the time. Since memory sizes tend to be rather large, divide each by 1024 so we see MB rather than KB. Note that the columns numbers are always displayed are ascending order regardless of their order in -cols. To be sure, first test the column numbers. .B "colmux -addr n[1-10,15] -command ""-sm -i2 -oT"" -cols 6,7 -coltot -colk -test" .br .B "colmux -addr n[1-10,15] -command ""-sm -i2 -oT"" -cols 6,7 -coltot -colk" Display most active disks, based on KB written, on nodes n1, n4 and n5. .B "colmux -addr n1,n4,n5 -command ""-sD"" -column 6 Here is a cool trick. 
Collectl currently lets you look at top processes with the --top switch and even choose
a sort column by name. However, if you want to change the column you need to exit, then
rerun collectl with a different sort column name. But if you run it like this example,
you get the power of colmux to dynamically change the sort columns with the arrow keys!
You can also use this technique to have collectl dynamically sort any local multi-line
data such as slabs or even detail data like CPU, Disk, Lustre and Networks too!
Naturally this technique works just as well when playing back data.

.B "colmux -command ""-sZ -i:1"""
.SH RESTRICTIONS
colmux requires passwordless ssh between the node it is running on and those it is
monitoring.

Also be sure the port you are using for communications, the default is 2655, is open.
.SH CONNECTION PROBLEMS
The way colmux works is to choose an address it wants to communicate over and start up
one or more remote copies of collectl, telling them to connect back to colmux using
that address. The easiest way to see this is to run colmux with -noesc, which tells it
NOT to issue any escape sequences and therefore not to run in full screen mode. The
additional switch of -debug 1 tells it to show the remote collectl startup command.

When there is a communications problem you will typically see 'connection timed out'
messages displayed. There are actually a couple of possibilities here, one of which is
that a firewall is preventing connections. The easiest way to test this is to run
collectl on the local machine like this: collectl -Aserver. This tells collectl to run
as a server, listening for connections just like colmux. Then log into a remote machine
and run /usr/share/collectl/util/client.pl addr-of-server which tells client.pl to open
a socket to that copy of collectl. It should fail just like when it was run via colmux,
so try opening the firewall and try it again.
If it fixes the problem, it was indeed the firewall blocking things and colmux should now work just fine. Sometimes there are multiple interfaces defined on the machine hosting colmux and in some cases only some addresses will allow socket connections. Again, using client.pl on the remote machine try connecting back to collectl over different addresses and when you find one that works, tell colmux to use that address for communication via the -retaddr switch. .SH AUTHOR This program was written by Mark Seger (mjseger@gmail.com). .br Copyright 2016 Hewlett-Packard Development Company, L.P. .SH SEE ALSO http://collectl-utils.sourceforge.net/colmux.html collectl-4.3.1/lexpr.ph0000775000175000017500000006316213366602004013164 0ustar mjsmjs# copyright, 2003-2009 Hewlett-Packard Development Company, LP # Call with --custom "lexpr[,switches] # Debug # 1 - show all names/values, noting the timestamps are now! # 2 - just show names/values 'sent' # 4 - include real times in timestamps (useful for testing) along with skipped intervals # 8 - do not send anything # (useful when displaying normal output on terminal) # 16 - show 'x' processing our $lexInterval; my ($lexSubsys, $lexDebug, $lexCOFlag, $lexTTL, $lexFilename, $lexSumFlag); my (%lexDataLast, %lexDataMin, %lexDataMax, %lexDataTot, %lexTTL); my ($lexColInt, $lexSendCount, $lexFlags); my ($lexMinFlag, $lexMaxFlag, $lexAvgFlag, $lexTotFlag)=(0,0,0,0); my $lexOneTB=1024*1024*1024*1024; my $lexSamples=0; my $lexOutputFlag=1; my $lexFirstInt=1; my $lexAlignFlag=0; my $lexCounter; my $lexExtName=''; sub lexprInit { error('lexpr is an export, not an import') if $import=~/lexpr/; error('--showcolheader not supported by lexpr') if $showColFlag; # on the odd chance someone did -s-all and have other ways to generate data, collectl # hasn't yet hit the code that resets $subsys so we have to do it here. 
$subsys='' if $userSubsys eq '-all'; # Defaults for options $lexDebug=$lexCOFlag=0; $lexFilename=''; $lexInterval=''; $lexSubsys=$subsys; $lexTTL=5; foreach my $option (@_) { my ($name, $value)=split(/=/, $option, 2); # in case more than 1 = in single option string error("invalid lexpr option '$name'") if $name!~/^[dfhisx]?$|^align$|^co$|^ttl$|^min$|^max$|^avg$|^tot$/; $lexAlignFlag=1 if $name eq 'align'; $lexCOFlag=1 if $name eq 'co'; $lexDebug=$value if $name eq 'd'; $lexFilename=$value if $name eq 'f'; $lexInterval=$value if $name eq 'i'; $lexSubsys=$value if $name eq 's'; $lexExtName=$value if $name eq 'x'; $lexTTL=$value if $name eq 'ttl'; $lexMinFlag=1 if $name eq 'min'; $lexMaxFlag=1 if $name eq 'max'; $lexAvgFlag=1 if $name eq 'avg'; $lexTotFlag=1 if $name eq 'tot'; help() if $name eq 'h'; last if $lexExtName ne ''; } # If importing data, and if not reporting anything else, $subsys will be '' $lexSumFlag=$lexSubsys=~/[cdfilmnstxE]/ ? 1 : 0; # s= disables ALL subsys, only makes sense with imports error("lexpr subsys options '$lexSubsys' not a proper subset of '$subsys'") if $subsys ne '' && $lexSubsys ne '' && $lexSubsys!~/^[$subsys]+$/; error("lexpr cannot write a snapshot file and use a socket at the same time") if $sockFlag && $lexFilename ne ''; # Using -f and f= will not result in raw or plot file so need this message. error ("using lexpr option 'f=' AND -f requires -P and/or --rawtoo") if $lexFilename ne '' && $filename ne '' && !$plotFlag && !$rawtooFlag; # if -f, use that dirname/L for snampshot file; otherwise use f= for it. $lexFilename=(-d $filename) ? 
"$filename/L" : dirname($filename)."/L" if $lexFilename eq '' && $filename ne ''; $lexFlags=$lexMinFlag+$lexMaxFlag+$lexAvgFlag|$lexTotFlag; error("only 1 of 'min', 'max', 'avg' or 'tot' with 'lexpr'") if $lexFlags>1; # check for consistent intervals in interactive mode if ($playback eq '') { $lexColInt=(split(/:/, $interval))[0]; $lexInterval=$lexColInt if $lexInterval eq ''; $lexSendCount=int($lexInterval/$lexColInt); error("lexpr interval of '$lexInterval' is not a multiple of collectl interval of '$lexColInt' seconds") if $lexColInt*$lexSendCount != $lexInterval; error("'min', 'max', 'avg' & 'tot' require lexpr 'i' that is > collectl's -i") if $lexFlags && $lexSendCount==1; if ($lexAlignFlag) { my $div1=int(60/$lexColInt); my $div2=int($lexColInt/60); error("'align' requires collectl interval be a factor or multiple of 60 seconds") if ($lexColInt<=60 && $div1*$lexColInt!=60) || ($lexColInt>60 && $div2*60!=$lexColInt); error("'align' only makes sense when multiple samples/interval") if $lexInterval<=$lexColInt; error("'lexpr,align' requires -D or --align") if !$alignFlag && !$daemonFlag; } } if ($lexExtName ne '') { # build up swiches from EVERYTHING seen after x= my $xSeen=0; my $switches=''; foreach my $option (@_) { $xSeen=1 if $option=~/^x/; $switches.="$option," if $xSeen && $option!~/^x/; } $switches=~s/,$//; ($lexExtName, $switches)=(split(/:/, $lexExtName, 2))[0,1] if $lexExtName=~/:/; # backwards compatibility with : for switches $lexExtBase=$lexExtName; $lexExtBase=~s/\..*//; # in case extension $lexExtName.='.ph' if $lexExtName!~/\./; #print "NAME: $lexExtName Switches: $switches\n"; $tempName=$lexExtName; # name for error message before prepending with directory $lexExtName="$ReqDir/$lexExtName" if !-e $lexExtName; if (!-e "$lexExtName") { my $temp="can't find lexpr extension file '$tempName' in ./"; $temp.=" OR $ReqDir/" if $ReqDir ne '.'; error($temp); } require $lexExtName; print "$lexExtName loaded\n" if $lexDebug & 16; # rather than pass an 
undefined switch, if not there don't pass anything my $initName="${lexExtBase}Init"; if (defined($switches)) { print "$initName($switches)\n" if $lexDebug & 16; &$initName($switches); } else { &$initName(); } } # need to reset here in case processing multiple files $lexCounter=0; } sub lexpr { # since our init routine gets call BEFORE playback processing we have to wait until first interval to do this if ($lexFirstInt && $playback ne '') { # you might be able to align with data collected with --align or -D, but I'd rather discourage this $lexColInt=(split(/:/, $recInterval))[0]; $lexInterval=$lexColInt if $lexInterval eq ''; $lexSendCount=int($lexInterval/$lexColInt); error("lexpr interval of '$lexInterval' is not a multiple of recorded interval of '$lexColInt' seconds") if $lexColInt*$lexSendCount != $lexInterval; error("'align' not supported with -p") if $lexAlignFlag; error("'min', 'max', 'avg' & 'tot' require lexpr 'i' that is > collectl's -i") if $lexFlags && $lexSendCount==1; } $lexFirstInt=0; # if not time to print and we're not doing min/max/avg/tot, there's nothing to do. # BUT if align, always make sure time aligns to top of minute based on i= and NOT sendCount $lexCounter++; $lexSamples++; $lexOutputFlag=(($lexCounter % $lexSendCount) ==0) ? 1 : 0 if !$lexAlignFlag; $lexOutputFlag=(!(int($lastSecs[$rawPFlag]) % $lexInterval)) ? 
1 : 0 if $lexAlignFlag; #print "Align: $lexAlignFlag Counter: $lexCounter LexSend: $lexSendCount Last: $lastSecs[$rawPFlag] Output: $lexOutputFlag\n"; return if (!$lexOutputFlag && $lexFlags==0); my ($cpuSumString,$cpuDetString)=('',''); if ($lexSubsys=~/c/i) { if ($lexSubsys=~/c/) { # CPU utilization is a % and we don't want to report fractions my $i=$NumCpus; $cpuSumString.=sendData("cputotals.num", $i, 1); $cpuSumString.=sendData("cputotals.user", $userP[$i], 1); $cpuSumString.=sendData("cputotals.nice", $niceP[$i], 1); $cpuSumString.=sendData("cputotals.sys", $sysP[$i], 1); $cpuSumString.=sendData("cputotals.wait", $waitP[$i], 1); $cpuSumString.=sendData("cputotals.irq", $irqP[$i], 1); $cpuSumString.=sendData("cputotals.soft", $softP[$i], 1); $cpuSumString.=sendData("cputotals.steal", $stealP[$i], 1); $cpuSumString.=sendData("cputotals.idle", $idleP[$i], 1); # These 2 are redundant, but also handy $cpuSumString.=sendData("cputotals.systot", $sysP[$i]+$irqP[$i]+$softP[$i]+$stealP[$i], 1); $cpuSumString.=sendData("cputotals.usertot", $userP[$i]+$niceP[$i], 1); $cpuSumString.=sendData("cputotals.total", $sysP[$i]+$irqP[$i]+$softP[$i]+$stealP[$i]+$userP[$i]+$niceP[$i], 1); $cpuSumString.=sendData("ctxint.ctx", $ctxt/$intSecs); $cpuSumString.=sendData("ctxint.int", $intrpt/$intSecs); $cpuSumString.=sendData("proc.creates", $proc/$intSecs); $cpuSumString.=sendData("proc.runq", $loadQue, 1); $cpuSumString.=sendData("proc.run", $loadRun, 1); $cpuSumString.=sendData("cpuload.avg1", $loadAvg1, 1, '%4.2f'); $cpuSumString.=sendData("cpuload.avg5", $loadAvg5, 1, '%4.2f'); $cpuSumString.=sendData("cpuload.avg15", $loadAvg15, 1,'%4.2f'); } if ($lexSubsys=~/C/) { for (my $i=0; $i<$NumCpus; $i++) { $cpuDetString.=sendData("cpuinfo.user.cpu$i", $userP[$i], 1); $cpuDetString.=sendData("cpuinfo.nice.cpu$i", $niceP[$i], 1); $cpuDetString.=sendData("cpuinfo.sys.cpu$i", $sysP[$i], 1); $cpuDetString.=sendData("cpuinfo.wait.cpu$i", $waitP[$i], 1); 
$cpuDetString.=sendData("cpuinfo.irq.cpu$i", $irqP[$i], 1); $cpuDetString.=sendData("cpuinfo.soft.cpu$i", $softP[$i], 1); $cpuDetString.=sendData("cpuinfo.steal.cpu$i", $stealP[$i], 1); $cpuDetString.=sendData("cpuinfo.idle.cpu$i", $idleP[$i], 1); $cpuDetString.=sendData("cpuinfo.intrpt.cpu$i", $intrptTot[$i], 1); # sys and user can be useful too $cpuDetString.=sendData("cputotals.systot.cpu$i", $sysP[$i]+$irqP[$i]+$softP[$i]+$stealP[$i], 1); $cpuDetString.=sendData("cputotals.usertot.cpu$i", $userP[$i]+$niceP[$i], 1); } } } my ($diskSumString,$diskDetString)=('',''); if ($lexSubsys=~/d/i) { if ($lexSubsys=~/d/) { $diskSumString.=sendData("disktotals.reads", $dskReadTot/$intSecs); $diskSumString.=sendData("disktotals.readkbs", $dskReadKBTot/$intSecs); $diskSumString.=sendData("disktotals.writes", $dskWriteTot/$intSecs); $diskSumString.=sendData("disktotals.writekbs", $dskWriteKBTot/$intSecs); } if ($lexSubsys=~/D/) { for (my $i=0; $i<@dskOrder; $i++) { # preserve display order but skip any disks not seen this interval $dskName=$dskOrder[$i]; next if !defined($dskSeen[$i]); next if ($dskFiltKeep eq '' && $dskName=~/$dskFiltIgnore/) || ($dskFiltKeep ne '' && $dskName!~/$dskFiltKeep/); $diskDetString.=sendData("diskinfo.reads.$dskName", $dskRead[$i]/$intSecs); $diskDetString.=sendData("diskinfo.readkbs.$dskName", $dskReadKB[$i]/$intSecs); $diskDetString.=sendData("diskinfo.readw.$dskName", $dskWaitR[$i]/$intSecs); $diskDetString.=sendData("diskinfo.writes.$dskName", $dskWrite[$i]/$intSecs); $diskDetString.=sendData("diskinfo.writekbs.$dskName", $dskWriteKB[$i]/$intSecs); $diskDetString.=sendData("diskinfo.writew.$dskName", $dskWaitW[$i]/$intSecs); $diskDetString.=sendData("diskinfo.quelen.$dskName", $dskQueLen[$i]/$intSecs); $diskDetString.=sendData("diskinfo.wait.$dskName", $dskWait[$i]/$intSecs); $diskDetString.=sendData("diskinfo.svctime.$dskName", $dskSvcTime[$i]/$intSecs); $diskDetString.=sendData("diskinfo.util.$dskName", $dskUtil[$i]/$intSecs); } } } my 
$nfsString=''; if ($lexSubsys=~/f/) { if ($nfsSFlag) { $nfsString.=sendData("nfsinfo.Sread", $nfsSReadsTot/$intSecs); $nfsString.=sendData("nfsinfo.Swrite", $nfsSWritesTot/$intSecs); $nfsString.=sendData("nfsinfo.Smeta", $nfsSMetaTot/$intSecs); $nfsString.=sendData("nfsinfo.Scommit",$nfsSCommitTot/$intSecs); } if ($nfsCFlag) { $nfsString.=sendData("nfsinfo.Cread", $nfsCReadsTot/$intSecs); $nfsString.=sendData("nfsinfo.Cwrite", $nfsCWritesTot/$intSecs); $nfsString.=sendData("nfsinfo.Cmeta", $nfsCMetaTot/$intSecs); $nfsString.=sendData("nfsinfo.Ccommit",$nfsCCommitTot/$intSecs); } } my $inodeString=''; if ($lexSubsys=~/i/) { $inodeString.=sendData("inodeinfo.dentrynum", $dentryNum, 1); $inodeString.=sendData("inodeinfo.dentryunused", $dentryUnused, 1); $inodeString.=sendData("inodeinfo.filesalloc", $filesAlloc, 1); $inodeString.=sendData("inodeinfo.filesmax", $filesMax, 1); $inodeString.=sendData("inodeinfo.inodeused", $inodeUsed, 1); } # No lustre details, at least not for now... my $lusSumString=''; if ($lexSubsys=~/l/) { if ($CltFlag) { $lusSumString.=sendData("lusclt.reads", $lustreCltReadTot/$intSecs); $lusSumString.=sendData("lusclt.readkbs", $lustreCltReadKBTot/$intSecs); $lusSumString.=sendData("lusclt.writes", $lustreCltWriteTot/$intSecs); $lusSumString.=sendData("lusclt.writekbs", $lustreCltWriteKBTot/$intSecs); $lusSumString.=sendData("lusclt.numfs", $NumLustreFS, 1); } if ($MdsFlag) { my $getattrPlus=$lustreMdsGetattr+$lustreMdsGetattrLock+$lustreMdsGetxattr; my $setattrPlus=$lustreMdsReintSetattr+$lustreMdsSetxattr; my $varName=($cfsVersion lt '1.6.5') ? 'reint' : 'unlink'; my $varVal= ($cfsVersion lt '1.6.5') ? 
$lustreMdsReint : $lustreMdsReintUnlink; $lusSumString.=sendData('lusmds.gattrP', $getattrPlus/$intSecs); $lusSumString.=sendData('lusmds.sattrP', $setattrPlus/$intSecs); $lusSumString.=sendData('lusmds.sync', $lustreMdsSync/$intSecs); $lusSumString.=sendData("lusmds.$varName", $varVal/$intSecs); } if ($OstFlag) { $lusSumString.=sendData("lusost.reads", $lustreReadOpsTot/$intSecs); $lusSumString.=sendData("lusost.readkbs", $lustreReadKBytesTot/$intSecs); $lusSumString.=sendData("lusost.writes", $lustreWriteOpsTot/$intSecs); $lusSumString.=sendData("lusost.writekbs", $lustreWriteKBytesTot/$intSecs); } } my ($memString, $memDetString)=('',''); if ($lexSubsys=~/m/i) { if ($lexSubsys=~/m/) { $memString.=sendData("meminfo.tot", $memTot, 1); $memString.=sendData("meminfo.used", $memUsed, 1); $memString.=sendData("meminfo.free", $memFree, 1); $memString.=sendData("meminfo.shared", $memShared, 1); $memString.=sendData("meminfo.buf", $memBuf, 1); $memString.=sendData("meminfo.cached", $memCached, 1); $memString.=sendData("meminfo.slab", $memSlab, 1); $memString.=sendData("meminfo.map", $memMap, 1); $memString.=sendData("meminfo.anon", $memAnon, 1); $memString.=sendData("meminfo.anonH", $memAnonH, 1); $memString.=sendData("meminfo.dirty", $memDirty, 1); $memString.=sendData("meminfo.locked", $memLocked, 1); $memString.=sendData("meminfo.inactive", $memInact, 1); $memString.=sendData("meminfo.hugetot", $memHugeTot, 1); $memString.=sendData("meminfo.hugefree", $memHugeFree, 1); $memString.=sendData("meminfo.hugersvd", $memHugeRsvd, 1); $memString.=sendData("meminfo.sunreclaim", $memSUnreclaim, 1); $memString.=sendData("swapinfo.total", $swapTotal, 1); $memString.=sendData("swapinfo.free", $swapFree, 1); $memString.=sendData("swapinfo.used", $swapUsed, 1); $memString.=sendData("swapinfo.in", $swapin/$intSecs); $memString.=sendData("swapinfo.out", $swapout/$intSecs); $memString.=sendData("pageinfo.fault", $pagefault/$intSecs); $memString.=sendData("pageinfo.majfault", 
$pagemajfault/$intSecs); $memString.=sendData("pageinfo.in", $pagein/$intSecs); $memString.=sendData("pageinfo.out", $pageout/$intSecs); } if ($lexSubsys=~/M/) { for (my $i=0; $i<$CpuNodes; $i++) { foreach my $field ('used', 'free', 'slab', 'map', 'anon', 'anonH', 'lock', 'act', 'inact') { $memDetString.=sendData("numainfo.$field.$i", $numaMem[$i]->{$field}, 1); } } } } my ($netSumString,$netDetString)=('',''); if ($lexSubsys=~/n/i) { if ($lexSubsys=~/n/) { $netSumString.=sendData("nettotals.kbin", $netRxKBTot/$intSecs); $netSumString.=sendData("nettotals.pktin", $netRxPktTot/$intSecs); $netSumString.=sendData("nettotals.kbout", $netTxKBTot/$intSecs); $netSumString.=sendData("nettotals.pktout", $netTxPktTot/$intSecs); } if ($lexSubsys=~/N/) { for ($i=0; $i<@netOrder; $i++) { $netName=$netOrder[$i]; next if !defined($netSeen[$i]); next if ($netFiltKeep eq '' && $netName=~/$netFiltIgnore/) || ($netFiltKeep ne '' && $netName!~/$netFiltKeep/); next if $netName=~/lo|sit/; $netDetString.=sendData("netinfo.kbin.$netName", $netRxKB[$i]/$intSecs); $netDetString.=sendData("netinfo.pktin.$netName", $netRxPkt[$i]/$intSecs); $netDetString.=sendData("netinfo.kbout.$netName", $netTxKB[$i]/$intSecs); $netDetString.=sendData("netinfo.pktout.$netName", $netTxPkt[$i]/$intSecs); } } } my $sockString=''; if ($lexSubsys=~/s/) { $sockString.=sendData("sockinfo.used", $sockUsed, 1); $sockString.=sendData("sockinfo.tcp", $sockTcp, 1); $sockString.=sendData("sockinfo.orphan", $sockOrphan, 1); $sockString.=sendData("sockinfo.tw", $sockTw, 1); $sockString.=sendData("sockinfo.alloc", $sockAlloc, 1); $sockString.=sendData("sockinfo.mem", $sockMem, 1); $sockString.=sendData("sockinfo.udp", $sockUdp, 1); $sockString.=sendData("sockinfo.raw", $sockRaw, 1); $sockString.=sendData("sockinfo.frag", $sockFrag, 1); $sockString.=sendData("sockinfo.fragm", $sockFragM, 1); } my $tcpString=''; if ($lexSubsys=~/t/) { $tcpString.=sendData("tcpinfo.iperrs", $ipErrors/$intSecs) if $tcpFilt=~/i/; 
    $tcpString.=sendData("tcpinfo.tcperrs", $tcpErrors/$intSecs) if $tcpFilt=~/t/;
    $tcpString.=sendData("tcpinfo.udperrs", $udpErrors/$intSecs) if $tcpFilt=~/u/;
    $tcpString.=sendData("tcpinfo.icmperrs", $icmpErrors/$intSecs) if $tcpFilt=~/c/;
    $tcpString.=sendData("tcpinfo.tcpxerrs", $tcpExErrors/$intSecs) if $tcpFilt=~/T/;
  }

  my ($intSumString,$intDetString)=('','');
  if ($lexSubsys=~/x/i) {
    if ($NumHCAs) {
      if ($lexSubsys=~/x/) {
        $kbInT=  $ibRxKBTot;
        $pktInT= $ibRxTot;
        $kbOutT= $ibTxKBTot;
        $pktOutT=$ibTxTot;
        $intSumString.=sendData("iconnect.kbin", $kbInT/$intSecs);
        $intSumString.=sendData("iconnect.pktin", $pktInT/$intSecs);
        $intSumString.=sendData("iconnect.kbout", $kbOutT/$intSecs);
        $intSumString.=sendData("iconnect.pktout", $pktOutT/$intSecs);
      }
      if ($lexSubsys=~/X/) {
        for (my $i=0; $i<$NumHCAs; $i++) {
          # strip any trailing underscores from the HCA name, leaving it in $1
          $HCAName[$i]=~/(\S+?)_*$/;
          print "HCA: $HCAName[$i] 1: $1\n";
          $intDetString.=sendData("iconnect.$1.kbin", $ibRxKB[$i]/$intSecs);
          $intDetString.=sendData("iconnect.$1.pktin", $ibRx[$i]/$intSecs);
          $intDetString.=sendData("iconnect.$1.kbout", $ibTxKB[$i]/$intSecs);
          $intDetString.=sendData("iconnect.$1.pktout", $ibTx[$i]/$intSecs);
        }
      }
    }
  }

  my $envString='';
  if ($lexSubsys=~/E/i) {
    foreach $key (sort keys %$ipmiData) {
      for (my $i=0; $i<scalar(@{$ipmiData->{$key}}); $i++) {
        my $name=$ipmiData->{$key}->[$i]->{name};
        my $inst=($key!~/power/ && $ipmiData->{$key}->[$i]->{inst} ne '-1') ? $ipmiData->{$key}->[$i]->{inst} : '';
        $envString.=sendData("env.$name$inst", $ipmiData->{$key}->[$i]->{value}, 1, '%s');
      }
    }
  }

  # if any imported data, it may want to include lexpr output AND we do a little more work to
  # separate the summary from the detail. also, in case any variables are gauges and we're doing
  # totals we'll need to know that as well as non-string formatting. There is a bit of magic here,
  # perhaps the easiest example is in misc.ph where it reports the uptime as a fraction of a day. Here
  # it passes the summary-data formatting in ref7.
  # Also note since it does distinguish between
  # summary and detail data, if you want to change the formats of both, you'd need to set ref7 and
  # ref8 in their appropriate sections of the printExport code.
  my (@nameS, @valS, @nameD, @valD, @gaugeS, @gaugeD, @fmtS, @fmtD);
  my ($impSumString, $impDetString)=('','');
  for (my $i=0; $i<$impNumMods; $i++)
  {
    &{$impPrintExport[$i]}('l', \@nameS, \@valS, \@nameD, \@valD, \@gaugeS, \@gaugeD, \@fmtS, \@fmtD);
  }
  foreach (my $i=0; $i$lexFilename" or logmsg("F", "Couldn't create '$lexFilename'");
    print EXP $lexprRec;
    close EXP;
  }
  $lexSamples=0;
}

# this code tightly synchronized with gexpr and graphite
sub sendData
{
  my $name=  shift;
  my $value= shift;
  my $gauge= shift;
  my $format=shift;

  #print "Name: $name VAL: $value GAUGE: %s FORMAT: %s\n",
  #    defined($gauge) ? $gauge : '', defined($format) ? $format : '';

  # These are only undefined the very first time
  if (!defined($lexTTL{$name}))
  {
    $lexTTL{$name}=$lexTTL;
    $lexDataLast{$name}=-1;
  }

  # As a minor optimization, only do this when dealing with min/max/avg/tot values
  if ($lexFlags)
  {
    # And while this should be done in init(), we really don't know how many indexes
    # there are until our first pass through...
    if ($lexSamples==1)
    {
      $lexDataMin{$name}=$lexOneTB;
      $lexDataMax{$name}=0;
      $lexDataTot{$name}=0;
    }

    $lexDataMin{$name}=$value    if $lexMinFlag && $value<$lexDataMin{$name};
    $lexDataMax{$name}=$value    if $lexMaxFlag && $value>$lexDataMax{$name};
    $lexDataTot{$name}+=$value   if $lexAvgFlag;

    # totals are a little different. In the case of rates, we need to multiply by the collectl
    # interval to get the interval total, but for gauges we're really only doing averages
    $lexDataTot{$name}+=(!$gauge) ?
        $value*$lexColInt : $value    if $lexTotFlag;
  }

  return('')    if !$lexOutputFlag;

  #    A c t u a l    S e n d    H a p p e n s    H e r e

  # If doing min/max/avg, reset $value
  if ($lexFlags)
  {
    $value=$lexDataMin{$name}                if $lexMinFlag;
    $value=$lexDataMax{$name}                if $lexMaxFlag;
    $value=$lexDataTot{$name}                if $lexTotFlag;
    $value=$lexDataTot{$name}/$lexSamples    if $lexAvgFlag || defined($gauge);    # gauges are reported as averages
  }

  # Always send data if not in CO mode, but if so only send when it has
  # indeed changed OR TTL about to expire
  my $valSentFlag=0;
  my $returnString='';
  if (!$lexCOFlag || $value!=$lexDataLast{$name} || $lexTTL{$name}==1)
  {
    $valSentFlag=1;
    $format='%d'    if !defined($format);
    $value+=.5      if $format=~/d/;
    $returnString=sprintf("%s $format\n", $name, $value)    unless $lexDebug & 8;
    $lexDataLast{$name}=$value;
  }

  # A fair chunk of work, but worth it
  if ($lexDebug & 3)
  {
    my ($intSeconds, $intUsecs);
    if ($hiResFlag)
    {
      # we have to fully qualify name because of 'require' vs 'use'
      ($intSeconds, $intUsecs)=Time::HiRes::gettimeofday();
    }
    else
    {
      $intSeconds=time;
      $intUsecs=0;
    }

    $intUsecs=sprintf("%06d", $intUsecs);
    my ($sec, $min, $hour)=localtime($intSeconds);
    my $timestamp=sprintf("%02d:%02d:%02d.%s", $hour, $min, $sec, substr($intUsecs, 0, 3));
    printf "$timestamp Name: %-20s Val: %8d TTL: %d %s\n", $name, $value, $lexTTL{$name}, ($valSentFlag) ?
        'sent' : '' if $lexDebug & 1 || $valSentFlag;
  }

  # TTL only applies when in 'CO' mode, noting we already made expiration
  # decision above when we saw counter of 1
  if ($lexCOFlag)
  {
    $lexTTL{$name}--          if !$valSentFlag;
    $lexTTL{$name}=$lexTTL    if $valSentFlag || $lexTTL{$name}==0;
  }
  return($returnString);
}

sub help
{
  my $text=< $address, PeerPort => $port, Proto => 'tcp', Timeout =>1);
  if (!defined($socket))
  {
    print "Couldn't connect to server, retrying...\n";
    sleep 1;
    next;
  }
  print "Socket opened on $address:$port\n";
  $select = new IO::Select($socket);

  while ($socket ne '')
  {
    $buffer='';
    while (my @ready=$select->can_read(10))
    {
      $bytes=sysread($socket, $line, 100);
      #print "BYTES: $bytes\n";
      if ($bytes==0)
      {
        print "Socket closed on other end\n";
        $socket='';
        last;
      }
      $buffer.=$line;
      @handles=($select->can_read(0));
      last    if scalar(@handles)==0;
    }
    print "$buffer"    if $buffer ne '';
  }
}

sub sigInt
{
  print "Close Socket\n";
  $socket->close()    if defined($socket) && $socket ne '';
  exit;
}
collectl-4.3.1/collectl0000775000175000017500000075427413366602004013240 0ustar mjsmjs
#!/usr/bin/perl -w

# Copyright 2003-2017 Hewlett-Packard Development Company, L.P.
#
# collectl may be copied only under the terms of either the Artistic License
# or the GNU General Public License, which may be found in the source kit

# debug
#     1 - print interesting stuff
#     2 - print Interconnect specific checks (mostly Infiniband)
#     4 - show each line processed by record(), replaces -H
#     8 - print lustre specific checks
#    16 - print headers of each file processed
#    32 - skip call to dataAnalyze during interactive processing
#    64 - socket processing
#   128 - show collectl.conf processing
#   256 - show detailed pid processing (this generates a LOT of output)
#   512 - show more pid details, specifically hash contents
#         NOTE - output from 256/512 are prefaced with %%% if from collectl.pl
#                and ### if from formatit.ph
#  1024 - show list of SLABS to be monitored
#  2048 - playback preprocessing
#  4096 - report pidNew() management of pidSkip{}
#  8192 - show creation of RAW, PLOT and files
# 16384 - for use of perfquery for 32 bit counters

# debug tricks
# - use '-d36' to see each line of raw data as it would be logged but not
#   generate any other output

# Equivalent Utilities
#  -s c      mpstat, iostat -c, vmstat
#  -s C      mpstat
#  -s d/D    iostat -d
#  -s f/F    nfsstat -c/s [c if -o C]
#  -s i      sar -v
#  -s m      sar -rB, free, vmstat (note - sar does pages by pagesize NOT bytes)
#  -s n/N    netstat -i
#  -s s      sar -n SOCK
#  -s y/Y    slabtop
#  -s Z      ps or top

# Subsystems
#  b - buddy
#  c - cpu
#  d - disks
#  E - environmental
#  i - inodes (and other file stuff)
#  f - NFS
#  l - lustre
#  m - memory
#  n - network
#  s - socket
#  t - tcp
#  x - interconnect
#  Z - processes (-sP now available but -P taken!)

use POSIX; use Config; use English; use 5.008000; use Getopt::Long; Getopt::Long::Configure ("bundling"); Getopt::Long::Configure ("no_ignore_case"); Getopt::Long::Configure ("pass_through"); use File::Basename; use Time::Local; use IO::Socket; use IO::Select; use Cwd 'abs_path'; $Cat= '/bin/cat'; $Grep= '/bin/grep'; $Egrep= '/bin/egrep'; $Ps= '/bin/ps'; $Rpm= '/bin/rpm'; $Lspci= '/sbin/lspci'; $Lctl= '/usr/sbin/lctl'; $Dmidecode= '/usr/sbin/dmidecode'; %TopProcTypes=qw(vsz '' rss '' syst '' usrt '' time '' accum '' rkb '' wkb '' iokb '' rkbc '' wkbc '' iokbc '' ioall '' rsys '' wsys '' iosys '' iocncl '' majf '' minf '' flt '' pid '' cpu '' thread '' vctx '' nctx ''); %TopSlabTypes=qw(numobj '' name '' actobj '' objsize '' numslab '' objslab '' totsize '' totchg '' totpct ''); # Constants and removing -w warnings $miniDateFlag=0; $PageSize=0; $Memory=$Swap=$Hyper=$Distro=$ProductName=''; $CpuVendor=$CpuMHz=$CpuCores=$CpuSiblings=$CpuNodes=''; $PidFile='/var/run/collectl.pid'; # default, unless --pname $PQuery=$PQopt=$PCounter=$VStat=$IBVersion=$OfedInfo=''; $numBrwBuckets=$cfsVersion=$sfsVersion=''; $Resize=$IpmiCache=$IpmiTypes=$ipmiExec=''; $i1DataFlag=$i2DataFlag=$i3DataFlag=0; $lastSecs=$interval2Print=0; $diskChangeFlag=$cpuDisabledFlag=$cpusDisabled=$cpusEnabled=$noCpusFlag=0; $boottime=0; # only used once here, but set in formatit.ph our %netSpeeds; # Find out ASAP if we're linux or WNT based $PcFlag=($Config{"osname"}=~/MSWin32/) ? 1 : 0; # If we ever want to write something to /var/log/messages, we need this which # may not always be installed $syslogFlag=(eval {require "Sys/Syslog.pm" or die}) ? 1 : 0; # Always nice to know if we're root $rootFlag=(!$PcFlag && `whoami`=~/root/) ? 
1 : 0; $SrcArch= $Config{"archname"}; $Version= '4.3.1-1'; $Copyright='Copyright 2003-2018 Hewlett-Packard Development Company, L.P.'; $License= "collectl may be copied only under the terms of either the Artistic License\n"; $License.= "or the GNU General Public License, which may be found in the source kit"; # set up constants to our exe, location and program name root $ExeName=abs_path($0); $BinDir=dirname($ExeName); $Program=basename($ExeName); $Program=~s/\.pl$//; # remove extension for production # Note that if someone redirects stdin or runs it out of a script it will look like # we're in the background. We also need to know if STDOUT connected to a terminal. if (!$PcFlag) { $MyDir=`pwd`; $Cat= 'cat'; $Sep= '/'; $backFlag=(getpgrp()!=tcgetpgrp(0)) ? 1 : 0; $termFlag= (-t STDOUT) ? 1 : 0; } else { $MyDir=`cd`; $Cat= 'type'; $Sep= '\\'; $backFlag=0; $termFlag=0; } chomp $MyDir; # This is a little messy. In playback mode of process data, we want to use # usernames instead of UIDs, so we need to know if we need to know if it's # the same node and hence we need our name. This could be different than $Host # which was recorded with the data file and WILL override in playback mode. # We also need our host name before calling initRecord() so we can log it at # startup as well as for naming the logfile. $myHost=($PcFlag) ? `hostname` : `/bin/hostname`; $myHost=(split(/\./, $myHost))[0]; chomp $myHost; $Host=$myHost; # may be overkill, but we want to throttle max errors/day to prevent runaway. 
$zlibErrors=0; # These variables only used once in this module and hence generate warnings undef %disks; undef @HCAName; undef @HCAPorts; undef @HCAOpaV4; undef @HCAId; undef %networks; undef @dskIndexAvail; undef @netIndexAvail; undef @lustreCltDirs; undef @lustreCltOstDirs; undef @lustreOstSubdirs; undef %playbackSettings; $recHdr1=$recHeader=$miniDateTime=$miniFiller=$DaemonOptions=''; $OstNames=$MdsNames=$LusDiskNames=$LusDiskDir=''; $NumLustreCltOsts=$NumLusDisks=$MdsFlag=0; $NumSlabs=$SlabGetProc=$newSlabFlag=0; $wideFlag=$coreFlag=$newRawSlabFlag=0; $totalCounter=$separatorCounter=0; $NumCpus=$HZ=''; $NumOst=$NumBud=0; $FS=$ScsiInfo=$HCAPortStates=''; $SlabVersion=''; $dentryFlag=$inodeFlag=$filenrFlag=$allThreadFlag=$procCmdWidth=0; $clr=$clscr=$cleol=$home=''; $netIndexNext=$dskSeenCount=$dskSeenLast=0; $netIndexNext=$netSeenCount=$netSeenLast=0; # This tells us we have not yet made our first pass through the data # collection loop and gets reset to 0 at the bottom. $firstPass=1; # Check the switches to make sure none requiring -- were specified with - # since getopts doesn't! Also save the list of switches we were called with. $cmdSwitches=preprocSwitches(); # These are the defaults for interactive and daemon subsystems $SubsysDefInt='cdn'; $SubsysDefDaemon='bcdfijmnstx'; # We want to load any default settings so that user can selectively # override them. We're giving these starting values in case not # enabled in .conf file. We later override subsys if interactive $SubsysDef=$SubsysCore=$SubsysDefDaemon; $Interval= 10; $Interval2= 60; $Interval3= 120; $LimSVC= 30; $LimIOS= 10 ; $LimLusKBS= 100; $LimLusReints=1000; $LimBool= 0; $Port= 2655; $Timeout= 10; $MaxZlibErrors=20; $LustreSvcLunMax=10; $LustreMaxBlkSize=512; $LustreConfigInt=1; $InterConnectInt=900; $TermHeight=24; $DefNetSpeed=10000; $TimeHiResCheck=1; $PasswdFile='/etc/passwd'; $umask=''; $DiskMaxValue=-1; # disabled # NOTE - the following line should match what is in collectl.conf. 
If uncommented there, it will be replaced $DiskFilter='hd[ab] | sd[a-z]+ |dm-\d+ |xvd[a-z] |fio[a-z]+ | vd[a-z]+ |emcpower[a-z]+ |psv\d+ |nvme\d+n\d+ '; $DiskFilterFlag=0; # only set when filter set in collectl.conf $ProcReadTest='yes'; # Standard locations $SysIB='/sys/class/infiniband'; # These aren't user settable but are needed to build the list of ALL valid # subsystems $SubsysDet= "BCDEFJMNTXYZ"; $SubsysExcore='y'; # These are the subsystems allowed in brief mode $BriefSubsys="bcdfijlmnstx"; # And the default environmentals $envOpts='fpt'; $envRules=''; $envDebug=0; $envTestFile=''; $envFilt=$envRemap=''; $hiResFlag=0; # must be initialized before ANY calls to error/logmsg $configFile=''; $ConfigFile='collectl.conf'; $daemonFlag=$debug=$formatitLoaded=0; GetOptions('C=s' => \$configFile, 'D!' => \$daemonFlag, 'd=i' => \$debug, 'config=s' => \$configFile, 'daemon!' => \$daemonFlag, 'debug=i' => \$debug ) or error("type -h for help"); # if config file specified and a directory, prepend to default name otherwise # use the whole thing as the name. $filename=''; $configFile.="/$ConfigFile" if $configFile ne '' && -d $configFile;; loadConfig(); # Very unlikely but I hate programs that silently exit. We have to figure out # where formatit.ph lives, which we're allowing to be in one of 4 places, # noting our first choice always being '$BinDir'. # formatit can exist in 3 places, either in the default location of # /usr/share/collectl, the share/collectl directory at the same # level as the binary OR if there's a copy in the local directoy # that always overrides the default location for development/testing. my $oneUp=dirname($BinDir); $ReqDir='/usr/share/collectl'; $ReqDir="$oneUp/share/collectl" if -e "$oneUp/share/collectl/formatit.ph"; $ReqDir=$BinDir if -e "$BinDir/formatit.ph"; # either $ReqDir reported to a directory where formatit.ph lives OR its still # pointing to the default location in which case it better be there. 
if (!-e "$ReqDir/formatit.ph") { my $msg="can't find formatit.ph in '$BinDir'"; $msg.=" OR '/usr/share/collectl' OR '$oneUp/share/collectl'" if !$PcFlag; print "$msg\n"; # can't call logmsg() before formatit.ph not yet loaded logsys($msg,1); # force it because $filename not yet set exit(1); } # now we can load formatit.ph and any other include files which MUST be # in the same directory as formatit.ph print "BinDir: $BinDir ReqDir: $ReqDir\n" if $debug & 1; require "$ReqDir/formatit.ph"; $formatitLoaded=1; # finally try to load these two, both of which are optional # though included in most distros $zlibFlag= (eval {require "Compress/Zlib.pm" or die}) ? 1 : 0; $hiResFlag= (eval {require "Time/HiRes.pm" or die}) ? 1 : 0; # These can get overridden after loadConfig(). Others can as well but this is # a good place to reset those that don't need any further manipulation $limSVC=$LimSVC; $limIOS=$LimIOS; $limBool=$LimBool; $limLusKBS=$LimLusKBS; $limLusReints=$LimLusReints; $termHeight=$TermHeight; # On LINUX and only if associated with a terminal in the foreground and we can find 'resize', # use the value of LINES to set the terminal height if (!$PcFlag && !$daemonFlag && !$backFlag && $Resize ne '' && $termFlag && defined($ENV{TERM}) && $ENV{TERM}=~/xterm/) { # IF the user typed a CR after collectl started but before it started, flush input buffer my $selTemp=new IO::Select(STDIN); while ($selTemp->can_read(0)) { my $temp=; } $selTemp->remove(); `$Resize`=~/LINES.*?(\d+)/m; $termHeight=$1; } # let's also see if there is a terminal attached. this is currently only # an issue for 'brief mode', but we may need to know some day for other # reasons too. but PCs can only run on a terminal... $termFlag=(open TMP, " \$alignFlag, 'A=s' => \$address, 'address=s' => \$address, 'c=i' => \$count, 'count=i' => \$count, 'f=s' => \$filename, 'filename=s' => \$filename, 'F=i' => \$flush, 'flush=i' => \$flush, 'tworaw!' => \$tworawFlag, 'home!' 
=> \$homeFlag, 'i=s' => \$userInterval, 'interval=s' => \$userInterval, 'h!' => \$hSwitch, 'help!' => \$hSwitch, 'iosize!' => \$ioSizeFlag, 'l=s' => \$limits, 'limits=s' => \$limits, 'L=s' => \$lustreSvcs, 'lustsvcs=s' => \$lustreSvcs, 'm!' => \$msgFlag, 'messages!' => \$msgFlag, 'o=s' => \$userOptions, 'options=s' => \$userOptions, 'N!' => \$niceFlag, 'nice!' => \$niceFlag, 'nohup!' => \$nohupFlag, 'passwd=s' => \$passwdFile, 'p=s' => \$playback, 'playback=s' => \$playback, 'P!' => \$plotFlag, 'quiet!' => \$quietFlag, 'plot!' => \$plotFlag, 'r=s' => \$rollLog, 'rolllogs=s' => \$rollLog, 'R=s' => \$runTime, 'runtime=s' => \$runTime, 's=s' => \$userSubsys, 'sep=s' => \$SEP, 'stats!' => \$statsFlag, 'statopts=s' => \$statOpts, 'subsys=s' => \$userSubsys, 'top=s' => \$topOpts, 'utc!' => \$utcFlag, 'umask=s' => \$umask, 'utime=i' => \$utimeMask, 'v!' => \$vSwitch, 'version!' => \$vSwitch, 'V!' => \$VSwitch, 'showdefs!' => \$VSwitch, 'w!' => \$wideFlag, 'x!' => \$xSwitch, 'helpextend!'=> \$xSwitch, 'X!' => \$XSwitch, 'helpall!' => \$XSwitch, 'slabfilt=s' => \$slabFilt, 'procfilt=s' => \$procFilt, 'all!' => \$allFlag, 'ALL!' => \$AllFlag, 'comment=s' => \$comment, 'cpuopts=s' => \$cpuOpts, 'cpufilt=s' => \$cpuFilt, 'dskfilt=s' => \$dskFilt, 'dskopts=s' => \$dskOpts, 'dskremap=s' => \$dskRemap, 'export=s' => \$export, 'from=s' => \$from, 'full!' => \$fullFlag, 'thru=s' => \$thru, 'headerrepeat=i'=> \$headerRepeat, 'hr=i' => \$headerRepeat, 'import=s' => \$import, 'intfilt=s' => \$intFilt, 'lustopts=s' => \$lustOpts, 'memopts=s' => \$memOpts, 'netfilt=s' => \$netFilt, 'netopts=s' => \$netOpts, 'nfsopts=s' => \$nfsOpts, 'nfsfilt=s' => \$nfsFilt, 'envopts=s' => \$userEnvOpts, 'envrules=s' => \$envRules, 'envdebug!' => \$envDebug, 'envtest=s' => \$envTestFile, 'envfilt=s' => \$envFilt, 'envremap=s' => \$envRemap, 'extract=s' => \$extract, 'grep=s' => \$grepPattern, 'offsettime=s' => \$offsetTime, 'pname=s' => \$pname, 'procanalyze!' 
=> \$procAnalFlag, 'procopts=s' => \$procOpts, 'procstate=s' => \$procState, 'rawtoo!' => \$rawtooFlag, 'rawdskfilter=s'=> \$rawDskFilter, 'rawdskignore=s'=> \$rawDskIgnore, 'rawnetfilter=s'=> \$rawNetFilter, 'rawnetignore=s'=> \$rawNetIgnore, 'runas=s' => \$runas, 'showsubsys!' => \$showSubsysFlag, 'showoptions!' => \$showOptionsFlag, 'showsubopts!' => \$showSuboptsFlag, 'showtopopts!' => \$showTopoptsFlag, 'showheader!' => \$showHeaderFlag, 'showcolheaders!' =>\$showColFlag, 'showslabaliases!' =>\$showSlabAliasesFlag, 'showrootslabs!' =>\$showRootSlabsFlag, 'slabanalyze!' => \$slabAnalFlag, 'slabopts=s' => \$slabOpts, 'tcpfilt=s' => \$tcpFilt, 'verbose!' => \$verboseFlag, 'vmstat!' => \$vmstatFlag, 'whatsnew!' => \$whatsnewFlag, 'xopts=s' => \$xOpts, ) or error("type -h for help"); # This needs to be done BEFORE processing --pname since we end up changing $PidFile if ($runas ne '') { error("canot use --runas without -D") if !$daemonFlag; # temporariluy disable daemon mode in debug mode so we can see messages on terminal. $daemonFlag=0 if $debug; my ($runasUser,$runasGroup)=split(/:/, $runas); error("--runas must at least specify a user") if $runasUser eq ''; if ($runasUser!~/^\d+$/) { $runasUid=(split(/:/, `grep ^$runasUser: /etc/passwd`))[2]; error("can't find '$runasUser' in /etc/passwd. Consider UID.") if !defined($runasUid); } if (defined($runasGroup) && $runasGroup!~/^\d+$/) { $runasGid=(split(/:/, `grep ^$runasGroup: /etc/group`))[2]; error("can't find '$runasGroup' in /etc/group. 
Consider GID.") if !defined($runasGid); } $runasUid=$runasUser if $runasUser=~/^\d+/; $runasGid=$runasGroup if defined($runasGroup) && $runasGroup=~/^\d+/; # let's make sure the owner/group of the logging directory match my $logdir=dirname("$filename/collectl"); ($uid,$gid)=(stat($logdir))[4,5]; error("Ownership of '$logdir' doesn't match '$runas'") if ($uid!=$runasUid) || (defined($runasGid) && $gid!=$runasGid); # Daemon also means --nohup $daemonFlag=$nohupFlag=1; } if ($pname ne '') { # We need to include switches because collectl-generic expects to find them in the process name $0="collectl-$pname $cmdSwitches"; $PidFile=~s/collectl\.pid/collectl-$pname.pid/; print "Set PName to collectl-$pname\n" if $debug & 1; } # O p e n A S o c k e t ? # It's real important we do this as soon as possible because if someone runs # us in 'client' mode, and an error occurs the server would still be hanging # around waiting for someone to connect to that socket! This way we connect, # report the error and exit and the caller is able to detect it. $sockFlag=$clientFlag=$serverFlag=0; if ($address ne '') { if ($address=~/\./) { ($address,$port,$timeout)=split(/:/, $address); $port=$Port if !defined($port) || $port eq ''; $Timeout=$timeout if defined($timeout); $socket=new IO::Socket::INET( PeerAddr => $address, PeerPort => $port, Proto => 'tcp', Timeout => $Timeout) or error("Could not create socket to $address:$port. Reason: $!") if !defined($socket); print "Socket opened on $address:$port\n" if $debug & 64; push @sockets, $socket; $clientFlag=1; } elsif ($address=~/^server/i) { ($port, $port, $options)=split(/:/, $address, 3); $port=$Port if !defined($port); # Note this socket uses a different variable because when we get # a connection we use the SAME one to talk to client as we do in # client mode. 
$sockServer = new IO::Socket::INET( Type=>SOCK_STREAM, Reuse=>1, Listen => 1, LocalPort => $port) || error("Could not create local socket on port $port Reason: $!"); print "Server socket opened on port $port\n" if $debug & 64; $select=new IO::Select($sockServer); $serverFlag=1; } else { logmsg('F', 'Invalid -A option'); } $sockFlag=1; } # I'm probably the only one who cares, but in -p --top -s, don't default # to a --hr of 5, use 20 $headerRepeat=20 if $topFlag && $playback ne '' && $headerRepeat==5; # If we used to trap these before we opened the socket, but then we couldn't # send the message back to the called cleanly! if ($sockFlag) { error("-p not allowed with -A") if $playback ne ''; error("-D not allowed with -A address") if $daemonFlag && !$serverFlag; } # Since the output could be intended for a socket (called from colgui/colmux), # we need to do after we open the socket. error() if $hSwitch; showVersion() if $vSwitch; showDefaults() if $VSwitch; extendHelp() if $xSwitch; showSubsys() if $showSubsysFlag; showOptions() if $showOptionsFlag; showSubopts() if $showSuboptsFlag; showTopopts() if $showTopoptsFlag; showSlabAliases($slabFilt) if $showSlabAliasesFlag || $showRootSlabsFlag; whatsnew() if $whatsnewFlag; if ($XSwitch) { extendHelp(1); showSubsys(1); showOptions(1); showSubopts(1); showTopopts(1); printText("$Copyright\n"); printText("$License\n"); exit(0); } # in playback mode all we're really doing is verifying the options setNFSFlags($nfsFilt); if ($vmstatFlag) { error("can't mix --vmstat with --export") if $vmstatFlag && $export ne ''; error("can't mix --vmstat with --all or --ALL") if $vmstatFlag && ($allFlag || $AllFlag); $export='vmstat'; } # --full both forces verbose and ultimately forces RECORD headers as well $verboseFlag=1 if $fullFlag; error("can't use --export with --verbose") if $verboseFlag && $export ne ''; error("can't use -P with --verbose") if $verboseFlag && $plotFlag; error("can't use -f with --verbose") if $verboseFlag && 
        $filename ne '';
error("--utime requires HiRes timer")          if $utimeMask && !$hiResFlag;
error("--utime requires -f")                   if $utimeMask && $filename eq '';
error("max value for --utime is 7")            if $utimeMask>7;

# --all is shortcut for all summary data
if ($allFlag)
{
  error("can't mix -s with --all")    if $userSubsys ne '';
  $userSubsys="$SubsysCore$SubsysExcore";
  $userSubsys=~s/y//;
}
elsif ($AllFlag)
{
  error("can't mix -s with --ALL")    if $userSubsys ne '';
  $userSubsys="$SubsysCore$SubsysExcore$SubsysDet";
  $userSubsys=~s/y//;
  $userSubsys=~s/T//    if !$plotFlag && $filename eq '';
}

# As part of the conversion to Getopt::Long, we need to know the actual switch
# values as entered by the user. Those are stored in '$userXXX' and then
# handled the way opt_XXX used to be.
$options= $userOptions;
$interval=($userInterval ne '') ? $userInterval : $Interval;
$subsys=  ($userSubsys ne '')   ? $userSubsys   : $SubsysCore;

error('invalid value for --lustopts')     if $lustOpts ne '' && $lustOpts!~/^[BDMOR]+$/;
error('invalid value for --nfsopts')      if $nfsOpts ne ''  && $nfsOpts ne 'z';
error('invalid value for --memopts')      if $memOpts ne ''  && $memOpts!~/^[pPsRV]+$/;
error('--memopts R cannot be used with any of [psPV]')
                                          if $memOpts=~/R/ && $memOpts=~/[psPV]/;

error("--tcpfilt only applies to -st or -sT")      if $tcpFilt ne '' && $subsys!~/t/i;
error("only valid --tcpfilt values are 'cituIT'")  if $tcpFilt ne '' && $tcpFilt!~/^[cituIT]+$/;
$tcpFilt=$tcpFiltDefault    if $tcpFilt eq '' && $playback eq '';

# NOTE - technically we could allow fractional polling intervals without
# HiRes, but then we couldn't properly report the times.
if ($interval=~/\./ && !$hiResFlag) { $interval=int($interval+.5); $interval=1 if $interval==0; print "need to install HiRes to use fractional intervals, so rounding to $interval\n"; } # ultimately we only use when doing process data error("password file '$passwdFile' doesn't exist") if $passwdFile ne '' && !-e $passwdFile; $passwdFile=$PasswdFile if $passwdFile eq ''; # S u b s y s / I n t e r v a l R e s o l u t i o n # This needs to get done as soon a possible... # Set default interval and subsystems for interactive mode unless already # set, noting the default values above are for daemon mode. To be consistent, # we also need to reset $Interval and $SubsysDef noting if one sets a # secondary interval but not the primary, we need to prepend it with 1 and # keep the secondary if (!$daemonFlag) { $interval=$Interval=1 if $userInterval eq '' && !$showColFlag; if ($showColFlag) { error('-c conflicts with --showcolheaders') if $count!=-1; error('-i conflicts with --showcolheaders') if $userInterval ne ''; $interval=0; $interval='0:0' if $subsys=~/[YZ]/; $interval='0:0:0' if $subsys=~/E/; $quietFlag=1; # suppress 'waiting...' 
startup message } if ($userInterval ne '' && $userInterval=~/^(:.*)/) { $interval="1$userInterval"; $Interval=1; } $SubsysDef=$SubsysDefInt; $subsys=$SubsysDef if $userSubsys eq ''; } # subsystems - must preceed + # special option -s-all disables ALL subsystems which is basically the only way to # disable all subsystems when you want to play back one or more explicit imports # so we need to to allow if the ONLY thing that follows -s error("+/- must start -s arguments if used") if $subsys=~/[+-]/ && $subsys!~/^[+-]/; error("-s-all only allowed with -p") if $subsys eq '-all' && $playback eq ''; error("invalid subsystem '$subsys'") if $userSubsys ne '-all' && $subsys!~/^[-+$SubsysCore$SubsysExcore$SubsysDet]+$/; $subsys=mergeSubsys($SubsysDef); # note that -p, --procanalyze, --slabanalyze and --top can change $subsys # also be sure to note if the user typed --verbose $userVerbose=$verboseFlag; setOutputFormat(); # switch validations once we know whether brief or verbose error("only choose 1 of -oA and --stats") if $statsFlag>1; error("statistics not allowed in verbose mode") if $statsFlag && $verboseFlag; error("statistics not allowed interactively") if $statsFlag && $playback eq ''; error("--statopts required --stats") if $statOpts ne '' && !$statsFlag; error("valid --statopts are [ais]") if $statOpts ne '' && $statOpts!~/[ais]/; $headerRepeat=0 if $statsFlag && $statOpts!~/i/; # force single header line when not including interval data # S p e c i a l F o r m a t s if ($procAnalFlag || $slabAnalFlag) { error("--procanalyze/--slabanalyze require -p") if $playback eq ''; error("--procanalyze/--slabanalyze require -f") if $filename eq ''; error("--procanalyze/--slabanalyze do not support --utc") if $utcFlag; error("--procanalyze/--slabanalyze with -P requires -s") if $plotFlag && $userSubsys eq ''; error("sorry, but no + or - with -s and analyze mode") if $userSubsys=~/[+-]/; # No default from playback file in this mode, so go by whatever user # specificed with -s and 
if no Y/Z, stick one in there and then make # user $userSubsys and $subsys agree so initFormat() won't diddle # the values. $slabAnalOnlyFlag=($slabAnalFlag && $userSubsys!~/Y/) ? 1 : 0; $procAnalOnlyFlag=($procAnalFlag && $userSubsys!~/Z/) ? 1 : 0; $userSubsys.='Y' if $slabAnalOnlyFlag; $userSubsys.='Z' if $procAnalOnlyFlag; $subsys=$userSubsys; $plotFlag=1; } # We have to wait for '$subsys' to be defined before handling top and it # felt right to keep the code together with --procanalyze/--slabanalyze. # --top forces $homeFlag if not in playback mode or vert mode. # if no process interval specified set it to the monitoring one. $temp=$SubsysDet; $temp=~s/YZ//; $detailFlag=($subsys=~/[$temp]/) ? 1 : 0; if ($topOpts ne '') { # Don't diddle original setting in '$userSubsys', use a copy! # Subtle - the verbose flag wouldn't have been set if ONLY processes or slabs and # it should be. Similarly, Y/Z should not be considered when looking to see is # same columns in verbose mode. my $tempSubsys=$userSubsys; $tempSubsys=~s/[YZ]//g; $verboseFlag=1 if $tempSubsys eq ''; $sameColsFlag=1 if $verboseFlag && length($tempSubsys)==1; $briefFlag=($verboseFlag) ? 0 : 1; my $subsysSize=0; if ($tempSubsys ne '' && $playback eq '') { if (!$verboseFlag || $sameColsFlag) { # in brief or single-subsys verbose mode the area size if fixed by --hr $subsysSize=$headerRepeat+2; } else { # multi-subsys verbose mode is driven by the number of subsystems but if # there are any details, it's up to the users choice of --hr $subsysSize=length($tempSubsys)*3; $subsysSize++ if $tempSubsys=~/m/; $subsysSize=$headerRepeat if $detailFlag; } $scrollEnd=$subsysSize+1; } ($topType, $numTop, $topVert)=split(/,/, $topOpts); $topType='time' if $topType eq ''; $topVert='' if !defined($topVert); $topVertFlag=($topVert eq 'v') ? 
1 : 0; error("only valid value for 3rd --top parameter is 'v'") if $topVert ne '' && $topVert ne 'v'; error("cannot specify vertical --top output and -s") if $topVertFlag && $userSubsys ne ''; # enough of these to warrant setting a flag $topIOFlag=($topType=~/io|kb|sys$|cncl/) ? 1 : 0; $termHeight=12 if $playback ne ''; $numTop=$termHeight-$scrollEnd-2 if !defined($numTop) || $numTop==-1; #print "HEIGHT: $termHeight SUBSIZE: $subsysSize HR: $headerRepeat NUMTOP: $numTop\n"; $topProcFlag=(defined($TopProcTypes{$topType})) ? 1 : 0; $topSlabFlag=(defined($TopSlabTypes{$topType})) ? 1 : 0; error("not enough lines in window for display") if $numTop<1; error("invalid --top type. see --showtopopts for list") if $topProcFlag==0 && $topSlabFlag==0; error("you cannot select process and slab subsystems in --top mode") if ($subsys=~/Y/ && $subsys=~/Z/) || ($subsys=~/Y/ && $topProcFlag) || ($subsys=~/Z/ && $topSlabFlag); # if sorting by v/n context switches, force --procopts x if not specified $procOpts.='x' if $topType=~/vctx|nctx/ && $procOpts!~/x/; if ($playback eq '') { $homeFlag=1 if !$topVert; $subsys=(defined($TopProcTypes{$topType})) ? "${tempSubsys}Z" : "${tempSubsys}Y"; $interval.=":$interval" if $interval!~/:/; } } # I m p o r t if ($import ne '') { # Default mode for --import is NO user defined subsystem in interactive mode. 
# All must be explicitly defined $subsys='' if !$daemonFlag && $userSubsys eq ''; foreach my $imp (split(/:/, $import)) { $impString=$imp; $impDetFlag[$impNumMods]=0; $impNumMods++; # The following chunks based somewhat on --export code, except OPTS is a string ($impName, $impOpts)=split(/,/, $impString, 2); $impName.=".ph" if $impName!~/\./; # If the import file itself doesn't exist in current directory, try $ReqDir my $tempName=$impName; $impName="$ReqDir/$impName" if !-e $impName; if (!-e "$impName") { my $temp="can't find import file '$tempName' in ./"; $temp.=" OR $ReqDir/" if $ReqDir ne '.'; error($temp) if !-e "$impName"; } require $impName; # the basename is the name of the function and also remove extension. $impName=basename($impName); $impName=(split(/\./, $impName))[0]; push @impOpts, $impOpts; push @impInit, "${impName}Init"; push @impGetData, "${impName}GetData"; push @impGetHeader, "${impName}GetHeader"; push @impInitInterval, "${impName}InitInterval"; push @impIntervalEnd, "${impName}IntervalEnd"; push @impAnalyze, "${impName}Analyze"; push @impUpdateHeader, "${impName}UpdateHeader"; push @impPrintBrief, "${impName}PrintBrief"; push @impPrintVerbose, "${impName}PrintVerbose"; push @impPrintPlot, "${impName}PrintPlot"; push @impPrintExport, "${impName}PrintExport"; } # Call REQUIRED initialization routines in reverse so if we have to # delete anything we won't have to deal with overlap $impSummaryFlag=$impDetailFlag=0; for (my $i=($impNumMods-1); $i>=0; $i--) { my $status=&{$impInit[$i]}(\$impOpts[$i], \$impKey[$i]); if ($status==-1) { splice(@impOpts, $i, 1); splice(@impKey, $i, 1); splice(@impInit, $i, 1); splice(@impGetData, $i, 1); splice(@impGetHeader, $i, 1); splice(@impInitInterval, $i, 1); splice(@impIntervalEnd, $i, 1); splice(@impAnalyze, $i, 1); splice(@impUpdateHeader, $i, 1); splice(@impPrintBrief, $i, 1); splice(@impPrintVerbose, $i, 1); splice(@impPrintPlot, $i, 1); splice(@impPrintExport, $i, 1); $impNumMods--; next; } # We need to 
know if any module has summary or data in case no one else does
    # and we're in plot format so newlog() will know to open tab file.  This also
    # helps optimize some of the print routines.
    $impSummaryFlag++ if $impOpts[$i]=~/s/;
    $impDetailFlag++ if $impOpts[$i]=~/d/;
  }

  # Reset output formatting based on the modules we just loaded
  print "Reset output flags\n" if $debug & 1;
  setOutputFormat();
}

# E x p o r t   M o d u l e s

# since we might want to diddle with things like $subsys or fake out other
# switches, we need to load/initialize things early.  We may also need a
# call to a pre-execution init module later...

# This one needs more explanation.  Most export modules expect to either log
# to a file or send their output over a socket and so collectl will generate
# an error if you try to do so without -f or -A.  BUT modules like ganglia
# or graphite do their own communications and so need to set this flag to
# defeat that message in case they don't want to locally log data.
$exportComm=0;

if ($export ne '') {
  # By design, if you specify --export and -f and have a socket open, the exported
  # data goes over the socket and we write either a raw or plot file to the dir
  # pointed to by -f.  If not -P, we always write a raw file
  $rawtooFlag=1 if $sockFlag && $filename ne '' && !$plotFlag;
  $verboseFlag=1;

  ($expName, @expOpts)=split(/,/, $export);
  $expName.=".ph" if $expName!~/\./;

  # If the export file itself doesn't exist in current directory, try $ReqDir
  my $tempName=$expName;
  $expName="$ReqDir/$expName" if !-e $expName;
  if (!-e "$expName") {
    my $temp="can't find export file '$tempName' in ./";
    $temp.=" OR $ReqDir/" if $ReqDir ne '.';
    error($temp);
  }
  require $expName;

  # the basename is the name of the function and also remove extension.
  $expName=basename($expName);
  $expName=(split(/\./, $expName))[0];
}

# S i m p l e   S w i t c h   C h e c k s

$utcFlag=1 if $options=~/U/;

# should I migrate a lot of other simple tests here?
error("you cannot specify -f with --top") if $topOpts ne '' && $filename ne '';
error("--home does not apply to -p") if $homeFlag && $playback ne '';
error("--envopts M does not apply to -P") if $userEnvOpts ne '' && $userEnvOpts=~/M/ && $plotFlag;
error("--envopts are only fptCFMT and/or a number") if $userEnvOpts ne '' && $userEnvOpts!~/^[fptCFMT0-9]+$/;
error("--envrules does not exist") if $envRules ne '' && !-e $envRules;
error("--grep only applies to -p") if $grepPattern ne '' && $playback eq '';
error('--headerrepeat must be an integer') if $headerRepeat!~/^[\-]?\d+$/;
error('--headerrepeat must be >= -1') if $headerRepeat<-1;
error("-i not allowed with -p") if $userInterval ne '' && $playback ne '';
error("--rawtoo does not work in playback mode") if $rawtooFlag && $playback ne '';
error("--rawtoo requires -f") if $rawtooFlag && $filename eq '';
error("--rawtoo requires -P or --export") if $rawtooFlag && !$plotFlag && $export eq '';
error("--rawtoo and -P requires -f") if $rawtooFlag && $plotFlag && $filename eq '';
error("--rawtoo cannot be used with -p") if $rawtooFlag && $playback ne '';
error("-ou/--utc only apply to -P format") if $utcFlag && !$plotFlag;
error("can't mix UTC time with other time formats") if $utcFlag && $options=~/[dDT]/;
error("-oz only applies to -P files") if $options=~/z/ && !$plotFlag;
error("--sep cannot be a '%'") if defined($SEP) && $SEP eq '%';
error("--sep only applies to plot format") if defined($SEP) && !$plotFlag;
error("--sep must be 1 character or a number") if defined($SEP) && length($SEP)>1 && $SEP!~/^\d+$/;
error('--showheader not allowed with -f') if $filename ne '' && $showHeaderFlag;
error("--showheader in collection mode only supported on linux") if $PcFlag && $playback eq '' && $showHeaderFlag;
error('--showmergedheader not allowed with -f') if $filename ne '' && $showMergedFlag;
error('--showcolheaders not allowed with -f') if $filename ne '' && $showColFlag;
error('--showcolheaders -sE can only be run by root')
if $showColFlag && $subsys=~/E/ && !$rootFlag;
error("--align requires HiRes time module") if $alignFlag && !$hiResFlag;
error('--umask can only be set by root') if $umask ne '' && !$rootFlag;
error('-sT can only be used with -f or -P') if $subsys=~/T/ && !$plotFlag && $filename eq '';

# if user enters --envOpts
if ($userEnvOpts ne '') {
  # remove ALL ipmi data types if user specified any, then add in ALL user options
  # which could include formatting options
  $envOpts=~s/[fpt]+//g if $userEnvOpts=~/[fpt]/;
  $envOpts.=$userEnvOpts;
}

$allThreadFlag=($procOpts=~/t/) ? 1 : 0;

# The separator is either a space if not defined or the character supplied if
# non-numeric.  If it is numeric assume decimal and convert to the associated
# char code (eg 9=tab).
$SEP=' ' if !defined($SEP);
$SEP=sprintf("%c", $SEP) if $SEP=~/\d+/;

# Even though users are warned about this in the docs, it's too easy to forget
# at least for me, so make sure no quotes in filters
$cpuFilt=~s/['"]//g;
$netFilt=~s/['"]//g;
$dskFilt=~s/['"]//g;
$rawNetFilter=~s/['"]//g;
$rawDskFilter=~s/['"]//g;

# purely an ease of use thing to allow people to use x-y as a cpu range
$cpuFilt=~s/-/../g;

# Remember, this filter overrides the one in collectl.conf
if ($rawDskFilter ne '') {
  # if filter starts with '+', just add to existing string
  if ($rawDskFilter=~/^\+/) {
    $rawDskFilter=~s/^\+//;
    $DiskFilter.="|$rawDskFilter";
  } else {
    $DiskFilter=$rawDskFilter;
  }
  $DiskFilterFlag=1;
}

undef %diskRemap;
foreach my $remap (split(/,/, $dskRemap)) {
  my ($pat, $sub) = split(/:/, $remap);
  logmsg('F', "--dskremap string, $remap, missing ':'") if !defined($sub);
  $diskRemap{$pat}=$sub;
}

# cpu filters are a little different because we're dealing with
# numbers and can't use pattern matching without some pain so
# to make it easy just add those to keep or ignore to an array
my $ignoreFlag=($cpuFilt=~s/^\^//) ?
1 : 0;
foreach my $cpuRange (split(/,/, $cpuFilt)) {
  my @cpus=eval("($cpuRange)");
  foreach my $cpu (@cpus) {
    $cpuFiltIgnore[$cpu]=1 if $ignoreFlag;
    $cpuFiltKeep[$cpu]=1 if !$ignoreFlag;
  }
}

# ugly debugging code but worth it.  note we don't know how many
# CPUs there are because we haven't yet called initRecord()
if ($debug & 1) {
  if (@cpuFiltIgnore) {
    print "Ignore CPUs: ";
    for (my $i=0; $i<@cpuFiltIgnore; $i++) {
      print "$i " if defined($cpuFiltIgnore[$i]);
    }
    print "\n";
  }

  if (@cpuFiltKeep) {
    print "Keep CPUs: ";
    for (my $i=0; $i<@cpuFiltKeep; $i++) {
      print "$i " if defined($cpuFiltKeep[$i]);
    }
    print "\n";
  }
}

# This is applied AFTER the raw disk records are read and possibly filtered
$dskFiltKeep='';
$dskFiltIgnore='';
$ignoreFlag=($dskFilt=~s/^\^//) ? 1 : 0;
foreach my $disk (split(/,/, $dskFilt)) {
  $dskFiltIgnore.="|$disk" if $ignoreFlag;
  $dskFiltKeep.= "|$disk" if !$ignoreFlag;
}
$dskFiltKeep=~s/^\|//;
$dskFiltIgnore=~s/^\|//;
print "DskFilt - Ignore: $dskFiltIgnore Keep: $dskFiltKeep\n" if $debug & 1;

# Unlike the raw disk filter which uses a flag to decide whether or not to use
# it, if the raw net filter is non-blank its very presence is the flag so nothing
# to set
$netFiltKeep='';
$netFiltIgnore='';
$ignoreFlag=($netFilt=~s/^\^//) ? 1 : 0;
foreach my $net (split(/,/, $netFilt)) {
  $netFiltIgnore.="|$net" if $ignoreFlag;
  $netFiltKeep.= "|$net" if !$ignoreFlag;
}
$netFiltKeep=~s/^\|//;
$netFiltIgnore=~s/^\|//;
print "NetFilt - Ignore: $netFiltIgnore Keep: $netFiltKeep\n" if $debug & 1;

# raw net/dsk filters are different in that they're applied at data collection time
# and so will never even make it to the raw file.
for ease of use, one can separate
# multiple entries by commas or pipe, but to make them work in the regex, convert the
# commas to pipes
$rawDskFilter=~s/,/|/g;
$rawDskIgnore=~s/,/|/g;
$rawNetFilter=~s/,/|/g;
$rawNetIgnore=~s/,/|/g;
print "RawDsk - Ignore: $rawDskIgnore Keep: $rawDskFilter\n" if $debug & 1;
print "RawNet - Ignore: $rawNetIgnore Keep: $rawNetFilter\n" if $debug & 1;

error("--dskopts f only applies to -sD") if $dskOpts=~/f/ && $subsys!~/D/;
error("--dskopts z only applies to -sD") if $dskOpts=~/z/ && $subsys!~/D/;
error("only valid value for --cpuopts is 'z'") if $cpuOpts ne '' && $cpuOpts!~/^[z]+$/;
error("only valid values for --dskopts are 'fioz'") if $dskOpts ne '' && $dskOpts!~/^[fioz]+$/;
error("only valid value for --xopts is 'i'") if $xOpts ne '' && $xOpts!~/^[i]+$/;

$netOptsW=5;    # minimum width
if ($netOpts ne '') {
  error("--netopts only applies to -sn or -sN") if $subsys!~/n/i;
  error("only valid --netopts values are 'eEiow'") if $netOpts ne '' && $netOpts!~/^[eEiow0-9]+$/;
  if ($netOpts=~/w/) {
    error("--netopts -w only applies to -sN") if $subsys!~/N/;
    error("--netopts w must be followed by width") if $netOpts!~/w(\d+)/;
    $netOptsW=$1;
    error("--netopts width must be at least 5") if $netOptsW<5;
  }
}

# This is applied AFTER the interrupt records are read and possibly filtered
$intFiltKeep='';
$intFiltIgnore='';
$ignoreFlag=($intFilt=~s/^\^//) ?
1 : 0; foreach my $int (split(/,/, $intFilt)) { $intFiltIgnore.="|$int" if $ignoreFlag; $intFiltKeep.= "|$int" if !$ignoreFlag; } $intFiltKeep=~s/^\|//; $intFiltIgnore=~s/^\|//; print "IntFilt - Ignore: $intFiltIgnore Keep: $intFiltKeep\n" if $debug & 1 && $intFilt ne ''; # L i n u x S p e c i f i c if (!$PcFlag) { # This matches THIS host, but in playback mode will be reset to the target $Kernel=`uname -r`; chomp $Kernel; error("collectl no longer supports 2.4 kernels") if $Kernel=~/^2\.4/; $LocalTimeZone=`date +%z`; chomp $LocalTimeZone; # Some distros put lspci in /usr/sbin and others in /usr/bin, so take one last look in # those before complaining, but only if in record mode AND only if looking at interconnects if (!-e $Lspci && $playback eq '' && $subsys=~/x/i) { $Lspci=(-e '/usr/sbin/lspci') ? '/usr/sbin/lspci' : '/usr/bin/lspci'; if (!-e "/usr/sbin/lspci" && !-e "/usr/bin/lspci") { pushmsg('W', "-sx disabled because 'lspci' not in $Lspci or '/usr/sbin' or '/usr/bin'"); pushmsg('W', "If somewhere else, move it or define in collectl.conf"); $xFlag=$XFlag=0; $subsys=~s/x//ig; } } if (!-e $Dmidecode && $playback eq '') { # we really only care about the message is doing -sE pushmsg('W', "cannot find '$Dmidecode' so can't determine hardware Product Name") if $subsys=~/E/; $Dmidecode=''; $ProductName='Unknown'; } # Set protections for output files umask oct($umask) or error("Couldn't set umask to $umask") if $umask ne '' && $rootFlag; } # C o m m o n I n i t i a l i z a t i o n # We always want to flush terminal buffer in case we're using pipes. 
$|=1;

# We need to know where we're logging to so set a couple of flags
$logToFileFlag=0;
$rawFlag=$rawtooFlag;
if ($filename ne '') {
  $rawFlag=1 if !$plotFlag && $export eq '';
  $logToFileFlag=1 if $rawFlag || $plotFlag;
}
printf "RawFlag: %d PlotFlag: %d Repeat: %d Log2Flag: %d Export: %s\n", $rawFlag, $plotFlag, $headerRepeat, $logToFileFlag, $export if $debug & 1;

($lustreSvcs, $lustreConfigInt)=split(/:/, $lustreSvcs);
$lustreSvcs="" if !defined($lustreSvcs);
$lustreConfigInt=$LustreConfigInt if !defined($lustreConfigInt);
error("Valid values for --lustsvcs are any combinations of cmoCMO") if $lustreSvcs!~/^[cmo]*$/i;
error("lustre config check interval must be numeric") if $lustreConfigInt!~/^\d+$/;

# some restrictions of plot format -- can't send to terminal for slabs or
# processes unless only 1 subsystem selected.  quite frankly I see no reason
# to ever do it but there are so damn many other odd switch combos we might
# as well catch these too.
error("to display on terminal using -sY with -P requires only -sY") if $plotFlag && $filename eq '' && $subsys=~/Y/ && length($subsys)>1;
error("to display on terminal using -sZ with -P requires only -sZ") if $plotFlag && $filename eq '' && $subsys=~/Z/ && length($subsys)>1;

# No great place to put this, but at least here it's in your face!  There are times
# when someone may want to automate the running of collectl to playback/convert
# logs from crontab for the day before and this is the easiest way to do that.
# While we're at it, there may be some other 'early' checks that need to be made
# in playback mode.
if ($playback ne "") {
  ($day, $mon, $year)=(localtime(time))[3..5];
  $today=sprintf("%d%02d%02d", $year+1900, $mon+1, $day);
  $playback=~s/TODAY/$today/;
  ($day, $mon, $year)=(localtime(time-86400))[3..5];
  $yesterday=sprintf("%d%02d%02d", $year+1900, $mon+1, $day);
  $playback=~s/YESTERDAY/$yesterday/;

  error("sorry, but --procfilt not allowed in -p mode. 
consider grep") if $procFilt ne ''; error("sorry, but --slabfilt not allowed in -p mode. consider grep") if $slabFilt ne ''; } # linux box? if ($SrcArch!~/linux/) { error("record mode only runs on linux") if $playback eq ""; error("-N only works on linux") if $niceFlag; } # daemon if ($daemonFlag) { error("no debugging allowed with -D") if $debug; error("-D requires -f OR -A server") if $filename eq '' && !$serverFlag && !$exportComm; error("-p not allowed with -D") if $playback ne ""; if (-e $PidFile) { # see if this pid matches a version of collectl. If not, we'll overwrite # it further on so not to worry, but at least record a warning. $pid=`$Cat $PidFile`; chomp $pid; @ps=`ps axo pid,command`; foreach my $line (@ps) { $line=~s/^\s+//; # trim leading whitespace for short pids ($procPid, $procCommand)=split(/\s+/, $line, 2); if ($procPid eq $pid && $procCommand=~/collectl/) { error("a daemonized collectl already running"); } } } } # count if ($count!=-1) { error("-c must be numeric") if $count!~/^\d+$/; error("-c conflicts with -r and -R") if $rollLog ne "" || $runTime ne ""; $count++ # since we actually need 1 extra interval } if ($limits ne '') { error("-l only makes sense for -s D/L/l") if $subsys!~/[DLl]/; @limits=split(/-/, $limits); foreach $limit (@limits) { error("invalid value for -l: $limit") if $limit!~/^SVC:|^IOS:|^LusKBS:|^LusReints:|^OR|^AND/; ($name,$value)=split(/:/, $limit); $limBool=0 if $name=~/OR/; $limBool=1 if $name=~/AND/; next if $name=~/AND|OR/; error("-l SVC and IOS only apply to -sD") if $name!~/^Lus/ && $subsys=~/L/; error("-l LusKBS and LusReint only apply to -sL") if $name=~/^Lus/ && $subsys=~/D/; error("limit for $limit not numeric") if $value!~/^\d+$/; $limSVC=$value if $name=~/SVC/; $limIOS=$value if $name=~/IOS/; $limLusKBS=$value if $name=~/LusKBS/; $limLusReints=$value if $name=~/LusReints/; } } # options error("invalid option") if $options ne "" && $options!~/^[\^12acdDGgimnTuUxXz]+$/g; error("-oi only supported interactively 
with -P to terminal") if $options=~/i/ && ($playback ne '' || !$plotFlag || $filename ne ''); $miniDateFlag=($options=~/d/i) ? 1 : 0; $miniTimeFlag=($options=~/T/) ? 1 : 0; error("use only 1 of -o dDT") if ($miniDateFlag && $miniTimeFlag) || ($options=~/d/ && $options=~/D/); error("--home only applies to terminal output") if $homeFlag && $filename ne ""; error("--home cannot be used with -A") if $homeFlag && $sockFlag; error("option $1 only apply to -P") if !$plotFlag && $options=~/([12ac])/; error("-oa conflicts with -oc") if $options=~/a/ && $options=~/c/; error("-oa conflicts with -ou") if $options=~/a/ && $options=~/u/; if (!$hiResFlag && $options=~/m/) { print "need to install HiRes to report fractional time with -om, so ignoring\n"; $options=~s/m//; } $pidOnlyFlag=($procOpts=~/p/) ? 1 : 0; # We always compress files unless zlib not there or explicity turned off $zFlag=($options=~/z/ || $filename eq "") ? 0 : 1; if (!$zlibFlag && $zFlag) { $options.="z"; $zFlag=0; pushmsg("W", "Zlib not installed so can't compress raw file(s). Use --quiet to disable this warning.") if $rawFlag; pushmsg("W", "Zlib not installed so can't compress plot file(s). Use -oz to get rid of this warning.") if $plotFlag; } $precision=($options=~/(\d+)/) ? 
$1 : 0;
$FS=".${precision}f";

# playback mode specific
error('--showmerged only applies to playback mode') if $playback eq '' && $showMergedFlag;
error('--extract only applies to playback mode') if $playback eq '' && $extract ne '';

if ($playback ne "") {
  error("-p not allowed with -F") if $flush ne '';
  error("--offsettime must be in seconds with optional leading '-'") if defined($offsetTime) && $offsetTime!~/^-?\d+/;

  $playback=~s/['"]//g;    # in case quotes passed through from script
  $playback=~s/,/ /g;      # so glob below will work
  error("--align only applies to record mode") if $alignFlag;
  error("-p filename must end in '*', 'raw' or 'gz'") if $playback!~/\*$|raw$|gz$/;
  error("MUST specify -P if -p and -f") if $filename ne "" and !$plotFlag;

  if ($extract ne '') {
    $extractMode=1;
    error("-s not allowed in 'extract' mode") if $userSubsys ne '';
    error("--from OR --thru required in 'extract' mode") if !defined($from) && !defined($thru);
  }

  # Quick check to make sure at least one file matches playback string
  my $foundFlag=0;
  foreach $file (glob($playback)) {
    $foundFlag=1;

    # this is a great place to print headers since we're already looping through glob
    if ($showHeaderFlag) {
      next if $file!~/raw/;

      # remember, this has to work on a pc as well, so can't use linux commands
      print "$file\n";
      my $return;
      $return=open TMP, "<$file" if $file!~/gz$/;
      $return=($ZTMP=Compress::Zlib::gzopen($file, 'rb')) if $file=~/gz$/;
      logmsg("F", "Couldn't open '$file' for reading") if !defined($return) || $return<1;
      while (1) {
        $line=<TMP> if $file!~/gz$/;
        $ZTMP->gzreadline($line) if $file=~/gz$/;
        last if $line!~/^#/;
        print $line;
      }
      print "\n";
      close TMP if $file!~/gz$/;
      $ZTMP->gzclose() if $file=~/gz$/;
    }
  }
  error("can't find any files matching '$playback'") if !$foundFlag;
  exit(0) if $showHeaderFlag;
}

# end time
$purgeDays=0;
if (defined($from) || defined($thru)) {
  error("--from/--thru only apply to -p") if $playback eq '';
  error("do not specify 2 times with --thru") if defined($thru) && index($thru,
'-')!=-1; error("do not specify 2 times with --from and also use --thru") if defined($from) && defined($thru) && index($from, '-')!=-1; # Parse switches and handle those that only specify a date ($from, $thru)=split(/-/, $from) if !defined($thru); ($fromDate,$fromTime)=checkTime('--from', $from) if defined($from); ($thruDate,$thruTime)=checkTime('--thru', $thru) if defined($thru); } $fromDate=0 if !defined($fromDate); # 0 means all dates $thruDate=0 if !defined($thruDate); $fromTime='000000' if !defined($fromTime); $thruTime='235959' if !defined($thruTime); print "From: $fromDate $fromTime Thru: $thruDate $thruTime\n" if $debug & 1; $endSecs=0; if ($runTime ne "") { error("pick either -r or -R") if $rollLog ne ""; error("invalid -R format") if $runTime!~/^(\d+)[wdhms]{1}$/; $endSecs=$1; $endSecs*=60 if $runTime=~/m/; $endSecs*=3600 if $runTime=~/h/; $endSecs*=86400 if $runTime=~/d/; $endSecs*=604800 if $runTime=~/w/; $endSecs+=time; } # log file rollover my $rollSecs=0; my $expectedHour; if ($rollLog ne '') { error("-r requires -f") if $filename eq ""; ($rollTime,$purgeDays,$rollIncr)=split(/,/, $rollLog); ($purgeDays, $purgeMons)=split(/:/, $purgeDays); $rollIncr=60*24 if !defined($rollIncr) || $rollIncr eq ''; # default is 7 days for data and 12 months for logs $purgeDays=7 if !defined($purgeDays) || $purgeDays eq ''; $purgeMons=12 if !defined($purgeMons); error("-r time must be in HH:MM format") if $rollTime!~/^\d{2}:\d{2}$/; ($rollHour, $rollMin)=split(/:/, $rollTime); error("-r purge days must be an integer") if $purgeDays!~/^\d+$/; error("-r purge months must be an integer") if $purgeMons!~/^\d+$/; error("-r increment must be an integer") if $rollIncr!~/^\d+$/; error("-r time invalid") if $rollHour>23 || $rollMin>59; error("-r increment must be a factor of 24 hours") if int(1440/$rollIncr)*$rollIncr!=1440; error("if -r increment>1 hour, must be multiple of 1 hour") if $rollIncr>60 && int($rollIncr/60)*60!=$rollIncr; error("roll time must be specified in 1st 
interval") if ($rollHour*60+$rollMin)>$rollIncr; # Getting the time to the next interval can be tricky because we have to # worry about daylight savings time. This IS further complicated by # having to deal with intervals. The safest thing to do is using brute-force. # I also have to write the following down because I know I'll forget it and # think it's a bug! Assume you're going to roll every two hours (or more) and it's # midnite of the day to move clocks forward (probably never going happen but...). # 2 hours from midnight is 3AM! so we subtract an hour and now since we're before the # time change we create a logfile with a time of 1AM. BUT the next log gets created # at AM and everyone is happy! # We start at the first interval of the day and then step forward until we # pass our current time. Then we see if DST is involved and then we're done! # Note however, if the interval is an hour or less, DST takes care of itself! # Step 1 - Get current date/time my ($sec, $min, $hour, $day, $mon, $year)=localtime(time); my $timeNow=sprintf "%d%02d%02d %02d:%02d:%02d", $year+1900, $mon+1, $day, $hour, $min, $sec; $rollToday=timelocal(0, $rollMin, $rollHour, $day, $mon, $year); # Step 2 - step through each increment (note in most cases there is only 1!) # looking for each one > now my $timeToRoll; $expectedHour=$rollHour; foreach ($rollSecs=$rollToday;; $rollSecs+=$rollIncr*60) { # Get the corresponding time and if not the first one see if the # time was changed my ($sec, $min, $hour, $day, $mon, $year)=localtime($rollSecs); $timeToRoll=sprintf "%d%02d%02d %02d:%02d:%02d", $year+1900, $mon+1, $day, $hour, $min, $sec; #print "CurTime: $timeToRoll CurHour: $hour ExpectedHour: $expectedHour\n"; if ($rollIncr>60) { # Tricky... We can have expected hour differ from the current one by # exactly 1 hour when we hit a DST time change. 
However, while a # simple subtraction will yield +/- 1, the one special case is when # we're rolling logs at 00:00 and get an hour of 23, which generates a # diff of -23 when we really want +1. my $diff=($expectedHour-$hour); $specialFlag=($diff==-23) ? 1 : 0; $diff=1 if $specialFlag; $rollSecs+=$diff*3600; # diff is USUALLY 0 # When in this 'special' situation, '$timeToRoll' is pointing to the previous # day so we need to reset $timeToRoll, but only AFTER we updated rollSecs. if ($specialFlag) { ($sec, $min, $hour, $day, $mon, $year)=localtime($rollSecs); $timeToRoll=sprintf "%d%02d%02d %02d:%02d:%02d", $year+1900, $mon+1, $day, $hour, $min, $sec; } $expectedHour+=$rollIncr/60; $expectedHour%=24; } last if $timeToRoll gt $timeNow; } ($sec, $min, $hour, $day, $mon, $year)=localtime($rollSecs); $rollFirst=sprintf "%d%02d%02d %02d:%02d:%02d", $year+1900, $mon+1, $day, $hour, $min, $sec; pushmsg("I", "First log rollover will be: $rollFirst"); } # for --home we do some vt100 cursor control if ($homeFlag) { $home=sprintf("%c[H", 27); # top of display $clr=sprintf("%c[J", 27); # clear to end of display $clscr="$home$clr"; # clear screen $cleol=sprintf("%c[K", 27); # clear to end of line } # if -N, set priority to 20 `renice 20 $$` if $niceFlag; # Couldn't find anywhere else to put this one... 
error("-sT only works with -P for now (too much data)") if $TFlag && !$plotFlag;

# get parent pid so we can check later to see if it's still there
$stat=`cat /proc/$$/stat`;
$myPpid=(split(/\s+/, $stat))[3];

###############################
#   P l a y b a c k   M o d e
###############################

if ($playback ne '') {
  # Select all files that need to be processed based on dates
  my $numSelected=0;
  my ($firstFileDate, $lastFileDate, $lastPrefix)=(0,0,'');
  my $pushed='';
  while (my $file=glob($playback)) {
    next if $file!~/(.*)-(\d{8})-(\d{6})\.(raw[p]*)/;
    my $prefix= $1;
    my $fileDate=$2;
    my $fileTime=$3;

    # we never look at rawp files when doing stats
    next if $file=~/rawp/ && $statsFlag;

    $firstFileDate=$fileDate if $firstFileDate==0;
    $numSelected++;

    # F i l t e r   O u t   F i l e s   N e w e r   T h a n   t h r u   D a t e

    # If there IS a thru date, ignore any files that start beyond it.
    next if ($thruDate ne 0) && (($fileDate > $thruDate) || (($fileDate == $thruDate) && ($fileTime > $thruTime)));

    # and finally, if no from OR thru dates, the thru time is applied against all files
    # so skip any files created after the thru time
    next if ($fromDate eq 0) && ($thruDate eq 0) && ($fileTime>$thruTime);

    # New functionality for V3.5.1: only apply other filters if wildcards NOT in filename
    # since files CAN contain data beyond their date stamp.
    if ($playback!~/\*/) {
      push @playbackList, $file;
      next;
    }

    # A p p l y   F i l t e r s   T o   W i l d c a r d e d   F i l e n a m e s

    # We only get here if there is a wildcard in the file list.  If its from date is earlier than
    # specified ignore it, remembering if it did have data that crossed midnight we'll never know.
    # Those MUST be processed w/o wildcards in their names.  If this ever becomes an issue we could
    # always look inside the header here, but that's more work than currently deemed worth it.
next if ($fromDate ne 0) && ($fileDate < $fromDate);

    # This is the magic AND there are 3 cases all of which have the common test of
    # there needs to be a file with a different basename (in case we're doing a rawp
    # file and there is already a raw file there) on the stack and the current file
    # is < $fromTime, in which case we might NOT want to process the file(s) on the
    # top of the stack under the following cases:
    # - if we have a from date this only applies to files for the from date
    # - if no from date but a thru date, apply time to files of first date
    # - if no from AND thru dates this test applies to ALL dates but only
    #   pop files for the same date which are by definition too 'young'
    if ($file!~/$pushed/ && ($lastFileDate!=0) && ($fileTime<$fromTime) && ($prefix eq $lastPrefix)) {
      if (($fromDate!=0 && $fileDate==$fromDate) ||
          ($fromDate==0 && $thruDate!=0 && $fileDate==$firstFileDate) ||
          ($fromDate==0 && $thruDate==0 && $fileDate==$lastFileDate)) {
        my $popped=pop(@playbackList);
        $popped=quotemeta((split(/\./, $popped))[0]);

        # get rid of companion file if there is one.
        pop(@playbackList) if scalar(@playbackList) && $playbackList[0]=~/$popped/;
      }
    }

    push @playbackList, $file;
    $pushed=quotemeta((split(/\./, $file))[0]);    # filename less extension, ready for regex
    $lastPrefix=$prefix;
    $lastFileDate=$fileDate;
  }

  if (@playbackList==0) {
    print "no files found containing -yyyymmdd-hhmmss.raw OR within specified time period\n";
    exit(0);
  }

  $numProcessed=0;
  $elapsedSecs=0;
  preprocessPlayback(\@playbackList);

  $doneFlag=0;
  $lastPrefix=$lastHost=$prefixPrinted=$lastSubsys='';
  foreach $file (@playbackList) {
    # Unfortunately we need a more unique global name for the file we're doing
    $playbackFile=$file;
    $rawPFlag=($file=~/\.rawp/) ? 1 : 0;

    # For now, we're going to skip files in error and process the rest.
    # Some day we may just want to exit on errors (or have another switch!)
    $ignoreFlag=0;
    foreach $key (keys %preprocErrors) {
      # some are file names and some just prefixes.
if ($file=~/$key/) {
        ($type, $text)=split(/:/, $preprocErrors{$key}, 2);
        $modifier=($type eq 'E') ? 'due to error:' : 'because';
        logmsg($type, "*** Skipping '$file' $modifier $text ***");
        $ignoreFlag=1;
        next;
      }
    }
    next if $ignoreFlag;

    print "\nPlaying back $file\n" if $msgFlag || $debug & 1;

    $file=~/(.*)-(\d{8})-\d{6}\.raw[p]*/;
    $prefix="$1-$2";
    $fileHost=$1;
    $fileRoot=basename($prefix);

    # if the prefix didn't change, we can't have a new host
    $newPrefixFlag=$newHostFlag=0;
    if ($prefix ne $lastPrefix) {
      # Remember that the prefix includes the date so the host could still be the same!
      $newPrefixFlag=1;
      $newHostFlag=($fileHost ne $lastHost) ? 1 : 0;
      $lastHost=$fileHost;
      print "NewPrefix: $newPrefixFlag NewHost: $newHostFlag\n" if $debug & 1;

      if ($newHostFlag) {
        undef $newSeconds[$rawPFlag];

        # need to reset disk structures so we don't append to the last host's
        # data.  Note that @dskSeen/@netSeen cleared at top of each interval
        undef %disks;
        undef @dskOrder;
        undef @dskIndexAvail;
        $dskIndexNext=$dskSeenLast=$dskSeenCount=0;

        # same for networks
        undef %networks;
        undef @netOrder;
        undef @netIndexAvail;
        $netIndexNext=$netSeenLast=$netSeenCount=0;
      }

      # For each day's set of files, we need to reset this variable so interval
      # lengths are calculated correctly.  Since int3 doesn't contain any rate
      # data we don't care about that one.
      $lastInt2Secs=0;

      if ($msgFlag && defined($preprocMessages{$prefix})) {
        # Whatever the messages may be, we only want to display them once for
        # each set of files, that is files with the same prefix
        my $preamblePrinted=0;
        for ($i=0; $i<$preprocMessages{$prefix}; $i++) {
          $key="$prefix|$i";
          if ($file=~/$prefix/) {
            # messy but makes it easier on the user to only see this message when a -s change
            # didn't happen because of a raw/rawp adjustment.
Since changes are appended, just
            # subtract first string from final one and don't report if only [YZ] remains
            if ($preprocMessages{$key}=~/-s overridden/ && ($playback{$prefix}->{flags} & 1)) {
              my $first=$playback{$prefix}->{subsysFirst};
              my $final=$playback{$prefix}->{subsys};
              $final=~s/$first//;
              next if $final=~/^[YZ]*$/;
            }

            print " >>> Forcing configuration change(s) for '$prefix-*'\n" if !$preamblePrinted;
            print " >>> $preprocMessages{$key}\n";
            $preamblePrinted=1;
          }
        }
      }

      # When we start a new prefix, that's the time to reset any variables that
      # span the set of common files.
      $lustreCltInfo='';
      $headersPrinted=$totalCounter=$separatorCounter=0;

      # Finally save the merged set of subsystems associated with all the files
      # for this prefix.
      $subsysAll=$playback{$prefix}->{subsys};
    }
    $lastPrefix=$prefix;
    print "NewPrefix: $newPrefixFlag NewHost: $newHostFlag\n" if $debug & 1;

    # we need to initialize a bunch of stuff including these variables and the
    # starting time for the file as well as the corresponding UTC seconds.
    ($recVersion, $recDate, $recTime, $recSecs, $recTZ, $recInterval, $recSubsys, $recNfsFilt, $recHeader)=initFormat($file);
    error("$file was created before collectl V2.0 and so cannot be played back") if $recVersion lt '2.0';
    printf "RECORDED -- Host: $Host Version: %s Date: %s Time: %s Interval: %s Subsys: $recSubsys\n", $recVersion, $recDate, $recTime, $recInterval if $debug & 1;

    # we can't do this until we know what version of collectl recorded the file
    if ($tcpFilt ne '' && $recVersion ne '' && $recVersion lt '3.6.4-1') {
      print "$file recorded with collectl V$recVersion which does not support --tcpfilt, so skipping...\n";
      next;
    }
    $tcpFilt='T' if $subsys=~/t/i && $recVersion lt '3.6.4-1';    # only subsystem reported earlier

    # Make sure at least 1 requested subsys is actually recorded OR if -s-all clear them all
    # also note an empty $subsys had been set to ' ' so the regex below will work.  Now set it back!
$subsys='' if $userSubsys eq '-all';
    my $tempSys=$subsys;             # this is what we want to report
    $tempSys=~s/[$recSubsys]//gi;    # remove ANY that are recorded, whether summary OR detail
    $subsys='' if $subsys eq ' ';
    print "recSubsys: $recSubsys subsys: $subsys tempSys: $tempSys\n" if $debug & 1;

    if ($statsFlag && $recSubsys=~/[YZ]/ && $subsys=~/[YZ]/) {
      print "--stats does not apply to slabs/process and so ignoring those subsystems\n";
      $subsys=~s/[YZ]//g;
    }

    # When processing a batch of files, it's possible none of them have any of the selected subsystems,
    # the best example being playing back *.gz files which have been collected with --tworaw and only
    # requesting data in one type.  In those cases both files will be processed and we need to skip
    # the ones w/o data.  The logmsg() below only reports the message when -m included.
    if (!$numProcessed && !$impNumMods && $subsys eq $tempSys) {
      logmsg("w", "none of the requested subsystems are recorded in selected file");
      next;
    }

    loadUids($passwdFile) if $recSubsys=~/Z/;

    #print "SUBSYS: $subsys RECSUBSYS: $recSubsys FLAGS: $playback{$prefix}->{flags}\n";
    # if --top but user didn't specify -s too, ignore anything in header(s)
    $subsys=~s/[^YZ]*//g if $topFlag && $userSubsys eq '';
    $subsysAll=~s/[^YZ]*//g if $topFlag && $userSubsys eq '';

    # Now that we know the subsystem it's safe to initialize a custom --export module if using one.
    if ($expName ne '') {
      my $initName="${expName}Init";
      &$initName(@expOpts);
    }

    # I wanted these 'in your face' rather than buried in 'initFormat()'.
    if ($playback{$prefix}->{flags} & 1) {
      # when playing back data from BOTH files, we need to reset these if in fact something to
      # print from rawp so that we'll repeat brief headers.
$headersPrinted=$totalCounter=0 if $subsys=~/[YZ]/i; # When playing back files generated with -G and user specified -s, make sure that subsys # only contains file-related subsystems so $subsys is consistent with the file we're processing $subsys=~s/[YZ]//gi if $file!~/rawp/; $subsys=~s/[^YZ]//gi if $file=~/rawp/; next if $subsys eq ''; # in case $subsys now '' for this file } else { # no 'rawp' files associated with this prefix so if user chose 'y' in playback and no slab # data has been recorded, ignore it so we won't put ourselves into --verbose because of it. # NOTE - this is an exception to the rule that if the user requests a subsystem for which # we have no data we report it as zeros. $subsys=~s/y//gi if $recSubsys!~/y/i; } # the only way nfsfilt can come back null is when there is a blank nfsfilt in header my $tempFilt=($recNfsFilt ne '' ? $recNfsFilt : 'c2,s2,c3,s3,c4,s4'); if ($nfsFilt ne '') { foreach my $filt (split(/,/, $nfsFilt)) { error("'$filt' data not recorded in $file and so cannot be selected") if $tempFilt!~/$filt/; } $tempFilt=$nfsFilt; } setNFSFlags($tempFilt); # We can only do this test after figuring out what's in the header. NOTE that since the number # of enabled CPUs can change dynamically when doing -sC and we've already skipped the code in # formatit that sets the number to 0, we have to do it here too. if ($subsys=~/j/i && $subsys!~/C/i && $plotFlag) { logmsg('I', "-sj or -sJ with -P also requires CPU details so adding -sC. See FAQ for details."); $subsys.='C'; $subsysAll.='C'; $noCpusFlag=1; # we need to know elsewhere when this was done $cpusEnabled=0 if $recSubsys=~/c/i; # if recorded, WILL be dynamically reset } # the way the process/slab tests work is if raw file not built with -G, look at all files. # but IF a -G only look at rawp files. 
    if (($playback{$prefix}->{flags} & 1)==0 || $rawPFlag) {
      # no rawp files so these tests are pretty easy
      my $skipmsg='';
      $skipmsg="io" if $topIOFlag && !$processIOFlag;
      $skipmsg="process" if $procAnalFlag && $recSubsys!~/Z/;
      $skipmsg="slab" if $slabAnalFlag && $recSubsys!~/Y/;
      if ($skipmsg ne '') {
        print " >>> Skipping file because it does not contain $skipmsg data <<<\n";
        next;
      }
    }

    # Need to reset the globals for the intervals that get recorded in the header.
    # Note the conditional on the assignments for i2 and i3. This is because they SHOULD be
    # in the header as of V2.1.0 and I don't want to mask any problems if they're not.
    ($interval, $interval2, $interval3)=split(/:/, $recInterval);
    $interval2=$Interval2 if !defined($interval2) && $recVersion lt '2.1.0';
    $interval3=$Interval3 if !defined($interval3) && $recVersion lt '2.1.0';

    # At this point we've initialized all the variables that will get written to the common
    # header for one set of files for one day, so if the user had specified --showmerged, now
    # is the best/easiest time to do it. We also need to set a flag so we only print the
    # header once for each set of merged files
    if ($showMergedFlag) {
      # I'm bummed I can't use '$lastPrefix', but we don't always execute the
      # outer loop and can't reset it in one place common to everyone...
      if ($prefix ne $prefixPrinted) {
        $commonHeader=buildCommonHeader(0);
      }
      $prefixPrinted=$prefix;
      next;
    }

    # on the off chance that lustre data was collected with --lustopts but not
    # played back, clear the lustre settings or else we'll screw up the default
    # playback mode.
    $lustOpts='' if $subsys!~/l/i;

    # conversely, if data was collected using --lustOpts but lustre
    # wasn't active during the time this file was collected, the header will
    # indicate this log does NOT contain any lustre data but the -s will and
    # so we need to turn off any -lustOpts or else 'checkSubsysOpts()'
    # will report a conflict.
$lustOpts=~s/B//g if $lustOpts=~/B/ && $CltFlag==0 && $OstFlag==0; $lustOpts=~s/D//g if $lustOpts=~/D/ && $MdsFlag==0 && $OstFlag==0; $lustOpts=~s/[MR]//g if $lustOpts=~/[MR]/ && $CltFlag==0; # Now we can check for valid/consistent sub-options (not sure this is still # necessary, but it shouldn't hurt). Since we can swap back and forth between # raw and rawp, with the latter requiring verbose, always reset to the default # of brief, unless if course user specified --verbose. checkSubsysOpts(); # Make sure valid $verboseFlag=1 if $userVerbose; setOutputFormat(); # We need to set the 'coreFlag' based on whether or not any core # subsystems will be processed. $coreFlag=($subsys=~/[a-z]/) ? 1 : 0; # if a specific time offset wasn't selected, find difference between # time collectl wrote out the log and the time of the first timestamp. if (!defined($offsetTime) && $recSecs ne '') { $year=substr($recDate, 0, 4); $mon= substr($recDate, 4, 2); $day= substr($recDate, 6, 2); $hour=substr($recTime, 0, 2); $min= substr($recTime, 2, 2); $sec= substr($recTime, 4, 2); $locSecs=timelocal($sec, $min, $hour, $day, $mon-1, $year-1900); $timeAdjust=$locSecs-$recSecs; } elsif (defined($offsetTime)) { $timeAdjust=$offsetTime; # user override of default } # Header already successfully read one, but what the heck... if (!defined($recVersion)) { logmsg("E", "Couldn't read header for $file"); next; } # Note - the prefix includes full path $zInFlag=($file=~/gz$/) ? 1 : 0; $file=~/(.*-\d{8})-\d{6}\.raw[p]*/; $prefix=$1; if ($prefix!~/$Host/) { print "ignoring $file whose header says recorded for $Host but whose name says otherwise!\n"; next; } # we get new output files (if writing to a file) for each prefix-date combo noting if reading # a rawp file we might get 2, also noting that $Host is a global pointing to the current host # being processed both in record as well as playback mode. 
We also need to track for terminal # processing as well, so use a different flag for that $key="$prefix:$recDate"; $newPrefixDate=(!defined($playback{$key})) ? 1 : 0; if ($newPrefixDate) { print "Prefix: $prefix Host: $Host\n" if ($debug & 1) && !$logToFileFlag; $headersPrintedProc=$headersPrintedSlab=$prcFileCount=0; $newOutputFile=($filename ne '') ? 1 : 0; $playback{$key}=1; } $prcFileCount++ if $subsys=~/Z/; #print "NEW PREFIX: $newPrefixDate NEW FILE: $newOutputFile\n"; # set playback timeframe for the file we're about to playback, using the date of the file # if not specified or that from the from if it is. The start time has already been set # earlier but when not starting at the beginning, we need to back up 1 interval since the # first one is never reported. my $tempDate=($fromDate eq '0') ? $recDate : $fromDate; $fromSecs=getSeconds($tempDate, $fromTime); $fromSecs-=$interval if $fromTime!=0; # The ending time is either the same date as the starting one (unless overriden by the user # for files that cross midnight) and we need to add a fraction to the ending time in case # fractional timestamps in file. Max time is Jan 19, 2038 but we'll use Jan 1 if needed. $tempDate=$thruDate if defined($thruDate) && $thruDate ne'0'; $thruSecs=(!defined($thru)) ? 2145934800 : getSeconds($tempDate, $thruTime).'.999'; # this is just to make debugging time frames easier especially if user gets odd results. if ($debug & 1) { my $fromstamp=getDateTime($fromSecs); my $thrustamp=getDateTime($thruSecs); print "PlayBack From: $fromstamp Thru: $thrustamp\n"; } if ($zInFlag) { $ZPLAY=Compress::Zlib::gzopen($file, "rb") or logmsg("F", "Couldn't open '$file'"); } else { open PLAY, "<$file" or logmsg("F", "Couldn't open '$file'"); } if ($extractMode) { my $base=basename($file); $outfile=(-d $extract) ? 
"$extract$Sep$base" : "$extract-$base"; #print "BASE: $base PREFIX: $prefix OUT: $outfile\n"; error("--extract specifies an output file with the same name as original!") if $outfile eq $file; logmsg('I', "Extracting to '$outfile'"); # compress the output file, but only if the input one was compressed. $ZRAW=Compress::Zlib::gzopen($outfile, 'wc') or logmsg("F", "Couldn't create '$outfile'") if $outfile=~/gz$/; open RAW, ">$outfile" or logmsg("F", "Couldn't create '$outfile'") if $outfile!~/gz$/; } # only call this if generating plot data either in file or on terminal AND # only one time per output file if ($plotFlag && ($newOutputFile || $options=~/u/)) { # Before we do anything else, close any files that were opened last pass # noting 'closeLogs()' also calls setFlags($subsys) closeLogs($lastSubsys) if $lastSubsys ne ''; # Open all output files here based on what was in merged subsystems. setFlags($subsysAll); print "SetFlags: $subsysAll\n" if $debug & 1; # If playback file has a prefix before its hostname things get more complicated # as we want to preserve that prefix and at the same time honor -f. $filespec=$filename; if ($prefix=~/(.+)-$Host/) { my $temp=$1; $temp=~s/.*$Sep//; $filespec.=(-d $filespec) ? 
"$Sep$temp" : "-$temp"; } # note we're only passing '$file' along in case we need diagnostics and we're also # resetting '$subsys' to match ALL the subsystems selected for this set of file(s) my $saveSubsys=$subsys; $subsys=$subsysAll; $newfile=newLog($filespec, $recDate, $recTime, $recSecs, $recTZ, $file); if ($newfile ne '1') { # This is the most common failure mode since people rarely use -ou # and having 2 separate conditions gives us more flexibility in messages if ($options!~/u/) { print " Plotfile '$newfile' already exists and will not be touched\n"; print " '-oc' to create a new one OR '-oa' to append to it\n"; } else { print " Plotfile '$newfile' exists and is newer than $file\n"; print " You must specify '-ocu' to force creation of a new one\n"; } next; } $subsys=$saveSubsys; $newOutputFile=0; $lastSubsys=$subsysAll; # used to track which files were actually opened } # when processing data for a new prefix/date and printing on a terminal # we need to print totals from previous file(s) if there were any and # reset total. However is --statopts s (as opposed to S), we do subtotals # for each file if ($filename eq '' && ($newPrefixDate || $statOpts=~/s/)) { if ($statsFlag && $numProcessed) { printBriefCounters('A') if $statOpts=~/a/; printBriefCounters('T'); } $elapsedSecs=0; resetBriefCounters(); } # Whenever a from time specified AND we're doing a new prefix, we need to start out in # skip mode. In all other cases we read ALL the records. Since --from with no date # applies to all files, that will also trigger starting out in 'skip' mode. $skip=($fromSecs && ($fromDate==0 || $newPrefixFlag)) ? 
1 : 0;
    undef($fileFrom);
    $firstTime=$firstTime2=1; # tracks int1 and int2 first time processing
    $fileThru=$newMarkerWritten=$timestampFlag=$timestampCounter[$rawPFlag]=0;
    $fullTime=0; # so we don't get uninit first time we do $microInterval calculation
    $bytes=1; # so no compression error on non-zipped files
    $numProcessed++; # it's not until we get here that we can say this
    while (1) {
      # read a line from either zip file or plain ol' one
      last if ( $zInFlag && ($bytes=$ZPLAY->gzreadline($line))<1) ||
              (!$zInFlag && !($line=<PLAY>));

      # we always skip comments, but in extract mode we need to echo them to output file
      if ($line=~/^#/) {
        if ($extractMode) {
          $ZRAW->gzwrite($line) if $outfile=~/gz$/;
          print RAW $line if $outfile!~/gz$/;
        }
        next;
      }

      # Doncha love special cases? Turns out when reading back process data
      # from a PRC file which was created from multiple logs, if a process from
      # one log comes up with the same pid as that of an earlier log, there's
      # no easy way to tell. Now there is!
      writeInterFileMarker() if $filename ne '' && $prcFileCount>1 && !$newMarkerWritten;
      $newMarkerWritten=1;

      # if new interval, it really indicates the end of the last one but its
      # time is that of the new one so process last interval before saving.
      # if this isn't a valid interval marker the file somehow got corrupted
      # which was seen one time before flush error handling was put in. Don't
      # know if that was the problem or not so we'll keep this extra test.
      $timestampFlag=0;
      if ($line=~/^>>>/) {
        # after we've processed the data for the interval that DOESN'T print, we
        # need to clear the flag that indicates all intervals need to be processed
        $firstPass=0 if $timestampCounter[$rawPFlag]==1;

        # we need to make sure both $lastSeconds and $newSeconds track BOTH the
        # raw and rawp files, if both exist.

        # we need to know later on if we're processing a timestamp AND how many we've seen
        # because if we hit EOF and only 1 seen, we have not processed a single, full interval.
$timestampFlag=1; if ($line!~/^>>> (\d+\.\d+) <<=$fromSecs; # since we're in an inner loop we need a flag if ($thruSecs && $lastSeconds[$rawPFlag]>$thruSecs) { $doneFlag=1; last; } # Always echo timestamp in extract mode when we're processing this interval if ($extractMode && !$skip) { $ZRAW->gzwrite($line) if $outfile=~/gz$/; print RAW $line if $outfile!~/gz$/; } $timestampCounter[$rawPFlag]++ if !$skip; if ($timestampCounter[$rawPFlag]==1) { # If a second (or more) file for same host, are their timstamps consecutive? # Since we could have a raw/rawp file the way to tell a new file is that # $newSeconds will be defined. # If NOT consecutive (or first file for a host), init 'last' variables, noting # we also need to init if there was a disk configuration change. $consecutiveFlag=(!$newHostFlag && defined($newSeconds[$rawPFlag]) && $thisSeconds==$newSeconds[$rawPFlag] && !$diskChangeFlag) ? 1 : 0; $newSeconds[$rawPFlag]=$thisSeconds; if (!$consecutiveFlag) { # if not doing raw/rawp files, init everything, otherwise just init the type we're doing initLast() if ($playback{$prefix}->{flags} & 1)==0; initLast($rawPFlag) if $playback{$prefix}->{flags} & 1; $lastSecs[$rawPFlag]=$thisSeconds; } print "ConsecFlag: $consecutiveFlag\n" if $debug & 1; next; } $newSeconds[$rawPFlag]=$fullTime=$thisSeconds; # we use '$fullTime' for $microInterval re-calculation # track from/thru times for each file to be used for -oA in terminal mode if (!$skip && !$rawPFlag) { $fileFrom=$newSeconds[$rawPFlag] if !defined($fileFrom); $fileThru=$newSeconds[$rawPFlag]; } # Normally we fall through on a timestamp marker so we can process the interval results # but in extract we don't want to generate any output, just record data. 
next if $extractMode; } next if $skip; print $line if $debug & 4; if ($grepPattern ne '') { if ($line=~/$grepPattern/) { $firstTime=0; # to indicate something found my $msec=(split(/\./, $newSeconds[$rawPFlag]))[1]; my ($ss, $mm, $hh, $mday, $mon, $year)=localtime($newSeconds[$rawPFlag]); $datetime=sprintf("%02d:%02d:%02d", $hh, $mm, $ss); $datetime=sprintf("%02d/%02d %s", $mon+1, $mday, $datetime) if $options=~/d/; $datetime=sprintf("%04d%02d%02d %s", $year+1900, $mon+1, $mday, $datetime) if $options=~/D/; $datetime.=".$msec" if ($options=~/m/); print "$datetime $line"; } next; } if ($extractMode) { # clear flag to prevent error message later. $firstTime=0; $ZRAW->gzwrite($line) if $outfile=~/gz$/; print RAW $line if $outfile!~/gz$/; next; } # Either we're processing a timestamp marker OR data entries # When using a single raw file that has interval markers for all record and newer rawp # files that only have them for interval2 only we need to force the 'print' flag each time $interval2Print=1 if $rawPFlag && $recVersion ge '3.3.5'; if ($timestampFlag) { # We already skipped first interval marker. As for the second one, which indicates the end of # a complete set of data, we only process that if we have consecutive files in which case # we get to use the last file's data for the previous interval's data. BUT we have to make # sure 'initInterval' called for second interval which may have been skipped. my $saveI2P=$interval2Print; # gets reset to 0 during intervalEnd() intervalEnd($lastSeconds[$rawPFlag]) if $consecutiveFlag || $timestampCounter[$rawPFlag]>2; initInterval() if $timestampCounter[$rawPFlag]==2; $firstTime2=0 if $saveI2P; } else { dataAnalyze($subsys, $line); } $firstTime=0; } # Write 'next' timestamp at end of file. if ($extractMode) { $ZRAW->gzwrite($line) if $outfile=~/gz$/; print RAW $line if $outfile!~/gz$/; } # We really only need message when -p specifies single file if ($firstTime && $numSelected==1) { print "No records selected for playback! 
Are --from/--thru wrong?\n";
      next;
    }

    # normally samples will end on timestamp marker (even if last interval) and therefore
    # processed above. However in pre-3.1.2 releases timestamps weren't written when
    # logs rolled and so we need to process the last interval in those cases as well.
    intervalEnd($newSeconds[$rawPFlag]) if !$timestampFlag && $recVersion lt '3.1.2';

    my $tmpsys=$subsys;
    $tmpsys=~s/[YZ]//g;
    $ZPLAY->gzclose() if $zInFlag;
    close PLAY if !$zInFlag;

    # if we reported data from this file (we may have skipped it entirely if --from
    # used with multiple files), calculate how many seconds were reported on for
    # stats reporting with -oA, but only if at least 1 full interval processed
    if (!$skip && !$rawPFlag && !$extractMode && $timestampCounter[$rawPFlag]>1) {
      # Note that by default we never include first interval data, but if this was a
      # consecutive file we need to include that interval too so add it back in
      $playbackSecs=$fileThru-$fileFrom;
      $playbackSecs+=$interval if $consecutiveFlag;
      $elapsedSecs+=$playbackSecs;
    }

    # for easier reading...
    print "\n" if $debug & 1;

    # This should be pretty rare..
    logmsg("E", "Error reading '$file'\n") if $bytes==-1;
    last if $doneFlag;
  }

  # Close logs that are open from last pass
  closeLogs($lastSubsys) if $numProcessed;

  # Always print last set of summary data...
  printProcAnalyze() if $procAnalCounter;
  printSlabAnalyze() if $slabAnalCounter;

  # if printing to terminal, be sure to print averages & totals for last file processed
  if (!$rawPFlag && $statsFlag && $filename eq '') {
    $subsys=$subsysAll; # in case mixed raw/rawp we need to reset
    $subsys=~s/y//;
    printBriefCounters('A') if $statOpts=~/a/;
    printBriefCounters('T');
  }

  `stty echo` if !$PcFlag && $termFlag && !$backFlag; # in brief mode, we turned it off
  my $temp=(!$msgFlag) ? ' Try again with -m.'
: ''; print "No files selected contain the selected data.$temp\n" if !$numProcessed; exit(0); } ########################### # R e c o r d M o d e ########################### # Would be nice someday to migrate all record-specific checks here error("-offsettime only applies to playback mode") if defined($offsetTime); # need to load even if interval is 0, but don't allow for -p mode loadPids($procFilt) if $subsys=~/Z/; loadUids($passwdFile) if $subsys=~/Z/; # In case running on a cluster, record the name of the host we're running on. # Track in collecl's log as well as syslog my $nohup=($nohupFlag && !$daemonFlag) ? '[--nohup]' : ''; # only announce --nohup if not daemon my $temp=($runas ne '') ? "as user '$runas' " : ''; $temp.=($pname ne '') ? "(running as '$pname') " : ''; $message="V$Version Beginning execution$nohup ${temp}on $myHost..."; logmsg("I", $message); logsys($message); checkHiRes() if $daemonFlag; # check for possible HiRes/glibc incompatibility # now let's report any messages that occurred earlier foreach my $message (@messages) { my ($severity, $text)=split(/-/, $message, 2); logmsg($severity, $text); } # initialize. noting if the user had only selected subsystems not supported # on this platform, initRecord() will have deselected them! initRecord(); error("no subsystems selected") if $subsys eq '' && $import eq ''; # ok in --import mode # Process I/O stats are a little tricky. initRecord() sets $processIOFlag based on kernel's # capabilities, but if user has disabled them, we then need to clear that flag. 
error("process I/O features not enabled in this kernel") if $procOpts=~/i/i && !$processIOFlag;
error("process options i and I are mutually exclusive") if $procOpts=~/i/ && $procOpts=~/I/;
error("you cannot use --top and IO options with this kernel") if $topIOFlag && !$processIOFlag;
error("you cannot use --top and IO options with --procopt I") if $topIOFlag && $procOpts=~/I/;
$processIOFlag=0 if $procOpts=~/I/;

if ($subsys=~/y/i && !$slabinfoFlag && !$slubinfoFlag) {
  logmsg("W", "Slab monitoring disabled because neither /proc/slabinfo nor /sys/slab exists");
  $yFlag=$YFlag=0;
  $subsys=~s/y//ig;
}

# We can't do this until we know if the data structures exist.
loadSlabs($slabFilt) if $subsys=~/y/i;

# now that subsys accurately reflects the systems we're collecting data on we
# can safely initialize our export if one is defined.
if ($expName ne '') {
  my $initName="${expName}Init";
  &$initName(@expOpts);
}

# this sets all the xFlags as specified by -s. At least one must be set to
# write to the 'tab' file.
setFlags($subsys);

# In case displaying output. We also need the recorded version to match ours.
initFormat();
initLast();
$recVersion=$Version;

# This has to go after initFormat() since it loads '$envRules' and may init
# stuff needed by printTerm()
if ($envTestFile ne '') {
  envTest();
  exit(0);
}

# Since we have to check subsystem specific options against data in recorded
# file, let's not do it twice, but we have to do it AFTER initFormat()
checkSubsysOpts();

# L a s t   M i n u t e   V a l i d a t i o n

# This needs to be done after loadConfig and only in record mode
logmsg('W', "Couldn't find 'ipmitool' in '$ipmitoolPath'") if $subsys=~/E/ && $Ipmitool eq '';

# These can only be done after initRecord()
# Since it IS possible for a server to be running as an MDS and a client, we need the following
error("-sL applies to a server only running as an MDS when used with --lustopts D")
  if $subsys=~/L/ && $NumMds && !$CltFlag && $lustOpts!~/D/;
error("--lustopts D only applies to HP-SFS") if $lustOpts=~/D/ && $sfsVersion eq '';

if ($options=~/x/i) {
  error("exception reporting requires --verbose") if !$verboseFlag;
  error("exception reporting only applies to -sD and lustre OST details or MDS/Client summary")
    if ($subsys!~/[DLl]/ || ($subsys=~/L/ && $NumOst==0) || ($subsys=~/l/ && $NumMds+$CltFlag==0));
  error("exception reporting must be to a terminal OR a file in -P format")
    if ($filename ne "" && !$plotFlag) || ($filename eq "" && $plotFlag);
}

# L a s t   M i n u t e   C h a n g e s   T o   F o r m a t t i n g

# OK, so it's getting messy. The decision to use brief/verbose is made in setOutputFormat()
# but it's called much earlier, certainly before we know what types of lustre node are
# present, which gets determined in initFormat() which gets called up above. Perhaps over
# time other last minute tests will need a home and this may prove to be it.

# The purpose of this is that in verbose mode when a single type of data is being displayed
# we'll have set $sameColsFlag, but now that we know more about the lustre configuration
# we may have to clear that setting.
if ($subsys=~/l/i && $verboseFlag) {
  $sameColsFlag=0 if $CltFlag+$OstFlag+$MdsFlag>1;
}

# daemonize if necessary
if ($daemonFlag) {
  # We need to make sure no terminal I/O
  open STDIN, '/dev/null' or logmsg("F", "Can't read /dev/null: $!");
  open STDOUT, '>/dev/null' or logmsg("F", "Can't write to /dev/null: $!");
  open STDERR, '>/dev/null' or logmsg("F", "Can't write to /dev/null: $!");

  # fork a child and exit parent, but make sure fork really works
  defined(my $pid=fork()) or logmsg("F", "Can't fork: $!");
  exit(0) if $pid;

  # Make REALLY sure we're disassociated
  setsid() or logmsg("F", "Couldn't setsid: $!");
  open STDIN, '/dev/null' or logmsg("F", "Can't read /dev/null: $!");
  open STDOUT, '>/dev/null' or logmsg("F", "Can't write to /dev/null: $!");
  open STDERR, '>/dev/null' or logmsg("F", "Can't write to /dev/null: $!");
  `echo $$ > $PidFile`;

  # Now that we're set up to start, if '--runas' has been specified we need to do a
  # few things that require privs before actually changing our UID. Also note the
  # GID is optional.
  if ($runas ne '') {
    # we have to make sure the ownership of the message log is correct.
    # This is only an issue for the msglog when a new file gets created to log the first
    # message of the month and we've restarted as root. Steal the code from logmsg() to
    # build its name.
    ($ss, $mm, $hh, $day, $mon, $year)=localtime(time);
    $yymm=sprintf("%d%02d", 1900+$year, $mon+1);
    $logname=(-d $filename) ? $filename : dirname($filename);
    $logname.="/$myHost-collectl-$yymm.log";
    `chown $runasUid $logname`;
    `chgrp $runasGid $logname` if defined($runasGid);

    # now we can change our process's ownership taking care to do the group first
    # since we won't be able to change anything once we change our UID.
    $EGID=$runasGid if defined($runasGid);
    $EUID=$runasUid;
  }
}

######################################################
#
# ===> WARNING: No Writing to STDOUT beyond <=====
# since we may be daemonized!
# ###################################################### $SIG{"INT"}=\&sigInt; # for ^C $SIG{"TERM"}=\&sigTerm; # default kill command $SIG{"USR1"}=\&sigUsr1; # for flushing gz I/O buffers # to catch collectl's socket I/O errors, noting graphite.ph sets its own handler $SIG{"PIPE"}=\&sigPipe if $address ne ''; $flushTime=($flush ne '') ? time+$flush : 0; # intervals... note that if no main interval specified, we use # interval2 (if defined OR if only doing slabs/procs) and if not # that, interval3. Also, if there is an interval3, interval3 IS defined, so we # have to compare it to ''. Also note that since newlog() can change subsys # we need to wait until after we call it to do interval/limit validation. # be sure to ignore interval error checks for --showcolheader $origInterval=$interval; ($interval, $interval2, $interval3)=split(/:/, $interval); if (!$showColFlag) { error("interval2 only applies to -s y,Y or Z") if defined($interval2) && $interval2 ne '' && $subsys!~/[yYZ]/; error("interval2 must be >= interval1") if defined($interval) && defined($interval2) && $interval2 ne '' && $interval>$interval2; } $interval2=$Interval2 if !defined($interval2); $interval3=$Interval3 if !defined($interval3); $interval=$interval2 if $origInterval=~/^:/ || ($subsys=~/^[yz]+$/i && $interval!=0); $interval=$interval3 if $origInterval=~/^::/; if ($interval!=0) { if ($subsys=~/[yYZ]/) { error("interval2 must be >= main interval") if $interval2<$interval; error("interval2 must be the same as interval1 in --top mode") if $numTop && $interval!=$interval2; $limit2=$interval2/$interval; error("interval2 must be a multiple of main interval") if $limit2!=int($interval2/$interval); } if ($subsys=~/E/) { error("interval3 must be >= main interval") if $interval3<$interval; $limit3=$interval3/$interval; error("interval3 must be a multiple of main interval") if $limit3!=int($interval3/$interval); } } else { # While we don't want any pauses, we also want to limit the number of collections # to 
  # the same number as would be taken during normal activities. The magic here
  # is we can only get here if $userInterval is not null. By default we assume
  # the ratios between ints 1/2/3 to be 1/6/30, but if i2 or i3 specified use those
  # as the ratios, not actual intervals. eg for -i5:20, use -i0:4
  my ($ui, $ui2, $ui3)=split(/:/, $userInterval);
  $ui2='' if !defined($ui2);
  $ui3='' if !defined($ui3);
  $limit2=($ui2 eq '') ? 6 : $ui2;
  $limit3=($ui3 eq '') ? 30 : $ui3;
  $interval2=$interval3=0;
  print "Interval Lim2: $limit2 Lim3: $limit3\n" if $debug & 1;

  # make sure no 'bogus network speed' errors
  $DefNetSpeed=0;
}

if ($tworawFlag) {
  error("--tworaw requires BOTH process and non-process data") if !$recFlag0 || !$recFlag1;
  error("--tworaw requires data collection to a file") if $filename eq '';

  # DUE to what seems to be a bug in zlib 2.02 (and maybe others), you cannot flush a buffer
  # twice in a row w/o writing to it. A shorter interval causes that to happen to rawp.gz.
  error("cannot use -F0 with --tworaw when interval1 not equal interval2") if !$flush && $interval!=$interval2;
  error("flush time cannot be < process collection interval, when using --tworaw") if $flush && $flush<$interval2;
}

# Note that even if printing in plotting mode to terminal we STILL call newlog
# because that points the LOG, DSK, etc filehandles at STDOUT
# Also, note that in some cases we set non-compressed files to autoflush
$autoFlush=1 if $flush ne '' && $flush<=$interval && !$zFlag;
newLog($filename, "", "", "", "", "") if ($filename ne '' || $plotFlag);

# We want all final runtime parameters defined before doing this
if ($showHeaderFlag && $playback eq '') {
  initRecord();
  my $temp=buildCommonHeader(0, undef);
  printText($temp);
  exit(0);
}

# If HiRes had been loaded and we're NOT doing 'time' tests, we want to
# align each interval via sigalrm.
# We HAVE to clear doneFlag here rather
# than at loop top because when collectl receives a sigterm it sets the flag
# and we don't want to set it back to 0.
$doneFlag=0;
if ($hiResFlag && $interval!=0) {
  # Default for daemons is to always align to the primary interval
  $alignFlag=1 if $daemonFlag;

  # sampling is calculated as multiples of a base time and we set that
  # time such that our next sample will occur on the next whole second,
  # just to make integer sampling align on second boundaries
  $AlignInt=$interval;
  $BaseTime=(time-$AlignInt+1)*1000000;

  # For aligned time we want to align on either the primary interval OR if
  # we're monitoring for processes or slabs, on the secondary one. To make
  # all sample times align no matter when they were started, we align based
  # on a time of 0 which is 00:00:00 on Jan 1, 1970 GMT
  if ($alignFlag) {
    $AlignInt=($subsys=~/[yz]/i) ? $interval2 : $interval;
    $BaseTime=0;
  }

  # Point to our alarm handler and set up some u-constants
  $SIG{"ALRM"}=\&sigAlrm;
  $uInterval=$interval*1000000;

  # Now we can enable our alarm and sleep for at least a full interval, from
  # which we'll awake by a 'sigalrm'. The first time though is based on our
  # alignment, which may be '$interval2', but after that it's always '$interval'
  # Also note use of arg2 to note first call since arg1 always set to 'ALRM'
  # when it fires normally.
  $uAlignInt=$AlignInt*1000000;
  sigAlrm(undef, 1);
  sleep $AlignInt+1;
  $uAlignInt=$uInterval;
  sigAlrm(); # we're now aligned so reset timer
}

if ($debug & 1 && $options=~/x/i) {
  $temp=$limBool ? "AND" : "OR";
  print "Exception Processing In Effect -- SVC: $limSVC $temp IOS: $limIOS ".
"LusKBS: $LimLusKBS LusReints: $LimLusReints\n" } # remind user we always wait until second sample before producing results # if only yY, Z or E or both, we don't wait for the standard interval $temp=$interval; $temp=$interval2 if $subsys=~/^[EyYZ]+$/; $temp=$interval3 if $subsys eq 'E'; print "waiting for $temp second sample...\n" if $filename eq "" && !$quietFlag; # Need to make sure proc's and env's align with printing of other vars first # time. In other words, do the first read immediately. $counted2=($subsys=~/[yYZ]/) ? $limit2-1 : 0; $counted3=($subsys=~/E/) ? $limit3-1 : 0; # Figure out how many intervals we want to check for lustre config changes, # noting that in the debugging case where the interval is 0, we calculate it # based on a day's worth of seconds. $lustreCheckCounter=0; $lustreCheckIntervals=($interval!=0) ? int($lustreConfigInt/$interval) : int($count/(86400/$lustreConfigInt)); $lustreCheckIntervals=1 if $lustreCheckIntervals==0; print "Lustre Check Intervals: $lustreCheckIntervals\n" if $debug & 8; # Looks like only HP-SFS should skip leading 7 fields of client OST data my $lustreCltOstSkip=($sfsVersion ne '') ? 7 : 0; # Same thing (sort of) for interconnect interval $interConnectCounter=0; $interConnectIntervals=($interval!=0) ? int($InterConnectInt/$interval) : int($count/(86400/$InterConnectInt)); $interConnectIntervals=1 if $interConnectIntervals==0; print "InterConnect Interval: $interConnectIntervals\n" if $debug & 2; if ($options=~/i/) { my $temp=buildCommonHeader(0, undef); printText($temp); } # Wait until the last minute to set up the scrolling region so if we crap out # earlier we haven't screwed up the terminal. 
printf "%c[3;%dr", 27, $scrollEnd if $scrollEnd; # M a i n P r o c e s s i n g L o o p # This is where efficiency really counts # $lastHour lets us figure out when it's a new day my $lastHour=(localtime($rollSecs))[2]; my $lastFirstPid=0; for (; $count!=0 && !$doneFlag; $count--) { # When in server mode we always need to check for readable socket # but be sure to do without any timeout if ($serverFlag) { print "Looking for connection...\n" if $debug & 64; if ($newHandle=($select->can_read(0))[0]) { print "Socket 'can read'\n" if $debug & 64; if ($newHandle==$sockServer) { $socket=$sockServer->accept() || logmsg('F', "Couldn't accept socket request. Reason: $!"); $select->add($socket); push @sockets, $socket; my $client=inet_ntoa((sockaddr_in(getpeername($socket)))[1]); logmsg('I', "New socket connection from $client"); } else { $socket=$newHandle; my $message=<$socket>; if (!defined($message)) { logmsg('W', "Client closed socket"); for (my $i=0; $iremove($socket); $socket->close(); } else { print "Received: $message" if $debug & 64; } } } } # Use the same value for seconds for the entire cycle if ($hiResFlag) { # we have to fully qualify name because or 'require' vs 'use' ($intSeconds, $intUsecs)=Time::HiRes::gettimeofday(); } else { $intSeconds=time; $intUsecs=0; } # T i m e F o r a N e w L o g ? if ($logToFileFlag && $rollSecs) { # if time to roll, do so and recalculate next roll time. if ($intSeconds ge $rollSecs) { # if new day, do day-level inits of which there aren't all that many ($sec, $min, $hour, $day, $mon, $year)=localtime($rollSecs); initDay() if $hour<$lastHour; $lastHour=$hour; # We need to make sure each logfile has headers. Since this flag is used interactively # as well we can't just clear it here. 
      $zlibErrors=$headersPrinted=0;
      newLog($filename, "", "", "", "", "");
      $rollSecs+=$rollIncr*60;

      # Just like the logic above to calculate the time of our first roll, we
      # need to see if we're going to cross a time change boundary
      if ($rollIncr>60)
      {
        #print "EXP: $expectedHour HOUR: $hour\n";
        my $diff=($expectedHour-$hour);
        $diff=1    if $diff==-23;
        $rollSecs+=$diff*3600;
        $expectedHour+=$rollIncr/60;
        $expectedHour%=24;
        logmsg("I", "Time change! Did you remember to change your watch?")    if $diff!=0;
      }
      logmsg("I", "Logs rolled");
    }
  }

  # G a t h e r S T A T S

  # This is the section of code that needs to be reasonably efficient. But first, start
  # the interval with a time marker, noting we first have to make sure we're padding with
  # 0's, then truncate to 2 digit precision, noting that in a rawp file we only write a
  # marker during interval2
  $counted2++;
  $fullTime=sprintf("%d.%06d", $intSeconds, $intUsecs);
  record(1, sprintf(">>> %.3f <<<\n", $fullTime))              if $recFlag0;
  record(1, sprintf(">>> %.3f <<<\n", $fullTime), undef, 1)    if $recFlag1 && $counted2==$limit2;

  ##############################################################
  # S t a n d a r d I n t e r v a l P r o c e s s i n g
  ##############################################################

  if ($bFlag || $BFlag)
  {
    getProc(0, "/proc/buddyinfo", "buddy");
  }

  if ($cFlag || $CFlag || $dFlag || $DFlag)
  {
    # Too crazy to do in getProc() though maybe someday should be moved there
    open PROC, "</proc/stat" or logmsg("F", "Couldn't open '/proc/stat'");
    while ($line=<PROC>)
    {
      last    if $line=~/^kstat/;
      record(2, $line)
          if (( ($cFlag || $CFlag) && $line=~/^cpu|^ctx|^proc/) ||
              ( $DFlag && !$hiResFlag && $line=~/^cpu /));
      record(2, "$1\n")    if ($cFlag || $CFlag) && $line=~/(^intr \d+)/;
    }
    close PROC;
  }

  if ($jFlag || $JFlag)
  {
    getProc(0, '/proc/interrupts', 'int', 1);
  }

  if ($dFlag || $DFlag)
  {
    getProc(9, "/proc/diskstats", "disk", undef, 20000);
  }

  if ($cFlag || $CFlag)
  {
    getProc(0, "/proc/loadavg", "load");
  }

  if ($tFlag || $TFlag)
  {
    getProc(20, "/proc/net/netstat", 'tcp')    if $tcpFilt=~/[IT]/;
    getProc(21, "/proc/net/snmp", 'tcp')       if
$tcpFilt=~/[cimtu]/; } if ($iFlag) { getProc(0, "/proc/sys/fs/dentry-state", "fs-ds") if $dentryFlag; getProc(0, "/proc/sys/fs/inode-nr", "fs-is") if $inodeFlag; getProc(0, "/proc/sys/fs/file-nr", "fs-fnr") if $filenrFlag; } if ($lFlag || $LFlag || $lustOpts=~/O/) { # Check to see if any services changed and if they did, we may need # a new logfile as well. if (++$lustreCheckCounter==$lustreCheckIntervals) { newLog($filename, "", "", "", "", "") if lustreCheckClt()+lustreCheckOst()+lustreCheckMds()>0 && $filename ne ''; $lustreCheckCounter=0; } # This data actually applies to both MDS and OSS servers and if # both services are running on the same node we're only going to # want to collect it once. if ($lustOpts=~/D/ && ($NumMds || $OstFlag)) { my $diskNum=0; foreach my $diskname (@LusDiskNames) { # Note that for scsi, we read the whole thing and for cciss # quit when we see the line with 'index' in it. Also note that # for sfs V2.2 we need to skip more for cciss than sd disks $diskSkip=($sfsVersion lt '2.2' || $LusDiskDir=~/sd_iostats/) ? 2 : 14; $statfile="$LusDiskDir/$diskname"; getProc(2, $statfile, "LUS-d_$diskNum", $diskSkip, undef, 'index'); $diskNum++; } } # OST Processing if ($OstFlag) { # Note we ALWAYS read the base ost data for ($ostNum=0; $ostNum<$NumOst; $ostNum++) { $dirspec="/proc/fs/lustre/obdfilter/$lustreOstSubdirs[$ostNum]"; getProc(1, "$dirspec/stats", "OST_$ostNum", undef, undef, "^io"); # for versions of SFS prior to 2.2, there are only 9 buckets of BRW data. getProc(2, "$dirspec/brw_stats", "OST-b_$ostNum", 4, $numBrwBuckets) if $lustOpts=~/B/; } } # MDS Processing if ($NumMds) { my $type=($cfsVersion lt '1.6.0') ? 'MDT' : 'MDS'; getProc(3, "/proc/fs/lustre/mdt/$type/mds/stats", "MDS"); } # CLIENT Processing if ($CltFlag) { $fsNum=0; foreach $subdir (@lustreCltDirs) { # For vanilla -sl we only need read/write info, but lets grab metadata file # we're at it. 
In the case of --lustopts R, we also want readahead stats getProc(11, "/proc/fs/lustre/llite/$subdir/stats", "LLITE:$fsNum", 1, 19); getProc(0, "/proc/fs/lustre/llite/$subdir/read_ahead_stats", "LLITE_RA:$fsNum", 1) if $lustOpts=~/R/; $fsNum++; } # RPC stats are optional for both clients and servers if ($lustOpts=~/B/) { for ($index=0; $index<$NumLustreCltOsts; $index++) { getProc(2, "$lustreCltOstDirs[$index]/rpc_stats", "LLITE_RPC:$index", 8, 11); } } # Client OST detail data if ($lustOpts=~/O/) { for ($index=0; $index<$NumLustreCltOsts; $index++) { getProc(12, "$lustreCltOstDirs[$index]/stats", "LLDET:$index ", $lustreCltOstSkip); } } } } # even if /proc not there (nothing exported/mounted), it could # show up later so we need to be sure and look every time if ($fFlag || $FFlag) { getProc(8, '/proc/net/rpc/nfs', "nfsc-") if $nfsCFlag; getProc(8, '/proc/net/rpc/nfsd', "nfss-") if $nfsSFlag; } if ($mFlag) { getProc(0, "/proc/meminfo", ""); getProc(5, "/proc/vmstat", ""); } # NOTE - unlike other detail data this is only recorded when explicitly requested if ($MFlag) { for (my $i=0; $i<$CpuNodes; $i++) { # skip first line which is blank, noting 'Node X' is already part of data getProc(0, "/sys/devices/system/node/node$i/meminfo", "numai", 1); # only if we want hits, adds about 2 secs for 2 node home box getProc(0, "/sys/devices/system/node/node$i/numastat", "numas Node $i"); } } if ($sFlag) { getProc(0, "/proc/net/sockstat", "sock"); } if ($nFlag || $NFlag) { if ($rawNetFilter eq '' && $rawNetIgnore eq '') { getProc(0, "/proc/net/dev", "Net", 2); } else { getProc(7, "/proc/net/dev", "Net", 2); } } if ($xFlag || $XFlag) { # Whenever we hit the end of interconnect checking interval we need to # see if any of them changed configuration (such as an IB port fail-over) # NOTE - we do the $filename test last so we ALWAYS do the ib checks # even if printing to terminal. 
if (++$interConnectCounter==$interConnectIntervals) { newLog($filename, "", "", "", "", "") if ($mellanoxFlag || $opaFlag) && ibCheck() && $filename ne ''; $interConnectCounter=0; } if (($mellanoxFlag || $opaFlag) && $NumHCAs) { for (my $i=0; $i<$NumHCAs; $i++) { if ( -e $SysIB ) { foreach my $j (1..2) { # only read if port active if ($HCAPorts[$i][$j]) { # for OPA V4, /sys counters always 64 bits if ($HCAOpaV4[$i][$j]) { my $proc="$SysIB/$HCAName[$i]0/ports/$j/counters"; getProc(0, "$proc/port_rcv_data", "ib$i-$j:rcvd"); getProc(0, "$proc/port_xmit_data", "ib$i-$j:xmtd"); getProc(0, "$proc/port_rcv_packets", "ib$i-$j:rcvp"); getProc(0, "$proc/port_xmit_packets", "ib$i-$j:xmtp"); } elsif ($HCAName[$i]=~/hfi/) { getExec(5, "/sbin/opapmaquery -h $HCAId[$i] -p $j -o getdatacounters", "ib$i-$j:opa"); } # non-opa V4 counters but present in /sys, counters come from counters_ext elsif ($PQopt eq 'sys') { my $proc="$SysIB/$HCAName[$i]$i/ports/$j/counters_ext"; getProc(0, "$proc/port_rcv_data_64", "ib$i-$j:rcvd"); getProc(0, "$proc/port_xmit_data_64", "ib$i-$j:xmtd"); getProc(0, "$proc/port_rcv_packets_64", "ib$i-$j:rcvp"); getProc(0, "$proc/port_xmit_packets_64", "ib$i-$j:xmtp"); } # we only use perfquery for 32bit counters or 64 when not in # available in sys and then only if perfquery hasn't been disabled elsif ( -e $PQuery ) { getExec(1, "$PQuery $PQopt -C $HCAId[$i] -P $j", "ib$i-$j:pquery"); } } } } } } } # Custom data import logdiag("begin import data") if ($utimeMask & 1) && $impNumMods; for (my $i=0; $i<$impNumMods; $i++) { &{$impGetData[$i]}(); } logdiag("interval1 done") if $utimeMask & 1; ############################################# # I n t e r v a l 2 P r o c e s s i n g ############################################# if (($yFlag || $YFlag || $ZFlag) && $counted2==$limit2) { if ($yFlag || $YFlag) { # NOTE - $SlabGetProc is either 99 for all slabs or 14 for selective if ($slabinfoFlag) { getProc($SlabGetProc, "/proc/slabinfo", "Slab", 2); } else { # Reading the 
whole directory and skipping links via the 'skip' hash
        # is only about 1/2 second slower over the day so let's just do it.
        opendir SLUBDIR, "/sys/slab" or logmsg('E', "Couldn't open '/sys/slab'");
        while ($slab=readdir SLUBDIR)
        {
          next    if $slab=~/^\./;
          next    if $slabFilt ne '' && !defined($slabdata{$slab});
          next    if defined($slabskip{$slab});

          # See if a new slab appeared, noting this doesn't apply when using
          # --slabfilt because of the optimization 'next' for '$slabFilt' above.
          # Also remember since we're only looking at root slabs, we'll never
          # discover 'linked' ones
          if (!defined($slabdata{$slab}))
          {
            $newSlabFlag=1;
            logmsg("W", "New slab detected: $slab");
          }

          # Whenever there are 'new' slabs to read (which certainly includes the first
          # full pass or any time we change log files) read constants before reading
          # variant data.
          getSys('Slab', '/sys/slab', $slab, 1, ['object_size', 'slab_size', 'order','objs_per_slab'])
              if $firstPass || $newRawSlabFlag || $newSlabFlag;
          getSys('Slab', '/sys/slab', $slab, 1, ['objects', 'slabs']);
          $newSlabFlag=0;
        }
      }
    }

    if ($ZFlag)
    {
      # need to know when we're looking at the first proc of this interval
      $firstProcCycle=1;

      # Process Monitoring RULES
      # if --procopt p OR --procfilt p and only pids
      # - only look at pids in %pidProc and nothing more
      # - if + and no --procopt t, never look for new threads
      # - if --procopt t, always look for new threads whether + or not
      # else always look for new processes
      # - if --procopt p look for threads for each pid
      undef %pidSeen;
      if ($pidOnlyFlag)
      {
        foreach $pid (keys %pidProc)
        {
          # When looking at threads, we read ALL data from /proc/pid/task/pid
          # rather than /proc/pid so we can be assured we're only seeing stats
          # for the main process. Later on too...
          # But also note earliest kernels only support process io under /proc/pid
          $task=$taskio=($allThreadFlag || $oneThreadFlag) ?
"$pid/task/" : ''; $taskio='' if !-e "/proc/$pid/task/$pid/io"; # note that not everyone has 'Vm' fields in status so we need # special checks. Also note both here and below whenever we process a pid # and not --procopt p (we could have gotten here via --procfilt p...) and # we're doing threads on this pid, see if any new threads showed up. If # this gets much more involved it should probably become a sub since we do # it below too. $pidSeen{$pid}=getProc(17, "/proc/$task/$pid/stat", "proc:$pid stat", undef, 1); $pidSeen{$pid}=getProc(13, "/proc/$task/$pid/status", "proc:$pid") if $pidSeen{$pid}==1; $pidSeen{$pid}=getProc(16, "/proc/$task/$pid/cmdline", "proc:$pid cmd", undef, 1) if $pidSeen{$pid}==1; $pidSeen{$pid}=getProc(17, "/proc/$taskio/$pid/io", "proc:$pid io") if $pidSeen{$pid}==1 && $processIOFlag && ($rootFlag || -r "/proc/$taskio/$pid/io"); findThreads($pid) if $allThreadFlag || ($oneThreadFlag && $procOpts!~/p/ && $pidThreads{$pid}); } } else { opendir DIR, "/proc" or logmsg("F", "Couldn't open /proc"); while ($pid=readdir(DIR)) { next if $pid=~/^\./; # skip . and .. next if $pid!~/^\d/; # skip not pids next if defined($pidSkip{$pid}); next if !defined($pidProc{$pid}) && pidNew($pid)==0; # see comment in previous block $task=$taskio=($allThreadFlag || $oneThreadFlag) ? 
"$pid/task/" : ''; $taskio='' if !-e "/proc/$pid/task/$pid/io"; print "%%% READPID $pid\n" if $debug & 256; $pidSeen{$pid}=getProc(17, "/proc/$task/$pid/stat", "proc:$pid stat", undef, 1); $pidSeen{$pid}=getProc(13, "/proc/$task/$pid/status", "proc:$pid") if $pidSeen{$pid}==1; $pidSeen{$pid}=getProc(16, "/proc/$task/$pid/cmdline", "proc:$pid cmd", undef, 1) if $pidSeen{$pid}==1; $pidSeen{$pid}=getProc(17, "/proc/$taskio/$pid/io", "proc:$pid io") if $pidSeen{$pid}==1 && $processIOFlag && ($rootFlag || -r "/proc/$taskio/$pid/io"); findThreads($pid) if $allThreadFlag || ($oneThreadFlag && $procOpts!~/p/ && $pidThreads{$pid}); } } # if --procopts t OR '+' with --procfilt if ($allThreadFlag || $oneThreadFlag) { foreach $pid (keys %tpidProc) { # Location of thread stats is below parent, but I/O only there when kernel patched! $task=$taskio=($allThreadFlag || $oneThreadFlag) ? "$pid/task/" : ''; $taskio='' if !-e "/proc/$pid/task/$pid/io"; # The 'T' lets the processing code know it's a thread for formatting purposes $tpidSeen{$pid}=getProc(17, "/proc/$task/$pid/stat", "procT:$pid stat", undef, 1); $tpidSeen{$pid}=getProc(13, "/proc/$task/$pid/status", "procT:$pid") if $tpidSeen{$pid}==1; $tpidSeen{$pid}=getProc(17, "/proc/$taskio/$pid/io", "procT:$pid io") if $tpidSeen{$pid}==1 && $processIOFlag && ($rootFlag || -r "/proc/$taskio/$pid/io"); } } # how else will we know if a process exited? # This will also clean up stale thread pids as well. cleanStalePids(); } $counted2=0; logdiag("interval2 done") if $utimeMask & 1; } ############################################# # I n t e r v a l 3 P r o c e s s i n g ############################################# if ($EFlag && ++$counted3==$limit3) { # On the off chance someone deleted it (how do you say overkill?) 
if (!-e $IpmiCache) { logmsg('E', "Who deleted my cache file '$IpmiCache'?"); logmsg('I', "Recreated missing cache file"); $command="$Ipmitool sdr dump $IpmiCache"; `$command`; } # About the same overhead to invoke ipmitool twice but much less elapsed time. getExec(3, "$Ipmitool -c -S $IpmiCache exec $ipmiExec", 'ipmi'); $counted3=0; logdiag("interval3 done") if $utimeMask & 1; } ########################################################### # E n d O f I n t e r v a l P r o c e s s i n g ########################################################### # if printing to terminal OR generating data in plot format (or both) # we need to wait until the end of the interval so complete data is in hand if (!$logToFileFlag || $plotFlag || $export ne '') { $fullTime=sprintf("%d.%06d", $intSeconds, $intUsecs); intervalEnd(sprintf("%.3f", $fullTime)); logdiag('interval processed') if $utimeMask & 1; } # If there was a disk configuration change and writing to plot files (changes # can't be detected when writing to raw file), create new log files. if ($diskChangeFlag && $plotFlag && $filename ne '') { logmsg('I', 'Creating new log file') if $options=~/u/; logmsg('W', 'all data mixed in same file! use -ou to force unique files!') if $options!~/u/; newLog($filename, "", "", "", "", ""); } $diskChangeFlag=0; # If our parent's pid went away we're done, unless --nohup specified or we're a daemon if (!-e "/proc/$myPpid" && !$daemonFlag && !$nohupFlag) { logmsg('W', 'parent exited and --nohup not specified'); last; } # if we'll pass the end time while asleep, just get out now. last if $endSecs && ($intSeconds+$interval)>$endSecs; # NOTE - I tried used select() as timer when no HiRes but got premature # wakeups on early 2.6 testing and so went back to sleep(). 
Also, in # case we lose our wakeup signal, only sleep as long as requested noting # we SHOULD get woken up before this timer expires since we already used # up part of our interval with data collection flushBuffers() if !$autoFlush && $flushTime && time>=$flushTime; if ($interval!=0) { sleep $interval if !$hiResFlag; Time::HiRes::usleep($uInterval) if $hiResFlag; } $firstPass=0; $newRawSlabFlag=0 if $counted2==0; # interval 2 just processed next; } # the only easy way to tell a complete interval is by writing a marker, with # not time, since we don't need it anyway. if ($hiResFlag) { ($intSeconds, $intUsecs)=Time::HiRes::gettimeofday(); $fullTime=sprintf("%d.%06d", $intSeconds, $intUsecs); } else { $fullTime=time; } record(1, sprintf(">>> %.3f <<<\n", $fullTime)) if $recFlag0; record(1, sprintf(">>> %.3f <<<\n", $fullTime), undef, 1) if $recFlag1; # close logs cleanly and turn echo back on because when 'brief' we turned it off. closeLogs($subsys); unlink $PidFile if $daemonFlag; `stty echo` if !$PcFlag && $termFlag && !$backFlag; # clean up when in pure top mode if ($numTop && !$topVertFlag) { printf("%c[r", 27) if $userSubsys ne ''; printf "%c[%d;H\n", 27, $scrollEnd+$numTop+2; } logmsg("I", "Terminating..."); logsys("Terminating..."); sub preprocSwitches { my $switches=''; foreach $switch (@ARGV) { # Cleaner to not allow -top and force --top error("invalid switch '$switch'. did you mean -$switch?") if $switch=~/^-to/; # multichar switches COULD be single char switch and option if ($switch=~/^-/ && length($switch)>2) { $use=substr($switch, 0, 2).' '.substr($switch,2); error("invalid switch '$switch'. did you mean -$switch? if not use '$use'") if $switch=~/^-al|^-ad|^-be|^-co|-^de|^-de|^-en|^-fl|^-he|^-no|^-in|^-ra/; error("invalid switch '$switch'. did you mean -$switch? 
if not use '$use'")
          if $switch=~/^-li|^-lu|^-me|^-ni|^-op|^-su|^-ro|^-ru|^-ti|^-wi|^-pr/;
    }
    $switches.="$switch ";
  }
  return($switches);
}

# This only affects multiple files for the same system on the same day.
# In most cases, those log files will have been run with the same parameters
# and as a result their output is simply merged into single 'tab' or
# detail files. However on rare occasions, the configurations will NOT be the
# same and the purpose of this function is to recognize that and change the
# processing parameters accordingly, if possible. The best example of this is
# if one generates one log based on -scd and a second on -scm. By forcing
# the processing of both to be -scdm, the resultant 'tab' file will contain
# everything. Alas, things get more complicated with detail files and even
# more so with lustre detail files if filesystems are mounted/umounted, etc.
# In any event, the details are in the code...
#
# NOTE - if any files cannot be processed, none will and the user will be
# required to change command options
sub preprocessPlayback
{
  my $playbackref=shift;
  my ($selected, $header, $i);
  my ($lastPrefix, $thisSubSys, $thisInterval, $lastInterval, $mergedInterval);
  my ($lastSubSys, $lastSubOpt, $lastNfs, $lastDisks, $lastLustreConfig, $lastLustreSubSys);
  local ($configChange, $filePrefix, $file);

  $selected=0;
  $configChange=0;
  $lastPrefix=$lastLustreConfig=$lastInterval=$mergedInterval="";
  foreach $file (@$playbackref)
  {
    print "Preprocessing: $file\n"    if $debug & 2048;

    # need to do individual file checks in case filespec matches bad files
    if ($file!~/(.*-\d{8})-\d{6}\.raw[p]*/)
    {
      $preprocErrors{$file}="I:its name is wrong format";
      next;
    }

    $filePrefix=$1;
    $playback{$filePrefix}->{flags}|=0    if !defined($playback{$filePrefix}->{flags});

    if (-z $file)
    {
      $preprocErrors{$file}="I:its size is zero";
      next;
    }

    if ($file!~/raw$|rawp$|gz$/)
    {
      $preprocErrors{$file}="I:it doesn't end in 'raw', 'rawp' or 'gz'";
      next;
    }

    # If any files in 'gz' format, make
sure we can cope. $zInFlag=0; if ($file=~/gz$/ && !$zlibFlag) { $zInFlag=1; $preprocErrors{$file}="E:Zlib not installed"; next; } # Read header - cleanup code in newlog: see call to getHeader in newLog() # Set flags based on whether raw or rawp $header=getHeader($file); $header=~/SubSys:\s+(\S+)/; $thisSubSys=$1; $playback{$filePrefix}->{flags}|=1 if $file=~/\.rawp/; $playback{$filePrefix}->{flags}|=2 if $file!~/\.rawp/; # We finally dropped SubOpts and nfsOpts from the header in V3.2.1-5, but not LustOpts my $subOpts=($header=~/SubOpts:\s+(\S*)\s+Options:/) ? $1 : ''; my $thisNfsOpts= ($header=~/NfsOpts: (\S*)\s*Interval/) ? $1 : $subOpts; my $thisDisks= ($header=~/DiskNames: (.*)/) ? $1 : ''; my $thisLustOpts=($header=~/LustOpts: (\S*)\s*Services/) ? $1 : $subOpts; $thisNfsOpts=~s/[BDMORcom]//g; # in case it came from SubOpts remove lustre stuff, 'com' for pre-lustsvc $thisLustOpts=~s/[234C]//g; # ditto for nfs stuff $header=~/Interval:\s+(\S+)/; $thisInterval=$1; # If user specified '--procopts i' and file doesn't have data, we can't process it $flags=($header=~/Flags:\s+(\S+)/) ? 
$1 : '';
    if ($procOpts=~/i/ && $flags!~/i/)
    {
      $preprocErrors{$file}="E:--procopts i requested but data not present in file";
      next;
    }

    # we need to merge intervals if user has selected her own AND set a flag so
    # configChange() will update %playbackSettings{} correctly
    # NOTE - this has never been allowed as -i not allowed in playback
    if ($userInterval ne '')
    {
      $configChange=4;    # will cause config change processing AND -m notice
      $mergedInterval=mergeIntervals($thisInterval, $mergedInterval);

      # on subsequent files, we need to check for interval consistency
      if ($filePrefix eq $lastPrefix)
      {
        print "Merged Intervals: $mergedInterval\n"    if $debug & 2048;
        my ($int1, $int2, $int3)=   split(/:/, $mergedInterval);
        my ($uint1, $uint2, $uint3)=split(/:/, $userInterval);

        $preprocErrors{$file}="E:common interval '$mergedInterval' not self-consistent"
            if (defined($int2) && ($int1>$int2 || int($int2/$int1)*$int1!=$int2)) ||
               (defined($int3) && ($int1>$int3 || int($int3/$int1)*$int1!=$int3));

        $preprocErrors{$file}="E:common interval '$mergedInterval' has value(s) > $userInterval"
            if $uint1<$int1 ||
               (defined($uint2) && defined($int2) && $uint2<$int2) ||
               (defined($uint3) && defined($int3) && $uint3<$int3);

        $preprocErrors{$file}="E:common interval '$mergedInterval' not consistent with $userInterval"
            if (int($uint1/$int1)*$int1!=$uint1) ||
               (defined($uint2) && defined($int2) && (int($uint2/$int2)*$int2!=$uint2)) ||
               (defined($uint3) && defined($int3) && (int($uint3/$int3)*$int3!=$uint3));
      }
    }

    print "File: $file FileSubSys: $thisSubSys NfsOpts: $thisNfsOpts LustOpts: $thisLustOpts\n"
        if $debug & 2048;

    # note that -s and --lustsvc override anything in the files AND in case -s contained +/-
    # we need to do a merge rather than a wholesale replace
    $thisSubSys=mergeSubsys($thisSubSys);
    $lastLustreConfig=$lustreSvcs    if $lustreSvcs ne '';
    $lastLustreConfig.='|||';

    # it's only if the prefix for this file is the same as the last that
    # we have to do all our interval merging and consistency checks.
$selected++; if ($filePrefix ne $lastPrefix) { configChange($lastPrefix, $lastSubSys, $lastLustreConfig, $mergedInterval) if $lastPrefix ne ''; # New prefix, so initialize for subsequent tests $newPrefix=1; $configChange=0; $mergedInterval=''; # this returns client/server and version or null string $thisNfs=checkNfs("", $thisSubSys, $thisNfsOpts); ($thisLustreConfig, $lastLustOpts)=checkLustre('', $header, '', $thisLustOpts); $lastDisks=checkDisks('', $thisDisks) if $thisSubSys=~/D/; # useful for telling what may have changed $playback{$filePrefix}->{subsysFirst}=$thisSubSys; } else # subsequent files (if any) for same prefix-date { # subsystem checks $newPrefix=0; $thisSubSys=checkSubSys($lastSubSys, $thisSubSys); $thisNfs= checkNfs($thisNfs, $thisSubSys, $thisNfsOpts); ($thisLustreConfig, $lastLustOpts)= checkLustre($lastLustreConfig, $header, $lastLustOpts, $thisLustOpts); $lastDisks=checkDisks($lastDisks, $thisDisks) if $thisSubSys=~/D/; } $lastPrefix=$filePrefix; $lastSubSys=$thisSubSys; $lastLustreConfig=$thisLustreConfig; $playback{$filePrefix}->{subsys}=$thisSubSys; } # If multiple files for this prefix processed there are outstanding # potential changes we need to check for. configChange($lastPrefix, $lastSubSys, $lastLustreConfig, $mergedInterval) if $selected && !$newPrefix; } # if no -s, return default subsys # if -s but no +/- return -s # otherwise merge... 
sub mergeSubsys
{
  my $default=shift;

  my $newSubsys=$default;
  if ($userSubsys ne '')
  {
    if ($userSubsys!~/[+-]/)
    {
      $newSubsys=$userSubsys;
    }
    else
    {
      if ($userSubsys=~/-(.*)/)
      {
        my $pat=$1;
        $pat=~s/\+.*//;           # remove anything after '+' string
        $default=~s/[$pat]//g;    # remove matches
      }
      if ($userSubsys=~/\+(.*)/)
      {
        my $pat=$1;
        $pat=~s/-.*//;            # remove anything after '-' string
        $default.=$pat;           # add matches
      }
      $newSubsys=$default;
    }
  }
  return($newSubsys);
}

# The purpose of this routine is to look at the intervals from multiple headers
# and figure out what common intervals would be needed to process them all if the
# user wanted to override them. In effect determine the 'least common interval',
# only I'm not going to be too precise since virtually all the time these files
# WILL have the same intervals and calculating the LCI will be a lot of work.
sub mergeIntervals
{
  my $interval=shift;
  my $merged=  shift;

  my ($mrg1, $mrg2, $mrg3)=split(/:/, $merged);
  my ($int1, $int2, $int3)=split(/:/, $interval);

  # if any intervals aren't in the merged list, simply move them in
  # which will always be the case the first time through
  $mrg1=$int1    if !defined($mrg1) || $mrg1 eq '';
  $mrg2=$int2    if !defined($mrg2) || $mrg2 eq '';
  $mrg3=$int3    if !defined($mrg3) || $mrg3 eq '';

  # get least common intervals, but only if new value defined
  $mrg1=lci($int1, $mrg1);
  $mrg2=lci($int2, $mrg2)    if defined($int2);
  $mrg3=lci($int3, $mrg3)    if defined($int3);

  # return the list of merged intervals
  $merged=$mrg1;
  $merged.=":$mrg2"    if defined($mrg2);
  $merged.=(defined($mrg2)) ? ":$mrg3" : "::$mrg3"    if defined($mrg3);
  return($merged);
}

sub lci
{
  my $new=shift;
  my $old=shift;

  $lci=$old;
  if ($new>$old)
  {
    # if a common multiple, use new interval for lci; otherwise return their product
    # which will be common but may NOT be the LEAST common!
    $lci=($new==int($new/$old)*$old) ? $new : $old*$new;
  }
  else
  {
    # same thing only see if $old is a multiple of $new
    $lci=($old==int($old/$new)*$new) ?
$old : $old*$new; } } sub configChange { my $prefix= shift; my $subsys= shift; my $config= shift; my $interval=shift; my ($services, $mdss, $osts, $clts); my ($i, $type, $names, $temp, $index); ($services, $mdss, $osts, $clts)=split(/\|/, $config); print "configChange() -- Pre: $prefix Svcs: $services Mds: $mdss Osts: $osts Clts: $clts Int: $interval\n" if $debug & 8; # Usually there are no existing messages, but we gotta check... $index=defined($preprocMessages{$prefix}) ? $preprocMessages{$prefix} : 0; if ($configChange) { $preprocMessages{$prefix.'|'.$index++}=" -s overridden to '$subsys'" if $configChange & 1; $preprocMessages{$prefix.'|'.$index++}=" --lustsvr overridden to '$services'" if $configChange & 2; $preprocMessages{$prefix.'|'.$index++}=" -i overridden from '$interval' to '$userInterval'" if $configChange & 4; foreach $i (8,16,32) { next if !($configChange & $i); if ($i==8) { $types=$mdss; $temp="MDS"; } if ($i==16) { $types=$osts; $temp="OST"; } if ($i==32) { $types=$clts; $temp="Client"; } $preprocMessages{$prefix.'|'.$index++}=" combined Lustre $temp objects now '$types'"; } $preprocMessages{$prefix}=$index; $playbackSettings{$prefix}="$subsys|$services|$mdss|$osts|$clts|$interval"; print "Playback -- Prefix: $prefix Settings: $playbackSettings{$prefix}\n" if $debug & 2048; } # Send these to log if we're not running interactively and -m not specified for ($i=0; !$termFlag && !$msgFlag && $i<$index; $i++) { logmsg("W", $preprocMessages{$prefix.'|'.$i}); } return; } sub checkSubSys { my $lastSubSys=shift; my $thisSubSys=shift; my ($nextSubSys, $i); print "Check SubSys -- Last: $lastSubSys This: $thisSubSys\n" if $debug & 2048; # if any differences between 'this' and 'last', we have a config change. 
  my $temp1=$thisSubSys;
  $temp1=~s/[$lastSubSys]//g;    # remove 'last' from 'this'
  my $temp2=$lastSubSys;
  $temp2=~s/[$thisSubSys]//g;    # remove 'this' from 'last'
  $configChange|=1    if $temp1 ne '' || $temp2 ne '';

  # $temp1 contains any NEW subsys in current file, so add them to 'last'
  $lastSubSys.=$temp1;

  $preprocErrors{$file}="E:-P and details to terminal from multiple raw files not allowed."
      if $lastSubSys=~/[A-Z]/ && $filename eq '' && $plotFlag;

  return($lastSubSys);    # has new sub-systems appended
}

sub checkSubsysOpts
{
  error("you cannot mix --slabopts with --top")    if $slabOpts ne '' && $topSlabFlag;
  error("invalid slab option in '$slabOpts'")      if $slabOpts ne '' && $slabOpts!~/^[sS]+$/;
  error("invalid env option in '$envOpts'")        if $envOpts ne '' && $envOpts!~/^[fptCFMT\d]+$/;

  $procUsrWidth=8;
  if ($procOpts ne '')
  {
    $procUsrWidth=($procOpts=~s/u(\d+)/u/) ? $1 : 12    if $procOpts=~/u/;
    $procCmdWidth=($procOpts=~s/w(\d+)/w/) ? $1 : 1000;
    error("minimum username width is 8")                        if $procUsrWidth<8;
    error("invalid process option '$procOpts'")                 if $procOpts!~/^[cfiIkmprRsStuwxz]+$/;
    error("process options i and m are mutually exclusive")     if $procOpts=~/i/ && $procOpts=~/m/;
    error("your kernel doesn't support process extended info")  if $procOpts=~/x/ && !$processCtxFlag;
    error("--procopts z can only be used with --top")           if !$numTop && $procOpts=~/z/;
  }
  error("--procstate not one or more of 'DRSTWZ'")    if $procState ne '' && $procState!~/^[DRSTWZ]+$/;

  # it's possible this is not recognized as running a particular type of service
  # from the 'flag's if that service isn't yet started and so we need
  # to check $lustreSvcs too. Be sure to include '$userSubsys' in case -sl was
  # specified by the user and then disabled by collectl.
  error("--lustsvcs only applies to lustre")
      if $lustreSvcs ne '' && $subsys!~/l/i && $userSubsys!~/l/i;

  my $cltFlag=($CltFlag || $lustreSvcs=~/c/i) ? 1 : 0;
  my $mdsFlag=($MdsFlag || $lustreSvcs=~/m/i) ? 1 : 0;
  my $ostFlag=($OstFlag || $lustreSvcs=~/o/i) ?
1 : 0; error("--lustopts only applies to lustre") if $lustOpts ne '' && $subsys!~/l/i; error("--lustopts B only applies to Lustre Clts/Osts") if $lustOpts=~/B/ && !$ostFlag && !$cltFlag; error("--lustopts D only applies to Lustre OSTs/MDSs") if $lustOpts=~/D/ && !$ostFlag && !$mdsFlag; error("--lustopts M only applies to Lustre Clients") if $lustOpts=~/M/ && !$cltFlag; error("--lustopts R only applies to Lustre Clients") if $lustOpts=~/R/ && !$cltFlag; error("--lustopts O only applies to client detail data") if $lustOpts=~/O/ && (!$cltFlag || $subsys!~/L/); error("you cannot mix --lustopts 'O' with 'M' or 'R'") if $lustOpts=~/O/ && $lustOpts=~/[MR]/; error("you cannot mix --lustopts 'B' with 'M'") if $lustOpts=~/B/ && $lustOpts=~/M/; error("you cannot mix --lustopts 'B' with 'R'") if $lustOpts=~/B/ && $lustOpts=~/R/; # Force if not already specified, but ONLY for details $lustOpts='BO' if $cltFlag && $subsys=~/L/ && $lustOpts=~/B/ && $lustOpts!~/O/; } sub checkNfs { my $lastNfs=shift; my $subsys= shift; my $subopt= shift; my $temp; print "checkNfs(): LastNfs: $lastNfs SubSys: $subsys SubOpt: $subopt\n" if $debug & 2048; $temp=''; if ($subsys=~/f/i) { $temp= ($subopt=~/C/) ? 'C' : 'S'; $temp.=($subopt=~/2/) ? 
'2' : '3';
  }

  # all these are legal
  return($temp)       if $lastNfs eq '';
  return($lastNfs)    if $temp eq '';
  return($lastNfs)    if $lastNfs eq $temp;

  # neither null, both MUST match
  # too tricky to handle all possible inconsistencies with multiple files
  # so we're only going to print a stock message
  $preprocErrors{$filePrefix}="E:conflicting nfs settings with other files of same prefix";
  return($temp);
}

sub checkDisks
{
  my $lastDisks=shift;
  my $thisDisks=shift;

  print "checkDisks(): Last: $lastDisks This: $thisDisks\n"    if $debug & 2048;
  if (($lastDisks ne '') && ($thisDisks ne $lastDisks) && $options!~/u/)
  {
    $preprocErrors{$filePrefix}="E:conflicting disk names with other files of same prefix and -sD w/o -ou";
  }
  return($thisDisks);
}

sub checkLustre
{
  my $lastConfig=  shift;
  my $header=      shift;
  my $lastLustOpts=shift;
  my $thisLustOpts=shift;
  my ($temp, $thisConfig, $thisMdss, $thisOsts, $thisClts);
  my ($services, $mdss, $osts, $clts);

  print "checkLustre() -- LastConfig: $lastConfig LastOpts: $lastLustOpts ThisOpts: $thisLustOpts\n"
      if $debug & 2048;

  ($services, $mdss, $osts, $clts)=split(/\|/, $lastConfig);
  $services=$osts=$mdss=$clts=''    if $lastConfig eq '';    # first time through

  # C h e c k L u s t r e S e r v i c e s

  # Remember, if set --lustsvcs trumps everything!
  if ($lustreSvcs eq '')
  {
    $thisConfig='';
    if ($header=~/MdsNames:\s+(.*)\s*NumOst:\s+\d+\s+OstNames:\s+([^\n\r]*)$/m)
    {
      # for the first file of a new prefix, we just use the current mdss/osts
      # and only check for changes on subsequent calls
      if ($1 ne '')
      {
        $thisMdss=$1;
        $thisConfig.='m';
        $mdss=($lastConfig eq '') ? $thisMdss : setNames(4, $thisMdss, $mdss);
      }
      if ($2 ne '')
      {
        $thisOsts=$2;
        $thisConfig.='o';
        $osts=($lastConfig eq '') ? $thisOsts : setNames(8, $thisOsts, $osts);
      }
    }

    if ($header=~/CltInfo:\s+(.*)$/m)
    {
      $thisClts=$1;
      $thisConfig.='c';
      $clts=($lastConfig eq '') ?
$thisClts : setNames(16, $thisClts, $clts);
    }

    # see if anything new in config
    for ($i=0; $i>>>>>>>>>>>> Preproc Error: $errorText\n" if $debug & 2048;
      return(($lastConfig, $lastLustOpts));
    }
  }
  return(("$services|$mdss|$osts|$clts", $thisLustOpts));
}

sub setNames
{
  my $type=    shift;
  my $newNames=shift;
  my $oldNames=shift;
  my $name;

  print "Set Name -- Type: $type Old: $oldNames New: $newNames\n"    if $debug & 8;

  # remember, it's ok for names to go away. we just want new ones!
  $oldNames=" $oldNames ";    # to make pattern match work
  foreach $name (split(/\s+/, $newNames))
  {
    if ($oldNames!~/ $name /)
    {
      $oldNames.="$name ";
      $configChange|=$type;
    }
  }
  $oldNames=~s/^\s+|\s+$//g;    # trim leading/trailing space
  return($oldNames);
}

# This routine reads partial files AND has /proc specific processing
# code for optimal performance.
sub getProc
{
  my $type=  shift;
  my $proc=  shift;
  my $tag=   shift;
  my $ignore=shift;
  my $quit=  shift;
  my $last=  shift;
  my ($index, $line, $ignoreString);

  # matches one or 2 consecutive //s for pids because when no threads there are 2 of them
  logdiag("$proc")
      if ($utimeMask & 2) && ($proc!~/^\/proc\/?\/\d/) ||
         ($utimeMask & 4) && ($proc=~/^\/proc\/?\/\d/);

  if (!open PROC, "<$proc")
  {
    # just report it once, but not for nfs or proc data
    logmsg("W", "Couldn't open '$proc'")
        if !defined($notOpened{$proc}) && $type!=8 && $type!=13 && $type!=16 && $type!=17;
    $notOpened{$proc}=1;
    return(0);
  }

  # Skip beginning if told to do so
  $ignore=0    if !defined($ignore);
  $quit=(defined($quit)) ? $ignore+$quit : 10000;
  for (my $i=0; $i<$ignore; $i++)
  {
    <PROC>;    # read and discard one leading line
  }

  $index=0;
  for (my $i=$ignore; $i<$quit; $i++)
  {
    last    if !($line=<PROC>);
    last    if defined($last) && $line=~/$last/;

    # GENERIC - just prepend tag to records
    if ($type==0)
    {
      $spacer=$tag ne '' ?
' ' : ''; record(2, "$tag$spacer$line"); next; } # OST stats if ($type==1) { if ($line=~/^read/) { record(2, "$tag $line"); next; } if ($line=~/^write/) { record(2, "$tag $line"); next; } } # Client RPC and OST brw_stats AND mds/oss disk stats elsif ($type==2) { # for RPC and brw_stats, this block is virtually always 11 entries, # but the first time an OST is created it's not so we have to stop # when we hit a blank. In the case of disk stats, we call with # $last so it quites on the 'totals' row last if $line=~/^\s+$/; record(2, "$tag:$index $line"); $index++; } # MDS stats elsif ($type==3) { if ($line=~/^mds_/) { record(2, "$tag $line"); next; } } # type=4 no longer used # Memory elsif ($type==5) { next if $line=~/^nr/ && $line!~/^nr_sh/; next if $line=~/^numa/; last if $memOpts!~/[ps]/ && $line=~/^pgre/; # ignore from pgrefill forward last if $memOpts!~/s/ && $line=~/^pgst/; # ignore from pgstead forward last if $line=~/^pginode/; # ignore from pginodesteal and below record(2, "$line"); } elsif ($type==7) { next if ($rawNetIgnore ne '' && $line=~/$rawNetIgnore/) || ($rawNetFilter ne '' && $line!~/$rawNetFilter/); record(2, "$tag $line"); next; } # NFS elsif ($type==8) { # Can't use type==0 because we don't want a space after $tag record(2, "$tag$line"); } # /proc/diskstats & /proc/partitions # would be nice if we could improve even more since this table can # get quite large. elsif ($type==9) { next if $rawDskIgnore ne '' && $line=~/$rawDskIgnore/; # If disk filter NOT specified in collectl.conf, use the following syntax. 
# Even though it matches internal constant $DiskFilter, it's a little # faster to use separate if statements if (!$DiskFilterFlag) { if ($line=~/hd[ab] /) { record(2, "$tag $line"); next; } if ($line=~/ sd[a-z]+ /) { record(2, "$tag $line"); next; } if ($line=~/dm-\d+ /) { record(2, "$tag $line"); next; } if ($line=~/xvd[a-z] /) { record(2, "$tag $line"); next; } if ($line=~/fio[a-z]+ /) { record(2, "$tag $line"); next; } if ($line=~/ vd[a-z] /) { record(2, "$tag $line"); next; } if ($line=~/emcpower[a-z]+ /) { record(2, "$tag $line"); next; } if ($line=~/psv\d+ /) { record(2, "$tag $line"); next; } if ($line=~/nvme\d+n\d+ /) { record(2, "$tag $line"); next; } } else { if ($line=~/$DiskFilter/) { record(2, "$tag $line"); next; } } } # /proc/fs/lustre/llite/fsX/stats elsif ($type==11) { if ($line=~/^dirty/) { record(2, "$tag $line"); next; } if ($line=~/^read_/) { record(2, "$tag $line"); next; } if ($line=~/^write_/) { record(2, "$tag $line"); next; } if ($line=~/^open/) { record(2, "$tag $line"); next; } if ($line=~/^close/) { record(2, "$tag $line"); next; } if ($line=~/^seek/) { record(2, "$tag $line"); next; } if ($line=~/^fsync/) { record(2, "$tag $line"); next; } if ($line=~/^getattr/) { record(2, "$tag $line"); next; } if ($line=~/^setattr/) { record(2, "$tag $line"); next; } } # /proc/fs/lustre/osc/XX/stats # since I've seen different instances of SFS report these in different # locations we have to hunt them out, quitting after 'write' of course. elsif ($type==12) { # This is for the standard CFS/SUN release if ($sfsVersion eq '') { if ($line=~/^read_bytes/) { record(2, "$tag $line"); next; } if ($line=~/^write_bytes/) { record(2, "$tag $line"); last; } } else { # and this is the older HP/SFS V2.* if ($line=~/^ost_read/) { record(2, "$tag $line"); next; } if ($line=~/^ost_write/) { record(2, "$tag $line"); last; } } } # /proc/*/status - save it all! 
elsif ($type==13) { # only saving a subset because there is a lot of 'noise' in here # looks like not exiting early via ^Threads is costing ~10 seconds. If this # ever turns out to be an issue we could always make collecting the context # switches optional but for at least now I'm thinking we just do it! # since ^nonvol is the last entry no need for a test to exit loop earlier if ($line=~/^Tgid/) { record(2, "$tag $line", undef, 1); next; } if ($line=~/^Uid/) { record(2, "$tag $line", undef, 1); next; } if ($line=~/^Vm/) { record(2, "$tag $line", undef, 1); next; } if ($line=~/^vol/) { record(2, "$tag $line", undef, 1); next; } if ($line=~/^nonv/) { record(2, "$tag $line", undef, 1); next; } } # /proc/slabinfo - only if not doing all of them elsif ($type==14) { $slab=(split(/ /, $line))[0]; record(2, "$tag $line") if defined($slabProc{$slab}); } # /proc/pid/cmdline - only 1 line long elsif ($type==16) { $line=~s/\000/ /g; record(2, "$tag $line\n", undef, 1); last; } # identical to type 0, only it writes to process raw file elsif ($type==17) { $spacer=$tag ne '' ? ' ' : ''; record(2, "$tag$spacer$line", undef, 1); next; } # /proc/dev/netstat elsif ($type==20) { record(2, "tcp-$line") if ($tcpFilt=~/I/ && $line=~/^I/) || ($tcpFilt=~/T/ && $line=~/^T/); } # /proc/dev/netstat elsif ($type==21) { # no UdpLite or IcmpMsg (at least for now) if ($line=~/^Icmp:/ && $tcpFilt=~/c/) { record(2, "tcp-$line"); next; } elsif ($line=~/^Ip/ && $tcpFilt=~/i/) { record(2, "tcp-$line"); next; } elsif ($line=~/^T/ && $tcpFilt=~/t/) { record(2, "tcp-$line"); next; } elsif ($line=~/^Udp:/ && $tcpFilt=~/u/) { record(2, "tcp-$line"); next; } } # GENERIC 2 - same as generic but support for rawp file if ($type==99) { $spacer=$tag ne '' ? ' ' : ''; record(2, "$tag$spacer$line", undef, 1); next; } } close PROC; return(1); } # Functionally equivalent to getProc(), but instead has to run a command rather # than look in proc. 
sub getExec { my $type= shift; my $command=shift; my $tag= shift; # for now, always send error messages to /dev/null unless we're debugging. This is # really mandatory for perfquery >= ofed 1.5 but let's do it everywhere unless it becomes # problematic later on. $command.=' 2>/dev/null' unless $debug & 3; print "Type: $type Exec: $command\n" if $debug & 2; # If we can't exec command, only report it once. if (!open CMD, "$command|") { logmsg("W", "Couldn't execute '$command'") if !defined($notExec[$type]); $notExec[$type]=1; return; } # Return complete contents of command my $oneLine=''; if ($type==0) { foreach my $line (<CMD>) { record(2, "$tag: $line"); } } # Open Fabric elsif ($type==1) { my $lineNum=0; foreach my $line (<CMD>) { # Skip warnings found in perfquery/ofed 1.5 next if $line=~/^ibwarn/; # Perfquery V1.5 adds an extra field called CounterSelect2 so ignore. next if ++$lineNum==13 && ($PQVersion ge '1.5.0'); if ($line=~/^#.*port (\d+)/) { # The 0 is a place holder we don't care about, at least not now $oneLine="$1 0 "; next; } # Since we're not doing anything with hex values this will not include # the leading 0x, but it will be faster than trying to include it. $line=~/([0x]*\d+$)/; $oneLine.="$1 "; } $oneLine=~s/ $//; record(2, "$tag: $oneLine\n"); } # Voltaire elsif ($type==2) { foreach my $line (<CMD>) { if ($line=~/^PORT=(\d+)$/) { $oneLine="$1 "; next; } # If counter, append to list. Note the funky pattern match that will catch # both decimal and hex numbers. 
$oneLine.="$1 " if $line=~/\s(\S*\d)$/; } $oneLine=~s/ $//; record(2, "$tag: $oneLine\n"); } # ipmi elsif ($type==3) { foreach my $line (<CMD>) { next if $envFilt ne '' && $line!~/$envFilt/; record(2, "$tag: $line"); } } # just count records elsif ($type==4) { my $count=0; foreach my $line (<CMD>) { $count++; } record(2, "$tag: $count\n"); } # OPA elsif ($type==5) { foreach my $line (<CMD>) { $line=~/ ([0x]*\d+)/; $oneLine.="$1 "; } $oneLine=~s/ $//; record(2, "$tag: $oneLine\n"); } close CMD; } # This guy is in charge of reading single valued entries, which are # typical of those found in /sys. The other big difference between # this and getProc() is it doesn't have to deal with all those # special 'skip', 'ignore', etc flags. Just read the data! sub getSys { my $tag= shift; my $sys= shift; my $dir= shift; my $rawpFlag=shift; # write to rawp file if one and this is defined my $files= shift; foreach my $file (@$files) { # as of writing this for slub, I'm not expecting file open failures # but might as well put in here in case needed in the future my $filespec="$sys/$dir/$file"; if (!open SYS, "<$filespec") { # but just report it once logmsg("E", "Couldn't open '$filespec'") if !defined($notOpened{$filespec}); $notOpened{$filespec}=1; return(0); } my $line=<SYS>; record(2, "$tag $dir $file $line", undef, $rawpFlag); } } sub record { my $type= shift; my $data= shift; my $recMode= shift; # error recovery mode my $rawpFlag=shift; # if defined, write to rawp or zrawp print "$data" if $debug & 4; # W r i t e T o R A W F i l e # a few words about writing to the raw gz file... If we fail, we need to # create a new file and I want to use newLog() since there's a lot going # on. However, part of newLog() writes the commonHeader as well and that # in turn calls this routine, so... We pass a flag around indicating we're # in recovery mode and if writing the common header fails, we have no # alternative other than to abort. 
# when logging raw data to a file $data, the data to write is either an # interval marker or raw data. Note that when doing plot format to a file # as well as any terminal based I/O, that all gets handled by dataAnalyze(). if ($logToFileFlag && $rawFlag) { if ($zlibFlag) { # When flags set, we write 'process' data (identified by '$recFlag1') to a 'rawp' # file; otherwise just 'raw' my $rawComp=(defined($rawpFlag) && $recFlag1) ? $ZRAWP : $ZRAW; $status=$rawComp->gzwrite($data); if (!$status) { $zlibErrors++; $temp=$recMode ? 'F' : 'E'; logmsg($temp, "Error writing to raw.gz file: ".$rawComp->gzerror()); logmsg("F", "Max Zlib error count exceeded") if $zlibErrors>$MaxZlibErrors; newLog($filename, "", "", "", "", "", 1); record(1, sprintf(">>> %.3f <<<\n", $fullTime)) if $recFlag0; record(1, sprintf(">>> %.3f <<<\n", $fullTime), undef, 1) if $recFlag1; } } else { # Same logic as for compressed data above. my $rawNorm=(defined($rawpFlag) && $recFlag1) ? $RAWP : $RAW; print $rawNorm $data; } } # G e n e r a t e N u m b e r s F r o m D a t a # When doing interactive reporting OR generating plot data, we need to # analyze each record as it goes by. This means that in the case of '-P --rawtoo' # we write to the raw file AND generate the numbers. Also remember that in the # case of --export we may not end up writing anywhere other than the exported file dataAnalyze($subsys, $data) if $type==2 && (!$logToFileFlag || $plotFlag || $export ne ''); } # Design note - this is very subtle, but when creating consecutive files via the log rolling # mechanism, the last timestamp of one file matches that of the new one. This tells us NOT # to reset 'last' counters during playback. BUT if newLog() is called before a new timestamp is # generated, as when $diskChangeFlag set, this does not happen and so you lose 1 interval # during playback. Not a big deal but worth noting somewhere... 
sub newLog { my $filename=shift; my $recDate= shift; my $recTime= shift; my $recSecs= shift; my $recTZ= shift; my $playback=shift; my $recMode= shift; # only used during error recovery mode my ($ss, $mm, $hh, $mday, $mon, $year, $datetime); my ($dirname, $basename, $command, $fullname, $mode); my (@disks, $dev, $numDisks, $i, $oldHeader, $oldSubsys, $timesecs, $timezone); print "NewLog -- Playback: $playback File: $filename Raw: $rawFlag Plot: $plotFlag\n" if $debug & 1; if ($recDate eq '') { # We need EXACT seconds associated with the timestamp of the filename. # turns out time() and gettimeofday can differ by 5 or 6 msec and when that happens files could end up # getting rolled 1 second earlier. Therefore always use gettimeofday when using hires to be consistent. $timesecs=($hiResFlag) ? (Time::HiRes::gettimeofday())[0] : time(); ($ss, $mm, $hh, $mday, $mon, $year)=localtime($timesecs); $datetime=sprintf("%d%02d%02d-%02d%02d%02d", $year+1900, $mon+1, $mday, $hh, $mm, $ss); $dateonly=substr($datetime, 0, 8); $timezone=$LocalTimeZone; } else { $timesecs=$recSecs; $datetime="$recDate-$recTime"; $dateonly=$recDate; $timezone=$recTZ; } # Build a common header for ALL files, noting type1 for process # we only build it if we need it. $temp="# Date: $datetime Secs: $timesecs TZ: $timezone\n"; $commonHeader= buildCommonHeader(0, $temp); $commonHeader1=buildCommonHeader(1, $temp) if $recFlag1; # Now build a slab subheader just to be used for 'raw' and 'slb' files if ($slubinfoFlag) { $slubHeader="#SLUB DATA\n"; foreach my $slab (sort keys %slabdata) { # when we have a slab with no aliases, 'first' gets set to that same # name which in turns ends up on the alias list because it always # contains 'first' followed by any additional aliases. On the rare # case we have no alias, which can happen where we have only the root # slab itself, set the aliases to that slab which will then be skipped. 
my $aliaslist=$slabdata{$slab}->{aliaslist}; next if defined($aliaslist) && $slab eq $aliaslist; $aliaslist=$slab if !defined($aliaslist); $slubHeader.="#$slab $aliaslist\n"; } $slubHeader.=sprintf("%s\n", '#'x80); } # If generating plot data on terminal, just open everything on STDOUT # but be SURE set the buffers to flush in case anyone runs as part # of a script and needs the output immediately. if ($filename eq "" && $plotFlag) { # sigh... error("Cannot use -P for terminal output of process and 'other' data at the same time") if $subsys=~/Z/ && length($subsys)>1; # in the event that someone runs this as a piped command from # a script and turns off headers things lock up unless these # files are set to auto-flush. $zFlag=0; open $LOG, ">-" or logmsg("F", "Couldn't open LOG for STDOUT"); select $LOG; $|=1; open BLK, ">-" or logmsg("F", "Couldn't open BLK for STDOUT"); select BLK; $|=1; open BUD, ">-" or logmsg("F", "Couldn't open BUD for STDOUT"); select BUD; $|=1; open CLT, ">-" or logmsg("F", "Couldn't open CLT for STDOUT"); select CLT; $|=1; open CPU, ">-" or logmsg("F", "Couldn't open CPU for STDOUT"); select CPU; $|=1; open DSK, ">-" or logmsg("F", "Couldn't open DSK for STDOUT"); select DSK; $|=1; open ENV, ">-" or logmsg("F", "Couldn't open ENV for STDOUT"); select ENV; $|=1; open IB, ">-" or logmsg("F", "Couldn't open IB for STDOUT"); select IB; $|=1; open NFS, ">-" or logmsg("F", "Couldn't open NFS for STDOUT"); select NFS; $|=1; open NET, ">-" or logmsg("F", "Couldn't open NET for STDOUT"); select NET; $|=1; open NUMA, ">-" or logmsg("F", "Couldn't open NUMA for STDOUT"); select NUMA; $|=1; open OST, ">-" or logmsg("F", "Couldn't open OST for STDOUT"); select OST; $|=1; open TCP, ">-" or logmsg("F", "Couldn't open TCP for STDOUT"); select TCP; $|=1; open SLB, ">-" or logmsg("F", "Couldn't open SLB for STDOUT"); select SLB; $|=1; open PRC, ">-" or logmsg("F", "Couldn't open PRC for STDOUT"); select PRC; $|=1; for (my $i=0; $i<$impNumMods; $i++) { open 
$impGz[$i], ">-" or logmsg("F", "Couldn't open $impKey[$i] for STDOUT"); select $impGz[$i]; $|=1; } select STDOUT; $|=1; return 1; } # C r e a t e N e w L o g # note the way we build files: # - if name is a dir, the filename starts with hostname. # - if name not a dir, the filename gets '-host' appended # - if raw file it also gets date/time but if plot file only date. $filename= "." if $filename eq ''; # -P and no -f $filename.=(-d $filename || $filename=~/\/$/) ? "/$Host" : "-$Host"; $filename.=(!$plotFlag || $options=~/u/) ? "-$datetime" : "-$dateonly"; # if the directory doesn't exist (we don't need date/time stamp), create it $temp=dirname($filename); if (!-e $temp) { logmsg('W', "Creating directory '$temp'"); `mkdir $temp`; } # track number of times same file processed, primarily for options 'a/c'. in # case multiple raw files for same day, only check on initial one # If we're in playback mode and writing a plotfile, either the user specified # an option of 'a', 'c' or 'u', we just created it (newFiles{} defined) OR it had # better not exist! If is does, return it name so a contextual error message # can be generated. return $filename if $playback ne "" && $options!~/a|c|u/ && !defined($newFiles{$filename}) && plotFileExists($filename); # -ou is special in that we're never going to have multiple source files generate # the same output file so 'a' doesn't mean anything in this context. Furthermore # if the output file already exists and its update time is less than that of the # source file, the source file has changed since the output file was created and # it should and will be overwritten. Finally, the user may also have chosen to # reprocess a source file with different options and so if 'c' is included the # file WILL be overwritten even if newer. Whew... 
if ($options=~/u/ && plotFileExists($filename)) { my @files; @files=glob("$filename*"); my $plotTime=(stat($files[0]))[9]; my $rawTime= (stat($playback))[9]; return($filename) if $plotTime>$rawTime && $options!~/c/; } # The only time we force creation of a new file is for the first one of the # day when in plot create mode (not sure why 'u' too). In all others cases # we append, which will also create file if not already there. $newFiles{$filename}++; if ($options=~/c|u/ && $newFiles{$filename}==1) { $mode=">"; $zmode="wb"; } else { $mode=">>"; $zmode="ab"; } print "NewLog Modes: $mode + $zmode Name: $filename\n" if $debug & 1; # C r e a t e R A W F i l e if ($rawFlag) { # on subsequent file creates (this is new for V3.1.3) write a terminating time # stamp, noting this will be the SAME starting timestamp of the new file. if (!$firstPass) { my $fullTime=sprintf("%d.%06d", $intSeconds, $intUsecs); record(1, sprintf(">>> %.3f <<<\n", $fullTime)) if $recFlag0; record(1, sprintf(">>> %.3f <<<\n", $fullTime), undef, 1) if $recFlag1; # Now we can safely close the raw log(s) closeLogs($subsys, 'r'); } # In some cases, such as when using --rawtoo (and other situations as well), # the default filename may only have a datestamp put time back in. 
my $rawFilename=$filename; $rawFilename=~s/$dateonly$/$datetime/; print "Create raw file: $rawFilename Flag0: $recFlag0 Flag1: $recFlag1\n" if $debug & 8192; # Unlike plot files, we ALWAYS compress when compression lib exists $ZRAW=Compress::Zlib::gzopen("$rawFilename.raw.gz", $zmode) or logmsg("F", "Couldn't open '$rawFilename.raw.gz'") if $zlibFlag && $recFlag0; $ZRAWP=Compress::Zlib::gzopen("$rawFilename.rawp.gz", $zmode) or logmsg("F", "Couldn't open '$rawFilename.rawp.gz'") if $zlibFlag && $recFlag1; open $RAW, "$mode$rawFilename.raw" or logmsg("F", "Couldn't open '$rawFilename.raw'") if !$zlibFlag && $recFlag0; open $RAWP, "$mode$rawFilename.rawp" or logmsg("F", "Couldn't open '$rawFilename.rawp'") if !$zlibFlag && $recFlag1; # write common header to raw file (record() ignores otherwise). Note that we # we need to pass along the recovery mode flag because if this record() # fails it's fatal. we may also need a slub header record(1, $commonHeader, $recMode) if $recFlag0; record(1, $commonHeader1, $recMode, 1) if $recFlag1; record(1, $slubHeader, $recMode, 1) if $slubinfoFlag && $subsys=~/y/i; # This flag indicated a new file was created and full SLUB records may need to be read $newRawSlabFlag=1; } # C r e a t e P l o t F i l e s if ($plotFlag) { print "Create plot files: $filename.*\n" if $debug & 8192; # but first close any that might be open closeLogs($subsys, 'p') if !$firstPass; # Indicates something needs to be printed printProcAnalyze($filename) if $procAnalCounter; printSlabAnalyze($filename) if $slabAnalCounter; print "Writing file(s): $mode$filename\n" if $msgFlag && !$daemonFlag; print "Subsys: $subsys\n" if $debug & 1; # this is already taken care of in playback mode, but when doing -P in # collection mode we need to clear this since nobody else does! 
$headersPrinted=0 if $newFiles{$filename}==1; # Open 'tab' file in plot mode if processing at least 1 core variable (or extended core) # OR we're --importing something that prints summary data $temp="$SubsysCore$SubsysExcore"; if ($subsys=~/[$temp]/ || $impSummaryFlag) { $ZLOG=Compress::Zlib::gzopen("$filename.tab.gz", $zmode) or logmsg("F", "Couldn't open '$filename.tab.gz'") if $zFlag; open $LOG, "$mode$filename.tab" or logmsg("F", "Couldn't open '$filename.tab'") if !$zFlag; $headersPrintedProc=$headersPrintedSlab=0; } open BLK, "$mode$filename.blk" or logmsg("F", "Couldn't open '$filename.blk'") if !$zFlag && $LFlag && $lustOpts=~/D/; $ZBLK=Compress::Zlib::gzopen("$filename.blk.gz", $zmode) or logmsg("F", "Couldn't open BLK gzip file") if $zFlag && $LFlag && $lustOpts=~/D/; open BUD, "$mode$filename.bud" or logmsg("F", "Couldn't open '$filename.bud'") if !$zFlag && $BFlag; $ZBUD=Compress::Zlib::gzopen("$filename.bud.gz", $zmode) or logmsg("F", "Couldn't open BUD gzip file") if $zFlag && $BFlag; open CPU, "$mode$filename.cpu" or logmsg("F", "Couldn't open '$filename.cpu'") if !$zFlag && $CFlag; $ZCPU=Compress::Zlib::gzopen("$filename.cpu.gz", $zmode) or logmsg("F", "Couldn't open CPU gzip file") if $zFlag && $CFlag; open CLT, "$mode$filename.clt" or logmsg("F", "Couldn't open '$filename.clt'") if !$zFlag && $LFlag && $reportCltFlag; $ZCLT=Compress::Zlib::gzopen("$filename.clt.gz", $zmode) or logmsg("F", "Couldn't open CLT gzip file") if $zFlag && $LFlag && $reportCltFlag; # if only doing exceptions, we don't need this file. 
if ($options!~/x/) { open DSK, "$mode$filename.dsk" or logmsg("F", "Couldn't open '$filename.dsk'") if !$zFlag && $DFlag; $ZDSK=Compress::Zlib::gzopen("$filename.dsk.gz", $zmode) or logmsg("F", "Couldn't open DSK gzip file") if $zFlag && $DFlag; } # exception processing for both x and X options if ($options=~/x/i) { open DSKX, "$mode$filename.dskX" or logmsg("F", "Couldn't open '$filename.dskX'") if !$zFlag && $DFlag; $ZDSKX=Compress::Zlib::gzopen("$filename.dskX.gz", $zmode) or logmsg("F", "Couldn't open DSKX gzip file") if $zFlag && $DFlag; } if ($XFlag && $NumHCAs) { open IB, "$mode$filename.ib" or logmsg("F", "Couldn't open '$filename.ib'") if !$zFlag; $ZIB=Compress::Zlib::gzopen("$filename.ib.gz", $zmode) or logmsg("F", "Couldn't open IB gzip file") if $zFlag; } open ENV, "$mode$filename.env" or logmsg("F", "Couldn't open '$filename.env'") if !$zFlag && $EFlag; $ZENV=Compress::Zlib::gzopen("$filename.env.gz", $zmode) or logmsg("F", "Couldn't open ENV gzip file") if $zFlag && $EFlag; open NFS, "$mode$filename.nfs" or logmsg("F", "Couldn't open '$filename.nfs'") if !$zFlag && $FFlag; $ZNFS=Compress::Zlib::gzopen("$filename.nfs.gz", $zmode) or logmsg("F", "Couldn't open NFS gzip file") if $zFlag && $FFlag; open NUMA, "$mode$filename.numa" or logmsg("F", "Couldn't open '$filename.numa'") if !$zFlag && $MFlag; $ZNUMA=Compress::Zlib::gzopen("$filename.numa.gz", $zmode) or logmsg("F", "Couldn't open NUMA gzip file") if $zFlag && $MFlag; open NET, "$mode$filename.net" or logmsg("F", "Couldn't open '$filename.net'") if !$zFlag && $NFlag; $ZNET=Compress::Zlib::gzopen("$filename.net.gz", $zmode) or logmsg("F", "Couldn't open NET gzip file") if $zFlag && $NFlag; open OST, "$mode$filename.ost" or logmsg("F", "Couldn't open '$filename.ost'") if !$zFlag && $LFlag && $reportOstFlag; $ZOST=Compress::Zlib::gzopen("$filename.ost.gz", $zmode) or logmsg("F", "Couldn't open OST gzip file") if $zFlag && $LFlag && $reportOstFlag; # These next two guys are 'special' because they're 
not really detail files per se, # Also note when doing --rawtoo, the data in 'prc' and 'raw' is essentially identical and # we don't need it on both places. Furthermore, raw is already being compressed. if (!$rawtooFlag) { if (!$procAnalOnlyFlag && $ZFlag) { print "Creating PRC file\n" if $debug & 8192; open PRC, "$mode$filename.prc" or logmsg("F", "Couldn't open '$filename.prc'") if !$zFlag && $ZFlag; $ZPRC=Compress::Zlib::gzopen("$filename.prc.gz", $zmode) or logmsg("F", "Couldn't open PRC gzip file") if $zFlag && $ZFlag; } if (!$slabAnalOnlyFlag && $YFlag && !$rawtooFlag) { print "Creating SLB file\n" if $debug & 8192; open SLB, "$mode$filename.slb" or logmsg("F", "Couldn't open '$filename.slb'") if !$zFlag && $YFlag; $ZSLB=Compress::Zlib::gzopen("$filename.slb.gz", $zmode) or logmsg("F", "Couldn't open SLB gzip file") if $zFlag && $YFlag; } } open TCP, "$mode$filename.tcp" or logmsg("F", "Couldn't open '$filename.tcp'") if !$zFlag && $TFlag; $ZTCP=Compress::Zlib::gzopen("$filename.tcp.gz", $zmode) or logmsg("F", "Couldn't open TCP gzip file") if $zFlag && $TFlag; # Open any detail files associated with --import for (my $i=0; $i<$impNumMods; $i++) { next if $impOpts[$i]!~/d/; open $impText[$i], "$mode$filename.$impKey[$i]" or logmsg("F", "Couldn't open '$filename.$impKey[$i]'") if !$zFlag; $impGz[$i]=Compress::Zlib::gzopen("$filename.$impKey[$i].gz", $zmode) or logmsg("F", "Couldn't open $impKey[$i] gzip file") if $zFlag; } if ($autoFlush) { print "Setting non-compressed files to 'autoflush'\n" if $debug & 1; if (defined($LOG)) { select $LOG; $|=1; } if (defined(fileno(BLK))) { select BLK; $|=1; } if (defined(fileno(BUD))) { select BUD; $|=1; } if (defined(fileno(CLT))) { select CLT; $|=1; } if (defined(fileno(CPU))) { select CPU; $|=1; } if (defined(fileno(DSK))) { select DSK; $|=1; } if (defined(fileno(DSKX))) { select DSKX; $|=1; } if (defined(fileno(ENV))) { select ENV; $|=1; } if (defined(fileno(IB))) { select IB; $|=1; } if (defined(fileno(OST))) { select 
OST; $|=1; } if (defined(fileno(NET))) { select NET; $|=1; } if (defined(fileno(NFS))) { select NFS; $|=1; } if (defined(fileno(NUMA))) { select NUMA; $|=1; } if (defined(fileno(PRC))) { select PRC; $|=1; } if (defined(fileno(SLB))) { select SLB; $|=1; } if (defined(fileno(TCP))) { select TCP; $|=1; } select STDOUT; $|=1; } } # P u r g e O l d L o g s # ... but only if an interval specified # explicitly purge anything in the logging directory as long it looks like a collectl log # starting with the host name. in the case of monthly logs, we typically will keep them # around a LOT longer if ($purgeDays) { my ($day, $mon, $year)=(localtime(time-86400*$purgeDays))[3..5]; my $purgeDate=sprintf("%4d%02d%02d", $year+1900, $mon+1, $day); $dirname=dirname($filename); if (opendir(DIR, "$dirname")) { while (my $filename=readdir(DIR)) { next if $filename=~/^\./; next if $filename=~/log$/; next if $filename!~/-(\d{8})(-\d{6})*\./ || $1 ge $purgeDate; unlink "$dirname/$filename"; } } else { logmsg('E', "Couldn't open '$dirname' for purging"); } close DIR; # now do it for the collectl logs themselves, based on the number # of months, so no days included ($day, $mon, $year)=(localtime(time-86400*$purgeMons*30))[3..5]; $purgeDate=sprintf("%4d%02d", $year+1900, $mon+1); my $globspec="$dirname/*.log"; foreach my $file (glob($globspec)) { next if $file!~/-(\d{6})\.log$/; unlink $file if $1 < $purgeDate; } } # Save as a global for later use. Could probably avoid passing back the name # on error below, but I'm afraid to change it if I don't have to. $lastLogPrefix=$filename; return 1; } # Build a common header for ALL files... 
sub buildCommonHeader { my $rawType= shift; my $timeZoneInfo=shift; # if grouping we need to remove subsystems for groups not in # the associated files my $tempSubsys=$subsys; if ($tworawFlag) { $tempSubsys=~s/[YZ]//g if $rawType==0; $tempSubsys=~s/[^YZ]+//g if $rawType==1; } # We want to store all the interval(s) being used and not just what # the user specified with -i. So include i2 if process/slabs and # i3 more for a placeholder. NOTE - if we're playing back multiple # files and first doesn't have process data, i2 not set for second # and so we can't put in header. $tempInterval=$interval; $tempInterval.=(defined($interval2)) ? ":$interval2" : ':' if $subsys=~/[yz]/i; $tempInterval.=($subsys!~/[yz]/i) ? "::$interval3" : ":$interval3" if $subsys=~/E/; # For now, these are the only flags I can think of but clearly they # can grow over time... my $flags=''; $flags.='d' if $diskChangeFlag; $flags.='2' if $tworawFlag; # start using 2 instead of 'g' $flags.='i' if $processIOFlag; $flags.='s' if $slubinfoFlag; $flags.='x' if $processCtxFlag; $flags.='D' if $cpuDisabledFlag; $flags.='X' if $PQopt eq 'sys'; $flags.='PX' if $PQopt eq '-x'; my $dskNames=''; foreach my $disk (@dskOrder) { $dskNames.="$disk "; } $dskNames=~s/ $//; my $netNames=''; foreach my $netname (@netOrder) { $netNames.=sprintf("$netname:%s ", defined($netSpeeds{$netname}) ? 
$netSpeeds{$netname} : '??'); } $netNames=~s/ $//; my ($sec, $min, $hour, $day, $mon, $year)=localtime($boottime); my $booted=sprintf "%d%02d%02d-%02d:%02d:%02d", $year+1900, $mon+1, $day, $hour, $min, $sec; my $commonHeader=''; if ($rawType!=-1 && $playback ne '') { $commonHeader.='#'x35; $commonHeader.=' RECORDED '; $commonHeader.='#'x35; $commonHeader.="\n# $recHdr1"; $commonHeader.="\n# $recHdr2" if $recHdr2 ne ''; $commonHeader.="\n"; } $commonHeader.='#'x80; $commonHeader.="\n# Collectl: V$Version HiRes: $hiResFlag Options: $cmdSwitches\n"; $commonHeader.="# Host: $Host DaemonOpts: $DaemonOptions\n"; $commonHeader.="# Booted: $boottime [$booted]\n"; $commonHeader.="# Distro: $Distro Platform: $ProductName\n"; $commonHeader.=$timeZoneInfo if defined($timeZoneInfo); $commonHeader.="# SubSys: $tempSubsys Options: $options Interval: $tempInterval NumCPUs: $NumCpus $Hyper"; $commonHeader.= " CPUsDis: $cpusDisabled" if $cpusDisabled; $commonHeader.= " NumBud: $NumBud Flags: $flags\n"; $commonHeader.="# Filters: NfsFilt: $nfsFilt EnvFilt: $envFilt TcpFilt: $tcpFilt\n"; $commonHeader.="# HZ: $HZ Arch: $SrcArch PageSize: $PageSize\n"; $commonHeader.="# Cpu: $CpuVendor Speed(MHz): $CpuMHz Cores: $CpuCores Siblings: $CpuSiblings Nodes: $CpuNodes\n"; $commonHeader.="# Kernel: $Kernel Memory: $Memory Swap: $Swap\n"; $commonHeader.="# NumDisks: $dskIndexNext DiskNames: $dskNames\n"; $commonHeader.="# NumNets: $netIndexNext NetNames: $netNames\n"; $commonHeader.="# NumSlabs: $NumSlabs Version: $SlabVersion\n" if $yFlag || $YFlag; $commonHeader.="# IConnect: NumHCAs: $NumHCAs PortStates: $HCAPortStates IBVersion: $IBVersion PQVersion: $PQVersion\n" if $NumHCAs; $commonHeader.="# SCSI: $ScsiInfo\n" if $ScsiInfo ne ''; if ($subsys=~/l/i) { # Lustre Version and services (if any) info $commonHeader.="# Lustre: "; $commonHeader.=" CfsVersion: $cfsVersion" if $cfsVersion ne ''; $commonHeader.=" SfsVersion: $sfsVersion" if $sfsVersion ne ''; $commonHeader.=" LustOpts: $lustOpts 
Services: $lustreSvcs"; $commonHeader.="\n"; $commonHeader.="# LustreServer: NumMds: $NumMds MdsNames: $MdsNames NumOst: $NumOst OstNames: $OstNames\n" if $NumOst || $NumMds; $commonHeader.="# LustreClient: CltInfo: $lustreCltInfo\n" if $CltFlag && $lustreCltInfo ne ''; # in case all filesystems umounted # more stuff for Disk Stats $commonHeader.="# LustreDisks: Num: $NumLusDisks Names: $LusDiskNames\n" if ($lustOpts=~/D/); } for (my $i=0; $i<$impNumMods; $i++) { &{$impUpdateHeader[$i]}(\$commonHeader); } $commonHeader.="# Comment: $comment\n" if $comment ne ''; $commonHeader.='#'x80; $commonHeader.="\n"; return($commonHeader); } sub writeInterFileMarker { # I was torn between putting this test in the one place this routine # is called or keeping it cleaner and so put it here. return if $procAnalOnlyFlag; # for now, only need one for process data my $marker="# >>> NEW LOG <<<\n"; if ($subsys=~/Z/ && !$rawtooFlag) { $ZPRC->gzwrite($marker) or writeError('prc', $ZPRC) if $zFlag; print PRC $marker if !$zFlag; } } # see if there is a file that matches this filename root (should't have # an extension). sub plotFileExists { my $filespec=shift; my (@files, $file); @files=glob("$filespec*"); foreach my $file (@files) { return(1) if $file!~/raw/; } return(0); } # In retrospect, there are a number of special cases in here just for playback # and things might be clearer to do away with this function and move code where # it applies. sub setOutputFormat { # By default, brief has been initialized to 1 and verbose to 0 but in these # cases (when not doing --import) we switch to verbose automatically $verboseFlag=1 if ($subsys ne '' && $subsys!~/^[$BriefSubsys]+$/) || $lustOpts=~/[BDM]/; $verboseFlag=1 if $memOpts=~/[psPV]/; $verboseFlag=1 if $tcpFilt=~/I/; # except as where noted below, columns in verbose mode are assumed different $sameColsFlag=($verboseFlag) ? 
0 : 1; # Now let's deal with a few special cases where we're in verbose mode but # the cols are the same after all, such as a single subsystem or '-sCj' $sameColsFlag=1 if $verboseFlag && (length($subsys)==1 || $subsys=~/^[Cj]+$/); # Environmental data is multipart if '--envopts M' so we only have same columns when 1 type $sameColsFlag=0 if $subsys eq 'E' && length($envOpts)>1 && $userEnvOpts=~/M/; # As usual, lustre complicates things since we can get multiple lines of # output and if more than 1 clear the flag. $sameColsFlag=0 if length($lustOpts)>1; # Finally, if --import modules we've been called at least a second time if ($impNumMods) { # Detail mode forces verbose if ($impDetailFlag) { $verboseFlag=1; $sameColsFlag=0; } # Verbose mode special, because if we don't have any subsystem data and only have 1 type of # imported data, we still get all data on the same line and won't need to repeat headers every pass. # On the other hand it we have more than 1 type of data we can't have the same columns $sameColsFlag=($impSummaryFlag+$impDetailFlag+length($subsys)==1) ? 1 : 0 if $verboseFlag; # and finally if processing any standard detail data we know we have at least 2 fields, at least # one of which is our custom import, and so we can't have same columns in effect. $sameColsFlag=0 if $subsys=~/[A-Z]/; # detail for single -s would have set flag } # time doesn't print when not all columns the same AND not something that # was exported since they's on their own for formatting if (!$sameColsFlag && $export eq '') { $miniDateFlag=$miniTimeFlag=0; $miniDateTime=$miniFiller=''; } $briefFlag=($verboseFlag) ? 0 : 1; print "Set Output -- Subsys: $subsys Verbose: $verboseFlag SameCols: $sameColsFlag\n" if $debug & 1; # This also feels like a good place to do these $i1DataFlag=($subsys!~/^[EYZ]+$/i) ? 1 : 0; $i2DataFlag=($subsys=~/[yYZ]/) ? 1 : 0; $i3DataFlag=($subsys=~/E/) ? 
1 : 0; } # Control C Processing # This will wake us if we're sleeping or let us finish a collection cycle # if we're not. sub sigInt { print "Ouch!\n" if !$daemonFlag; $doneFlag=1; } sub sigTerm { logmsg("W", "Shutting down in response to signal TERM on $myHost..."); $doneFlag=1; } sub sigAlrm { # This will set next alarm to the next interval that's a multiple of # our base time. Note the extra 1000usecs which we need as fudge # Also note that arg[0] always defined with "ALRM" when ualarm below # fires so we need to use arg[1] as the 'first time' switch for # logmsg() below. my ($intSeconds, $intUsecs)=Time::HiRes::gettimeofday(); my $nowUSecs=$intSeconds*1000000+$intUsecs; my $secs=int(($nowUSecs-$BaseTime+$uAlignInt)/$uAlignInt)*$uAlignInt; my $waitTime=$BaseTime+$secs-$nowUSecs; Time::HiRes::ualarm($waitTime+1000); # message only on the very first call AND when --align since we always # align on an interval boundary anyway and don't want cluttered messages logmsg("I", "Waiting $waitTime usecs for time alignment") if defined($_[1]) && $alignFlag; # The following is all debug #($intSeconds, $intUsecs)=Time::HiRes::gettimeofday(); #$nowUSecs2=$intSeconds*1000000+$intUsecs; #$diff=($nowUSecs2-$nowUSecs)/1000000; #printf "Start: %f Current: %f Wait: %f Time: %f\n", $BaseTime/1000000, $nowUSecs/1000000, $waitTime/1000000, $diff; } # flush buffer(s) on sigUsr1 sub sigUsr1 { # There should be a small enough number of these to make it worth logging logmsg("I", "Flushing buffers in response to signal USR1") if !$autoFlush; logmsg("W", "No need to signal 'USR1' since autoflushing") if $autoFlush; flushBuffers() if !$autoFlush; } sub sigPipe { # The only time we're treating a broken pipe as an error is when not in server mode, # where we simply log but ignore the message since we don't want to quit. 
  if (!$serverFlag)
  {
    logmsg("W", "Shutting down due to a broken pipe");
    $doneFlag=1;
  }
  else
  {
    logmsg("W", "Ignoring broken pipe");
  }
}

sub flushBuffers
{
  return if !$logToFileFlag;

  # Remember, when $rawFlag is set we flush everything, including process/slab data.
  # But if only $rawtooFlag is set, those 2 other files aren't open and so we don't flush them.
  $flushTime=time+$flush if $flushTime;
  logdiag("begin flush") if $utimeMask & 1;

  if ($zFlag)
  {
    if ($rawFlag)
    {
      # if in raw mode, may be up to 2 buffers to flush
      $ZRAW-> gzflush(2)<0 and flushError('raw', $ZRAW) if $recFlag0;
      $ZRAWP->gzflush(2)<0 and flushError('raw', $ZRAWP) if $recFlag1;
      if (!$plotFlag)
      {
        logdiag("end flush") if $utimeMask & 1;
        return;
      }
    }

    $ZLOG-> gzflush(2)<0 and flushError('log', $ZLOG) if $subsys=~/[a-z]/;
    $ZBLK-> gzflush(2)<0 and flushError('blk', $ZBLK) if $LFlag && $lustOpts=~/D/;
    $ZBUD-> gzflush(2)<0 and flushError('bud', $ZBUD) if $BFlag;
    $ZCPU-> gzflush(2)<0 and flushError('cpu', $ZCPU) if $CFlag;
    $ZCLT-> gzflush(2)<0 and flushError('clt', $ZCLT) if $LFlag && $CltFlag;
    $ZDSK-> gzflush(2)<0 and flushError('dsk', $ZDSK) if $DFlag && $options!~/x/; # exception only file?
    $ZDSKX->gzflush(2)<0 and flushError('dskx',$ZDSKX) if $DFlag && $options=~/x/i;
    $ZIB-> gzflush(2)<0 and flushError('ib', $ZIB) if $XFlag && $NumHCAs;
    $ZENV-> gzflush(2)<0 and flushError('env', $ZENV) if $EFlag;
    $ZNFS-> gzflush(2)<0 and flushError('nfs', $ZNFS) if $FFlag;
    # report failures against the NUMA handle itself, not the network one
    $ZNUMA->gzflush(2)<0 and flushError('numa', $ZNUMA) if $MFlag;
    $ZNET-> gzflush(2)<0 and flushError('net', $ZNET) if $NFlag;
    $ZOST-> gzflush(2)<0 and flushError('ost', $ZOST) if $LFlag && $OstFlag;
    $ZTCP-> gzflush(2)<0 and flushError('tcp', $ZTCP) if $TFlag;
    $ZSLB-> gzflush(2)<0 and flushError('slb', $ZSLB) if $YFlag && !$rawtooFlag;
    $ZPRC-> gzflush(2)<0 and flushError('prc', $ZPRC) if $ZFlag && !$rawtooFlag;

    # handle --import
    for (my $i=0; $i<$impNumMods; $i++)
    {
      # we can only flush detail data if something in buffer or else we'll throw an error!
$impGz[$i]->gzflush(2)<0 and flushError($impKey[$i], $impGz[$i]) if defined($impGz[$i]) && $impDetFlag[$i]; $impDetFlag[$i]=0; } } else { if (defined($LOG)) { select $LOG; $|=1; print $LOG ""; $|=0; select STDOUT; } if (!$plotFlag) { logdiag("end flush") if $utimeMask & 1; return; } return if !$plotFlag; if ($BFlag) { select BUD; $|=1; print BUD ""; $|=0; } if ($CFlag) { select CPU; $|=1; print CPU ""; $|=0; } if ($DFlag) { select DSK; $|=1; print DSK ""; $|=0; } if ($EFlag) { select ENV; $|=1; print ENV ""; $|=0; } if ($FFlag) { select NFS; $|=1; print NFS ""; $|=0; } if ($NFlag) { select NET; $|=1; print NET ""; $|=0; } if ($TFlag) { select TCP; $|=1; print TCP ""; $|=0; } if ($XFlag && $NumHCAs) { select IB; $|=1; print IB ""; $|=0; } if ($YFlag && !$rawtooFlag) { select SLB; $|=1; print SLB ""; $|=0; } if ($ZFlag && !$rawtooFlag) { select PRC; $|=1; print PRC ""; $|=0; } if ($LFlag && $CltFlag) { select CLT; $|=1; print CLT ""; $|=0; } if ($LFlag && $OstFlag) { select OST; $|=1; print OST ""; $|=0; } if ($LFlag && $lustOpts=~/D/) { select BLK; $|=1; print BLK ""; $|=0; } # Handle --import for (my $i=0; $i<$impNumMods; $i++) { if (defined($impText[$i])) { select $impText[$i]; $|=1; print {$impText[$i]} ""; $|=0; } } if ($options=~/x/i) { if ($DFlag) { select DSKX; $|=1; print DSKX ""; $|=0; } } select STDOUT; } logdiag("end flush") if $utimeMask & 1; } sub writeError { my $file=shift; my $desc=shift; # just print the error and reopen ALL files (since it should be rare) # we also don't need to set '$recMode' in newLog() since not recursive. $zlibErrors++; logmsg("E", "Write error - File: $file Reason: ".$desc->gzerror()); logmsg("F", "Max Zlib error count exceeded") if $zlibErrors>$MaxZlibErrors; $headersPrinted=0; newLog($filename, "", "", "", "", ""); } sub flushError { my $file=shift; my $desc=shift; # just print the error and reopen ALL files (since it should be rare) # we also don't need to set '$recMode' in newLog() since not recursive. 
$zlibErrors++; logmsg("E", "Flush error - File: $file Reason: ".$desc->gzerror()); logmsg("F", "Max Zlib error count exceeded") if $zlibErrors>$MaxZlibErrors; $headersPrinted=0; newLog($filename, "", "", "", "", ""); } # write diagnostic record into raw file sub logdiag { my ($intSeconds, $intUsecs)=Time::HiRes::gettimeofday(); my $fullTime=sprintf("%d.%06d", $intSeconds, $intUsecs); record(1, "### $fullTime $_[0]\n"); } # Note - ALL errors (both E and F) will be written to syslog. If you want # others to go there (such as startup/shutdown messages) you need to call # logsys() directly, but be sure to make sure $filename ne '' (but can't # unless $filename is known at that point). sub logmsg { my ($severity, $text)=@_; my ($ss, $mm, $hh, $day, $mon, $year, $msg, $time, $logname, $yymm, $date); # may need time if in debug and this routine gets called infrequently enough # that the extra processing is no big deal. Also note that time and gettimeofday # are not always exactly in sync so when hires loaded, ALWAYS use it my $timesecs=($hiResFlag) ? (Time::HiRes::gettimeofday())[0] : time(); ($ss, $mm, $hh, $day, $mon, $year)=localtime($timesecs); $time=sprintf("%02d:%02d:%02d", $hh, $mm, $ss); # always report non-informational messages and if not logging, we're done # BUT - if not attached to a terminal or not running as a daemon we CAN'T print # because no terminal to talk to. # Also, not that we ONLY write to the log when writing to a file and -m $text="$time $text" if $debug & 1; print STDERR "$text\n" if $termFlag && !$daemonFlag && ($msgFlag || ($severity eq 'W' && !$quietFlag) || $severity=~/[EF]/ || $debug & 1); # if we're not writing to a message log always send F/E errors to syslog AND get out if fatal if (!$msgFlag) { logsys($text, 1) if $severity=~/[EF]/ && $syslogFlag; exit(1) if $severity eq "F"; } # Remember: if running as a daemon and NOT -m, we'll never see any messages # in collectl log OR syslog. 
return unless $msgFlag && $filename ne ''; $yymm=sprintf("%d%02d", 1900+$year, $mon+1); $date=sprintf("%d%02d%02d", 1900+$year, $mon+1, $day); $msg=sprintf("%s-%s", $severity, $text); # the log file live in same directory as logs $logname=(-d $filename) ? $filename : dirname($filename); $logname.="/$myHost-collectl-$yymm.log"; open MSG, ">>$logname" or logsys("Couldn't open log file '$logname' to write: $msg", 1); print MSG "$date $time $msg\n" or logsys("Print Error: $! Text: $msg"); close MSG; logsys($msg) if $severity=~/[EF]/; exit(1) if $severity=~/F/; } sub logsys { my $message=shift; my $force= shift; # if not writing to a file, only log when forced return if !$syslogFlag || ($filename eq '' && !$force); $x=Sys::Syslog::openlog($Program, "", "user"); $x=Sys::Syslog::syslog("info", "%s", $message); Sys::Syslog::closelog(); } # this is for non-fatal messages that are reported before collectl actually # starts. by saving them, we can then report after the startup message to # make things cleaner in the log sub pushmsg { my $severity=shift; my $text= shift; push @messages, "$severity-$text"; } sub setFlags { my $subsys=shift; print "SetFlags: $subsys\n" if $debug & 1; # NOTE - are flags are faster than string compares? # unfortunately I got stuck using zFlag for ZIP and ZFlag for processes $bFlag=($subsys=~/b/) ? 1 : 0; $BFlag=($subsys=~/B/) ? 1 : 0; $cFlag=($subsys=~/c/) ? 1 : 0; $CFlag=($subsys=~/C/) ? 1 : 0; $dFlag=($subsys=~/d/) ? 1 : 0; $DFlag=($subsys=~/D/) ? 1 : 0; $EFlag=($subsys=~/E/) ? 1 : 0; $fFlag=($subsys=~/f/) ? 1 : 0; $FFlag=($subsys=~/F/) ? 1 : 0; $iFlag=($subsys=~/i/) ? 1 : 0; $jFlag=($subsys=~/j/) ? 1 : 0; $JFlag=($subsys=~/J/) ? 1 : 0; $lFlag=($subsys=~/l/) ? 1 : 0; $LFlag=($subsys=~/L/) ? 1 : 0; $mFlag=($subsys=~/m/) ? 1 : 0; $MFlag=($subsys=~/M/) ? 1 : 0; $nFlag=($subsys=~/n/) ? 1 : 0; $NFlag=($subsys=~/N/) ? 1 : 0; $sFlag=($subsys=~/s/) ? 1 : 0; $tFlag=($subsys=~/t/) ? 1 : 0; $TFlag=($subsys=~/T/) ? 1 : 0; $xFlag=($subsys=~/x/) ? 
1 : 0; $XFlag=($subsys=~/X/) ? 1 : 0; $yFlag=($subsys=~/y/) ? 1 : 0; $YFlag=($subsys=~/Y/) ? 1 : 0; $ZFlag=($subsys=~/Z/) ? 1 : 0; # NOTE - the definition of 'core' as slightly changed and maybe should be # changed to be 'summary' to better reflect what we're trying to do. $coreFlag=($subsys=~/[a-z]/) ? 1 : 0; # by default, all data gets logged in a single file. if the 'tworaw' flag is set, # we defined flags that control recording into groups based on process/other $recFlag0=1; $recFlag1=0; if ($tworawFlag) { $tempSys=$subsys; $tempSys=~s/[YZ]//g; $recFlag0=0 if $tempSys eq ''; $recFlag1=1 if $subsys=~/[YZ]/; } print "RecFlags: $recFlag0 $recFlag1\n" if $debug & 1 && !$playback; } sub setNFSFlags { my $nfsFilt=shift; # Assume no NFS data of any type seen yet. Do it twice to get rid of -w warning $nfs2CSeen=$nfs3CSeen=$nfs4CSeen=$nfs2SSeen=$nfs3SSeen=$nfs4SSeen=0; $nfs2CSeen=$nfs3CSeen=$nfs4CSeen=$nfs2SSeen=$nfs3SSeen=$nfs4SSeen=0; if ($nfsFilt eq '') { $nfsCFlag=$nfsSFlag=$nfs2Flag=$nfs3Flag=$nfs4Flag=1; $nfs2CFlag=$nfs2SFlag=$nfs3CFlag=$nfs3SFlag=$nfs4CFlag=$nfs4SFlag=1; } else { $nfsCFlag=$nfsSFlag=$nfs2Flag=$nfs3Flag=$nfs4Flag=0; $nfs2CFlag=$nfs2SFlag=$nfs3CFlag=$nfs3SFlag=$nfs4CFlag=$nfs4SFlag=0; foreach my $filt (split(/,/, $nfsFilt)) { # These flags make processing easier/faster later on if ($filt eq 'c2') { $nfsCFlag=1; $nfs2Flag=1; $nfs2CFlag=1; } elsif ($filt eq 's2') { $nfsSFlag=1; $nfs2Flag=1; $nfs2SFlag=1; } elsif ($filt eq 'c3') { $nfsCFlag=1; $nfs3Flag=1; $nfs3CFlag=1; } elsif ($filt eq 's3') { $nfsSFlag=1; $nfs3Flag=1; $nfs3SFlag=1; } elsif ($filt eq 'c4') { $nfsCFlag=1; $nfs4Flag=1; $nfs4CFlag=1; } elsif ($filt eq 's4') { $nfsSFlag=1; $nfs4Flag=1; $nfs4SFlag=1; } else { error("--nfsfilt option '$filt' not one of 'c2,s2,c3,s3,c4,s4'"); } } } } sub getSeconds { my $date=shift; my $time=shift; my ($year, $mon, $day, $hh, $mm, $ss, $seconds); $year=substr($date, 0, 4); $mon= substr($date, 4, 2); $day= substr($date, 6, 2); $hh= substr($time, 0, 2); $mm= 
substr($time, 2, 2);
  $ss= substr($time, 4, 2);

  return(timelocal($ss, $mm, $hh, $day, $mon-1, $year-1900));
}

# print error and exit if bad datetime without going crazy over all the
# possible permutations of a bad date/time format
sub checkTime
{
  my $switch= shift;
  my $datetime=shift;

  my $date=0; # can't return ''
  my $time=$datetime;
  if (length((split(/:/,$datetime))[0])>2)
  {
    $datetime=~s/^(\d+):?(.*)//;
    $date=$1;
    $time=$2;
  }
  $time=($switch eq '--from') ? '00:00:00' : '23:59:59' if $time eq '';

  error("Date portion of $switch must be exactly 8 digits") if $date!=0 && length($date)!=8;

  # Make sure time format correct. minimal being HH:MM. supply date and/or ":ss"
  $time="0$time" if $time=~/^\d{1}:/;
  $time.=":00" if $time!~/^\d{2}:\d{2}:\d{2}$/;
  error("$switch time format must be hh:mm[:ss]") if ($time!~/^\d{2}:\d{2}:\d{2}$/);

  ($hh, $mm, $ss)=split(/:/, $time);
  error("$switch specifies invalid time") if ($hh>23 || $mm >59 || $ss>59);

  return(($date,"$hh$mm$ss"));
}

sub getDateTime
{
  my $seconds=shift;
  my ($sec, $min, $hour, $day, $mon, $year)=localtime($seconds);
  return(sprintf("%d%02d%02d %02d:%02d:%02d", $year+1900, $mon+1, $day, $hour, $min, $sec));
}

sub checkHiRes
{
  if ($TimeHiResCheck && $hiResFlag && -e '/lib/libc.so.6')
  {
    my $hiResVersion=Time::HiRes->VERSION;
    $hiResVersion=~/(\d+)\.(\d+)/;
    my $hiResMajor=$1;
    my $hiResMinor=$2;

    my $glibcAnnounce=`'/lib/libc.so.6'`;
    if ($hiResMajor==1 && $hiResMinor<91 && $glibcAnnounce=~/GNU C Library stable release version (\d+)\.(\d+)/)
    {
      my $glibcMajor=$1;
      my $glibcMinor=$2;
      if ($glibcMajor==2 && ($glibcMinor==4 || $glibcMinor==5))
      {
        logmsg('W', "WARNING - Your versions of Time::HiRes and glibc are incompatible.");
        logmsg('W', " See /opt/hp/collectl/docs/RELEASE-collectl 'Restrictions' for details.");
      }
    }
  }
}

# This is only called during collection, not playback
sub closeLogs
{
  my $subsys=shift;
  my $ctype= shift;

  return if !$logToFileFlag;

  # when not specified, close both raw and plot files.
  $ctype='rp' if !defined($ctype);

  setFlags($subsys);

  # C l o s e R a w F i l e ( s )

  # closing raw files based on presence of zlib and NOT -oz
  if ($rawFlag && $ctype=~/r/)
  {
    print "Closing raw logs\n" if $debug & 1;
    if ($zlibFlag && $logToFileFlag)
    {
      $ZRAW-> gzclose() if $recFlag0;
      $ZRAWP-> gzclose() if $recFlag1;
    }
    else
    {
      close $RAW if defined($RAW) && $recFlag0;
      close $RAWP if defined($RAWP) && $recFlag1;
    }
  }

  # C l o s e P l o t F i l e ( s )

  if ($plotFlag && $ctype=~/p/)
  {
    print "Closing plot logs\n" if $debug & 1;

    # Even if not open, can't hurt to close them.
    if (!$zFlag)
    {
      close LOG;
      close BLK;
      close BUD;
      close CPU;
      close CLT;
      close DSK;
      close DSKX;
      close IB;
      close ENV;
      close NFS;
      close NET;
      close OST;
      close TCP;
      close SLB;
      close PRC;
    }
    else # These must be opened in order to close them
    {
      $temp="$SubsysCore$SubsysExcore";
      $ZLOG-> gzclose() if $subsys=~/[$temp]+/ || $impSummaryFlag;
      $ZBUD-> gzclose() if $BFlag;
      $ZCLT-> gzclose() if $LFlag && $CltFlag;
      $ZCPU-> gzclose() if $CFlag;
      $ZDSK-> gzclose() if $DFlag && $options!~/x/;
      $ZDSKX->gzclose() if $DFlag && $options=~/x/i;
      $ZIB-> gzclose() if $XFlag && $NumHCAs;
      $ZENV-> gzclose() if $EFlag;
      $ZNFS-> gzclose() if $FFlag;
      $ZNUMA->gzclose() if $MFlag;
      $ZNET-> gzclose() if $NFlag;
      $ZOST-> gzclose() if $LFlag && $OstFlag;
      $ZTCP-> gzclose() if $TFlag;
      $ZSLB-> gzclose() if $YFlag && !$rawtooFlag && !$slabAnalOnlyFlag;
      $ZPRC-> gzclose() if $ZFlag && !$rawtooFlag && !$procAnalOnlyFlag;
    }

    # Finally, close any detail logs that may have been opened via --import
    for (my $i=0; $i<$impNumMods; $i++)
    {
      next if $impOpts[$i]!~/d/;
      close $impText[$i] if !$zFlag;
      $impGz[$i]->gzclose() if $zFlag;
    }
  }
}

sub loadConfig
{
  my $resizePath='';
  my ($line, $num, $param, $value, $switches, $file, $openedFlag, $lib);

  # if no -C, look first in $BinDir, then 2 the same level as the BIN and
  # finally in /etc. if there IS a -C, it must include the path (if not in .)
  if ($configFile eq '')
  {
    # there may be a collectl.conf file at level above bin/ and in case
    # it backs up to /, we get leading // so clean it up and make only 1
    my $etcDir=sprintf("%s/etc", dirname(dirname($BinDir)));
    $etcDir=~s[//][/];

    # build up the search list being extra neat and leaving
    # off possible duplicate /etc
    $configFile="$BinDir/$ConfigFile;$etcDir/$ConfigFile";
    $configFile.=";/etc/$ConfigFile" if $etcDir ne '/etc';
  }
  print "Config File Search Path: $configFile\n" if $debug & 1;

  $openedFlag=0;
  foreach my $file (split(/;/, $configFile))
  {
    if (open CONFIG, "<$file")
    {
      print "Reading Config File: $file\n" if $debug & 1;
      $configFile=$file;
      $openedFlag=1;
      last;
    }
  }
  logmsg("F", "Couldn't open '$configFile'") if !$openedFlag;

  $num=0;
  foreach $line (<CONFIG>)
  {
    $num++;
    next if $line=~/^\s*$|^\#/; # skip blank lines and comments

    if ($line!~/=/)
    {
      logmsg("W", "CONFIG ERROR: Line $num doesn't contain '='. Ignoring...");
      next;
    }

    chomp $line;
    ($param, $value)=split(/\s*=\s*/, $line);
    print "Param: $param Value: $value\n" if $debug & 128;

    # S u b s y s t e m s A r e S p e c i a l

    # Subsystems -- this is a little tricky because after user overrides
    # SubsysCore, SubsysExcore needs to contain all other core subsystems.
    if ($param=~/SubsysCore/)
    {
      # we put everything in 'Excore' and subtract what's in 'Core'
      $SubsysExcore="$SubsysCore$SubsysExcore";
      error("config file entry for '$param' contains invalid subsystem(s) - $value")
          if $value!~/^[$SubsysExcore]+$/;
      $SubsysCore=$value;
      $SubsysExcore=~s/[$SubsysCore]//g;
      next;
    }

    # D a e m o n P r o c e s s i n g

    elsif ($param=~/DaemonCommands/ && $daemonFlag)
    {
      # Pull command string off line and add a 'special' end-of-line marker.
      # Note that we save off the whole thing for the header and we need the
      # ',2' in the split since we can have '=' in the options
      $DaemonOptions=$switches=(split(/=\s*/, $line, 2))[1];
      $switches.=" -->>>EOL<<<";

      # ultimately, we want to prepend these onto the ARG list.
The problem is we need to # preserve the order and the easiest way to do this is to push onto a temp stack # and pop off when we're done. my $quote=''; my $switch=''; my @temp; foreach $param (split(/\s+/, $switches)) { if ($param=~/^-/) { # If new switch, time to write out old one (and arg), but note we're pushing them # onto a stack so we can retrieve them in the reverse order if ($switch ne '') { push @temp, $switch; push @temp, $arg if $arg ne ''; } last if $param eq '-->>>EOL<<<'; $switch=$param; $arg=''; next; } elsif ($quote ne '') # Processing quoted argument { $quote='' if $param=~/$quote$/; # this is the last piece $arg.=" $param"; next if $quote ne ''; } else # unquoted argument { $arg=$param; $quote=$1 if $param=~/^(['"])/; } } # now put them back, preserving the order while (my $arg=pop(@temp)) { unshift(@ARGV, $arg); } if ($debug & 128) { foreach my $arg (@ARGV) { print " $arg "; } print "\n"; } } # L i b r a r i e s A r e S p e c i a l T o o elsif ($param=~/Libraries/) { $Libraries=$value; foreach $lib (split(/\s+/, $Libraries)) { push @INC, $lib; } } # S t a n d a r d S e t else { $ReqDir=$value if $param=~/^ReqDir/; $Grep=$value if $param=~/^Grep/; $Egrep=$value if $param=~/^Egrep/; $Ps=$value if $param=~/^Ps/; $Rpm=$value if $param=~/^Rpm/; $Lspci=$value if $param=~/^Lspci/; $Lctl=$value if $param=~/^Lctl/; $resizePath=$value if $param=~/^Resize/; $ipmitoolPath=$value if $param=~/^Ipmitool/; $IpmiCache=$value if $param=~/^IpmiCache/; $IpmiTypes=$value if $param=~/^IpmiTypes/; # For Infiniband $PCounter=$value if $param=~/^PCounter/; $PQuery=$value if $param=~/^PQuery/; $VStat=$value if $param=~/^VStat/; $OfedInfo=$value if $param=~/^OfedInfo/; $Interval=$value if $param=~/^Interval$/; $Interval2=$value if $param=~/^Interval2/; $Interval3=$value if $param=~/^Interval3/; $LimSVC=$value if $param=~/^LimSVC/; $LimIOS=$value if $param=~/^LimIOS/; $LimLusKBS=$value if $param=~/^LimLusKBS/; $LimLusReints=$value if $param=~/^LimLusReints/; $LimBool=$value 
if $param=~/^LimBool/;
      $Port=$value if $param=~/^Port/;
      $Timeout=$value if $param=~/^Timeout/;
      # pattern matches the variable name, like every other entry in this list
      $MaxZlibErrors=$value if $param=~/^MaxZlibErrors/;
      $LustreSvcLunMax=$value if $param=~/^LustreSvcLunMax/;
      $LustreMaxBlkSize=$value if $param=~/^LustreMaxBlkSize/;
      $LustreConfigInt=$value if $param=~/^LustreConfigInt/;
      $InterConnectInt=$value if $param=~/^InterConnectInt/;
      $TermHeight=$value if $param=~/^TermHeight/;
      $DefNetSpeed=$value if $param=~/^DefNetSpeed/;
      $TimeHiResCheck=$value if $param=~/^TimeHiResCheck/;
      $PasswdFile=$value if $param=~/^Passwd/;
      $DiskMaxValue=$value if $param=~/^DiskMaxValue/;
      $dISKfILTER=$value if $param=~/^DiskFilter/; # note different spelling!!!
      $ProcReadTest=$value if $param=~/^ProcReadTest/;
    }
  }
  close CONFIG;

  foreach my $bin (split/:/, $resizePath)
  {
    $Resize=$bin if -e $bin;
  }
  logmsg('I', "Couldn't find 'resize' so assuming terminal height of 24") if $Resize eq '';

  # Just in case using an older collectl.conf file. Only a problem if
  # someone wants to collect IPMI data.
  if (!defined($ipmitoolPath))
  {
    logmsg('E', "Can't find 'Ipmitool' in 'collectl.conf'.
Is it old?"); $ipmitoolPath=''; } # Even though currently one entry, let's make this a path like above $Ipmitool=''; foreach my $bin (split/:/, $ipmitoolPath) { $Ipmitool=$bin if -e $bin; } # Unlike other parameters that can be overridden in collectl.conf, we DO need to know if # that has been done with DiskFilter so we can set the flag correctly and only then if (defined($dISKfILTER)) { # the leading/trailing /s are just there for ease of reading in collectl.conf $DiskFilterFlag=1; $DiskFilter=$dISKfILTER; $DiskFilter=~s/^\///; $DiskFilter=~s/\/$//; print "DiskFilter set in $configFile: >$DiskFilter<\n" if $debug & 1; } } sub loadSlabs { my $slabFilt= shift; if ($slabinfoFlag) { if (!open PROC,") { my $slab=(split(/\s+/, $line))[0]; foreach my $filter (split(/,/, $slabFilt)) { if ($slab=~/^$filter/) { $slabProc{$slab}=1; last; } } } if ($debug & 1024) { print "*** SLABS ***\n"; foreach $slab (sort keys %slabProc) { print "$slab\n"; } } } if ($slubinfoFlag) { ########################################### # build list of all slabs NOT softlinks ########################################### opendir SYS, '/sys/slab' or logmsg('F', "Couldn't open '/sys/slab'"); while (my $slab=readdir(SYS)) { next if $slab=~/^\./; # If a link, it's actually an alias $dirname="/sys/slab/$slab"; if (-l $dirname) { # If filtering, only keep those aliases that match next if $slabFilt ne '' && !passSlabFilter($slabFilt, $slab); # get the name of the slab this link points to my $linkname=readlink($dirname); my $rootslab=basename($linkname); # Note that since scalar returns the number of elements, it's always the index # we want to write the next entry into. We also want to save a list of the link # names so we can easily skip over them later. my $alias=(defined($slabdata{$rootslab}->{aliases})) ? 
scalar(@{$slabdata{$rootslab}->{aliases}}) : 0; $slabdata{$rootslab}->{aliases}->[$alias]=$slab; $slabskip{$slab}=1; } else { $slabdata{$slab}->{lastobj}=$slabdata{$slab}->{lastslabs}=0; } } ########################################## # secondary filter scan ########################################## if ($slabFilt ne '') { # Note, at this point we only have aliases that pass the filter and so we need # to keep the entries OR we have entries with no aliases that might still pass # filters only we couldn't check them yet so we need this second pass. foreach my $slab (keys %slabdata) { delete $slabdata{$slab} if !defined($slabdata{$slab}->{aliases}) && !passSlabFilter($slabFilt, $slab) } } ############################################################ # now find a better name to use, choosing length first ############################################################ # what we want to do here is also build up a list of all the aliases to # make it easier to insert them into the header as well as display with # --showslabaliases. Also note is --showrootslabs, we override '$first' # to that of the slab root name. foreach my $slab (sort keys %slabdata) { my ($first,$kmalloc,$list)=('','',' '); # NOTE - $list set to leading space! 
foreach my $alias (@{$slabdata{$slab}->{aliases}}) { $list.="$alias "; $kmalloc=$alias if $alias=~/^kmalloc/; $first=$alias if $alias!~/^kmalloc/ && length($alias)>length($first); } $first=$kmalloc if $first eq ''; $first=$slab if $first eq '' || $showRootSlabsFlag; $slabdata{$slab}->{first}=$first; $slabfirst{$first}=$slab; # note that in some cases there is only a single alias in which case 'list' is '' $list=~s/ $first / /; $list=~s/^ | $//g; $slabdata{$slab}->{aliaslist}=$first if $first ne $slab; $slabdata{$slab}->{aliaslist}.=" $list" if $list ne ''; } ref($slabfirst); # need to mention it to eliminate -w warning } } sub passSlabFilter { my $filters=shift; my $slab= shift; foreach my $name (split(/,/, $filters)) { return(1) if $slab=~/^$name/; } return(0); } # This needs some explaining... When doing processes, we build a list of all the pids that # match the --procfilt selection. However, over time a selected command could exist and restart again # under a different pid and we WANT to pick that up too. So, everytime we check the processes # and a non-pid selector has been specified we will have to recheck ALL pids to see in any new # ones show up. Naturally we can skip those in @skipPids and if the flag $pidsOnlyFlag is set # we can also skip the pid checking. Finally, since over time the list of pids can grow # unchecked we need to clean out the stale data after every polling cycle. sub loadPids { my $procs=shift; my ($process, $pid, $ppid, $user, $uid, $cmd, $line, $file, $temp); my ($type, $value, @ps, $selector, $pidOnly); # Step 0 - an enhancement! If the process list string is actually a # filename turn entries into one long string as if entered with --procfilt. # This makes it possible to have a constant --procfilt parameter yet change # the process list dynamically, before starting collectl. 
  if (-e $procs)
  {
    $temp='';
    open TEMP, "<$procs" or logmsg("F", "Couldn't open --procfilt file");
    while ($line=<TEMP>)
    {
      chomp $line;
      next if $line=~/^#|^\s*$/; # ignore comments and blank lines
      $line=~s/\s+//g; # get rid of ALL whitespace in each line
      $temp.="$line,"; # smoosh it all together
    }
    $temp=~s/,$//; # get rid of trailing comma
    $procs=$temp;
  }

  # this is pretty brute force, but we're only doing it at startup
  # Step 1 - validate list for invalid types OR non-numeric pids
  # assume including collectl
  $oneThreadFlag=($procs=~/\+/) ? 1 : 0; # handy flag to optimize non-thread cases
  $uidMin=$uidMax=$uidSelFlag=0;
  foreach $task (split(/,/, $procs))
  {
    # for now, we don't do too much validation, but be sure to note
    # if our pid was requested via 'p%'
    if ($task=~/^([cCpfPuU])\+*(.*)/)
    {
      $type=$1;
      $value=$2;

      if ($type=~/u/ && $value=~/(\d+)-(\d+)/)
      {
        # uids are a special case in that one can specify range or multiple singletons but not multiple
        # ranges. when we DO see a range, save its min/max but DON'T include in the array of selectors
        error("you cannot specify multiple uid ranges in --procfilt") if $uidMin;
        $uidMin=$1;
        $uidMax=$2;
        $uidSelFlag=1;
        next;
      }

      # if we ever do allow this in playback we can't handle 'f'
      error("--procfilt f not allowed in playback mode") if $type eq 'f' && $playback ne '';

      # pids must be numeric
      error("pid $value not numeric in --procfilt") if $type=~/p/i && $value!~/^\d+$/;

      # max usernames returned by ps w/o converting to UID looks to be 19
      error("cannot use usernames > 19 chars with procfilt") if $type=~/U/ && length($value)>19;

      # max command name length returned by ps o comm looks to be 15
      error("cannot use commands > 15 chars with procfilt c/C") if $type=~/c/i && length($value)>15;

      # when dealing with embedded string in command line, note that spaces
      # are converted to NULs, so do it to our match string so it only happens
      # once and also be sure to quote any meta characters the user may have
      # in mind to use.
if ($type eq 'f') { $task=~s/ /\000/g; $task=quotemeta($task); } push @TaskSelectors, $task; next; } else { error("invalid task selection in --procfilt: $task"); } } # Step 2 - no longer needed. UIDs loaded earlier # Step 3 - find pids of all processes that match selection criteria # be sure to truncate leading spaces since pids are fixed width # Note: $cmd includes full directory path and args. Furthermore, this is NOT # what gets stored in /proc/XXX/stat and to make sure we look at the same # values dynamically as well as staticly, we better pull cmd from the stat # file itself. @ps=`ps axo pid,ppid,uid,comm,user`; my $firstFilePass=1; foreach $process (@ps) { next if $process=~/^\s+PID/; $process=~s/^\s+//; chomp $process; ($pid, $ppid, $uid, $cmd, $user)=split(/\s+/, $process); # if no criteria, select ALL if ($procs eq '') { $pidProc{$pid}=1; next; } # If uid range specified and this UID there, save it noting it's not # part of the task selection list so we do before the loop below. if ($uidMin>0 && $uid>=$uidMin && $uid<=$uidMax) { $pidOnly=0; $pidProc{$pid}=1; next; } # select based on criteria, but assume we're not getting a match $pidOnly=1; $keepPid=0; foreach $selector (@TaskSelectors) { $pidOnly=0 if $selector!~/^p/; $uidSelFlag=1 if $selector=~/^u/i; # need to know if doing UID matching if (($selector=~/^p\+*(.*)/ && $pid eq $1) || ($selector=~/^P\+*(.*)/ && $ppid eq $1) || ($selector=~/^c\+*(.*)/ && $cmd=~/$1/) || ($selector=~/^C\+*(.*)/ && $cmd=~/^$1/) || ($selector=~/^f\+*(.*)/ && cmdHasString($pid,$1)) || ($selector=~/^u\+*(.*)/ && $uid eq $1) || ($selector=~/^U\+*(.*)/ && $user eq $1)) { # We need to figure out if '+' appended to selector and set flag if so. # However, since it's extra overhead to maintain %pidThreads, we only set it # when there are threads to deal with. $pidThreads{$pid}=(substr($selector, 1, 1) eq '+') ? 
1 : 0 if $oneThreadFlag; $keepPid=1; last; } } if ($keepPid) { $pidProc{$pid}=1; } else { $pidSkip{$pid}=1; } } # STEP 4 - deal with threads # &pidThreads has been set to 1 for any pids we want to watch threads for. # We clean this up when we clean pids in general. If no pid threads, # no %pidThreads. foreach $pid (keys %pidThreads) { findThreads($pid) if $pidThreads{$pid}; } # if a selection list and it's only for pids (and doesn't include uxx-yy), set # the $pidOnlyFlag so that those are all we ever want to look for # for force the $pidsOnlyFlag to be set. It's those minor optimization # in life that count! $pidOnlyFlag=1 if $procs ne '' && !$uidMin && $pidOnly; if ($debug & 256) { print "PIDS Selected: "; foreach $pid (sort keys %pidProc) { print "$pid "; } print "\n"; if ($oneThreadFlag) { print "TPIDS Selected: "; foreach $pid (sort keys %tpidProc) { print "$pid "; } } print "\nPIDS Skipped: "; foreach $pid (sort keys %pidSkip) { print "$pid "; } print "\n"; print "\$pidOnlyFlag set!!!\n" if $pidOnlyFlag; } } sub loadUids { my $passwd=shift; my (@passwd, $line, $user, $uid); print "Load UIDS from $passwd\n" if $debug & 1; if (!-e $passwd) { print "WARNING - UID translation file '$passwd' doesn't exist. consider using --passwd\n"; return; } @passwd=`$Cat $passwd`; foreach $line (@passwd) { next if $line=~/^\+|^\s*$/; # ignore '+' and blank lines ($user, $uid)=(split(/:/, $line))[0,2]; $UidSelector{$uid}=$user; } } # here we have just found a new pid neither in the list to skip nor to process so # we have to go back to our selector list and see if it meets the selection specs. # if so, return the pid AND be sure to add to pidProc{} so we don't come here again. # There are time we get called and /proc/$pid doesn't exist anymore. This is # because these are short lived processes that are there where in the directory # when first read but are gone by the time we want to open them. 
For efficiency # we do a test to see if the pid directory exists and then trap later opens in case it # disappeared by then! # NOTE - we could probably return 0/1 depending on whether or not pid found, but since # it's already in the $match variable, we return that for convenience sub pidNew { my $pid=shift; my ($selector, $type, $param, $match, $cmd, $ppid, $line, $uid); return(0) if !-e "/proc/$pid/stat"; # if no filter, by defition this is a match $match=($procFilt ne '') ? 0 : $pid; # if selecting by uid (either as a range or explict match), try to read this procs # UID and if not there, no match! if ($uidSelFlag) { $uid=0; open TMP, ") { if ($line=~/^Uid:\s+(\d+)/) { $uid=$1; last; } } # If UID not found it will be 0 and the following always fail $match=$pid if $uidMin>0 && $uid>=$uidMin && $uid<$uidMax; } foreach $selector (@TaskSelectors) { $type=substr($selector, 0, 1); next if $type eq 'p'; # if a pid, can't be a new one $param=substr($selector, 1); if ($oneThreadFlag) { $param=~s/(\+)//; $pidThreads{$pid}=($1 eq '+') ? 1 : 0; } # match on parents pid? or command? if ($type=~/[PCc]/) { open PROC, "; return(0) if !defined($temp); # pid must have gone away ($cmd, $ppid)=(split(/ /, $temp))[1,3]; $cmd=~s/[()]//g; if (($type eq 'P' && $param==$ppid) || ($type eq 'C' && $cmd=~/^$param/) || ($type eq 'c' && $cmd=~/$param/)) { $match=$pid; last; } } # match on full command path? elsif ($type=~/f/ && cmdHasString($pid, $param)) { $match=$pid; last; } # match on UID elsif ($type=~/u/ && $uid==$param) { $match=$pid; last; } # match on username elsif ($type=~/U/ && defined($UidSelector{$uid}) && $UidSelector{$uid} eq $param) { $match=$pid; last; } } print "%%% Discovered new pid for monitoring: $pid\n" if $match && ($debug & 256); $pidProc{$match}=1 if $match!=0; findThreads($match) if $match && $oneThreadFlag && $pidThreads{$pid}; # since this pid didn't match selection criteria, don't look at it again. 
  # However, whenever we cycle through all the pids we delete the entire 'skip'
  # hash in case a pid we skipped gets reused.
  if (!$match) {
    $pidSkip{$pid}=1;
    my $num=0;
    foreach my $pid (keys %pidSkip) { $num++; }

    # we only care about the first pid seen at the start of a monitoring interval
    if ($firstProcCycle) {
      if ($debug & 4096) {
        my $seconds=int($fullTime);
        my $timenow=(split(/\s+/, localtime($seconds)))[3];
        printf "$timenow New PID: %5d LastNew: %5d NumPids: %4d NumSkip: $num\n",
               $pid, $lastFirstPid, $pid-$lastFirstPid;
      }

      if ($pid<$lastFirstPid) {
        undef(%pidSkip);
        $pid=0;    # so we reset $lastFirstPid
        print "skipped pids flushed...\n" if $debug & 4096;
      }
      $firstProcCycle=0;
      $lastFirstPid=$pid;
    }
  }
  return($match);
}

# see if the command that started a process contains a string
sub cmdHasString {
  my $pid=   shift;
  my $string=shift;
  my $line;

  # never include ourselves when matching by a command line string since it will
  # ALWAYS match the collectl command itself
  return() if $pid==$$;

  # Not an error because proc may have already exited
  return(0) if (!open PROC, ";

  # since not all processes have a command line associated with them be sure to
  # check before looking for a match and only return success after making it.
  return(!defined($cmdline) || $cmdline!~/$string/ ? 0 : 1);
}

# see if a pid has any active threads
sub findThreads {
  my $pid=shift;

  # In some cases the thread owning process may have gone away. When this
  # happens we can't open 'task', so act accordingly.
  if (!opendir DIR2, "/proc/$pid/task") {
    logmsg("W", "Looks like $pid exited so not looking for new threads");
    $pidThreads{$pid}=0;
    return;
  }

  while ($tpid=readdir(DIR2)) {
    next if $tpid=~/^\./;    # skip . and ..
    next if $tpid==$pid;     # skip parent because already covered

    # since this routine gets called both at the start when %tpidProc is empty and
    # every thread found is new AND during runtime when they may not be, check the
    # active thread hash and only include it if not already there
    if (!defined($tpidProc{$tpid})) {
      print "%%% Discovered new thread $tpid for pid: $pid\n" if $debug & 256;
      $tpidProc{$tpid}=$pid;    # add to thread watch list
    }
  }
}

sub cleanStalePids {
  my ($pid, %pidTemp, %tpidTemp, $removeFlag, $x);

  $removeFlag=0;
  foreach $pid (keys %pidProc) {
    if ($pidSeen{$pid}) {
      $pidTemp{$pid}=1;

      # If working with threads, we also need to purge the flag array that tells
      # us whether or not to look for thread pids
      $tpidTemp{$pid}=$pidThreads{$pid} if $oneThreadFlag;
    } else {
      print "%%% Stale Pid: $pid\n" if $debug & 256;
      $removeFlag=1;
    }
  }

  if ($removeFlag) {
    undef %pidProc;
    undef %pidThreads;
    %pidProc=%pidTemp;
    %pidThreads=%tpidTemp;
  }
  undef %pidTemp;
  undef %tpidTemp;

  if ($debug & 512) {
    foreach $x (sort keys %pidProc) { print "%%% pidProc{}: $x = $pidProc{$x}\n"; }
  }
  return unless $oneThreadFlag;

  # Do it again for threads...
  $removeFlag=0;
  foreach $pid (keys %tpidProc) {
    if ($tpidSeen{$pid}) {
      $pidTemp{$pid}=$tpidProc{$pid};
    } else {
      print "%%% Stale TPid: $pid\n" if $debug & 256;
      $removeFlag=1;
    }
  }

  if ($removeFlag) {
    undef %tpidProc;
    %tpidProc=%pidTemp;
  }
  undef %pidTemp;

  if ($debug & 512) {
    foreach $x (sort keys %tpidProc) { print "%%% tpidProc{}: $x = $tpidProc{$x}\n"; }
  }
}

sub showSlabAliases {
  my $slabFilt=shift;

  # by setting the slub flag and calling the 'load' routine, we'll get the header
  # built
  $slubinfoFlag= (-e '/sys/slab') ?
1 : 0;
  error("this kernel does not support 'slub-based' slabs") if !$slubinfoFlag;
  loadSlabs($slabFilt);

  foreach my $slab (sort keys %slabdata) {
    my $aliaslist=$slabdata{$slab}->{aliaslist};
    $aliaslist=$slab if !defined($aliaslist);
    next if $slab eq $aliaslist;
    printf "%-20s %s\n", $slab, $aliaslist if $aliaslist=~/ /;
  }
  exit(0);
}

sub showVersion {
  $temp='';
  $temp.=sprintf("zlib:%s,", Compress::Zlib->VERSION) if $zlibFlag;
  $temp.=sprintf("HiRes:%s", Time::HiRes->VERSION) if $hiResFlag;
  $temp=~s/,$//;

  $version=sprintf("collectl V$Version %s\n\n", $temp ne '' ? "($temp)" : '');
  $version.="$Copyright\n";
  $version.="$License\n";
  printText($version);
  exit(0);
}

sub showDefaults {
  printText("Default values by switch:\n");
  printText(" Interactive Daemon\n");
  printText(" -c -1 -1\n");
  printText(" -i 1:$Interval2:$Interval3 $Interval:$Interval2:$Interval3\n");
  printText(" --lustsvcs :$LustreConfigInt :$LustreConfigInt\n");
  printText(" -s cdn $SubsysCore\n");
  printText("Defaults only settable in config file:\n");
  printText(" LimSVC = $LimSVC\n");
  printText(" LimIOS = $LimIOS\n");
  printText(" LimLusKBS = $LimLusKBS\n");
  printText(" LimLusReints = $LimLusReints\n");
  printText(" LimBool = $LimBool\n");
  printText(" Port = $Port\n");
  printText(" Timeout = $Timeout\n");
  printText(" MaxZlibErrors = $MaxZlibErrors\n");
  printText(" Libraries = $Libraries\n") if defined($Libraries);
  exit(0);
}

sub envTest {
  $subsys='E';
  open ENV, "<$envTestFile" or error("Couldn't open '$envTestFile'");
  while (my $line=<ENV>) {
    next if $line=~/^\s*$|^#/;
    dataAnalyze('E', "ipmi $line");
  }
  close ENV;

  $briefFlag=0;
  $verboseFlag=1;
  intervalPrint(time);
  `stty echo` if !$PcFlag && $termFlag && !$backFlag;
}

sub error {
  my $text=shift;
  if (defined($text)) {
    # when running as a server, we need to turn off sockFlag otherwise
    # printText() will try to send error over socket and we want it local.
    $sockFlag=0 if $serverFlag;

    `stty echo` if !$PcFlag && $termFlag && !$backFlag;
    logmsg("F", "Error: $text") if $daemonFlag;

    # we can only call printText() when formatit loaded.
    if ($formatitLoaded) {
      printText("Error: $text\n");
      printText("type '$Program -h' for help\n");
    } else {
      print "Error: $text\n";
    }
    exit(1);
  }

  my $help=<name
      --pname name            set process name to 'collectl-pname'
  -P, --plot                  generate output in 'plot' format
      --procanalyze           analyze process data, generating prcs file
      --quiet                 do not echo warning messages on the terminal
  -r, --rolllogs time,d,m     roll logs at 'time', retaining for 'd' days,
                                every 'm' minutes [default: d=7,m=1440]
      --rawtoo                when run with -P, this tells collectl to also
                                create a raw log file
      --runas uid[:gid]       collectl will change its uid/gid in daemon mode
                                see man page for details
  -R, --runtime duration      time to run in format where unit is w,d,h,m,s
      --sep separator         specify an alternate plot format separator
      --slabanalyze           analyze slab data, generating slbs file
      --stats                 same as -oA
  -s, --subsys subsys         record/playback data from one or more subsystems
                                --showsubsys for details
      --sumstat               same as --stats but only summary
      --thru time             time thru which to playback data (see --from)
      --top [type][,num]      show top 'num' processes sorted by type
                                --showtopopts for details
      --tworaw                synonym for -G and -group, which are now deprecated
      --umask mask            set output file permissions mask (see man umask)
      --utime mask            write diagnostic micro timestamps into raw file
      --verbose               display output in verbose format (automatically
                                selected when brief doesn't make sense)
  -w, --wide                  print wide field contents (don't use K/M/G)

Synonyms
  --utc = -oU

These are Alternate Display Formats
      --vmstat                show output similar to vmstat

Logging options
      --rawtoo                used with -P, write raw data to a log as well
      --export name[,options] write data to an exported socket/file

Various types of help
  -h, --help                  print this text
  -v, --version               print version
  -V, --showdefs              print operational defaults
  -x,
--helpext                   extended help
  -X, --helpall               shows all help concatenated together
      --showoptions           show all the options
      --showsubopts           show all subsystem specific options
      --showsubsys            show all the subsystems
      --showtopopts           show --top options
      --showheader            show file header that 'would be' generated
      --showcolheaders        show column headers that 'would be' generated
      --showslabaliases       for SLUB allocator, show non-root aliases
      --showrootslabs         same as --showslabaliases but use 'root' names
      --whatsnew              show summary of recent version new features
EOF2

  printText("$extended\n");
  return if defined($_[0]);

  printText("$Copyright\n");
  printText("$License\n");
  exit(0);
}

sub showSubsys {
  my $subsys=< 1G
    G - include decimal point (when it will fit) for numbers > 1G
  Exception Reporting
    x - report exceptions only (see man page)
    X - record all values + exceptions in plot format (see man page)
  Modify results before display (do NOT affect collection)
    n - do NOT normalize rates to units/second
  Plot File Naming/Creation
    a - if plotfile exists, append [default=skip -p file]
    c - always create new plot file
    u - create unique plot file names - include time
  Plot Data Format
    1 - plot format with 1 decimal place of precision
    2 - plot format with 2 decimal places of precision
    z - don't compress output file(s)
  File Header Information
    i - include file header in output
EOF4

  printText($options);
  exit(0) if !defined($_[0]);
}

sub showSubopts {
  my $subopts=<[[,],...], and valid types are any combinations of:
      c - any substring in command name
      C - command name starts with this string
      f - full path of command (including args) contains string
      p - pid
      P - parent pid
      u - any processes owned by this user's UID or in range xxx-yyy
      U - any processes owned by this user
    NOTE1: if 'procs' is actually a filename, that file will be read and all lines
           concatenated together, comma separated, as if typed in as an argument
           of --procfilt. Lines beginning with # will be ignored as comments and
           blank lines will be skipped.
    NOTE2: if any type fields are immediately followed by a plus sign, any threads
           associated with that process will also be reported.
           see man page for important restrictions
  --procstate Only show processes in one or more of the following states
      D - waiting in uninterruptible disk sleep
      R - running
      S - sleeping in interruptible wait
      T - traced or stopped
      W - paging
      Z - zombie

Slab Options and Filters
  --slabopts
      s - only show slabs with non-zero allocations
      S - only show slabs that have changed since last interval
  --slabfilt: restricts which slabs are listed, where 'slabs' is of the form:
      'slab[,slab...]'. only slabs whose names start with this name will be included

TCP Stack Options - these DO affect data collection as well as printing
  --tcpfilt
      i - ip stats
      t - tcp stats
      u - udp stats
      c - Icmp Stats
      I - ip extended stats, no brief stats so including it will force --verbose
      T - tcp extended stats
EOF5

  printText($subopts);
  exit(0) if !defined($_[0]);
}

sub showTopopts {
  my $subopts=<; chomp $mac;
      close FILE;
      $vnets.="$vnet=$mac ";
    }
    $vnets=~s/ $//;
    record(2, "vnets $vnets\n") if $vnets ne '';
  }
}

sub vnetInitInterval { }

sub vnetAnalyze {
  my $type=   shift;
  my $dataref=shift;

  foreach my $vnet (split(/\s+/, $$dataref)) {
    my ($netname, $mac)=split(/=/, $vnet);
    $mac=~s/^.{3}//;
    $virtMacs{$mac}=$netname;
    #print "virtMacs{$mac}=$netname\n";
  }
}

sub vnetPrintBrief { }
sub vnetPrintVerbose { }
sub vnetPrintPlot { }
sub vnetPrintExport { }

1;
collectl-4.3.1/GPL0000664000175000017500000004310313366602004012034 0ustar mjsmjs
GNU GENERAL PUBLIC LICENSE
Version 2, June 1991

Copyright (C) 1989, 1991 Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed.

Preamble

The licenses for most software are designed to take away your freedom to share and change it.
By contrast, the GNU General Public License is intended to guarantee your freedom to share and change free software--to make sure the software is free for all its users. This General Public License applies to most of the Free Software Foundation's software and to any other program whose authors commit to using it. (Some other Free Software Foundation software is covered by the GNU Lesser General Public License instead.) You can apply it to your programs, too. When we speak of free software, we are referring to freedom, not price. Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software (and charge for this service if you wish), that you receive source code or can get it if you want it, that you can change the software or use pieces of it in new free programs; and that you know you can do these things. To protect your rights, we need to make restrictions that forbid anyone to deny you these rights or to ask you to surrender the rights. These restrictions translate to certain responsibilities for you if you distribute copies of the software, or if you modify it. For example, if you distribute copies of such a program, whether gratis or for a fee, you must give the recipients all the rights that you have. You must make sure that they, too, receive or can get the source code. And you must show them these terms so they know their rights. We protect your rights with two steps: (1) copyright the software, and (2) offer you this license which gives you legal permission to copy, distribute and/or modify the software. Also, for each author's protection and ours, we want to make certain that everyone understands that there is no warranty for this free software. If the software is modified by someone else and passed on, we want its recipients to know that what they have is not the original, so that any problems introduced by others will not reflect on the original authors' reputations. 
Finally, any free program is threatened constantly by software patents. We wish to avoid the danger that redistributors of a free program will individually obtain patent licenses, in effect making the program proprietary. To prevent this, we have made it clear that any patent must be licensed for everyone's free use or not licensed at all. The precise terms and conditions for copying, distribution and modification follow. GNU GENERAL PUBLIC LICENSE TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION 0. This License applies to any program or other work which contains a notice placed by the copyright holder saying it may be distributed under the terms of this General Public License. The "Program", below, refers to any such program or work, and a "work based on the Program" means either the Program or any derivative work under copyright law: that is to say, a work containing the Program or a portion of it, either verbatim or with modifications and/or translated into another language. (Hereinafter, translation is included without limitation in the term "modification".) Each licensee is addressed as "you". Activities other than copying, distribution and modification are not covered by this License; they are outside its scope. The act of running the Program is not restricted, and the output from the Program is covered only if its contents constitute a work based on the Program (independent of having been made by running the Program). Whether that is true depends on what the Program does. 1. You may copy and distribute verbatim copies of the Program's source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice and disclaimer of warranty; keep intact all the notices that refer to this License and to the absence of any warranty; and give any other recipients of the Program a copy of this License along with the Program. 
You may charge a fee for the physical act of transferring a copy, and you may at your option offer warranty protection in exchange for a fee. 2. You may modify your copy or copies of the Program or any portion of it, thus forming a work based on the Program, and copy and distribute such modifications or work under the terms of Section 1 above, provided that you also meet all of these conditions: a) You must cause the modified files to carry prominent notices stating that you changed the files and the date of any change. b) You must cause any work that you distribute or publish, that in whole or in part contains or is derived from the Program or any part thereof, to be licensed as a whole at no charge to all third parties under the terms of this License. c) If the modified program normally reads commands interactively when run, you must cause it, when started running for such interactive use in the most ordinary way, to print or display an announcement including an appropriate copyright notice and a notice that there is no warranty (or else, saying that you provide a warranty) and that users may redistribute the program under these conditions, and telling the user how to view a copy of this License. (Exception: if the Program itself is interactive but does not normally print such an announcement, your work based on the Program is not required to print an announcement.) These requirements apply to the modified work as a whole. If identifiable sections of that work are not derived from the Program, and can be reasonably considered independent and separate works in themselves, then this License, and its terms, do not apply to those sections when you distribute them as separate works. 
But when you distribute the same sections as part of a whole which is a work based on the Program, the distribution of the whole must be on the terms of this License, whose permissions for other licensees extend to the entire whole, and thus to each and every part regardless of who wrote it. Thus, it is not the intent of this section to claim rights or contest your rights to work written entirely by you; rather, the intent is to exercise the right to control the distribution of derivative or collective works based on the Program. In addition, mere aggregation of another work not based on the Program with the Program (or with a work based on the Program) on a volume of a storage or distribution medium does not bring the other work under the scope of this License. 3. You may copy and distribute the Program (or a work based on it, under Section 2) in object code or executable form under the terms of Sections 1 and 2 above provided that you also do one of the following: a) Accompany it with the complete corresponding machine-readable source code, which must be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or, b) Accompany it with a written offer, valid for at least three years, to give any third party, for a charge no more than your cost of physically performing source distribution, a complete machine-readable copy of the corresponding source code, to be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or, c) Accompany it with the information you received as to the offer to distribute corresponding source code. (This alternative is allowed only for noncommercial distribution and only if you received the program in object code or executable form with such an offer, in accord with Subsection b above.) The source code for a work means the preferred form of the work for making modifications to it. 
For an executable work, complete source code means all the source code for all modules it contains, plus any associated interface definition files, plus the scripts used to control compilation and installation of the executable. However, as a special exception, the source code distributed need not include anything that is normally distributed (in either source or binary form) with the major components (compiler, kernel, and so on) of the operating system on which the executable runs, unless that component itself accompanies the executable. If distribution of executable or object code is made by offering access to copy from a designated place, then offering equivalent access to copy the source code from the same place counts as distribution of the source code, even though third parties are not compelled to copy the source along with the object code. 4. You may not copy, modify, sublicense, or distribute the Program except as expressly provided under this License. Any attempt otherwise to copy, modify, sublicense or distribute the Program is void, and will automatically terminate your rights under this License. However, parties who have received copies, or rights, from you under this License will not have their licenses terminated so long as such parties remain in full compliance. 5. You are not required to accept this License, since you have not signed it. However, nothing else grants you permission to modify or distribute the Program or its derivative works. These actions are prohibited by law if you do not accept this License. Therefore, by modifying or distributing the Program (or any work based on the Program), you indicate your acceptance of this License to do so, and all its terms and conditions for copying, distributing or modifying the Program or works based on it. 6. 
Each time you redistribute the Program (or any work based on the Program), the recipient automatically receives a license from the original licensor to copy, distribute or modify the Program subject to these terms and conditions. You may not impose any further restrictions on the recipients' exercise of the rights granted herein. You are not responsible for enforcing compliance by third parties to this License. 7. If, as a consequence of a court judgment or allegation of patent infringement or for any other reason (not limited to patent issues), conditions are imposed on you (whether by court order, agreement or otherwise) that contradict the conditions of this License, they do not excuse you from the conditions of this License. If you cannot distribute so as to satisfy simultaneously your obligations under this License and any other pertinent obligations, then as a consequence you may not distribute the Program at all. For example, if a patent license would not permit royalty-free redistribution of the Program by all those who receive copies directly or indirectly through you, then the only way you could satisfy both it and this License would be to refrain entirely from distribution of the Program. If any portion of this section is held invalid or unenforceable under any particular circumstance, the balance of the section is intended to apply and the section as a whole is intended to apply in other circumstances. It is not the purpose of this section to induce you to infringe any patents or other property right claims or to contest validity of any such claims; this section has the sole purpose of protecting the integrity of the free software distribution system, which is implemented by public license practices. 
Many people have made generous contributions to the wide range of software distributed through that system in reliance on consistent application of that system; it is up to the author/donor to decide if he or she is willing to distribute software through any other system and a licensee cannot impose that choice. This section is intended to make thoroughly clear what is believed to be a consequence of the rest of this License. 8. If the distribution and/or use of the Program is restricted in certain countries either by patents or by copyrighted interfaces, the original copyright holder who places the Program under this License may add an explicit geographical distribution limitation excluding those countries, so that distribution is permitted only in or among countries not thus excluded. In such case, this License incorporates the limitation as if written in the body of this License. 9. The Free Software Foundation may publish revised and/or new versions of the General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. Each version is given a distinguishing version number. If the Program specifies a version number of this License which applies to it and "any later version", you have the option of following the terms and conditions either of that version or of any later version published by the Free Software Foundation. If the Program does not specify a version number of this License, you may choose any version ever published by the Free Software Foundation. 10. If you wish to incorporate parts of the Program into other free programs whose distribution conditions are different, write to the author to ask for permission. For software which is copyrighted by the Free Software Foundation, write to the Free Software Foundation; we sometimes make exceptions for this. 
Our decision will be guided by the two goals of preserving the free status of all derivatives of our free software and of promoting the sharing and reuse of software generally. NO WARRANTY 11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION. 12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. END OF TERMS AND CONDITIONS How to Apply These Terms to Your New Programs If you develop a new program, and you want it to be of the greatest possible use to the public, the best way to achieve this is to make it free software which everyone can redistribute and change under these terms. To do so, attach the following notices to the program. It is safest to attach them to the start of each source file to most effectively convey the exclusion of warranty; and each file should have at least the "copyright" line and a pointer to where the full notice is found. 
<one line to give the program's name and a brief idea of what it does.>
Copyright (C) <year> <name of author>

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.

Also add information on how to contact you by electronic and paper mail.

If the program is interactive, make it output a short notice like this when it starts in an interactive mode:

Gnomovision version 69, Copyright (C) year name of author
Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
This is free software, and you are welcome to redistribute it under certain conditions; type `show c' for details.

The hypothetical commands `show w' and `show c' should show the appropriate parts of the General Public License. Of course, the commands you use may be called something other than `show w' and `show c'; they could even be mouse-clicks or menu items--whatever suits your program.

You should also get your employer (if you work as a programmer) or your school, if any, to sign a "copyright disclaimer" for the program, if necessary. Here is a sample; alter the names:

Yoyodyne, Inc., hereby disclaims all copyright interest in the program `Gnomovision' (which makes passes at compilers) written by James Hacker.

<signature of Ty Coon>, 1 April 1989
Ty Coon, President of Vice

This General Public License does not permit incorporating your program into proprietary programs.
If your program is a subroutine library, you may consider it more useful to permit linking proprietary applications with the library. If this is what you want to do, use the GNU Lesser General Public License instead of this License.
collectl-4.3.1/UNINSTALL0000775000175000017500000000206713366602004012772 0ustar mjsmjs
#!/bin/sh

DESTDIR=${DESTDIR:="/"}

# R e m o v e   O L D   S t r u c t u r e   I f   T h e r e

# These structures were created in a pre-$DESTDIR world
BINDIR=/opt/hp/collectl

# This is code
rm -fr $BINDIR

# These are all symlinks
rm -f /usr/bin/collectl
rm -f /usr/sbin/collectl
rm -f /etc/collectl.conf
rm -f /etc/init.d/collectl
rm -f /usr/share/man/man1/collectl.1.gz

rm -f /etc/init.d/rc?.d/*collectl
rm -f /etc/rc.d/rc?.d/*collectl
rm -f /etc/rc?.d/*collectl               # debian different
rm -f /etc/init.d/collectl               # gentoo and generic
rm -f /etc/runlevels/default/collectl    # gentoo

# N e w   D i r e c t o r y   S t r u c t u r e

BINDIR=$DESTDIR/usr/bin
DOCDIR=$DESTDIR/usr/share/doc/collectl
SHRDIR=$DESTDIR/usr/share/collectl
MANDIR=$DESTDIR/usr/share/man/man1
SYSDDIR=$DESTDIR/usr/lib/systemd/system
ETCDIR=$DESTDIR/etc
INITDIR=$ETCDIR/init.d

rm -f $BINDIR/collectl
rm -f $ETCDIR/collectl.conf
rm -f $INITDIR/collectl
rm -f $MANDIR/collectl*
rm -f $SYSDDIR/collectl.service    # may not be there...
rm -fr $DOCDIR
rm -fr $SHRDIR
collectl-4.3.1/ARTISTIC0000664000175000017500000001373313366602004012642 0ustar mjsmjs
The "Artistic License"

Preamble

The intent of this document is to state the conditions under which a Package may be copied, such that the Copyright Holder maintains some semblance of artistic control over the development of the package, while giving the users of the package the right to use and distribute the Package in a more-or-less customary fashion, plus the right to make reasonable modifications.

Definitions:

"Package" refers to the collection of files distributed by the Copyright Holder, and derivatives of that collection of files created through textual modification.
"Standard Version" refers to such a Package if it has not been modified, or has been modified in accordance with the wishes of the Copyright Holder as specified below. "Copyright Holder" is whoever is named in the copyright or copyrights for the package. "You" is you, if you're thinking about copying or distributing this Package. "Reasonable copying fee" is whatever you can justify on the basis of media cost, duplication charges, time of people involved, and so on. (You will not be required to justify it to the Copyright Holder, but only to the computing community at large as a market that must bear the fee.) "Freely Available" means that no fee is charged for the item itself, though there may be fees involved in handling the item. It also means that recipients of the item may redistribute it under the same conditions they received it. 1. You may make and give away verbatim copies of the source form of the Standard Version of this Package without restriction, provided that you duplicate all of the original copyright notices and associated disclaimers. 2. You may apply bug fixes, portability fixes and other modifications derived from the Public Domain or from the Copyright Holder. A Package modified in such a way shall still be considered the Standard Version. 3. You may otherwise modify your copy of this Package in any way, provided that you insert a prominent notice in each changed file stating how and when you changed that file, and provided that you do at least ONE of the following: a) place your modifications in the Public Domain or otherwise make them Freely Available, such as by posting said modifications to Usenet or an equivalent medium, or placing the modifications on a major archive site such as uunet.uu.net, or by allowing the Copyright Holder to include your modifications in the Standard Version of the Package. b) use the modified Package only within your corporation or organization. 
c) rename any non-standard executables so the names do not conflict with standard executables, which must also be provided, and provide a separate manual page for each non-standard executable that clearly documents how it differs from the Standard Version. d) make other distribution arrangements with the Copyright Holder. 4. You may distribute the programs of this Package in object code or executable form, provided that you do at least ONE of the following: a) distribute a Standard Version of the executables and library files, together with instructions (in the manual page or equivalent) on where to get the Standard Version. b) accompany the distribution with the machine-readable source of the Package with your modifications. c) give non-standard executables non-standard names, and clearly document the differences in manual pages (or equivalent), together with instructions on where to get the Standard Version. d) make other distribution arrangements with the Copyright Holder. 5. You may charge a reasonable copying fee for any distribution of this Package. You may charge any fee you choose for support of this Package. You may not charge a fee for this Package itself. However, you may distribute this Package in aggregate with other (possibly commercial) programs as part of a larger (possibly commercial) software distribution provided that you do not advertise this Package as a product of your own. You may embed this Package's interpreter within an executable of yours (by linking); this shall be construed as a mere form of aggregation, provided that the complete Standard Version of the interpreter is so embedded. 6. The scripts and library files supplied as input to or produced as output from the programs of this Package do not automatically fall under the copyright of this Package, but belong to whoever generated them, and may be sold commercially, and may be aggregated with this Package. 
If such scripts or library files are aggregated with this Package via the so-called "undump" or "unexec" methods of producing a binary executable image, then distribution of such an image shall neither be construed as a distribution of this Package nor shall it fall under the restrictions of Paragraphs 3 and 4, provided that you do not represent such an executable image as a Standard Version of this Package. 7. C subroutines (or comparably compiled subroutines in other languages) supplied by you and linked into this Package in order to emulate subroutines and variables of the language defined by this Package shall not be considered part of this Package, but are the equivalent of input as in Paragraph 6, provided these subroutines do not change the language in any way that would cause it to fail the regression tests for the language. 8. Aggregation of this Package with a commercial distribution is always permitted provided that the use of this Package is embedded; that is, when no overt attempt is made to make this Package's interfaces visible to the end user of the commercial distribution. Such use shall not be construed as a distribution of this Package. 9. The name of the Copyright Holder may not be used to endorse or promote products derived from this software without specific prior written permission. 10. THIS PACKAGE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF MERCHANTIBILITY AND FITNESS FOR A PARTICULAR PURPOSE. The End