However, there is a potential problem that is important to understand and that has been seen in the past. A device with the wrong firmware level, under some conditions, caused a long delay in the middle of the collection interval. Some samples were collected close to the starting time of that interval, while all those that followed the delay were actually collected much later than the reported time.
Consider the following, in which we're looking at raw data collected for 2 subsystems; call them XXX and YYY. Let's also assume that the counters we're monitoring are increasing at a steady rate of 100 units/sec. In this example, during the 10:00:01 interval there was a 10 second hang in collecting the YYY sample. The XXX sample was recorded correctly, but by the time the YYY sample was collected, its counter had increased by 1000 units. In the next interval, which was delayed by 10 seconds, the XXX counter has increased by 1000 units while the YYY counter has increased by only 100.
TIME       XXX    YYY
10:00:00    100    100
10:00:01    200   1100
10:00:11   1200   1200
10:00:12   1300   1300

The problem here is that when reporting the 2 rates at 10:00:01, we'll see a rate of 1000 units/sec for YYY, because based on the timestamp that interval only appears to be 1 second long. Conversely, the rate reported for that same subsystem at 10:00:11 will be 10 units/sec, because this interval is reported as 10 seconds long. Also note that in both intervals the counter for XXX has been incremented correctly and its rates are reported correctly, because its samples were taken before the delays. Moving the timestamp to the end of the interval would fix the problem for YYY, but would then move it to XXX.
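A minimal Python sketch of the calculation described above, assuming each rate is simply the counter delta divided by the difference between consecutive reported timestamps (times are expressed as seconds after 10:00:00). The `rates` helper is hypothetical and not part of collectl; it only reproduces the arithmetic behind the 1000 and 10 units/sec values for YYY.

```python
# Reported timestamps (seconds after 10:00:00) and raw counter values
# taken from the table above.
times = [0, 1, 11, 12]
counters = {
    "XXX": [100, 200, 1200, 1300],
    "YYY": [100, 1100, 1200, 1300],
}

def rates(times, values):
    """Per-interval rate: counter delta divided by the reported interval length."""
    return [
        (values[i] - values[i - 1]) / (times[i] - times[i - 1])
        for i in range(1, len(times))
    ]

for name, values in counters.items():
    print(name, rates(times, values))
```

XXX comes out at a steady 100 units/sec in every interval, while YYY shows 1000, then 10, then 100 units/sec, because its counter values do not line up with the reported timestamps.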
It IS important to understand that this is only a problem if the delay occurs during the data collection itself. If a system delay postpones all data collection, but collection runs as expected once started (and this has been seen to be the typical case), the intervals may be longer but the counters will have increased proportionally and the results will be consistent.
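To illustrate the typical case, here is a small sketch assuming a hypothetical 10 second delay before the whole 10:00:11 collection starts, after which both subsystems are sampled promptly. Because every counter keeps growing at 100 units/sec during the delay, the longer interval and the larger deltas cancel out. The `rates` helper is the same hypothetical delta-over-interval calculation, not collectl code.

```python
# Hypothetical case: all collection is delayed, but once it starts both
# subsystems are sampled at (essentially) the reported time.
times = [0, 1, 11, 12]  # seconds after 10:00:00
counters = {
    "XXX": [100, 200, 1200, 1300],
    "YYY": [100, 200, 1200, 1300],
}

def rates(times, values):
    """Per-interval rate: counter delta divided by the reported interval length."""
    return [
        (values[i] - values[i - 1]) / (times[i] - times[i - 1])
        for i in range(1, len(times))
    ]

for name, values in counters.items():
    # The 10 second interval has a 1000 unit delta, so the rate stays at 100.
    print(name, rates(times, values))
```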
The only real answer to this problem would be to timestamp individual samples. However, this problem is felt to be rare enough not to be of serious concern, and changing the timestamping methodology would cause more problems than it solves.
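For comparison, a sketch of what per-sample timestamps would buy, assuming each sample carried its own collection time. The YYY times below are hypothetical: they are the times implied by the steady 100 units/sec rate (so its second sample was really taken at about 10:00:10, after the hang). This is an illustration of the idea, not something collectl does.

```python
# Hypothetical (time, value) pairs with per-sample timestamps, in seconds
# after 10:00:00.
samples = {
    "XXX": [(0, 100), (1, 200), (11, 1200), (12, 1300)],
    "YYY": [(0, 100), (10, 1100), (11, 1200), (12, 1300)],
}

def rates(pairs):
    """Rate between consecutive samples, each using its own timestamp."""
    return [(v1 - v0) / (t1 - t0) for (t0, v0), (t1, v1) in zip(pairs, pairs[1:])]

for name, pairs in samples.items():
    print(name, rates(pairs))
```

With each sample's own time used as the divisor, YYY's rate is 100 units/sec throughout, matching XXX.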
One other thing to consider is that when only non-zero values are selected for reporting, one might occasionally be surprised to see values of 0 reported. This occurs when a non-zero value is then normalized to 0.
If you think you might need to see these close-to-0 values, include -on, which tells collectl not to normalize its output before reporting.
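A minimal sketch of how a filtered-in value can still print as 0, assuming that the non-zero test is applied to the raw count, that normalization means dividing by the interval length in seconds, and that values are displayed as integers. The `report` function is hypothetical; its `normalize=False` case only mimics the effect of -on.

```python
def report(delta, interval, normalize=True):
    """Hypothetical model: drop zero counts, otherwise format the value,
    normalized to a per-second rate unless normalize is False."""
    if delta == 0:
        return None  # filtered out, nothing is reported
    value = delta / interval if normalize else delta
    return f"{value:.0f}"  # integer display

print(report(4, 10))                   # 0.4/sec is displayed as 0
print(report(4, 10, normalize=False))  # raw count of 4 is displayed
print(report(0, 10))                   # suppressed by the non-zero filter
```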
updated Feb 21, 2011
Assuming collectl has been installed from the rpm kit, it has been configured to run as a service but is disabled from starting automatically at boot. To enable it, simply run chkconfig collectl on, noting that by default collectl is configured to collect most data. To see the specific subsystems, execute collectl -V and look at the daemon default values for -s. You should then look at the DaemonCommands string in /etc/collectl.conf to see whether any changes to -s have been explicitly set. At the time of this writing, collectl has been further configured to add slab and process data to the base defaults.
Further inspection of this command string will show that the daemon has also been configured to write all its data to a set of compressed text files in /var/log/collectl, which was created when the kit was installed. To verify that collectl will run properly as a service, execute the command /etc/init.d/collectl start (or, as a shortcut on a Red Hat system, service collectl start) and examine the log file in /var/log/collectl for the startup (and hopefully no termination) messages, as well as for the appearance of either a raw or raw.gz data file in that same directory. Note that since the output is buffered, the data file will probably have a length of 0 until the buffer fills or the flush interval (currently set to 60 seconds) passes, whichever comes first, or until the command /etc/init.d/collectl flush is executed.
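The buffering rule above can be modeled as a simple "whichever comes first" test. This is a hypothetical sketch, not collectl's code: `should_flush`, `buffer_size` and `flush_requested` are illustrative names, and only the 60 second interval comes from the text.

```python
FLUSH_INTERVAL = 60  # seconds, the current setting cited above

def should_flush(buffered, buffer_size, seconds_since_flush, flush_requested=False):
    """Hypothetical model: buffered data reaches the file when the buffer fills,
    the flush interval passes, or a flush command is issued, whichever is first."""
    return (
        buffered >= buffer_size
        or seconds_since_flush >= FLUSH_INTERVAL
        or flush_requested
    )

print(should_flush(10, 4096, 30))        # nothing yet: file still length 0
print(should_flush(0, 4096, 60))         # flush interval has passed
print(should_flush(0, 4096, 5, True))    # e.g. after /etc/init.d/collectl flush
```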
In order to write its output as a compressed file, collectl requires the perl Compress module, which is included with newer perl distributions. If it is not present you should install it; otherwise you will get messages warning you that compression is not installed.
To change any of the daemon's behaviors, such as the flush interval or output file location, simply change the DaemonCommands line in /etc/collectl.conf, which specifies the actual command string collectl is passed at startup. Use care in setting this string, as incorrect settings may cause collectl to exit abnormally; if it does, examine the log file for messages.
Since some of the filters can include pipes, one might choose to use the perl form of "abc|def|xyz" when using them interactively, in which case quotation marks are needed to prevent the shell from acting on them. However, if you include the quotes in the DaemonCommands line, the filters will not work correctly, because collectl will see the quotes as part of the filter itself.
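A small Python sketch of why the quotes break the filter. It assumes that the shell strips quotes interactively (modeled with `shlex.split`) while the DaemonCommands string reaches collectl without any shell quote processing (modeled as a plain whitespace split). `--filt` is a placeholder switch name, not a real collectl option.

```python
import re
import shlex

# "--filt" is a hypothetical placeholder for any filter switch.
interactive = shlex.split('collectl --filt "abc|def"')  # shell removes the quotes
daemon = 'collectl --filt "abc|def"'.split()            # assumed: quotes are kept

print(interactive[-1])  # abc|def
print(daemon[-1])       # "abc|def"

# With the quotes kept, the regex means '"abc' OR 'def"', so a plain
# name like abc no longer matches.
print(bool(re.search(interactive[-1], "abc")))
print(bool(re.search(daemon[-1], "abc")))
```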
One-time modification of runtime parameters
If you want to change the way collectl runs as a daemon for a specific instance, you can pass the normal collectl switches to the start script as its second parameter (more on the first parameter later). For example, to start collectl with a monitoring interval of 15 seconds, just start it as follows:
/etc/init.d/collectl start '-i 15'
The next time it is started (or restarted) it will use its default values.
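The quotes around '-i 15' matter because the shell splits unquoted words on spaces. A quick way to see what the start script receives is Python's `shlex.split`, which follows POSIX shell word-splitting rules; this only illustrates the shell's behavior, not the script itself.

```python
import shlex

quoted = shlex.split("/etc/init.d/collectl start '-i 15'")
unquoted = shlex.split("/etc/init.d/collectl start -i 15")

print(quoted)    # the switches arrive as one parameter: '-i 15'
print(unquoted)  # without quotes they are split into '-i' and '15'
```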
Running multiple instances of a collectl daemon
By default, collectl only supports running a single instance of a daemon, and if you try to start a second you will get an error message. However, there may be times you really want to run a second instance, most typically to collect a subset of data at a different monitoring interval. To do this, use an alternative syntax which prefaces the parameters with a string such as test, as in the following example:
/etc/init.d/collectl start test -i15
Also note that in this example quotes weren't needed because there were no spaces in the second argument. In this case a process named collectl-test will be created and will use the argument -i15. You must be careful when using this format, because if you leave off the second argument you'd actually start the main process with the invalid switch test.
To perform other operations on this second instance, such as stop or flush, simply add the test qualifier to the command. If you want to restart one of these instances, be sure to include the appropriate switches, because you must use the 2 argument form of the command. This syntax was also chosen to ensure the user does supply additional switches, because without them you'd essentially be running an identical copy of the default configuration.
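The argument rules described in this section can be summarized as a small decision function. This is a hypothetical Python model of the documented behavior, not the actual init script: `interpret` and its return tuple are illustrative only. It shows why start test -i15 creates collectl-test, why start test alone passes test to the main process as a switch, and why stop test only needs the qualifier.

```python
def interpret(args):
    """Hypothetical model of the init script's arguments.
    Returns (action, process name, switch string)."""
    action, rest = args[0], args[1:]
    if action in ("start", "restart"):
        if len(rest) >= 2:
            # 2 argument form: instance name followed by its switches
            return action, "collectl-" + rest[0], rest[1]
        # otherwise any single argument is taken as switches for the main process
        return action, "collectl", rest[0] if rest else ""
    # stop, flush, etc.: an optional qualifier names the instance
    return action, "collectl-" + rest[0] if rest else "collectl", ""

print(interpret(["start", "test", "-i15"]))  # second instance
print(interpret(["start", "test"]))          # pitfall: main process, switch "test"
print(interpret(["stop", "test"]))           # stop the second instance
```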
updated Sep 5, 2013