agedu-20211129.8cd63c5/ 0000755 0001750 0001750 00000000000 14151034324 013151 5 ustar simon simon agedu-20211129.8cd63c5/agedu.1 0000644 0001750 0001750 00000111727 14151034324 014331 0 ustar simon simon .\" agedu version 20211129.8cd63c5 .ie \n(.g .ds Aq \(aq .el .ds Aq ' .TH "agedu" "1" "2008\(hy11\(hy02" "Simon\ Tatham" "Simon\ Tatham" .SH "NAME" .PP \fBagedu\fP - correlate disk usage with last-access times to identify large and disused data .SH "SYNOPSIS" .PP .nf \fBagedu\fP\ [\ \fIoptions\fP\ ]\ \fIaction\fP\ [\fIaction\fP...] .fi .SH "DESCRIPTION" .PP \fBagedu\fP scans a directory tree and produces reports about how much disk space is used in each directory and subdirectory, and also how that usage of disk space corresponds to files with last-access times a long time ago. .PP In other words, \fBagedu\fP is a tool you might use to help you free up disk space. It lets you see which directories are taking up the most space, as \fBdu\fP does; but unlike \fBdu\fP, it also distinguishes between large collections of data which are still in use and ones which have not been accessed in months or years - for instance, large archives downloaded, unpacked, used once, and never cleaned up. Where \fBdu\fP helps you find what\*(Aqs using your disk space, \fBagedu\fP helps you find what\*(Aqs \fIwasting\fP your disk space. .PP \fBagedu\fP has several operating modes. In one mode, it scans your disk and builds an index file containing a data structure which allows it to efficiently retrieve any information it might need. Typically, you would use it in this mode first, and then run it in one of a number of `query' modes to display a report of the disk space usage of a particular directory and its subdirectories. Those reports can be produced as plain text (much like \fBdu\fP) or as HTML. \fBagedu\fP can even run as a miniature web server, presenting each directory\*(Aqs HTML report with hyperlinks to let you navigate around the file system to similar reports for other directories. .PP So you would typically start using \fBagedu\fP by telling it to do a scan of a directory tree and build an index. This is done with a command such as .PP .nf $\ \fBagedu\ \-s\ /home/fred\fP .fi .PP which will build a large data file called \fBagedu.dat\fP in your current directory. (If that current directory is \fIinside\fP \fB/home/fred\fP, don\*(Aqt worry - \fBagedu\fP is smart enough to discount its own index file.) .PP Having built the index, you would now query it for reports of disk space usage. If you have a graphical web browser, the simplest and nicest way to query the index is by running \fBagedu\fP in web server mode: .PP .nf $\ \fBagedu\ \-w\fP .fi .PP which will print (among other messages) a URL on its standard output along the lines of .PP .nf URL:\ http://127.0.0.1:48638/ .fi .PP (That URL will always begin with `\fB127.\fP', meaning that it\*(Aqs in the \fBlocalhost\fP address space. So only processes running on the same computer can even try to connect to that web server, and also there is access control to prevent other users from seeing it - see below for more detail.) .PP Now paste that URL into your web browser, and you will be shown a graphical representation of the disk usage in \fB/home/fred\fP and its immediate subdirectories, with varying colours used to show the difference between disused and recently-accessed data. 
Click on any subdirectory to descend into it and see a report for its subdirectories in turn; click on parts of the pathname at the top of any page to return to higher-level directories. When you\*(Aqve finished browsing, you can just press Ctrl-D to send an end-of-file indication to \fBagedu\fP, and it will shut down. .PP After that, you probably want to delete the data file \fBagedu.dat\fP, since it\*(Aqs pretty large. In fact, the command \fBagedu -R\fP will do this for you; and you can chain \fBagedu\fP commands on the same command line, so that instead of the above you could have done .PP .nf $\ \fBagedu\ \-s\ /home/fred\ \-w\ \-R\fP .fi .PP for a single self-contained run of \fBagedu\fP which builds its index, serves web pages from it, and cleans it up when finished. .PP In some situations, you might want to scan the directory structure of one computer, but run \fBagedu\fP\*(Aqs user interface on another. In that case, you can do your scan using the \fBagedu -S\fP option in place of \fBagedu -s\fP, which will make \fBagedu\fP not bother building an index file but instead just write out its scan results in plain text on standard output; then you can funnel that output to the other machine using SSH (or whatever other technique you prefer), and there, run \fBagedu -L\fP to load in the textual dump and turn it into an index file. For example, you might run a command like this (plus any \fBssh\fP options you need) on the machine you want to scan: .PP .nf $\ \fBagedu\ \-S\ /home/fred\ |\ ssh\ indexing\-machine\ agedu\ \-L\fP .fi .PP or, equivalently, run something like this on the other machine: .PP .nf $\ \fBssh\ machine\-to\-scan\ agedu\ \-S\ /home/fred\ |\ agedu\ \-L\fP .fi .PP Either way, the \fBagedu -L\fP command will create an \fBagedu.dat\fP index file, which you can then use with \fBagedu -w\fP just as above. .PP (Another way to do this might be to build the index file on the first machine as normal, and then just copy it to the other machine once it's complete. However, for efficiency, the index file is formatted differently depending on the CPU architecture that \fBagedu\fP is compiled for. So if that doesn\*(Aqt match between the two machines - e.g. if one is a 32-bit machine and one 64-bit - then \fBagedu.dat\fP files written on one machine will not work on the other. The technique described above using \fB-S\fP and \fB-L\fP should work between any two machines.) .PP If you don't have a graphical web browser, you can do text-based queries instead of using \fBagedu\fP\*(Aqs web interface. Having scanned \fB/home/fred\fP in any of the ways suggested above, you might run .PP .nf $\ \fBagedu\ \-t\ /home/fred\fP .fi .PP which again gives a summary of the disk usage in \fB/home/fred\fP and its immediate subdirectories; but this time \fBagedu\fP will print it on standard output, in much the same format as \fBdu\fP. If you then want to find out how much \fIold\fP data is there, you can add the \fB-a\fP option to show only files last accessed a certain length of time ago. For example, to show only files which haven\*(Aqt been looked at in six months or more: .PP .nf $\ \fBagedu\ \-t\ /home/fred\ \-a\ 6m\fP .fi .PP That's the essence of what \fBagedu\fP does. It has other modes of operation for more complex situations, and the usual array of configurable options. The following sections contain a complete reference for all its functionality. .SH "OPERATING MODES" .PP This section describes the operating modes supported by \fBagedu\fP. 
Each of these is in the form of a command-line option, sometimes with an argument. Multiple operating-mode options may appear on the command line, in which case \fBagedu\fP will perform the specified actions one after another. For instance, as shown in the previous section, you might want to perform a disk scan and immediately launch a web server giving reports from that scan. .IP "\fB-s\fP \fIdirectory\fP or \fB--scan\fP \fIdirectory\fP" In this mode, \fBagedu\fP scans the file system starting at the specified directory, and indexes the results of the scan into a large data file which other operating modes can query. .RS .PP By default, the scan is restricted to a single file system (since the expected use of \fBagedu\fP is that you would probably use it because a particular disk partition was running low on space). You can remove that restriction using the \fB--cross-fs\fP option; other configuration options allow you to include or exclude files or entire subdirectories from the scan. See the next section for full details of the configurable options. .PP The index file is created with restrictive permissions, in case the file system you are scanning contains confidential information in its structure. .PP Index files are dependent on the characteristics of the CPU architecture you created them on. You should not expect to be able to move an index file between different types of computer and have it continue to work. If you need to transfer the results of a disk scan to a different kind of computer, see the \fB-D\fP and \fB-L\fP options below. .RE .IP "\fB-w\fP or \fB--web\fP" In this mode, \fBagedu\fP expects to find an index file already written. It allocates a network port, and starts up a web server on that port which serves reports generated from the index file. By default it invents its own URL and prints it out. .RS .PP The web server runs until \fBagedu\fP receives an end-of-file event on its standard input. (The expected usage is that you run it from the command line, immediately browse web pages until you\*(Aqre satisfied, and then press Ctrl-D.) To disable the EOF behaviour, use the \fB--no-eof\fP option. .PP In case the index file contains any confidential information about your file system, the web server protects the pages it serves from access by other people. On Linux, this is done transparently by means of using \fB/proc/net/tcp\fP to check the owner of each incoming connection; failing that, the web server will require a password to view the reports, and \fBagedu\fP will print the password it invented on standard output along with the URL. .PP Configurable options for this mode let you specify your own address and port number to listen on, and also specify your own choice of authentication method (including turning authentication off completely) and a username and password of your choice. .RE .IP "\fB-t\fP \fIdirectory\fP or \fB--text\fP \fIdirectory\fP" In this mode, \fBagedu\fP generates a textual report on standard output, listing the disk usage in the specified directory and all its subdirectories down to a given depth. By default that depth is 1, so that you see a report for \fIdirectory\fP itself and all of its immediate subdirectories. You can configure a different depth (or no depth limit) using \fB-d\fP, described in the next section. .RS .PP Used on its own, \fB-t\fP merely lists the \fItotal\fP disk usage in each subdirectory; \fBagedu\fP\*(Aqs additional ability to distinguish unused from recently-used data is not activated. 
To activate it, use the \fB-a\fP option to specify a minimum age. .PP The directory structure stored in \fBagedu\fP\*(Aqs index file is treated as a set of literal strings. This means that you cannot refer to directories by synonyms. So if you ran \fBagedu -s .\fP, then all the path names you later pass to the \fB-t\fP option must be either `\fB.\fP' or begin with `\fB./\fP'. Similarly, symbolic links within the directory you scanned will not be followed; you must refer to each directory by its canonical, symlink-free pathname. .RE .IP "\fB-R\fP or \fB--remove\fP" In this mode, \fBagedu\fP deletes its index file. Running just \fBagedu -R\fP on its own is therefore equivalent to typing \fBrm agedu.dat\fP. However, you can also put \fB-R\fP on the end of a command line to indicate that \fBagedu\fP should delete its index file after it finishes performing other operations. .IP "\fB-S\fP \fIdirectory\fP or \fB--scan-dump\fP \fIdirectory\fP" In this mode, \fBagedu\fP will scan a directory tree and convert the results straight into a textual dump on standard output, without generating an index file at all. The dump data is intended for \fBagedu -L\fP to read. .IP "\fB-L\fP or \fB--load\fP" In this mode, \fBagedu\fP expects to read a dump produced by the \fB-S\fP option from its standard input. It constructs an index file from that dump, exactly as it would have if it had read the same data from a disk scan in \fB-s\fP mode. .IP "\fB-D\fP or \fB--dump\fP" In this mode, \fBagedu\fP reads an existing index file and produces a dump of its contents on standard output, in the same format used by \fB-S\fP and \fB-L\fP. This option could be used to convert an existing index file into a format acceptable to a different kind of computer, by dumping it using \fB-D\fP and then loading the dump back in on the other machine using \fB-L\fP. .RS .PP (The output of \fBagedu -D\fP on an existing index file will not be exactly \fIidentical\fP to what \fBagedu -S\fP would have originally produced, due to a difference in treatment of last-access times on directories. However, it should be effectively equivalent for most purposes. See the documentation of the \fB--dir-atime\fP option in the next section for further detail.) .RE .IP "\fB-H\fP \fIdirectory\fP or \fB--html\fP \fIdirectory\fP" In this mode, \fBagedu\fP will generate an HTML report of the disk usage in the specified directory and its immediate subdirectories, in the same form that it serves from its web server in \fB-w\fP mode. .RS .PP By default, a single HTML report will be generated and simply written to standard output, with no hyperlinks pointing to other similar pages. If you also specify the \fB-d\fP option (see below), \fBagedu\fP will instead write out a collection of HTML files with hyperlinks between them, and call the top-level file \fBindex.html\fP. .RE .IP "\fB--cgi\fP" In this mode, \fBagedu\fP will run as the bulk of a CGI script which provides the same set of web pages as the built-in web server would. It will read the usual CGI environment variables, and write CGI-style data to its standard output. .RS .PP The actual CGI program itself should be a tiny wrapper around \fBagedu\fP which passes it the \fB--cgi\fP option, and also (probably) \fB-f\fP to locate the index file. \fBagedu\fP will do everything else. 
For example, your script might read .PP .nf #!/bin/sh \fI/some/path/to/\fPagedu\ \-\-cgi\ \-f\ \fI/some/other/path/to/\fPagedu.dat .fi .PP (Note that \fBagedu\fP will produce the \fIentire\fP CGI output, including status code, HTTP headers and the full HTML document. If you try to surround the call to \fBagedu --cgi\fP with code that adds your own HTML header and footer, you won\*(Aqt get the results you want, and \fBagedu\fP\*(Aqs HTTP-level features such as auto-redirecting to canonical versions of URIs will stop working.) .PP No access control is performed in this mode: restricting access to CGI scripts is assumed to be the job of the web server. .RE .IP "\fB--presort\fP and \fB--postsort\fP" In these two modes, \fBagedu\fP will expect to read a textual data dump from its standard input of the form produced by \fB-S\fP (and \fB-D\fP). It will transform the data into a different version of its text dump format, and write the transformed version on standard output. .RS .PP The ordinary dump file format is reasonably readable, but loading it into an index file using \fBagedu -L\fP requires it to be sorted in a specific order, which is complicated to describe and difficult to implement using ordinary Unix sorting tools. So if you want to construct your own data dump from a source of your own that \fBagedu\fP itself doesn\*(Aqt know how to scan, you will need to make sure it\*(Aqs sorted in the right order. .PP To help with this, \fBagedu\fP provides a secondary dump format which is `sortable', in the sense that ordinary \fBsort\fP(\fI1\fP) without arguments will arrange it into the right order. However, the sortable format is much more unreadable and also twice the size, so you wouldn\*(Aqt want to write it directly! .PP So the recommended procedure is to generate dump data in the ordinary format; then pipe it through \fBagedu --presort\fP to turn it into the sortable format; then sort it; \fIthen\fP pipe it into \fBagedu -L\fP (which can accept either the normal or the sortable format as input). For example: .PP .nf \fIgenerate_custom_data.sh\fP\ |\ agedu\ \-\-presort\ |\ sort\ |\ agedu\ \-L .fi .PP If you need to transform the sorted dump file back into the ordinary format, \fBagedu --postsort\fP can do that. But since \fBagedu -L\fP can accept either format as input, you may not need to. .RE .IP "\fB-h\fP or \fB--help\fP" Causes \fBagedu\fP to print some help text and terminate immediately. .IP "\fB-V\fP or \fB--version\fP" Causes \fBagedu\fP to print its version number and terminate immediately. .SH "OPTIONS" .PP This section describes the various configuration options that affect \fBagedu\fP\*(Aqs operation in one mode or another. .PP The following option affects nearly all modes (except \fB-S\fP): .IP "\fB-f\fP \fIfilename\fP or \fB--file\fP \fIfilename\fP" Specifies the location of the index file which \fBagedu\fP creates, reads or removes depending on its operating mode. By default, this is simply `\fBagedu.dat\fP', in whatever is the current working directory when you run \fBagedu\fP. .PP The following options affect the disk-scanning modes, \fB-s\fP and \fB-S\fP: .IP "\fB--cross-fs\fP and \fB--no-cross-fs\fP" These configure whether or not the disk scan is permitted to cross between different file systems. The default is not to: \fBagedu\fP will normally skip over subdirectories on which a different file system is mounted. This makes it convenient when you want to free up space on a particular file system which is running low. 
However, in other circumstances you might wish to see general information about the use of space no matter which file system it\*(Aqs on (for instance, if your real concern is your backup media running out of space, and if your backups do not treat different file systems specially); in that situation, use \fB--cross-fs\fP. .RS .PP (Note that this default is the opposite way round from the corresponding option in \fBdu\fP.) .RE .IP "\fB--prune\fP \fIwildcard\fP and \fB--prune-path\fP \fIwildcard\fP" These cause particular files or directories to be omitted entirely from the scan. If \fBagedu\fP\*(Aqs scan encounters a file or directory whose name matches the wildcard provided to the \fB--prune\fP option, it will not include that file in its index, and also if it\*(Aqs a directory it will skip over it and not scan its contents. .RS .PP Note that in most Unix shells, wildcards will probably need to be escaped on the command line, to prevent the shell from expanding the wildcard before \fBagedu\fP sees it. .PP \fB--prune-path\fP is similar to \fB--prune\fP, except that the wildcard is matched against the entire pathname instead of just the filename at the end of it. So whereas \fB--prune *a*b*\fP will match any file whose actual name contains an \fBa\fP somewhere before a \fBb\fP, \fB--prune-path *a*b*\fP will also match a file whose name contains \fBb\fP and which is inside a directory containing an \fBa\fP, or any file inside a directory of that form, and so on. .RE .IP "\fB--exclude\fP \fIwildcard\fP and \fB--exclude-path\fP \fIwildcard\fP" These cause particular files or directories to be omitted from the index, but not from the scan. If \fBagedu\fP\*(Aqs scan encounters a file or directory whose name matches the wildcard provided to the \fB--exclude\fP option, it will not include that file in its index - but unlike \fB--prune\fP, if the file in question is a directory it will still scan its contents and index them if they are not ruled out themselves by \fB--exclude\fP options. .RS .PP As above, \fB--exclude-path\fP is similar to \fB--exclude\fP, except that the wildcard is matched against the entire pathname. .RE .IP "\fB--include\fP \fIwildcard\fP and \fB--include-path\fP \fIwildcard\fP" These cause particular files or directories to be re-included in the index and the scan, if they had previously been ruled out by one of the above exclude or prune options. You can interleave include, exclude and prune options as you wish on the command line, and if more than one of them applies to a file then the last one takes priority. .RS .PP For example, if you wanted to see only the disk space taken up by MP3 files, you might run .PP .nf $\ \fBagedu\ \-s\ .\ \-\-exclude\ \*(Aq*\*(Aq\ \-\-include\ \*(Aq*.mp3\*(Aq\fP .fi .PP which will cause everything to be omitted from the scan, but then the MP3 files to be put back in. If you then wanted only a subset of those MP3s, you could then exclude some of them again by adding, say, `\fB--exclude-path \*(Aq./queen/*\*(Aq\fP' (or, more efficiently, `\fB--prune ./queen\fP') on the end of that command. .PP As with the previous two options, \fB--include-path\fP is similar to \fB--include\fP except that the wildcard is matched against the entire pathname. .RE .IP "\fB--progress\fP, \fB--no-progress\fP and \fB--tty-progress\fP" When \fBagedu\fP is scanning a directory tree, it will typically print a one-line progress report every second showing where it has reached in the scan, so you can have some idea of how much longer it will take. 
(Of course, it can\*(Aqt predict \fIexactly\fP how long it will take, since it doesn\*(Aqt know which of the directories it hasn\*(Aqt scanned yet will turn out to be huge.) .RS .PP By default, those progress reports are displayed on \fBagedu\fP\*(Aqs standard error channel, if that channel points to a terminal device. If you need to manually enable or disable them, you can use the above three options to do so: \fB--progress\fP unconditionally enables the progress reports, \fB--no-progress\fP unconditionally disables them, and \fB--tty-progress\fP reverts to the default behaviour which is conditional on standard error being a terminal. .RE .IP "\fB--dir-atime\fP and \fB--no-dir-atime\fP" In normal operation, \fBagedu\fP ignores the atimes (last access times) on the \fIdirectories\fP it scans: it only pays attention to the atimes of the \fIfiles\fP inside those directories. This is because directory atimes tend to be reset by a lot of system administrative tasks, such as \fBcron\fP jobs which scan the file system for one reason or another - or even other invocations of \fBagedu\fP itself, though it tries to avoid modifying any atimes if possible. So the literal atimes on directories are typically not representative of how long ago the data in question was last accessed with real intent to use that data in particular. .RS .PP Instead, \fBagedu\fP makes up a fake atime for every directory it scans, which is equal to the newest atime of any file in or below that directory (or the directory\*(Aqs last \fImodification\fP time, whichever is newest). This is based on the assumption that all \fIimportant\fP accesses to directories are actually accesses to the files inside those directories, so that when any file is accessed all the directories on the path leading to it should be considered to have been accessed as well. .PP In unusual cases it is possible that a directory itself might embody important data which is accessed by reading the directory. In that situation, \fBagedu\fP\*(Aqs atime-faking policy will misreport the directory as disused. In the unlikely event that such directories form a significant part of your disk space usage, you might want to turn off the faking. The \fB--dir-atime\fP option does this: it causes the disk scan to read the original atimes of the directories it scans. .PP The faking of atimes on directories also requires a processing pass over the index file after the main disk scan is complete. \fB--dir-atime\fP also turns this pass off. Hence, this option affects the \fB-L\fP option as well as \fB-s\fP and \fB-S\fP. .PP (The previous section mentioned that there might be subtle differences between the output of \fBagedu -s /path -D\fP and \fBagedu -S /path\fP. This is why. Doing a scan with \fB-s\fP and then dumping it with \fB-D\fP will dump the fully faked atimes on the directories, whereas doing a scan-to-dump with \fB-S\fP will dump only \fIpartially\fP faked atimes - specifically, each directory\*(Aqs last modification time - since the subsequent processing pass will not have had a chance to take place. However, loading either of the resulting dump files with \fB-L\fP will perform the atime-faking processing pass, leading to the same data in the index file in each case. In normal usage it should be safe to ignore all of this complexity.) .RE .IP "\fB--mtime\fP" This option causes \fBagedu\fP to index files by their last modification time instead of their last access time. 
You might want to use this if your last access times were completely useless for some reason: for example, if you had recently searched every file on your system, the system would have lost all the information about what files you hadn\*(Aqt recently accessed before then. Using this option is liable to be less effective at finding genuinely wasted space than the normal mode (that is, it will be more likely to flag things as disused when they\*(Aqre not, so you will have more candidates to go through by hand looking for data you don\*(Aqt need), but may be better than nothing if your last-access times are unhelpful. .RS .PP Another use for this mode might be to find \fIrecently created\fP large data. If your disk has been gradually filling up for years, the default mode of \fBagedu\fP will let you find unused data to delete; but if you know your disk had plenty of space recently and now it\*(Aqs suddenly full, and you suspect that some rogue program has left a large core dump or output file, then \fBagedu --mtime\fP might be a convenient way to locate the culprit. .RE .IP "\fB--logicalsize\fP" This option causes \fBagedu\fP to consider the size of each file to be its `logical' size, rather than the amount of space it consumes on disk. (That is, it will use \fBst_size\fP instead of \fBst_blocks\fP in the data returned from \fBstat\fP(\fI2\fP).) This option makes \fBagedu\fP less accurate at reporting how much of your disk is used, but it might be useful in specialist cases, such as working around a file system that is misreporting physical sizes. .RS .PP For most files, the physical size of a file will be larger than the logical size, reflecting the fact that filesystem layouts generally allocate a whole number of blocks of the disk to each file, so some space is wasted at the end of the last block. So counting only the logical file size will typically cause under-reporting of the disk usage (perhaps \fIlarge\fP under-reporting in the case of a very large number of very small files). .PP On the other hand, sometimes a file with a very large logical size can have `holes' where no data is actually stored, in which case using the logical size of the file will \fIover\fP-report its disk usage. So the use of logical sizes can give wrong answers in both directions. .RE .PP The following option affects all the modes that generate reports: the web server mode \fB-w\fP, the stand-alone HTML generation mode \fB-H\fP and the text report mode \fB-t\fP. .IP "\fB--files\fP" This option causes \fBagedu\fP\*(Aqs reports to list the individual files in each directory, instead of just giving a combined report for everything that\*(Aqs not in a subdirectory. .PP The following option affects the text report mode \fB-t\fP. .IP "\fB-a\fP \fIage\fP or \fB--age\fP \fIage\fP" This option tells \fBagedu\fP to report only files of at least the specified age. An age is specified as a number, followed by one of `\fBy\fP' (years), `\fBm\fP' (months), `\fBw\fP' (weeks) or `\fBd\fP' (days). (This syntax is also used by the \fB-r\fP option.) For example, \fB-a 6m\fP will produce a text report which includes only files at least six months old. .PP The following options affect the stand-alone HTML generation mode \fB-H\fP and the text report mode \fB-t\fP. .IP "\fB-d\fP \fIdepth\fP or \fB--depth\fP \fIdepth\fP" This option controls the maximum depth to which \fBagedu\fP recurses when generating a text or HTML report. 
.RS .PP In text mode, the default is 1, meaning that the report will include the directory given on the command line and all of its immediate subdirectories. A depth of two includes another level below that, and so on; a depth of zero means \fIonly\fP the directory on the command line. .PP In HTML mode, specifying this option switches \fBagedu\fP from writing out a single HTML file to writing out multiple files which link to each other. A depth of 1 means \fBagedu\fP will write out an HTML file for the given directory and also one for each of its immediate subdirectories. .PP If you want \fBagedu\fP to recurse as deeply as possible, give the special word `\fBmax\fP' as an argument to \fB-d\fP. .RE .IP "\fB-o\fP \fIfilename\fP or \fB--output\fP \fIfilename\fP" This option is used to specify an output file for \fBagedu\fP to write its report to. In text mode or single-file HTML mode, the argument is treated as the name of a file. In multiple-file HTML mode, the argument is treated as the name of a directory: the directory will be created if it does not already exist, and the output HTML files will be created inside it. .PP The following option affects only the stand-alone HTML generation mode \fB-H\fP, and even then, only in recursive mode (with \fB-d\fP): .IP "\fB--numeric\fP" This option tells \fBagedu\fP to name most of its output HTML files numerically. The root of the whole output file collection will still be called \fBindex.html\fP, but all the rest will have names like \fB73.html\fP or \fB12525.html\fP. (The numbers are essentially arbitrary; in fact, they\*(Aqre indices of nodes in the data structure used by \fBagedu\fP\*(Aqs index file.) .RS .PP This system of file naming is less intuitive than the default of naming files after the sub-pathname they index. It's also less stable: the same pathname will not necessarily be represented by the same filename if \fBagedu -H\fP is re-run after another scan of the same directory tree. However, it does have the virtue that it keeps the filenames \fIshort\fP, so that even if your directory tree is very deep, the output HTML files won\*(Aqt exceed any OS limit on filename length. .RE .PP The following options affect the web server mode \fB-w\fP, and in some cases also the stand-alone HTML generation mode \fB-H\fP: .IP "\fB-r\fP \fIage range\fP or \fB--age-range\fP \fIage range\fP" The HTML reports produced by \fBagedu\fP use a range of colours to indicate how long ago data was last accessed, running from red (representing the most disused data) to green (representing the newest). By default, the lengths of time represented by the two ends of that spectrum are chosen by examining the data file to see what range of ages appears in it. However, you might want to set your own limits, and you can do this using \fB-r\fP. .RS .PP The argument to \fB-r\fP consists of a single age, or two ages separated by a minus sign. An age is a number, followed by one of `\fBy\fP' (years), `\fBm\fP' (months), `\fBw\fP' (weeks) or `\fBd\fP' (days). (This syntax is also used by the \fB-a\fP option.) The first age in the range represents the oldest data, and will be coloured red in the HTML; the second age represents the newest, coloured green. If the second age is not specified, it will default to zero (so that green means data which has been accessed \fIjust now\fP). .PP For example, \fB-r 2y\fP will mark data in red if it has been unused for two years or more, and green if it has been accessed just now. 
\fB-r 2y-3m\fP will similarly mark data red if it has been unused for two years or more, but will mark it green if it has been accessed three months ago or later. .RE .IP "\fB--address\fP \fIaddr\fP[\fB:\fP\fIport\fP]" Specifies the network address and port number on which \fBagedu\fP should listen when running its web server. If you want \fBagedu\fP to listen for connections coming in from any source, specify the address as the special value \fBANY\fP. If the port number is omitted, an arbitrary unused port will be chosen for you and displayed. .RS .PP If you specify this option, \fBagedu\fP will not print its URL on standard output (since you are expected to know what address you told it to listen to). .RE .IP "\fB--auth\fP \fIauth-type\fP" Specifies how \fBagedu\fP should control access to the web pages it serves. The options are as follows: .RS .IP "\fBmagic\fP" This option only works on Linux, and only when the incoming connection is from the same machine that \fBagedu\fP is running on. On Linux, the special file \fB/proc/net/tcp\fP contains a list of network connections currently known to the operating system kernel, including which user id created them. So \fBagedu\fP will look up each incoming connection in that file, and allow access if it comes from the same user id under which \fBagedu\fP itself is running. Therefore, in \fBagedu\fP\*(Aqs normal web server mode, you can safely run it on a multi-user machine and no other user will be able to read data out of your index file. .IP "\fBbasic\fP" In this mode, \fBagedu\fP will use HTTP Basic authentication: the user will have to provide a username and password via their browser. \fBagedu\fP will normally make up a username and password for the purpose, but you can specify your own; see below. .IP "\fBnone\fP" In this mode, the web server is unauthenticated: anyone connecting to it has full access to the reports generated by \fBagedu\fP. Do not do this unless there is nothing confidential at all in your index file, or unless you are certain that nobody but you can run processes on your computer. .IP "\fBdefault\fP" This is the default mode if you do not specify one of the above. In this mode, \fBagedu\fP will attempt to use Linux magic authentication, but if it detects at startup time that \fB/proc/net/tcp\fP is absent or non-functional then it will fall back to using HTTP Basic authentication and invent a user name and password. .RE .IP "\fB--auth-file\fP \fIfilename\fP or \fB--auth-fd\fP \fIfd\fP" When \fBagedu\fP is using HTTP Basic authentication, these options allow you to specify your own user name and password. If you specify \fB--auth-file\fP, these will be read from the specified file; if you specify \fB--auth-fd\fP they will instead be read from a given file descriptor which you should have arranged to pass to \fBagedu\fP. In either case, the authentication details should consist of the username, followed by a colon, followed by the password, followed \fIimmediately\fP by end of file (no trailing newline, or else it will be considered part of the password). .IP "\fB--title\fP \fItitle\fP" Specify the string that appears at the start of the \fB
                  "%s"
                  "\r\n"
                  "\r\n", code, errmsg,
                  extraheader ? extraheader : "",
                  code, errmsg, code, errmsg, errtext);
}

static char *http_success(char *mimetype, bool stuff_cr, char *document)
{
    return dupfmt("HTTP/1.1 200 OK\r\n"
                  "Date: %D\r\n"
                  "Expires: %D\r\n"
                  "Server: " PNAME "\r\n"
                  "Connection: close\r\n"
                  "Content-Type: %s\r\n"
                  "\r\n"
                  "%S", mimetype, stuff_cr, document);
}

/*
 * Called when data comes in on a connection.
 *
 * If this function returns NULL, the platform code continues
 * reading from the socket. Otherwise, it returns some dynamically
 * allocated data which the platform code will then write to the
 * socket before closing it.
 */
char *got_data(struct connctx *ctx, char *data, int length,
               bool magic_access, const char *auth_string,
               const struct html_config *cfg)
{
    char *line, *p, *q, *r, *z1, *z2, c1, c2;
    bool auth_correct = false;
    unsigned long index;
    char *document, *ret;

    /*
     * Add the data we've just received to our buffer.
     */
    if (ctx->datasize < ctx->datalen + length) {
        ctx->datasize = (ctx->datalen + length) * 3 / 2 + 4096;
        ctx->data = sresize(ctx->data, ctx->datasize, char);
    }
    memcpy(ctx->data + ctx->datalen, data, length);
    ctx->datalen += length;

    /*
     * Gradually process the HTTP request as we receive it.
     */
    if (ctx->state == READING_REQ_LINE) {
        /*
         * We're waiting for the first line of the input, which
         * contains the main HTTP request. See if we've got it
         * yet.
         */
        line = ctx->data;
        /*
         * RFC 2616 section 4.1: `In the interest of robustness,
         * [...] if the server is reading the protocol stream at
         * the beginning of a message and receives a CRLF first,
         * it should ignore the CRLF.'
         */
        while (line - ctx->data < ctx->datalen &&
               (*line == '\r' || *line == '\n'))
            line++;
        q = line;
        while (q - ctx->data < ctx->datalen && *q != '\n')
            q++;
        if (q - ctx->data >= ctx->datalen)
            return NULL;               /* not got request line yet */

        /*
         * We've got the first line of the request. Zero-terminate
         * and parse it into method, URL and optional HTTP
         * version.
         */
        *q = '\0';
        ctx->headers = q+1;
        if (q > line && q[-1] == '\r')
            *--q = '\0';
        z1 = z2 = q;
        c1 = c2 = *q;
        p = line;
        while (*p && !isspace((unsigned char)*p)) p++;
        if (*p) {
            z1 = p++;
            c1 = *z1;
            *z1 = '\0';
        }
        while (*p && isspace((unsigned char)*p)) p++;
        q = p;
        while (*q && !isspace((unsigned char)*q)) q++;
        z2 = q++;
        c2 = *z2;
        *z2 = '\0';
        while (*q && isspace((unsigned char)*q)) q++;

        /*
         * Now `line' points at the method name; p points at the
         * URL, if any; q points at the HTTP version, if any.
         */

        /*
         * There should _be_ a URL, on any request type at all.
         */
        if (!*p) {
            char *ret, *text;
            /* Restore the request to the way we received it. */
            *z2 = c2;
            *z1 = c1;
            text = dupfmt("" PNAME " received the HTTP request"
                          " \"%h\", which contains no URL.",
                          line);
ret = http_error("400", "Bad request", NULL, text);
sfree(text);
return ret;
}
ctx->method = line;
ctx->url = p;
/*
* If there was an HTTP version, we might need to see
* headers. Otherwise, the request is done.
*/
if (*q) {
ctx->state = READING_HEADERS;
} else {
ctx->state = DONE;
}
}
if (ctx->state == READING_HEADERS) {
/*
* While we're receiving the HTTP request headers, all we
* do is to keep scanning to see if we find two newlines
* next to each other.
*/
q = ctx->data + ctx->datalen;
for (p = ctx->headers; p < q; p++) {
if (*p == '\n' &&
((p+1 < q && p[1] == '\n') ||
(p+2 < q && p[1] == '\r' && p[2] == '\n'))) {
p[1] = '\0';
ctx->state = DONE;
break;
}
}
}
if (ctx->state == DONE) {
/*
* Now we have the entire HTTP request. Decide what to do
* with it.
*/
if (auth_string) {
/*
* Search the request headers for Authorization.
*/
q = ctx->data + ctx->datalen;
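/*
 * Scan the header section line by line: do a case-insensitive
 * prefix match of each line against "Authorization:", and on a
 * mismatch skip ahead to the next newline and try the next line.
 */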
for (p = ctx->headers; p < q; p++) {
const char *hdr = "Authorization:";
int i;
for (i = 0; hdr[i]; i++) {
if (p >= q || tolower((unsigned char)*p) !=
tolower((unsigned char)hdr[i]))
break;
p++;
}
if (!hdr[i])
break; /* found our header */
p = memchr(p, '\n', q - p);
if (!p)
p = q;
}
if (p < q) {
while (p < q && isspace((unsigned char)*p))
p++;
r = p;
while (p < q && !isspace((unsigned char)*p))
p++;
if (p < q) {
*p++ = '\0';
if (!strcasecmp(r, "Basic")) {
while (p < q && isspace((unsigned char)*p))
p++;
r = p;
while (p < q && !isspace((unsigned char)*p))
p++;
if (p < q) {
*p++ = '\0';
if (!strcmp(r, auth_string))
auth_correct = true;
}
}
}
}
}
if (!magic_access && !auth_correct) {
if (auth_string) {
ret = http_error("401", "Unauthorized",
"WWW-Authenticate: Basic realm=\""PNAME"\"\r\n",
"\nYou must authenticate to view these pages.");
} else {
ret = http_error("403", "Forbidden", NULL,
"This is a restricted-access set of pages.");
}
} else {
p = ctx->url;
if (!html_parse_path(ctx->t, p, cfg, &index)) {
ret = http_error("404", "Not Found", NULL,
"This is not a valid pathname.");
} else {
char *canonpath = html_format_path(ctx->t, cfg, index);
if (!strcmp(canonpath, p)) {
/*
* This is a canonical path. Return the document.
*/
document = html_query(ctx->t, index, cfg, true);
if (document) {
ret = http_success("text/html", true, document);
sfree(document);
} else {
ret = http_error("404", "Not Found", NULL,
"This is not a valid pathname.");
}
} else {
/*
* This is a non-canonical path. Return a redirect
* to the right one.
*
* To do this, we must search the request headers
* for Host:, to see what the client thought it
* was calling our server.
*/
char *host = NULL;
q = ctx->data + ctx->datalen;
for (p = ctx->headers; p < q; p++) {
const char *hdr = "Host:";
int i;
for (i = 0; hdr[i]; i++) {
if (p >= q || tolower((unsigned char)*p) !=
tolower((unsigned char)hdr[i]))
break;
p++;
}
if (!hdr[i])
break; /* found our header */
p = memchr(p, '\n', q - p);
if (!p)
p = q;
}
if (p < q) {
while (p < q && isspace((unsigned char)*p))
p++;
r = p;
while (p < q) {
if (*p == '\r' && (p+1 >= q || p[1] == '\n'))
break;
p++;
}
host = snewn(p-r+1, char);
memcpy(host, r, p-r);
host[p-r] = '\0';
}
if (host) {
char *header = dupfmt("Location: http://%s%s\r\n",
host, canonpath);
ret = http_error("301", "Moved", header,
"This is not the canonical form of"
" this pathname.");
sfree(header);
} else {
ret = http_error("400", "Bad Request", NULL,
"Needed a Host: header to return"
" the intended redirection.");
}
}
sfree(canonpath);
}
}
return ret;
} else
return NULL;
}
/* --- Platform support for running a web server. --- */
enum { FD_CLIENT, FD_LISTENER, FD_CONNECTION };
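/*
 * The select loop tracks three kinds of fd: FD_CLIENT is our own
 * standard input (watched only so that EOF can tell us to shut
 * down), FD_LISTENER is a listening socket, and FD_CONNECTION is
 * an accepted HTTP connection.
 */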
struct fd {
int fd;
int type;
bool deleted;
char *wdata;
int wdatalen, wdatapos;
bool magic_access;
struct connctx *cctx;
};
struct fd *fds = NULL;
int nfds = 0, fdsize = 0;
struct fd *new_fdstruct(int fd, int type)
{
struct fd *ret;
if (nfds >= fdsize) {
fdsize = nfds * 3 / 2 + 32;
fds = sresize(fds, fdsize, struct fd);
}
ret = &fds[nfds++];
ret->fd = fd;
ret->type = type;
ret->wdata = NULL;
ret->wdatalen = ret->wdatapos = 0;
ret->cctx = NULL;
ret->deleted = false;
ret->magic_access = false;
return ret;
}
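/*
 * Find the uid owning the socket at the far end of connection 'fd',
 * by looking the connection up in /proc/net/tcp (or /proc/net/tcp6).
 * With flip=0 the local and peer addresses are arranged so as to
 * match the connecting client's own socket entry; flip=1 swaps them,
 * which lets the startup code apply the same lookup to its listening
 * socket to check that the /proc mechanism works at all. Returns -1
 * if no owner could be determined.
 */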
int check_owning_uid(int fd, int flip)
{
struct sockaddr_storage sock, peer;
socklen_t addrlen;
char linebuf[4096], matchbuf[128];
char *filename;
int matchlen;
FILE *fp;
addrlen = sizeof(sock);
if (getsockname(fd, (struct sockaddr *)&sock, &addrlen)) {
fprintf(stderr, "getsockname: %s\n", strerror(errno));
exit(1);
}
addrlen = sizeof(peer);
if (getpeername(fd, (struct sockaddr *)&peer, &addrlen)) {
if (errno == ENOTCONN) {
memset(&peer, 0, sizeof(peer));
peer.ss_family = sock.ss_family;
} else {
fprintf(stderr, "getpeername: %s\n", strerror(errno));
exit(1);
}
}
if (flip) {
struct sockaddr_storage tmp = sock;
sock = peer;
peer = tmp;
}
#ifdef AGEDU_IPV4
if (peer.ss_family == AF_INET) {
struct sockaddr_in *sock4 = (struct sockaddr_in *)&sock;
struct sockaddr_in *peer4 = (struct sockaddr_in *)&peer;
assert(peer4->sin_family == AF_INET);
sprintf(matchbuf, "%08X:%04X %08X:%04X",
peer4->sin_addr.s_addr, ntohs(peer4->sin_port),
sock4->sin_addr.s_addr, ntohs(sock4->sin_port));
filename = "/proc/net/tcp";
} else
#endif
#ifdef AGEDU_IPV6
if (peer.ss_family == AF_INET6) {
struct sockaddr_in6 *sock6 = (struct sockaddr_in6 *)&sock;
struct sockaddr_in6 *peer6 = (struct sockaddr_in6 *)&peer;
char *p;
assert(peer6->sin6_family == AF_INET6);
p = matchbuf;
for (int i = 0; i < 4; i++)
p += sprintf(p, "%08X",
((uint32_t *)peer6->sin6_addr.s6_addr)[i]);
p += sprintf(p, ":%04X ", ntohs(peer6->sin6_port));
for (int i = 0; i < 4; i++)
p += sprintf(p, "%08X",
((uint32_t *)sock6->sin6_addr.s6_addr)[i]);
p += sprintf(p, ":%04X", ntohs(sock6->sin6_port));
filename = "/proc/net/tcp6";
} else
#endif
{
return -1; /* unidentified family */
}
matchlen = strlen(matchbuf);
fp = fopen(filename, "r");
if (fp) {
while (fgets(linebuf, sizeof(linebuf), fp)) {
/*
* Check for, and skip over, the initial sequence number
* that appears before the sockaddr/peeraddr pair. This is
* printf'ed as "%4d: ", so it could be prefixed by
* spaces, but could also be longer than 4 digits.
*/
const char *p = linebuf;
p += strspn(p, " ");
p += strspn(p, "0123456789");
if (*p != ':')
goto not_this_line;
p++;
p += strspn(p, " ");
if (!strncmp(p, matchbuf, matchlen)) {
/*
* This line matches the address string. Skip 4 words
* after that (TCP state-machine state, tx/rx queue,
* timer details, number of retransmissions) and then
* we expect to find the uid.
*/
int word;
p += matchlen;
p += strspn(p, " ");
for (word = 0; word < 4; word++) {
p += strcspn(p, " ");
if (*p != ' ')
goto not_this_line;
p += strspn(p, " ");
}
fclose(fp);
return atoi(p);
}
not_this_line:;
}
fclose(fp);
}
return -1;
}
void check_magic_access(struct fd *fd)
{
if (check_owning_uid(fd->fd, 0) == getuid())
fd->magic_access = true;
}
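/*
 * Encode up to three bytes of data as four base64 output characters,
 * padding with '=' when fewer than three bytes are available. Used to
 * construct the expected HTTP Basic credential string.
 */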
static void base64_encode_atom(unsigned char *data, int n, char *out)
{
static const char base64_chars[] =
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
unsigned word;
word = data[0] << 16;
if (n > 1)
word |= data[1] << 8;
if (n > 2)
word |= data[2];
out[0] = base64_chars[(word >> 18) & 0x3F];
out[1] = base64_chars[(word >> 12) & 0x3F];
if (n > 1)
out[2] = base64_chars[(word >> 6) & 0x3F];
else
out[2] = '=';
if (n > 2)
out[3] = base64_chars[word & 0x3F];
else
out[3] = '=';
}
struct listenfds {
int v4, v6;
};
static int make_listening_sockets(struct listenfds *fds, const char *address,
const char *portstr, char **outhostname)
{
/*
* Establish up to 2 listening sockets, for IPv4 and IPv6, on the
* same arbitrarily selected port. Return them in fds.v4 and
* fds.v6, with each entry being -1 if that socket was not
* established at all. Main return value is the port chosen, or <0
* if the whole process failed.
*/
struct sockaddr_in6 addr6;
struct sockaddr_in addr4;
bool got_v6, got_v4;
socklen_t addrlen;
int ret, port = 0;
/*
* Special case of the address parameter: if it's "0.0.0.0", treat
* it like NULL, because that was how you specified listen-on-any-
* address in versions before the IPv6 revamp.
*/
{
int u,v,w,x;
if (address &&
4 == sscanf(address, "%d.%d.%d.%d", &u, &v, &w, &x) &&
u==0 && v==0 && w==0 && x==0)
address = NULL;
}
if (portstr && !*portstr)
portstr = NULL; /* normalise NULL and empty string */
if (!address) {
char hostname[HOST_NAME_MAX];
if (gethostname(hostname, sizeof(hostname)) < 0) {
perror("hostname");
return -1;
}
*outhostname = dupstr(hostname);
} else {
*outhostname = dupstr(address);
}
fds->v6 = fds->v4 = -1;
got_v6 = false;
got_v4 = false;
#if HAVE_GETADDRINFO
/*
* Resolve the given address using getaddrinfo, yielding an IPv6
* address or an IPv4 one or both.
*/
struct addrinfo hints;
struct addrinfo *addrs, *ai;
hints.ai_family = AF_UNSPEC;
hints.ai_socktype = SOCK_STREAM;
hints.ai_protocol = 0;
hints.ai_flags = AI_PASSIVE;
ret = getaddrinfo(address, portstr, &hints, &addrs);
if (ret) {
fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(ret));
return -1;
}
for (ai = addrs; ai; ai = ai->ai_next) {
#ifdef AGEDU_IPV6
if (!got_v6 && ai->ai_family == AF_INET6) {
memcpy(&addr6, ai->ai_addr, ai->ai_addrlen);
if (portstr && !port)
port = ntohs(addr6.sin6_port);
got_v6 = true;
}
#endif
#ifdef AGEDU_IPV4
if (!got_v4 && ai->ai_family == AF_INET) {
memcpy(&addr4, ai->ai_addr, ai->ai_addrlen);
if (portstr && !port)
port = ntohs(addr4.sin_port);
got_v4 = true;
}
#endif
}
#elif HAVE_GETHOSTBYNAME
/*
* IPv4-only setup using inet_addr and gethostbyname.
*/
struct hostent *h;
memset(&addr4, 0, sizeof(addr4));
addr4.sin_family = AF_INET;
if (!address) {
addr4.sin_addr.s_addr = htonl(INADDR_ANY);
got_v4 = true;
} else if (inet_aton(address, &addr4.sin_addr)) {
got_v4 = true; /* numeric address */
} else if ((h = gethostbyname(address)) != NULL) {
memcpy(&addr4.sin_addr, h->h_addr, sizeof(addr4.sin_addr));
got_v4 = true;
} else {
fprintf(stderr, "gethostbyname: %s\n", hstrerror(h_errno));
return -1;
}
if (portstr) {
struct servent *s;
if (!portstr[strspn(portstr, "0123456789")]) {
port = atoi(portstr);
} else if ((s = getservbyname(portstr, NULL)) != NULL) {
port = ntohs(s->s_port);
} else {
fprintf(stderr, "getservbyname: port '%s' not understood\n",
portstr);
return -1;
}
}
#endif
#ifdef AGEDU_IPV6
#ifdef AGEDU_IPV4
retry:
#endif
if (got_v6) {
fds->v6 = socket(PF_INET6, SOCK_STREAM, 0);
if (fds->v6 < 0) {
fprintf(stderr, "socket(PF_INET6): %s\n", strerror(errno));
goto done_v6;
}
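/*
 * Mark the v6 socket as IPv6-only, so that the separate IPv4 socket
 * set up below can bind to the same port number for IPv4 connections.
 */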
#ifdef IPV6_V6ONLY
{
int i = 1;
if (setsockopt(fds->v6, IPPROTO_IPV6, IPV6_V6ONLY,
(char *)&i, sizeof(i)) < 0) {
fprintf(stderr, "setsockopt(IPV6_V6ONLY): %s\n",
strerror(errno));
close(fds->v6);
fds->v6 = -1;
goto done_v6;
}
}
#endif /* IPV6_V6ONLY */
addr6.sin6_port = htons(port);
addrlen = sizeof(addr6);
if (bind(fds->v6, (const struct sockaddr *)&addr6, addrlen) < 0) {
fprintf(stderr, "bind: %s\n", strerror(errno));
close(fds->v6);
fds->v6 = -1;
goto done_v6;
}
if (listen(fds->v6, 5) < 0) {
fprintf(stderr, "listen: %s\n", strerror(errno));
close(fds->v6);
fds->v6 = -1;
goto done_v6;
}
if (port == 0) {
addrlen = sizeof(addr6);
if (getsockname(fds->v6, (struct sockaddr *)&addr6,
&addrlen) < 0) {
fprintf(stderr, "getsockname: %s\n", strerror(errno));
close(fds->v6);
fds->v6 = -1;
goto done_v6;
}
port = ntohs(addr6.sin6_port);
}
}
done_v6:
#endif
#ifdef AGEDU_IPV4
if (got_v4) {
fds->v4 = socket(PF_INET, SOCK_STREAM, 0);
if (fds->v4 < 0) {
fprintf(stderr, "socket(PF_INET): %s\n", strerror(errno));
goto done_v4;
}
addr4.sin_port = htons(port);
addrlen = sizeof(addr4);
if (bind(fds->v4, (const struct sockaddr *)&addr4, addrlen) < 0) {
#ifdef AGEDU_IPV6
if (fds->v6 >= 0) {
/*
* If we support both v6 and v4, it's a failure
* condition if we didn't manage to bind to both. If
* the port number was arbitrary, we go round and try
* again. Otherwise, give up.
*/
close(fds->v6);
close(fds->v4);
fds->v6 = fds->v4 = -1;
port = 0;
if (!portstr)
goto retry;
}
#endif
fprintf(stderr, "bind: %s\n", strerror(errno));
close(fds->v4);
fds->v4 = -1;
goto done_v4;
}
if (listen(fds->v4, 5) < 0) {
fprintf(stderr, "listen: %s\n", strerror(errno));
close(fds->v4);
fds->v4 = -1;
goto done_v4;
}
if (port == 0) {
addrlen = sizeof(addr4);
if (getsockname(fds->v4, (struct sockaddr *)&addr4,
&addrlen) < 0) {
fprintf(stderr, "getsockname: %s\n", strerror(errno));
close(fds->v4);
fds->v4 = -1;
goto done_v4;
}
port = ntohs(addr4.sin_port);
}
}
done_v4:
#endif
if (fds->v6 >= 0 || fds->v4 >= 0)
return port;
else
return -1;
}
void run_httpd(const void *t, int authmask, const struct httpd_config *dcfg,
const struct html_config *incfg)
{
struct listenfds lfds;
int port;
int authtype;
char *authstring = NULL;
char *hostname;
const char *openbracket, *closebracket;
struct html_config cfg = *incfg;
/*
* Establish the listening socket(s) and retrieve its port
* number.
*/
port = make_listening_sockets(&lfds, dcfg->address, dcfg->port, &hostname);
if (port < 0)
exit(1); /* already reported an error */
if ((authmask & HTTPD_AUTH_MAGIC) &&
(lfds.v4 < 0 || check_owning_uid(lfds.v4, 1) == getuid()) &&
(lfds.v6 < 0 || check_owning_uid(lfds.v6, 1) == getuid())) {
authtype = HTTPD_AUTH_MAGIC;
if (authmask != HTTPD_AUTH_MAGIC)
printf("Using Linux /proc/net magic authentication\n");
} else if ((authmask & HTTPD_AUTH_BASIC)) {
char username[128], password[128], userpassbuf[259];
const char *userpass;
const char *rname;
unsigned char passbuf[10];
int i, j, k, fd;
authtype = HTTPD_AUTH_BASIC;
if (authmask != HTTPD_AUTH_BASIC)
printf("Using HTTP Basic authentication\n");
if (dcfg->basicauthdata) {
userpass = dcfg->basicauthdata;
} else {
strcpy(username, PNAME);
rname = "/dev/urandom";
fd = open(rname, O_RDONLY);
if (fd < 0) {
int err = errno;
rname = "/dev/random";
fd = open(rname, O_RDONLY);
if (fd < 0) {
int err2 = errno;
fprintf(stderr, "/dev/urandom: open: %s\n", strerror(err));
fprintf(stderr, "/dev/random: open: %s\n", strerror(err2));
exit(1);
}
}
for (i = 0; i < 10 ;) {
j = read(fd, passbuf + i, 10 - i);
if (j <= 0) {
fprintf(stderr, "%s: read: %s\n", rname,
j < 0 ? strerror(errno) : "unexpected EOF");
exit(1);
}
i += j;
}
close(fd);
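/*
 * Generate a 16-character password from the 10 random bytes: each
 * output character takes one bit from each of five of the bytes,
 * giving 5 bits (one of 32 characters) per position.
 */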
for (i = 0; i < 16; i++) {
/*
* 32 characters out of the 36 alphanumerics gives
* me the latitude to discard i,l,o for being too
* numeric-looking, and w because it has two too
* many syllables and one too many presidential
* associations.
*/
static const char chars[32] =
"0123456789abcdefghjkmnpqrstuvxyz";
int v = 0;
k = i / 8 * 5;
for (j = 0; j < 5; j++)
v |= ((passbuf[k+j] >> (i%8)) & 1) << j;
password[i] = chars[v];
}
password[i] = '\0';
sprintf(userpassbuf, "%s:%s", username, password);
userpass = userpassbuf;
printf("Username: %s\nPassword: %s\n", username, password);
}
k = strlen(userpass);
authstring = snewn(k * 4 / 3 + 16, char);
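/*
 * Base64-encode the "username:password" string three bytes at a
 * time; incoming requests are checked for exactly this encoded
 * string after the word "Basic" in their Authorization header.
 */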
for (i = j = 0; i < k ;) {
int s = k-i < 3 ? k-i : 3;
base64_encode_atom((unsigned char *)(userpass+i), s, authstring+j);
i += s;
j += 4;
}
authstring[j] = '\0';
} else if ((authmask & HTTPD_AUTH_NONE)) {
authtype = HTTPD_AUTH_NONE;
if (authmask != HTTPD_AUTH_NONE)
printf("Web server is unauthenticated\n");
} else {
fprintf(stderr, PNAME ": authentication method not supported\n");
exit(1);
}
if (strchr(hostname, ':')) {
/* If the hostname is an IPv6 address literal, enclose it in
* square brackets to prevent misinterpretation of the
* colons. */
openbracket = "[";
closebracket = "]";
} else {
openbracket = closebracket = "";
}
char *url;
if (port == 80) {
url = dupfmt("http://%s%s%s/", openbracket, hostname, closebracket);
} else {
url = dupfmt("http://%s%s%s:%d/", openbracket, hostname, closebracket,
port);
}
printf("URL: %s\n", url);
fflush(stdout);
if (dcfg->url_launch_command) {
pid_t pid = fork();
if (pid < 0) {
fprintf(stderr, "Unable to fork for launch command: %s\n",
strerror(errno));
} else if (pid == 0) {
char *args[5];
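/*
 * Run the launch command via "sh -c", passing the URL as $0 so that
 * it is substituted into the command without shell-quoting problems.
 */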
args[0] = dupstr("sh");
args[1] = dupstr("-c");
args[2] = dupfmt("%s \"$0\"", dcfg->url_launch_command);
args[3] = dupstr(url);
args[4] = NULL;
execvp("/bin/sh", args);
_exit(127);
} else {
int status;
if (waitpid(pid, &status, 0) < 0) {
fprintf(stderr, "Unable to wait for launch command: %s\n",
strerror(errno));
} else if (WIFSIGNALED(status)) {
int sig = WTERMSIG(status);
fprintf(stderr, "Launch command terminated with signal "
"%d%s%s%s\n", sig,
#if HAVE_STRSIGNAL
" (", strsignal(sig), ")"
#else
"", "", ""
#endif
);
} else {
int exitcode = WEXITSTATUS(status);
if (exitcode) {
fprintf(stderr,
"Launch command terminated with status %d\n",
exitcode);
}
}
}
}
sfree(url);
/*
* Now construct fd structure(s) to hold the listening sockets.
*/
if (lfds.v4 >= 0)
new_fdstruct(lfds.v4, FD_LISTENER);
if (lfds.v6 >= 0)
new_fdstruct(lfds.v6, FD_LISTENER);
if (dcfg->closeoneof) {
/*
* Read from standard input, and treat EOF as a notification
* to exit.
*/
new_fdstruct(0, FD_CLIENT);
}
/*
* Now we're ready to run our main loop. Keep looping round on
* select.
*/
while (1) {
fd_set rfds, wfds;
int i, j;
int maxfd;
int ret;
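/*
 * Helper: add an fd to an fd_set and keep 'max' one greater than the
 * largest fd added, for use as the first argument to select().
 */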
#define FD_SET_MAX(fd, set, max) \
do { FD_SET((fd),(set)); (max) = ((max)<=(fd)?(fd)+1:(max)); } while(0)
/*
* Loop round the fd list putting fds into our select
* sets. Also in this loop we remove any that were marked
* as deleted in the previous loop.
*/
FD_ZERO(&rfds);
FD_ZERO(&wfds);
maxfd = 0;
for (i = j = 0; j < nfds; j++) {
if (fds[j].deleted) {
sfree(fds[j].wdata);
free_connection(fds[j].cctx);
continue;
}
fds[i] = fds[j];
switch (fds[i].type) {
case FD_CLIENT:
FD_SET_MAX(fds[i].fd, &rfds, maxfd);
break;
case FD_LISTENER:
FD_SET_MAX(fds[i].fd, &rfds, maxfd);
break;
case FD_CONNECTION:
/*
* Always read from a connection socket. Even
* after we've started writing, the peer might
* still be sending (e.g. because we shamefully
* jumped the gun before waiting for the end of
* the HTTP request) and so we should be prepared
* to read data and throw it away.
*/
FD_SET_MAX(fds[i].fd, &rfds, maxfd);
/*
* Also attempt to write, if we have data to write.
*/
if (fds[i].wdatapos < fds[i].wdatalen)
FD_SET_MAX(fds[i].fd, &wfds, maxfd);
break;
}
i++;
}
nfds = i;
ret = select(maxfd, &rfds, &wfds, NULL, NULL);
if (ret <= 0) {
if (ret < 0 && (errno != EINTR)) {
fprintf(stderr, "select: %s", strerror(errno));
exit(1);
}
continue;
}
for (i = 0; i < nfds; i++) {
switch (fds[i].type) {
case FD_CLIENT:
if (FD_ISSET(fds[i].fd, &rfds)) {
char buf[4096];
int ret = read(fds[i].fd, buf, sizeof(buf));
if (ret <= 0) {
if (ret < 0) {
fprintf(stderr, "standard input: read: %s\n",
strerror(errno));
exit(1);
}
return;
}
}
break;
case FD_LISTENER:
if (FD_ISSET(fds[i].fd, &rfds)) {
/*
* New connection has come in. Accept it.
*/
struct fd *f;
struct sockaddr_in addr;
socklen_t addrlen = sizeof(addr);
int newfd = accept(fds[i].fd, (struct sockaddr *)&addr,
&addrlen);
if (newfd < 0)
break; /* not sure what happened there */
f = new_fdstruct(newfd, FD_CONNECTION);
f->cctx = new_connection(t);
if (authtype == HTTPD_AUTH_MAGIC)
check_magic_access(f);
}
break;
case FD_CONNECTION:
if (FD_ISSET(fds[i].fd, &rfds)) {
/*
* There's data to be read.
*/
char readbuf[4096];
int ret;
ret = read(fds[i].fd, readbuf, sizeof(readbuf));
if (ret <= 0) {
/*
* This shouldn't happen in a sensible
* HTTP connection, so we abandon the
* connection if it does.
*/
close(fds[i].fd);
fds[i].deleted = true;
break;
} else {
if (!fds[i].wdata) {
/*
* If we haven't got an HTTP response
* yet, keep processing data in the
* hope of acquiring one.
*/
fds[i].wdata = got_data
(fds[i].cctx, readbuf, ret,
(authtype == HTTPD_AUTH_NONE ||
fds[i].magic_access), authstring, &cfg);
if (fds[i].wdata) {
fds[i].wdatalen = strlen(fds[i].wdata);
fds[i].wdatapos = 0;
}
} else {
/*
* Otherwise, just drop our read data
* on the floor.
*/
}
}
}
if (FD_ISSET(fds[i].fd, &wfds) &&
fds[i].wdatapos < fds[i].wdatalen) {
/*
* The socket is writable, and we have data to
* write. Write it.
*/
int ret = write(fds[i].fd, fds[i].wdata + fds[i].wdatapos,
fds[i].wdatalen - fds[i].wdatapos);
if (ret <= 0) {
/*
* Shouldn't happen; abandon the connection.
*/
close(fds[i].fd);
fds[i].deleted = true;
break;
} else {
fds[i].wdatapos += ret;
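/*
 * Once the entire response has been written, half-close the
 * socket so the client sees end-of-file after the response body.
 */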
if (fds[i].wdatapos == fds[i].wdatalen) {
shutdown(fds[i].fd, SHUT_WR);
}
}
}
break;
}
}
}
}
agedu-20211129.8cd63c5/html.h 0000644 0001750 0001750 00000014040 14151034324 014265 0 ustar simon simon /*
* html.h: HTML output format for agedu.
*/
struct html_config {
/*
* Configure the format of the URI pathname fragment corresponding
* to a given tree entry.
*
* 'uriformat' is expected to have the following format:
* - it consists of one or more _options_, each indicating a
* particular way to format a URI, separated by '%|'
* - each option contains _at most one_ formatting directive;
* without any, it is assumed to only be able to encode the
* root tree entry
* - the formatting directive may be preceded and/or followed by
* literal text; percent signs in that literal text are written as
* %% (which doesn't count as a formatting directive for the 'at
* most one' rule)
* - formatting directives are as follows:
* + '%n' outputs the numeric index (in decimal) of the tree
* entry
* + '%p' outputs the pathname of the tree entry, not counting
* any common prefix of the whole tree or a subdirectory
* separator following that (so that the root directory of
* the tree will always be rendered as the empty string).
* The subdirectory separator is translated into '/'; any
* remotely worrying character is escaped as = followed by
* two hex digits (including, in particular, = itself). The
* only characters not escaped are the ASCII alphabets and
* numbers, the subdirectory separator as mentioned above,
* and the four punctuation characters -.@_ (with the
* exception that at the very start of a pathname, even '.'
* is escaped).
* - '%/p' outputs the pathname of the tree entry, but this time
* the subdirectory separator is also considered to be a
* worrying character and is escaped.
* - '%-p' and '%-/p' are like '%p' and '%/p' respectively,
* except that they use the full pathname stored in the tree
* without stripping a common prefix.
*
* These formats are used both for generating and parsing URI
* fragments. When generating, the first valid option is used
* (which is always the very first one if we're generating the
* root URI, or else it's the first option with any formatting
* directive); when parsing, the first option that matches will be
* accepted. (Thus, you can have '.../subdir' and '.../subdir/'
* both accepted, but make the latter canonical; clients of this
* mechanism will typically regenerate a URI string after parsing
* an index out of it, and return an HTTP redirect if it isn't in
* canonical form.)
*
* All hyperlinks should be correctly generated as relative (i.e.
* with the right number of ../ and ./ considering both the
* pathname for the page currently being generated, and the one
* for the link target).
*
* If 'uriformat' is NULL, the HTML is generated without hyperlinks.
*/
const char *uriformat;
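/*
 * An illustrative reading of the directives above (a sketch, not
 * authoritative): the format used by the built-in web server,
 *
 *     "/%|/%p/%|%|/%p"
 *
 * appears to provide four options: a bare "/" for the root entry
 * (no directive), "/<escaped path>/" as the canonical form
 * generated for any other entry, and "" and "/<escaped path>" as
 * non-canonical spellings that are still accepted when parsing and
 * then redirected to the canonical form. The multiple-file HTML
 * mode uses formats such as
 *
 *     "/index.html%|/%/p.html"
 *
 * so the root page is index.html and every other entry becomes a
 * single flat filename, with subdirectory separators escaped by
 * the %/p directive.
 */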
/*
* Configure the filenames output by html_dump(). These can be
* configured separately from the URI formats, so that the root
* file can be called index.html on disk but have a notional URI
* of just / or similar.
*
* Formatting directives are the same as the uriformat above.
*/
const char *fileformat;
/*
* Time stamps to assign to the extreme ends of the colour
* scale. If "autoage" is true, they are ignored and the time
* stamps are derived from the limits of the age data stored
* in the index.
*/
bool autoage;
time_t oldest, newest;
/*
* Specify whether to show individual files as well as
* directories in the report.
*/
bool showfiles;
/*
* The string appearing in the = 0) htprintf(ctx, " title=\"%s\"", ctx->titletexts[colour]); htprintf(ctx, "> | \n"); } } static void end_colour_bar(struct html *ctx) { htprintf(ctx, "
");
htescape(ctx, vec->name, strlen(vec->name), 1);
if (vec->literal)
htprintf(ctx, "
");
if (doing_href)
htprintf(ctx, "");
}
htprintf(ctx, "\n");
q = path;
for (p = strchr(path, pathsep); p && p[1]; p = strchr(p, pathsep)) {
int doing_href = 0;
char c, *zp;
/*
* See if this path prefix exists in the trie. If so,
* generate a hyperlink.
*/
zp = p;
if (p == path) /* special case for "/" at start */
zp++;
p++;
c = *zp;
*zp = '\0';
index2 = trie_before(t, path);
trie_getpath(t, index2, path2);
if (!strcmptrailingpathsep(path, path2) && cfg->uriformat) {
char *targeturi = format_string(cfg->uriformat, index2, t);
char *link = make_href(ctx->oururi, targeturi);
htprintf(ctx, "", link);
sfree(link);
sfree(targeturi);
doing_href = 1;
}
*zp = c;
htescape(ctx, q, zp - q, 1);
if (doing_href)
htprintf(ctx, "");
htescape(ctx, zp, p - zp, 1);
q = p;
}
htescape(ctx, q, strlen(q), 1);
htprintf(ctx, "
\n");
/*
* Decide on the age limit of our colour coding, establish the
* colour thresholds, and write out a key.
*/
ctx->now = time(NULL);
if (cfg->autoage) {
ctx->oldest = index_order_stat(t, 0.05);
ctx->newest = index_order_stat(t, 1.0);
ctx->oldest = round_and_format_age(ctx, ctx->oldest, agebuf1, -1);
ctx->newest = round_and_format_age(ctx, ctx->newest, agebuf2, +1);
} else {
ctx->oldest = cfg->oldest;
ctx->newest = cfg->newest;
ctx->oldest = round_and_format_age(ctx, ctx->oldest, agebuf1, 0);
ctx->newest = round_and_format_age(ctx, ctx->newest, agebuf2, 0);
}
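/*
 * Each colour's threshold is a linear interpolation between the
 * oldest and newest ages chosen above, so the colour scale divides
 * that age range into MAXCOLOUR-1 equal steps.
 */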
for (i = 0; i < MAXCOLOUR; i++) {
ctx->thresholds[i] =
ctx->oldest + (ctx->newest - ctx->oldest) * i / (MAXCOLOUR-1);
}
for (i = 0; i <= MAXCOLOUR; i++) {
char buf[80];
if (i == 0) {
strcpy(buf, "> ");
round_and_format_age(ctx, ctx->thresholds[0], buf+5, 0);
} else if (i == MAXCOLOUR) {
strcpy(buf, "< ");
round_and_format_age(ctx, ctx->thresholds[MAXCOLOUR-1], buf+5, 0);
} else {
unsigned long long midrange =
(ctx->thresholds[i-1] + ctx->thresholds[i]) / 2;
round_and_format_age(ctx, midrange, buf, 0);
}
ctx->titletexts[i] = dupstr(buf);
}
htprintf(ctx, "
Key to colour coding (mouse over for more detail):\n"); htprintf(ctx, "
"); begin_colour_bar(ctx); htprintf(ctx, "
\n
agedu
suffered an internal error."
"\n");
return 0;
}
return 1;
}
if (fstat(fd, &st) < 0) {
fprintf(stderr, "%s: %s: fstat: %s\n", PNAME, filename,
strerror(errno));
if (!querydir) {
printf("Status: 500\nContent-type: text/html\n\n"
"
agedu
suffered an internal error."
"\n");
return 0;
}
return 1;
}
totalsize = st.st_size;
mappedfile = mmap(NULL, totalsize, PROT_READ, MAP_SHARED, fd, 0);
if (mappedfile == MAP_FAILED) {
fprintf(stderr, "%s: %s: mmap: %s\n", PNAME, filename,
strerror(errno));
if (!querydir) {
printf("Status: 500\nContent-type: text/html\n\n"
"
agedu
suffered an internal error."
"\n");
return 0;
}
return 1;
}
if (!trie_check_magic(mappedfile)) {
fprintf(stderr, "%s: %s: magic numbers did not match\n"
"%s: check that the index was built by this version of agedu on this platform\n", PNAME, filename, PNAME);
if (!querydir) {
printf("Status: 500\nContent-type: text/html\n\n"
"
agedu
suffered an internal error."
"\n");
return 0;
}
return 1;
}
pathsep = trie_pathsep(mappedfile);
maxpathlen = trie_maxpathlen(mappedfile);
pathbuf = snewn(maxpathlen, char);
if (!querydir || !gotdepth) {
/*
* Single output file.
*/
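/*
 * In --cgi mode we can generate hyperlinks using the same URI
 * scheme as the built-in web server. A standalone single-page
 * report has no other pages to link to, so uriformat is left NULL,
 * which makes the HTML generator omit hyperlinks entirely.
 */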
if (!querydir) {
cfg.uriformat = "/%|/%p/%|%|/%p";
} else {
cfg.uriformat = NULL;
}
cfg.autoage = htmlautoagerange;
cfg.oldest = htmloldest;
cfg.newest = htmlnewest;
cfg.showfiles = showfiles;
} else {
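/*
 * Multiple-file HTML output: by default each page is named after
 * the (escaped) pathname it describes; with --numeric, pages other
 * than the root are named by numeric index instead (e.g. 73.html),
 * which keeps filenames short even for very deep trees.
 */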
if (!numeric) {
cfg.uriformat = "/index.html%|/%/p.html";
cfg.fileformat = "/index.html%|/%/p.html";
} else {
cfg.uriformat = "/index.html%|/%n.html";
cfg.fileformat = "/index.html%|/%n.html";
}
cfg.autoage = htmlautoagerange;
cfg.oldest = htmloldest;
cfg.newest = htmlnewest;
cfg.showfiles = showfiles;
}
cfg.html_title = html_title;
if (!querydir) {
/*
* If we're run in --cgi mode, read PATH_INFO to get
* a numeric pathname index.
*/
char *path_info = getenv("PATH_INFO");
if (!path_info)
path_info = "";
/*
* Parse the path.
*/
if (!html_parse_path(mappedfile, path_info, &cfg, &xi)) {
printf("Status: 404\nContent-type: text/html\n\n"
"
Invalid agedu
pathname."
"\n");
return 0;
}
/*
* If the path was parseable but not canonically
* expressed, return a redirect to the canonical
* version.
*/
char *canonpath = html_format_path(mappedfile, &cfg, xi);
if (strcmp(canonpath, path_info)) {
char *servername = getenv("SERVER_NAME");
char *scriptname = getenv("SCRIPT_NAME");
if (!servername || !scriptname) {
if (servername)
fprintf(stderr, "%s: SCRIPT_NAME unset\n", PNAME);
else if (scriptname)
fprintf(stderr, "%s: SCRIPT_NAME unset\n", PNAME);
else
fprintf(stderr, "%s: SERVER_NAME and "
"SCRIPT_NAME both unset\n", PNAME);
printf("Status: 500\nContent-type: text/html\n\n"
"
agedu
suffered an internal "
"error."
"\n");
return 0;
}
printf("Status: 301\n"
"Location: http://%s/%s%s\n"
"Content-type: text/html\n\n"
"
Moved." "\n", servername, scriptname, canonpath); return 0; } } else { /* * In ordinary --html mode, process a query * directory passed in on the command line. */ /* * Trim trailing slash, just in case. * * (Note that we do this if pathlen > 1, not if * pathlen > 0. That is, the one case of a trailing * slash that we leave intact is the case where it's * the whole string because the query directory is * just "/".) */ pathlen = strlen(querydir); if (pathlen > 1 && querydir[pathlen-1] == pathsep) querydir[--pathlen] = '\0'; xi = trie_before(mappedfile, querydir); if (xi >= trie_count(mappedfile) || (trie_getpath(mappedfile, xi, pathbuf), strcmp(pathbuf, querydir))) { fprintf(stderr, "%s: pathname '%s' does not exist in index\n" "%*s(check it is spelled exactly as it is in the " "index, including\n%*sany leading './')\n", PNAME, querydir, (int)(1+sizeof(PNAME)), "", (int)(1+sizeof(PNAME)), ""); return 1; } else if (!index_has_root(mappedfile, xi)) { fprintf(stderr, "%s: pathname '%s' is" " a file, not a directory\n", PNAME, querydir); return 1; } } if (!querydir || !gotdepth) { /* * Single output file. */ html = html_query(mappedfile, xi, &cfg, true); if (querydir && outfile != NULL) { FILE *fp = fopen(outfile, "w"); if (!fp) { fprintf(stderr, "%s: %s: open: %s\n", PNAME, outfile, strerror(errno)); return 1; } else if (fputs(html, fp) < 0) { fprintf(stderr, "%s: %s: write: %s\n", PNAME, outfile, strerror(errno)); fclose(fp); return 1; } else if (fclose(fp) < 0) { fprintf(stderr, "%s: %s: fclose: %s\n", PNAME, outfile, strerror(errno)); return 1; } } else { if (!querydir) { printf("Content-type: text/html\n\n"); } fputs(html, stdout); } } else { /* * Multiple output files. */ int dirlen = outfile ? 2+strlen(outfile) : 3; char prefix[dirlen]; if (outfile) { if (mkdir(outfile, 0777) < 0 && errno != EEXIST) { fprintf(stderr, "%s: %s: mkdir: %s\n", PNAME, outfile, strerror(errno)); return 1; } snprintf(prefix, dirlen, "%s/", outfile); } else snprintf(prefix, dirlen, "./"); unsigned long xi2; /* * pathbuf is only set up in the plain-HTML case and * not in the CGI case; but that's OK, because the * CGI case can't come to this branch of the if * anyway. 
*/ make_successor(pathbuf); xi2 = trie_before(mappedfile, pathbuf); if (html_dump(mappedfile, xi, xi2, depth, &cfg, prefix)) return 1; } munmap(mappedfile, totalsize); sfree(pathbuf); } else if (mode == DUMP) { size_t maxpathlen; char *buf; fd = open(filename, O_RDONLY); if (fd < 0) { fprintf(stderr, "%s: %s: open: %s\n", PNAME, filename, strerror(errno)); return 1; } if (fstat(fd, &st) < 0) { perror(PNAME ": fstat"); return 1; } totalsize = st.st_size; mappedfile = mmap(NULL, totalsize, PROT_READ, MAP_SHARED, fd, 0); if (mappedfile == MAP_FAILED) { perror(PNAME ": mmap"); return 1; } if (!trie_check_magic(mappedfile)) { fprintf(stderr, "%s: %s: magic numbers did not match\n" "%s: check that the index was built by this version of agedu on this platform\n", PNAME, filename, PNAME); return 1; } pathsep = trie_pathsep(mappedfile); maxpathlen = trie_maxpathlen(mappedfile); buf = snewn(maxpathlen, char); writestate.fp = stdout; writestate.sortable = false; writestate.pathsep = pathsep; if (!dump_write_header(&writestate)) fatal("standard output: %s", strerror(errno)); tw = triewalk_new(mappedfile); while ((tf = triewalk_next(tw, buf)) != NULL) dump_line(buf, tf); triewalk_free(tw); munmap(mappedfile, totalsize); } else if (mode == HTTPD) { struct html_config pcfg; struct httpd_config dcfg; fd = open(filename, O_RDONLY); if (fd < 0) { fprintf(stderr, "%s: %s: open: %s\n", PNAME, filename, strerror(errno)); return 1; } if (fstat(fd, &st) < 0) { perror(PNAME ": fstat"); return 1; } totalsize = st.st_size; mappedfile = mmap(NULL, totalsize, PROT_READ, MAP_SHARED, fd, 0); if (mappedfile == MAP_FAILED) { perror(PNAME ": mmap"); return 1; } if (!trie_check_magic(mappedfile)) { fprintf(stderr, "%s: %s: magic numbers did not match\n" "%s: check that the index was built by this version of agedu on this platform\n", PNAME, filename, PNAME); return 1; } pathsep = trie_pathsep(mappedfile); dcfg.address = httpserveraddr; dcfg.port = httpserverport; dcfg.closeoneof = closeoneof; dcfg.basicauthdata = httpauthdata; dcfg.url_launch_command = url_launch_command; pcfg.uriformat = "/%|/%p/%|%|/%p"; pcfg.autoage = htmlautoagerange; pcfg.oldest = htmloldest; pcfg.newest = htmlnewest; pcfg.showfiles = showfiles; pcfg.html_title = html_title; run_httpd(mappedfile, auth, &dcfg, &pcfg); munmap(mappedfile, totalsize); } else if (mode == REMOVE) { if (remove(filename) < 0) { fprintf(stderr, "%s: %s: remove: %s\n", PNAME, filename, strerror(errno)); return 1; } } else if (mode == PRESORT || mode == POSTSORT) { dumpfile_load_state *dls; dumpfile_write_state dws; dls = dumpfile_load_init(stdin, false); if (!dls) return 1; dws.fp = stdout; dws.pathsep = dumpfile_load_get_pathsep(dls); dws.sortable = (mode == PRESORT); if (!dump_write_header(&dws)) fatal("standard output: %s", strerror(errno)); dumpfile_record dr; int retd; while ((retd = dumpfile_load_record(dls, &dr)) != 0) { if (retd < 0) return 1; if (!dump_write_record(&dws, &dr)) fatal("standard output: %s", strerror(errno)); } dumpfile_load_finish(dls); } } return 0; } agedu-20211129.8cd63c5/agedu.but 0000644 0001750 0001750 00000111004 14151034324 014747 0 ustar simon simon \cfg{man-identity}{agedu}{1}{2008-11-02}{Simon Tatham}{Simon Tatham} \define{dash} \u2013{-} \title Man page for \cw{agedu} \U NAME \cw{agedu} \dash correlate disk usage with last-access times to identify large and disused data \U SYNOPSIS \c agedu [ options ] action [action...] 
\e bbbbb iiiiiii iiiiii iiiiii \U DESCRIPTION \cw{agedu} scans a directory tree and produces reports about how much disk space is used in each directory and subdirectory, and also how that usage of disk space corresponds to files with last-access times a long time ago. In other words, \cw{agedu} is a tool you might use to help you free up disk space. It lets you see which directories are taking up the most space, as \cw{du} does; but unlike \cw{du}, it also distinguishes between large collections of data which are still in use and ones which have not been accessed in months or years \dash for instance, large archives downloaded, unpacked, used once, and never cleaned up. Where \cw{du} helps you find what's using your disk space, \cw{agedu} helps you find what's \e{wasting} your disk space. \cw{agedu} has several operating modes. In one mode, it scans your disk and builds an index file containing a data structure which allows it to efficiently retrieve any information it might need. Typically, you would use it in this mode first, and then run it in one of a number of \q{query} modes to display a report of the disk space usage of a particular directory and its subdirectories. Those reports can be produced as plain text (much like \cw{du}) or as HTML. \cw{agedu} can even run as a miniature web server, presenting each directory's HTML report with hyperlinks to let you navigate around the file system to similar reports for other directories. So you would typically start using \cw{agedu} by telling it to do a scan of a directory tree and build an index. This is done with a command such as \c $ agedu -s /home/fred \e bbbbbbbbbbbbbbbbbbb which will build a large data file called \c{agedu.dat} in your current directory. (If that current directory is \e{inside} \cw{/home/fred}, don't worry \dash \cw{agedu} is smart enough to discount its own index file.) Having built the index, you would now query it for reports of disk space usage. If you have a graphical web browser, the simplest and nicest way to query the index is by running \cw{agedu} in web server mode: \c $ agedu -w \e bbbbbbbb which will print (among other messages) a URL on its standard output along the lines of \c URL: http://127.0.0.1:48638/ (That URL will always begin with \cq{127.}, meaning that it's in the \cw{localhost} address space. So only processes running on the same computer can even try to connect to that web server, and also there is access control to prevent other users from seeing it \dash see below for more detail.) Now paste that URL into your web browser, and you will be shown a graphical representation of the disk usage in \cw{/home/fred} and its immediate subdirectories, with varying colours used to show the difference between disused and recently-accessed data. Click on any subdirectory to descend into it and see a report for its subdirectories in turn; click on parts of the pathname at the top of any page to return to higher-level directories. When you've finished browsing, you can just press Ctrl-D to send an end-of-file indication to \cw{agedu}, and it will shut down. After that, you probably want to delete the data file \cw{agedu.dat}, since it's pretty large. In fact, the command \cw{agedu -R} will do this for you; and you can chain \cw{agedu} commands on the same command line, so that instead of the above you could have done \c $ agedu -s /home/fred -w -R \e bbbbbbbbbbbbbbbbbbbbbbbbb for a single self-contained run of \cw{agedu} which builds its index, serves web pages from it, and cleans it up when finished. 
In some situations, you might want to scan the directory structure of one computer, but run \cw{agedu}'s user interface on another. In that case, you can do your scan using the \cw{agedu -S} option in place of \cw{agedu -s}, which will make \cw{agedu} not bother building an index file but instead just write out its scan results in plain text on standard output; then you can funnel that output to the other machine using SSH (or whatever other technique you prefer), and there, run \cw{agedu -L} to load in the textual dump and turn it into an index file. For example, you might run a command like this (plus any \cw{ssh} options you need) on the machine you want to scan: \c $ agedu -S /home/fred | ssh indexing-machine agedu -L \e bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb or, equivalently, run something like this on the other machine: \c $ ssh machine-to-scan agedu -S /home/fred | agedu -L \e bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb Either way, the \cw{agedu -L} command will create an \cw{agedu.dat} index file, which you can then use with \cw{agedu -w} just as above. (Another way to do this might be to build the index file on the first machine as normal, and then just copy it to the other machine once it's complete. However, for efficiency, the index file is formatted differently depending on the CPU architecture that \cw{agedu} is compiled for. So if that doesn't match between the two machines \dash e.g. if one is a 32-bit machine and one 64-bit \dash then \cw{agedu.dat} files written on one machine will not work on the other. The technique described above using \cw{-S} and \cw{-L} should work between any two machines.) If you don't have a graphical web browser, you can do text-based queries instead of using \cw{agedu}'s web interface. Having scanned \cw{/home/fred} in any of the ways suggested above, you might run \c $ agedu -t /home/fred \e bbbbbbbbbbbbbbbbbbb which again gives a summary of the disk usage in \cw{/home/fred} and its immediate subdirectories; but this time \cw{agedu} will print it on standard output, in much the same format as \cw{du}. If you then want to find out how much \e{old} data is there, you can add the \cw{-a} option to show only files last accessed a certain length of time ago. For example, to show only files which haven't been looked at in six months or more: \c $ agedu -t /home/fred -a 6m \e bbbbbbbbbbbbbbbbbbbbbbbbb That's the essence of what \cw{agedu} does. It has other modes of operation for more complex situations, and the usual array of configurable options. The following sections contain a complete reference for all its functionality. \U OPERATING MODES This section describes the operating modes supported by \cw{agedu}. Each of these is in the form of a command-line option, sometimes with an argument. Multiple operating-mode options may appear on the command line, in which case \cw{agedu} will perform the specified actions one after another. For instance, as shown in the previous section, you might want to perform a disk scan and immediately launch a web server giving reports from that scan. \dt \cw{-s} \e{directory} or \cw{--scan} \e{directory} \dd In this mode, \cw{agedu} scans the file system starting at the specified directory, and indexes the results of the scan into a large data file which other operating modes can query. \lcont{ By default, the scan is restricted to a single file system (since the expected use of \cw{agedu} is that you would probably use it because a particular disk partition was running low on space). 
You can remove that restriction using the \cw{--cross-fs} option; other configuration options allow you to include or exclude files or entire subdirectories from the scan. See the next section for full details of the configurable options. The index file is created with restrictive permissions, in case the file system you are scanning contains confidential information in its structure. Index files are dependent on the characteristics of the CPU architecture you created them on. You should not expect to be able to move an index file between different types of computer and have it continue to work. If you need to transfer the results of a disk scan to a different kind of computer, see the \cw{-D} and \cw{-L} options below. } \dt \cw{-w} or \cw{--web} \dd In this mode, \cw{agedu} expects to find an index file already written. It allocates a network port, and starts up a web server on that port which serves reports generated from the index file. By default it invents its own URL and prints it out. \lcont{ The web server runs until \cw{agedu} receives an end-of-file event on its standard input. (The expected usage is that you run it from the command line, immediately browse web pages until you're satisfied, and then press Ctrl-D.) To disable the EOF behaviour, use the \cw{--no-eof} option. In case the index file contains any confidential information about your file system, the web server protects the pages it serves from access by other people. On Linux, this is done transparently by means of using \cw{/proc/net/tcp} to check the owner of each incoming connection; failing that, the web server will require a password to view the reports, and \cw{agedu} will print the password it invented on standard output along with the URL. Configurable options for this mode let you specify your own address and port number to listen on, and also specify your own choice of authentication method (including turning authentication off completely) and a username and password of your choice. } \dt \cw{-t} \e{directory} or \cw{--text} \e{directory} \dd In this mode, \cw{agedu} generates a textual report on standard output, listing the disk usage in the specified directory and all its subdirectories down to a given depth. By default that depth is 1, so that you see a report for \e{directory} itself and all of its immediate subdirectories. You can configure a different depth (or no depth limit) using \cw{-d}, described in the next section. \lcont{ Used on its own, \cw{-t} merely lists the \e{total} disk usage in each subdirectory; \cw{agedu}'s additional ability to distinguish unused from recently-used data is not activated. To activate it, use the \cw{-a} option to specify a minimum age. The directory structure stored in \cw{agedu}'s index file is treated as a set of literal strings. This means that you cannot refer to directories by synonyms. So if you ran \cw{agedu -s .}, then all the path names you later pass to the \cw{-t} option must be either \cq{.} or begin with \cq{./}. Similarly, symbolic links within the directory you scanned will not be followed; you must refer to each directory by its canonical, symlink-free pathname. } \dt \cw{-R} or \cw{--remove} \dd In this mode, \cw{agedu} deletes its index file. Running just \cw{agedu -R} on its own is therefore equivalent to typing \cw{rm agedu.dat}. However, you can also put \cw{-R} on the end of a command line to indicate that \cw{agedu} should delete its index file after it finishes performing other operations. 
\dt \cw{-S} \e{directory} or \cw{--scan-dump} \e{directory} \dd In this mode, \cw{agedu} will scan a directory tree and convert the results straight into a textual dump on standard output, without generating an index file at all. The dump data is intended for \cw{agedu -L} to read. \dt \cw{-L} or \cw{--load} \dd In this mode, \cw{agedu} expects to read a dump produced by the \cw{-S} option from its standard input. It constructs an index file from that dump, exactly as it would have if it had read the same data from a disk scan in \cw{-s} mode. \dt \cw{-D} or \cw{--dump} \dd In this mode, \cw{agedu} reads an existing index file and produces a dump of its contents on standard output, in the same format used by \cw{-S} and \cw{-L}. This option could be used to convert an existing index file into a format acceptable to a different kind of computer, by dumping it using \cw{-D} and then loading the dump back in on the other machine using \cw{-L}. \lcont{ (The output of \cw{agedu -D} on an existing index file will not be exactly \e{identical} to what \cw{agedu -S} would have originally produced, due to a difference in treatment of last-access times on directories. However, it should be effectively equivalent for most purposes. See the documentation of the \cw{--dir-atime} option in the next section for further detail.) } \dt \cw{-H} \e{directory} or \cw{--html} \e{directory} \dd In this mode, \cw{agedu} will generate an HTML report of the disk usage in the specified directory and its immediate subdirectories, in the same form that it serves from its web server in \cw{-w} mode. \lcont{ By default, a single HTML report will be generated and simply written to standard output, with no hyperlinks pointing to other similar pages. If you also specify the \cw{-d} option (see below), \cw{agedu} will instead write out a collection of HTML files with hyperlinks between them, and call the top-level file \cw{index.html}. } \dt \cw{--cgi} \dd In this mode, \cw{agedu} will run as the bulk of a CGI script which provides the same set of web pages as the built-in web server would. It will read the usual CGI environment variables, and write CGI-style data to its standard output. \lcont{ The actual CGI program itself should be a tiny wrapper around \cw{agedu} which passes it the \cw{--cgi} option, and also (probably) \cw{-f} to locate the index file. \cw{agedu} will do everything else. For example, your script might read \c #!/bin/sh \c /some/path/to/agedu --cgi -f /some/other/path/to/agedu.dat \e iiiiiiiiiiiiii iiiiiiiiiiiiiiiiiiii (Note that \cw{agedu} will produce the \e{entire} CGI output, including status code, HTTP headers and the full HTML document. If you try to surround the call to \cw{agedu --cgi} with code that adds your own HTML header and footer, you won't get the results you want, and \cw{agedu}'s HTTP-level features such as auto-redirecting to canonical versions of URIs will stop working.) No access control is performed in this mode: restricting access to CGI scripts is assumed to be the job of the web server. } \dt \cw{--presort} and \cw{--postsort} \dd In these two modes, \cw{agedu} will expect to read a textual data dump from its standard input of the form produced by \cw{-S} (and \c{-D}). It will transform the data into a different version of its text dump format, and write the transformed version on standard output. 
\lcont{ The ordinary dump file format is reasonably readable, but loading it into an index file using \cw{agedu -L} requires it to be sorted in a specific order, which is complicated to describe and difficult to implement using ordinary Unix sorting tools. So if you want to construct your own data dump from a source of your own that \cw{agedu} itself doesn't know how to scan, you will need to make sure it's sorted in the right order. To help with this, \cw{agedu} provides a secondary dump format which is \q{sortable}, in the sense that ordinary \cw{sort}(\e{1}) without arguments will arrange it into the right order. However, the sortable format is much more unreadable and also twice the size, so you wouldn't want to write it directly! So the recommended procedure is to generate dump data in the ordinary format; then pipe it through \cw{agedu --presort} to turn it into the sortable format; then sort it; \e{then} pipe it into \cw{agedu -L} (which can accept either the normal or the sortable format as input). For example: \c generate_custom_data.sh | agedu --presort | sort | agedu -L \e iiiiiiiiiiiiiiiiiiiiiii If you need to transform the sorted dump file back into the ordinary format, \cw{agedu --postsort} can do that. But since \cw{agedu -L} can accept either format as input, you may not need to. } \dt \cw{-h} or \cw{--help} \dd Causes \cw{agedu} to print some help text and terminate immediately. \dt \cw{-V} or \cw{--version} \dd Causes \cw{agedu} to print its version number and terminate immediately. \U OPTIONS This section describes the various configuration options that affect \cw{agedu}'s operation in one mode or another. The following option affects nearly all modes (except \cw{-S}): \dt \cw{-f} \e{filename} or \cw{--file} \e{filename} \dd Specifies the location of the index file which \cw{agedu} creates, reads or removes depending on its operating mode. By default, this is simply \cq{agedu.dat}, in whatever is the current working directory when you run \cw{agedu}. The following options affect the disk-scanning modes, \cw{-s} and \cw{-S}: \dt \cw{--cross-fs} and \cw{--no-cross-fs} \dd These configure whether or not the disk scan is permitted to cross between different file systems. The default is not to: \cw{agedu} will normally skip over subdirectories on which a different file system is mounted. This makes it convenient when you want to free up space on a particular file system which is running low. However, in other circumstances you might wish to see general information about the use of space no matter which file system it's on (for instance, if your real concern is your backup media running out of space, and if your backups do not treat different file systems specially); in that situation, use \cw{--cross-fs}. \lcont{ (Note that this default is the opposite way round from the corresponding option in \cw{du}.) } \dt \cw{--prune} \e{wildcard} and \cw{--prune-path} \e{wildcard} \dd These cause particular files or directories to be omitted entirely from the scan. If \cw{agedu}'s scan encounters a file or directory whose name matches the wildcard provided to the \cw{--prune} option, it will not include that file in its index, and also if it's a directory it will skip over it and not scan its contents. \lcont{ Note that in most Unix shells, wildcards will probably need to be escaped on the command line, to prevent the shell from expanding the wildcard before \cw{agedu} sees it. 
\cw{--prune-path} is similar to \cw{--prune}, except that the wildcard is matched against the entire pathname instead of just the filename at the end of it. So whereas \cw{--prune *a*b*} will match any file whose actual name contains an \cw{a} somewhere before a \cw{b}, \cw{--prune-path *a*b*} will also match a file whose name contains \cw{b} and which is inside a directory containing an \cw{a}, or any file inside a directory of that form, and so on. } \dt \cw{--exclude} \e{wildcard} and \cw{--exclude-path} \e{wildcard} \dd These cause particular files or directories to be omitted from the index, but not from the scan. If \cw{agedu}'s scan encounters a file or directory whose name matches the wildcard provided to the \cw{--exclude} option, it will not include that file in its index \dash but unlike \cw{--prune}, if the file in question is a directory it will still scan its contents and index them if they are not ruled out themselves by \cw{--exclude} options. \lcont{ As above, \cw{--exclude-path} is similar to \cw{--exclude}, except that the wildcard is matched against the entire pathname. } \dt \cw{--include} \e{wildcard} and \cw{--include-path} \e{wildcard} \dd These cause particular files or directories to be re-included in the index and the scan, if they had previously been ruled out by one of the above exclude or prune options. You can interleave include, exclude and prune options as you wish on the command line, and if more than one of them applies to a file then the last one takes priority. \lcont{ For example, if you wanted to see only the disk space taken up by MP3 files, you might run \c $ agedu -s . --exclude '*' --include '*.mp3' \e bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb which will cause everything to be omitted from the scan, but then the MP3 files to be put back in. If you then wanted only a subset of those MP3s, you could then exclude some of them again by adding, say, \cq{--exclude-path './queen/*'} (or, more efficiently, \cq{--prune ./queen}) on the end of that command. As with the previous two options, \cw{--include-path} is similar to \cw{--include} except that the wildcard is matched against the entire pathname. } \dt \cw{--progress}, \cw{--no-progress} and \cw{--tty-progress} \dd When \cw{agedu} is scanning a directory tree, it will typically print a one-line progress report every second showing where it has reached in the scan, so you can have some idea of how much longer it will take. (Of course, it can't predict \e{exactly} how long it will take, since it doesn't know which of the directories it hasn't scanned yet will turn out to be huge.) \lcont{ By default, those progress reports are displayed on \cw{agedu}'s standard error channel, if that channel points to a terminal device. If you need to manually enable or disable them, you can use the above three options to do so: \cw{--progress} unconditionally enables the progress reports, \cw{--no-progress} unconditionally disables them, and \cw{--tty-progress} reverts to the default behaviour which is conditional on standard error being a terminal. } \dt \cw{--dir-atime} and \cw{--no-dir-atime} \dd In normal operation, \cw{agedu} ignores the atimes (last access times) on the \e{directories} it scans: it only pays attention to the atimes of the \e{files} inside those directories. 
This is because directory atimes tend to be reset by a lot of system administrative tasks, such as \cw{cron} jobs which scan the file system for one reason or another \dash or even other invocations of \cw{agedu} itself, though it tries to avoid modifying any atimes if possible. So the literal atimes on directories are typically not representative of how long ago the data in question was last accessed with real intent to use that data in particular. \lcont{ Instead, \cw{agedu} makes up a fake atime for every directory it scans, which is equal to the newest atime of any file in or below that directory (or the directory's last \e{modification} time, whichever is newest). This is based on the assumption that all \e{important} accesses to directories are actually accesses to the files inside those directories, so that when any file is accessed all the directories on the path leading to it should be considered to have been accessed as well. In unusual cases it is possible that a directory itself might embody important data which is accessed by reading the directory. In that situation, \cw{agedu}'s atime-faking policy will misreport the directory as disused. In the unlikely event that such directories form a significant part of your disk space usage, you might want to turn off the faking. The \cw{--dir-atime} option does this: it causes the disk scan to read the original atimes of the directories it scans. The faking of atimes on directories also requires a processing pass over the index file after the main disk scan is complete. \cw{--dir-atime} also turns this pass off. Hence, this option affects the \cw{-L} option as well as \cw{-s} and \cw{-S}. (The previous section mentioned that there might be subtle differences between the output of \cw{agedu -s /path -D} and \cw{agedu -S /path}. This is why. Doing a scan with \cw{-s} and then dumping it with \cw{-D} will dump the fully faked atimes on the directories, whereas doing a scan-to-dump with \cw{-S} will dump only \e{partially} faked atimes \dash specifically, each directory's last modification time \dash since the subsequent processing pass will not have had a chance to take place. However, loading either of the resulting dump files with \cw{-L} will perform the atime-faking processing pass, leading to the same data in the index file in each case. In normal usage it should be safe to ignore all of this complexity.) } \dt \cw{--mtime} \dd This option causes \cw{agedu} to index files by their last modification time instead of their last access time. You might want to use this if your last access times were completely useless for some reason: for example, if you had recently searched every file on your system, the system would have lost all the information about what files you hadn't recently accessed before then. Using this option is liable to be less effective at finding genuinely wasted space than the normal mode (that is, it will be more likely to flag things as disused when they're not, so you will have more candidates to go through by hand looking for data you don't need), but may be better than nothing if your last-access times are unhelpful. \lcont{ Another use for this mode might be to find \e{recently created} large data. 
If your disk has been gradually filling up for years, the default mode of \cw{agedu} will let you find unused data to delete; but if you know your disk had plenty of space recently and now it's suddenly full, and you suspect that some rogue program has left a large core dump or output file, then \cw{agedu --mtime} might be a convenient way to locate the culprit. } \dt \cw{--logicalsize} \dd This option causes \cw{agedu} to consider the size of each file to be its \q{logical} size, rather than the amount of space it consumes on disk. (That is, it will use \c{st_size} instead of \c{st_blocks} in the data returned from \cw{stat}(\e{2}).) This option makes \cw{agedu} less accurate at reporting how much of your disk is used, but it might be useful in specialist cases, such as working around a file system that is misreporting physical sizes. \lcont{ For most files, the physical size of a file will be larger than the logical size, reflecting the fact that filesystem layouts generally allocate a whole number of blocks of the disk to each file, so some space is wasted at the end of the last block. So counting only the logical file size will typically cause under-reporting of the disk usage (perhaps \e{large} under-reporting in the case of a very large number of very small files). On the other hand, sometimes a file with a very large logical size can have \q{holes} where no data is actually stored, in which case using the logical size of the file will \e{over}-report its disk usage. So the use of logical sizes can give wrong answers in both directions. } The following option affects all the modes that generate reports: the web server mode \cw{-w}, the stand-alone HTML generation mode \cw{-H} and the text report mode \cw{-t}. \dt \cw{--files} \dd This option causes \cw{agedu}'s reports to list the individual files in each directory, instead of just giving a combined report for everything that's not in a subdirectory. The following option affects the text report mode \cw{-t}. \dt \cw{-a} \e{age} or \cw{--age} \e{age} \dd This option tells \cw{agedu} to report only files of at least the specified age. An age is specified as a number, followed by one of \cq{y} (years), \cq{m} (months), \cq{w} (weeks) or \cq{d} (days). (This syntax is also used by the \cw{-r} option.) For example, \cw{-a 6m} will produce a text report which includes only files at least six months old. The following options affect the stand-alone HTML generation mode \cw{-H} and the text report mode \cw{-t}. \dt \cw{-d} \e{depth} or \cw{--depth} \e{depth} \dd This option controls the maximum depth to which \cw{agedu} recurses when generating a text or HTML report. \lcont{ In text mode, the default is 1, meaning that the report will include the directory given on the command line and all of its immediate subdirectories. A depth of two includes another level below that, and so on; a depth of zero means \e{only} the directory on the command line. In HTML mode, specifying this option switches \cw{agedu} from writing out a single HTML file to writing out multiple files which link to each other. A depth of 1 means \cw{agedu} will write out an HTML file for the given directory and also one for each of its immediate subdirectories. If you want \cw{agedu} to recurse as deeply as possible, give the special word \cq{max} as an argument to \cw{-d}. } \dt \cw{-o} \e{filename} or \cw{--output} \e{filename} \dd This option is used to specify an output file for \cw{agedu} to write its report to. 
In text mode or single-file HTML mode, the argument is treated as the name of a file. In multiple-file HTML mode, the argument is treated as the name of a directory: the directory will be created if it does not already exist, and the output HTML files will be created inside it. The following option affects only the stand-alone HTML generation mode \cw{-H}, and even then, only in recursive mode (with \cw{-d}): \dt \cw{--numeric} \dd This option tells \cw{agedu} to name most of its output HTML files numerically. The root of the whole output file collection will still be called \cw{index.html}, but all the rest will have names like \cw{73.html} or \cw{12525.html}. (The numbers are essentially arbitrary; in fact, they're indices of nodes in the data structure used by \cw{agedu}'s index file.) \lcont{ This system of file naming is less intuitive than the default of naming files after the sub-pathname they index. It's also less stable: the same pathname will not necessarily be represented by the same filename if \cw{agedu -H} is re-run after another scan of the same directory tree. However, it does have the virtue that it keeps the filenames \e{short}, so that even if your directory tree is very deep, the output HTML files won't exceed any OS limit on filename length. } The following options affect the web server mode \cw{-w}, and in some cases also the stand-alone HTML generation mode \cw{-H}: \dt \cw{-r} \e{age range} or \cw{--age-range} \e{age range} \dd The HTML reports produced by \cw{agedu} use a range of colours to indicate how long ago data was last accessed, running from red (representing the most disused data) to green (representing the newest). By default, the lengths of time represented by the two ends of that spectrum are chosen by examining the data file to see what range of ages appears in it. However, you might want to set your own limits, and you can do this using \cw{-r}. \lcont{ The argument to \cw{-r} consists of a single age, or two ages separated by a minus sign. An age is a number, followed by one of \cq{y} (years), \cq{m} (months), \cq{w} (weeks) or \cq{d} (days). (This syntax is also used by the \cw{-a} option.) The first age in the range represents the oldest data, and will be coloured red in the HTML; the second age represents the newest, coloured green. If the second age is not specified, it will default to zero (so that green means data which has been accessed \e{just now}). For example, \cw{-r 2y} will mark data in red if it has been unused for two years or more, and green if it has been accessed just now. \cw{-r 2y-3m} will similarly mark data red if it has been unused for two years or more, but will mark it green if it has been accessed three months ago or later. } \dt \cw{--address} \e{addr}[\cw{:}\e{port}] \dd Specifies the network address and port number on which \cw{agedu} should listen when running its web server. If you want \cw{agedu} to listen for connections coming in from any source, specify the address as the special value \cw{ANY}. If the port number is omitted, an arbitrary unused port will be chosen for you and displayed. \lcont{ If you specify this option, \cw{agedu} will not print its URL on standard output (since you are expected to know what address you told it to listen to). } \dt \cw{--auth} \e{auth-type} \dd Specifies how \cw{agedu} should control access to the web pages it serves. 
The options are as follows: \lcont{ \dt \cw{magic} \dd This option only works on Linux, and only when the incoming connection is from the same machine that \cw{agedu} is running on. On Linux, the special file \cw{/proc/net/tcp} contains a list of network connections currently known to the operating system kernel, including which user id created them. So \cw{agedu} will look up each incoming connection in that file, and allow access if it comes from the same user id under which \cw{agedu} itself is running. Therefore, in \cw{agedu}'s normal web server mode, you can safely run it on a multi-user machine and no other user will be able to read data out of your index file. \dt \cw{basic} \dd In this mode, \cw{agedu} will use HTTP Basic authentication: the user will have to provide a username and password via their browser. \cw{agedu} will normally make up a username and password for the purpose, but you can specify your own; see below. \dt \cw{none} \dd In this mode, the web server is unauthenticated: anyone connecting to it has full access to the reports generated by \cw{agedu}. Do not do this unless there is nothing confidential at all in your index file, or unless you are certain that nobody but you can run processes on your computer. \dt \cw{default} \dd This is the default mode if you do not specify one of the above. In this mode, \cw{agedu} will attempt to use Linux magic authentication, but if it detects at startup time that \cw{/proc/net/tcp} is absent or non-functional then it will fall back to using HTTP Basic authentication and invent a user name and password. } \dt \cw{--auth-file} \e{filename} or \cw{--auth-fd} \e{fd} \dd When \cw{agedu} is using HTTP Basic authentication, these options allow you to specify your own user name and password. If you specify \cw{--auth-file}, these will be read from the specified file; if you specify \cw{--auth-fd} they will instead be read from a given file descriptor which you should have arranged to pass to \cw{agedu}. In either case, the authentication details should consist of the username, followed by a colon, followed by the password, followed \e{immediately} by end of file (no trailing newline, or else it will be considered part of the password). \dt \cw{--title} \e{title} \dd Specify the string that appears at the start of the \cw{