vnlog-1.40/.dir-locals.el

;; Very similar logic appears in
;; https://www.github.com/dkogan/gnuplotlib
;; https://www.github.com/dkogan/feedgnuplot
;; https://www.github.com/dkogan/vnlog
;;
;; I need some advices to be able to generate all the images. I'm not using the org
;; exporter to produce the html, but relying on github's limited org parser to
;; display everything. github's parser doesn't do the org export, so I must
;; pre-generate all the figures with (org-babel-execute-buffer) (C-c C-v C-b).
;; This requires advices to:
;; - Generate unique image filenames
;; - Communicate those filenames to feedgnuplot
;; - Display code that produces an interactive plot (so that the readers can
;;   cut/paste the snippets), but run code that writes to the image that ends up in
;;   the documentation
(( org-mode . ((eval .
  (progn
    (setq org-confirm-babel-evaluate nil)
    (org-babel-do-load-languages
     'org-babel-load-languages
     '((shell . t)
       (gnuplot . t)))

    ;; This sets a default :file tag, set to a unique filename. I want each demo to
    ;; produce an image, but I don't care what it is called. I omit the :file tag
    ;; completely, and this advice takes care of it
    (defun dima-info-local-get-property (params what)
      (condition-case nil
          (cdr (assq what params))
        (error "")))
    (defun dima-org-babel-is-feedgnuplot (params)
      (and (or (not (assq :file params))
               (string-match "^guide-[0-9]+\\.svg$" (cdr (assq :file params))))
           ;; the regexes here were garbled in transit; "both" and "file" are
           ;; reconstructed from the :exports/:results tags used in guide.org
           (string-match "\\<both\\>" (dima-info-local-get-property params :exports))
           (string-match "\\<file\\>" (dima-info-local-get-property params :results))))
    (defun dima-org-babel-sh-unique-plot-filename (f &optional arg info params)
      (let ((info-local (or info (org-babel-get-src-block-info t))))
        (if (and info-local
                 (string= (car info-local) "sh")
                 (dima-org-babel-is-feedgnuplot (caddr info-local)))
            ;; We're looking at a feedgnuplot block. Add a default :file
            (funcall f arg info
                     (cons (cons ':file
                                 (format "guide-%d.svg"
                                         (condition-case nil
                                             (setq dima-unique-plot-number (1+ dima-unique-plot-number))
                                           (error (setq dima-unique-plot-number 0)))))
                           params))
          ;; Not feedgnuplot. Just do the normal thing
          (funcall f arg info params))))
    (unless (advice-member-p #'dima-org-babel-sh-unique-plot-filename
                             #'org-babel-execute-src-block)
      (advice-add #'org-babel-execute-src-block
                  :around #'dima-org-babel-sh-unique-plot-filename))

    ;; If I'm regenerating ALL the plots, I start counting the plots from 0
    (defun dima-reset-unique-plot-number (&rest args)
      (setq dima-unique-plot-number 0))
    (unless (advice-member-p #'dima-reset-unique-plot-number
                             #'org-babel-execute-buffer)
      (advice-add #'org-babel-execute-buffer
                  :before #'dima-reset-unique-plot-number))

    ;; I'm using github to display guide.org, so I'm not using the "normal" org
    ;; exporter. I want the demo text to not contain --hardcopy, but clearly I
    ;; need --hardcopy when generating the plots.
I add the --hardcopy to the ;; command before running it (defun dima-org-babel-sh-set-demo-output (f body params) (when (dima-org-babel-is-feedgnuplot params) (with-temp-buffer (insert body) ;; write to svg (forward-word-strictly) (insert (format " --terminal 'svg noenhanced solid size 800,600 font \",14\"' --hardcopy %s" (cdr (assq :file params)))) (setq body (buffer-substring-no-properties (point-min) (point-max))))) (funcall f body params)) (unless (advice-member-p #'dima-org-babel-sh-set-demo-output #'org-babel-execute:sh) (advice-add #'org-babel-execute:sh :around #'dima-org-babel-sh-set-demo-output)) ))))) vnlog-1.40/.gitattributes000066400000000000000000000000211475400722300154530ustar00rootroot00000000000000README.org -diff vnlog-1.40/.gitignore000066400000000000000000000001601475400722300145540ustar00rootroot00000000000000*~ *.s *.i *.o *.d *.so *.so.* *.pyc cscope* *_generated.h test/test1 test/*.got test/vnlog_fields_generated*.h vnlog-1.40/Changes000066400000000000000000000225141475400722300140660ustar00rootroot00000000000000vnlog (1.40) * "vnl-sort -h" is no longer ambiguous -- Dima Kogan Fri, 14 Feb 2025 19:48:26 -0800 vnlog (1.39) * vnl-filter: pacified warning about duplicate begin/end options -- Dima Kogan Wed, 29 Jan 2025 09:38:30 -0800 vnlog (1.38) * Python vnlog.slurp() can accept a structured dtype, so it can now read non-numerical data -- Dima Kogan Tue, 02 Jul 2024 09:15:59 -0700 vnlog (1.37) * Fixed vnl-filter --perl --stream --eval The --stream option means "flush after each print". But the code was calling flush() explicitly, so --eval would break the --stream: it would be the user's job to add their own flush(). This patch adds "$|=1" to the preamble in this case, so the language would add their own implicit flush(). The equivalent was already being done in awk -- Dima Kogan Mon, 27 May 2024 10:23:15 -0700 vnlog (1.36) * C API: I don't fflush() with each record The standard C library will already do this for line-buffered output (for humans) and will NOT do this when writing to files (for efficiency). And there's a standard way to force one or the other behavior (stdbuf). -- Dima Kogan Tue, 25 Jul 2023 23:39:34 -0700 vnlog (1.35) * Build system uses packaged mrbuild. I no longer ship a copy of mrbuild with these source * vnlog parser in C included -- Dima Kogan Mon, 19 Jun 2023 12:45:49 -0700 vnlog (1.34) * Minor improvements to the tab-completions * Minor improvements to error reporting * Removed python2 support from the Makefile, test suite -- Dima Kogan Fri, 13 Jan 2023 12:11:06 -0800 vnlog (1.33) * vnl-filter: added --begin, --end, --sub-abs, -l -- Dima Kogan Tue, 28 Jun 2022 12:22:47 -0700 vnlog (1.32) * Python parser can handle trailing comments * Bug fix: symlinked binaries work (Thanks to Jim Radford for the report) * Python vnlog.slurp() always returns 2D arrays * added latestdefined() to vnl-filter -- Dima Kogan Fri, 29 Oct 2021 14:06:21 -0700 vnlog (1.31) * vnl-filter: correct behavior with field names containing small integers -- Dima Kogan Thu, 03 Dec 2020 16:57:25 -0800 vnlog (1.30) * vnl-align: empty lines are treated like comments, and are preserved * vnl-filter bug fix: --perl puts () around each "match" expression -- Dima Kogan Thu, 03 Dec 2020 16:56:01 -0800 vnlog (1.29) * Added vnl-uniq test to the test suite * vnl-uniq works on OSX -- Dima Kogan Mon, 15 Jun 2020 11:06:22 -0700 vnlog (1.28) * Compatibility improvements for *BSD (tested on OSX and FreeBSD) - vnl-join children use the same perl executable as the parent. 
This is important if we're not using a vanilla /usr/bin/perl - renamed Makefile -> GNUmakefile to make it clear it is only for GNU Make - #! lines reference '/usr/bin/env ...'. Again, if we don't have /usr/bin/perl then this is required - imported new mrbuild Makefile to support the FreeBSD linker - cmdline tools have --vnl-tool to specify the underlying tool. FreeBSD ships "ts" as "moreutils-ts" so this flexibility is required -- Dima Kogan Wed, 03 Jun 2020 20:08:14 -0700 vnlog (1.27) * vnl-filter produces - for any expression that evaluates to an empty string * vnlog_set_output_FILE() can take ctx==NULL for the global context -- Dima Kogan Tue, 14 Apr 2020 22:21:05 -0700 vnlog (1.26) * Revert "I explicitly refuse to do anything if --stream --eval" To make it possible to produce unbuffered output with awk -- Dima Kogan Mon, 03 Feb 2020 22:12:52 -0800 vnlog (1.25) * vnl-filter: rel()/diff()/prev() work next to newlines -- Dima Kogan Wed, 29 Jan 2020 15:53:57 -0800 vnlog (1.24) * Test suite fix -- Dima Kogan Sat, 21 Dec 2019 13:33:42 -0800 vnlog (1.23) * vnl-filter diff() returns - for the first record * added vnl-uniq -- Dima Kogan Sat, 21 Dec 2019 11:40:01 -0800 vnlog (1.22) * vnl-join handles empty vnlog files properly -- Dima Kogan Sat, 05 Oct 2019 12:36:14 -0700 vnlog (1.21) * --has can take regular expressions * Columns with = in the name are now supported -- Dima Kogan Sun, 25 Aug 2019 16:09:14 -0700 vnlog (1.20) * Exported the base64 interface. It's now usable standalone -- Dima Kogan Tue, 16 Jul 2019 15:05:41 -0700 vnlog (1.19) * Looser handling of whitespace in the cmdline tools and parsers: - Blank lines count as comments - leading whitespace before # doesn't matter - # by itself works the same as ##. This makes it easier for the user to # comment-out stuff -- Dima Kogan Tue, 16 Jul 2019 15:04:30 -0700 vnlog (1.18) * Fixed recent regression: vnl-filter with multiple 'matches' expressions works again * vnl-filter -p 'prev(x)' now outputs '-' for the first line instead of 0. This is more truthful. -- Dima Kogan Tue, 16 Jul 2019 15:02:16 -0700 vnlog (1.17) * vnl-join --autoprefix handles numerical filenames better A common special case is that the input files are of the form aaaNNNbbb where NNN are numbers. If the numbers are 0-padded, the set of NNN could be "01", "02", "03". In this case the "0" is a common prefix, so it would not be included in the file labels, which is NOT what you want here: the labels should be "01", "02", ... not "1", "2". 
Here I handle this case by removing all trailing digits from the common prefix * Support for grep-style -A/-B/-C options -- Dima Kogan Fri, 29 Mar 2019 18:20:08 -0700 vnlog (1.16) * 'vnl-join --vnl-sort' runs a STABLE pre-sort -- Dima Kogan Mon, 21 Jan 2019 17:26:32 -0800 vnlog (1.15) * Vnlog::Parser perl library handles whitespace properly -- Dima Kogan Sun, 06 Jan 2019 21:15:11 -0800 vnlog (1.14) * "vnl-filter --stream" is now a synonym for "vnl-filter --unbuffered" * added new tool: vnl-ts -- Dima Kogan Fri, 28 Dec 2018 12:32:22 -0800 vnlog (1.13) * vnl-join doesn't get confused by trailing whitespace * vnl-filter new special-operations: sum(), prev() -- Dima Kogan Sun, 02 Dec 2018 13:49:59 -0800 vnlog (1.12) * test data now lives in separate subdirectories per tool And as as a result, parallel testing works again -- Dima Kogan Sun, 17 Jun 2018 20:58:08 -0700 vnlog (1.11) * N-way vnl-join: fixed deadlock with large files -- Dima Kogan Fri, 15 Jun 2018 19:33:57 -0700 vnlog (1.10) * vnl-join updates: - N-way vnl-join now invoke the sub-joins in parallel - Updated tab completion with new vnl-join arguments - 'make clean' leaves the README.org alone -- Dima Kogan Fri, 15 Jun 2018 15:00:49 -0700 vnlog (1.9) * vnl-join updates: - N-way joins are supported rather than just 2-way - -a- available as a shorthand for -a1 -a2 -a3 -a4 ... - -v- available as a shorthand for -v1 -v2 -v3 -v4 ... - --vnl-autoprefix/--vnl-autosuffix available to infer the prefix/suffix from the filenames -- Dima Kogan Tue, 12 Jun 2018 23:17:10 -0700 vnlog (1.8) * vnl-filter: bug-fix for compatibility with older perls (5.16 works now) -- Dima Kogan Sat, 28 Apr 2018 19:46:49 -0700 vnlog (1.7) * vnl-filter: added exclusion columns: vnl-filter -p !xxx * vnl-filter handles duplicate columns robustly * 'vnl-filter -p x*' no longer matches all the columns * Implemented and documented non-distro installation * added sample packaging files * README.org now contains all the manpages -- Dima Kogan Thu, 26 Apr 2018 20:26:48 -0700 vnlog (1.6) * ABI break: C library works on armhf, armel. API unchanged -- Dima Kogan Sun, 01 Apr 2018 22:12:27 -0700 vnlog (1.5) * Test suite runs in parallel * vnlog.py supports python2 and python3 * install: only the public header is shipped -- Dima Kogan Sat, 31 Mar 2018 02:06:13 -0700 vnlog (1.4) * zsh completion: --[TAB] assumes it's not a 'matches' expression Otherwise no cmdline options ever complete -- Dima Kogan Sun, 11 Mar 2018 18:03:11 -0700 vnlog (1.3) * vnl-sort, vnl-join, vnl-tail now respond to -h and have better help * added bash,nzsh tab-completion * added bash completions * tests no longer always report successes * vnl-join, vnl-sort, vnl-filter no longer barf at unrelated duplicated fields * Simpler Vnlog::Parser API * Added python parsing API -- Dima Kogan Fri, 09 Mar 2018 15:59:29 -0800 vnlog (1.2) * minor fix to not complain about doubly-defined columns -- Dima Kogan Mon, 26 Feb 2018 12:33:24 -0800 vnlog (1.1) * A number of updates -- Dima Kogan Thu, 22 Feb 2018 23:07:17 -0800 vnlog (1.0) * First public release! 
-- Dima Kogan Sat, 10 Feb 2018 21:21:02 -0800 # Local Variables: # mode: debian-changelog # End: vnlog-1.40/GNUmakefile000066400000000000000000000057251475400722300146520ustar00rootroot00000000000000include choose_mrbuild.mk include $(MRBUILD_MK)/Makefile.common.header PROJECT_NAME := vnlog ABI_VERSION := 0 TAIL_VERSION := 1 LIB_SOURCES := \ b64_cencode.c \ vnlog.c \ vnlog-parser.c BIN_SOURCES := test/test1.c test/test-parser.c TOOLS := \ vnl-filter \ vnl-align \ vnl-sort \ vnl-join \ vnl-tail \ vnl-ts \ vnl-uniq \ vnl-gen-header \ vnl-make-matrix # I construct the README.org from the template. The only thing I do is to insert # the manpages. Note that this is more complicated than it looks: # # 1. The tools are written in perl and contain POD documentation # 2. This documentation is stripped out here with pod2text, and included in the # README. This README is an org-mode file, and the README.template.org # container included the manpage text inside a #+BEGIN_EXAMPLE/#+END_EXAMPLE. # So the manpages are treated as a verbatim, unformatted text blob # 3. Further down, the same POD is converted to a manpage via pod2man define MAKE_README = BEGIN \ { \ for $$a (@ARGV) \ { \ $$c{$$a} = `pod2text $$a | mawk "/REPOSITORY/{exit} {print}"`; \ } \ } \ \ while() \ { \ print s/xxx-manpage-(.*?)-xxx/$$c{$$1}/gr; \ } endef README.org: README.template.org $(TOOLS) < $(filter README%,$^) perl -e '$(MAKE_README)' $(filter-out README%,$^) > $@ all: README.org b64_cencode.o: CFLAGS += -Wno-implicit-fallthrough # Make can't deal with ':' in filenames, so I hack it coloncolon := __colon____colon__ doc: $(addprefix man1/,$(addsuffix .1,$(TOOLS))) $(patsubst lib/Vnlog/%.pm,man3/Vnlog$(coloncolon)%.3pm,$(wildcard lib/Vnlog/*.pm)) .PHONY: doc %/: mkdir -p $@ man1/%.1: % | man1/ pod2man -r '' --section 1 --center "vnlog" $< $@ man3/Vnlog$(coloncolon)%.3pm: lib/Vnlog/%.pm | man3/ pod2man -r '' --section 3pm --center "vnlog" $< $@ EXTRA_CLEAN += man1 man3 CFLAGS += -I. -std=gnu99 -Wno-missing-field-initializers test/test1: test/test2.o test/test1.o: test/vnlog_fields_generated1.h test/test2.o: test/vnlog_fields_generated2.h test/vnlog_fields_generated%.h: test/vnlog%.defs vnl-gen-header ./vnl-gen-header < $< | perl -pe 's{vnlog/vnlog.h}{vnlog.h}' > $@ EXTRA_CLEAN += test/vnlog_fields_generated*.h test/*.got # Set up the test suite to be runnable in parallel test check: \ test/test_vnl-filter.pl.RUN \ test/test_vnl-sort.pl.RUN \ test/test_vnl-join.pl.RUN \ test/test_vnl-uniq.pl.RUN \ test/test_c_api.sh.RUN \ test/test_perl_parser.pl.RUN \ test/test_python_parser.py.RUN @echo "All tests in the test suite passed!" .PHONY: test check %.RUN: % $< test/test_c_api.sh.RUN: test/test1 test/test-parser EXTRA_CLEAN += test/testdata_* DIST_INCLUDE := vnlog*.h DIST_BIN := $(TOOLS) DIST_PERL_MODULES := lib/Vnlog DIST_PY3_MODULES := lib/vnlog.py install: doc DIST_MAN := man1/ man3/ include $(MRBUILD_MK)/Makefile.common.footer vnlog-1.40/README.org000066400000000000000000002421201475400722300142360ustar00rootroot00000000000000* Talk I just gave a talk about this at [[https://www.socallinuxexpo.org/scale/17x][SCaLE 17x]]. Here are the [[https://www.youtube.com/watch?v=Qvb_uNkFGNQ&t=12830s][video of the talk]] and the [[https://github.com/dkogan/talk-feedgnuplot-vnlog/blob/master/feedgnuplot-vnlog.org]["slides"]]. * Summary Vnlog ("vanilla-log") is a toolkit for manipulating tabular ASCII data with labelled fields using normal UNIX tools. 
If you regularly use =awk= and =sort= and =uniq= and others, these tools will make you infinitely more powerful. The vnlog tools /extend/, rather than replace the standard tooling, so minimal effort is required to learn and use these tools. Everything assumes a trivially simple log format: - A whitespace-separated table of ASCII human-readable text - A =#= character starts a comment that runs to the end of the line (like in many scripting languages) - The first line that begins with a single =#= (not =##= or =#!=) is a /legend/, naming each column. This is required, and the field names that appear here are referenced by all the tools. - Empty fields reported as =-= This describes 99% of the format, with some extra details [[#format-details][below]]. Example: #+BEGIN_EXAMPLE #!/usr/bin/whatever # a b c 1 2 3 ## comment 4 5 6 #+END_EXAMPLE Such data can be processed directly with almost any existing tool, and /this/ toolkit allows the user to manipulate this data in a nicer way by relying on standard UNIX tools. The core philosophy is to avoid creating new knowledge as much as possible. Consequently, the vnlog toolkit relies /heavily/ on existing (and familiar!) tools and workflows. As such, the toolkit is small, light, and has a /very/ friendly learning curve. * Synopsis I have [[https://raw.githubusercontent.com/dkogan/vnlog/master/dji-tsla.tar.gz][two sets of historical stock data]], from the start of 2018 until now (2018/11): #+BEGIN_SRC sh :results output :exports both < dji.vnl head -n 4 #+END_SRC #+RESULTS: : # Date Open High Low Close AdjClose Volume : 2018-11-15 25061.48 25354.56 24787.79 25289.27 25289.27 383292840 : 2018-11-14 25388.08 25501.29 24935.82 25080.50 25080.50 384240000 : 2018-11-13 25321.21 25511.03 25193.78 25286.49 25286.49 339690000 And #+BEGIN_SRC sh :results output :exports both < tsla.vnl head -n 4 #+END_SRC #+RESULTS: : # Date Open High Low Close AdjClose Volume : 2018-11-15 342.33 348.58 339.04 348.44 348.44 4486339 : 2018-11-14 342.70 347.11 337.15 344.00 344.00 5036300 : 2018-11-13 333.16 344.70 332.20 338.73 338.73 5448600 I can add whitespace to make the headers more legible by humans: #+BEGIN_SRC sh :results output :exports both < dji.vnl head -n 4 | vnl-align #+END_SRC #+RESULTS: : # Date Open High Low Close AdjClose Volume : 2018-11-15 25061.48 25354.56 24787.79 25289.27 25289.27 383292840 : 2018-11-14 25388.08 25501.29 24935.82 25080.50 25080.50 384240000 : 2018-11-13 25321.21 25511.03 25193.78 25286.49 25286.49 339690000 I can pull out the closing prices: #+BEGIN_SRC sh :results output :exports both < dji.vnl vnl-filter -p Close | head -n4 #+END_SRC #+RESULTS: : # Close : 25289.27 : 25080.50 : 25286.49 =vnl-filter= is primarily a wrapper around =awk= or =perl=, allowing the user to reference columns by name. I can then plot the closing prices: #+BEGIN_SRC sh :results file link :exports both < dji.vnl vnl-filter -p Close | feedgnuplot --lines --unset grid #+END_SRC #+RESULTS: [[file:guide-1.svg]] Here I kept /only/ the closing price column, so the x-axis is just the row index. The data was in reverse chronological order, so this plot is also in reverse chronological order. Let's fix that: #+BEGIN_SRC sh :results file link :exports both < dji.vnl vnl-sort -k Date | vnl-filter -p Close | feedgnuplot --lines --unset grid #+END_SRC #+RESULTS: [[file:guide-2.svg]] The =vnl-sort= tool (and most of the other =vnl-xxx= tools) are wrappers around the core tools already available on the system (such as =sort=, in this case). 
With the primary difference being reading/writing vnlog, and referring to columns by name. We now have the data in the correct order, but it'd be nice to see the actual dates on the x-axis. While we're at it, let's label the axes too: #+BEGIN_SRC sh :results output :exports both < dji.vnl vnl-filter -p Date,Close | head -n4 #+END_SRC #+RESULTS: : # Date Close : 2018-11-15 25289.27 : 2018-11-14 25080.50 : 2018-11-13 25286.49 #+BEGIN_SRC sh :results file link :exports both < dji.vnl vnl-sort -k Date | vnl-filter -p Date,Close | feedgnuplot --lines --unset grid --timefmt %Y-%m-%d --domain \ --xlabel 'Date' --ylabel 'Price ($)' #+END_SRC #+RESULTS: [[file:guide-3.svg]] What was the highest value of the Dow-Jones index, and when did it happen? #+BEGIN_SRC sh :results output :exports both < dji.vnl vnl-sort -rgk Close | head -n2 | vnl-align #+END_SRC #+RESULTS: : # Date Open High Low Close AdjClose Volume : 2018-10-03 26833.47 26951.81 26789.08 26828.39 26828.39 280130000 Alrighty. Looks like the high was in October. Let's zoom in on that month: #+BEGIN_SRC sh :results file link :exports both < dji.vnl vnl-sort -k Date | vnl-filter 'Date ~ /2018-10/' -p Date,Close | feedgnuplot --lines --unset grid --timefmt %Y-%m-%d --domain \ --xlabel 'Date' --ylabel 'Price ($)' #+END_SRC #+RESULTS: [[file:guide-4.svg]] OK. Is this thing volatile? What was the largest single-day gain? #+BEGIN_SRC sh :results output :exports both < dji.vnl vnl-filter -p '.,d=diff(Close)' | head -n4 | vnl-align #+END_SRC #+RESULTS: : # Date Open High Low Close AdjClose Volume d : 2018-11-15 25061.48 25354.56 24787.79 25289.27 25289.27 383292840 - : 2018-11-14 25388.08 25501.29 24935.82 25080.50 25080.50 384240000 -208.77 : 2018-11-13 25321.21 25511.03 25193.78 25286.49 25286.49 339690000 205.99 #+BEGIN_SRC sh :results output :exports both < dji.vnl vnl-filter -p '.,d=diff(Close)' | vnl-sort -rgk d | head -n2 | vnl-align #+END_SRC #+RESULTS: : # Date Open High Low Close AdjClose Volume d : 2018-02-02 26061.79 26061.79 25490.66 25520.96 25520.96 522880000 1175.21 Whoa. So the best single-gain day was 2018-02-02: the dow gained 1175.21 points between closing on Feb 1 and Feb 2. But it actually lost ground that day! What if I looked at the difference between the opening and closing in a single day? #+BEGIN_SRC sh :results output :exports both < dji.vnl vnl-filter -p '.,d=Close-Open' | vnl-sort -rgk d | head -n2 | vnl-align #+END_SRC #+RESULTS: : # Date Open High Low Close AdjClose Volume d : 2018-02-06 24085.17 24946.23 23778.74 24912.77 24912.77 823940000 827.6 I guess by that metric 2018-02-06 was better. 
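By the way, nothing limits us to simple differences: any expression works as a derived column. To rank the days by their /relative/ intra-day move instead of the absolute one, something like this should do it (a sketch; the =d_pct= name is mine):

#+BEGIN_SRC sh :results none :exports code
< dji.vnl vnl-filter -p 'Date,d_pct=100*(Close-Open)/Open' | vnl-sort -rgk d_pct | head -n2 | vnl-align
#+END_SRC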
Let's join the Dow-jones index data and the TSLA data, and let's look at them together: #+BEGIN_SRC sh :results output :exports both vnl-join --vnl-autosuffix dji.vnl tsla.vnl -j Date | head -n4 | vnl-align #+END_SRC #+RESULTS: : # Date Open_dji High_dji Low_dji Close_dji AdjClose_dji Volume_dji Open_tsla High_tsla Low_tsla Close_tsla AdjClose_tsla Volume_tsla : 2018-11-15 25061.48 25354.56 24787.79 25289.27 25289.27 383292840 342.33 348.58 339.04 348.44 348.44 4486339 : 2018-11-14 25388.08 25501.29 24935.82 25080.50 25080.50 384240000 342.70 347.11 337.15 344.00 344.00 5036300 : 2018-11-13 25321.21 25511.03 25193.78 25286.49 25286.49 339690000 333.16 344.70 332.20 338.73 338.73 5448600 #+BEGIN_SRC sh :results output :exports both vnl-join --vnl-autosuffix dji.vnl tsla.vnl -j Date | vnl-filter -p '^Close' | head -n4 | vnl-align #+END_SRC #+RESULTS: : # Close_dji Close_tsla : 25289.27 348.44 : 25080.50 344.00 : 25286.49 338.73 #+BEGIN_SRC sh :results file link :exports both vnl-join --vnl-autosuffix dji.vnl tsla.vnl -j Date | vnl-filter -p '^Close' | feedgnuplot --domain --points --unset grid \ --xlabel 'DJI price ($)' --ylabel 'TSLA price ($)' #+END_SRC #+RESULTS: [[file:guide-5.svg]] Huh. Apparently there's no obvious, strong correlation between TSLA and Dow-Jones closing prices. And we saw that with just a few shell commands, without dropping down into a dedicated analysis system. * Build and installation vnlog is a part of Debian/buster and Ubuntu/cosmic (18.10) and later. On those boxes you can simply #+BEGIN_EXAMPLE $ sudo apt install vnlog libvnlog-dev libvnlog-perl python3-vnlog #+END_EXAMPLE to get the binary tools, the C API, the perl and python3 interfaces respectively. ** Install on non-Debian boxes Most of this is written in an interpreted language, so there's nothing to build or install, and you can run the tools directly from the source tree: #+BEGIN_EXAMPLE $ git clone https://github.com/dkogan/vnlog.git $ cd vnlog $ ./vnl-filter ..... #+END_EXAMPLE The python and perl libraries can be run from the tree by setting the =PYTHONPATH= and =PERL5LIB= environment variables respectively. For the C library, you should =make=, and then point your =CFLAGS= and =LDLIBS= and =LD_LIBRARY_PATH= to the local tree. If you do want to install to some arbitrary location to simplify the paths, do this: #+BEGIN_EXAMPLE $ make $ PREFIX=/usr/local make install #+END_EXAMPLE This will install /all/ the components into =/usr/local=. * Description Vnlog data is nicely readable by both humans and machines. Any time your application invokes =printf()= for either diagnostics or logging, consider writing out vnlog-formatted data. You retain human readability, but gain the power all the =vnl-...= tools provide. Vnlog tools are designed to be very simple and light. There's an ever-growing list of other tools that do vaguely the same thing. Some of these: - https://github.com/BurntSushi/xsv - https://csvkit.readthedocs.io/ - https://github.com/johnkerl/miller - https://github.com/jqnatividad/qsv - https://github.com/greymd/teip - https://github.com/eBay/tsv-utils-dlang - https://www.gnu.org/software/datamash/ - https://stedolan.github.io/jq/ - https://github.com/benbernard/RecordStream - https://github.com/dinedal/textql - https://www.visidata.org/ - http://harelba.github.io/q/ - https://github.com/BatchLabs/charlatan - https://github.com/dbohdan/sqawk Many of these provide facilities to run various analyses, and others focus on data types that aren't just a table (json for instance). 
Vnlog by contrast doesn't analyze anything, and targets the most trivial possible data format. This makes it very easy to run any analysis you like in any tool you like. The main envisioned use case is one-liners, and the tools are geared for that purpose. The above mentioned tools are much more powerful than vnlog, so they could be a better fit for some use cases. I claim that - 90% of the time you want to do simple things, and vnlog is a great fit for the task - If you really do need to do something complex, you shouldn't be in the shell writing oneliners anymore, and a fully-fledged analysis system (numpy, etc) is more appropriate In the spirit of doing as little as possible, the provided tools are wrappers around tools you already have and are familiar with. The provided tools are: - =vnl-filter= is a tool to select a subset of the rows/columns in a vnlog and/or to manipulate the contents. This is an =awk= wrapper where the fields can be referenced by name instead of index. 20-second tutorial: #+BEGIN_SRC sh :results none :exports code vnl-filter -p col1,col2,colx=col3+col4 'col5 > 10' --has col6 #+END_SRC will read the input, and produce a vnlog with 3 columns: =col1= and =col2= from the input, and a column =colx= that's the sum of =col3= and =col4= in the input. Only those rows for which /both/ =col5 > 10= is true /and/ that have a non-null value for =col6= will be output. A null entry is signified by a single =-= character. #+BEGIN_SRC sh :results none :exports code vnl-filter --eval '{s += x} END {print s}' #+END_SRC #+RESULTS: will evaluate the given awk program on the input, but the column names work as you would hope they do: if the input has a column named =x=, this would produce the sum of all values in this column. - =vnl-sort=, =vnl-uniq=, =vnl-join=, =vnl-tail=, =vnl-ts= are wrappers around the corresponding commandline tools. These work exactly as you would expect also: the columns can be referenced by name, and the legend comment is handled properly. These are wrappers, so all the commandline options those tools have "just work" (except options that don't make sense in the context of vnlog). As an example, =vnl-tail -f= will follow a log: data will be read by =vnl-tail= as it is written into the log (just like =tail -f=, but handling the legend properly). And you already know how to use these tools without even reading the manpages! Note: I use the Linux kernel and the tools from GNU Coreutils exclusively, but this all has been successfully tested on FreeBSD and OSX also. Please let me know if something doesn't work. - =vnl-align= aligns vnlog columns for easy interpretation by humans. The meaning is unaffected - =Vnlog::Parser= is a simple perl library to read a vnlog - =vnlog= is a simple python library to read a vnlog. Both python2 and python3 are supported - =libvnlog= is a C library to simplify reading and writing a vnlog. Clearly all you /really/ need for writing is =printf()=, but this is useful if we have lots of columns, many containing null values in any given row, and/or if we have parallel threads writing to a log. In my usage I have hundreds of columns of sparse data, so this is handy - =vnl-make-matrix= converts a one-point-per-line vnlog to a matrix of data. I.e. 
#+BEGIN_EXAMPLE $ cat dat.vnl # i j x 0 0 1 0 1 2 0 2 3 1 0 4 1 1 5 1 2 6 2 0 7 2 1 8 2 2 9 3 0 10 3 1 11 3 2 12 $ < dat.vnl vnl-filter -p i,x | vnl-make-matrix --outdir /tmp Writing to '/tmp/x.matrix' $ cat /tmp/x.matrix 1 2 3 4 5 6 7 8 9 10 11 12 #+END_EXAMPLE All the tools have manpages that contain more detail. And more tools will probably be added with time. * Format details The high-level description of the vnlog format from [[#Summary][above]] is sufficient to read/write "normal" vnlog data, but there are a few corner cases that should be mentioned. To reiterate, the format description from above describes vnlog as: - A whitespace-separated table of ASCII human-readable text - A =#= character starts a comment that runs to the end of the line (like in many scripting languages) - The first line that begins with a single =#= (not =##= or =#!=) is a /legend/, naming each column. This is required, and the field names that appear here are referenced by all the tools. - Empty fields reported as =-= For a few years now I've been using these tools myself, and supporting others as they were passing vnlog data around. In the process I've encountered some slightly-weird data, and patched the tools to accept it. So today the included vnlog tools are /very/ permissive, and accept any vnlog data that can possibly be accepted. Other vnlog tools may not be quite as permissive, and may not be able to interpret "weird" data. Points of note, describing the included vnlog tools: - Leading and trailing whitespace is ignored. Everywhere. So this data file will be read properly, with the =x= column containing 1 and 3: #+begin_example # x y 1 2 3 4 #+end_example - Empty (or whitespace-only) lines anywhere are ignored, and treated as a comment - An initial =#= comment without field names is treated as a comment, and we continue looking for the legend in the following lines. So this data file will be read properly: #+begin_example ## comment # # x y 1 2 3 4 #+end_example - Trailing comments are supported, like in most scripting languages. So this data file will be read properly: #+begin_example # x y 1 2 # comment 3 4 #+end_example - Field names are /very/ permissive: anything that isn't whitespace is supported. So this data file will be read properly: #+begin_example # x y # 1+ - 1 2 3 4 5 11 12 13 14 15 #+end_example We can pull out the =#= and =1+= and =-= columns: #+begin_src sh vnl-filter -p '#,1+,-' #+end_src And we can even operate on them, if we use whitespace to indicate field boundaries: #+begin_src sh vnl-filter -p 'x=1+ + 5' #+end_src Note that this implies that trailing comments in a legend line are /not/ supported: the extra =#= characters will be used for field names. Field names containing =,= or === are currently not accepted by =vnl-filter=, but /are/ accepted by the other tools (=vnl-sort= and such). I'll make =vnl-filter= able to work with those field names too, eventually, but as a user, the simplest thing to do is to not pass around data with such field names. - Duplicated labels are supported whenever possible. So #+begin_example # x y z z 1 2 3 4 11 12 13 14 #+end_example will work just fine, unless we're operating on =z=. With this data, both of these commands work: #+begin_src sh vnl-filter -p x vnl-filter -p z #+end_src Picking =z= selects both of the =z= columns. 
But neither of these commands can work with the non-unique =z= column: #+begin_src sh vnl-filter -p s=z+1 vnl-sort -k z #+end_src * Workflows and recipes ** Storing disjoint data A common use case is a complex application that produces several semi-related subsets of data at once. Example: a moving vehicle is reporting both its own position and the observed positions of other vehicles; at any given time any number of other vehicles may be observed. Two equivalent workflows are possible: - a single unified vnlog stream for /all/ the data - several discrete vnlog streams for each data subset Both are valid approaches *** One unified vnlog stream Here the application produces a /single/ vnlog that contains /all/ the columns, from /all/ the data subsets. In any given row, many of the columns will be empty (i.e. contain only =-= ). For instance, a row describing a vehicle own position will not have data about any observations, and vice versa. It is inefficient to store all the extra =-= but it makes many things much nicer, so it's often worth it. =vnl-filter= can be used to pull out the different subsets. Sample =joint.vnl=: #+BEGIN_EXAMPLE # time x_self x_observation 1 10 - 2 20 - 2 - 100 3 30 - 3 - 200 3 - 300 #+END_EXAMPLE Here we have 3 instances in time. We have no observations at =time= 1, one observation at =time= 2, and two observations at =time= 3. We can use =vnl-filter= to pull out the data we want: #+BEGIN_EXAMPLE $ < joint.vnl vnl-filter -p time,self # time x_self 1 10 2 20 2 - 3 30 3 - 3 - #+END_EXAMPLE If we only care about our own positions, the =+= modifier in picked columns in =vnl-filter= is very useful here: #+BEGIN_EXAMPLE $ < joint.vnl vnl-filter -p time,+self # time x_self 1 10 2 20 3 30 $ < joint.vnl vnl-filter -p time,+observation # time x_observation 2 100 3 200 3 300 #+END_EXAMPLE Note that the default is =--skipempty=, so if we're /only/ looking at =x_self= for instance, then we don't even need to =+= modifier: #+begin_example $ < joint.vnl vnl-filter -p self # x_self 10 20 30 #+end_example Also, note that the =vnlog= C interface works very nicely to produce these datafiles: - You can define lots and lots of columns, but only fill some of them before calling =vnlog_emit_record()=. The rest will be set to =-=. - You can create multiple contexts for each type of data, and you can populate them with data independently. And when calling =vnlog_emit_record_ctx()=, you'll get a record with data for just that context. *** Several discrete vnlog streams Conversely, the application can produce /separate/ vnlog streams for /each/ subset of data. Depending on what is desired, exactly, =vnl-join= can be used to re-join them: #+BEGIN_EXAMPLE $ cat self.vnl # time x_self 1 10 2 20 3 30 $ cat observations.vnl # time x_observation 2 100 3 200 3 300 $ vnl-join -j time -a- self.vnl observations.vnl # time x_self x_observation 1 10 - 2 20 100 3 30 200 3 30 300 #+END_EXAMPLE ** Data statistics A common need is to compute basic statistics from your data. Many of the alternative toolkits listed above provide built-in facilities to do this, but vnlog does not: it's meant to be unixy, where each tool has very limited scope. Thus you can either do this with =awk= like you would normally, or you can use other standalone tools to perform the needed computations. 
For instance, I can generate some data: #+BEGIN_EXAMPLE $ seq 2 100 | awk 'BEGIN {print "# x"} {print log($1)}' > /tmp/log.vnl #+END_EXAMPLE Then I can compute the mean with =awk=: #+BEGIN_EXAMPLE $ < /tmp/log.vnl vnl-filter --eval '{sum += x} END {print sum/NR}' 3.67414 #+END_EXAMPLE Or I can compute the mean (and other stuff) with a separate standalone tool: #+BEGIN_EXAMPLE $ < /tmp/log.vnl ministat x +----------------------------------------------------------------------------+ | xx | | x xxxxxxx | | xx xxxxxxxxxxxx| | x x xxxxxxxxxxxxxxxxxxxxxxx| |x x x x x x x x x xx xx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx| | |_______________A____M___________| | +----------------------------------------------------------------------------+ N Min Max Median Avg Stddev x 99 0.693147 4.60517 3.93183 3.6741353 0.85656382 #+END_EXAMPLE =ministat= is not a part of the vnlog toolkit, but the vnlog format is generic so it works just fine. ** Powershell-style filtering of common shell commands Everything about vnlog is generic and simple, so it's easy to use it to process data that wasn't originally meant to be used this way. For instance filtering the output of =ls -l= to report only file names and sizes, skipping directories, and sorting by file sizes: #+BEGIN_EXAMPLE $ ls -l total 320 -rw-r--r-- 1 dima dima 5044 Aug 25 15:04 Changes -rw-r--r-- 1 dima dima 12749 Aug 25 15:04 Makefile -rw-r--r-- 1 dima dima 69789 Aug 25 15:04 README.org -rw-r--r-- 1 dima dima 33781 Aug 25 15:04 README.template.org -rw-r--r-- 1 dima dima 5359 Aug 25 15:04 b64_cencode.c drwxr-xr-x 4 dima dima 4096 Aug 25 15:04 completions drwxr-xr-x 3 dima dima 4096 Aug 25 15:04 lib drwxr-xr-x 3 dima dima 4096 Aug 25 15:04 packaging drwxr-xr-x 2 dima dima 4096 Aug 25 15:04 test -rwxr-xr-x 1 dima dima 5008 Aug 25 15:04 vnl-align -rwxr-xr-x 1 dima dima 56637 Aug 25 15:04 vnl-filter -rwxr-xr-x 1 dima dima 5678 Aug 25 15:04 vnl-gen-header -rwxr-xr-x 1 dima dima 29815 Aug 25 15:04 vnl-join -rwxr-xr-x 1 dima dima 3631 Aug 25 15:04 vnl-make-matrix -rwxr-xr-x 1 dima dima 8372 Aug 25 15:04 vnl-sort -rwxr-xr-x 1 dima dima 5822 Aug 25 15:04 vnl-tail -rwxr-xr-x 1 dima dima 4439 Aug 25 15:04 vnl-ts -rw-r--r-- 1 dima dima 559 Aug 25 15:04 vnlog-base64.h -rw-r--r-- 1 dima dima 8169 Aug 25 15:04 vnlog.c -rw-r--r-- 1 dima dima 12677 Aug 25 15:04 vnlog.h $ (echo '# permissions num_links user group size month day time name'; ls -l | tail -n +2) | vnl-filter 'permissions !~ "^d"' -p name,size | vnl-sort -gk size | vnl-align # name size vnlog-base64.h 559 vnl-make-matrix 3631 vnl-ts 4439 vnl-align 5008 Changes 5044 b64_cencode.c 5359 vnl-gen-header 5678 vnl-tail 5822 vnlog.c 8169 vnl-sort 8372 vnlog.h 12677 Makefile 12749 vnl-join 29815 README.template.org 33781 vnl-filter 56637 README.org 69789 #+END_EXAMPLE With a bit of shell manipulation, these tools can be applied to a whole lot of different data streams that know nothing of vnlog. * C interface ** Writing vnlog files *** Basic usage For most uses, vnlog files are simple enough to be generated with plain prints. But then each print statement has to know which numeric column we're populating, which becomes effortful with many columns. In my usage it's common to have a large parallelized C program that's writing logs with hundreds of columns where any one record would contain only a subset of the columns. In such a case, it's helpful to have a library that can output the log files. This is available. 
Basic usage looks like this. In a shell:

#+BEGIN_SRC sh :results none :exports code
vnl-gen-header 'int w' 'uint8_t x' 'char* y' 'double z' 'void* binary' > vnlog_fields_generated.h
#+END_SRC

#+RESULTS:

In a C program test.c:

#+BEGIN_SRC C
#include "vnlog_fields_generated.h"

int main()
{
    vnlog_emit_legend();

    vnlog_set_field_value__w(-10);
    vnlog_set_field_value__x(40);
    vnlog_set_field_value__y("asdf");
    vnlog_emit_record();

    vnlog_set_field_value__z(0.3);
    vnlog_set_field_value__x(50);
    vnlog_set_field_value__w(-20);
    vnlog_set_field_value__binary("\x01\x02\x03", 3);
    vnlog_emit_record();

    vnlog_set_field_value__w(-30);
    vnlog_set_field_value__x(10);
    vnlog_set_field_value__y("whoa");
    vnlog_set_field_value__z(0.5);
    vnlog_emit_record();

    return 0;
}
#+END_SRC

Then we build and run, and we get

#+BEGIN_EXAMPLE
$ cc -o test test.c -lvnlog

$ ./test
# w x y z binary
-10 40 asdf - -
-20 50 - 0.2999999999999999889 AQID
-30 10 whoa 0.5 -
#+END_EXAMPLE

The binary field is base64-encoded. This is a rarely-used feature, but sometimes you really need to log binary data for later processing, and this makes it possible.

So you

1. Generate the header to define your columns
2. Call =vnlog_emit_legend()=
3. Call =vnlog_set_field_value__...()= for each field you want to set in that row.
4. Call =vnlog_emit_record()= to write the row and to reset all fields for the next row. Any fields unset with a =vnlog_set_field_value__...()= call are written as null: =-=

This is enough for 99% of the use cases. Things get a bit more complex if we have threading or if we have multiple vnlog output streams in the same program. For both of these we use vnlog /contexts/.

*** Contexts

To support independent writing into the same vnlog (possibly by multiple threads; this is reentrant), each log-writer should create a context, and use it when talking to vnlog. The context functions will make sure that the fields in each context are independent and that the output records won't clobber each other:

#+BEGIN_SRC C
void child_writer( // the parent context also writes to this vnlog. Pass NULL to
                   // use the global one
                   struct vnlog_context_t* ctx_parent )
{
    struct vnlog_context_t ctx;
    vnlog_init_child_ctx(&ctx, ctx_parent);

    while(records)
    {
        vnlog_set_field_value_ctx__xxx(&ctx, ...);
        vnlog_set_field_value_ctx__yyy(&ctx, ...);
        vnlog_set_field_value_ctx__zzz(&ctx, ...);
        vnlog_emit_record_ctx(&ctx);
    }

    vnlog_free_ctx(&ctx); // required only if we have any binary fields
}
#+END_SRC

If we want to have multiple independent vnlog writers to /different/ streams (with different columns and legends), we do this instead:

=file1.c=:

#+BEGIN_SRC C
#include "vnlog_fields_generated1.h"

void f(void)
{
    // Write some data out to the default context and default output (STDOUT)
    vnlog_emit_legend();
    ...
    vnlog_set_field_value__xxx(...);
    vnlog_set_field_value__yyy(...);
    ...
    vnlog_emit_record();
}
#+END_SRC

=file2.c=:

#+BEGIN_SRC C
#include "vnlog_fields_generated2.h"

void g(void)
{
    // Make a new session context, send output to a different file, write
    // out legend, and send out the data
    struct vnlog_context_t ctx;
    vnlog_init_session_ctx(&ctx);

    FILE* fp = fopen(...);
    vnlog_set_output_FILE(&ctx, fp);
    vnlog_emit_legend_ctx(&ctx);
    ...
    // set fields in OUR context, not the global one
    vnlog_set_field_value_ctx__a(&ctx, ...);
    vnlog_set_field_value_ctx__b(&ctx, ...);
    ...
    // emit the record while the context is still alive
    vnlog_emit_record_ctx(&ctx);

    vnlog_free_ctx(&ctx); // required only if we have any binary fields
}
#+END_SRC

Note that it's the user's responsibility to make sure the new sessions go to a different =FILE= by invoking =vnlog_set_output_FILE()=.
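For completeness: the two generated headers included by =file1.c= and =file2.c= above are produced with =vnl-gen-header=, exactly as before. Something like this (a sketch; the types are made up to match the =vnlog_set_field_value...= calls in the examples):

#+BEGIN_SRC sh :results none :exports code
vnl-gen-header 'int xxx' 'double yyy' > vnlog_fields_generated1.h
vnl-gen-header 'int a' 'double b'     > vnlog_fields_generated2.h
#+END_SRC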
Furthermore, note that the included =vnlog_fields_....h= file defines the fields we're writing to; and if we have multiple different vnlog field definitions in the same program (as in this example), then the different writers /must/ live in different source files. The compiler will barf if you try to =#include= two different =vnlog_fields_....h= files in the same source.

*** Remaining APIs

- =vnlog_printf(...)= and =vnlog_printf_ctx(ctx, ...)= write to a pipe like =printf()= does. This exists primarily for comments.

- =vnlog_clear_fields_ctx(ctx, do_free_binary)= clears out the data in a context and makes it ready to be used for the next record. It is rare for the user to have to call this manually. The most common case is handled automatically (clearing out a context after emitting a record). One area where this is useful is when making a copy of a context:

  #+BEGIN_SRC C
  struct vnlog_context_t ctx1;
  // .... do stuff with ctx1 ... add data to it ...

  struct vnlog_context_t ctx2 = ctx1;
  // ctx1 and ctx2 now both have the same data, and the same pointers to
  // binary data. I need to get rid of the pointer references in ctx1
  vnlog_clear_fields_ctx(&ctx1, false);
  #+END_SRC

- =vnlog_free_ctx(ctx)= frees memory for a vnlog context. Do this before throwing the context away. Currently this is only needed for contexts that have binary fields, but this should be called for all contexts anyway, in case this changes in a later revision

** Reading vnlog files

The basic usage goes like this:

#+begin_src c
#include <stdio.h>
#include <stdbool.h>
#include <vnlog/vnlog-parser.h> // header name assumed; ships with libvnlog

bool parse_vnlog(const char* filename)
{
    FILE* fp = fopen(filename, "r");
    if(fp == NULL)
        return false;

    vnlog_parser_t ctx;
    if(VNL_OK != vnlog_parser_init(&ctx, fp))
        return false;

    // String in the "time" column for the most-recently-parsed row
    const char*const* time_record = vnlog_parser_record_from_key(&ctx, "time");
    if(time_record == NULL)
    {
        vnlog_parser_free(&ctx);
        return false;
    }

    int i_record = 0;
    vnlog_parser_result_t result;
    while(VNL_OK == (result = vnlog_parser_read_record(&ctx, fp)))
    {
        // The pointer from vnlog_parser_record_from_key() stays valid as the
        // parser advances, so we dereference it anew for each record
        printf("record %d: time = %s\n", i_record, *time_record);
        i_record++;
    }

    vnlog_parser_free(&ctx);
    fclose(fp);

    // VNL_EOF is assumed to be the "ran out of data" result code
    return result == VNL_EOF;
}
#+end_src

* python interface

=vnlog= is a simple python library to read a vnlog. The =vnlog.slurp()= function reads a whole vnlog at once, and it can accept a structured numpy dtype, so it can read non-numerical data. For instance, with =data.vnl= like this:

#+begin_example
# image x y z temperature
image1.png 1 2 5 34
image2.png 3 4 1 35
#+end_example

we can read it like this:

#+begin_src python
import numpy as np
import vnlog

dtype = np.dtype([ ('image',       'U16'),
                   ('x y z',       int, (3,)),
                   ('temperature', float), ])
arr = vnlog.slurp('data.vnl', dtype=dtype)
#+end_src

and we get:

#+begin_example
print(arr['image'])
---> array(['image1.png', 'image2.png'], dtype='<U16')

print(arr['x y z'])
---> array([[1, 2, 5],
            [3, 4, 1]])

print(arr['temperature'])
---> array([34., 35.])
#+end_example

Notes:

- The given structured dtype defines both how to organize the data, and which data to extract. So it can be used to read in only a subset of the available columns. Here I could have omitted the 'temperature' column, for instance

- Sub-arrays are allowed. In the example I could say either

  #+begin_src python
  dtype = np.dtype([ ('image',       'U16'),
                     ('x y z',       int, (3,)),
                     ('temperature', float), ])
  #+end_src

  or

  #+begin_src python
  dtype = np.dtype([ ('image',       'U16'),
                     ('x',           int),
                     ('y',           int),
                     ('z',           int),
                     ('temperature', float), ])
  #+end_src

  The latter would read =x=, =y=, =z= into separate, individual arrays. Sometimes we want this, sometimes not.

- Nested structured dtypes are not allowed. Fields inside other fields are not supported, since it's not clear how to map that to a flat vnlog legend

- If a structured dtype is given, =slurp()= returns the array only, since the field names are already available in the dtype

* numpy interface

If we need to read data into numpy specifically, nicer tools are available than the generic =vnlog= Python module. The built-in =numpy.loadtxt= and =numpy.savetxt= functions work well, with the caveat that =numpy.loadtxt()= should be followed by =numpysane.atleast_dims(..., -2)= to make sure that a data array of shape =(Nrows,Ncols)= is returned even if =Nrows==1=.
For example to write to standard output a vnlog with fields =a=, =b= and =c=: #+BEGIN_SRC python numpy.savetxt(sys.stdout, array, fmt="%g", header="a b c") #+END_SRC Note that numpy automatically adds the =#= to the header. To read a vnlog from a file on disk, do something like #+BEGIN_SRC python array = numpysane.atleast_dims(numpy.loadtxt('data.vnl'), -2) #+END_SRC These functions know that =#= lines are comments, but don't interpret anything as field headers. That's easy to do, so I'm not providing any helper libraries. I might do that at some point, but in the meantime, patches are welcome. * Compatibility I use GNU/Linux-based systems exclusively, but everything has been tested functional on FreeBSD and OSX in addition to Debian, Ubuntu and CentOS. I can imagine there's something I missed when testing on non-Linux systems, so please let me know if you find any issues. * Caveats and bugs These tools are meant to be simple, so some things are hard requirements. A big one is that columns are whitespace-separated. There is /no/ mechanism for escaping or quoting whitespace into a single field. I think supporting something like that is more trouble than it's worth. * Manpages ** vnl-filter #+BEGIN_EXAMPLE NAME vnl-filter - filters vnlogs to select particular rows, fields SYNOPSIS $ cat run.vnl # time x y z temperature 3 1 2.3 4.8 30 4 1.1 2.2 4.7 31 6 1 2.0 4.0 35 7 1 1.6 3.1 42 $ = 35' | vnl-align # time x y z temperature 6 1 2.0 4.0 35 7 1 1.6 3.1 42 $ 10' would select only those rows whose "size" column contains a value > 10. See the detailed description of matches expressions below for more detail. Context lines "vnl-filter" supports the context output options ("-A", "-B" and "-C") exactly like the "grep" tool. I.e to print out all rows whose "size" column contains a value > 10 *but also* include the 3 rows immediately before *and* after such matching rows, do this: vnl-filter -C3 'size > 10' "-B" reports the rows *before* matching ones and "-A" the rows *after* matching ones. "-C" reports both. Note that this applies *only* to *matches* expressions: records skipped because they fail "--has" or "--skipempty" are *not* included in contextual output. Backend choice By default, the parsing of arguments and the legend happens in perl, which then constructs a simple awk script, and invokes "mawk" to actually read the data and to process it. This is done because awk is lighter weight and runs faster, which is important because our data sets could be quite large. We default to "mawk" specifically, since this is a simpler implementation than "gawk", and runs much faster. If for whatever reason we want to do everything with perl, this can be requested with the "--perl" option. Special functions For convenience we support several special functions in any expression passed on to awk or perl (named expressions, matches expressions, "--eval" strings). These generally maintain some internal state, and vnl-filter makes sure that this state is consistent. Note that these are evaluated *after* "--skipcomments" and "--has". So any record skipped because of a "--has" expression, for instance, will *not* be considered in prev(), diff() and so on. * rel(x) returns value of "x" relative to the first value of "x". For instance we might want to see the time or position relative to the start, not relative to some absolute beginning. 
Example: $ cat tst.vnl # time x 100 200 101 212 102 209 $ ) # read each line { chomp; next unless matches; # skip non-matching lines evalexpr(); } --function|--sub Evaluates the given expression as a function that can be used in other expressions. This is most useful when you want to print something that can't trivially be written as a simple expression. For instance: $ cat tst.vnl # s 1-2 3-4 5-6 $ < tst.vnl vnl-filter --function 'before(x) { sub("-.*","",x); return x }' \ --function 'after(x) { sub(".*-","",x); return x }' \ -p 'b=before(s),a=after(s)' # b a 1 2 3 4 5 6 See the CAVEATS section below if you're doing something sufficiently-complicated where you need this. --function-abs|--sub-abs Convenience option to add an absolute-value abs() function. This is only useful for awk programs (the default, no "--perl" given) since perl already provides abs() by default. --begin|--BEGIN Evaluates the given expression in the BEGIN {} block of the generated awk (or perl) program. --end|--END Evaluates the given expression in the END {} block of the generated awk (or perl) program. --[no]skipempty Do [not] skip records where all fields are blank. By default we *do* skip all empty records; to include them, pass "--noskipempty" --skipcomments Don't output non-legend comments --perl By default all procesing is performed by "mawk", but if for whatever reason we want perl instead, pass "--perl". Both modes work, but "mawk" is noticeably faster. "--perl" could be useful because it is more powerful, which could be important since a number of things pass commandline strings directly to the underlying language (named expressions, matches expressions, "--eval" strings). Note that while variables in perl use sigils, column references should *not* use sigils. To print the sum of all values in column "a" you'd do this in awk vnl-filter --eval '{suma += a} END {print suma}' and this in perl vnl-filter --perl --eval '{$suma += a} END {say $suma}' The perl strings are evaluated without "use strict" or "use warnings" so I didn't have to declare $suma in the example. With "--perl", empty strings ("-" in the vnlog file) are converted to "undef". --dumpexprs Used for debugging. This spits out all the final awk (or perl) program we run for the given commandline options and given input. This is the final program, with the column references resolved to numeric indices, so one can figure out what went wrong. --unbuffered Flushes each line after each print. This makes sure each line is output as soon as it is available, which is crucial for realtime output and streaming plots. --stream Synonym for "--unbuffered" CAVEATS This tool is very lax in its input validation (on purpose). As a result, columns with names like %CPU and "TIME+" do work (i.e. you can more or less feed in output from "top -b"). The downside is that shooting yourself in the foot is possible. This tradeoff is currently tuned to be very permissive, which works well for my use cases. I'd be interested in hearing other people's experiences. Potential pitfalls/unexpected behaviors: * All column names are replaced in all eval strings without regard to context. The earlier example that reports the sum of values in a column: vnl-filter --eval '{suma += a} END {print suma}' will work fine if we *do* have a column named "a" and do *not* have a column named "suma". But will not do the right thing if any of those are violated. For instance, if a column "a" doesn't exist, then "awk" would see "suma += a" instead of something like "suma += $5". 
"a" would be an uninitialized variable, which evaluates to 0, so the full "vnl-filter" command would not fail, but would print 0 instead. It's the user's responsibility to make sure we're talking about the right columns. The focus here was one-liners so hopefully nobody has so many columns, they can't keep track of all of them in their head. I don't see any way to resolve this without seriously impacting the scope of the tool, so I'm leaving this alone. * It is natural to use vnlog as a database. You can run queries with something like vnl-filter 'key == 5' This works. But unlike a real database this is clearly a linear lookup. With large data files, this would be significantly slower than the logarithmic searches provided by a real database. The meaning of "large" and "significant" varies, and you should test it. In my experience vnlog "databases" scale surprisingly well. But at some point, importing your data to something like sqlite is well worth it. * When substituting column names I match *either* a word-nonword transition ("\b") *or* a whitespace-nonword transition. The word boundaries is what would be used 99% of the time. But the keys may have special characters in them, which don't work with "\b". This means that whitespace becomes important: "1+%CPU" will not be parsed as expected, which is correct since "+%CPU" is also a valid field name. But "1+ %CPU" will be parsed correctly, so if you have weird field names, put the whitespace into your expressions. It'll make them more readable anyway. * Strings passed to "-p" are split on "," *except* if the "," is inside balanced "()". This makes it possible to say things like vnl-filter --function 'f(a,b) { ... }' -p 'c=f(a,b)'. This is probably the right behavior, although some questionable looking field names become potentially impossible: "f(a" and "b)" *could* otherwise be legal field names, but you're probably asking for trouble if you do that. * Currently there're two modes: a pick/print mode and an "--eval" mode. Then there's also "--function", which adds bits of "--eval" to the pick/print mode, but it feels maybe insufficient. I don't yet have strong feelings about what this should become. Comments welcome #+END_EXAMPLE ** vnl-align #+BEGIN_EXAMPLE NAME vnl-align - aligns vnlog columns for easy interpretation by humans SYNOPSIS $ cat tst.vnl # w x y z -10 40 asdf - -20 50 - 0.300000 -30 10 whoa 0.500000 $ vnl-align tst.vnl # w x y z -10 40 asdf - -20 50 - 0.300000 -30 10 whoa 0.500000 DESCRIPTION The basic usage is vnl-align logfile The arguments are assumed to be the vnlog files. If no arguments are given, the input comes from STDIN. This is very similar to "column -t", but handles "#" lines properly: 1. The first "#" line is the legend. For the purposes of alignment, the leading "#" character and the first column label are treated as one column 2. All other "#" lines are output verbatim. #+END_EXAMPLE ** vnl-sort #+BEGIN_EXAMPLE NAME vnl-sort - sorts an vnlog file, preserving the legend SYNOPSIS $ cat a.vnl # a b AA 11 bb 12 CC 13 dd 14 dd 123 Sort lexically by a: $ 2. See below for details. Past that, everything "join" does is supported, so see that man page for detailed documentation. Note that all non-legend comments are stripped out, since it's not obvious where they should end up. Field names in the output By default, the field names in the output match those in the input. This is what you want most of the time. It is possible, however that a column name adjustment is needed. 
One common use case for this is if the files being joined have identically-named columns, which would produce duplicate columns in the output. Example: we fixed a bug in a program, and want to compare the results before and after the fix. The program produces an x-y trajectory as a function of time, so both the bugged and the bug-fixed programs produce a vnlog with a legend # time x y Joining this on "time" will produce a vnlog with a legend # time x y x y which is confusing, and *not* what you want. Instead, we invoke "vnl-join" as vnl-join --vnl-suffix1 _buggy --vnl-suffix2 _fixed -j time buggy.vnl fixed.vnl And in the output we get a legend # time x_buggy y_buggy x_fixed y_fixed Much better. Note that "vnl-join" provides several ways of specifying this. The above works *only* for 2-way joins. An alternate syntax is available for N-way joins, a comma-separated list. The same could be expressed like this: vnl-join -a- --vnl-suffix _buggy,_fixed -j time buggy.vnl fixed.vnl Finally, if passing in structured filenames, "vnl-join" can infer the desired syntax from the filenames. The same as above could be expressed even simpler: vnl-join --vnl-autosuffix -j time buggy.vnl fixed.vnl This works by looking at the set of passed in filenames, and stripping out the common leading and trailing strings. Sorting of input and output The GNU coreutils "join" tool expects sorted columns because it can then take only a single pass through the data. If the input isn't sorted, then we can use normal shell substitutions to sort it: $ vnl-join -j key <(vnl-sort -s -k key a.vnl) <(vnl-sort -s -k key b.vnl) For convenience "vnl-join" provides a "--vnl-sort" option. This allows the above to be equivalently expressed as $ vnl-join -j key --vnl-sort - a.vnl b.vnl The "-" after the "--vnl-sort" indicates that we want to sort the *input* only. If we also want to sort the output, pass the short codes "sort" accepts instead of the "-". For instance, to sort the input for "join" and to sort the output numerically, in reverse, do this: $ vnl-join -j key --vnl-sort rg a.vnl b.vnl The reason this shorthand exists is to work around a quirk of "join". The sort order is *assumed* by "join" to be lexicographical, without any way to change this. For "sort", this is the default sort order, but "sort" has many options to change the sort order, options which are sorely missing from "join". A real-world example affected by this is the joining of numerical data. If you have "a.vnl": # time a 8 a 9 b 10 c and "b.vnl": # time b 9 d 10 e Then you cannot use "vnl-join" directly to join the data on time: $ vnl-join -j time a.vnl b.vnl # time a b join: /dev/fd/4:3: is not sorted: 10 c join: /dev/fd/5:2: is not sorted: 10 e 9 b d 10 c e Instead you must re-sort both files lexicographically, *and* then (because you almost certainly want to) sort it back into numerical order: $ vnl-join -j time <(vnl-sort -s -k time a.vnl) <(vnl-sort -s -k time b.vnl) | vnl-sort -s -n -k time # time a b 9 b d 10 c e Yuck. The shorthand described earlier makes the interface part of this palatable: $ vnl-join -j time --vnl-sort n a.vnl b.vnl # time a b 9 b d 10 c e Note that the input sort is stable: "vnl-join" will invoke "vnl-sort -s". If you want a stable post-sort, you need to ask for it with "--vnl-sort s...". N-way joins The GNU coreutils "join" tool is inherently designed to join *exactly* two files. "vnl-join" extends this capability by chaining together a number of "join" invocations to produce a generic N-way join. 
This works exactly how you would expect with the following caveats: * Full outer joins are supported by passing "-a-", but no other "-a" option is supported. This is possible, but wasn't obviously worth the trouble. * "-v" is not supported. Again, this is possible, but wasn't obviously worth the trouble. * Similarly, "-o" is not supported. This is possible, but wasn't obviously worth the trouble, especially since the desired behavior can be obtained by post-processing with "vnl-filter". BUGS AND CAVEATS The underlying "sort" tool assumes lexicographic ordering, and matches fields purely based on their textual contents. This means that for the purposes of joining, 10, 10.0 and 1.0e1 are all considered different. If needed, you can normalize your keys with something like this: vnl-filter -p x='sprintf("%f",x)' COMPATIBILITY I use GNU/Linux-based systems exclusively, but everything has been tested functional on FreeBSD and OSX in addition to Debian, Ubuntu and CentOS. I can imagine there's something I missed when testing on non-Linux systems, so please let me know if you find any issues. SEE ALSO join(1) #+END_EXAMPLE ** vnl-tail #+BEGIN_EXAMPLE NAME vnl-tail - tail a log file, preserving the legend SYNOPSIS $ read_temperature | tee temp.vnl # temperature 29.5 30.4 28.3 22.1 ... continually produces data ... at the same time, in another terminal $ vnl-tail -f temp.vnl # temperature 28.3 22.1 ... outputs data as it comes in DESCRIPTION Usage: vnl-tail [options] logfile logfile logfile ... < logfile This tool runs "tail" on given vnlog files in various ways. "vnl-tail" is a wrapper around the GNU coreutils "tail" tool. Since this is a wrapper, most commandline options and behaviors of the "tail" tool are present; consult the tail(1) manpage for detail. The differences from GNU coreutils "tail" are * The input and output to this tool are vnlog files, complete with a legend * "-c" is not supported because vnlog really doesn't want to break up lines * "--zero-terminated" is not supported because vnlog assumes newline-separated records * By default we call the "tail" tool to do the actual work. If the underlying tool has a different name or lives in an odd path, this can be specified by passing "--vnl-tool TOOL" Past that, everything "tail" does is supported, so see that man page for detailed documentation. COMPATIBILITY I use GNU/Linux-based systems exclusively, but everything has been tested functional on FreeBSD and OSX in addition to Debian, Ubuntu and CentOS. I can imagine there's something I missed when testing on non-Linux systems, so please let me know if you find any issues. SEE ALSO tail(1) #+END_EXAMPLE ** vnl-ts #+BEGIN_EXAMPLE NAME vnl-ts - add a timestamp to a vnlog stream SYNOPSIS $ read_temperature # temperature 29.5 30.4 28.3 22.1 ... continually produces data at 1Hz $ read_temperature | vnl-ts -s %.s # time-rel temperature 0.013893 30.2 1.048695 28.6 2.105592 29.3 3.162873 22.0 ... DESCRIPTION Usage: vnl-ts [-i | -s] [-m] [--vnl-field t] format < pipe This tool runs "ts" on given vnlog streams. "vnl-ts" is a wrapper around the "ts" tool from Joey Hess's moreutils toolkit. Since this is a wrapper, most commandline options and behaviors of the "ts" tool are present; consult the ts(1) manpage for details. The differences from "ts" are * The input and output to this tool are vnlog files, complete with a legend * The format *must* be passed-in by the user; no default is assumed. * The given format *must not* contain whitespace, so that it fits a single vnlog field. 
* "-r" is not supported: it assumes input timestamps with whitespace, which is incompatible with vnlog * A "vnl-ts"-specific option "--vnl-field" is available to set the name of the new field. If omitted, a reasonable default will be used. * By default we call the "ts" tool to do the actual work. If the underlying tool has a different name or lives in an odd path, this can be specified by passing "--vnl-tool TOOL" Past that, everything "ts" does is supported, so see that man page for detailed documentation. COMPATIBILITY By default this calls the tool named "ts". At least on FreeBSD, it's called "moreutils-ts", so on such systems you should invoke "vnl-ts --vnl-tool moreutils-ts ..." I use GNU/Linux-based systems exclusively, but everything has been tested functional on FreeBSD and OSX in addition to Debian, Ubuntu and CentOS. I can imagine there's something I missed when testing on non-Linux systems, so please let me know if you find any issues. SEE ALSO ts(1) #+END_EXAMPLE ** vnl-uniq #+BEGIN_EXAMPLE NAME vnl-uniq - uniq a log file, preserving the legend SYNOPSIS $ cat colors.vnl # color blue yellow yellow blue yellow orange orange $ < colors.vnl | vnl-sort | vnl-uniq -c # count color 2 blue 2 orange 3 yellow DESCRIPTION Usage: vnl-uniq [options] < logfile This tool runs "uniq" on a given vnlog dataset. "vnl-uniq" is a wrapper around the GNU coreutils "uniq" tool. Since this is a wrapper, most commandline options and behaviors of the "uniq" tool are present; consult the uniq(1) manpage for detail. The differences from GNU coreutils "uniq" are * The input and output to this tool are vnlog files, complete with a legend * "--zero-terminated" is not supported because vnlog assumes newline-separated records * Only *one* input is supported (a file on the cmdline or data on standard input), and the output *always* goes to standard output. Specifying the output as a file on the commandline is not supported. * "--vnl-count NAME" can be given to name the "count" column. "-c" is still supported to add the default new column named "count", but if another name is wanted, "--vnl-count" does that. "--vnl-count" implies "-c" * In addition to the normal behavior of skipping fields at the start, "-f" and "--skip-fields" can take a negative argument to skip the *all but the last* N fields. For instance, to use only the one last field, pass "-f -1" or "--skip-fields=-1". * By default we call the "uniq" tool to do the actual work. If the underlying tool has a different name or lives in an odd path, this can be specified by passing "--vnl-tool TOOL" Past that, everything "uniq" does is supported, so see that man page for detailed documentation. COMPATIBILITY I use GNU/Linux-based systems exclusively, but everything has been tested functional on FreeBSD and OSX in addition to Debian, Ubuntu and CentOS. I can imagine there's something I missed when testing on non-Linux systems, so please let me know if you find any issues. SEE ALSO uniq(1) #+END_EXAMPLE ** vnl-gen-header #+BEGIN_EXAMPLE NAME vnl-gen-header - create definition for vnlog output from C SYNOPSIS $ vnl-gen-header 'int w' 'uint8_t x' 'char* y' 'double z' > vnlog_fields_generated.h DESCRIPTION We provide a simple C library to produce vnlog output. The fields this library outputs must be known at compile time, and are specified in a header created by this tool. Please see the vnlog documentation for instructions on how to use the library ARGUMENTS This tool needs to be given a list of field definitions. 
First we look at the commandline, and if the definitions are not available there, we look on STDIN. Each definition is a string "type name" (one def per argument on the commandline or per line on STDIN). If reading from STDIN, we ignore blank lines, and treat any line starting with "#" as a comment. Each def represents a single output field. Each such field spec in a C-style variable declaration with a type followed by a name. Note that these field specs contain whitespace, so each one must be quoted before being passed to the shell. The types can be basic scalars, possibly with set widths ("char", "double", "int", "uint32_t", "unsigned int", ...), a NULL-terminated string ("char*") or a generic chunk of binary data ("void*"). The names must consist entirely of letters, numbers or "_", like variables in C. #+END_EXAMPLE ** vnl-make-matrix #+BEGIN_EXAMPLE NAME vnl-make-matrix - create a matrix from a one-point-per-record vnlog SYNOPSIS $ cat /tmp/dat.vnl # i j x 0 0 1 0 1 2 0 2 3 1 0 4 1 1 5 1 2 6 2 0 7 2 1 8 2 2 9 3 0 10 3 1 11 3 2 12 $ 10' --has col6 #+END_SRC will read the input, and produce a vnlog with 3 columns: =col1= and =col2= from the input, and a column =colx= that's the sum of =col3= and =col4= in the input. Only those rows for which /both/ =col5 > 10= is true /and/ that have a non-null value for =col6= will be output. A null entry is signified by a single =-= character. #+BEGIN_SRC sh :results none :exports code vnl-filter --eval '{s += x} END {print s}' #+END_SRC #+RESULTS: will evaluate the given awk program on the input, but the column names work as you would hope they do: if the input has a column named =x=, this would produce the sum of all values in this column. - =vnl-sort=, =vnl-uniq=, =vnl-join=, =vnl-tail=, =vnl-ts= are wrappers around the corresponding commandline tools. These work exactly as you would expect also: the columns can be referenced by name, and the legend comment is handled properly. These are wrappers, so all the commandline options those tools have "just work" (except options that don't make sense in the context of vnlog). As an example, =vnl-tail -f= will follow a log: data will be read by =vnl-tail= as it is written into the log (just like =tail -f=, but handling the legend properly). And you already know how to use these tools without even reading the manpages! Note: I use the Linux kernel and the tools from GNU Coreutils exclusively, but this all has been successfully tested on FreeBSD and OSX also. Please let me know if something doesn't work. - =vnl-align= aligns vnlog columns for easy interpretation by humans. The meaning is unaffected - =Vnlog::Parser= is a simple perl library to read a vnlog - =vnlog= is a simple python library to read a vnlog. Both python2 and python3 are supported - =libvnlog= is a C library to simplify reading and writing a vnlog. Clearly all you /really/ need for writing is =printf()=, but this is useful if we have lots of columns, many containing null values in any given row, and/or if we have parallel threads writing to a log. In my usage I have hundreds of columns of sparse data, so this is handy - =vnl-make-matrix= converts a one-point-per-line vnlog to a matrix of data. I.e. #+BEGIN_EXAMPLE $ cat dat.vnl # i j x 0 0 1 0 1 2 0 2 3 1 0 4 1 1 5 1 2 6 2 0 7 2 1 8 2 2 9 3 0 10 3 1 11 3 2 12 $ < dat.vnl vnl-filter -p i,x | vnl-make-matrix --outdir /tmp Writing to '/tmp/x.matrix' $ cat /tmp/x.matrix 1 2 3 4 5 6 7 8 9 10 11 12 #+END_EXAMPLE All the tools have manpages that contain more detail. 
And more tools will probably be added with time. * Format details The high-level description of the vnlog format from [[#Summary][above]] is sufficient to read/write "normal" vnlog data, but there are a few corner cases that should be mentioned. To reiterate, the format description from above describes vnlog as: - A whitespace-separated table of ASCII human-readable text - A =#= character starts a comment that runs to the end of the line (like in many scripting languages) - The first line that begins with a single =#= (not =##= or =#!=) is a /legend/, naming each column. This is required, and the field names that appear here are referenced by all the tools. - Empty fields reported as =-= For a few years now I've been using these tools myself, and supporting others as they were passing vnlog data around. In the process I've encountered some slightly-weird data, and patched the tools to accept it. So today the included vnlog tools are /very/ permissive, and accept any vnlog data that can possibly be accepted. Other vnlog tools may not be quite as permissive, and may not be able to interpret "weird" data. Points of note, describing the included vnlog tools: - Leading and trailing whitespace is ignored. Everywhere. So this data file will be read properly, with the =x= column containing 1 and 3: #+begin_example # x y 1 2 3 4 #+end_example - Empty (or whitespace-only) lines anywhere are ignored, and treated as a comment - An initial =#= comment without field names is treated as a comment, and we continue looking for the legend in the following lines. So this data file will be read properly: #+begin_example ## comment # # x y 1 2 3 4 #+end_example - Trailing comments are supported, like in most scripting languages. So this data file will be read properly: #+begin_example # x y 1 2 # comment 3 4 #+end_example - Field names are /very/ permissive: anything that isn't whitespace is supported. So this data file will be read properly: #+begin_example # x y # 1+ - 1 2 3 4 5 11 12 13 14 15 #+end_example We can pull out the =#= and =1+= and =-= columns: #+begin_src sh vnl-filter -p '#,1+,-' #+end_src And we can even operate on them, if we use whitespace to indicate field boundaries: #+begin_src sh vnl-filter -p 'x=1+ + 5' #+end_src Note that this implies that trailing comments in a legend line are /not/ supported: the extra =#= characters will be used for field names. Field names containing =,= or === are currently not accepted by =vnl-filter=, but /are/ accepted by the other tools (=vnl-sort= and such). I'll make =vnl-filter= able to work with those field names too, eventually, but as a user, the simplest thing to do is to not pass around data with such field names. - Duplicated labels are supported whenever possible. So #+begin_example # x y z z 1 2 3 4 11 12 13 14 #+end_example will work just fine, unless we're operating on =z=. With this data, both of these commands work: #+begin_src sh vnl-filter -p x vnl-filter -p z #+end_src Picking =z= selects both of the =z= columns. But neither of these commands can work with the non-unique =z= column: #+begin_src sh vnl-filter -p s=z+1 vnl-sort -k z #+end_src * Workflows and recipes ** Storing disjoint data A common use case is a complex application that produces several semi-related subsets of data at once. Example: a moving vehicle is reporting both its own position and the observed positions of other vehicles; at any given time any number of other vehicles may be observed. 
Two equivalent workflows are possible: - a single unified vnlog stream for /all/ the data - several discrete vnlog streams for each data subset Both are valid approaches *** One unified vnlog stream Here the application produces a /single/ vnlog that contains /all/ the columns, from /all/ the data subsets. In any given row, many of the columns will be empty (i.e. contain only =-= ). For instance, a row describing a vehicle own position will not have data about any observations, and vice versa. It is inefficient to store all the extra =-= but it makes many things much nicer, so it's often worth it. =vnl-filter= can be used to pull out the different subsets. Sample =joint.vnl=: #+BEGIN_EXAMPLE # time x_self x_observation 1 10 - 2 20 - 2 - 100 3 30 - 3 - 200 3 - 300 #+END_EXAMPLE Here we have 3 instances in time. We have no observations at =time= 1, one observation at =time= 2, and two observations at =time= 3. We can use =vnl-filter= to pull out the data we want: #+BEGIN_EXAMPLE $ < joint.vnl vnl-filter -p time,self # time x_self 1 10 2 20 2 - 3 30 3 - 3 - #+END_EXAMPLE If we only care about our own positions, the =+= modifier in picked columns in =vnl-filter= is very useful here: #+BEGIN_EXAMPLE $ < joint.vnl vnl-filter -p time,+self # time x_self 1 10 2 20 3 30 $ < joint.vnl vnl-filter -p time,+observation # time x_observation 2 100 3 200 3 300 #+END_EXAMPLE Note that the default is =--skipempty=, so if we're /only/ looking at =x_self= for instance, then we don't even need to =+= modifier: #+begin_example $ < joint.vnl vnl-filter -p self # x_self 10 20 30 #+end_example Also, note that the =vnlog= C interface works very nicely to produce these datafiles: - You can define lots and lots of columns, but only fill some of them before calling =vnlog_emit_record()=. The rest will be set to =-=. - You can create multiple contexts for each type of data, and you can populate them with data independently. And when calling =vnlog_emit_record_ctx()=, you'll get a record with data for just that context. *** Several discrete vnlog streams Conversely, the application can produce /separate/ vnlog streams for /each/ subset of data. Depending on what is desired, exactly, =vnl-join= can be used to re-join them: #+BEGIN_EXAMPLE $ cat self.vnl # time x_self 1 10 2 20 3 30 $ cat observations.vnl # time x_observation 2 100 3 200 3 300 $ vnl-join -j time -a- self.vnl observations.vnl # time x_self x_observation 1 10 - 2 20 100 3 30 200 3 30 300 #+END_EXAMPLE ** Data statistics A common need is to compute basic statistics from your data. Many of the alternative toolkits listed above provide built-in facilities to do this, but vnlog does not: it's meant to be unixy, where each tool has very limited scope. Thus you can either do this with =awk= like you would normally, or you can use other standalone tools to perform the needed computations. 
For instance, I can generate some data: #+BEGIN_EXAMPLE $ seq 2 100 | awk 'BEGIN {print "# x"} {print log($1)}' > /tmp/log.vnl #+END_EXAMPLE Then I can compute the mean with =awk=: #+BEGIN_EXAMPLE $ < /tmp/log.vnl vnl-filter --eval '{sum += x} END {print sum/NR}' 3.67414 #+END_EXAMPLE Or I can compute the mean (and other stuff) with a separate standalone tool: #+BEGIN_EXAMPLE $ < /tmp/log.vnl ministat x +----------------------------------------------------------------------------+ | xx | | x xxxxxxx | | xx xxxxxxxxxxxx| | x x xxxxxxxxxxxxxxxxxxxxxxx| |x x x x x x x x x xx xx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx| | |_______________A____M___________| | +----------------------------------------------------------------------------+ N Min Max Median Avg Stddev x 99 0.693147 4.60517 3.93183 3.6741353 0.85656382 #+END_EXAMPLE =ministat= is not a part of the vnlog toolkit, but the vnlog format is generic so it works just fine. ** Powershell-style filtering of common shell commands Everything about vnlog is generic and simple, so it's easy to use it to process data that wasn't originally meant to be used this way. For instance filtering the output of =ls -l= to report only file names and sizes, skipping directories, and sorting by file sizes: #+BEGIN_EXAMPLE $ ls -l total 320 -rw-r--r-- 1 dima dima 5044 Aug 25 15:04 Changes -rw-r--r-- 1 dima dima 12749 Aug 25 15:04 Makefile -rw-r--r-- 1 dima dima 69789 Aug 25 15:04 README.org -rw-r--r-- 1 dima dima 33781 Aug 25 15:04 README.template.org -rw-r--r-- 1 dima dima 5359 Aug 25 15:04 b64_cencode.c drwxr-xr-x 4 dima dima 4096 Aug 25 15:04 completions drwxr-xr-x 3 dima dima 4096 Aug 25 15:04 lib drwxr-xr-x 3 dima dima 4096 Aug 25 15:04 packaging drwxr-xr-x 2 dima dima 4096 Aug 25 15:04 test -rwxr-xr-x 1 dima dima 5008 Aug 25 15:04 vnl-align -rwxr-xr-x 1 dima dima 56637 Aug 25 15:04 vnl-filter -rwxr-xr-x 1 dima dima 5678 Aug 25 15:04 vnl-gen-header -rwxr-xr-x 1 dima dima 29815 Aug 25 15:04 vnl-join -rwxr-xr-x 1 dima dima 3631 Aug 25 15:04 vnl-make-matrix -rwxr-xr-x 1 dima dima 8372 Aug 25 15:04 vnl-sort -rwxr-xr-x 1 dima dima 5822 Aug 25 15:04 vnl-tail -rwxr-xr-x 1 dima dima 4439 Aug 25 15:04 vnl-ts -rw-r--r-- 1 dima dima 559 Aug 25 15:04 vnlog-base64.h -rw-r--r-- 1 dima dima 8169 Aug 25 15:04 vnlog.c -rw-r--r-- 1 dima dima 12677 Aug 25 15:04 vnlog.h $ (echo '# permissions num_links user group size month day time name'; ls -l | tail -n +2) | vnl-filter 'permissions !~ "^d"' -p name,size | vnl-sort -gk size | vnl-align # name size vnlog-base64.h 559 vnl-make-matrix 3631 vnl-ts 4439 vnl-align 5008 Changes 5044 b64_cencode.c 5359 vnl-gen-header 5678 vnl-tail 5822 vnlog.c 8169 vnl-sort 8372 vnlog.h 12677 Makefile 12749 vnl-join 29815 README.template.org 33781 vnl-filter 56637 README.org 69789 #+END_EXAMPLE With a bit of shell manipulation, these tools can be applied to a whole lot of different data streams that know nothing of vnlog. * C interface ** Writing vnlog files *** Basic usage For most uses, vnlog files are simple enough to be generated with plain prints. But then each print statement has to know which numeric column we're populating, which becomes effortful with many columns. In my usage it's common to have a large parallelized C program that's writing logs with hundreds of columns where any one record would contain only a subset of the columns. In such a case, it's helpful to have a library that can output the log files. This is available. 
Basic usage looks like this: In a shell: #+BEGIN_SRC sh :results none :exports code vnl-gen-header 'int w' 'uint8_t x' 'char* y' 'double z' 'void* binary' > vnlog_fields_generated.h #+END_SRC #+RESULTS: In a C program test.c: #+BEGIN_SRC C #include "vnlog_fields_generated.h" int main() { vnlog_emit_legend(); vnlog_set_field_value__w(-10); vnlog_set_field_value__x(40); vnlog_set_field_value__y("asdf"); vnlog_emit_record(); vnlog_set_field_value__z(0.3); vnlog_set_field_value__x(50); vnlog_set_field_value__w(-20); vnlog_set_field_value__binary("\x01\x02\x03", 3); vnlog_emit_record(); vnlog_set_field_value__w(-30); vnlog_set_field_value__x(10); vnlog_set_field_value__y("whoa"); vnlog_set_field_value__z(0.5); vnlog_emit_record(); return 0; } #+END_SRC Then we build and run, and we get #+BEGIN_EXAMPLE $ cc -o test test.c -lvnlog $ ./test # w x y z binary -10 40 asdf - - -20 50 - 0.2999999999999999889 AQID -30 10 whoa 0.5 - #+END_EXAMPLE The binary field in base64-encoded. This is a rarely-used feature, but sometimes you really need to log binary data for later processing, and this makes it possible. So you 1. Generate the header to define your columns 2. Call =vnlog_emit_legend()= 3. Call =vnlog_set_field_value__...()= for each field you want to set in that row. 4. Call =vnlog_emit_record()= to write the row and to reset all fields for the next row. Any fields unset with a =vnlog_set_field_value__...()= call are written as null: =-= This is enough for 99% of the use cases. Things get a bit more complex if we have have threading or if we have multiple vnlog ouput streams in the same program. For both of these we use vnlog /contexts/. *** Contexts To support independent writing into the same vnlog (possibly by multiple threads; this is reentrant), each log-writer should create a context, and use it when talking to vnlog. The context functions will make sure that the fields in each context are independent and that the output records won't clobber each other: #+BEGIN_SRC C void child_writer( // the parent context also writes to this vnlog. Pass NULL to // use the global one struct vnlog_context_t* ctx_parent ) { struct vnlog_context_t ctx; vnlog_init_child_ctx(&ctx, ctx_parent); while(records) { vnlog_set_field_value_ctx__xxx(&ctx, ...); vnlog_set_field_value_ctx__yyy(&ctx, ...); vnlog_set_field_value_ctx__zzz(&ctx, ...); vnlog_emit_record_ctx(&ctx); } vnlog_free_ctx(&ctx); // required only if we have any binary fields } #+END_SRC If we want to have multiple independent vnlog writers to /different/ streams (with different columns and legends), we do this instead: =file1.c=: #+BEGIN_SRC C #include "vnlog_fields_generated1.h" void f(void) { // Write some data out to the default context and default output (STDOUT) vnlog_emit_legend(); ... vnlog_set_field_value__xxx(...); vnlog_set_field_value__yyy(...); ... vnlog_emit_record(); } #+END_SRC =file2.c=: #+BEGIN_SRC C #include "vnlog_fields_generated2.h" void g(void) { // Make a new session context, send output to a different file, write // out legend, and send out the data struct vnlog_context_t ctx; vnlog_init_session_ctx(&ctx); FILE* fp = fopen(...); vnlog_set_output_FILE(&ctx, fp); vnlog_emit_legend_ctx(&ctx); ... vnlog_set_field_value__a(...); vnlog_set_field_value__b(...); ... vnlog_free_ctx(&ctx); // required only if we have any binary fields vnlog_emit_record(); } #+END_SRC Note that it's the user's responsibility to make sure the new sessions go to a different =FILE= by invoking =vnlog_set_output_FILE()=. 
Furthermore, note that the included =vnlog_fields_....h= file defines the fields we're writing to; and if we have multiple different vnlog field definitions in the same program (as in this example), then the different writers /must/ live in different source files. The compiler will barf if you try to =#include= two different =vnlog_fields_....h= files in the same source. *** Remaining APIs - =vnlog_printf(...)= and =vnlog_printf_ctx(ctx, ...)= write to a pipe like =printf()= does. This exists primarily for comments. - =vnlog_clear_fields_ctx(ctx, do_free_binary)= clears out the data in a context and makes it ready to be used for the next record. It is rare for the user to have to call this manually. The most common case is handled automatically (clearing out a context after emitting a record). One area where this is useful is when making a copy of a context: #+BEGIN_SRC C struct vnlog_context_t ctx1; // .... do stuff with ctx1 ... add data to it ... struct vnlog_context_t ctx2 = ctx1; // ctx1 and ctx2 now both have the same data, and the same pointers to // binary data. I need to get rid of the pointer references in ctx1 vnlog_clear_fields_ctx(&ctx1, false); #+END_SRC - =vnlog_free_ctx(ctx)= frees memory for an vnlog context. Do this before throwing the context away. Currently this is only needed for context that have binary fields, but this should be called for all contexts anyway, in case this changes in a later revision ** Reading vnlog files The basic usage goes like this: #+begin_src c #include #include #include bool parse_vnlog(const char* filename) { FILE* fp = fopen(filename); if(fp == NULL) return false; vnlog_parser_t ctx; if(VNL_OK != vnlog_parser_init(&ctx, fp)) return false; // String in the "time" column for the most-recently-parsed row const char*const* time_record = vnlog_parser_record_from_key(&ctx, "time"); if(time_record == NULL) { vnlog_parser_free(&ctx); return false; } int i_record = 0; vnlog_parser_result_t result; while(VNL_OK == (result = vnlog_parser_read_record(&ctx, fp))) { for(int i=0; i array(['image1.png', 'image2.png'], dtype=' array([[1, 2, 5], [3, 4, 1]]) print(arr['temperature']) ---> array([34., 35.]) #+end_example Notes: - The given structured dtype defines both how to organize the data, and which data to extract. So it can be used to read in only a subset of the available columns. Here I could have omitted the 'temperature' column, for instance - Sub-arrays are allowed. In the example I could say either #+begin_src python dtype = np.dtype([ ('image', 'U16'), ('x y z', int, (3,)), ('temperature', float), ]) #+end_src or #+begin_src python dtype = np.dtype([ ('image', 'U16'), ('x', int), ('y', int), ('z', int), ('temperature', float), ]) #+end_src The latter would read =x=, =y=, =z= into separate, individual arrays. Sometime we want this, sometimes not. - Nested structured dtypes are not allowed. Fields inside other fields are not supported, since it's not clear how to map that to a flat vnlog legend - If a structured dtype is given, =slurp()= returns the array only, since the field names are already available in the dtype * numpy interface If we need to read data into numpy specifically, nicer tools are available than the generic =vnlog= Python module. The built-in =numpy.loadtxt= =numpy.savetxt= functions work well (with the caveat that =numpy.loadtxt()= should be followed by =numpysane.atleast_dims(..., -2)= to make sure that a data array of shape =(Nrows,Ncols)= is returned even if =Nrows==1=. 
For example to write to standard output a vnlog with fields =a=, =b= and =c=: #+BEGIN_SRC python numpy.savetxt(sys.stdout, array, fmt="%g", header="a b c") #+END_SRC Note that numpy automatically adds the =#= to the header. To read a vnlog from a file on disk, do something like #+BEGIN_SRC python array = numpysane.atleast_dims(numpy.loadtxt('data.vnl'), -2) #+END_SRC These functions know that =#= lines are comments, but don't interpret anything as field headers. That's easy to do, so I'm not providing any helper libraries. I might do that at some point, but in the meantime, patches are welcome. * Compatibility I use GNU/Linux-based systems exclusively, but everything has been tested functional on FreeBSD and OSX in addition to Debian, Ubuntu and CentOS. I can imagine there's something I missed when testing on non-Linux systems, so please let me know if you find any issues. * Caveats and bugs These tools are meant to be simple, so some things are hard requirements. A big one is that columns are whitespace-separated. There is /no/ mechanism for escaping or quoting whitespace into a single field. I think supporting something like that is more trouble than it's worth. * Manpages ** vnl-filter #+BEGIN_EXAMPLE xxx-manpage-vnl-filter-xxx #+END_EXAMPLE ** vnl-align #+BEGIN_EXAMPLE xxx-manpage-vnl-align-xxx #+END_EXAMPLE ** vnl-sort #+BEGIN_EXAMPLE xxx-manpage-vnl-sort-xxx #+END_EXAMPLE ** vnl-join #+BEGIN_EXAMPLE xxx-manpage-vnl-join-xxx #+END_EXAMPLE ** vnl-tail #+BEGIN_EXAMPLE xxx-manpage-vnl-tail-xxx #+END_EXAMPLE ** vnl-ts #+BEGIN_EXAMPLE xxx-manpage-vnl-ts-xxx #+END_EXAMPLE ** vnl-uniq #+BEGIN_EXAMPLE xxx-manpage-vnl-uniq-xxx #+END_EXAMPLE ** vnl-gen-header #+BEGIN_EXAMPLE xxx-manpage-vnl-gen-header-xxx #+END_EXAMPLE ** vnl-make-matrix #+BEGIN_EXAMPLE xxx-manpage-vnl-make-matrix-xxx #+END_EXAMPLE * Repository https://github.com/dkogan/vnlog/ * Authors Dima Kogan (=dima@secretsauce.net=) wrote this toolkit for his work at the Jet Propulsion Laboratory, and is delighted to have been able to release it publically Chris Venter (=chris.venter@gmail.com=) wrote the base64 encoder * License and copyright This library is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2.1 of the License, or (at your option) any later version. Copyright 2016-2017 California Institute of Technology Copyright 2017-2018 Dima Kogan (=dima@secretsauce.net=) =b64_cencode.c= comes from =cencode.c= in the =libb64= project. It is written by Chris Venter (=chris.venter@gmail.com=) who placed it in the public domain. The full text of the license is in that file. vnlog-1.40/b64_cencode.c000066400000000000000000000123571475400722300150160ustar00rootroot00000000000000/* This is cencode.c from libb64. It is written by Chris Venter chris.venter@gmail.com http://rocketpod.blogspot.com who placed it in the public domain. The full text of the license and the source is below. I'm copying this source instead of linking to the library because 1. the library hard-codes the line length (CHARS_PER_LINE), and I need the whole string to appear in a single line 2. 
I want to make sure that I don't overrun my buffer, so base64_encode_block() takes in the buffer length */ /* Copyright-Only Dedication (based on United States law) or Public Domain Certification The person or persons who have associated work with this document (the "Dedicator" or "Certifier") hereby either (a) certifies that, to the best of his knowledge, the work of authorship identified is in the public domain of the country from which the work is published, or (b) hereby dedicates whatever copyright the dedicators holds in the work of authorship identified below (the "Work") to the public domain. A certifier, moreover, dedicates any copyright interest he may have in the associated work, and for these purposes, is described as a "dedicator" below. A certifier has taken reasonable steps to verify the copyright status of this work. Certifier recognizes that his good faith efforts may not shield him from liability if in fact the work certified is not in the public domain. Dedicator makes this dedication for the benefit of the public at large and to the detriment of the Dedicator's heirs and successors. Dedicator intends this dedication to be an overt act of relinquishment in perpetuity of all present and future rights under copyright law, whether vested or contingent, in the Work. Dedicator understands that such relinquishment of all rights includes the relinquishment of all rights to enforce (by lawsuit or otherwise) those copyrights in the Work. Dedicator recognizes that, once placed in the public domain, the Work may be freely reproduced, distributed, transmitted, used, modified, built upon, or otherwise exploited by anyone for any purpose, commercial or non-commercial, and in any way, including by methods that have not yet been invented or conceived. */ /* cencoder.c - c source to a base64 encoding algorithm implementation This is part of the libb64 project, and has been placed in the public domain. 
For details, see http://sourceforge.net/projects/libb64 */ #include "vnlog-base64.h" typedef enum { step_A, step_B, step_C } base64_encodestep; typedef struct { base64_encodestep step; char result; } base64_encodestate; static void base64_init_encodestate(base64_encodestate* state_in) { state_in->step = step_A; state_in->result = 0; } static char base64_encode_value(char value_in) { static const char* encoding = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"; if (value_in > 63) return '='; return encoding[(int)value_in]; } static int base64_encode_block(const char* plaintext_in, int length_in, char* code_out, base64_encodestate* state_in) { const char* plainchar = plaintext_in; const char* const plaintextend = plaintext_in + length_in; char* codechar = code_out; char result; char fragment; result = state_in->result; switch (state_in->step) { while (1) { case step_A: if (plainchar == plaintextend) { state_in->result = result; state_in->step = step_A; return codechar - code_out; } fragment = *plainchar++; result = (fragment & 0x0fc) >> 2; *codechar++ = base64_encode_value(result); result = (fragment & 0x003) << 4; case step_B: if (plainchar == plaintextend) { state_in->result = result; state_in->step = step_B; return codechar - code_out; } fragment = *plainchar++; result |= (fragment & 0x0f0) >> 4; *codechar++ = base64_encode_value(result); result = (fragment & 0x00f) << 2; case step_C: if (plainchar == plaintextend) { state_in->result = result; state_in->step = step_C; return codechar - code_out; } fragment = *plainchar++; result |= (fragment & 0x0c0) >> 6; *codechar++ = base64_encode_value(result); result = (fragment & 0x03f) >> 0; *codechar++ = base64_encode_value(result); } } /* control should not reach here */ return codechar - code_out; } static int base64_encode_blockend(char* code_out, base64_encodestate* state_in) { char* codechar = code_out; switch (state_in->step) { case step_B: *codechar++ = base64_encode_value(state_in->result); *codechar++ = '='; *codechar++ = '='; break; case step_C: *codechar++ = base64_encode_value(state_in->result); *codechar++ = '='; break; case step_A: break; } return codechar - code_out; } int vnlog_base64_encode( char* dst, int dstlen, const char* src, int srclen ) { if(srclen <= 0) { if(dstlen > 0) *dst = '\0'; return 0; } base64_encodestate s; base64_init_encodestate(&s); const int dstlen_needed = vnlog_base64_dstlen_to_encode(srclen); if(dstlen < dstlen_needed) return -1; int len = base64_encode_block(src, srclen, dst, &s); len += base64_encode_blockend(&dst[len], &s); dst[len] = '\0'; // The number of bytes in the output (not including the trailing '\0') is // returned return len; } vnlog-1.40/choose_mrbuild.mk000066400000000000000000000014651475400722300161240ustar00rootroot00000000000000# Use the local mrbuild or the system mrbuild or tell the user how to download # it ifneq (,$(wildcard mrbuild/)) MRBUILD_MK=mrbuild MRBUILD_BIN=mrbuild/bin else ifneq (,$(wildcard /usr/include/mrbuild/Makefile.common.header)) MRBUILD_MK=/usr/include/mrbuild MRBUILD_BIN=/usr/bin else V := 1.13 SHA512 := 7a1422026cdbe12cea6882c3b76087dcc4c1d258369ec2abb941a779a539253a37850563f801049d570e8ad342722030fd918114388b5155a89644491a221f16 URL := https://github.com/dkogan/mrbuild/archive/refs/tags/v$V.tar.gz TARGZ := mrbuild-$V.tar.gz cmd := wget -O $(TARGZ) ${URL} && sha512sum --quiet --strict -c <(echo $(SHA512) $(TARGZ)) && tar xvfz $(TARGZ) && ln -fs mrbuild-$V mrbuild $(error mrbuild not found. 
Either 'apt install mrbuild', or if not possible, get it locally like this: '${cmd}') endif vnlog-1.40/completions/000077500000000000000000000000001475400722300151235ustar00rootroot00000000000000vnlog-1.40/completions/bash/000077500000000000000000000000001475400722300160405ustar00rootroot00000000000000vnlog-1.40/completions/bash/vnl-filter000066400000000000000000000006271475400722300200520ustar00rootroot00000000000000complete -W ' \ --has \ --pick --print -p \ --list-columns -l \ --eval \ --begin --end \ --function \ --sub \ --function-abs \ --sub-abs \ --noskipempty \ --skipcomments \ --dumpexprs \ --perl \ --unbuffered \ --stream \ --help' vnl-filter vnlog-1.40/completions/bash/vnl-join000066400000000000000000000010671475400722300175230ustar00rootroot00000000000000complete -W ' \ -a \ -j \ -o \ -v \ -i --ignore-case \ --help \ --check-order --nocheck-order \ --vnl-prefix1 \ --vnl-prefix2 \ --vnl-suffix1 \ --vnl-suffix2 \ --vnl-prefix \ --vnl-suffix \ --vnl-autoprefix \ --vnl-autosuffix \ --vnl-sort' vnl-join vnlog-1.40/completions/bash/vnl-sort000066400000000000000000000017721475400722300175560ustar00rootroot00000000000000complete -W ' \ -d \ --dictionary-order \ -g \ --general-numeric-sort \ -M \ --month-sort \ -h \ --human-numeric-sort \ -n \ --numeric-sort \ --sort \ -V \ --version-sort \ --help \ -c \ -m \ --merge \ -T \ --temporary-directory \ -u --unique \ -f --ignore-case \ -r --reverse \ -k --key \ --check \ --random-source \ --batch-size \ --compress-program \ --debug \ -S \ --buffer-size \ --help \ --parallel \ --radixsort \ --mergesort \ --qsort \ --heapsort \ --mmap' vnl-sort vnlog-1.40/completions/bash/vnl-tail000066400000000000000000000005161475400722300175130ustar00rootroot00000000000000complete -W ' \ -n \ --lines \ --follow \ -f \ -F \ --max-unchanged-stats \ -s \ --sleep-interval \ --pid \ --retry \ --help \ --version' vnl-tail vnlog-1.40/completions/bash/vnl-ts000066400000000000000000000002361475400722300172070ustar00rootroot00000000000000complete -W ' \ -m \ -i \ -s \ --vnl-field' vnl-ts vnlog-1.40/completions/bash/vnl-uniq000066400000000000000000000007461475400722300175430ustar00rootroot00000000000000complete -W ' \ -c \ --count \ -d \ --repeated \ -D \ --all-repeated \ -f \ --skip-fields \ --group \ -i \ --ignore-case \ -s \ --skip-chars \ -u \ --unique \ -z \ --zero-terminated \ -w \ --check-chars \ --help \ --vnl-count' vnl-uniq vnlog-1.40/completions/zsh/000077500000000000000000000000001475400722300157275ustar00rootroot00000000000000vnlog-1.40/completions/zsh/_vnl-filter000066400000000000000000000032061475400722300200740ustar00rootroot00000000000000#compdef vnl-filter _arguments -S \ '*--has[list of fields that must be non-null to be selected for output]:has-list:' \ '(--eval)*'{--pick,--print,-p}'[list of fields and field expressions to be output]:pick-list:' \ {--list-columns,-l}'[list available comments]' \ '(--pick --print -p)--eval[instead of filtering, evaluate this perl/awk script]:eval-script:' \ '(--pick --print -p)--begin[evaluate expression in a BEGIN block]:begin-script:' \ '(--pick --print -p)--end[evaluate expression in an END block]:end-script:' \ '*'{--function,--sub}'[define a function to be available in the expressions]:function-expression:' \ {--function-abs,--sub-abs}'[define abs() function for awk expressions]' \ '(--eval)--noskipempty[DO output records where every field is null]' \ '(--eval)--skipcomments[Do NOT output non-legend comments]' \ '--dumpexprs[Report the expressions we would use for processing, and exit]' \ '--perl[Use perl for 
all the expressions instead of awk]' \ '--stream[Flush the output pipe with every record]' \ '--help' \ '*: :_guard "^-*" "Expression that must evaluate to true for a record to be selected for output"' vnlog-1.40/completions/zsh/_vnl-join000066400000000000000000000054441475400722300175540ustar00rootroot00000000000000#compdef vnl-join # This is mostly from the zsh _join file. It is copyright 1992-2014 The Zsh # Development Group. Distributed under the zsh license: # # Permission is hereby granted, without written agreement and without # licence or royalty fees, to use, copy, modify, and distribute this # software and to distribute modified versions of this software for any # purpose, provided that the above copyright notice and the following # two paragraphs appear in all copies of this software. # . # In no event shall the copy right owners liable to any party for # direct, indirect, special, incidental, or consequential damages # arising out of the use of this software and its documentation, even # if and the copyright owners have been advised of the possibility of # such damage. # . # The copyright owners specifically disclaim any warranties, including, # but not limited to, the implied warranties of merchantability and # fitness for a particular purpose. The software provided hereunder is # on an "as is" basis, and the copyright owners have no obligation to # provide maintenance, support, updates, enhancements, or # modifications. _arguments -S \ '*-a+[print unpairable lines from specified file]:file number:(1 2 -)' \ "(-1 -2)-j+[join on specified field for both files]:key" \ '-o+[use specified output format]:format string' \ '*-v+[like -a, but suppress joined output lines]:file number:(1 2 -)' \ '(-i --ignore-case)'{-i,--ignore-case}'[ignore differences in case when comparing fields]' \ '(-)--help[display help information]' \ '(--check-order --nocheck-order)'{--check-order,--nocheck-order} \ '--vnl-prefix1[prefix to add to output field labels from the first datafile]:prefix1:' \ '--vnl-prefix2[prefix to add to output field labels from the second datafile]:prefix1:' \ '--vnl-suffix1[suffix to add to output field labels from the first datafile]:prefix1:' \ '--vnl-suffix2[suffix to add to output field labels from the second datafile]:prefix1:' \ '--vnl-prefix[prefix to add to output field labels; comma-separated list for all datafiles]:prefix:' \ '--vnl-suffix[suffix to add to output field labels; comma-separated list for all datafiles]:suffix:' \ '--vnl-autoprefix[automatically determine the prefix to add to output field labels for all datafiles]' \ '--vnl-autosuffix[automatically determine the suffix to add to output field labels for all datafiles]' \ '--vnl-sort[Presort the input and maybe post-sort the output]:ordering; should match -|[dfgiMhnRrV]+:' \ '1:file:_files' '2:file:_files' vnlog-1.40/completions/zsh/_vnl-sort000066400000000000000000000076611475400722300176070ustar00rootroot00000000000000#compdef vnl-sort # This is mostly from the zsh _sort file. It is copyright 1992-2014 The Zsh # Development Group. Distributed under the zsh license: # # Permission is hereby granted, without written agreement and without # licence or royalty fees, to use, copy, modify, and distribute this # software and to distribute modified versions of this software for any # purpose, provided that the above copyright notice and the following # two paragraphs appear in all copies of this software. # . 
# In no event shall the copy right owners liable to any party for # direct, indirect, special, incidental, or consequential damages # arising out of the use of this software and its documentation, even # if and the copyright owners have been advised of the possibility of # such damage. # . # The copyright owners specifically disclaim any warranties, including, # but not limited to, the implied warranties of merchantability and # fitness for a particular purpose. The software provided hereunder is # on an "as is" basis, and the copyright owners have no obligation to # provide maintenance, support, updates, enhancements, or # modifications. local args variant local ordering='(-d --dictionary-order -g --general-numeric-sort -M --month-sort -h --human-numeric-sort -n --numeric-sort --sort -V --version-sort --help)' args=( "(-c --check -C)-c[check whether input is sorted; don't sort]" '(-m --merge)'{-m,--merge}"[merge already sorted files; don't sort]" \*{-T+,--temporary-directory=}'[specify directory for temporary files]:directory:_directories' '(-u --unique)'{-u,--unique}'[with -c, check for strict ordering; without -c, output only the first of an equal run]' "$ordering"{-d,--dictionary-order}'[consider only blanks and alphanumeric characters]' '(-f --ignore-case)'{-f,--ignore-case}'[fold lower case to upper case characters]' "$ordering"{-n,--numeric-sort}'[compare according to string numerical value]' '(-r --reverse)'{-r,--reverse}'[reverse the result of comparisons]' '(-k --key)'{-k+,--key=}'[the field to sort on]:key field' ) _pick_variant -c sort -r variant gnu=GNU $OSTYPE --version case $variant in dragonfly*|netbsd*|openbsd*|freebsd*|gnu) args+=( '(-s --stable)'{-s,--stable}'[preserve original order of lines with the same key]' ) ;| openbsd*|freebsd*|gnu|solaris2.<11->) args+=( "(-c --check -C)-C[check whether input is sorted silently; don't sort]" ) ;| freebsd*|gnu) args+=( "(-c --check -C)--check=-[check whether input is sorted; don't sort]::bad line handling:(diagnose-first silent quiet)" "$ordering"{-g,--general-numeric-sort}'[compare according to general numeric value]' "$ordering"{-M,--month-sort}"[compare (unknown) < 'JAN' < ... < 'DEC']" "$ordering"{-h,--human-numeric-sort}'[compare human readable numbers (e.g., 2K 1G)]' "$ordering"{-R,--random-sort}'[sort by random hash of keys]' "$ordering"{-V,--version-sort}'[sort version numbers]' "$ordering--sort=[sort according to ordering]:ordering:(general-numeric human-numeric month numeric random version)" '--random-source=[get random bytes from file]:file:_files' '--batch-size=[maximum inputs to merge]:number' '--compress-program=[specify program to compress temporary files with]:program:(gzip bzip2 lzop xz)' '--debug[annotate the of the line used to sort]' '(-S --buffer-size)'{-S+,--buffer-size=}'[specify size for main memory buffer]:size' '(- *)--help[display help and exit]' ) ;| netbsd*|dragonfly*) args+=( "${ordering}-l[sort by string length of field]" "(-s)-S[don't use stable sort]" ) ;| openbsd*) args+=( '-H[use a merge sort instead of a radix sort]' ) ;| gnu) args+=( '--parallel=[set number of sorts run concurrently]:number' ) ;; freebsd*) args+=( --radixsort --mergesort --qsort --heapsort --mmap ) ;; *) args=( "${(@)args:#(|\(*\))(|\*)--*}" ) ;; esac _arguments -s -S $args '*:file:_files' vnlog-1.40/completions/zsh/_vnl-tail000066400000000000000000000061231475400722300175410ustar00rootroot00000000000000#compdef vnl-tail # This is mostly from the zsh _tail file. It is copyright 1992-2014 The Zsh # Development Group. 
Distributed under the zsh license: # # Permission is hereby granted, without written agreement and without # licence or royalty fees, to use, copy, modify, and distribute this # software and to distribute modified versions of this software for any # purpose, provided that the above copyright notice and the following # two paragraphs appear in all copies of this software. # . # In no event shall the copy right owners liable to any party for # direct, indirect, special, incidental, or consequential damages # arising out of the use of this software and its documentation, even # if and the copyright owners have been advised of the possibility of # such damage. # . # The copyright owners specifically disclaim any warranties, including, # but not limited to, the implied warranties of merchantability and # fitness for a particular purpose. The software provided hereunder is # on an "as is" basis, and the copyright owners have no obligation to # provide maintenance, support, updates, enhancements, or # modifications. #compdef tail local curcontext=$curcontext state state_descr line opts args ret=1 typeset -A opt_args if _pick_variant -c tail gnu=GNU unix --version; then args=( '(-n --lines)'{-n+,--lines=}'[print the last specified lines; with +, start at the specified line]:number of lines:->number' '(-F -f)--follow=-[output appended data as the file grows]::how:(name descriptor)' '(-F --follow)-f[same as --follow=descriptor]' '(-f --follow --retry)-F[same as --follow=name --retry]' '--max-unchanged-stats=[with --follow=name, check file rename after the specified number of iterations]:number of iterations' '(-s --sleep-interval)'{-s+,--sleep-interval=}'[with -f, sleep the specfied seconds between iterations]:seconds' '--pid=[with -f, terminate after the specified process dies]:pid:_pids' '--retry[keep trying to open a file even when it becomes inaccessible]' '(- *)--help[display help and exit]' '(- *)--version[output version information and exit]' ) else opts=(-A '-*') args=( '(-b -c)-n+[start at the specified line]:lines relative to the end (with +, beginning) of file' '(-F -r)-f[wait for new data to be appended to the file]' ) case $OSTYPE in (freebsd*|darwin*|dragonfly*|netbsd*|openbsd*|solaris*) args+=( '(-f -F)-r[display the file in reverse order]' ) ;| (freebsd*|darwin*|dragonfly*|netbsd*) args+=( '(-f -r)-F[implies -f, but also detect file rename]' ) ;| esac fi _arguments -C -s -S $opts : $args '*:file:_files' && return case $state in (number) local mlt sign digit sign='signs:sign:((+\:"start at the specified line"' sign+=' -\:"output the last specified lines (default)"))' digit='digits:digit:(0 1 2 3 4 5 6 7 8 9)' if compset -P '(-|+|)[0-9]##'; then _alternative $mlt $digit && ret=0 elif [[ -z $PREFIX ]]; then _alternative $sign $digit && ret=0 elif compset -P '(+|-)'; then _alternative $digit && ret=0 fi ;; esac return ret vnlog-1.40/completions/zsh/_vnl-ts000066400000000000000000000016111475400722300172330ustar00rootroot00000000000000#compdef vnl-ts # This uses the generic zsh _date_formats to complete the date formats. It does # NOT support the "%.s" form that vnl-ts and ts support. function _guarded_date_formats () { # This is a combination of _guard and _date_formats. 
If I use _date_formats by # itself below, it'll complete date formats from "-", which I don't want it to local garbage zparseopts -K -D -a garbage M: J: V: 1 2 n F: X: [[ "$PREFIX$SUFFIX" != $~1 ]] && return 1 shift _date_formats "$*" [[ -n "$PREFIX$SUFFIX" ]] } _arguments -S \ '(-s)-i[Report time since last record]' \ '(-i)-s[Report time from the receipt of the legend]' \ '-m[Use the monotonic system clock]' \ '--vnl-field[Name for the new timestamp field]:field-name:' \ ':strftime-like format for the timestamp:_guarded_date_formats "^-*"' vnlog-1.40/completions/zsh/_vnl-uniq000066400000000000000000000045671475400722300175760ustar00rootroot00000000000000#compdef vnl-uniq # This is mostly from the zsh _uniq file. It is copyright 1992-2014 The Zsh # Development Group. Distributed under the zsh license: # # Permission is hereby granted, without written agreement and without # licence or royalty fees, to use, copy, modify, and distribute this # software and to distribute modified versions of this software for any # purpose, provided that the above copyright notice and the following # two paragraphs appear in all copies of this software. # . # In no event shall the copy right owners liable to any party for # direct, indirect, special, incidental, or consequential damages # arising out of the use of this software and its documentation, even # if and the copyright owners have been advised of the possibility of # such damage. # . # The copyright owners specifically disclaim any warranties, including, # but not limited to, the implied warranties of merchantability and # fitness for a particular purpose. The software provided hereunder is # on an "as is" basis, and the copyright owners have no obligation to # provide maintenance, support, updates, enhancements, or # modifications. local args args=( '(-c --count)'{-c,--count}'[prefix lines by the number of occurrences]' '(-d --repeated)'{-d,--repeated}'[only print duplicate lines]' '(--all-repeated)-D-[print all duplicate lines]' '(-D)--all-repeated=-[print all duplicate lines]::delimit method [none]:(none prepend separate)' '(-f --skip-fields)'{-f,--skip-fields=}'[avoid comparing initial fields]:number of fields' '--group=-[show all items]::group separation [separate]:(separate prepend append both)' '(-i --ignore-case)'{-i,--ignore-case}'[ignore differences in case when comparing]' '(-s --skip-chars)'{-s,--skip-chars=}'[avoid comparing initial characters]:number of characters' '(-u --unique)'{-u,--unique}'[only print unique lines]' '(-w --check-chars)'{-w,--check-chars=}'[specify maximum number of characters to compare]:characters' '(- *)--help[display help information]' ) if ! 
_pick_variant -c uniq gnu=Free\ Soft unix --version; then local optchars="cdufs" if [[ "$OSTYPE" == (darwin|dragonfly|freebsd|openbsd)* ]]; then optchars="${optchars}i" fi args=( ${(M)args:#(|\*)(|\(*\))-[$optchars]*} ) fi args+=('--vnl-count[prefix lines by the number of occurrences in a new NAMED field]:"count" field name:') _arguments "$args[@]" \ '1::input file:_files' vnlog-1.40/dji-tsla.tar.gz000066400000000000000000000225541475400722300154350ustar00rootroot00000000000000‹í|Ë®5½qÝ?öS|€ÇÿÉâuă xæ¹ ‰ Ù "9yýÔZ«ØMnC `@1|{Ruú4«y©;‹üí?ýþó?ÿå¿ü5ɽVÀcòÓýS׋˜­¾n øÐjëch¸zÿ ²ê_$âÃÉóE¬ö•/ Éé÷\×'g ˜|qÎòY °|ø–^ÄJ­í¦0ÑÐÖ§TƒÃïiæÏJø×ò–Y­QÀ<ŸÉÚA!¥çvtŸI÷ ´ô"Öó˜7oØj+Ÿ‰yè-&ŸÔÊ>´î‹’òôö= ðCÁðWs|ìzRéEÌê¸W3~Ñ>¯ùx> 3ŸøWIÝ ¤–U¿FæZÄQ¾ØútN“«ã‹˜>£¾ˆóä…ô«áµäï“Íú§µóIÎí3ú‹Ô:—ݼóþÔûPÁÌ©úÄ¿ƒ½ç¨Ÿ^_¤zÚER0ó>Xr)Ÿb å\åE|ÊæJ¢ÏRièCîŸÜù~ûdïÎ*éFZj–o h80üÊ ¬’Íî(ækVÎÌFªOö=“… G!6óy`ç}NÕ‡6íSË‹ÔÕɬu2qµSäŸ:ú”†8T{‘j½ÞkQÈQ«Hš3ž)>.²«åñ©ùEÌå¸^ÐÕV]Ã@Ì›Ëd臖>Á®U4ñÌ{&©=ö¼µŽ®öÝ’ëSÛ‹X™©Ü(ÝÉeˆ“Wgª¿eõ‰ÇdÍòÕN  u3Rzÿy2ÖäZlÄR›÷ZPÛ›uìÍ÷‹K+××J’²‘9ìkœ._ÄnTcKVµëæÏZ/â’Ù¾F!%ï˵p¶‘­õ˜™´ÕW ­”9o Ð.ŠŸa@\¥sJ›w&Sçøð©qÁ*÷ìçƒÐŸ8X_ýÆ ´ú™üôhzÈy؈՜o œg×0$5RpµË9Eo½hJ7âÿùW¿ô Ð¥šz‡¿™®(àËm¤»sò5ŠÄ%Xb›¾ý¨ãIëZßI§õw É¡®ÒJÝîO’–µ¦v#-Û½š™d] 8.ÁóÄ àgõ‘½9)`°Ùù•]Ä( Î5óä˜Ë¸põ¡Ò\´½¸OKÇ2¹Õ+t5ÝÓîã@Ü·­7£÷âãØ9 ×rr5Ý­u"i~¢„¿šF¼Fl¹²¢Á].t«Èt¯õ¦]=]Ór8É‘ïï³±¡«úœØÞè3{ç©ÄïO§@·BÖÆ‰”|÷¡š¨j<–”}™Û‚À/¢{³‘™¿d3zhAÁÚM·*ä1ï^åt×17…FúK<é!-ø61jRp'½ÚÈ\å–Í´{¨áxC»é<¼HÜC°©§®NS>­{WR É2­äa7ù¯r Ãj½)@6“Û,*7xƒÆ†P5 mzðC 5}Q(b³BÃÙ{ˆ9“G L\ÿSånÄ=±~S lºzßU™(Œ—w^ËHÎuÜFÄe#‡8´û‰«\IJ %(T:“zÕíQ¸‘Ï“±$æ’Ò¥a¦|Ú‘«Â+×áºcIr+·ÉV¾FAþ)Mâ3­‡èBÙÔFÒ­''uµGW=ÄaFh€±ÞÄÜ®©ùÒÕ“ºÚ¿(‡œƒ•ê\Jwz¸¹D°ó n.îQð}Å?Ãy’£p¦æln¤(6âºäæ¾ïàg Nàß~?iþi¸y±Zê½ôi 8pˆó]“\ø¨5œ¬6o~`XQ"²£ãʤA'Fš+ôÞFò-Ýsg0ýIK³TSnFoÁ½Ý˜ääµÖM1µQ¡ÊFJÙ‘W6…?qá¾ùÑDsmϨªûkÔð8®æþùv#.›÷j2Üî9Ì<ùêMÞcCˆ7Úä1¾(hÉ|õ)Pór=)rÄ=è{&•Ápm@)ð‰@qJ•_*ÊälÄæ×L2wd] #Šƒ¹¬e=)óØJ·\ð‹ÞÇ蜇ŸK’ö¶ÒˆÇ´ãÖÕ ‘!Q”ÙDʘ3Û© "–Ûã¤!mon²‹1IŸ¼Or Vb¥›9ÊB²I”5ç@Û±¼B¶@Vÿê5­‹Á§‘‚›Kj-1µ…õÄ@â¦ÀÿºÏÓÄ6ƒÚÞŸØVaô7âѢݥÆ"éäáE2;{“fu%,Ù dØ™Á|‡…b/×wo˜ld\ÙPP£‹‘Ыʙ˜òn%¥i7âk¹)°aøQ Æ]­°³44¨Ó|QÓ=“\kd©îà4æ,k8n.•e ÄUu¿)‡ß”×u¶!cd¨"®ÜÊ:‘2ï™Ô•anZ®ÂÃìT°pÖ‰¤3» XôTÑ´4¤]ÉR³óÖÄrùê²O«„•< 8ÔéBV„Jmÿ àà‹› 7’Kt\O|uæ:sé¼)dfÈÕB@³Q¨á¯ˆC6bÕÞ¿ØõéåBmózâïý@ Ä“BbòÍÿ[˜Ló`¹1 ç^1¸ºvˆ~‘H½øÀÛÄ!ætÊT¦·V(Ày y}­&¼&_y~×Ù…Z®ºUøÔJây˜±ñ%›ÐÕþ_ņÉ·É)<ŠFRre7âÎú¹:ñ¿]°jXt:®Þ‡¦ÈN™ÿ¹£æ®<­Yê ãÈ™,î±ÀŸ¬Å}õz ®åÒ¸)€,8–QKò‚Ü´ 1gÉò07b©]§SÓzC£Áõù^+ÇÜfa¹ˆ¸0ÖM¡ÇÚ‘uÍ@}º*Tññ@̃») ‡ÐÌÃN$9)®Ni±¾¸iù¦@,ÁÃNøód®©ãÊÛƒ³Ù.’7R‘U¯´û°««}Q 8øJe ueŠ öÈ_ÕîÎ Gˆ!àº(p‡w¾]²‹c 5{ˆ3õ-vjZêÊ<‹ŒIÛÃÛ”n·ƒõ@̵Ã= )ùÛX>Íbƒé^1mwZá“âa·}Q ECna§|i²M…y4ù‡ ™ôEA>í HÑò"õ]Â=¦] Äåâò@ºòV{¸:=âÍ‚|&]Y×9ÃÄפÝè¹åíô7Rd†]õis÷XžR ®`êE>mIaÑàÒ%ˈHI{ËW¾4êØ}€ËGO;#15Ãm ¡ ÄÜðÞü@MËŒ­¿OHý7O/1¨¹›B#ÿ¤Œ—/"²¬uä`ŒÀd¾ˆ»tö5ÐÈõ¸Ý3¡rŸh\nÅ,B@Hç‹X©—Ӥ푞m»Ï”Vð$?m=¼»@ì+Zl±§6c›o꽎Œ%ÕBÓÀƒx\üEAû†ƒ΋t䂨(r–Ö Ä\Õ´›‡ïŒjÂé1jqˆ;–3¿ˆÛ¢þÕ:uoßt¥­ü‹å#v5å]d^{(  ú~WìÑOœ‘ä q»¸ÊE3?°!HuŽJ'ß×·ÅLÒo Óè‘ú<¸Ãys{QW>þiÉ…s »g’ž¤À俨ZugfHÿO9™é‹ȺnÃ…ô^ ¶ój~·¼ëkâ‡ɵ‘˜ˆO¥öW¬E Ó£˜›‚oÆVZ—ÒöQª}Ÿ™m5Íey)oãj$øÁ2ãt˜j¹XXª_\͕Ž˜Œþ ïžWŠéÙ¸€]i£®®nˆ¹(ÌÕŒ½!ÉfXÞ@¬Þ‘Z£®®Høs5ödÝ—‹Mj×ÍZè@Ü)ì·~€ž´éÖBðÜèÀ{äÎQ,J~»«@Á¸E¥r•›¹¡…²åÖœO1‹¯QЃ]Úc­Ùl°A–¡ Éš/b³Ù½šÔÕðË:wS¦X‹Iàö"p`Îy¨¡iݾñÜSѾö™ä2!çâð’¬>m ûB'Ÿ µ¤`sÈZ >šK²ª|Ú\T7Raýëõ„F¿¾ˆ-Òo ã·Pi^¹¾%Fí•i/âAÌ¹Ç äØ¡­´1ì‡Ê=.ÚÈØˆÛ@;¼h’òÁ.¦Jв‹®¢«>× ­Þ0ØûBFñ43¤@O ÿû‹Øª_«)]=ºæ!ÃØaønš¤´{ÕBoÄ#¶+7hôHnØM÷P¶’ŸR,y§ÓÚn LƒÏ­Ó–R:†íÄM6«ê‚6‚k½)д ¥F]u쇔Òr6ÃÕÜH+÷îƒQW#\ˆOñEç(Æ“páâ¶¹}õAæ2rûqHÈ$.«/Y+ÄÓ_ó°ó UqT9³àPB/ŠI¾)ÐÕñE¤¹Ü3§Õ[“?ÙgÙwEQ³]ûþ¶uu OÌ—¬çó B°2^ĵÜoštõ\;Ûˆ[©&wäCd„¸²¿$ˤ'V0§ê޵wØ9.__ú0¸•{-äÓÚ¾… ‡ˆFQ¿¿ˆG¬ö5ŠAwñÁ žì}Fì‰î*Án Œô›².nÒº–`Õ‰î$ãµw÷潚Ê? 
ɬҾ'y^´©”Ý’*q-Äzi7†o.,Êf#¡„ÿû¤E%ªw®|”ɧuÿ,ÝÎUAA˜‰¹ÚÎ`+jF¡ØMaìYâûØÔ¦íFèºc’B\—¤¯>0“Œ*b®>>DV„ˆ)ˆ«;l$âVo|Q Yò`Tp«©Ù«y¶@P‘{óõ¤!¬Èa7éËwMÙ°¹¦'vó÷Ôàe1w¯XaBÕæ{Åæ;c½@<¨½¤»PO¢äIlІ§qH÷LRË­$£ïˆ*¬ ÊŠf}b½¿ÈNô˜XeW\˜ ó†Kk­1Ö^Ä×­ß\M푦vùiÚøiDîôIV.‘~2áuÝ”*ßåÄØø¦ORT]±ùNž däë (¨rI̓pû¾÷ ¹ä«¶rTý±¸[ó,”JHu²pjS`#Î ×øÚäÌ`—–ÿjQÌÅcñgªBçɬ™7MHA?¿ˆËqý¢À /Ú¬ïØµ§þGaoA¼§q Sî™dåvšT¹ŠÊCÖøUm¦æùþÓ~ó×>ŒøËçËðP÷ûü¯¯ýÏó¿ÿ/~ÿ¾ó¿ÈÑt ËŸQþŠÜ)REp ïzr½­*[±c°œÄ "íE•A€š9Û§s¾ØPAŤ¿Àã…•¦Ò^†­ {aÈ•PÛNa07 ÷"@_ÜÑzZ¥Å×Ñ™}¢é4N›g«©i@«ÁÙˆihê2€‡fˆfÞVèšéÜ`ÖñbVDSÑ÷ †»îW»æzðªj–` ÿè£ã¼ÔÛª‘:zXé#c6¸ê¡À˜LǼ­ _G†¢2´ 䎪¶$ÜhÏ«Uæšî@c{Üm‰ì»h’ŸV<{kÚ‹$†0‡lÚrðaU;[á=Úf¼°ðÂh¼Òîº÷q´*z54àC¼Þ97:= ‹£Žf¨Ù¥?ÈSŠ õp;0—øKÀu3Œo³†òæ¢ 3 8«—xÞ†,@I³¦«“U/¢ðK^¼›fø sªRS ¥6Ï¡ï¨Ä›Ñ<}V˜”_*ðŒÎÖÛŒþ‘|Åñï¢#éIgÙpšñüÝRm0uU¶8é•è ,ׄÐ1TiÍPJ¨Û<ïC€ƒ·ålÅó”t?fœ‘Ïÿsnà¾áý-ž.âëCÌ]gG¢¾Q`5œ ;ZñlBS=VÓBl%Ð¿Ž‡Ò©m÷H®ÍŒÌ€{¨ùúÏLèhL‰3ÛÊéÄ·ÀÌ}œÜ¡Ã¬:°Ù”–:¸ÐUùOËÌí\0Uû+¤ézßt'¢Ÿww„Ïf º¸5Ñ#c°äí%埲!=}6kºèC5š›/²˜Š Uë¹fpó–²*‹JK‡$êmƒ5s=ù#¡º€¥ÕøÚ+sl霒ÀX«k—ù-Tr¡–g1ßKÑ+ä<ª]]DAëòà ×WT¬¤ xÒá™Ç8rŠmeŸ²1·„ð¨_Õy^×è[uö ïù±=›²j8m”À˜ÕÊÙ cORS1ÒþiAWùê!&n2ë—$˾€¨Ö^Ê; ÔÉC8o«L%5D¯ëÎÒ>¢²–íl•ø‘Ö¹HN,UÔdžœÀ0Vy<­ ª¦Œ¸†Wâ‚‘¥ªYél²ÎV“ãÇþ:6  7š´/×!@î­Ž³‹ÐU(-¯ütñTð&¬ô9/î€zö÷ ^>+ÊT}6Rïi7ííj%“ô¦Í°(ï@aŸ­ åfe©Ès¨úÇ» E'àì”Û5‹(–\Üb›E¢©xX7X_g1uÀ‚lï#‚n ½?c2 2ŠŒÏYdþ$‘úŒ E¤ºLÇÿXî~6ÓºLi–-M*Ÿ)S&0*Ý–·Uc?âè½*¿ãÖáÚûjUÉKìO“\Omª `XïÀx&ìDS=­²ÐÞ7hÆšÒ·Ub+ŠuH&kàXÞ6%¿¾ŽÎ^Biª©÷ÈϼÊ%À¨¼åm…¢·©[18y»Ú&gù,ÃõÔÙC~Dµ¥H+NÖf]$àáðXv6ó‹ô&r¦/VÁ)§=•~¶2¾'¶ÈEôOÃa,é-×q}«ð?H/(? Ú]³^7h«¾žŸMb±Ù‡« •'‡™åfdwóJ9›‰©ß§­è¯˜n[x üi–Õ9(ÝB#Mug\«–6ÈsµyNHVd@ÿ¼(ÆÑ)1¤hÚ8!{á—¬„;æ¼&©¨†EeE¾Ìùê"‚fzQ¡U" À4 ô5ûÉUˆäÜæ(YV©uqЃ%ÐÇÎiÌo4&ÿC´Æ ä¬#;O3EyëC©:Ð<¼Ò6È#';Á\ga0¶40•Çâüb5T 峓Í!ÜAy¾Ö“ë*ѽ ÒmžÍ:»…÷›ªò|1°h•d€Ùz=§¾NeÒƒçŒðp5cƱAöá÷Sï`ªKèÚä+W/^ÅÙ¹])3Mëÿ÷$£=ëÎ!å°¸:‹¸¤°—0ÆДuƒ–F_g+º%ǺI[£E ³ê »ó—VÐ.îÑy¬|XÚ3XRâmðDöÛJªt¾~[âµ2;®¨Ýí­³…‹; ÐH,¯*¹œzH0Òhçl0œ«º_G¥Ë°»]CFY`µuØž!äÍ.uK}Qž ?ÈANk•s‹tZI(ì!G®ŸŽ6—£õ;k¨~*u#䇼lƒ¶ŽBq>‚ÚPú)ó"­ËZ/õúÖxŒ4jÝÊjG ¯Õ¯VýIÓZ}+Kç ŒÉ ø·•ñuüq}ÙŠÕ¹\™®sê³âõshtÁéðwu€UúWÕµF¡ò¯¤ÀqÀ†õɾ´W¼C¶¬«£%ÌèØ`¡¢ýh•d!—N–€Œ/þÊŒÆãµo+1=#ž5—’橜ÁD }¶jäš¡|þÅj/L%×ÀMÙè'×CQYÆ®'¯’áìrIèGäâÞd?›©zï\¯:‘±”™Ñd ø*·cÉx&ÏšB@ÔuªÅ*{¸Ú¸ §ž­”¯ƒã…R§Bó‚u¬¼´'À´õf¹töŠ›fˆ'Rx€’˜/˜f‡qáy;Óýc00MEÌ9/Qq“ÖÑJ9&ÖØL1ÄP¾Ê(Ÿ}Ùa£y®Žw|Ušº¢%HjÕmƒœ 7ößf¬h’µíª—ï*9ä*±x¯ÔÛ*Ñ!@ÄÓyÚ ß$B6˜6¯éÈ2Ñ` ñXö"ÛÜË臇ϓr¦Ú:”’Ò¤kÒu5`—Ëq„¶<G¾Páôáj¨ÆŽç±AN³Îssø3Æ7ÀÊ•u›à¸ß®ƒg>„ûpRÞšùꥥG ¸£y±0ÓÖÊr7láN¼8cŒ J±Ó_ç‰7( ¬U²r)â¢$*;¨g„ÅcnÐ4IêÑ{VB^g!Ì’Ú9IJ0‹+©p¬(-QïpÎ#OÚJ>+AYC)§ \¨û¸šµ$˜r‘­,¿€—ð\}T”mJèì„Äc¹ÔïoeH.¥¨¦)`§S Õüfðâ€ÚÔ¡Ý¥Ò\ÿiŒBÇpÐ8[%R/òUøå¼ÓV£làaêÕŠISêu&3˜YEâ@wxæ³USÍ¡G¹CDÆmw\ÛÕªò?̸ê"ª¡Æ¡[wj^GRG‡Ì†’„ƒÑZ1S ïO -^:ù¶*œCaîÄöɇæF`®zxÏ:L†+>Ä(™š:¬0«þ"Xù´.:@6•–œOv2i*yÁÀñÍ³Õ oÀ1º—l ÒàýQµ–t}Kk2•¤™êoS"„wë4?}[E2B«ÌDSÚi§è½ƒ…ó¥g«ª h‘Ûv±:*0œä9‡YyÄR§’uP IJµu+)혔™\üdŽ|BÙ|x¶JJ V}2E.Nžö|.:å‹ÑÖXR]¯DšÙͱA;¯bC]kì€s¸Xäwg<`ö1Ï9„ÞˆJ†¥[ù†Nö#½6ðxl]=4Jʈüä«HôkÛ 7h'Ó3g7räEuLÔ¶  ®e)3ÍÕ’éf`]3$P{9Ò„:™µTV¾Tòy¿¥¢" OìhUÄô¦4:Ó~bæ¥ë#p÷êù­Èoqñˆm%úÈô¸ß!­¤Åâ=Î`DÞïkÔ[]ßÒÆsn*Tº3Ë> õyØ"®Z:‹22‹2®&N€ªÁz¶J›‰JœœÆ½IJš–º‡:c­²>²E*kë.ÒìsœÝ>ÇŨ5ïŠRØ”¨hÀ¥“ãl¥T3wÿdc§NEÅãªÔÒ ©m%µ7UX»tu¦ŠÛÙJ.]a(wkšN‘+Q"à.Ø89 N*RRÑJeŸGžÞv~½ˆ×Üåv„màÆ²\3ŸÈìKÉœö· «FÚžÛ:?Z ¥i‰u£Ä|&Ç;ÎõnÊ2)¾ÔnFì0زµ’NVdÆyiƺrä‘$lÊîä•ËaÒuŠÉÝc$ªçŒzcIÒÑÙ‡ÖOaQ犋þ¢ ]wmÆ$Aöø´žœ¯ë騻EÇU;¡qí AîtŒv3Rj:á=TÂê–ne¥ï&[ïÖwœLÂv#ï’Õ­‹ÚÃnª€Ý‰z6\ÒØº‚êŠþXÈîæ”«“¼ŠK  ûR–ND$ZÙ¸¦)ý¬ð!€ní ´ºAïíØOâY£½—yó¯}QF"§ p½Ð<[)ý̤ŠòêHtÈÓæåîRÍk:´]8Ç“‘IS•]a¦ã`Ì¥›6i\”Ëù-j€º©gñFÕ_ilà¡óa×y~†›x•Œ•Ç“ÿÔu>¬–ÎUŽ40Ù±J}Æ] :Œ(€3GålU”ì0mmÎ7å¬Í$^Žr*\Êš/ù5¡{Á>ª3=XTHh:":á–šÊóMürÂ…$g3¥‰ÈLÚóÙîýIP €;ΑñêE4årcÓÑLâIÐÜ;×™:`ð…*&1ÛYÅ]©‰S•ýHpU>5‘+ª³Ê ÛFª×,f¾ž$a]5‘])Y,@Íëðy’eç*[×.£îºÄõÓyœ|Èg+ÍSVñªŽØJ´šöãp7Êù­¢¼/L—.ŸcåãÔVšmPGç·Šê4#Më5”§žmƒÚì0žLyº¦Ýë’í¹ ÀêÙÙª*;: b®[ hm€KÛÙÊž½Ö®”Ý^z ÐjoW »V•öt´1ç§VÎV±B‘IÇ5uµ(¿š´9LPçœç¸X/۔ŕ¥Û•ª¼B:â¾y¶’R«šÃ¤¤,'Eu 
vnlog-1.40/guide-1.svg [SVG plot omitted: gnuplot output (GNUPLOT 5.4 patchlevel 1); single curve, y ticks 23500-27000, x ticks 0-250, axes unlabeled]
vnlog-1.40/guide-2.svg [SVG plot omitted: gnuplot output; single curve, y ticks 23500-27000, x ticks 0-250, axes unlabeled]
vnlog-1.40/guide-3.svg [SVG plot omitted: gnuplot output; "Price ($)" vs "Date", monthly ticks 01/01-12/01, single curve]
vnlog-1.40/guide-4.svg [SVG plot omitted: gnuplot output; "Price ($)" vs "Date", weekly ticks 09/27-11/01, single curve]
vnlog-1.40/guide-5.svg [SVG plot omitted: gnuplot output; "TSLA price ($)" vs "DJI price ($)", single curve]
vnlog-1.40/lib/000077500000000000000000000000001475400722300133355ustar00rootroot00000000000000vnlog-1.40/lib/Vnlog/000077500000000000000000000000001475400722300144225ustar00rootroot00000000000000vnlog-1.40/lib/Vnlog/Parser.pm000066400000000000000000000111051475400722300162120ustar00rootroot00000000000000package Vnlog::Parser; use strict; use warnings; use feature ':5.10'; our $VERSION = 1.00; use base 'Exporter'; our @EXPORT_OK = qw(); sub new { my $classname = shift; my $this = { 'keys' => undef, 'values' => undef, 'error' => '', 'values_hash' => undef}; bless($this, $classname); return $this; } sub parse { my ($this, $line) = @_; # I reset the data first $this->{values} = undef; $this->{values_hash} = undef; if( $line =~ /^\s*(?:#[#!]|#\s*$|$)/p) { # empty line or hard comment. # no data, no error return 1; } if( $line =~ /^\s*#\s*(.*?)\s*$/ ) { if( $this->{keys} ) { # already have legend, so this is just a comment # no data, no error return 1; } # got legend. # no data, no error $this->{keys} = [ split(' ', $1) ]; return 1; } if( !$this->{'keys'} ) { # Not comment, not empty line, but no legend yet. Barf $this->{error} = "Got dataline before legend"; return undef; } $line =~ /^\s*(.*?)\s*$/; # get the non-newline part. Like chomp, but # non-destructive $this->{values} = [ map {$_ eq '-' ? undef : $_} split(' ', $1) ]; if( scalar @{$this->{'keys'}} != scalar @{$this->{'values'}} ) { # mismatched key/value counts $this->{error} = sprintf('Legend line "%s" has %d elements, but data line "%s" has %d elements. Counts must match!', "# " . 
join(' ', @{$this->{'keys'}}), scalar @{$this->{'keys'}}, $line, scalar @{$this->{'values'}}); return undef; } return 1; } sub error { my ($this) = @_; return $this->{error}; } sub getKeys { my ($this) = @_; return $this->{keys}; } sub getValues { my ($this) = @_; return $this->{values} } sub getValuesHash { my ($this) = @_; # internally: # $this->{values_hash} == undef: not yet computed # $this->{values_hash} == {}: computed, but no-data # returning: undef if computed, but no-data if( defined $this->{values_hash} ) { return undef if 0 == scalar(%{$this->{values_hash}}); return $this->{values_hash}; } $this->{values_hash} = {}; if($this->{keys} && $this->{values}) { for my $i (0..$#{$this->{keys}}) { $this->{values_hash}{$this->{keys}[$i]} = $this->{values}[$i]; } } return $this->{values_hash}; } 1; =head1 NAME Vnlog::Parser - Simple library to parse vnlog data =head1 SYNOPSIS use Vnlog::Parser; my $parser = Vnlog::Parser->new(); while () { if( !$parser->parse($_) ) { die "Error parsing vnlog line '$_': " . $parser->error(); } my $d = $parser->getValuesHash(); next unless %$d; say "$d->{time}: $d->{height}"; } =head1 DESCRIPTION This is a simple perl script to parse vnlog input and make the incoming key/values available. The example above is representative of normal use. API functions are =over =item * new() Creates new Vnlog::Parser object. Takes no arguments. =item * parse(line) Method to call for each input line. On error, a false value is returned. =item * error() If an error occurred, returns a string that describes the error. =item * getKeys() Returns a list-ref containing the current column labels or undef if this hasn't been parsed yet. =item * getValues() Returns a list-ref containing the values for the current line or undef if there aren't any. This isn't an error necessarily because this line could have been a comment. Empty fields are '-' in the vnlog and undef in the values returned here. =item * getValuesHash() Returns a hash-ref containing the key-value mapping for the current line or undef if there's no data in this line. This isn't an error necessarily because this line could have been a comment. Empty fields are '-' in the vnlog and undef in the values returned here. =item * =back =head1 REPOSITORY L =head1 AUTHOR Dima Kogan, C<< >> =head1 LICENSE AND COPYRIGHT Copyright 2016 California Institute of Technology. This library is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2.1 of the License, or (at your option) any later version. =cut vnlog-1.40/lib/Vnlog/Util.pm000066400000000000000000000403201475400722300156740ustar00rootroot00000000000000package Vnlog::Util; use strict; use warnings; use feature ':5.10'; use Carp 'confess'; our $VERSION = 1.00; use base 'Exporter'; our @EXPORT_OK = qw(get_unbuffered_line parse_options read_and_preparse_input ensure_all_legends_equivalent reconstruct_substituted_command close_nondev_inputs get_key_index longest_leading_trailing_substring fork_and_filter); # The bulk of these is for the coreutils wrappers such as sort, join, paste and # so on use FindBin '$Bin'; use lib "$Bin/lib"; use Vnlog::Parser; use Fcntl qw(F_GETFD F_SETFD FD_CLOEXEC); use Getopt::Long 'GetOptionsFromArray'; # Reads a line from STDIN one byte at a time. This means that as far as the OS # is concerned we never read() past our line. 
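#
# Illustrative sketch only (hypothetical caller, not part of this module):
# this byte-at-a-time read is what makes the bootstrap-then-exec pattern
# used by the vnl-* wrappers safe. A buffered read would consume data past
# the legend, and an exec()ed child would then never see those records:
#
#   my $fh = \*STDIN;
#   while(defined(my $line = get_unbuffered_line($fh)))
#   {
#       next if $line =~ /^\s*#[#!]/;      # hard comments: skip
#       last if $line =~ /^\s*#\s*\S/;     # legend found: stop reading
#   }
#   # The OS-level file position now sits just past the legend, so a child
#   # process exec()ed on this file descriptor starts at the first data line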
sub get_unbuffered_line { my $fd = shift; my $line = ''; while(1) { my $c = ''; return undef unless 1 == sysread($fd, $c, 1); $line .= $c; return $line if $c eq "\n"; } } sub open_file_as_pipe { my ($filename, $input_filter, $unbuffered) = @_; if( defined $input_filter && $unbuffered) { die "Currently I refuse a custom input filter while running without a buffer; because the way I implement unbuffered-ness assumes the default filter"; } if ($filename eq '-') { # This is required because Debian currently ships an ancient version of # mawk that has a bug: if an input file is given on the commandline, # -Winteractive is silently ignored. So I explicitly omit the input to # make my mawk work properly if(!$unbuffered) { $filename = '/dev/stdin'; } } else { if ( ! -r $filename ) { confess "'$filename' is not readable"; } } # This invocation of 'mawk' or cat below is important. I want to read the # legend in this perl program from a FILE, and then exec the underlying # application, with the inner application using the post-legend # file-descriptor. Conceptually this works, BUT the inner application # expects to get a filename that it calls open() on, NOT an already-open # file-descriptor. I can get an open-able filename from /dev/fd/N, but on # Linux this is a plain symlink to the actual file, so the file would be # re-opened, and the legend visible again. By using a filtering process # (grep here), /dev/fd/N is a pipe, not a file. And opening this pipe DOES # start reading the file from the post-legend location my $pipe_cmd = $input_filter; if(!defined $pipe_cmd) { # mawk script to strip away comments and trailing whitespace (GNU coreutils # join treats trailing whitespace as empty-field data: # https://debbugs.gnu.org/32308). This is the pre-filter to the data my $mawk_strip_comments = <<'EOF'; { if (havelegend) { sub("[\t ]*#.*",""); # have legend. Strip all comments if (match($0,"[^\t ]")) # If any non-whitespace remains, print { sub("[\t ]+$",""); print; } } else { sub("[\t ]*#[!#].*",""); # strip all ##/#! comments if (match($0,"^[\t ]*#[\t ]*$")) # data-less # is a comment too { next } if (!match($0,"[^\t ]")) # skip if only whitespace remains { next } if (!match($0, "^[\t ]*#")) # Only single # comments are possible # If we hit something else, barf { print "ERROR: Data before legend"; exit } havelegend = 1; # got a legend. spit it out print; } } EOF my @mawk_cmd = ('mawk'); push @mawk_cmd, '-Winteractive' if $unbuffered; push @mawk_cmd, $mawk_strip_comments; $pipe_cmd = \@mawk_cmd; } return fork_and_filter(@$pipe_cmd) if ($filename eq '-' && $unbuffered); return fork_and_filter(@$pipe_cmd, $filename); } sub fork_and_filter { my @cmd = @_; my $fh; my $pipe_pid = open($fh, "-|") // confess "Can't fork: $!"; if (!$pipe_pid) { # child exec @cmd or confess "can't exec program: $!"; } # parent # I'm going to be explicitly passing these to an exec, so FD_CLOEXEC # must be off my $flags = fcntl $fh, F_GETFD, 0; fcntl $fh, F_SETFD, ($flags & ~FD_CLOEXEC); return $fh; } sub pull_key { my ($input) = @_; my $filename = $input->{filename}; my $fh = $input->{fh}; my $keys; my $parser = Vnlog::Parser->new(); while (defined ($_ = get_unbuffered_line($fh))) { if ( !$parser->parse($_) ) { confess "Reading '$filename': Error parsing vnlog line '$_': " . 
$parser->error(); } $keys = $parser->getKeys(); if (defined $keys) { return $keys; } } confess "Error reading '$filename': no legend found!"; } sub parse_options { my ($_ARGV, $_specs, $num_nondash_options, $usage) = @_; my @specs = @$_specs; my @ARGV_copy = @$_ARGV; # In my usage, options that take optional arguments (specified as # "option:type") must be given as --option=arg and NOT '--option arg'. # Getopt::Long doesn't allow this, so I have to do it myself. # # I find all occurrences in ARGV, I pull them out before parsing, and I put # them back afterwards. Since '--option arg' is invalid, I only need to pull # out single tokens: multiple-token options aren't valid my @optional_arg_opts = grep /:/, @specs; my %optional_arg_opts_tokens_removed; for my $optional_arg_opt (@optional_arg_opts) { my ($opt_spec) = split(/:/, $optional_arg_opt); # options are specified as a|b|cc|dd. The one-letter options appear as # -a, and the longer ones as --cc. I process each indepenently my @opts = split(/\|/, $opt_spec); for my $opt (@opts) { if(length($opt) == 1) { $opt = "-$opt"; } else { $opt = "--$opt"; } my $re = '^' . $opt . '(=|$)'; $re = qr/$re/; my @tokens = grep /$re/, @ARGV_copy; if (@tokens) { $optional_arg_opts_tokens_removed{$optional_arg_opt} //= []; push @{$optional_arg_opts_tokens_removed{$optional_arg_opt}}, @tokens; @ARGV_copy = grep {$_ !~ /$re/} @ARGV_copy; } } } my %options; my $result; my $oldconfig = Getopt::Long::Configure('gnu_getopt'); eval { $result = GetOptionsFromArray( \@ARGV_copy, \%options, @specs ); }; my $err = $@ || !$result; if(!$err && !$options{help}) { # Parsing succeeded! I parse all the options I pulled out earlier, and # THEN I'm done for my $optional_arg_opt ( keys %optional_arg_opts_tokens_removed ) { my $opt = $optional_arg_opts_tokens_removed{$optional_arg_opt}; eval { $result = GetOptionsFromArray( $opt, \%options, ($optional_arg_opt) ); }; $err = $@ || !$result; last if $err; push @ARGV_copy, @$opt; } } if( $err || $options{help}) { if( $err ) { say "Error parsing options!\n"; } my ($what) = $0 =~ /-(.+?)$/; say <[$i] ne $l2->[$i]; } return 1; } sub ensure_all_legends_equivalent { my ($inputs) = @_; for my $i (1..$#$inputs) { if (!legends_match($inputs->[0 ]{keys}, $inputs->[$i]{keys})) { confess("All input legends must match! Instead files '$inputs->[0 ]{filename}' and '$inputs->[$i]{filename}' have keys " . "'@{$inputs->[0 ]{keys}}' and '@{$inputs->[$i]{keys}}' respectively"); } } return 1; } sub read_and_preparse_input { my ($filenames, $input_filter, $unbuffered) = @_; my @inputs = map { {filename => $_} } @$filenames; for my $input (@inputs) { $input->{fh} = open_file_as_pipe($input->{filename}, $input_filter, $unbuffered); $input->{keys} = pull_key($input); } return \@inputs; } sub close_nondev_inputs { my ($inputs) = @_; for my $input (@$inputs) { if( $input->{filename} !~ m{^-$ # stdin | # or ^/(?:dev|proc)/ # device }x ) { close $input->{fh}; } } } sub get_key_index { my ($input, $key) = @_; my $index; my $keys = $input->{keys}; for my $i (0..$#$keys) { next unless $keys->[$i] eq $key; if (defined $index) { my $key_list = '(' . join(' ', @$keys) . ')'; confess "File '$input->{filename}' contains requested key '$key' more than once. Available keys: $key_list"; } $index = $i + 1; # keys are indexed from 1 } if (!defined $index) { my $key_list = '(' . join(' ', @$keys) . ')'; confess "File '$input->{filename}' does not contain key '$key'. 
Available keys: $key_list"; } return $index; }; sub reconstruct_substituted_command { # reconstruct the command, invoking the internal GNU tool, but replacing the # filenames with the opened-and-read-past-the-legend pipe. The field # specifiers have already been replaced with their column indices my ($inputs, $options, $nondash_options, $specs, $keep_normal_files) = @_; my @argv; # First I pull in the arguments for my $option(keys %$options) { # vnlog-specific options are not passed on to the inner command next if $option =~ /^vnl/; my $re_specs_noarg = qr/^ $option (?: \| [^=:] + )* $/x; my $re_specs_yesarg = qr/^ $option (?: \| [^=:] + )* = /x; my $re_specs_maybearg = qr/^ $option (?: \| [^=:] + )* : /x; my @specs_noarg = grep { /$re_specs_noarg/ } @$specs; my @specs_yesarg = grep { /$re_specs_yesarg/ } @$specs; my @specs_maybearg = grep { /$re_specs_maybearg/ } @$specs; if( scalar(@specs_noarg) + scalar(@specs_yesarg) + scalar(@specs_maybearg) != 1) { confess "Couldn't uniquely figure out where '$option' came from. This is a bug. Specs: '@$specs'"; } my $dashoption = length($option) == 1 ? "-$option" : "--$option"; my $push_value = sub { # This is overly complex, but mostly exists for "vnl-tail # --follow=name". This does NOT work as 'vnl-tail --follow name' if($_[0] eq '') { # -x or --xyz push @argv, $dashoption; } elsif($dashoption =~ '^--') { # --xyz=123 push @argv, "$dashoption=$_[0]"; } else { # -x 123 push @argv, $dashoption; push @argv, $_[0]; } }; if( @specs_noarg ) { push @argv, "$dashoption"; } else { # required or optional arg. push_value() will omit the arg if the # value is '' my $value = $options->{$option}; if( ref $options->{$option} ) { for my $value(@{$options->{$option}}) { &$push_value($value); } } else { &$push_value($value); } } } push @argv, @$nondash_options; # And then I pull in the files push @argv, map { ($keep_normal_files && $_->{filename} !~ m{^-$ # stdin | # or ^/(?:dev|proc)/ # device }x) ? $_->{filename} : ("/dev/fd/" . fileno $_->{fh}) } @$inputs; return \@argv; } sub longest_leading_trailing_substring { # I start out with the full first input string. At best this whole string is # the answer. I look through each string in the input, and wittle down the # leading/trailing matches my $match_leading = shift; my $match_trailing_reversed = scalar reverse $match_leading; my @all = @_; for my $s (@all) { # xor difference string. '\0' bytes means "exact match" my $diff; $diff = $match_leading ^ $s; $diff =~ /^\0*/; my $NleadingMatches = $+[0]; $diff = $match_trailing_reversed ^ (scalar reverse $s); $diff =~ /^\0*/; my $NtrailingMatches = $+[0]; # I cut down the matching string to keep ONLY the matched bytes substr($match_leading, $NleadingMatches ) = ''; substr($match_trailing_reversed, $NtrailingMatches) = ''; } # A common special case is that the input files are of the form aaaNNNbbb # where NNN are numbers. If the numbers are 0-padded, the set of NNN could # be "01", "02", "03". In this case the "0" is a common prefix, so it would # not be included in the file labels, which is NOT what you want here: the # labels should be "01", "02", ... not "1", "2". 
Here I handle this case by # removing all trailing digits from the common prefix $match_leading =~ s/[0-9]$//; return ($match_leading, scalar reverse $match_trailing_reversed); } 1; =head1 NAME Vnlog::Util - Various utility functions useful in vnlog parsing =head1 SYNOPSIS use Vnlog::Util 'get_unbuffered_line'; while(defined ($_ = get_unbuffered_line(*STDIN))) { print "got line '$_'."; } =head1 DESCRIPTION This module provides some useful utilities =over =item get_unbuffered_line Reads a line of input from the given pipe, and returns it. Common usage is like while(defined ($_ = get_unbuffered_line(*STDIN))) { ... } which is identical to the basic form while() { ... } except C reads I the bytes in the line from the OS. The rest is guaranteed to be available for future reading. This is useful for tools that bootstrap vnlog processing by reading up-to the legend, and then C some other tool to process the rest. =back =head1 REPOSITORY L =head1 AUTHOR Dima Kogan, C<< >> =head1 LICENSE AND COPYRIGHT Copyright 2017 Dima Kogan This library is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2.1 of the License, or (at your option) any later version. =cut vnlog-1.40/lib/vnlog.py000066400000000000000000000324611475400722300150420ustar00rootroot00000000000000#!/usr/bin/env python3 r'''A simple parser for vnlog data Synopsis: import vnlog for d in vnlog.vnlog(f): print(d['time'],d['height']) Vnlog is simple, and you don't NEED a parser to read it, but this library makes it a bit nicer. This module provides three different ways to parse vnlog 1. slurp the whole thing into a numpy array: the slurp() function. Basic usage: import vnlog arr,list_keys,dict_key_index = \ vnlog.slurp(filename_or_fileobject) This parses out the legend, and then calls numpy.loadtxt(). Null data values ('-') are not supported at this time. A structured dtype can be passed-in to read non-numerical data. See the docstring for vnlog.slurp() for details 2. Iterate through the records: vnlog class, used as an iterator. Basic usage: import vnlog for d in vnlog.vnlog(filename_or_fileobject): print(d['time'],d['height']) Null data values are represented as None 3. Parse incoming lines individually: vnlog class, using the parse() method. Basic usage: import vnlog parser = vnlog.vnlog() for l in file: parser.parse(l) d = parser.values_dict() if not d: continue print(d['time'],d['height']) Most of the time you'd use options 1 or 2 above. Option 3 is the most general, but also the most verbose ''' from __future__ import print_function import re class vnlog: r'''Class to facilitate vnlog parsing This class provides two different ways to parse vnlog 1. Iterate through the records: vnlog class, used as an iterator. Basic usage: import vnlog for d in vnlog.vnlog(filename_or_fileobject): print(d['time'],d['height']) Null data values are represented as None 2. Parse incoming lines individually: vnlog class, using the parse() method. 
Basic usage: import vnlog parser = vnlog.vnlog() for l in file: parser.parse(l) d = parser.values_dict() if not d: continue print(d['time'],d['height']) ''' def __init__(self, f = None): r'''Initialize the vnlog parser If using this class as an iterator, you MUST pass a filename or file object into this constructor ''' self._keys = None self._values = None self._values_dict = None if f is None or type(f) is not str: self.f = f self.f_need_close = False else: self.f = open(f, 'r') self.f_need_close = True def __del__(self): try: if self.f_need_close: self.f.close() except: pass def parse(self, l): r'''Parse a new line of data. The user only needs to call this if they're not using this class as an iterator. When this function returns, the keys(), values() and values_dict() functions return the data from this line. Before the legend was parsed, all would return None. After the legend was parsed, keys() returns non-None. When a comment is encountered, values(), values_dict() return None ''' # I reset the data first self._values = None self._values_dict = None if not hasattr(self, 're_hard_comment'): self.re_hard_comment = re.compile(r'^\s*(?:#[#!]|#\s*$|$)') self.re_soft_comment = re.compile(r'^\s*#\s*(.*?)\s*$') if self.re_hard_comment.match(l): # empty line or hard comment. # no data, no error return True m = self.re_soft_comment.match(l) if m: if self._keys is not None: # already have legend, so this is just a comment # no data, no error return True # got legend. # no data, no error self._keys = m.group(1).split() return True if self._keys is None: # Not comment, not empty line, but no legend yet. Barf raise Exception("Got dataline before legend") # string trailing comments i = l.find('#') if i >= 0: l = l[:i] # strip leading, trailing whitespace l = l.strip() if len(l) == 0: return True self._values = [ None if x == '-' else x for x in l.split()] if len(self._values) != len(self._keys): raise Exception('Legend line "{}" has {} elements, but data line "{}" has {} elements. Counts must match!'. \ format( "# " + ' '.join(self._keys), len(self._keys), l, len(self._values))) return True def keys(self): r'''Returns the keys of the so-far-parsed data Returns None if we haven't seen the legend line yet''' return self._keys def values(self): r'''Returns the values list of the last-parsed line Returns None if the last line was a comment. Null fields ('-') values are represented as None ''' return self._values def values_dict(self): r'''Returns the values dict of the last-parsed line This dict maps field names to values. Returns None if the last line was a comment. Null fields ('-') values are represented as None. ''' # internally: # self._values_dict == None: not yet computed # self._values_dict == {}: computed, but no-data # returning: None if computed, but no-data if self._values_dict is not None: if len(self._values_dict) == 0: return None return self._values_dict self._values_dict = {} if self._keys and self._values: for i in range(len(self._keys)): self._values_dict[self._keys[i]] = self._values[i] return self._values_dict def __iter__(self): if self.f is None: raise Exception("Cannot iterate since this vnlog instance was not given a log to iterate on") return self def __next__(self): for l in self.f: self.parse(l) if self._values is None: continue return self.values_dict() raise StopIteration # to support python2 and python3 next = __next__ def _slurp(f, *, dtype = None): r'''Reads a whole vnlog into memory This is an internal function. The argument is a file object, not a filename. 
See the docs for slurp() for details ''' import numpy as np # Expands the fields in a dtype into a flat list of names. For vnlog # purposes this doesn't support multiple levels of fields and it doesn't # support unnamed fields. It DOES support (require!) compound elements with # whitespace-separated field names, such as 'x y z' for a shape-(3,) field. # # This function is an analogue of field_type_grow_recursive() in # https://github.com/numpy/numpy/blob/9815c16f449e12915ef35a8255329ba26dacd5c0/numpy/core/src/multiarray/textreading/field_types.c#L95 def field_names_in_dtype(dtype, split_name = None, name = None): if dtype.subdtype is not None: if split_name is None: raise Exception("only structured dtypes with named fields are supported") size = np.prod(dtype.shape) if size != len(split_name): raise Exception(f'Field "{name}" has {len(split_name)} elements, but the dtype has it associated with a field of shape {dtype.shape} with {size} elements. The sizes MUST match') yield from split_name return if dtype.fields is not None: if split_name is not None: raise("structured dtype with nested fields unsupported") for name1 in dtype.names: tup = dtype.fields[name1] field_descr = tup[0] yield from field_names_in_dtype(field_descr, name = name1, split_name = name1.split(),) return if split_name is None: raise Exception("structured dtype with unnamed fields unsupported") if len(split_name) != 1: raise Exception(f"Field '{name}' is a scalar so it may not contain whitespace in its name") yield split_name[0] parser = vnlog() keys = None for line in f: parser.parse(line) keys = parser.keys() if keys is not None: break else: raise Exception("vnlog parser did not find a legend line") dict_key_index = {} for i in range(len(keys)): dict_key_index[keys[i]] = i if dtype is None or \ not isinstance(dtype, np.dtype) or \ ( dtype.fields is None and \ dtype.subdtype is None ): return \ ( np.loadtxt(f, ndmin=2, dtype=dtype), keys, dict_key_index ) # We have a dtype. We parse out the field names from it, map those to # columns in the input (from the vnl legend that we just parsed), and # load everything with np.loadtxt() names_dtype = list(field_names_in_dtype(dtype)) # We have input fields in the vnl represented in: # - keys # - dict_key_index # # We have output fields represented in: # - names_dtype # # Each element of 'names_dtype' corresponds to each output field, in # order. 'names_dtype' are names of these input fields, which must match # the input names given in 'keys'. 
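    # Illustration (hypothetical legend and dtype, not from any actual
    # input): for a vnl legend "# a b c" we get keys = ['a','b','c'] and
    # dict_key_index = {'a':0, 'b':1, 'c':2}. A caller-provided
    #   dtype = np.dtype([('c', float), ('a b', int, (2,))])
    # expands to names_dtype = ['c','a','b'], so below we build
    # usecols = [2,0,1]: output field 0 ('c') reads input column 2, and the
    # 'a b' sub-array reads input columns 0 and 1.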
Ncols_output = len(names_dtype) usecols = [None] * Ncols_output for i_out in range(Ncols_output): name_dtype = names_dtype[i_out] try: i_in = dict_key_index[name_dtype] except: raise Exception(f"The given dtype contains field {name_dtype=} but this doesn't appear in the vnlog columns {keys=}") usecols[i_out] = i_in return \ np.loadtxt(f, ndmin = 1, dtype = dtype, usecols = usecols) def slurp(f, *, dtype = None): r'''Reads a whole vnlog into memory SYNOPSIS import vnlog ### Read numerical data into arr arr,list_keys,dict_key_index = \ vnlog.slurp(filename_or_fileobject) ### Read disparate, partly-numerical data using a structured dtype # Let's say "data.vnl" contains: # image x y z temperature image1.png 1 2 5 34 image2.png 3 4 1 35 dtype = np.dtype([ ('image', 'U16'), ('x y z', int, (3,)), ('temperature', float), ]) arr = vnlog.slurp("data.vnl", dtype=dtype) print(arr['image']) ---> array(['image1.png', 'image2.png'], dtype=' array([[1, 2, 5], [3, 4, 1]]) print(arr['temperature']) ---> array([34., 35.]) This function is primarily a wrapper around numpy.loadtxt(), which does most of the work. Null data values ('-') are not supported at this time. A dtype can be given in a keyword argument. If this is a base type (something like 'float' or 'np.int8'), the returned array will be composed entirely of values of that type. If this is a structured dtype (like the one in the SYNOPSIS above), a structured array will be returned. Some notes about this behavior: - The given structured dtype defines both how to organize the data, and which data to extract. So it can be used to read in only a subset of the available columns. In the sample above I could have omitted the 'temperature' column, for instance - Sub-arrays are allowed. In the example I could say either dtype = np.dtype([ ('image', 'U16'), ('x y z', int, (3,)), ('temperature', float), ]) or dtype = np.dtype([ ('image', 'U16'), ('x', int), ('y', int), ('z', int), ('temperature', float), ]) The latter would read x,y,z into separate, individual arrays. Sometime we want this, sometimes not. - Nested structured dtypes are not allowed. Fields inside other fields are not supported, since it's not clear how to map that to a flat vnlog legend - If a structured dtype is given we return the array only, since the field names are already available in the dtype ARGUMENTS - f: a filename or a readable Python "file" object. We read this until the end - dtype: an optional dtype for the ouput array. May be a structured dtype RETURN VALUE - If no dtype is given or a simple dtype is given: Returns a tuple (arr, list_keys, dict_key_index) - If a structured dtype is given: Returns arr ''' if type(f) is str: with open(f, 'r') as fh: return _slurp(fh, dtype=dtype) else: return _slurp(f, dtype=dtype) # Basic usage. More examples in test_python_parser.py if __name__ == '__main__': try: from StringIO import StringIO except ImportError: from io import StringIO f = StringIO('''#! zxcv # time height ## qewr 1 2 3 4 # - 10 - 5 6 - - - 7 8 ''') for d in vnlog(f): print(d['time'],d['height']) vnlog-1.40/packaging/000077500000000000000000000000001475400722300145135ustar00rootroot00000000000000vnlog-1.40/packaging/README000066400000000000000000000004421475400722300153730ustar00rootroot00000000000000These are sample packaging files that can be used to build your own packages. At the time of this writing vnlog is already included in the latest releases of Debian and Ubuntu, so those packaging files aren't included here. Want this to be shipped in some other distro? 
Send them patches! vnlog-1.40/packaging/vnlog.spec000066400000000000000000000051511475400722300165160ustar00rootroot00000000000000Name: vnlog Version: 1.6 Release: 1%{?dist} Summary: Tools to manipulate whitespace-separated ASCII logs License: LGPL-2.1+ URL: https://github.com/dkogan/vnlog/ Source0: https://github.com/dkogan/vnlog/archive/%{version}.tar.gz#/%{name}-%{version}.tar.gz BuildRequires: python2-devel BuildRequires: perl-IPC-Run BuildRequires: perl-Text-Diff BuildRequires: perl-String-ShellQuote BuildRequires: perl-List-MoreUtils BuildRequires: mawk BuildRequires: make BuildRequires: chrpath BuildRequires: /usr/bin/pod2man BuildRequires: perl-autodie BuildRequires: perl-Data-Dumper BuildRequires: numpy %description We want to manipulate data logged in a very simple whitespace-separated ASCII format. The format in compatible with the usual UNIX tools, and thus can be processed with a multitude of existing methods. Some convenience tools and library interfaces are provided to create new data in this format and manipulate existing data %package devel Requires: %{name}%{_isa} = %{version}-%{release} Summary: Development files for vnlog Requires: perl-String-ShellQuote %description devel The library needed for the vnlog C interface and the vnl-gen-header tool needed to define the fields %package tools Requires: %{name}%{_isa} = %{version}-%{release} Summary: Tools for manipulating vnlogs Requires: mawk Requires: perl-Text-Table Requires: perl-List-MoreUtils Requires: perl-autodie %description tools Various helper tools to make working with vnlogs easier %prep %setup -q %build make %{?_smp_mflags} all doc %check make check %install %make_install mkdir -p %{buildroot}%{_datadir}/zsh/site-functions cp completions/zsh/* %{buildroot}%{_datadir}/zsh/site-functions mkdir -p %{buildroot}%{_datadir}/bash-completion/completions cp completions/bash/* %{buildroot}%{_datadir}/bash-completion/completions %clean make clean %files %doc %{_libdir}/*.so.* %doc %{_mandir}/man3/* %{_datadir}/perl5 %{python2_sitelib}/* %files devel %{_libdir}/*.so %{_includedir}/* %{_bindir}/vnl-gen-header %doc %{_mandir}/man1/vnl-gen-header.1.gz %files tools %{_bindir}/vnl-filter %{_bindir}/vnl-tail %{_bindir}/vnl-sort %{_bindir}/vnl-uniq %{_bindir}/vnl-join %{_bindir}/vnl-make-matrix %{_bindir}/vnl-align %{_bindir}/vnl-ts %doc %{_mandir}/man1/vnl-filter.1.gz %doc %{_mandir}/man1/vnl-tail.1.gz %doc %{_mandir}/man1/vnl-sort.1.gz %doc %{_mandir}/man1/vnl-uniq.1.gz %doc %{_mandir}/man1/vnl-join.1.gz %doc %{_mandir}/man1/vnl-make-matrix.1.gz %doc %{_mandir}/man1/vnl-align.1.gz %doc %{_mandir}/man1/vnl-ts.1.gz %{_datadir}/zsh/* %{_datadir}/bash-completion/* vnlog-1.40/test/000077500000000000000000000000001475400722300135465ustar00rootroot00000000000000vnlog-1.40/test/TestHelpers.pm000066400000000000000000000066171475400722300163600ustar00rootroot00000000000000package TestHelpers; use strict; use warnings; use feature ':5.10'; use Carp qw(cluck confess); use Fcntl qw(F_GETFD F_SETFD FD_CLOEXEC); use IPC::Run 'run'; use Text::Diff 'diff'; use FindBin '$Bin'; our $VERSION = 1.00; use base 'Exporter'; our @EXPORT_OK = qw(check test_init); my $tool; my $Nfailed_ref; my $testdata_dir; sub test_init { $tool = shift; $Nfailed_ref = shift; my %data = @_; $testdata_dir = "$Bin/testdata_$tool"; mkdir $testdata_dir if ! -d $testdata_dir; for my $key (keys %data) { my $filename = "${testdata_dir}/" . 
$key =~ s/^\$//r; open FD, '>', $filename or die "Couldn't open '$filename' for writing"; print FD $data{$key}; close FD; } } sub check { # arguments: # # - expected output. 'ERROR' means the invocation should fail # - arguments to the tool. # - if an arg is '$xxx', replace that arg with a pipe containing the data # in $xxx # - if an arg is '-$xxx', replace that arg with '-', pipe $xxx into STDIN # - if an arg is '--$xxx', remove the arg entirely, pipe $xxx into STDIN my ($expected, @args) = @_; my @pipes; my $in; for my $iarg(0..$#args) { if($args[$iarg] =~ /^\$/) { my $datafile = "${testdata_dir}/" . substr($args[$iarg], 1); $args[$iarg] = $datafile; } elsif($args[$iarg] =~ /^-\$/) { # I'm passing it data via stdin if(defined $in) { die "A test passed in more than one chunk of data on stdin"; } my $datafile = "${testdata_dir}/" . substr($args[$iarg], 2); $in = $datafile; $args[$iarg] = '-'; } elsif($args[$iarg] =~ /^--\$/) { # I'm passing it data via stdin if(defined $in) { die "A test passed in more than one chunk of data on stdin"; } my $datafile = "${testdata_dir}/" . substr($args[$iarg], 3); $in = $datafile; $args[$iarg] = undef; # mark the arg for removal } } # remove marked args @args = grep {defined $_} @args; my $out = ''; my $err = ''; $in //= \''; my @cmd = ("perl", "$Bin/../$tool", @args); my $result = run( \@cmd, '<', $in, '>', \$out, '2>', \$err ); if($expected ne 'ERROR') { if( !$result ) { cluck "Test failed. Expected success, but got failure.\n" . "Ran '@cmd'.\n" . "STDERR: '$err'"; $$Nfailed_ref++; } else { # I ignore differences in leading whitespace $expected =~ s/^\s*//gm; $out =~ s/^\s*//gm; my $diff = diff(\$expected, \$out); if ( length $diff ) { cluck "Test failed: diff mismatch.\n" . "Ran '@cmd'.\n" . "Diff: '$diff'"; $$Nfailed_ref++; } } } else { if( $result ) { cluck "Test failed. Expected failure, but got success.\n". "Ran '@cmd'.\n" . "STDERR: '$err'\n" . "STDOUT: '$err'"; $$Nfailed_ref++; } } for my $pipe(@pipes) { close $pipe; } } 1; vnlog-1.40/test/test-parser.c000066400000000000000000000023141475400722300161630ustar00rootroot00000000000000#include #include #include #include "../vnlog-parser.h" #define MSG(fmt, ...) \ fprintf(stderr, "%s:%d " fmt "\n", __FILE__, __LINE__, ##__VA_ARGS__) int main(int argc, char* argv[]) { if(argc != 3) { fprintf(stderr, "Usage: %s input.vnl query-key\n", argv[0]); return 1; } const char* filename = argv[1]; FILE* fp = (0 == strcmp(filename,"-")) ? 
stdin : fopen(filename, "r"); if(fp == NULL) { MSG("Couldn't open '%s'", filename); return 1; } const char* querykey = argv[2]; vnlog_parser_t ctx; if(VNL_OK != vnlog_parser_init(&ctx, fp)) return 1; const char*const* queryvalue = vnlog_parser_record_from_key(&ctx, querykey); vnlog_parser_result_t result; while(VNL_OK == (result = vnlog_parser_read_record(&ctx, fp))) { printf("======\n"); for(int i=0; i #include "vnlog_fields_generated1.h" void test2(void); int main() { vnlog_emit_legend(); vnlog_set_field_value__w(-10); vnlog_set_field_value__x(40); vnlog_set_field_value__y("asdf"); vnlog_emit_record(); vnlog_set_field_value__w(55); const char* str = "123\x01\x02\x03"; vnlog_set_field_value__d(str, strlen(str)); // we just set some fields in this record, and in the middle of filling this // record we write other records, and other vnlog sessions { struct vnlog_context_t ctx; vnlog_init_child_ctx(&ctx, NULL); // child of the global context for(int i=0; i<3; i++) { vnlog_set_field_value_ctx__w(&ctx, i + 5); vnlog_set_field_value_ctx__x(&ctx, i + 6); vnlog_emit_record_ctx(&ctx); } vnlog_free_ctx(&ctx); test2(); } // Now we resume the previous record. We still remember that w == 55 vnlog_set_field_value__x(77); vnlog_set_field_value__z(0.3); vnlog_emit_record(); return 0; } vnlog-1.40/test/test1.want000066400000000000000000000001411475400722300154750ustar00rootroot00000000000000# w x y z d -10 40 asdf - - 5 6 - - - 6 7 - - - 7 8 - - - 55 77 - 0.2999999999999999889 MTIzAQID vnlog-1.40/test/test2.c000066400000000000000000000007171475400722300147600ustar00rootroot00000000000000#include "vnlog_fields_generated2.h" void test2(void) { struct vnlog_context_t ctx; vnlog_init_session_ctx(&ctx); FILE* fp = fopen("test2.got", "w"); if(fp == NULL) return; vnlog_set_output_FILE(&ctx, fp); vnlog_emit_legend_ctx(&ctx); vnlog_set_field_value_ctx__a(&ctx, -3); vnlog_emit_record_ctx(&ctx); vnlog_set_field_value_ctx__b(&ctx, -4); vnlog_emit_record_ctx(&ctx); fclose(fp); vnlog_free_ctx(&ctx); } vnlog-1.40/test/test2.want000066400000000000000000000000261475400722300155000ustar00rootroot00000000000000# a b c -3 - - - -4 - vnlog-1.40/test/test_c_api.sh000077500000000000000000000061341475400722300162230ustar00rootroot00000000000000#!/bin/bash set -e cd `dirname $0` #### writer ./test1 > test1.got diff -q test1.want test1.got diff -q test2.want test2.got #### reader read -r -d '' ref_df <<'EOF' || true ====== time = 0 id = abc x = 1 y = 5 z = 3 query: df = NOT FOUND ====== time = 1 id = def x = 11 y = 25 z = 53 query: df = NOT FOUND EOF read -r -d '' ref_y <<'EOF' || true ====== time = 0 id = abc x = 1 y = 5 z = 3 query: y = 5 ====== time = 1 id = def x = 11 y = 25 z = 53 query: y = 25 EOF read -r -d '' ref_y_gap_first_row <<'EOF' || true ====== time = 0 id = abc x = 1 y = - z = - query: y = - ====== time = 1 id = def x = 11 y = 25 z = 53 query: y = 25 EOF read -r -d '' ref_y_gap_second_row <<'EOF' || true ====== time = 0 id = abc x = 1 y = 5 z = 3 query: y = 5 ====== time = 1 id = def x = 11 y = - z = - query: y = - EOF echo ' # time id x y z 0 abc 1 5 3 1 def 11 25 53 ' | ./test-parser - y 2>/dev/null > test-parser.got || { echo "LINE $LINENO: FAILED!"; exit 1; } diff -q test-parser.got <(echo "$ref_y") >&/dev/null || { echo "LINE $LINENO: mismatched output!"; exit 1; } echo ' # time id x y z # asdf err 0 abc 1 5 3 # 115 113 ## zxvvvv 1 def 11 25 53 ' | ./test-parser - y 2>/dev/null > test-parser.got || { echo "LINE $LINENO: FAILED!"; exit 1; } diff -q test-parser.got <(echo "$ref_y") >&/dev/null || { echo "LINE 
$LINENO: mismatched output!"; exit 1; } echo ' # time id x y z # asdf err 0 abc 1 - - # 115 113 1 def 11 25 53 # qwer ' | ./test-parser - y 2>/dev/null > test-parser.got || { echo "LINE $LINENO: FAILED!"; exit 1; } diff -q test-parser.got <(echo "$ref_y_gap_first_row") >&/dev/null || { echo "LINE $LINENO: mismatched output!"; exit 1; } echo ' # time id x y z # asdf err 0 abc 1 5 3 # 115 113 ## zxvvvv 1 def 11 - - ' | ./test-parser - y 2>/dev/null > test-parser.got || { echo "LINE $LINENO: FAILED!"; exit 1; } diff -q test-parser.got <(echo "$ref_y_gap_second_row") >&/dev/null || { echo "LINE $LINENO: mismatched output!"; exit 1; } echo ' ## adsf #! zxcv # # # # time id x y z 0 abc 1 5 3 1 def 11 25 53 ' | ./test-parser - df 2>/dev/null > test-parser.got || { echo "LINE $LINENO: FAILED!"; exit 1; } diff -q test-parser.got <(echo "$ref_df") >&/dev/null || { echo "LINE $LINENO: mismatched output!"; exit 1; } echo ' # # time id x y z # asdf err 0 abc 1 5 3 1 def 11 25 53 ' | ./test-parser - df 2>/dev/null > test-parser.got || { echo "LINE $LINENO: FAILED!"; exit 1; } diff -q test-parser.got <(echo "$ref_df") >&/dev/null || { echo "LINE $LINENO: mismatched output!"; exit 1; } ## And the expected failures echo ' # time id x y 0 abc 1 5 3 1 def 11 25 53 ' | ./test-parser - y >&/dev/null && { echo "LINE $LINENO: SHOULD HAVE FAILED!"; exit 1; } || true echo ' # time id x y z z 0 abc 1 5 3 1 def 11 25 53 ' | ./test-parser - y >&/dev/null && { echo "LINE $LINENO: SHOULD HAVE FAILED!"; exit 1; } || true echo ' ## time id x y z 0 abc 1 5 3 1 def 11 25 53 ' | ./test-parser - y >&/dev/null && { echo "LINE $LINENO: SHOULD HAVE FAILED!"; exit 1; } || true echo ' time id x y z 0 abc 1 5 3 1 def 11 25 53 ' | ./test-parser - y >&/dev/null && { echo "LINE $LINENO: SHOULD HAVE FAILED!"; exit 1; } || true vnlog-1.40/test/test_perl_parser.pl000077500000000000000000000020011475400722300174540ustar00rootroot00000000000000#!/usr/bin/env perl use strict; use warnings; use feature ':5.10'; use FindBin '$RealBin'; use lib "$RealBin/../lib"; use Vnlog::Parser; my $parser = Vnlog::Parser->new(); my $resultstring = ''; while () { if( !$parser->parse($_) ) { die "Error parsing vnlog line '$_': " . $parser->error(); } my $d = $parser->getValuesHash(); next unless %$d; use Data::Dumper; $resultstring .= Dumper [$d->{time},$d->{height}]; } my $ref = <<'EOF'; $VAR1 = [ '1', '2' ]; $VAR1 = [ '3', '4' ]; $VAR1 = [ undef, '5' ]; $VAR1 = [ '6', undef ]; $VAR1 = [ undef, undef ]; $VAR1 = [ '7', '8' ]; EOF if( $resultstring eq $ref) { say "Test passed"; exit 0; } say "Expected '$ref' but got '$resultstring'"; say "Test failed!"; exit 1; __DATA__ #! zxcv # #time height ## qewr # asdf 1 2 3 4 # - 10 - 5 6 - - - 7 8 vnlog-1.40/test/test_python_parser.py000077500000000000000000000144721475400722300200670ustar00rootroot00000000000000#!/usr/bin/env python3 r'''Tests the python parser This is intended to work with both python2 and python3. ''' from __future__ import print_function import os import sys import numpy as np sys.path[:0] = (os.path.abspath(os.path.dirname(sys.argv[0])) + "/../lib",) import vnlog try: from StringIO import StringIO except ImportError: from io import StringIO inputstring = '''#! 
zxcv # # # ## fdd #time height ## qewr 1 2 ## ff 3 4 # - 10 - 5 # abc 6 - - - 7 8 ''' ref = r'''1 2 3 4 None 5 6 None None None 7 8 ''' # Parsing manually f = StringIO(inputstring) parser = vnlog.vnlog() resultstring = '' for l in f: parser.parse(l) d = parser.values_dict() if not d: continue resultstring += '{} {}\n'.format(d['time'],d['height']) if resultstring != ref: print("Expected '{}' but got '{}'".format(ref, resultstring)) print("Test failed!") sys.exit(1) # Iterating f = StringIO(inputstring) resultstring = '' for d in vnlog.vnlog(f): resultstring += '{} {}\n'.format(d['time'],d['height']) if resultstring != ref: print("Expected '{}' but got '{}'".format(ref, resultstring)) print("Test failed!") sys.exit(1) # Slurping inputstring_noundef = r'''#! zxcv # time height ## qewr 1 2 3 4 #fff # - 10 7 8 ''' ref_noundef = np.array(((1,2),(3,4),(7,8))) f = StringIO(inputstring_noundef) arr,list_keys,dict_key_index = vnlog.slurp(f) if np.linalg.norm((ref_noundef - arr).ravel()) > 1e-8: raise Exception("Array mismatch: expected '{}' but got '{}". \ format(ref_noundef, arr)) if len(list_keys) != 2 or list_keys[0] != 'time' or list_keys[1] != 'height': raise Exception("Key mismatch: expected '{}' but got '{}". \ format(('time','height'), list_keys)) if len(dict_key_index) != 2 or dict_key_index['time'] != 0 or dict_key_index['height'] != 1: raise Exception("Key-dict mismatch: expected '{}' but got '{}". \ format({'time': 0, 'height': 1}, dict_key_index)) # Slurping with simple dtypes f = StringIO(inputstring_noundef) arr = vnlog.slurp(f, dtype=int)[0] if arr.dtype != int: raise Exception("Unexpected dtype") if np.linalg.norm((ref_noundef - arr).ravel()) > 1e-8: raise Exception("Array mismatch") f = StringIO(inputstring_noundef) arr = vnlog.slurp(f, dtype=float)[0] if arr.dtype != float: raise Exception("Unexpected dtype") if np.linalg.norm((ref_noundef - arr).ravel()) > 1e-8: raise Exception("Array mismatch") f = StringIO(inputstring_noundef) arr = vnlog.slurp(f, dtype=np.dtype(float))[0] if arr.dtype != float: raise Exception("Unexpected dtype") if np.linalg.norm((ref_noundef - arr).ravel()) > 1e-8: raise Exception("Array mismatch") # Slurping a single row should still produce a 2d result inputstring = ''' ## asdf # x name y name2 z 1 2 3 ''' f = StringIO(inputstring) arr = vnlog.slurp(f)[0] if arr.shape != (1,3): raise Exception("Unexpected shape") # Slurping with structured dtypes inputstring = ''' ## asdf # x name y name2 z 1 a 2 zz2 3 4 fbb 5 qq2 6 ''' ref = np.array(((1,2,3), (4,5,6),),) dtype = np.dtype([ ('name', 'U16'), ('x y z', int, (3,)), ('name2', 'U16'), ]) f = StringIO(inputstring) arr = vnlog.slurp(f, dtype=dtype) if arr.shape != (2,): raise Exception("Unexpected structured array outer shape") if arr['name' ].shape != (2,): raise Exception("Unexpected structured array inner shape") if arr['name2'].shape != (2,): raise Exception("Unexpected structured array inner shape") if arr['x y z'].shape != (2,3): raise Exception("Unexpected structured array inner shape") if arr['x y z'].dtype != int: raise Exception("Unexpected structured array inner dtype") if arr['name' ][0] != 'a': raise Exception("mismatch") if arr['name2'][1] != 'qq2': raise Exception("mismatch") if np.linalg.norm((ref - arr['x y z']).ravel()) > 1e-8: raise Exception("Array mismatch") # selecting a subset of the data ref = np.array(((1,3), (4,6),),) dtype = np.dtype([ ('name2', 'U16'), ('x z', int, (2,)) ]) f = StringIO(inputstring) arr = vnlog.slurp(f, dtype=dtype) if arr['x z'].shape != (2,2): raise 
Exception("Unexpected structured array inner shape") if arr['x z'].dtype != int: raise Exception("Unexpected structured array inner dtype") if arr['name2'][1] != 'qq2': raise Exception("mismatch") if np.linalg.norm((ref - arr['x z']).ravel()) > 1e-8: raise Exception("Array mismatch") dtype = np.dtype([ ('name', 'U16'), ('x yz', int, (3,)), ('name2', 'U16'), ]) f = StringIO(inputstring) try: arr = vnlog.slurp(f, dtype=dtype) except: pass else: raise Exception("Bad dtype wasn't flagged") dtype = np.dtype([ ('name', 'U16'), ('x yz', int, (2,)), ('name2', 'U16'), ]) f = StringIO(inputstring) try: arr = vnlog.slurp(f, dtype=dtype) except: pass else: raise Exception("Bad dtype wasn't flagged") dtype = np.dtype([ ('name', 'U16'), ('x y z w', int, (4,)), ('name2', 'U16'), ]) f = StringIO(inputstring) try: arr = vnlog.slurp(f, dtype=dtype) except: pass else: raise Exception("Bad dtype wasn't flagged") dtype = np.dtype([ ('name', 'U16'), ('x y z', int, (2,)), ('name2', 'U16'), ]) f = StringIO(inputstring) try: arr = vnlog.slurp(f, dtype=dtype) except: pass else: raise Exception("Bad dtype wasn't flagged") dtype = np.dtype([ ('name', 'U16'), ('x y z', int, (3,)), ('name 2', 'U16'), ]) f = StringIO(inputstring) try: arr = vnlog.slurp(f, dtype=dtype) except: pass else: raise Exception("Bad dtype wasn't flagged") # Slurping a single row with a structured dtype inputstring = ''' ## asdf # x name y name2 z 4 fbb 5 qq2 6 ''' dtype = np.dtype([ ('name', 'U16'), ('x y z', int, (3,)), ('name2', 'U16'), ]) f = StringIO(inputstring) arr = vnlog.slurp(f, dtype=dtype) if arr.shape != (1,): raise Exception("Unexpected structured array outer shape") if arr['name' ].shape != (1,): raise Exception("Unexpected structured array inner shape") if arr['name2'].shape != (1,): raise Exception("Unexpected structured array inner shape") if arr['x y z'].shape != (1,3): raise Exception("Unexpected structured array inner shape") print("Test passed") sys.exit(0); vnlog-1.40/test/test_vnl-filter.pl000077500000000000000000000505051475400722300172340ustar00rootroot00000000000000#!/usr/bin/env perl use strict; use warnings; use feature ':5.10'; use IPC::Run 'run'; use Text::Diff 'diff'; use Carp qw(cluck confess); use FindBin '$RealBin'; use Term::ANSIColor; my $Nfailed = 0; my $data_default = <<'EOF'; #!/bin/xxx # a b c 1 2 3 4 - 6 7 9 - 10 11 12 EOF check( <<'EOF', qw(-p s=b) ); #!/bin/xxx # s 2 9 11 EOF check( <<'EOF', qw(-l) ); a b c EOF check( <<'EOF', qw(-p s=b --noskipempty) ); #!/bin/xxx # s 2 - 9 11 EOF my $data_hasempty_hascomments = <<'EOF'; #!adsf # a b c 1 2 3 ## zcxv 4 - 6 7 9 - - - - EOF check( <<'EOF', qw(--noskipempty), {data => $data_hasempty_hascomments} ); #!adsf # a b c 1 2 3 ## zcxv 4 - 6 7 9 - - - - EOF check( <<'EOF', qw(--skipempty), {data => $data_hasempty_hascomments} ); #!adsf # a b c 1 2 3 ## zcxv 4 - 6 7 9 - EOF check( <<'EOF', qw(--noskipempty --skipcomments), {data => $data_hasempty_hascomments} ); # a b c 1 2 3 4 - 6 7 9 - - - - EOF check( <<'EOF', qw(--skipempty --skipcomments), {data => $data_hasempty_hascomments} ); # a b c 1 2 3 4 - 6 7 9 - EOF check( <<'EOF', '-p', 's=b,a' ); #!/bin/xxx # s a 2 1 - 4 9 7 11 10 EOF check( <<'EOF', '-p', 's=b,a', '--noskipempty'); #!/bin/xxx # s a 2 1 - 4 9 7 11 10 EOF check( <<'EOF', qw(-p s=a) ); #!/bin/xxx # s 1 4 7 10 EOF check( <<'EOF', qw(-p s=a+1) ); #!/bin/xxx # s 2 5 8 11 EOF # asking for bogus fields should never produce an empty-string result. Such a # thing would misalign the output fields. In awk I expect -. 
Perl just outputs # the bogus thing as a string; good-enough. --noskipempty had a different code # path, so I check it separately check( <<'EOF', '-p', 'a=b,b=xxx', {language => 'AWK'} ); #!/bin/xxx # a b 2 - 9 - 11 - EOF check( <<'EOF', '-p', 'a=b,b=xxx', '--noskipempty', {language => 'AWK'} ); #!/bin/xxx # a b 2 - - - 9 - 11 - EOF check( <<'EOF', '-p', 'a=b,b=xxx', {language => 'perl'} ); #!/bin/xxx # a b 2 xxx - xxx 9 xxx 11 xxx EOF check( <<'EOF', '-p', 'a=b,b=xxx', '--noskipempty', {language => 'perl'} ); #!/bin/xxx # a b 2 xxx - xxx 9 xxx 11 xxx EOF # And I really REALLY should never output empty strings. check( <<'EOF', '-p', 'a=b,b=""' ); #!/bin/xxx # a b 2 - 9 - 11 - EOF check( <<'EOF', '-p', 'a=b,b=""', '--noskipempty' ); #!/bin/xxx # a b 2 - - - 9 - 11 - EOF check( <<'EOF', qw(-p s=a+1) ); #!/bin/xxx # s 2 5 8 11 EOF check( <<'EOF', qw(-p .) ); #!/bin/xxx # a b c 1 2 3 4 - 6 7 9 - 10 11 12 EOF check( <<'EOF', '-p', 'a,b' ); #!/bin/xxx # a b 1 2 4 - 7 9 10 11 EOF check( <<'EOF', qw(-p a -p b) ); #!/bin/xxx # a b 1 2 4 - 7 9 10 11 EOF check( <<'EOF', qw(--print a --pick b) ); #!/bin/xxx # a b 1 2 4 - 7 9 10 11 EOF check( <<'EOF', qw( -p [ab]) ); #!/bin/xxx # a b 1 2 4 - 7 9 10 11 EOF check( <<'EOF', qw(--has a -p .) ); #!/bin/xxx # a b c 1 2 3 4 - 6 7 9 - 10 11 12 EOF check( <<'EOF', qw(--has b) ); #!/bin/xxx # a b c 1 2 3 7 9 - 10 11 12 EOF check( <<'EOF', qw(--has c -p .) ); #!/bin/xxx # a b c 1 2 3 4 - 6 10 11 12 EOF check( <<'EOF', '--has', 'b,c'); #!/bin/xxx # a b c 1 2 3 10 11 12 EOF check( <<'EOF', '--has', 'b,c'); #!/bin/xxx # a b c 1 2 3 10 11 12 EOF check( <<'EOF', qw(--has b --has c -p .) ); #!/bin/xxx # a b c 1 2 3 10 11 12 EOF check( <<'EOF', qw(--has b --has c) ); #!/bin/xxx # a b c 1 2 3 10 11 12 EOF check( <<'EOF', qw(--has b --has c -p a) ); #!/bin/xxx # a 1 10 EOF check( <<'EOF', qw(--has b -p), 'a,b'); #!/bin/xxx # a b 1 2 7 9 10 11 EOF check( <<'EOF', '-p', 'a,+b' ); #!/bin/xxx # a b 1 2 7 9 10 11 EOF check( <<'EOF', '-p', '.' ); #!/bin/xxx # a b c 1 2 3 4 - 6 7 9 - 10 11 12 EOF check( <<'EOF', '-p', 'a,[bx]' ); #!/bin/xxx # a b 1 2 4 - 7 9 10 11 EOF check( <<'EOF', '-p', 'a,+[bx]' ); #!/bin/xxx # a b 1 2 7 9 10 11 EOF check( <<'EOF', '-p', 'a', '--has', '[bx]' ); #!/bin/xxx # a 1 7 10 EOF check( <<'EOF', '-p', 'a,[bc]' ); #!/bin/xxx # a b c 1 2 3 4 - 6 7 9 - 10 11 12 EOF check( <<'EOF', '--sub-abs', '-p', 'x=abs(a-5)' ); #!/bin/xxx # x 4 1 2 5 EOF check( 'ERROR', '-p', 'a,+[bc]' ); check( 'ERROR', '-p', '+.' ); check( <<'EOF', qw(--BEGIN x=5 --END), 'print 100', qw(-p s=a+x), {language => 'AWK'} ); #!/bin/xxx # s 6 9 12 15 100 EOF check( <<'EOF', qw(--BEGIN $x=5 --END), 'say 100', qw(-p s=a+$x), {language => 'perl'} ); #!/bin/xxx # s 6 9 12 15 100 EOF check( <<'EOF', qw(-p d=rel(a) -p s=sum(a) -p pa=prev(a) -p b -p c -p pdb=latestdefined(b) --noskipempty)); #!/bin/xxx # d s pa b c pdb 0 1 - 2 3 2 3 5 1 - 6 2 6 12 4 9 - 9 9 22 7 11 12 11 EOF check( <<'EOF', qw(rel(a)>6 -p . 
-p d=rel(a) -p s=sum(a))); #!/bin/xxx # a b c d s 10 11 12 9 22 EOF check( <<'EOF', qw(-p d=rel(a) -p b -p c)); #!/bin/xxx # d b c 0 2 3 3 - 6 6 9 - 9 11 12 EOF check( <<'EOF', qw(-p r=rel(a) -p b -p d=diff(a) -p s=sum(a) -p c -p a)); #!/bin/xxx # r b d s c a 0 2 - 1 3 1 3 - 3 5 6 4 6 9 3 12 - 7 9 11 3 22 12 10 EOF check( <<'EOF', ['-p', 'r=rel(a),b,c'], [qw(-p r)]); #!/bin/xxx # r 0 3 6 9 EOF check( <<'EOF', ['-p', 'r=rel(a),b,c'], [qw(-p r=rel(r))]); #!/bin/xxx # r 0 3 6 9 EOF check( <<'EOF', ['-p', 'r=rel(a),b,c'], [qw(-p d=diff(r))]); #!/bin/xxx # d 3 3 3 EOF my $data_cubics = <<'EOF'; #!/bin/xxx # x 1 8 27 64 125 EOF check( <<'EOF', '-p', 'd1=diff(x),d2=diff(diff(x)),sd2=sum(diff(diff(x)))', {data => $data_cubics}); #!/bin/xxx # d1 d2 sd2 - - 0 7 7 7 19 12 19 37 18 37 61 24 61 EOF check( <<'EOF', '-p', 'sd=sum(diff(a))', '-p', 'ds=diff(sum(a))'); #!/bin/xxx # sd ds 0 - 3 4 6 7 9 10 EOF check( <<'EOF', qw(--has b -p [ab]) ); #!/bin/xxx # a b 1 2 7 9 10 11 EOF check( <<'EOF', ['--has', 'b', '-p', 'da=diff(a),db=diff(b)'], ['db>3'], {language => 'AWK'} ); #!/bin/xxx # da db 6 7 EOF check( <<'EOF', ['-p', 'a,r=rel(a)'], ['a<4'], {language => 'AWK'} ); #!/bin/xxx # a r 1 0 EOF check( <<'EOF', ['-p', 'a,r=rel(a)'], ['r<4'], {language => 'AWK'} ); #!/bin/xxx # a r 1 0 4 3 EOF check( <<'EOF', ['-p', 'r=rel(a),a'], ['a<4'], {language => 'AWK'} ); #!/bin/xxx # r a 0 1 EOF check( <<'EOF', ['-p', 'r=rel(a),a'], ['r<4'], {language => 'AWK'} ); #!/bin/xxx # r a 0 1 3 4 EOF check( <<'EOF', ['-p', 'r=rel(a),a'], ['--eval', '{print r}'], {language => 'AWK'} ); 0 3 6 9 EOF check( <<'EOF', ['-p', 'r=rel(a),a'], ['--eval', 'say r'], {language => 'perl'} ); 0 3 6 9 EOF # rel/diff and eval. Should work check( <<'EOF', qw(-p d=rel(a))); #!/bin/xxx # d 0 3 6 9 EOF check( <<"EOF", '--eval', "{print rel(a)}", {language => "AWK"}); 0 3 6 9 EOF check( <<"EOF", '--eval', "say rel(a)", {language => "perl"}); 0 3 6 9 EOF check( <<"EOF", '--eval', "{if(1) { print rel(a) }}", {language => "AWK"}); 0 3 6 9 EOF check( <<"EOF", '--eval', "{if(1) { \n print rel(a) }}", {language => "AWK"}); 0 3 6 9 EOF check( <<"EOF", '--eval', "say rel(a)", {language => "perl"}); 0 3 6 9 EOF check( <<"EOF", '--eval', "\n say rel(a)", {language => "perl"}); 0 3 6 9 EOF check( <<'EOF', 'a>5' ); #!/bin/xxx # a b c 7 9 - 10 11 12 EOF check( <<'EOF', 'a<9' ); #!/bin/xxx # a b c 1 2 3 4 - 6 7 9 - EOF check( <<'EOF', 'a>5 && a<9' ); #!/bin/xxx # a b c 7 9 - EOF check( <<'EOF', 'a>5', 'a<9' ); #!/bin/xxx # a b c 7 9 - EOF check( <<'EOF', qw(a>5 -p c) ); #!/bin/xxx # c 12 EOF check( <<'EOF', qw(a>5 --no-skipempty -p c) ); #!/bin/xxx # c - 12 EOF check( <<'EOF', 'a>5', '--eval', '{print a+b}', {language => 'AWK'} ); 16 21 EOF check( <<'EOF', 'a>5', '--function', 'func() { return a + b }', '-p', 'sum=func()', {language => 'AWK'} ); #!/bin/xxx # sum 16 21 EOF check( <<'EOF', 'a>5', '--function', 'func(x,y) { return x + y }', '-p', 'sum=func(a,b)', {language => 'AWK'} ); #!/bin/xxx # sum 16 21 EOF check( <<'EOF', 'a>5', '--function', 'func { return a + b }', '-p', 'sum=func()', {language => 'perl'} ); #!/bin/xxx # sum 16 21 EOF check( <<'EOF', 'a>5', '--function', 'func { my ($x,$y) = @_; return $x + $y }', '-p', 'sum=func(a,b)', {language => 'perl'} ); #!/bin/xxx # sum 16 21 EOF check( <<'EOF', 'a>5', '--eval', '{say a+b}', {language => 'perl'} ); 16 21 EOF check( <<'EOF', 'a>5', '--eval', 'my $v = a + b + 2; say $v', {language => 'perl'} ); 18 23 EOF my $data_specialchars = <<'EOF'; #!/bin/xxx # PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ 
COMMAND aaa=bbb ccc=ddd ccc=ddd 25946 dima 20 0 82132 23828 644 S 5.9 1.2 0:01.42 mailalert.pl 1 a b 27036 dima 20 0 1099844 37772 13600 S 5.9 1.9 1:29.57 mpv 2 a b 28648 dima 20 0 45292 3464 2812 R 5.9 0.2 0:00.02 top 3 a b 1 root 20 0 219992 4708 3088 S 0.0 0.2 1:04.41 systemd 4 a b EOF # special characters and trailing comments and leading and trailing whitespace # and empty lines and and empty comments and duplicated fields my $data_funky = <<'EOF'; ## test # # x y # z z - 1+ ## whoa bar 5 1 2 22 10 18 # comment ## comment bbb 4 7 8 88 11 2 EOF check(<<'EOF', '-p', 'M,aaa=bbb,aaa=USER,ccc', {data => $data_specialchars}); #!/bin/xxx # %MEM TIME+ COMMAND aaa=bbb aaa ccc=ddd ccc=ddd 1.2 0:01.42 mailalert.pl 1 dima a b 1.9 1:29.57 mpv 2 dima a b 0.2 0:00.02 top 3 dima a b 0.2 1:04.41 systemd 4 root a b EOF check('ERROR', '-p', 'x=ccc=ddd', {data => $data_specialchars}); check(<<'EOF', '-p', q{s=1 + %CPU,s2=%CPU + 2,s3=TIME+ + 1,s4=1 + TIME+}, {data => $data_specialchars}); #!/bin/xxx # s s2 s3 s4 6.9 7.9 1 1 6.9 7.9 2 2 6.9 7.9 1 1 1 2 2 2 EOF check(<<'EOF', '-p', '-,#,1+', {data => $data_funky}); ## test # # - # 1+ ## whoa 10 1 18 ## comment 11 7 2 EOF check(<<'EOF', '-p', 'x', {data => $data_funky}); ## test # # x ## whoa bar ## comment bbb EOF check(<<'EOF', '-p', 'z', {data => $data_funky}); ## test # # z z ## whoa 2 22 ## comment 8 88 EOF check(<<'EOF', '-p', 'x=1+ + 5', {data => $data_funky}); ## test # # x ## whoa 23 ## comment 7 EOF check('ERROR', '-p', 's=z+1', {data => $data_funky}); # A log with duplicated columns should generally behave normally, if we aren't # explicitly touching the duplicate columns my $data_int_dup = <<'EOF'; # c a c 2 1 a 4 - b 6 5 c EOF check(<<'EOF', qw(1), {data => $data_int_dup}); # c a c 2 1 a 4 - b 6 5 c EOF check(<<'EOF', qw(a==1), {data => $data_int_dup}); # c a c 2 1 a EOF check('ERROR', qw(c==1), {data => $data_int_dup}); check(<<'EOF', qw(-p a), {data => $data_int_dup}); # a 1 5 EOF check(<<'EOF', qw(-p a --noskipempty), {data => $data_int_dup}); # a 1 - 5 EOF check(<<'EOF', qw(-p .), {data => $data_int_dup}); # c a c 2 1 a 4 - b 6 5 c EOF check(<<'EOF', qw(-p c), {data => $data_int_dup}); # c c 2 a 4 b 6 c EOF check( <<'EOF', qw(-p b) ); #!/bin/xxx # b 2 9 11 EOF check( <<'EOF', qw(-p b*) ); #!/bin/xxx # b 2 9 11 EOF check('ERROR', qw(-p X*)); # exclusions check(<<'EOF', '--noskipempty', '-p', '.,!c', {data => $data_int_dup}); # a 1 - 5 EOF check(<<'EOF', '--noskipempty', '-p', '.,!a', {data => $data_int_dup}); # c c 2 a 4 b 6 c EOF check(<<'EOF', '--noskipempty', '-p', '.,!c*', {data => $data_int_dup}); # a 1 - 5 EOF check(<<'EOF', '--noskipempty', qw(-p !c), {data => $data_int_dup}); # a 1 - 5 EOF check(<<'EOF', '--noskipempty', qw(-p !a), {data => $data_int_dup}); # c c 2 a 4 b 6 c EOF check('ERROR', '--noskipempty', '-p', '.,!.*', {data => $data_int_dup}); # no cols left check('ERROR', '--noskipempty', '-p', '.,!xxxx', {data => $data_int_dup}); # col not found check(<<'EOF', '--noskipempty', '-p', 'a,b=a,z=a,![az]', {data => $data_int_dup}); # b 1 - 5 EOF ############### Context stuff: -A, -B, -C my $data_seq15 = <<'EOF'; #!/bin/xxx # x 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 EOF check( <<'EOF', ['-p', '.,x2=2*x', '-A', '1', '(x-3)%10 == 0'], {data => $data_seq15} ); #!/bin/xxx # x x2 3 6 4 8 ## 13 26 14 28 EOF check( <<'EOF', ['-p', '.,x2=2*x', '-A1', '(x-3)%10 == 0'], {data => $data_seq15} ); #!/bin/xxx # x x2 3 6 4 8 ## 13 26 14 28 EOF check( <<'EOF', ['-p', '.,x2=2*x', '-A8', '(x-3)%10 == 0'], {data => $data_seq15} ); #!/bin/xxx # x x2 3 6 
4 8 5 10 6 12 7 14 8 16 9 18 10 20 11 22 ## 13 26 14 28 15 30 EOF check( <<'EOF', ['-p', '.,x2=2*x', '-A9', '(x-3)%10 == 0'], {data => $data_seq15} ); #!/bin/xxx # x x2 3 6 4 8 5 10 6 12 7 14 8 16 9 18 10 20 11 22 12 24 13 26 14 28 15 30 EOF check( <<'EOF', ['-p', '.,x2=2*x', '-B1', '(x-3)%10 == 0'], {data => $data_seq15} ); #!/bin/xxx # x x2 2 4 3 6 ## 12 24 13 26 EOF check( <<'EOF', ['-p', '.,x2=2*x', '-B2', '(x-3)%10 == 0'], {data => $data_seq15} ); #!/bin/xxx # x x2 1 2 2 4 3 6 ## 11 22 12 24 13 26 EOF check( <<'EOF', ['-p', '.,x2=2*x', '-B3', '(x-3)%10 == 0'], {data => $data_seq15} ); #!/bin/xxx # x x2 1 2 2 4 3 6 ## 10 20 11 22 12 24 13 26 EOF check( <<'EOF', ['-p', '.,x2=2*x', '-B8', '(x-3)%10 == 0'], {data => $data_seq15} ); #!/bin/xxx # x x2 1 2 2 4 3 6 ## 5 10 6 12 7 14 8 16 9 18 10 20 11 22 12 24 13 26 EOF check( <<'EOF', ['-p', '.,x2=2*x', '-B9', '(x-3)%10 == 0'], {data => $data_seq15} ); #!/bin/xxx # x x2 1 2 2 4 3 6 4 8 5 10 6 12 7 14 8 16 9 18 10 20 11 22 12 24 13 26 EOF check( <<'EOF', ['-p', '.,x2=2*x', '-C1', '(x-3)%10 == 0'], {data => $data_seq15} ); #!/bin/xxx # x x2 2 4 3 6 4 8 ## 12 24 13 26 14 28 EOF check( <<'EOF', ['-p', '.,x2=2*x', '-C4', '(x-3)%10 == 0'], {data => $data_seq15} ); #!/bin/xxx # x x2 1 2 2 4 3 6 4 8 5 10 6 12 7 14 ## 9 18 10 20 11 22 12 24 13 26 14 28 15 30 EOF check( <<'EOF', ['-p', '.,x2=2*x', '-C5', '(x-3)%10 == 0'], {data => $data_seq15} ); #!/bin/xxx # x x2 1 2 2 4 3 6 4 8 5 10 6 12 7 14 8 16 9 18 10 20 11 22 12 24 13 26 14 28 15 30 EOF ########################################## # Testing context stuff (-A/-B/-C) together with diff() expressions. my $data_reldiff_context = <<'EOF'; #!/bin/xxx # a b c 1 2 3 4 - 6 4 - 6 4 - 6 4 - 6 7 9 - 10 11 12 EOF # Baselines: check( <<'EOF', ['-p', 'p=prev(b)', '--noskipempty'], {data => $data_reldiff_context}); #!/bin/xxx # p - 2 - - - - 9 EOF check( <<'EOF', ['-p', 'p=prev(b)'], {data => $data_reldiff_context}); #!/bin/xxx # p 2 9 EOF check( <<'EOF', ['-p', 'p=prev(b)', '--noskipempty', '--has','b'], {data => $data_reldiff_context}); #!/bin/xxx # p - 2 9 EOF check( <<'EOF', ['-p', 'p=prev(b)', '--noskipempty', 'a!=4'], {data => $data_reldiff_context}); #!/bin/xxx # p - - 9 EOF # The context business only kicks in with 'matches' expressions. I.e. 
any # records thrown out by --has or --skipempty are not output as context check( <<'EOF', ['-A1', '-p', 'p=prev(b)', '--noskipempty'], {data => $data_reldiff_context}); #!/bin/xxx # p - 2 - - - - 9 EOF check( <<'EOF', ['-A1', '-p', 'p=prev(b)'], {data => $data_reldiff_context}); #!/bin/xxx # p 2 9 EOF check( <<'EOF', ['-A1', '-p', 'p=prev(b)', '--noskipempty', '--has','b'], {data => $data_reldiff_context}); #!/bin/xxx # p - 2 9 EOF check( <<'EOF', ['-A1', '-p', 'p=prev(b)', '--noskipempty', 'a!=4'], {data => $data_reldiff_context}); #!/bin/xxx # p - 2 ## - 9 EOF check( <<'EOF', ['-B1', '-p', 'p=prev(b)', '--noskipempty'], {data => $data_reldiff_context}); #!/bin/xxx # p - 2 - - - - 9 EOF check( <<'EOF', ['-B1', '-p', 'p=prev(b)'], {data => $data_reldiff_context}); #!/bin/xxx # p 2 9 EOF check( <<'EOF', ['-B1', '-p', 'p=prev(b)', '--noskipempty', '--has','b'], {data => $data_reldiff_context}); #!/bin/xxx # p - 2 9 EOF check( <<'EOF', ['-B1', '-p', 'p=prev(b)', '--noskipempty', 'a!=4'], {data => $data_reldiff_context}); #!/bin/xxx # p - ## - - 9 EOF check( <<'EOF', ['-C1', '-p', 'p=prev(b)', '--noskipempty'], {data => $data_reldiff_context}); #!/bin/xxx # p - 2 - - - - 9 EOF check( <<'EOF', ['-C1', '-p', 'p=prev(b)'], {data => $data_reldiff_context}); #!/bin/xxx # p 2 9 EOF check( <<'EOF', ['-C1', '-p', 'p=prev(b)', '--noskipempty', '--has','b'], {data => $data_reldiff_context}); #!/bin/xxx # p - 2 9 EOF check( <<'EOF', ['-C1', '-p', 'p=prev(b)', '--noskipempty', 'a!=4'], {data => $data_reldiff_context}); #!/bin/xxx # p - 2 ## - - 9 EOF ########################################## # latestdefined() my $data_latestdefined = <<'EOF'; #!/bin/xxx # a b c 1 2 - 4 - 6 5 - 9 8 - 7 4 - 6 7 9 - 10 11 12 EOF # Baselines: check( <<'EOF', ['-p', 'p=latestdefined(b)', '--noskipempty'], {data => $data_latestdefined}); #!/bin/xxx # p 2 2 2 2 2 9 11 EOF check( <<'EOF', ['-p', 'p=latestdefined(c)', '--noskipempty'], {data => $data_latestdefined}); #!/bin/xxx # p - 6 9 7 6 6 12 EOF check( <<'EOF', ['-p', 'p=latestdefined(b)'], {data => $data_latestdefined}); #!/bin/xxx # p 2 2 2 2 2 9 11 EOF check( <<'EOF', ['-p', 'p=latestdefined(c)'], {data => $data_latestdefined}); #!/bin/xxx # p 6 9 7 6 6 12 EOF # check funny whitespace behavior my $data_funny_whitespace = <<'EOF'; # # # ## xxx # a b c ## yyy 1 2 3 3 4 5 EOF check( <<'EOF', qw(-p .), {data => $data_funny_whitespace}); # # # ## xxx # a b c ## yyy 1 2 3 3 4 5 EOF check( <<'EOF', qw(-p a), {data => $data_funny_whitespace}); # # # ## xxx # a ## yyy 1 3 EOF check( <<'EOF', qw(-p . 
--skipcomments), {data => $data_funny_whitespace}); # a b c 1 2 3 3 4 5 EOF check( <<'EOF', qw(--noskipempty), {data => $data_funny_whitespace}); # # # ## xxx # a b c ## yyy 1 2 3 3 4 5 EOF check( <<'EOF', qw(--noskipempty --skipcomments), {data => $data_funny_whitespace}); # a b c 1 2 3 3 4 5 EOF my $data_latlon = <<'EOF'; #!/bin/xxx # lat lon lat2 lon2 37.0597792247 -76.1703387355 37.0602752259 -76.1705049567 37.0598883299 -76.1703577868 37.0604772596 -76.1705748082 37.0599879749 -76.1703966222 37.0605833650 -76.1706010153 37.0600739448 -76.1704347187 37.0606881510 -76.1706390439 37.0601797672 -76.1704662408 37.0607908914 -76.1706712460 EOF # # awk and perl write out the data with different precisions, so I test them separately for now # check( <<'EOF', '-p', 'rel_n(lat),rel_e(lon),rel_n(lat2),rel_e(lon2)', {language => 'AWK', data => $data_latlon} ); # # rel_n(lat) rel_e(lon) rel_n(lat2) rel_e(lon2) # 0 0 55.1528 -14.7495 # 12.1319 -1.6905 77.6179 -20.9478 # 23.212 -5.13654 89.4163 -23.2732 # 32.7714 -8.51701 101.068 -26.6477 # 44.5383 -11.3141 112.492 -29.5051 # EOF # check( <<'EOF', '-p', 'rel_n(lat),rel_e(lon),rel_n(lat2),rel_e(lon2)', {language => 'perl', data => $data_latlon} ); # # rel_n(lat) rel_e(lon) rel_n(lat2) rel_e(lon2) # 0 0 55.1528170494324 -14.7495300237067 # 12.1319447101403 -1.69050470904804 77.6179395005555 -20.9477574245461 # 23.2119631755057 -5.13653865865383 89.4163216701965 -23.2732273896387 # 32.7713799003105 -8.51700679638622 101.06799325373 -26.6476704659731 # 44.5382939050826 -11.3140998289326 112.492204495462 -29.505102855998 # EOF # check with column names 0,1,2,3... vnl-filter had a bug where these confused # things my $data_simple_colnames = <<'EOF'; # aaa 0 1 2 3 bbb 1 2 3 4 5 6 11 12 13 14 15 16 EOF check( <<'EOF', "-p", "aaa,bbb", {data => $data_simple_colnames}); # aaa bbb 1 6 11 16 EOF if($Nfailed == 0 ) { say colored(["green"], "All tests passed!"); exit 0; } else { say colored(["red"], "$Nfailed tests failed!"); exit 1; } sub check { # I check stuff twice. Once with perl processing, and again with awk # processing my ($expected, @args) = @_; my @langs; my $data; if(ref($args[-1]) && ref($args[-1]) eq 'HASH' ) { my $opts = pop @args; if($opts->{language}) { push @langs, ($opts->{language} =~ /perl/i ? 1 : 0); } if($opts->{data}) { $data = $opts->{data}; } } if( !@langs ) { @langs = (0,1); } $data //= $data_default; LANGUAGE: for my $doperl (@langs) { # if the arguments are a list of strings, these are simply the args to a # filter run. If the're a list of list-refs, then we run the filter # multiple times, with the arguments in each list-ref if ( !ref $args[0] ) { my @args2 = @args; @args = (\@args2); } # @args is now a list-ref. Each element is a filter operation my $in = $data; my $out; my $err; for my $arg (@args) { my @args_here = @$arg; push @args_here, '--perl' if $doperl; $out = ''; my $result = run( ["perl", "$RealBin/../vnl-filter", @args_here], \$in, \$out, \$err ); $in = $out; if($expected ne 'ERROR' && !$result) { cluck "Test failed. Expected success, but got failure"; $Nfailed++; next LANGUAGE; } if($expected eq 'ERROR' && $result) { cluck "Test failed. 
Expected failure, but got success"; $Nfailed++; next LANGUAGE; } if($expected eq 'ERROR' && !$result) { # successful failure next LANGUAGE; } } my $diff = diff(\$expected, \$out); if ( length $diff ) { cluck "Test failed when doperl=$doperl; diff: '$diff'"; $Nfailed++; } } } vnlog-1.40/test/test_vnl-join.pl000077500000000000000000000240701475400722300167040ustar00rootroot00000000000000#!/usr/bin/env perl use strict; use warnings; use feature ':5.10'; use FindBin '$RealBin'; use lib $RealBin; use IPC::Run 'run'; use TestHelpers qw(test_init check); use Term::ANSIColor; my $Nfailed = 0; # I try to detect the join flavor. Not doing FEATURE detection here, because I # want to test the feature my $in = ''; my $out = ''; my $err = ''; my $have_fancy_join; if(run(['join', '--version'], \$in, \$out, \$err)) { # success if($out =~ /GNU/) { $have_fancy_join = 1; say "Detected GNU join. Running a full test of vnl-join"; } else { die "I don't know which 'join' this is. 'join --version' succeeed, but didn't say it was 'GNU' join"; } } else { $have_fancy_join = 0; say "Detected non-GNU join ('join --version' failed): Running a limited test of vnl-join"; } my $data1 = <<'EOF'; #!/bin/xxx ## asdf # a b e ## asdf 1a 22b 9 # asdf 5a 32b 10 ## zxcv 6a 42b 11 EOF # Look at all the funny whitespace! Leading whitespace shouldn't matter. # Whitespace-only lines are comments. Single-# line with ONLY whitespace is a # hard-comment, NOT a legend my $data22 = <<'EOF'; #!/bin/xxx ## zxcv ## zxcv # adsf ## vvv # #b c d e ## uuu ## d # xx 22b 1c 5d 8 # asdf 32b 5c 6d 9 ## zxcv 52b 6c 7d 10 EOF my $data3 = <<'EOF'; # b f 22b 18 32b 29 52b 30 62b 11 EOF my $data_int = <<'EOF'; # a b 1 a 2 b 3 c EOF my $data_int_dup = <<'EOF'; # a c c 1 10 A 2 11 B 3 12 C EOF my $data_empty1 = <<'EOF'; # a c d EOF my $data_empty2 = <<'EOF'; # cc dd a EOF test_init('vnl-join', \$Nfailed, '$data1' => $data1, '$data22' => $data22, '$data3' => $data3, '$data_int' => $data_int, '$data_int_dup'=> $data_int_dup, '$data_empty1' => $data_empty1, '$data_empty2' => $data_empty2); check( 'ERROR', (), '$data1', '$data22' ); check( 'ERROR', qw(-e x), '$data1', '$data22' ); check( 'ERROR', qw(-t x), '$data1', '$data22' ); check( 'ERROR', qw(-1 x), '$data1', '$data22' ); check( 'ERROR', qw(-2 x), '$data1', '$data22' ); check( 'ERROR', qw(-1 x -2 x), '$data1', '$data22' ); check( 'ERROR', qw(-1 b -2 d), '$data1', '$data22' ); check( 'ERROR', qw(-j x), '$data1', '$data22' ); check( 'ERROR', qw(-1 x -j x), '$data1', '$data22' ); check( 'ERROR', qw(-1 b -2 b -j b), '$data1', '$data22' ); check( <<'EOF', qw(-1 b -2 b), '$data1', '$data22' ); # b a e c d e 22b 1a 9 1c 5d 8 32b 5a 10 5c 6d 9 EOF check( <<'EOF', qw(-1 b -2 b --vnl-prefix1 aaa --vnl-suffix2 bbb), '$data1', '$data22' ); # b aaaa aaae cbbb dbbb ebbb 22b 1a 9 1c 5d 8 32b 5a 10 5c 6d 9 EOF check( <<'EOF', qw(-1 b -2 b -o), '0,1.a,2.e', qw( --vnl-prefix1 aaa --vnl-suffix2 bbb), '$data1', '$data22' ); # b aaaa ebbb 22b 1a 8 32b 5a 9 EOF check( 'ERROR', qw(-1 b -2 b --vnl-autoprefix --vnl-prefix1 aaa --vnl-suffix2 bbb), '$data1', '$data22' ); check( <<'EOF', qw(-1 b -2 b --vnl-autoprefix), '$data1', '$data22' ); # b 1_a 1_e 22_c 22_d 22_e 22b 1a 9 1c 5d 8 32b 5a 10 5c 6d 9 EOF check( <<'EOF', qw(-1 b -2 b --vnl-autosuffix), '$data1', '$data22' ); # b a_1 e_1 c_22 d_22 e_22 22b 1a 9 1c 5d 8 32b 5a 10 5c 6d 9 EOF check( 'ERROR', qw(-1 b -2 b --vnl-autosuffix), '$data1', '-$data22' ); check( <<'EOF', qw(-1b -2b), '$data1', '$data22' ); # b a e c d e 22b 1a 9 1c 5d 8 32b 5a 10 5c 6d 9 EOF check( <<'EOF', 
qw(-j b), '$data1', '$data22' ); # b a e c d e 22b 1a 9 1c 5d 8 32b 5a 10 5c 6d 9 EOF check( <<'EOF', qw(-j b), '$data1', '-$data22' ); # b a e c d e 22b 1a 9 1c 5d 8 32b 5a 10 5c 6d 9 EOF check( <<'EOF', qw(-j b), '-$data1', '$data22' ); # b a e c d e 22b 1a 9 1c 5d 8 32b 5a 10 5c 6d 9 EOF check( <<'EOF', qw(-j b), '-$data1', '$data22' ); # b a e c d e 22b 1a 9 1c 5d 8 32b 5a 10 5c 6d 9 EOF check( <<'EOF', qw(-jb -a1), '-$data1', '$data22' ); # b a e c d e 22b 1a 9 1c 5d 8 32b 5a 10 5c 6d 9 42b 6a 11 - - - EOF check( <<'EOF', qw(-jb -a2), '-$data1', '$data22' ); # b a e c d e 22b 1a 9 1c 5d 8 32b 5a 10 5c 6d 9 52b - - 6c 7d 10 EOF check( <<'EOF', qw(-jb -a2 -a1), '$data1', '$data22' ); # b a e c d e 22b 1a 9 1c 5d 8 32b 5a 10 5c 6d 9 42b 6a 11 - - - 52b - - 6c 7d 10 EOF check( <<'EOF', qw(-jb -a-), '$data1', '$data22' ); # b a e c d e 22b 1a 9 1c 5d 8 32b 5a 10 5c 6d 9 42b 6a 11 - - - 52b - - 6c 7d 10 EOF check( <<'EOF', qw(-jb -a1 -a2), '$data1', '$data22' ); # b a e c d e 22b 1a 9 1c 5d 8 32b 5a 10 5c 6d 9 42b 6a 11 - - - 52b - - 6c 7d 10 EOF check( <<'EOF', qw(-jb -v1), '$data1', '$data22' ); # b a e c d e 42b 6a 11 - - - EOF check( <<'EOF', qw(-jb -v2), '-$data1', '$data22' ); # b a e c d e 52b - - 6c 7d 10 EOF check( <<'EOF', qw(-jb -v1 -v2), '$data1', '$data22' ); # b a e c d e 42b 6a 11 - - - 52b - - 6c 7d 10 EOF check( <<'EOF', qw(-jb -v-), '$data1', '$data22' ); # b a e c d e 42b 6a 11 - - - 52b - - 6c 7d 10 EOF check( <<'EOF', qw(-jb -o), '1.a,0,2.d,1.e,0,2.e', '$data1', '$data22' ); # a b d e b e 1a 22b 5d 9 22b 8 5a 32b 6d 10 32b 9 EOF check( 'ERROR', qw(-jb -o), '9.a,0,2.d,1.e,0,2.e', '$data1', '$data22' ); check( 'ERROR', qw(-jb -o), '1.ax,0,2.d,1.e,0,2.e', '$data1', '$data22' ); check( 'ERROR', qw(-jb -o), '1.a,9,2.d,1.e,0,2.e', '$data1', '$data22' ); # these keys are sorted numerically, not lexicographically if($have_fancy_join) { check( 'ERROR', qw(-j e), '$data1', '$data22' ); } check( <<'EOF', qw(-j e --vnl-sort - --vnl-suffix1 1), '$data1', '$data22' ); # e a1 b1 b c d 10 5a 32b 52b 6c 7d 9 1a 22b 32b 5c 6d EOF check( <<'EOF', qw(-j e --vnl-sort n --vnl-suffix1 1), '$data1', '$data22' ); # e a1 b1 b c d 9 1a 22b 32b 5c 6d 10 5a 32b 52b 6c 7d EOF # Now make sure irrelevant dups don't break me check( <<'EOF', qw(-j a), '$data_int', '$data_int_dup' ); # a b c c 1 a 10 A 2 b 11 B 3 c 12 C EOF # But that relevant dups do check( 'ERROR', qw(-j c), '$data_int', '$data_int_dup' ); # 3-way joins check( <<'EOF', qw(-jb), '$data1', '$data22', '$data3'); # b a e c d e f 22b 1a 9 1c 5d 8 18 32b 5a 10 5c 6d 9 29 EOF check( 'ERROR', qw(-1b), '$data1', '$data22', '$data3'); # I check -a- with ALL ordering of passed-in data check( <<'EOF', qw(-jb -a-), '$data1', '$data22', '$data3'); # b a e c d e f 22b 1a 9 1c 5d 8 18 32b 5a 10 5c 6d 9 29 42b 6a 11 - - - - 52b - - 6c 7d 10 30 62b - - - - - 11 EOF check( <<'EOF', qw(-jb -a-), '$data1', '$data3', '$data22'); # b a e f c d e 22b 1a 9 18 1c 5d 8 32b 5a 10 29 5c 6d 9 42b 6a 11 - - - - 52b - - 30 6c 7d 10 62b - - 11 - - - EOF check( <<'EOF', qw(-jb -a-), '$data22', '$data1', '$data3'); # b c d e a e f 22b 1c 5d 8 1a 9 18 32b 5c 6d 9 5a 10 29 42b - - - 6a 11 - 52b 6c 7d 10 - - 30 62b - - - - - 11 EOF check( <<'EOF', qw(-jb -a-), '$data22', '$data3', '$data1'); # b c d e f a e 22b 1c 5d 8 18 1a 9 32b 5c 6d 9 29 5a 10 42b - - - - 6a 11 52b 6c 7d 10 30 - - 62b - - - 11 - - EOF check( <<'EOF', qw(-jb -a-), '$data3', '$data1', '$data22'); # b f a e c d e 22b 18 1a 9 1c 5d 8 32b 29 5a 10 5c 6d 9 42b - 6a 11 - - - 52b 30 - - 6c 7d 10 62b 11 - - - - 
- EOF check( <<'EOF', qw(-jb -a-), '$data3', '$data22', '$data1'); # b f c d e a e 22b 18 1c 5d 8 1a 9 32b 29 5c 6d 9 5a 10 42b - - - - 6a 11 52b 30 6c 7d 10 - - 62b 11 - - - - - EOF # 3-way -o. Generally unsupported check( 'ERROR', '-jb', '-o', '1.a,0,3.f,2.c,3.b,1.b,1.e,2.e', '$data1', '$data22', '$data3'); if($have_fancy_join) { check( <<'EOF', qw(-jb -o auto), '$data1', '$data22', '$data3'); # b a e c d e f 22b 1a 9 1c 5d 8 18 32b 5a 10 5c 6d 9 29 EOF } # 3-way prefix/suffix check( <<'EOF', qw(-jb --vnl-prefix1 a_ --vnl-suffix2 _c), '$data1', '$data22', '$data3'); # b a_a a_e c_c d_c e_c f 22b 1a 9 1c 5d 8 18 32b 5a 10 5c 6d 9 29 EOF check( <<'EOF', qw(-jb --vnl-autoprefix), '$data1', '$data22', '$data3'); # b 1_a 1_e 22_c 22_d 22_e 3_f 22b 1a 9 1c 5d 8 18 32b 5a 10 5c 6d 9 29 EOF check( <<'EOF', qw(-jb --vnl-autosuffix), '$data1', '$data22', '$data3'); # b a_1 e_1 c_22 d_22 e_22 f_3 22b 1a 9 1c 5d 8 18 32b 5a 10 5c 6d 9 29 EOF check( <<'EOF', qw(-jb --vnl-prefix a_ --vnl-suffix), ',,_c', '$data1', '$data22', '$data3'); # b a_a a_e c d e f_c 22b 1a 9 1c 5d 8 18 32b 5a 10 5c 6d 9 29 EOF check( <<'EOF', qw(-jb --vnl-prefix), 'a_,,c_', '$data1', '$data22', '$data3'); # b a_a a_e c d e c_f 22b 1a 9 1c 5d 8 18 32b 5a 10 5c 6d 9 29 EOF check( 'ERROR', qw(-jb --vnl-prefix), 'a_,,c_', qw(--vnl-prefix1 f), '$data1', '$data22', '$data3'); check( 'ERROR', qw(-jb --vnl-prefix), 'a_,,c_', qw(--vnl-autoprefix f), '$data1', '$data22', '$data3'); # 3-way pre-sorting/post-sorting # Again, I check ALL the orderings of passed-in data check( <<'EOF', qw(-jb --vnl-sort=r -a-), '$data1', '$data22', '$data3'); # b a e c d e f 62b - - - - - 11 52b - - 6c 7d 10 30 42b 6a 11 - - - - 32b 5a 10 5c 6d 9 29 22b 1a 9 1c 5d 8 18 EOF check( <<'EOF', qw(-jb --vnl-sort=r -a-), '$data1', '$data3', '$data22'); # b a e f c d e 62b - - 11 - - - 52b - - 30 6c 7d 10 42b 6a 11 - - - - 32b 5a 10 29 5c 6d 9 22b 1a 9 18 1c 5d 8 EOF check( <<'EOF', qw(-jb --vnl-sort=r -a-), '$data22', '$data1', '$data3'); # b c d e a e f 62b - - - - - 11 52b 6c 7d 10 - - 30 42b - - - 6a 11 - 32b 5c 6d 9 5a 10 29 22b 1c 5d 8 1a 9 18 EOF check( <<'EOF', qw(-jb --vnl-sort=r -a-), '$data22', '$data3', '$data1'); # b c d e f a e 62b - - - 11 - - 52b 6c 7d 10 30 - - 42b - - - - 6a 11 32b 5c 6d 9 29 5a 10 22b 1c 5d 8 18 1a 9 EOF check( <<'EOF', qw(-jb --vnl-sort=r -a-), '$data3', '$data1', '$data22'); # b f a e c d e 62b 11 - - - - - 52b 30 - - 6c 7d 10 42b - 6a 11 - - - 32b 29 5a 10 5c 6d 9 22b 18 1a 9 1c 5d 8 EOF check( <<'EOF', qw(-jb --vnl-sort=r -a-), '$data3', '$data22', '$data1'); # b f c d e a e 62b 11 - - - - - 52b 30 6c 7d 10 - - 42b - - - - 6a 11 32b 29 5c 6d 9 5a 10 22b 18 1c 5d 8 1a 9 EOF check( <<'EOF', qw(-ja -a-), '$data1', '$data_empty1'); # a b e c d 1a 22b 9 - - 5a 32b 10 - - 6a 42b 11 - - EOF check( <<'EOF', qw(-ja -a-), '$data1', '$data_empty2'); # a b e cc dd 1a 22b 9 - - 5a 32b 10 - - 6a 42b 11 - - EOF check( <<'EOF', qw(-ja -a-), '$data_empty1', '$data1'); # a c d b e 1a - - 22b 9 5a - - 32b 10 6a - - 42b 11 EOF check( <<'EOF', qw(-ja -a-), '$data_empty1', '$data1', '$data_empty2'); # a c d b e cc dd 1a - - 22b 9 - - 5a - - 32b 10 - - 6a - - 42b 11 - - EOF if($Nfailed == 0 ) { say colored(["green"], "All tests passed!"); exit 0; } else { say colored(["red"], "$Nfailed tests failed!"); exit 1; } vnlog-1.40/test/test_vnl-sort.pl000077500000000000000000000100061475400722300167260ustar00rootroot00000000000000#!/usr/bin/env perl use strict; use warnings; use feature ':5.10'; use FindBin '$RealBin'; use lib $RealBin; use TestHelpers 
qw(test_init check); use Term::ANSIColor; my $Nfailed = 0; my $data1 = <<'EOF'; #!/bin/xxx ## xxx # a b 1 1.69 ## asdf # 1234 20 0.09# xxx 3 0.49 # yyy 4 2.89 ## zzz 5 7.29## zzz EOF my $data2 = <<'EOF'; #!/bin/xxx ## zzz ## # #a b ## yyy # adsf # 345 9 -2 8 -4 7 -6 6 -8 5 -10 EOF my $data_not_ab = <<'EOF'; #!/bin/xxx # a b c 1 2 3 4 5 6 EOF my $data3 = <<'EOF'; #!/bin/xxx # a b c d 4 150 156 3 211 24 3 231 4 150 156 23 211 24 2 231 32 150 156 3 111 24 3 231 EOF my $data_int_dup = <<'EOF'; # a c c 1 10 A 2 11 B 3 12 C EOF # special characters and trailing comments and leading and trailing whitespace # and empty lines and and empty comments and duplicated fields my $data_funky = <<'EOF'; ## test # # x y # z z - 1+ ## whoa bar 5 1 2 22 10 18 # comment ## comment bbb 4 7 8 88 11 2 EOF test_init('vnl-sort', \$Nfailed, '$data1' => $data1, '$data2' => $data2, '$data3' => $data3, '$data_int_dup'=> $data_int_dup, '$data_not_ab' => $data_not_ab, '$data_funky' => $data_funky); check( <<'EOF', qw(-k a), '$data1', '$data2' ); # a b 1 1.69 20 0.09 3 0.49 4 2.89 5 -10 5 7.29 6 -8 7 -6 8 -4 9 -2 EOF check( <<'EOF', qw(-k a), '$data2', '$data1' ); # a b 1 1.69 20 0.09 3 0.49 4 2.89 5 -10 5 7.29 6 -8 7 -6 8 -4 9 -2 EOF check( <<'EOF', qw(-k a), '-$data1', '$data2' ); # a b 1 1.69 20 0.09 3 0.49 4 2.89 5 -10 5 7.29 6 -8 7 -6 8 -4 9 -2 EOF check( <<'EOF', qw(-k a), '$data1', '-$data2' ); # a b 1 1.69 20 0.09 3 0.49 4 2.89 5 -10 5 7.29 6 -8 7 -6 8 -4 9 -2 EOF check( <<'EOF', qw(-k a), '-$data2' ); # a b 5 -10 6 -8 7 -6 8 -4 9 -2 EOF check( <<'EOF', qw(-k a), '--$data2' ); # a b 5 -10 6 -8 7 -6 8 -4 9 -2 EOF check( <<'EOF', qw(-k b), '--$data2' ); # a b 5 -10 9 -2 8 -4 7 -6 6 -8 EOF check( <<'EOF', qw(-n -k b), '--$data2' ); # a b 5 -10 6 -8 7 -6 8 -4 9 -2 EOF check( <<'EOF', qw(-n -k a), '$data1', '$data2' ); # a b 1 1.69 3 0.49 4 2.89 5 -10 5 7.29 6 -8 7 -6 8 -4 9 -2 20 0.09 EOF check( <<'EOF', qw(-n -k b), '$data1', '$data2' ); # a b 5 -10 6 -8 7 -6 8 -4 9 -2 20 0.09 3 0.49 1 1.69 4 2.89 5 7.29 EOF check( <<'EOF', qw(-n --key b), '$data1', '$data2' ); # a b 5 -10 6 -8 7 -6 8 -4 9 -2 20 0.09 3 0.49 1 1.69 4 2.89 5 7.29 EOF check( <<'EOF', qw(-n --key=b), '$data1', '$data2' ); # a b 5 -10 6 -8 7 -6 8 -4 9 -2 20 0.09 3 0.49 1 1.69 4 2.89 5 7.29 EOF # don't have this field check( 'ERROR', qw(-k x), '$data1', '-$data2' ); # inconsistent fields check( 'ERROR', qw(-k a), '$data1', '-$data2', '$data_not_ab' ); # unsupported options check( 'ERROR', qw(-t f), '$data1' ); check( 'ERROR', qw(-z), '$data1' ); check( 'ERROR', qw(-o xxx), '$data1' ); ################ fancy key-ing # Sort numerically on each field. Front one most significant check( <<'EOF', '-k', 'a.n', '-k', 'b.n', '-k', 'c.n', '-k', 'd.n', '$data3' ); # a b c d 4 150 156 3 4 150 156 23 32 150 156 3 111 24 3 231 211 24 2 231 211 24 3 231 EOF # Sort numerically on each field. Last one most significant check( <<'EOF', '-k', 'd.n', '-k', 'c.n', '-k', 'b.n', '-k', 'a.n', '$data3' ); # a b c d 4 150 156 3 32 150 156 3 4 150 156 23 211 24 2 231 111 24 3 231 211 24 3 231 EOF # Sort numerically on each field, except the last. 
First one most significant check( <<'EOF', '-k', 'a.n', '-k', 'b.n', '-k', 'c.n', '-k', 'd', '$data3' ); # a b c d 4 150 156 23 4 150 156 3 32 150 156 3 111 24 3 231 211 24 2 231 211 24 3 231 EOF # Now make sure irrelevant dups don't break me check( <<'EOF', qw(-k a), '$data_int_dup' ); # a c c 1 10 A 2 11 B 3 12 C EOF # But that relevant dups do check( 'ERROR', qw(-k c), '$data_int_dup' ); # funky data works check( <<'EOF', '-nk', '1+', '$data_funky' ); # x y # z z - 1+ bbb 4 7 8 88 11 2 bar 5 1 2 22 10 18 EOF check( 'ERROR', qw(-nk z), '$data_funky' ); if($Nfailed == 0 ) { say colored(["green"], "All tests passed!"); exit 0; } else { say colored(["red"], "$Nfailed tests failed!"); exit 1; } vnlog-1.40/test/test_vnl-uniq.pl000077500000000000000000000113171475400722300167210ustar00rootroot00000000000000#!/usr/bin/env perl use strict; use warnings; use feature ':5.10'; use FindBin '$RealBin'; use lib $RealBin; use IPC::Run 'run'; use TestHelpers qw(test_init check); use Term::ANSIColor; my $Nfailed = 0; # I try to detect the uniq flavor. Not doing FEATURE detection here, because I # want to test the feature my $in = ''; my $out = ''; my $err = ''; my $have_fancy_uniq; if(run(['uniq', '--version'], \$in, \$out, \$err)) { # success if($out =~ /GNU/) { $have_fancy_uniq = 1; say "Detected GNU uniq. Running a full test of vnl-uniq"; } else { die "I don't know which 'uniq' this is. 'uniq --version' succeeed, but didn't say it was 'GNU' uniq"; } } else { $have_fancy_uniq = 0; say "Detected non-GNU uniq ('uniq --version' failed): Running a limited test of vnl-uniq"; } my $data1 = <<'EOF'; #!/bin/xxx # x y 1 1 2 2 # asdf 2 2 3 3 10 1 11 1 12 1 20 2 21 2 # asdf 22 2 30 3 31 3 32 3 33 3 40 4 EOF test_init('vnl-uniq', \$Nfailed, '$data1' => $data1); # Basic run. I ignore the duplicate line "2 2" check( <<'EOF', qw(), '$data1' ); # x y 1 1 2 2 3 3 10 1 11 1 12 1 20 2 21 2 22 2 30 3 31 3 32 3 33 3 40 4 EOF # Same thing, but make sure I can read STDIN with '-' as an arg and that I can # read STDIN with no arg at all check( <<'EOF', qw(), '-$data1' ); # x y 1 1 2 2 3 3 10 1 11 1 12 1 20 2 21 2 22 2 30 3 31 3 32 3 33 3 40 4 EOF check( <<'EOF', qw(), '--$data1' ); # x y 1 1 2 2 3 3 10 1 11 1 12 1 20 2 21 2 22 2 30 3 31 3 32 3 33 3 40 4 EOF check( <<'EOF', qw(-u), '--$data1' ); # x y 1 1 3 3 10 1 11 1 12 1 20 2 21 2 22 2 30 3 31 3 32 3 33 3 40 4 EOF # I don't support this option check( 'ERROR', qw(-z), '$data1' ); # only print the one duplicate line check( <<'EOF', qw(-d), '$data1' ); # x y 2 2 EOF if($have_fancy_uniq) { # print the duplicate lines, but don't suppress their duplicate-ness check( <<'EOF', qw(-D), '$data1' ); # x y 2 2 2 2 EOF } # print duplicate lines, but don't look at the first column for the duplicate # detection. Here I print just the first one of each group check( <<'EOF', qw(-d -f1), '$data1' ); # x y 2 2 10 1 20 2 30 3 EOF check( <<'EOF', qw(-d -f 1), '$data1' ); # x y 2 2 10 1 20 2 30 3 EOF check( <<'EOF', qw(-d -f-1), '$data1' ); # x y 2 2 10 1 20 2 30 3 EOF check( <<'EOF', qw(-d -f -1), '$data1' ); # x y 2 2 10 1 20 2 30 3 EOF if($have_fancy_uniq) { # print duplicate lines, but don't look at the first column for the duplicate # detection. 
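# (Illustration, grounded in $data1 above: with -f1 the leading "x" field is
# ignored for the comparison, so the rows "10 1", "11 1", "12 1" compare as
# equal and form a single duplicate group, as the expected outputs below
# reflect.)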
Here I print ALL the duplicates; since I skipped the first column, # they're not really duplicates check( <<'EOF', qw(-D -f1), '$data1' ); # x y 2 2 2 2 10 1 11 1 12 1 20 2 21 2 22 2 30 3 31 3 32 3 33 3 EOF check( <<'EOF', qw(--all-repeated -f1), '$data1' ); # x y 2 2 2 2 10 1 11 1 12 1 20 2 21 2 22 2 30 3 31 3 32 3 33 3 EOF # same thing, but using different flavors of -D, and making sure my option # parser works right check( <<'EOF', qw(--all-repeated=none -f1), '$data1' ); # x y 2 2 2 2 10 1 11 1 12 1 20 2 21 2 22 2 30 3 31 3 32 3 33 3 EOF check( <<'EOF', qw(--all-repeated=prepend -f1), '$data1' ); # x y 2 2 2 2 10 1 11 1 12 1 20 2 21 2 22 2 30 3 31 3 32 3 33 3 EOF check( <<'EOF', qw(--all-repeated=separate -f1), '$data1' ); # x y 2 2 2 2 10 1 11 1 12 1 20 2 21 2 22 2 30 3 31 3 32 3 33 3 EOF check( 'ERROR', qw(--all-repeated separate -f1), '$data1' ); # And now --group check( <<'EOF', qw(--group), '$data1' ); # x y 1 1 2 2 2 2 3 3 10 1 11 1 12 1 20 2 21 2 22 2 30 3 31 3 32 3 33 3 40 4 EOF check( <<'EOF', qw(--group -f1), '$data1' ); # x y 1 1 2 2 2 2 3 3 10 1 11 1 12 1 20 2 21 2 22 2 30 3 31 3 32 3 33 3 40 4 EOF check( <<'EOF', qw(--group=both -f1), '$data1' ); # x y 1 1 2 2 2 2 3 3 10 1 11 1 12 1 20 2 21 2 22 2 30 3 31 3 32 3 33 3 40 4 EOF check( 'ERROR', qw(--group both), '$data1' ); check( 'ERROR', qw(--group -c), '$data1' ); check( <<'EOF', qw(-D -s1), '$data1' ); # x y 2 2 2 2 EOF check( <<'EOF', qw(-D -s2), '$data1' ); # x y 2 2 2 2 10 1 11 1 12 1 20 2 21 2 22 2 30 3 31 3 32 3 33 3 EOF } check( <<'EOF', qw(-c), '$data1' ); # count x y 1 1 1 2 2 2 1 3 3 1 10 1 1 11 1 1 12 1 1 20 2 1 21 2 1 22 2 1 30 3 1 31 3 1 32 3 1 33 3 1 40 4 EOF check( <<'EOF', qw(-c -f1), '$data1' ); # count x y 1 1 1 2 2 2 1 3 3 3 10 1 3 20 2 4 30 3 1 40 4 EOF check( <<'EOF', qw(--vnl-count xxx -f1), '$data1' ); # xxx x y 1 1 1 2 2 2 1 3 3 3 10 1 3 20 2 4 30 3 1 40 4 EOF if($Nfailed == 0 ) { say colored(["green"], "All tests passed!"); exit 0; } else { say colored(["red"], "$Nfailed tests failed!"); exit 1; } vnlog-1.40/test/vnlog1.defs000066400000000000000000000000511475400722300156130ustar00rootroot00000000000000int w uint8_t x char* y double z void* d vnlog-1.40/test/vnlog2.defs000066400000000000000000000000221475400722300156120ustar00rootroot00000000000000int a int b int c vnlog-1.40/vnl-align000077500000000000000000000116371475400722300144140ustar00rootroot00000000000000#!/usr/bin/env perl use strict; use warnings; use Text::Table; my $usage = "Usage: $0 [logfile]\n"; if( exists $ARGV[0] && ($ARGV[0] eq '-h' || $ARGV[0] eq '--help')) { print $usage; exit 0; } my $table = undef; # This exists to support interstitial comments that are output without # alignment. Each chunk is: # # - integer index of the line start # - trailing comment # # Lines preceding the legend are stored in the chunk that has index < 0 my @chunks = ( [-1, ''] ); my $Nlines_here = 0; my @legend; while(<>) { if( !defined $table ) { if( !/^#[^#!]/ ) { # don't have a legend yet, and this is a ##/#! comment, not a legend $chunks[-1][1] .= $_; } else { # got legend push @chunks, [0,'']; $Nlines_here = 1; chomp; s/^# *//; @legend = split; $legend[0] = "# $legend[0]"; $table = Text::Table->new(@legend); } next; } if( /^#/ || /^\s*$/ ) { # comment. Add to the comment we're accumulating $chunks[-1][1] .= $_; next; } # data line chomp; my @fields = split; $table->add(@fields); if( length($chunks[-1][1]) == 0 ) { # Data line and we don't have a trailing comment yet. 
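# (A sketch of the chunk bookkeeping, using hypothetical input: given
# post-legend lines "1 2", "# note", "3 4", the "# note" text is stored as
# the trailing comment of the chunk holding "1 2", and "3 4" then opens a
# fresh chunk; on output each chunk prints its aligned rows followed by its
# comment verbatim.)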
Accumulate $Nlines_here++; } else { # Our chunk already has a trailing comment, But the new line is a data # line. I start a new chunk push @chunks, [$chunks[-1][0] + $Nlines_here, '']; $Nlines_here = 1; } } for my $ichunk (0..$#chunks) { my $chunk = $chunks[$ichunk]; if( $chunk->[0] >= 0) { # This isn't a comment-only chunk. Those are the pre-legend ##/#! lines if($chunk->[0] == 0) { # Treat the legend specially: I want to center-justify the labels. # Can't figure out how to use the library to do that, so I'm doing # that manually for my $icol(0..$#legend) { my $textwidth = length($legend[$icol]); my ($fieldstart,$fieldwidth) = $table->colrange($icol); # I want to center the thing. First column is different if($icol == 0 ) { # line is '# xxx' my ($text) = $legend[$icol] =~ /^# (.*)/; $textwidth -= 2; # margin+textwidth+margin = fieldwidth my $margin0 = int(($fieldwidth - $textwidth) / 2); # rounds down my $margin1 = $fieldwidth - $textwidth - $margin0; # rounds up if($margin1 == 1) { $margin1++; $margin0--; } print( '#' . (' ' x ($margin1-1)) . $text . (' ' x $margin0)); } else { # margin+textwidth+margin = fieldwidth my $text = $legend[$icol]; my $margin0 = int(($fieldwidth - $textwidth) / 2); # rounds down my $margin1 = $fieldwidth - $textwidth - $margin0; # rounds up print( (' ' x $margin1) . $text . (' ' x $margin0)); } print( ($icol == $#legend) ? "\n" : ' '); } # done with the legend. Process this chunk from the next line $chunk->[0]++; } print $table->table($chunk->[0], $ichunk != $#chunks ? ($chunks[$ichunk+1][0] - $chunk->[0]) : $Nlines_here); } print $chunk->[1]; } __END__ =head1 NAME vnl-align - aligns vnlog columns for easy interpretation by humans =head1 SYNOPSIS $ cat tst.vnl # w x y z -10 40 asdf - -20 50 - 0.300000 -30 10 whoa 0.500000 $ vnl-align tst.vnl # w x y z -10 40 asdf - -20 50 - 0.300000 -30 10 whoa 0.500000 =head1 DESCRIPTION The basic usage is vnl-align logfile The arguments are assumed to be the vnlog files. If no arguments are given, the input comes from STDIN. This is very similar to C, but handles C<#> lines properly: 1. The first C<#> line is the legend. For the purposes of alignment, the leading C<#> character and the first column label are treated as one column 2. All other C<#> lines are output verbatim. =head1 REPOSITORY https://github.com/dkogan/vnlog/ =head1 AUTHOR Dima Kogan C<< >> =head1 LICENSE AND COPYRIGHT Copyright 2016 California Institute of Technology. Copyright 2018 Dima Kogan C<< >> This library is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2.1 of the License, or (at your option) any later version. =cut vnlog-1.40/vnl-filter000077500000000000000000001643141475400722300146100ustar00rootroot00000000000000#!/usr/bin/env perl use strict; use warnings; use Getopt::Long qw(:config no_getopt_compat bundling); use List::Util 'max'; use List::MoreUtils qw(any all); use FindBin '$RealBin'; use lib "$RealBin/lib"; use Vnlog::Util 'get_unbuffered_line'; use Text::Balanced 'extract_bracketed'; use feature qw(say state); my $usage = < 100' 'temp > 20 && temp < 30' By default, this tool generates an awk script that's then interpreted by mawk. Although it is slower, perl can be used instead by passing --perl. This makes no difference in output in most cases, but the various expressions would be evaluated by perl, which is often useful, especially for anything non-trivial. --unbuffered flushes each line after each print. 
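For example, to follow a log file that is still being written (illustrative
filename):
  tail -f /tmp/run.vnl | vnl-filter --unbuffered 'temperature > 30'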
Useful for streaming data. --stream is a synonym for "--unbuffered" -A N/ -B N / -C N prints N lines of context after/before/around all records matching the given expressions. Works just like in the 'grep' tool For more information, please read the manpage. EOF if(! @ARGV) { die $usage; } # by default we do skip empty records my %options = (skipempty => 1); GetOptions(\%options, "list-columns|l", "has=s@", "pick|print|p=s@", "eval=s", "after-context|A=i", "before-context|B=i", "context|C=i", "function|sub=s@", "function-abs|sub-abs", "begin=s", "end=s", "skipempty!", "skipcomments!", "dumpexprs!", "perl", "unbuffered", "stream", "help") or die($usage); if( defined $options{help} ) { print $usage; exit 0; } $options{has} //= []; $options{pick} //= []; $options{unbuffered} = $options{unbuffered} || $options{stream}; # anything remaining on the commandline are 'matches' expressions $options{matches} = \@ARGV; if( defined $options{eval} ) { $options{skipcomments} = 1; } if( defined $options{eval} && @{$options{pick}} ) { say STDERR "--eval is given, so no column selectors should be given also"; die $usage; } if( defined $options{context} && (defined $options{'before-context'} || defined $options{'after-context'}) ) { say STDERR "-C is exclusive with -A and -B"; die $usage; } my $any_context_stuff = $options{'after-context'} || $options{'before-context'} || $options{'context'}; if( $any_context_stuff && $options{eval} ) { say STDERR "--eval is exclusive with -A/-B/-C"; die $usage; } my $NcontextBefore = ($options{'before-context'} || $options{'context'}) // 0; my $NcontextAfter = ($options{'after-context'} || $options{'context'}) // 0; # parse the , in $options{has} and $options{pick}. In --pick use the fancy # ()-respecting version of split @{$options{has}} = map split(/,/, $_), @{$options{has}}; @{$options{pick}} = map split_on_comma_respect_parens($_), @{$options{pick}}; # any requested columns preceded with '+' go into --has. And I strip out the '+' for my $ipick(0..$#{$options{pick}}) { # handle extra column syntax here if( ${$options{pick}}[$ipick] =~ /^\+(.+)/ ) { ${$options{pick}}[$ipick] = $1; push @{$options{has}}, ${$options{pick}}[$ipick]; } } my @picked_exprs_named = @{$options{pick}}; my @must_have_col_names = @{$options{has}}; my @must_have_col_indices_input; # if no columns requested, just print everything if( !@picked_exprs_named && !$options{'list-columns'} && !@must_have_col_names && !@{$options{matches}} && !defined $options{eval} ) { if($options{dumpexprs}) { say "--dumpexprs: No-op special case; printing everything, modulo --skipcomments, --noskipempty"; exit 0; } my $gotlegend; while() { if( $options{skipempty} ) { next if /^ \s* - (?: \s+ - )* \s* $/x; } if( $options{skipcomments}) { # always skip hard comments next if /^\s*(?:#[#!]|#\s*$|$)/p; # skip a single comment only if we need a legend still if( /^\s*#/) { next if $gotlegend; $gotlegend = 1; } } print; flush STDOUT if $options{unbuffered}; } exit 0; } my @colnames_output; # input column-name to index map. This always maps to a listref of indices, even # if I only have a single index my %colindices_input; my $colidx_needed_max = -1; # awk or perl strings representing stuff to output. These are either simple # column references (such as $1), or more complex expressions my @langspecific_output_fields; # How many rel(),diff(),... calls we have. 
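# (For example, a pick of 'd=diff(x),s=sum(x)' leaves $specialops{diff}{N}
# and $specialops{sum}{N} at 1 each, and makeAwkProgram() below then emits
# one awk function per use, diff0() and sum0(), each with its own private
# state.)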
I generate code based on this my @all_specialops = qw(rel diff sum prev latestdefined); my %specialops; for my $what (@all_specialops) { $specialops{$what} = {N => 0, outer => []}; } # Loop searching for the legend. # # Here instead of using while() we read one byte at a time. This means # that as far as the OS is concerned we never read() past our line. And when we # exec() to awk, all the data is available. This is inefficient, but we only use # this function to read up to the legend, which is fine. # # Note that perl tries to make while() work by doing an lseek() before we # exec(), but if we're reading a pipe, this can't work while(defined ($_ = get_unbuffered_line(*STDIN))) { # I pass through (don't treat as a legend) ## comments and #! shebang and # empty lines and # comments without anything else. if(/^\s*(?:#[#!]|#\s*$|$)/p) { if(!$options{skipcomments} && !$options{dumpexprs} && !$options{'list-columns'}) { print; flush STDOUT if $options{unbuffered}; } next; } if( /^\s*#\s*/p ) { chomp; # we got a legend line my @cols_all_legend_input = split ' ', ${^POSTMATCH}; # split the field names (sans the #) if($options{'list-columns'}) { for my $c (@cols_all_legend_input) { say $c; } exit 0; } foreach my $idx (0..$#cols_all_legend_input) { $colindices_input{$cols_all_legend_input[$idx]} //= []; push @{$colindices_input{$cols_all_legend_input[$idx]}}, $idx; } # each element is a tuple representing a picked field: # (output_field, colidx_needed_max_here, colname_output) my @picked_fields; # If we weren't asked for particular columns, take them all. This isn't # a no-op because we can have --has if( @picked_exprs_named ) { foreach my $i_picked_exprs_named (0..$#picked_exprs_named) { my $accept = sub { my ($expr, $name, $dupindex) = @_; push @picked_fields, [ expr_subst_col_names($options{perl} ? 'perl' : 'awk', $expr, $dupindex), $name // $expr ]; }; my $acceptExactMatch = sub { my ($picked_expr, $name) = @_; if (defined $colindices_input{$picked_expr}) { for my $dupindex (0..$#{$colindices_input{$picked_expr}}) { $accept->( $picked_expr, $name, $dupindex); } return 1; } return undef; }; my $acceptRegexMatch = sub { my ($picked_expr) = @_; my $picked_expr_re; eval { $picked_expr_re = qr/$picked_expr/p; }; if ( !$@ ) { # compiled regex successfully my $matched_any; my %next_dupindex; # I look through cols_all_legend_input instead of # keys(%colindices_input) to preserve the original order for my $matched_legend_input (@cols_all_legend_input) { $next_dupindex{$matched_legend_input} //= 0; if ( $matched_legend_input =~ /$picked_expr_re/p && length(${^MATCH}) > 0 ) { $accept->($matched_legend_input, undef, $next_dupindex{$matched_legend_input}); $matched_any = 1; $next_dupindex{$matched_legend_input}++; } } return $matched_any; } return undef; }; my $excludeExactMatch = sub { my ($expr) = @_; my @picked_fields_filtered = grep {$_->[2] ne $expr} @picked_fields; if( scalar(@picked_fields_filtered) == scalar(@picked_fields) ) { return undef; } @picked_fields = @picked_fields_filtered; return 1; }; my $excludeRegexMatch = sub { my ($expr) = @_; my $expr_re; eval { $expr_re = qr/$expr/p; }; return undef if $@; my @picked_fields_filtered = grep {! 
($_->[2] =~ /$expr_re/p && length(${^MATCH})) } @picked_fields; if( scalar(@picked_fields_filtered) == scalar(@picked_fields) ) { return undef; } @picked_fields = @picked_fields_filtered; return 1; }; my $picked_expr_named = $picked_exprs_named[$i_picked_exprs_named]; next if $acceptExactMatch->($picked_expr_named, undef); my ($name, $picked_expr) = $picked_expr_named =~ /(.*?)=(.*)/; $picked_expr //= $picked_expr_named; # No exact column match. If this is a named expression, I pass it on # to awk/perl if ( defined $name ) { $accept->($picked_expr, $name); next; } # No exact matches were found, and not a named expression. This # is either a regex or an exclusion expression if( $picked_expr =~ /^!(.*)/ ) { # exclusion expression. I apply the same logic as before: # try exact column matches first, and then a regex. # # I accumulate the picked list in order the arguments were # given: each exclusion expression removes columns from the # so-far-picked list. If the picked list BEGINS with an # exclusion expression, we assume that ALL columns have been # previously picked # # Here we match on the names of the OUTPUT columns $picked_expr = $1; if($i_picked_exprs_named == 0) { $acceptRegexMatch->('.'); } next if $excludeExactMatch->($picked_expr); next if $excludeRegexMatch->($picked_expr); my @output_names_have = map {$_->[2]} @picked_fields; die "Couldn't find requested column '$picked_expr' to exclude. Currently have output columns\n" . join('', map { " $_\n" } @output_names_have); } next if $acceptRegexMatch->($picked_expr); die "Couldn't find requested column '$picked_expr'. Legend has the following columns:\n" . join('', map { " $_\n" } @cols_all_legend_input); } if(!@picked_fields) { die "After processing --pick options, no fields remain!"; } for my $picked_field (@picked_fields) { my ($output_field, $colidx_needed_max_here, $colname_output) = @$picked_field; push @colnames_output, $colname_output; if ( $colidx_needed_max_here > $colidx_needed_max ) { $colidx_needed_max = $colidx_needed_max_here; } push @langspecific_output_fields, $output_field; } } else { # no columns requested. I take ALL the columns. I make sure to not # explicitly look at any of the column names, so if we have # duplicate columns, things will remain functional @colnames_output = @cols_all_legend_input; if( !$options{perl} ) { @langspecific_output_fields = map { '$'. $_ } 1..(1+$#cols_all_legend_input); } else { @langspecific_output_fields = map { "\$fields[$_]" } 0..$#cols_all_legend_input; } } # print out the new legend unless($options{dumpexprs} || $options{eval}) { print "# @colnames_output\n"; flush STDOUT if $options{unbuffered}; } if( @must_have_col_names ) { foreach my $col_name (@must_have_col_names) { # First I try exact matches for column names. Only unique # matches accepted if( defined $colindices_input{$col_name} ) { if(1 == @{$colindices_input{$col_name}}) { push @must_have_col_indices_input, $colindices_input{$col_name}[0]; next; } die "--has found multiple columns named '$col_name'; --has expects unique columns"; } # No exact matches. Try regex matches. Again, only unique # matches accepted my $col_name_re; eval { $col_name_re = qr/$col_name/p; }; if( $@ ) { die "--has found no columns matching '$col_name'"; } # compiled regex successfully my $matching_col_index; for my $matched_legend_input (keys(%colindices_input)) { if ( $matched_legend_input =~ /$col_name_re/p && length(${^MATCH}) > 0 ) { # Found match. Is it unique? 
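# (Illustration of this uniqueness rule with a hypothetical legend
# "# a ab ab": "--has a" is an exact, unique match and is accepted, while
# "--has b" falls through to the regex path, matches the duplicated "ab"
# column, and dies, since --has requires a unique column.)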
if(defined $matching_col_index || 1 != @{$colindices_input{$matched_legend_input}}) { die "--has found multiple columns matching '$col_name'; --has expects unique columns"; } # Found unique (for now) column $matching_col_index = $colindices_input{$matched_legend_input}[0]; } } if(!defined $matching_col_index) { die "--has found no columns matching '$col_name'"; } push @must_have_col_indices_input, $matching_col_index; } } last; } die "Got data line before a legend"; } if(!%colindices_input) { die "No legend received. Is the input file empty?"; } # At this point I'm done dealing with the legend, and it's time to read in and # process the data. I can keep going in perl, or I can generate an awk program, # and let awk do this work. The reason: awk (mawk especialy) runs much faster. # Both paths should produce the exact same output, and the test suite makes sure # this is the case if( !$options{perl} ) { my $awkprogram = makeAwkProgram(); if( $options{dumpexprs} ) { say $awkprogram; exit; } if($options{unbuffered}) { exec 'mawk', '-Winteractive', $awkprogram; } else { exec 'mawk', $awkprogram; } exit; # dummy. We never get here } sub makeAwkProgram { # The awk program I generate here is analogous to the logic in the data # while() loop above my $function_arguments = $options{function} // []; if( $options{'function-abs'} ) { push @$function_arguments, 'abs(___x___) { if(___x___ >= 0) { return ___x___;} return -___x___;}'; } my $functions = join('', map { my ($sub) = expr_subst_col_names('awk', $_); "function $sub " } @$function_arguments); my $awkprogram_preamble = ''; my $BEGIN = $options{begin} // ''; my $END = $options{end} // ''; if($any_context_stuff) { # context-handling stuff. This is a mirror of the perl implementation # below. See the comments there for a description $BEGIN .= ' ; ' . '__i1_contextbuffer = 0; ' . '__N_contextbuffer = 0; ' . '__N_printafter = 0; ' . '__just_skipped_something = 0; ' . '__printed_something_ever = 0; '; $awkprogram_preamble .= 'function __contextbuffer_output_and_clear() { ' . "__i0_contextbuffer = __i1_contextbuffer - __N_contextbuffer; " . "if(__i0_contextbuffer < 0){__i0_contextbuffer += $NcontextBefore;} " . "while (__N_contextbuffer) { " . " print __contextbuffer[__i0_contextbuffer++]; " . " if(__i0_contextbuffer == $NcontextBefore){__i0_contextbuffer = 0} " . " __N_contextbuffer--; " . "} " . "} "; # pushes to the buffer. Returns TRUE if I did NOT just overwrite an element of # the buffer $awkprogram_preamble .= 'function __contextbuffer_push(__line) { ' . '__contextbuffer[__i1_contextbuffer++] = __line; ' . "if(__i1_contextbuffer == $NcontextBefore) {__i1_contextbuffer = 0} " . "if(__N_contextbuffer != $NcontextBefore) {__N_contextbuffer++} " . "} "; } $awkprogram_preamble = "BEGIN { $BEGIN } END { $END } $awkprogram_preamble"; # Deal with comments. If printing, these do not count towards the context # stuff (-A/-B/-C) $awkprogram_preamble .= '/^ *(#|$)/ { ' . ($options{skipcomments} ? '' : 'print; ') . 'next } '; # skip incomplete records. Can happen if a log line at the end of a file was # cut off in the middle. These are invalid lines, so I don't even bother to # handle -A/-B/-C if( $colidx_needed_max >= 0) { $awkprogram_preamble .= (1+$colidx_needed_max) . " > NF { next } "; } # skip records that have empty input columns that must be non-empty if (@must_have_col_indices_input) { $awkprogram_preamble .= join(' || ', map { '$'.($_+1). 
" == \"-\"" } @must_have_col_indices_input); $awkprogram_preamble .= " { next } "; } my $not_matches_condition = join(' || ', map { my ($expr) = expr_subst_col_names('awk', $_); '!' . "($expr)" } @{$options{matches}}); my $awkprogram_matches = ''; my $awkprogram_print; if($options{eval}) { if( length($not_matches_condition) ) { $awkprogram_matches .= $not_matches_condition . '{next}'; } my ($expr) = expr_subst_col_names('awk', $options{eval}); $awkprogram_print .= "$expr "; } else { if( length($not_matches_condition) ) { $awkprogram_matches .= $not_matches_condition . '{ '; if ($any_context_stuff) { # get the line $awkprogram_matches .= "__line = " . join('" "', @langspecific_output_fields) . "; "; # save and skip the record $awkprogram_matches .= 'if(__N_printafter) { ' . ' print __line; ' . ' __N_printafter--; ' . '} ' . "else { if(__N_contextbuffer == $NcontextBefore){__just_skipped_something = 1;} "; if ($NcontextBefore) { $awkprogram_matches .= '__contextbuffer_push(__line); '; } $awkprogram_matches .= '} '; } $awkprogram_matches .= ' next } '; } # past if(!matches) {} $awkprogram_print .= '{'; my $record_accept_pre_print = ''; my $record_accept_post_print = ''; if($any_context_stuff) { $record_accept_pre_print = 'if( __just_skipped_something && __printed_something_ever) { print "##" } ' . '__just_skipped_something = 0; ' . '__printed_something_ever = 1; '; if($NcontextBefore) { $record_accept_pre_print .= '__contextbuffer_output_and_clear(); '; } ####### now print the thing $record_accept_post_print .= "__N_printafter = $NcontextAfter; "; } # I evaluate the fields just one time to not affect the state inside # rel() and diff(): I read them into local variables, and operate on # those # # I make sure the reported field doesn't have length-0. This would # confuse the vnlog fields $awkprogram_print .= join(' ', map { "__f$_ = $langspecific_output_fields[$_]; if(length(__f$_)==0) { __f$_ = \"-\"; } " } 0..$#langspecific_output_fields); if ($any_context_stuff) { $awkprogram_print .= '__line = ' . join('" "', map {"__f$_"} 0..$#langspecific_output_fields) . '; '; } if ($options{skipempty}) { $awkprogram_print .= "if(" . join( ' && ', map { "__f$_ == \"-\""} 0..$#langspecific_output_fields ) . ") { next } "; } # And THEN I print everything $awkprogram_print .= $record_accept_pre_print . 'print ' . join(',', map {"__f$_"} 0..$#langspecific_output_fields) . '; ' . $record_accept_post_print . ' '; $awkprogram_print .= '}'; } my $outer_expr = get_reldiff_outer_expr(); my $awkprogram_reldiff = ''; for my $i (0..$specialops{rel}{N}-1) { $awkprogram_reldiff .= "function rel$i(x) { if(!__inited_rel$i) { __state_rel$i = x; __inited_rel$i = 1; } return x - __state_rel$i; } "; } for my $i (0..$specialops{diff}{N}-1) { $awkprogram_reldiff .= "function diff$i(x) { retval = __inited_diff$i ? (x - __state_diff$i) : \"-\"; __state_diff$i = x; __inited_diff$i = 1; return retval; } "; } for my $i (0..$specialops{sum}{N}-1) { $awkprogram_reldiff .= "function sum$i(x) { __state_sum$i += x; return __state_sum$i; } "; } for my $i (0..$specialops{prev}{N}-1) { $awkprogram_reldiff .= "function prev$i(x) { __prev = length(__state_prev$i) ? __state_prev$i : \"-\"; __state_prev$i = x; return __prev; } "; } for my $i (0..$specialops{latestdefined}{N}-1) { $awkprogram_reldiff .= "function latestdefined$i(x) { if( x != \"-\" ) { __state_latestdefined$i = x; } return length(__state_latestdefined$i) ? __state_latestdefined$i : \"-\"; }"; } my $awkprogram = $functions . $awkprogram_reldiff . 
$awkprogram_preamble; if(length($outer_expr)) { $awkprogram .= "{ $outer_expr } "; } $awkprogram .= $awkprogram_matches . $awkprogram_print; return $awkprogram; } # line split(',', $s), but respects (). I.e. splitting "a,b,f(c,d)" produces 3 # tokens, not 4 sub split_on_comma_respect_parens { my ($s) = @_; my @f; FIELDS: # loop accumulating fields while (1) { my $field_accum = ''; FIELD_ACCUM: # accumulate THIS field. Keep grabbing tokens until I see an # , or the end while(1) { if (length($s) == 0) { if (length($field_accum)) { push @f, $field_accum; } last FIELDS; } if ($s !~ /^ # start of string ([^(,]*?) # some minimal number of non-comma, non-paren ([(,]) # comma or paren /px) { # didn't match. The whole thing is the last field push @f, $field_accum . $s; last FIELDS; } my ($pre,$sep) = ($1,$2); if ($sep eq ',') { # we have a field push @f, $field_accum . $pre; $field_accum = ''; $s = ${^POSTMATCH}; next FIELD_ACCUM; } # we have a paren. accumulate my ($paren_expr, $rest) = extract_bracketed($sep . ${^POSTMATCH}, '('); if ( !defined $paren_expr ) { # non-matched paren. Accum normally $rest =~ /^\((.*)$/ or die "Weird... '$rest' should have started with a '('. Giving up"; $field_accum .= '('; $s = $1; next FIELD_ACCUM; } $field_accum .= $pre . $paren_expr; $s = $rest; } } return @f; } sub find_outer_specialop { # returns the FIRST outer specialop that appears in the given string, or # undef if none are found my ($str) = @_; my $re_any = join('|', @all_specialops); my $re = qr/^.*?\b($re_any)\s*\(/s; $str =~ $re or return undef; return $1; } sub subst_reldiff { # This is somewhat convoluted. I want the meaning of rel() and diff() and # ... to be preserved regardless of any early-exit expressions. I.e. this # sequence is broken: # # - if(!matches) { continue } # - print rel() # update internal state # # because the internal state will not be updated if(!matches). I thus do # this instead: # # - _rel = rel() # - if(!matches) { continue } # - print _rel # # This works. But to make it work, I need to pre-compute all the outermost # rel() and diff() expressions. Outermost because rel(rel(x)) should still # work properly. I thus do this: # # rel( rel(xxx) ) ------> # function rel1() {} function rel2() {} # __rel1 = rel1( rel2(xxx) ); ... __rel1 # # I.e. each rel() gets a function defined with its own state. Only the # outermost one is cached. This is done so that I evaluate all the rel() # unconditionally, and then do conditional stuff (due to matches or # skipempty) my ($what, $expr, $isouter) = @_; my $sigil = $options{perl} ? '$' : ''; my $N = \$specialops{$what}{N}; my $outer_list = $specialops{$what}{outer}; my $whatre = qr/\b$what\s*\(/p; while( $expr =~ /$whatre/p ) { if( !$isouter ) { # not an outer one. Simply replace the call with a specific, # numbered one $expr =~ s/$whatre/$what$$N(/; } else { # IS an outer one. Replace the call with a variable that gets # precomputed. Save the string for precomputation my $prematch = ${^PREMATCH}; my ($paren_expr, $rest) = extract_bracketed("(${^POSTMATCH}", '[({'); if (!defined $paren_expr) { die "Giving up: Couldn't parse '$expr'"; } $expr = $prematch . $sigil . "__$what$$N" . $rest; push @$outer_list, [$$N, $paren_expr]; } $$N++; } return $expr; } sub get_reldiff_outer_expr { # should be called AFTER all the outer rel/diff/... expressions were # encountered. I.e. after the last expr_subst_col_names() my $sigil = $options{perl} ? 
'$' : ''; my $expr = ''; for my $what (@all_specialops) { for my $e (@{$specialops{$what}{outer}}) { my ($i, $paren_expr) = @$e; # I keep substituting until I got everything. This is required # because I can have deeply recursive calls my $substituted = $paren_expr; while(1) { my $start = $substituted; for my $what_inner (@all_specialops) { $substituted = subst_reldiff($what_inner, $substituted, 0); } if($substituted eq $start) { last; } } $expr .= $sigil . "__$what$i = $what$i" . $substituted . '; '; } } return $expr; } sub expr_subst_col_names { # I take in a string with awk/perl code, and replace field references to # column references that the awk/perl program will understand. To minimize # the risk of ambiguous matches, I try to match longer strings first my ($language, $out, $dupindex) = @_; # This looks odd. Mostly, $bound0 = $bound1 = '\b'. This would work to # Find "normal" alphanumeric keys in the string. But my keys may have # special characters in them. For instance, if I grab keys from the # 'top' command, it'll produce a legend including keys '%CPU', 'TIME+', # and the point before the '%' or after the '+' would not match \b. I # thus expand the regex at the boundary. I match EITHER the normal \b # meaning for a word-nonword transition OR a whitespace-nonword # transition. This means that whitespace becomes important: '1+%CPU' # will not be parsed as expected (but that's the RIGHT behavior here), # but '1+ %CPU' will be parsed correctly my $bound0 = '(?:(?) loop is all for context # handling (-A,-B,-C) # circular buffer containing previous entries. Used for -B my @contextbuffer; @contextbuffer = (undef) x $NcontextBefore if $NcontextBefore; my $i1_contextbuffer = 0; # the end; new entries written here my $N_contextbuffer = 0; # how many record to print unconditionally. Used for -A my $N_printafter = 0; # used for the group separator '##' my $just_skipped_something = 0; my $printed_something_ever = 0; sub contextbuffer_output_and_clear { return unless $NcontextBefore; my $i0_contextbuffer = $i1_contextbuffer - $N_contextbuffer; $i0_contextbuffer += $NcontextBefore if $i0_contextbuffer < 0; while ($N_contextbuffer) { say $contextbuffer[$i0_contextbuffer++]; $i0_contextbuffer = 0 if $i0_contextbuffer == $NcontextBefore; $N_contextbuffer--; } } # pushes to the buffer. Returns TRUE if I did NOT just overwrite an element of # the buffer sub contextbuffer_push { return unless $NcontextBefore; my ($line) = @_; $contextbuffer[$i1_contextbuffer++] = $line; $i1_contextbuffer = 0 if $i1_contextbuffer == $NcontextBefore; $N_contextbuffer++ unless $N_contextbuffer == $NcontextBefore; } RECORD: while() { # Data loop. Each statement here is analogous to the awk program generated # by makeAwkProgram(); # Deal with comments. If printing, these do not count towards the context # stuff (-A/-B/-C) if(/^\s*(?:#|$)/p) { unless($options{skipcomments}) { print; flush STDOUT if $options{unbuffered}; } next; } chomp; @fields = map {q{-} eq $_ ? undef : $_ } split; # skip incomplete records. Can happen if a log line at the end of a file was # cut off in the middle. 
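# (For example, with a legend "# a b c" and a final record truncated to
# "4 5", only 2 fields parse; if any requested column needs index 2 (the
# "c" column), the record is dropped here rather than misread.)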
These are invalid lines, so I don't even bother to # handle -A/-B/-C next unless $colidx_needed_max <= $#fields; # skip records that have empty input columns that must be non-empty next if any {!defined $fields[$_]} @must_have_col_indices_input; compute_reldiff(); # skip all records that don't match given expressions if($options{eval}) { next unless matches(); evalexpr(); next; } # skip all records that don't match given expressions if(!matches()) { next unless $any_context_stuff; my $fout = compute_output_fields(); my $line = join(' ', map {$_ // '-'} @$fout); if ($N_printafter) { say $line; flush STDOUT if $options{unbuffered}; $N_printafter--; } else { $just_skipped_something = 1 if $N_contextbuffer == $NcontextBefore; contextbuffer_push($line); } next; } my $fout = compute_output_fields(); # skip empty records if we must next if $options{skipempty} && all {!defined $_} @$fout; my $line = join(' ', map {$_ // '-'} @$fout); if ($any_context_stuff) { say '##' if $just_skipped_something && $printed_something_ever; $just_skipped_something = 0; $printed_something_ever = 1; contextbuffer_output_and_clear(); $N_printafter = $NcontextAfter; } say $line; flush STDOUT if $options{unbuffered}; } if(defined $options{end}) { my $evalstr = "$options{end} ;"; no strict; no warnings; eval $evalstr; if ( $@ ) { die "Error evaluating expression '$evalstr':\n$@"; } use strict; use warnings; } __END__ =head1 NAME vnl-filter - filters vnlogs to select particular rows, fields =head1 SYNOPSIS $ cat run.vnl # time x y z temperature 3 1 2.3 4.8 30 4 1.1 2.2 4.7 31 6 1 2.0 4.0 35 7 1 1.6 3.1 42 $ = 35' | vnl-align # time x y z temperature 6 1 2.0 4.0 35 7 1 1.6 3.1 42 $ the same as the format of the input. The exception to this is when using C<--eval>, in which the output is dependent on whatever expression we're evaluating. This tool is convenient to process both stored data or live data; in the latter case, it's very useful to pipe the streaming output to C to get a realtime visualization of the incoming data. This tool reads enough of the input file to get a legend, at which point it constructs an awk program to do the main work, and execs to awk (it's possible to use perl as well, but this isn't as fast). =head2 Input/output data format The input/output data is vnlog: a plain-text table of values. Any lines beginning with C<#> are treated as comments, and are passed through. The first line that begins with C<#> but not C<##> or C<#!> is a I line. After the C<#>, follow whitespace-separated field names. Each subsequent line is whitespace-separated values matching this legend. For instance, this is a valid vnlog file: #!/usr/bin/something ## more comments # x y z -0.016107 0.004362 0.005369 -0.017449 0.006711 0.006711 -0.018456 0.014093 0.006711 -0.017449 0.018791 0.006376 C uses this format for both the input and the output. The comments are preserved, but the legend is updated to reflect the fields in the output file. A string C<-> is used to indicate an undefined value, so this is also a valid vnlog file: # x y z 1 2 3 4 - 6 - - 7 =head2 Filtering To select specific I, pass their names to the C<-p> option (short for C<--print> or C<--pick>, which are synonyms). In its simplest form, to grab only columns C and C, do vnl-filter -p x,y See the detailed description of C<-p> below for more detail. To select specific I, we use I expressions. Anything on the C commandline and not attached to any C<--xxx> option is such an expression. 
=head2 Filtering

To select specific I<columns>, pass their names to the C<-p> option (short for
C<--print> or C<--pick>, which are synonyms). In its simplest form, to grab
only columns C<x> and C<y>, do

 vnl-filter -p x,y

See the detailed description of C<-p> below for more detail.

To select specific I<rows>, we use I<matches> expressions. Anything on the
C<vnl-filter> commandline and not attached to any C<--xxx> option is such an
expression. For instance

 vnl-filter 'size > 10'

would select only those rows whose C<size> column contains a value E<gt> 10.
See the detailed description of matches expressions below for more detail.

=head2 Context lines

C<vnl-filter> supports the context output options (C<-A>, C<-B> and C<-C>)
exactly like the C<grep> tool. I.e. to print out all rows whose C<size> column
contains a value E<gt> 10 I<and> include the 3 rows immediately before I<and>
after such matching rows, do this:

 vnl-filter -C3 'size > 10'

C<-B> reports the rows I<before> matching ones and C<-A> the rows I<after>
matching ones. C<-C> reports both. Note that this applies I<only> to
I<matches> expressions: records skipped because they fail C<--has> or
C<--skipempty> are I<not> included in contextual output.

=head2 Backend choice

By default, the parsing of arguments and the legend happens in perl, which
then constructs a simple awk script, and invokes C<awk> to actually read the
data and to process it. This is done because awk is lighter weight and runs
faster, which is important because our data sets could be quite large. We
default to C<mawk> specifically, since this is a simpler implementation than
C<gawk>, and runs much faster. If for whatever reason we want to do everything
with perl, this can be requested with the C<--perl> option.

=head2 Special functions

For convenience we support several special functions in any expression passed
on to awk or perl (named expressions, matches expressions, C<--eval> strings).
These generally maintain some internal state, and vnl-filter makes sure that
this state is consistent. Note that these are evaluated I<after>
C<--skipcomments> and C<--has>. So any record skipped because of a C<--has>
expression, for instance, will I<not> be considered in C<rel()>, C<diff()> and
so on.

=over

=item *

C<rel(x)> returns the value of C<x> relative to the first value of C<x>. For
instance we might want to see the time or position relative to the start, not
relative to some absolute beginning. Example:

 $ cat tst.vnl
 # time x
 100 200
 101 212
 102 209

 $ <tst.vnl vnl-filter -p 'time=rel(time),x'
 # time x
 0 200
 1 212
 2 209

=item *

C<diff(x)> returns the difference between the current value of C<x> and the
previous value of C<x>. The first row will always be C<->. Example:

 $ <tst.vnl vnl-filter -p 'time,x=diff(x)'
 # time x
 100 -
 101 12
 102 -3

=item *

C<sum(x)> returns the cumulative sum of C<x>. As C<diff()> can be thought of
as a derivative, C<sum()> can be thought of as an integral. So C<sum(diff(x))>
would return the same value as C<rel(x)> (except for the first row; C<diff()>
always returns C<-> for the first row). Example:

 $ <tst.vnl vnl-filter -p 'time,x=sum(x)'
 # time x
 100 200
 101 412
 102 621

=item *

C<prev(x)> returns the previous value of C<x>. One could construct C<diff()>
and C<sum()> using this, if they weren't already available.

=item *

C<latestdefined(x)> returns the most recent value of C<x> that isn't C<->. If
C<x> isn't C<->, this simply returns C<x>.

=back

=head1 ARGUMENTS

=head2 Matches expressions

Anything on the commandline not attached to any C<--xxx> option is a
I<matches> expression. These are used to select particular records (rows) in a
data file. For each row, we evaluate all the expressions. If I<all> the
expressions evaluate to true, that row is output. This expression is passed
directly to the awk (or perl) backend. Example: to select all rows that have
valid data in column C<a> I<or> column C<b> I<or> column C<c> you can

 vnl-filter 'a != "-" || b != "-" || c != "-"'

or

 vnl-filter --perl 'defined a || defined b || defined c'

As with the named expressions given to C<-p> (described above), these are
passed directly to awk, so anything that can be done with awk is supported
here.
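Since all the expressions must evaluate to true, multiple matches expressions
AND together. As a sketch (C<size> and C<temperature> are hypothetical
columns), this selects only the rows where both conditions hold:

 vnl-filter 'size > 10' 'temperature < 40'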
=head2 -p|--print|--pick expr

These options provide the mechanism to select specific columns for output. For
instance to pull out columns called C<lat>, C<lon>, and any column whose name
contains the string C<feature_>, do

 vnl-filter -p lat,lon,'feature_.*'

or, equivalently

 vnl-filter --print lat --print lon --print 'feature_.*'

We look for exact column name matches first, and if none are found, we try a
regex. If there was no column called exactly C<feature_.*>, then the above
would be equivalent to

 vnl-filter -p lat,lon,feature_

This mechanism is much more powerful than just selecting columns. First off,
we can rename chosen fields:

 vnl-filter -p w=feature_width

would pick the C<feature_width> field, but the resulting column in the output
would be named C<w>. When renaming a column in this way regexen are I<not>
supported, and exact field names must be given. But the string to the right of
the C<=> is passed on directly to awk (after replacing field names with column
indices), so any awk expression can be used here. For instance to compute the
length of a vector in separate columns C<x>, C<y>, and C<z> you can do:

 vnl-filter -p 'l=sqrt(x*x + y*y + z*z)'

A single column called C<l> would be produced.

We can also I<exclude> columns by preceding their name with C<!>. This works
like you expect. Rules:

=over

=item *

The pick/exclude directives are processed in order given to produce the output
picked-column list

=item *

If the first C<-p> item is an exclusion, we implicitly pick I<all> the columns
prior to processing the C<-p>.

=item *

The exclusion expressions match the I<original> column names, not the
I<renamed> names.

=item *

We match the exact column names first. If that fails, we match as a regex

=back

Example. To grab all the columns I<except> the temperature(s) do this:

 vnl-filter -p !temperature

To grab all the columns that describe I<anything> about a robot (columns whose
names have the string C<robot_> in them), but I<not> its temperature (i.e.
I<not> "robot_temperature"), do this:

 vnl-filter -p robot_,!temperature

=head2 --has a,b,c,...

Used to select records (rows) that have a non-empty value in a particular
field (column). A I<missing> value in a column is designated with a single
C<->. If we want to select only records that have a value in the C<x> column,
we pass C<--has x>. To select records that have data for I<all> of a given set
of columns, the C<--has> option can be repeated, or these multiple columns can
be given in a whitespace-less comma-separated list. For instance if we want
only records that have data in I<both> columns C<x> I<and> C<y> we can pass in
C<--has x,y> or C<--has x --has y>. If we want to combine multiple columns in
an I<or> (select rows that have data in I<any> of a given set of columns), use
a matches expression, as documented above.

If we want to select a column I<and> pick only rows that have a value in this
column, a shorthand syntax exists:

 vnl-filter --has col -p col

is equivalent to

 vnl-filter -p +col

Note that just like the column specifications in C<-p> the columns given to
C<--has> must match exactly I<or> as a regex. In either case, a unique
matching column must be found.
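For instance (a hypothetical session, reusing the data.vnl file assumed
earlier), selecting only the records with a defined C<y>:

 $ <data.vnl vnl-filter --has y
 # x y z
 1 2 3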
=head2 -l|--list-columns

Instead of doing any processing, parse the input to get the available columns,
print those out, and exit

=head2 -A N|--after-context N

Output N lines following each I<matched> expression, even those lines that do
not themselves match. This works just like the C<grep> options of the same
name. See L<grep(1)>.

=head2 -B N|--before-context N

Output N lines preceding each I<matched> expression, even those lines that do
not themselves match. This works just like the C<grep> options of the same
name. See L<grep(1)>.

=head2 -C N|--context N

Output N lines preceding and following each I<matched> expression, even those
lines that do not themselves match. This works just like the C<grep> options
of the same name. See L<grep(1)>.

=head2 --eval expr

Instead of printing out all matching records and picked columns, just run the
given chunk of awk (or perl). In this mode of operation, C<vnl-filter> acts
just like a glorified awk, that allows fields to be accessed by name instead
of by number, as it would be in raw awk. Since the expression may print
I<anything> or nothing at all, the output in this mode is not necessarily
itself a valid vnlog stream. And no column-selecting arguments should be
given, since they make no sense in this mode.

In awk the expr is a full set of pattern/action statements. So to print the
sum of columns C<a> and C<b> in each row, and at the end, print the sum of all
values in the C<a> column

 vnl-filter --eval '{print a+b; suma += a} END {print suma}'

In perl the arbitrary expression fits in like this:

 sub evalexpr
 {
     eval expression; # evaluate the arbitrary expression
 }
 while(<>) # read each line
 {
     chomp;
     next unless matches; # skip non-matching lines
     evalexpr();
 }

=head2 --function|--sub

Evaluates the given expression as a function that can be used in other
expressions. This is most useful when you want to print something that can't
trivially be written as a simple expression. For instance:

 $ cat tst.vnl
 # s
 1-2
 3-4
 5-6

 $ < tst.vnl vnl-filter --function 'before(x) { sub("-.*","",x); return x }' \
              --function 'after(x) { sub(".*-","",x); return x }' \
              -p 'b=before(s),a=after(s)'
 # b a
 1 2
 3 4
 5 6

See the L</CAVEATS> section below if you're doing something
sufficiently-complicated where you need this.

=head2 --function-abs|--sub-abs

Convenience option to add an absolute-value C<abs()> function. This is only
useful for awk programs (the default, no C<--perl> given) since perl already
provides C<abs()> by default.

=head2 --begin|--BEGIN

Evaluates the given expression in the BEGIN {} block of the generated awk (or
perl) program.

=head2 --end|--END

Evaluates the given expression in the END {} block of the generated awk (or
perl) program.

=head2 --[no]skipempty

Do [not] skip records where all fields are blank. By default we I<do> skip all
empty records; to include them, pass C<--noskipempty>

=head2 --skipcomments

Don't output non-legend comments

=head2 --perl

By default all processing is performed by C<awk>, but if for whatever reason
we want perl instead, pass C<--perl>. Both modes work, but C<awk> is
noticeably faster. C<--perl> could be useful because it is more powerful,
which could be important since a number of things pass commandline strings
directly to the underlying language (named expressions, matches expressions,
C<--eval> strings).

Note that while variables in perl use sigils, column references should I<not>
use sigils. To print the sum of all values in column C<a> you'd do this in awk

 vnl-filter --eval '{suma += a} END {print suma}'

and this in perl

 vnl-filter --perl --eval '{$suma += a} END {say $suma}'

The perl strings are evaluated without C<use strict> or C<use warnings> so I
didn't have to declare C<$suma> in the example. With C<--perl>, empty strings
(C<-> in the vnlog file) are converted to C<undef>.

=head2 --dumpexprs

Used for debugging. This spits out the final awk (or perl) program we run for
the given commandline options and given input. This is the final program, with
the column references resolved to numeric indices, so one can figure out what
went wrong.

=head2 --unbuffered

Flushes each line after each print. This makes sure each line is output as
soon as it is available, which is crucial for realtime output and streaming
plots.

=head2 --stream

Synonym for C<--unbuffered>
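Putting a few of these options together (a sketch; C<t>, C<x> and C<y> are
hypothetical columns in a live stream):

 someprocess | vnl-filter --unbuffered --sub-abs -p 't,err=abs(x-y)'

This uses the C<abs()> function provided by C<--sub-abs> to compute a new
column, flushing each record as soon as it arrives.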
=head1 CAVEATS

This tool is very lax in its input validation (on purpose). As a result,
columns with names like C<%CPU> and C<TIME+> do work (i.e. you can more or
less feed in output from C<top>). The downside is that shooting yourself in
the foot is possible. This tradeoff is currently tuned to be very permissive,
which works well for my use cases. I'd be interested in hearing other people's
experiences. Potential pitfalls/unexpected behaviors:

=over

=item *

All column names are replaced in all eval strings without regard to context.
The earlier example that reports the sum of values in a column:
C<vnl-filter --eval '{suma += a} END {print suma}'> will work fine if we I<do>
have a column named C<a> and do I<not> have a column named C<suma>. But it
will not do the right thing if any of those are violated. For instance, if a
column C<a> doesn't exist, then C<awk> would see C<a> instead of something
like C<$5>. C<a> would be an uninitialized variable, which evaluates to 0, so
the full C<vnl-filter> command would not fail, but would print 0 instead. It's
the user's responsibility to make sure we're talking about the right columns.
The focus here was one-liners so hopefully nobody has so many columns, they
can't keep track of all of them in their head. I don't see any way to resolve
this without seriously impacting the scope of the tool, so I'm leaving this
alone.

=item *

It is natural to use vnlog as a database. You can run queries with something
like

 vnl-filter 'key == 5'

This works. But unlike a real database this is clearly a linear lookup. With
large data files, this would be significantly slower than the logarithmic
searches provided by a real database. The meaning of "large" and "significant"
varies, and you should test it. In my experience vnlog "databases" scale
surprisingly well. But at some point, importing your data to something like
sqlite is well worth it.

=item *

When substituting column names I match I<either> a word-nonword transition
(C<\b>) I<or> a whitespace-nonword transition. The word boundary is what would
be used 99% of the time. But the keys may have special characters in them,
which don't work with C<\b>. This means that whitespace becomes important:
C<1+%CPU> will not be parsed as expected, which is correct since C<+%CPU> is
also a valid field name. But C<1+ %CPU> will be parsed correctly, so if you
have weird field names, put the whitespace into your expressions. It'll make
them more readable anyway.

=item *

Strings passed to C<-p> are split on C<,> I<except> if the C<,> is inside
balanced C<()>. This makes it possible for a single pick expression to contain
a C<,> inside C<()> (see the sketch following this list). This is probably the
right behavior, although some questionable looking field names become
potentially impossible: C<a(> and C<b)> I<would> otherwise be legal field
names, but you're probably asking for trouble if you do that.

=item *

Currently there are two modes: a pick/print mode and an C<--eval> mode. Then
there's also C<--function>, which adds bits of C<--eval> to the pick/print
mode, but it feels maybe insufficient. I don't yet have strong feelings about
what this should become. Comments welcome

=back
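A sketch of the parenthesized-C<,> behavior just described (C<atan2> is a
standard awk function; the column names are hypothetical):

 vnl-filter -p 'heading=atan2(y,x),z'

The C<,> inside C<atan2(y,x)> does not split the pick list, so this produces
exactly two output columns: C<heading> and C<z>.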
=head1 REPOSITORY

https://github.com/dkogan/vnlog/

=head1 AUTHOR

Dima Kogan C<< <dima@secretsauce.net> >>

=head1 LICENSE AND COPYRIGHT

Copyright 2016-2017 California Institute of Technology
Copyright 2017-2019 Dima Kogan C<< <dima@secretsauce.net> >>

This library is free software; you can redistribute it and/or modify it under
the terms of the GNU Lesser General Public License as published by the Free
Software Foundation; either version 2.1 of the License, or (at your option)
any later version.

=cut
vnlog-1.40/vnl-gen-header000077500000000000000000000130031475400722300153060ustar00rootroot00000000000000#!/usr/bin/env perl
use strict;
use warnings;
use feature ':5.10';

# input can come on the commandline, or pipe in on STDIN
my @defs; # field definitions. Each element is "type name"
if(@ARGV)
{
    @defs = @ARGV;
}
else
{
    # each line is a definition.
    # cut off trailing whitespace
    # ignore comments and any blank lines
    @defs = grep {!/^\s*#/ && /\S/ } map {chomp; $_;} <>;
}

if(!@defs)
{
    say STDERR "Field definitions must come on the commandline or on STDIN";
    exit 1;
}

my $legend = "#";
my $set_field_value_defs = '';
for my $field(@defs)
{
    my ($set_field_value, $name) = gen_field($field);
    $set_field_value_defs .= $set_field_value;
    $legend .= " $name";
}
my $Nfields = @defs;

say < #define VNLOG_N_FIELDS $Nfields #include EOF
print $set_field_value_defs;
say < '*'. so 'const char *' -> 'const char*'
my @ret;
if( $type eq 'void*' )
{
    # binary type
    my $set_field_value = < "", 'int8_t' => "",
                 'int16_t'      => "",
                 'int32_t'      => "",
                 'int64_t'      => "",
                 'unsigned int' => "unsignedint",
                 'unsigned'     => "",
                 'uint8_t'      => "",
                 'uint16_t'     => "",
                 'uint32_t'     => "",
                 'uint64_t'     => "",
                 'char'         => "",
                 'float'        => "",
                 'double'       => "",
                 'const char*'  => "ccharp",
                 'char*'        => "charp",
                 'void*'        => "" );
my $typename = $typenames{$type};
if( !defined $typename )
{
    die "Unknown type '$type'. I only know about " . join(' ', keys %typenames);
}
$typename = $type if $typename eq "";

my $arg = "($type)(x)";
my $set_field_value = <

__END__

=head1 NAME

vnl-gen-header - produces a header for the vnlog C API

=head1 SYNOPSIS

 $ vnl-gen-header 'int w' 'double x' > vnlog_fields_generated.h

=head1 DESCRIPTION

We provide a simple C library to produce vnlog output. The fields this library
outputs must be known at compile time, and are specified in a header created
by this tool. Please see the vnlog documentation for instructions on how to
use the library.

=head1 ARGUMENTS

This tool needs to be given a list of field definitions. First we look at the
commandline, and if the definitions are not available there, we look on STDIN.
Each definition is a string C<type name> (one def per argument on the
commandline or per line on STDIN). If reading from STDIN, we ignore blank
lines, and treat any line starting with C<#> as a comment.

Each def represents a single output field. Each such field spec is a C-style
variable declaration with a type followed by a name. Note that these field
specs contain whitespace, so each one must be quoted before being passed to
the shell. The types can be basic scalars, possibly with set widths (C<int>,
C<unsigned>, C<int32_t>, C<uint64_t>, C<float>, C<double>, ...), a
NULL-terminated string (C<char*> or C<const char*>) or a generic chunk of
binary data (C<void*>). The names must consist entirely of letters, numbers or
C<_>, like variables in C.
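For instance (a hypothetical invocation; any of the types listed above would
work), generating the header from definitions piped in on STDIN, where
comments and blank lines are ignored:

 $ vnl-gen-header > vnlog_fields_generated.h <<EOF
 # fields for my tool
 int w
 double x
 char* name
 EOF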
=head1 REPOSITORY

https://github.com/dkogan/vnlog/

=head1 AUTHOR

Dima Kogan C<< <dima@secretsauce.net> >>

=head1 LICENSE AND COPYRIGHT

Copyright 2016 California Institute of Technology.

This library is free software; you can redistribute it and/or modify it under
the terms of the GNU Lesser General Public License as published by the Free
Software Foundation; either version 2.1 of the License, or (at your option)
any later version.

=cut
vnlog-1.40/vnl-join000077500000000000000000000757331475400722300142660ustar00rootroot00000000000000#!/usr/bin/env perl
use strict;
use warnings;
use feature 'say';
use autodie;

use FindBin '$RealBin';
use lib "$RealBin/lib";

# Non-ancient perls have this in List::Util, but I want to support ancient
# ones too
use List::MoreUtils 'all';
use POSIX;
use Config;

use Vnlog::Util qw(parse_options
                   read_and_preparse_input
                   reconstruct_substituted_command
                   get_key_index
                   longest_leading_trailing_substring
                   fork_and_filter);

# This comes from the getopt_long() invocation in join.c in GNU coreutils
my @specs = ( # options with no args
              "ignore-case|i", "check-order", "nocheck-order",
              "zero-terminated|z", "header",
              # options that take an arg
              "a=s@", "e=s", "1=s", "2=s", "j=s", "o=s", "t=s", "v=s@",
              "vnl-tool=s",
              "help|h");
@specs = (@specs,
          "vnl-prefix1=s", "vnl-suffix1=s",
          "vnl-prefix2=s", "vnl-suffix2=s",
          "vnl-prefix=s",  "vnl-suffix=s",
          "vnl-autoprefix", "vnl-autosuffix",
          "vnl-sort=s");

my %options_unsupported =
  ( 't' => <<'EOF',
vnlog is built on assuming a particular field separator
EOF
    'e' => <<'EOF',
vnlog assumes - as an "undefined" field value. -e thus not allowed
EOF
    'header' => <<'EOF',
vnlog already handles field headers; this is pointless
EOF
    'zero-terminated' => <<'EOF'
vnlog is built on assuming a particular record separator
EOF
  );

my ($filenames,$options) = parse_options(\@ARGV, \@specs, 0, <{'vnl-tool'} //= 'join';

my $Ndatafiles = scalar(@$filenames);
my @prefixes = ('') x $Ndatafiles;
my @suffixes = ('') x $Ndatafiles;

my $Nstdin = scalar grep {$_ eq '-'} @$filenames;
if($Nstdin > 1)
{
    die "At most 1 '-' inputs are allowed";
}

if( defined $options->{'vnl-autoprefix'} &&
    defined $options->{'vnl-autosuffix'} )
{
    die "Either --vnl-autoprefix or --vnl-autosuffix should be passed, not both";
}

if( defined $options->{'vnl-autoprefix'} ||
    defined $options->{'vnl-autosuffix'} )
{
    if( grep /vnl-(prefix|suffix)/, keys %$options )
    {
        die "--vnl-autoprefix/suffix is mutually exclusive with the manual --vnl-prefix/suffix options";
    }

    for my $i(0..$#$filenames)
    {
        if($filenames->[$i] =~ m{/dev/fd/ # pipe
                                 |        # or
                                 ^-$      # STDIN
                                }x)
        {
            die "autoprefix/suffix can't work when data is piped in"
        }
    }
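    # (Illustration of the logic below: given inputs run1-a.vnl and
    #  run1-b.vnl, the longest common prefix is "run1-" and the longest
    #  common suffix is ".vnl"; the leftover middles "a" and "b" become the
    #  column prefixes "a_"/"b_" with --vnl-autoprefix, or the suffixes
    #  "_a"/"_b" with --vnl-autosuffix.)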
    # OK. autoprefix/autosuffix are valid, so I process them
    sub take
    {
        my ($s,$i) = @_;
        if($options->{'vnl-autoprefix'}) { $prefixes[$i] = "${s}_"; }
        else                             { $suffixes[$i] = "_${s}"; }
    }

    my ($prefix,$suffix) =
      longest_leading_trailing_substring( grep { $_ ne '-' } @$filenames );
    for my $i(0..$#$filenames)
    {
        take(substr($filenames->[$i],
                    length($prefix),
                    length($filenames->[$i]) - (length($prefix) + length($suffix)) ),
             $i);
    }
}
else
{
    # no --vnl-autoprefix or --vnl-autosuffix

    if( (defined $options->{'vnl-prefix1'} || defined $options->{'vnl-prefix2'}) &&
        defined $options->{'vnl-prefix'} )
    {
        die "--vnl-prefix1/2 are mutually exclusive with --vnl-prefix";
    }
    if( (defined $options->{'vnl-suffix1'} || defined $options->{'vnl-suffix2'}) &&
        defined $options->{'vnl-suffix'} )
    {
        die "--vnl-suffix1/2 are mutually exclusive with --vnl-suffix";
    }

    if(defined $options->{'vnl-prefix'})
    {
        @prefixes = split(',', $options->{'vnl-prefix'});
        if(@prefixes > $Ndatafiles) { die "too many items in --vnl-prefix"; }
    }
    else
    {
        @prefixes = ($options->{"vnl-prefix1"} // '',
                     $options->{"vnl-prefix2"} // '');
    }
    if (@prefixes < $Ndatafiles) { push @prefixes, ('') x ($Ndatafiles - @prefixes) }

    if(defined $options->{'vnl-suffix'})
    {
        @suffixes = split(',', $options->{'vnl-suffix'});
        if(@suffixes > $Ndatafiles) { die "too many items in --vnl-suffix"; }
    }
    else
    {
        @suffixes = ($options->{"vnl-suffix1"} // '',
                     $options->{"vnl-suffix2"} // '');
    }
    if (@suffixes < $Ndatafiles) { push @suffixes, ('') x ($Ndatafiles - @suffixes) }
}

# At this point I reduced all the prefix/suffix stuff to @prefixes and
# @suffixes

for my $key(keys %$options)
{
    if($options_unsupported{$key})
    {
        my $keyname = length($key) == 1 ? "-$key" : "--$key";
        die("I don't support $keyname: $options_unsupported{$key}");
    }
}

if( $Ndatafiles < 2 )
{
    die "At least two inputs should have been given";
}

if(defined $options->{j} && (defined $options->{1} || defined $options->{2}))
{
    die "Either (both -1 and -2) or -j MUST be given, but not both. -j is recommended";
}
if(( defined $options->{1} && !defined $options->{2}) ||
   (!defined $options->{1} &&  defined $options->{2}))
{
    die "Either (both -1 and -2) or -j MUST be given, but not both. -j is recommended";
}
if( defined $options->{1})
{
    if( $Ndatafiles != 2 )
    {
        die "If passing -1/-2 we must be joining EXACTLY two data files";
    }
    if($options->{1} ne $options->{2})
    {
        die "-1 and -2 should refer to the same field. Using -j is recommended";
    }

    $options->{j} = $options->{1};
    delete $options->{1};
    delete $options->{2};
}
if( !defined $options->{j} )
{
    die "Either (both -1 and -2) or -j MUST be given, but not both. -j is recommended";
}

for my $av (qw(a v))
{
    if ( defined $options->{$av} )
    {
        my $N = scalar @{$options->{$av}};

        if ($Ndatafiles == 2)
        {
            # "normal" mode. Joining exactly two data items
            # -a- is -a1 -a2 and -v- is -v1 -v2
            if( $N == 1 && $options->{$av}[0] eq '-')
            {
                $N = 2;
                $options->{$av}[0] = 1;
                $options->{$av}[1] = 2;
            }
            else
            {
                if ( !($N == 1 || $N == 2) )
                {
                    die "-$av should have been passed at most 2 times";
                }
                if ( !all {$_ == 1 || $_ == 2} @{$options->{$av}} )
                {
                    die "-$av MUST be an integer in [1 .. 2]";
                }
            }
        }
        else
        {
            # "cascaded" mode: N-way join made of a set of 2-way joins.
            #
            # I only support -a applied to ALL the datafiles. Finer-grained
            # support is probably possible, but the implementation is
            # non-obvious and I'm claiming it's not worth the effort.
            #
            # "-a-" means "full outer join": I print ALL the unmatched rows
            # for ALL the data files. This conceptually means -a1 -a2 -a3
            # ... -aN, but I don't support that: you MUST say "-a-" to take
            # ALL the datafiles.
            #
            # The implementation of -v is un-obvious even for -v-. It's also
            # not obvious that this is a feature anybody cares about, so I
            # leave it un-implemented for now.
            if($av eq 'v')
            {
                die "When given more than 2 data files, -v is not implemented.\n" .
                    "It COULD be done, but nobody has done it. Talk to Dima if you need this.";
            }
            if( !($N == 1 && $options->{$av}[0] eq '-'))
            {
                die "When given more than 2 data files, I only support \"-${av}-\".\n" .
                    "Finer-grained support is possible. Talk to Dima if you need this.";
            }
        }
    }
}

# I don't support -o either for now. It's also non-trivial and non-obvious if
# anybody needs it. Post-processing with vnl-filter is generally equivalent
# (but slower)
if ($Ndatafiles != 2 && defined $options->{o} and $options->{o} ne 'auto')
{
    die "When given more than 2 data files, I don't (yet) support -o.\n" .
        "Instead, post-process with vnl-filter";
}

if( $Ndatafiles > 2 )
{
    # I have more than two datafiles, but the coreutils join only supports 2
    # at a time. I thus subdivide my problem into a set of pairwise ones. I
    # can do this with a reduce(), but this would cause each sub-join to be
    # dependent on a previous one. Instead I rearrange the calls to make the
    # sub-joins independent, and thus parallelizable
    my @child_pids;

    # I need an arbitrary place to store references to the file handles
    # (these are full perl file handles, not just bare file descriptors). If
    # I don't have these then the language wants to garbage-collect these.
    # THAT calls the destructor, which closes the file handles. And THAT in
    # turn calls wait() on the subprocess. And since the subprocesses aren't
    # yet done, the wait() blocks, and the whole chain then deadlocks.
    #
    # So I simply store the file handles and avoid this whole business
    my @fh_cache;

    sub subjoin
    {
        # inputs $in0 and $in1 are each a hashref describing the input
        my ($in0, $in1, $final_subjoin) = @_;

        sub infile
        {
            my ($in) = @_;
            if( exists $in->{i_filename} )
            {
                my $filename = $filenames->[ $in->{i_filename} ];

                # I handle the --vnl-sort pre-filter here. If we have to do
                # any pre-filtering, I'll generate that pipeline here. Note
                # that we WILL pre-sort any data that comes from files, but
                # any of the intermediate sub-joins will already be sorted,
                # and I won't be re-sorting it here.
                my $input_filter = get_sort_prefilter($options);
                if (!defined $input_filter)
                {
                    return $filename;
                }

                # We need to pre-sort the data. I fork off that process, and
                # convert this input to a fd one
                my $fh = fork_and_filter(@$input_filter, $filename);
                push @fh_cache, $fh; # see comment for @fh_cache above
                $in->{fd} = fileno $fh;
                delete $in->{i_filename};
            }

            my $fd = $in->{fd};
            return "/dev/fd/$fd";
        }

        # I construct the inner commandlines. Some options are copied
        # directly, while some need to be adapted for the specific inner
        # command I'm looking at

        # deep copy
        my %sub_options = %$options;
        if($options->{a}) { $sub_options{a} = [1,2]; }

        my $ARGV_new = reconstruct_substituted_command([], \%sub_options, [], \@specs);

        # $ARGV_new now has all the arguments except the --vnl-... options.
        # The suffix/prefix options have already been parsed into @prefixes
        # and @suffixes, and I apply those
        if( defined $in0->{i_filename} )
        {
            my $i = $in0->{i_filename};
            push @$ARGV_new, "--vnl-prefix1", $prefixes[$i] if length($prefixes[$i]);
            push @$ARGV_new, "--vnl-suffix1", $suffixes[$i] if length($suffixes[$i]);
        }
        if( defined $in1->{i_filename} )
        {
            my $i = $in1->{i_filename};
            push @$ARGV_new, "--vnl-prefix2", $prefixes[$i] if length($prefixes[$i]);
            push @$ARGV_new, "--vnl-suffix2", $suffixes[$i] if length($suffixes[$i]);
        }

        my @run_opts = ($Config{perlpath}, $0, @$ARGV_new, infile($in0), infile($in1));

        my ($fd_read, $fd_write);
        if( !$final_subjoin)
        {
            ($fd_read, $fd_write) = POSIX::pipe();
        }

        my $childpid_subjoin = fork();
        if ( $childpid_subjoin == 0 )
        {
            POSIX::close(0);

            if( $final_subjoin )
            {
                # If I need to post-sort the output, I do that here
                if(defined $options->{'vnl-sort'} && $options->{'vnl-sort'} ne '-')
                {
                    post_sort(undef, $options->{j}, @run_opts); # this does not return
                }
            }
            else
            {
                POSIX::close(1);
                POSIX::dup2($fd_write, 1);
                POSIX::close($fd_write);
                POSIX::close($fd_read);
            }

            exec @run_opts;
        }
        push @child_pids, $childpid_subjoin;

        # I'm done with the writer (child uses it) so I close it
        POSIX::close($fd_write) if defined($fd_write);

        # I'm however NOT done with the readers. All the various sub-joins
        # use them in some arbitrary order, so I close none of them

        return { fd => $fd_read };
    }

    my @inputs = map { {i_filename => $_} } 0..$#$filenames;
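    # (Illustration of the rounds below: with 5 inputs A B C D E, the first
    #  pass joins (A,B) and (C,D) in parallel and carries E forward; the
    #  second pass joins (AB,CD) and carries E; the final pass joins that
    #  result with E and writes to stdout.)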
    while (1)
    {
        my $N = scalar @inputs;

        # First index of each pairwise subjoin. If we have an odd number of
        # inputs, we use the last one directly
        my @i0 = map {2*$_} 0..int($N/2)-1;

        if ($N == 2)
        {
            # run the subjoin, and write out to stdout
            subjoin(@inputs, 1);
            last;
        }

        my @outputs = map { subjoin(@inputs[$_, $_+1]) } @i0;
        push @outputs, $inputs[$N-1] if ($N % 2)==1;
        @inputs = @outputs;
    }

    # I spawned off all the child processes. They'll run in parallel, as
    # dictated by the OS
    for my $childpid (@child_pids)
    {
        waitpid $childpid, 0;
        my $result = $?;
        if($result != 0)
        {
            die "Subjoin in PID $childpid failed!";
        }
    }

    exit 0;
}

# vnlog uses - to represent empty fields
$options->{e} = '-';

my $join_key = $options->{j};

sub get_sort_prefilter
{
    my ($options) = @_;

    return undef if !defined $options->{'vnl-sort'};

    if ($options->{'vnl-sort'} !~ /^(?: [sdfgiMhnRrV]+ | -)$/x)
    {
        die("--vnl-sort must be followed by '-' or one or more of the ordering options that 'sort' takes: sdfgiMhnRrV");
    }

    # We sort with the default order (lexicographical) since that's what join
    # wants. We'll re-sort the output by the desired order again
    my $key = $options->{j};
    my $input_filter = [$Config{perlpath}, "$RealBin/vnl-sort", "-s", "-k", "$key"];
    if ($options->{'ignore-case'})
    {
        push @$input_filter, '-f';
    }
    return $input_filter;
}

sub post_sort
{
    my ($legend, $join_key, @cmd) = @_;

    my ($fdread,$fdwrite) = POSIX::pipe();

    my $childpid_sort = fork();
    if ( $childpid_sort == 0 )
    {
        # Child. This is the re-sorting end. In this side of the fork
        # vnl-sort reads data from the join
        POSIX::close($fdwrite);
        if( $fdread != 0)
        {
            POSIX::close(0);
            POSIX::dup2($fdread, 0);
            POSIX::close($fdread);
        }

        my $order = $options->{'vnl-sort'};
        exec $Config{perlpath}, "$RealBin/vnl-sort", "-k", $join_key, "-$order";
    }

    my $childpid_join = fork();
    if ( $childpid_join == 0 )
    {
        # Child. This is the 'join' end. join will write to the pipe, not to
        # stdout.
        POSIX::close($fdread);
        if( $fdwrite != 1 )
        {
            POSIX::close(1);
            POSIX::dup2($fdwrite, 1);
            POSIX::close($fdwrite);
        }

        POSIX::write(1, $legend, length($legend)) if defined $legend;
        exec @cmd;
    }

    POSIX::close($fdread);
    POSIX::close($fdwrite);

    # parent of both. All it does is wait for both to finish so that whoever
    # called vnl-join knows when the whole thing is done.
    waitpid $childpid_join, 0;
    waitpid $childpid_sort, 0;
    exit 0;
}

my $inputs = read_and_preparse_input($filenames, get_sort_prefilter($options));
my $keys_output = substitute_field_keys($options, $inputs);

# If we don't have a -o, make one. '-o auto' does ALMOST what I want, but it
# fails if given an empty vnlog
$options->{o} //= construct_default_o_option($inputs, $options->{1}, $options->{2});

my $ARGV_new = reconstruct_substituted_command($inputs, $options, [], \@specs);

my $legend = '# ' . join(' ', @$keys_output) . "\n";

# Simple case used 99% of the time: we're not post-filtering anything. Just
# invoke the join, and we're done
if(!defined $options->{'vnl-sort'} || $options->{'vnl-sort'} eq '-')
{
    syswrite(*STDOUT, $legend);
    exec $options->{'vnl-tool'}, @$ARGV_new;
}

# Complicated case. We're post-filtering our output. I set up the pipes, fork
# and exec
post_sort($legend, $join_key, $options->{'vnl-tool'}, @$ARGV_new);

sub push_nonjoin_keys
{
    my ($keys_output, $keys, $key_join, $prefix, $suffix) = @_;
    for my $i (0..$#$keys)
    {
        if ( $keys->[$i] ne $key_join)
        {
            push @$keys_output, $prefix . $keys->[$i] . $suffix;
        }
    }
}
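# (Example of the key substitution below: with legends "# a b c" and "# a c",
#  joining on 'a', a user-given "-o 0,1.c,2.c" is rewritten to the numeric
#  "-o 0,1.3,2.2" that coreutils join understands.)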
sub substitute_field_keys
{
    # I handle -j and -o. Prior to this I converted -1 and -2 into -j
    my ($options, $inputs) = @_;

    # I convert -j into -1 and -2 because the two files might have a given
    # named field in a different position
    my $join_field_name = $options->{j};
    delete $options->{j};
    $options->{1} = get_key_index($inputs->[0], $join_field_name);
    $options->{2} = get_key_index($inputs->[1], $join_field_name);

    my @keys_out;
    if( defined $options->{o} and $options->{o} ne 'auto')
    {
        my @format_in = split(/[ ,]/, $options->{o});
        my @format_out;
        for my $format_element(@format_in)
        {
            if( $format_element eq '0')
            {
                push @format_out, '0';
                push @keys_out, $join_field_name;
            }
            else
            {
                $format_element =~ /(.*)\.(.*)/
                  or die "-o given '$format_element', but each field must be either 'FILE.FIELD' or '0'";
                my ($file,$field) = ($1,$2);
                if($file ne '1' && $file ne '2')
                {
                    die "-o given '$format_element', where a field parsed to 'FILE.FIELD', but FILE must be either '1' or '2'";
                }

                my $index = get_key_index($inputs->[$file-1],$field);
                push @format_out, "$file.$index";
                push @keys_out, $prefixes[$file-1] . $field . $suffixes[$file-1];
            }
        }
        $options->{o} = join(',', @format_out);
    }
    else
    {
        # automatic field ordering. I.e
        #   join field
        #   all non-join fields from file1, in order
        #   all non-join fields from file2, in order
        push @keys_out, $join_field_name;
        push_nonjoin_keys(\@keys_out, $inputs->[0]{keys}, $join_field_name,
                          $prefixes[0], $suffixes[0]);
        push_nonjoin_keys(\@keys_out, $inputs->[1]{keys}, $join_field_name,
                          $prefixes[1], $suffixes[1]);
    }

    return \@keys_out;
}

sub construct_default_o_option
{
    # I wasn't asked to output specific fields, so I report the default
    # columns:
    #
    # - the join key
    # - all the remaining fields from the first file
    # - all the remaining fields from the second file
    #
    # This is what '-o auto' does, except that it infers the number of
    # columns from the first record, which could be wrong sometimes (most
    # notably a vnlog with no data). In our case we have the column counts
    # from the vnlog header, so we can construct the correct thing
    my ($inputs, @col12) = @_;

    return '0' .
      join('',
           map
           {
               my $input_index = $_;
               my $Nkeys = int( @{$inputs->[$input_index]{keys}} );
               my @output_fields = ((1..$col12[$input_index]-1),
                                    ($col12[$input_index]+1..$Nkeys));
               map {' ' . ($input_index+1) . ".$_"} @output_fields;
           } 0..$#$inputs);
}

__END__

=head1 NAME

vnl-join - joins two log files on a particular field

=head1 SYNOPSIS

 $ cat a.vnl
 # a b
 AA 11
 bb 12
 CC 13
 dd 14
 dd 123

 $ cat b.vnl
 # a c
 aa 1
 cc 3
 bb 4
 ee 5
 -  23

Try to join unsorted data on field 'a':

 $ vnl-join -j a a.vnl b.vnl
 # a b c
 join: /dev/fd/5:3: is not sorted: CC 13
 join: /dev/fd/6:3: is not sorted: bb 4

Sort the data, and join on 'a':

 $ vnl-join --vnl-sort - -j a a.vnl b.vnl | vnl-align
 # a b  c
 bb 12 4

Sort the data, and join on 'a', ignoring case:

 $ vnl-join -i --vnl-sort - -j a a.vnl b.vnl | vnl-align
 # a b  c
 AA 11 1
 bb 12 4
 CC 13 3

Sort the data, and join on 'a'. Also print the unmatched lines from both
files:

 $ vnl-join -a1 -a2 --vnl-sort - -j a a.vnl b.vnl | vnl-align
 # a b   c
 -  -   23
 AA 11   -
 CC 13   -
 aa  -   1
 bb 12   4
 cc  -   3
 dd 123  -
 dd 14   -
 ee  -   5

Sort the data, and join on 'a'. Print the unmatched lines from both files,
output ONLY column 'c' from the 2nd input:

 $ vnl-join -a1 -a2 -o 2.c --vnl-sort - -j a a.vnl b.vnl | vnl-align
 # c
 23
 -
 -
 1
 4
 3
 -
 -
 5

=head1 DESCRIPTION

Usage: vnl-join [join options]
                [--vnl-sort -|[sdfgiMhnRrV]+]
                [ --vnl-[pre|suf]fix[1|2] xxx |
                  --vnl-[pre|suf]fix xxx,yyy,zzz |
                  --vnl-autoprefix | --vnl-autosuffix ]
                logfile1 logfile2

This tool joins two vnlog files on a given field. C<vnl-join> is a wrapper
around the GNU coreutils C<join> tool. Since this is a wrapper, most
commandline options and behaviors of the C<join> tool are present; consult the
L<join(1)> manpage for detail. The differences from GNU coreutils C<join> are

=over

=item *

The input and output to this tool are vnlog files, complete with a legend

=item *

The columns are referenced by name, not index. So instead of saying
C<join -j1> to join on the first column, you say C<join -j time> to join on
column "time".

=item *

C<-1> and C<-2> are supported, but I<must> refer to the same field. Since
vnlog knows the identity of each field, it makes no sense for C<-1> and C<-2>
to be different. So pass C<-j> instead, it makes more sense in this context.

=item *

C<-a-> is available as a shorthand for C<-a1 -a2>: this is a full outer join,
printing unmatched records from both of the inputs. Similarly, C<-v-> is
available as a shorthand for C<-v1 -v2>: this will output I<only> the unique
records in both of the inputs.

=item *

C<vnl-join>-specific options are available to adjust the field-naming in the
output:

 --vnl-prefix1 --vnl-suffix1 --vnl-prefix2 --vnl-suffix2
 --vnl-prefix  --vnl-suffix
 --vnl-autoprefix --vnl-autosuffix

See "Field names in the output" below for details.

=item *

A C<vnl-join>-specific option C<--vnl-sort> is available to sort the input
and/or output. See below for details.

=item *

By default we call the C<join> tool to do the actual work. If the underlying
tool has a different name or lives in an odd path, this can be specified by
passing C<--vnl-tool TOOL>

=item *

If no C<-o> is given, we output the join field, the remaining fields in
logfile1, the remaining fields in logfile2, .... This is what C<-o auto>
does, except we also handle empty vnlogs correctly.

=item *

C<-e> is not supported because vnlog uses C<-> to represent undefined fields.

=item *

C<--header> is not supported because vnlog assumes a specific header
structure, and C<vnl-join> makes sure that this header is handled properly

=item *

C<-t> is not supported because vnlog assumes whitespace-separated fields

=item *

C<--zero-terminated> is not supported because vnlog assumes newline-separated
records

=item *

Rather than only 2-way joins, this tool supports N-way joins for any N > 2.
See below for details.

=back

Past that, everything C<join> does is supported, so see that man page for
detailed documentation. Note that all non-legend comments are stripped out,
since it's not obvious where they should end up.
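As a sketch of the pieces documented above (f1.vnl, f2.vnl, f3.vnl are
hypothetical files sharing a key column C<t>), an N-way full outer join looks
like this:

 vnl-join -a- -j t --vnl-sort - f1.vnl f2.vnl f3.vnl

The C<-a-> shorthand requests the unmatched records from all the inputs, and
C<--vnl-sort -> pre-sorts each input without re-sorting the final output.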
=head2 Field names in the output

By default, the field names in the output match those in the input. This is
what you want most of the time. It is possible, however, that a column name
adjustment is needed. One common use case for this is if the files being
joined have identically-named columns, which would produce duplicate columns
in the output. Example: we fixed a bug in a program, and want to compare the
results before and after the fix. The program produces an x-y trajectory as a
function of time, so both the bugged and the bug-fixed programs produce a
vnlog with a legend

 # time x y

Joining this on C