efilter-1-1.5/0000750066434000116100000000000012762014475013210 5ustar adamsheng00000000000000efilter-1-1.5/efilter.egg-info/0000750066434000116100000000000012762014475016334 5ustar adamsheng00000000000000efilter-1-1.5/efilter.egg-info/SOURCES.txt0000640066434000116100000000420312762014475020220 0ustar adamsheng00000000000000AUTHORS.txt LICENSE.txt MANIFEST.in README.md setup.cfg setup.py version.txt dpkg/changelog dpkg/compat dpkg/control dpkg/copyright dpkg/python-efilter.docs dpkg/python3-efilter.docs dpkg/rules dpkg/source/format efilter/__init__.py efilter/api.py efilter/ast.py efilter/dispatch.py efilter/errors.py efilter/protocol.py efilter/query.py efilter/scope.py efilter/syntax.py efilter/version.py efilter.egg-info/PKG-INFO efilter.egg-info/SOURCES.txt efilter.egg-info/dependency_links.txt efilter.egg-info/requires.txt efilter.egg-info/top_level.txt efilter/ext/__init__.py efilter/ext/csv_reader.py efilter/ext/lazy_repetition.py efilter/ext/line_reader.py efilter/ext/list_repetition.py efilter/ext/row_tuple.py efilter/parsers/__init__.py efilter/parsers/lisp.py efilter/parsers/literal.py efilter/parsers/common/__init__.py efilter/parsers/common/ast_transforms.py efilter/parsers/common/grammar.py efilter/parsers/common/parser.py efilter/parsers/common/token_stream.py efilter/parsers/common/tokenizer.py efilter/parsers/dottysql/__init__.py efilter/parsers/dottysql/grammar.py efilter/parsers/dottysql/parser.py efilter/parsers/legacy/__init__.py efilter/parsers/legacy/objectfilter.py efilter/protocols/__init__.py efilter/protocols/applicative.py efilter/protocols/associative.py efilter/protocols/boolean.py efilter/protocols/counted.py efilter/protocols/eq.py efilter/protocols/hashable.py efilter/protocols/indexable.py efilter/protocols/iset.py efilter/protocols/number.py efilter/protocols/ordered.py efilter/protocols/reducer.py efilter/protocols/repeated.py efilter/protocols/structured.py efilter/stdlib/__init__.py efilter/stdlib/core.py efilter/stdlib/io.py efilter/stdlib/math.py efilter/transforms/__init__.py efilter/transforms/asdottysql.py efilter/transforms/aslisp.py efilter/transforms/infer_type.py efilter/transforms/normalize.py efilter/transforms/solve.py efilter/transforms/validate.py sample_projects/__init__.py sample_projects/star_catalog/__init__.py sample_projects/star_catalog/star_catalog.py sample_projects/star_catalog/star_catalog_test.py sample_projects/tagging/__init__.py sample_projects/tagging/tag.py sample_projects/tagging/tag_test.pyefilter-1-1.5/efilter.egg-info/top_level.txt0000640066434000116100000000003012762014475021060 0ustar adamsheng00000000000000sample_projects efilter efilter-1-1.5/efilter.egg-info/dependency_links.txt0000640066434000116100000000000112762014475022403 0ustar adamsheng00000000000000 efilter-1-1.5/efilter.egg-info/requires.txt0000640066434000116100000000005612762014475020736 0ustar adamsheng00000000000000python-dateutil > 2 pytz >= 2011k six >= 1.4.0efilter-1-1.5/efilter.egg-info/PKG-INFO0000640066434000116100000000065712762014475017442 0ustar adamsheng00000000000000Metadata-Version: 1.0 Name: efilter Version: 1-1.5 Summary: EFILTER query language Home-page: https://github.com/google/dotty/ Author: Adam Sindelar Author-email: adam.sindelar@gmail.com License: Apache 2.0 Description: EFILTER is a general-purpose destructuring and search language implemented in Python, and suitable for integration with any Python project that requires a search function for some of its data. 
Platform: UNKNOWN efilter-1-1.5/dpkg/0000750066434000116100000000000012762014475014135 5ustar adamsheng00000000000000efilter-1-1.5/dpkg/changelog0000664066434000116100000000021712713157120016005 0ustar adamsheng00000000000000efilter (1455107621-1) unstable; urgency=low * Auto-generated -- Adam Sindelar Mon, 15 Feb 2016 19:26:24 +0100 efilter-1-1.5/dpkg/control0000664066434000116100000000215312713157120015537 0ustar adamsheng00000000000000Source: efilter Section: python Priority: extra Maintainer: Adam Sindelar Build-Depends: debhelper (>= 7), python-all (>= 2.7~), python-setuptools, python-six (>= 1.4.0), python3-all (>= 3.2~), python3-setuptools, python3-six (>= 1.4.0) Standards-Version: 3.9.5 X-Python-Version: >= 2.7 X-Python3-Version: >= 3.2 Homepage: https://github.com/google/dotty/ Package: python-efilter Architecture: all Depends: python-dateutil, python-six (>= 1.4.0), python-tz, ${python:Depends}, ${misc:Depends} Description: EFILTER query language EFILTER is a general-purpose destructuring and search language implemented in Python, and suitable for integration with any Python project that requires a search function for some of its data. Package: python3-efilter Architecture: all Depends: python-dateutil, python3-six (>= 1.4.0), python3-tz, ${python3:Depends}, ${misc:Depends} Description: EFILTER query language EFILTER is a general-purpose destructuring and search language implemented in Python, and suitable for integration with any Python project that requires a search function for some of its data. efilter-1-1.5/dpkg/copyright0000664066434000116100000000165712713157120016077 0ustar adamsheng00000000000000Format: http://dep.debian.net/deps/dep5 Upstream-Name: efilter Source: https://github.com/google/dotty Files: * Copyright: 2015 Copyright 2015 Google Inc. License: Apache-2.0 Files: debian/* Copyright: 2016 Copyright 2015 Google Inc. License: Apache-2.0 License: Apache-2.0 Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at . http://www.apache.org/licenses/LICENSE-2.0 . Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. . On Debian systems, the complete text of the Apache version 2.0 license can be found in "/usr/share/common-licenses/Apache-2.0". efilter-1-1.5/dpkg/python3-efilter.docs0000664066434000116100000000004212713157120020035 0ustar adamsheng00000000000000AUTHORS.txt LICENSE.txt README.md efilter-1-1.5/dpkg/compat0000664066434000116100000000000212713157120015331 0ustar adamsheng000000000000007 efilter-1-1.5/dpkg/rules0000775066434000116100000000314012713157120015211 0ustar adamsheng00000000000000#!/usr/bin/make -f # debian/rules that uses debhelper >= 7. # Uncomment this to turn on verbose mode. #export DH_VERBOSE=1 # This has to be exported to make some magic below work. 
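# (DH_OPTIONS is read by every debhelper dh_* tool that the dh sequencer
# below invokes, so exporting it here affects the entire build.)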
export DH_OPTIONS %: dh $@ --buildsystem=python_distutils --with=python2,python3 .PHONY: override_dh_auto_clean override_dh_auto_clean: dh_auto_clean rm -rf build efilter.egg-info/SOURCES.txt efilter.egg-info/PKG-INFO .PHONY: override_dh_auto_build override_dh_auto_build: dh_auto_build set -ex; for python in $(shell py3versions -r); do \ $$python setup.py build; \ done; .PHONY: override_dh_auto_install override_dh_auto_install: dh_auto_install --destdir $(CURDIR)/debian/python-efilter set -ex; for python in $(shell py3versions -r); do \ $$python setup.py install --root=$(CURDIR)/debian/python3-efilter --install-layout=deb; \ done; .PHONY: override_dh_auto_test override_dh_auto_test: .PHONY: override_dh_installmenu override_dh_installmenu: .PHONY: override_dh_installmime override_dh_installmime: .PHONY: override_dh_installmodules override_dh_installmodules: .PHONY: override_dh_installlogcheck override_dh_installlogcheck: .PHONY: override_dh_installlogrotate override_dh_installlogrotate: .PHONY: override_dh_installpam override_dh_installpam: .PHONY: override_dh_installppp override_dh_installppp: .PHONY: override_dh_installudev override_dh_installudev: .PHONY: override_dh_installwm override_dh_installwm: .PHONY: override_dh_installxfonts override_dh_installxfonts: .PHONY: override_dh_gconf override_dh_gconf: .PHONY: override_dh_icons override_dh_icons: .PHONY: override_dh_perl override_dh_perl: efilter-1-1.5/dpkg/python-efilter.docs0000664066434000116100000000004212713157120017752 0ustar adamsheng00000000000000AUTHORS.txt LICENSE.txt README.md efilter-1-1.5/dpkg/source/0000750066434000116100000000000012762014475015435 5ustar adamsheng00000000000000efilter-1-1.5/dpkg/source/format0000664066434000116100000000000412713157120016640 0ustar adamsheng000000000000001.0 efilter-1-1.5/sample_projects/0000750066434000116100000000000012762014475016402 5ustar adamsheng00000000000000efilter-1-1.5/sample_projects/tagging/0000750066434000116100000000000012762014475020022 5ustar adamsheng00000000000000efilter-1-1.5/sample_projects/tagging/__init__.py0000664066434000116100000000003712713157120022131 0ustar adamsheng00000000000000"""EFILTER sample projects.""" efilter-1-1.5/sample_projects/tagging/tag.py0000775066434000116100000001231212713157120021147 0ustar adamsheng00000000000000#!/usr/bin/env python # EFILTER sample project - star catalog filter. # # Copyright 2015 Google Inc. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. """ A sample project that uses EFILTER to implement a custom indicator format. 
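
A tagfile declares each tag name on an unindented line, followed by one or
more indented rules written in either objectfilter or DottySQL syntax (see
TagFile.OBJECTFILTER_WORDS below for how the two are told apart). A
hypothetical tagfile might look like this:

    kernel_power_event
        source_name is "Microsoft-Windows-Kernel-Power"
        event_identifier == 42

An event receives the tag if any one of its rules matches.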
""" from __future__ import print_function __author__ = "Adam Sindelar " import argparse import re from efilter import api from efilter import ast from efilter import query from efilter import syntax from efilter.transforms import asdottysql def main(): parser = argparse.ArgumentParser(description="Convert a tafile to DottySQL") parser.add_argument("path", type=str) args = parser.parse_args() with open(args.path, "r") as fd: tag_rules = query.Query(fd, syntax="tagfile") # What does the query look like as DottySQL? dottysql = asdottysql.asdottysql(tag_rules) print("# Tagfile %r converted:\n\n%s" % (args.path, dottysql)) # How will the query tag this event? event = { "data_type": "windows:evtx:record", "timestamp_desc": "", "strings": ("foo", "bar"), "source_name": "Microsoft-Windows-Kernel-Power", "event_identifier": 42 } tags = api.apply(tag_rules, vars=event) print("\n# Tagfile %r returned %r." % (args.path, list(tags))) class TagFile(syntax.Syntax): """Parses the plaso tagfile format.""" # A line with no indent is a tag name. TAG_DECL_LINE = re.compile(r"^(\w+)") # A line with leading indent is one of the rules for the preceding tag. TAG_RULE_LINE = re.compile(r"^\s+(.+)") # If any of these words are in the query then it's probably objectfilter. OBJECTFILTER_WORDS = re.compile( r"\s(is|isnot|equals|notequals|inset|notinset|contains|notcontains)\s") _root = None def __init__(self, path=None, original=None, **kwargs): if original is None: if path is not None: original = open(path, "r") else: raise ValueError("Either path to a tag file or a file-like " "object must be provided as path or original.") elif path is not None: raise ValueError("Cannot provide both a path and an original.") elif not callable(getattr(original, "__iter__", None)): raise TypeError("The 'original' argument to TagFile must be " "an iterable of lines (like a file object).") super(TagFile, self).__init__(original=original, **kwargs) def __del__(self): if not self.original.closed: self.original.close() def _parse_query(self, source): """Parse one of the rules as either objectfilter or dottysql. Example: _parse_query("5 + 5") # Returns Sum(Literal(5), Literal(5)) Arguments: source: A rule in either objectfilter or dottysql syntax. Returns: The AST to represent the rule. """ if self.OBJECTFILTER_WORDS.search(source): syntax_ = "objectfilter" else: syntax_ = None # Default it is. return query.Query(source, syntax=syntax_) def _parse_tagfile(self): """Parse the tagfile and yield tuples of tag_name, list of rule ASTs.""" rules = None tag = None for line in self.original: match = self.TAG_DECL_LINE.match(line) if match: if tag and rules: yield tag, rules rules = [] tag = match.group(1) continue match = self.TAG_RULE_LINE.match(line) if match: source = match.group(1) rules.append(self._parse_query(source)) @property def root(self): if not self._root: self._root = self.parse() return self._root def parse(self): tags = [] for tag_name, rules in self._parse_tagfile(): tag = ast.IfElse( # Union will be true if any of the 'rules' match. ast.Union(*[rule.root for rule in rules]), # If so then evaluate to a string with the name of the tag. ast.Literal(tag_name), # Otherwise don't return anything. ast.Literal(None)) tags.append(tag) self.original.close() # Generate a repeated value with all the tags (None will be skipped). return ast.Repeat(*tags) # We can register our parser with the Syntax baseclass. Subsequently, the # shorthand can be given to query.Query(syntax=...) argument without having to # invoke our parser manually. 
syntax.Syntax.register_parser(TagFile, shorthand="tagfile") if __name__ == "__main__": main() efilter-1-1.5/sample_projects/tagging/tag_test.py0000664066434000116100000000220712713157120022205 0ustar adamsheng00000000000000# EFILTER Forensic Query Language # # Copyright 2015 Google Inc. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. """ EFILTER test suite. """ __author__ = "Adam Sindelar " from efilter import ast from efilter_tests import testlib from sample_projects.tagging import tag class TagfileTest(testlib.EfilterTestCase): def testFullRun(self): tagfile = tag.TagFile( path=testlib.get_fixture_path("plaso_tagfile.txt")) # This is just a sanity check. Nuanced tests for the tagfile parser # are left as exercise to the reader. self.assertIsInstance(tagfile.root, ast.Repeat) efilter-1-1.5/sample_projects/star_catalog/0000750066434000116100000000000012762014475021045 5ustar adamsheng00000000000000efilter-1-1.5/sample_projects/star_catalog/__init__.py0000664066434000116100000000003712713157120023154 0ustar adamsheng00000000000000"""EFILTER sample projects.""" efilter-1-1.5/sample_projects/star_catalog/star_catalog_test.py0000664066434000116100000000175712713157120025131 0ustar adamsheng00000000000000# EFILTER Forensic Query Language # # Copyright 2015 Google Inc. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. """ EFILTER test suite. """ __author__ = "Adam Sindelar " import os from efilter_tests import testlib class StarCatalogTest(testlib.EfilterTestCase): def testFullRun(self): cmd = os.path.join("sample_projects", "star_catalog", "star_catalog.py") self.assertPythonScript(cmd) efilter-1-1.5/sample_projects/star_catalog/star_catalog.py0000775066434000116100000001014112713157120024060 0ustar adamsheng00000000000000#!/usr/bin/env python # EFILTER sample project - star catalog filter. # # Copyright 2015 Google Inc. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. """ A sample project that uses EFILTER to analyze a CSV file. 
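
Run this module directly to execute the queries in QUERIES against the
bundled HYG catalog CSV and print each result:

    python sample_projects/star_catalog/star_catalog.py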
""" from __future__ import print_function __author__ = "Adam Sindelar " import os # The API module is the easiest way to use EFILTER - the functions, 'apply', # 'search' and 'infer', take care of parsing and using the query. from efilter import api # This is a CSV file with the HYG star catalog in it. A complete list of fields # can be found at the astronexus page [1]. Of interest to us are: # # - "proper": A common name for the star, such as "Sirius". - "dist": The # distance in parsecs. - "mag": The star's apparent magnitude. # # 1: https://github.com/astronexus/HYG-Database/blob/master/README.md CATALOG_PATH = os.path.join(os.path.dirname(os.path.realpath(__file__)), "..", "..", "sample_data", "hygdata_v3.csv") # Let's declare a user function for the demo! def parsec2ly(parsecs): """Convert parsecs to light years. This is an example of a user-defined function that can be called from inside an EFILTER query. """ return parsecs * 3.262 QUERIES = [ # Basic example query. ("Count the lines in the file.", "count(csv(?))"), # More complex SELECT query: ("Find the first 10 stars with proper names.", "SELECT " " proper AS Name," # Note the 'AS' which works exactly as it does in SQL. " cast(mag, float)," # The CSV file contains strings, but we can cast. " parsec2ly(cast(dist, float)) AS ly" # ...and call functions. " FROM csv(?, decode_header:true)" # Note the keyword argument. " WHERE proper LIMIT 10"), # EFILTER supports the pseudo-SQL syntax as convenience. The processing # is actually accomplished using familiar map/filter/sort functions. ("Get 3 proper names exactly 6 characters in length.", "map(" " take(3, filter(csv(?, decode_header:true), count(proper) == 6))," " proper)") ] def main(): for description, query in QUERIES: print("# %s\n%s" % (description, query)) # We can find out what the EFILTER query will return by using the type # inference system. If it is a repeated value, we can render it in # multiple rows. result_type = api.infer(query, replacements=[CATALOG_PATH], libs=("stdcore", "stdio")) print("# Return type will be %s." % (result_type.__name__,)) # api.apply will give us the actual result of running the query, which # should be of the type we got above. results = api.apply(query, replacements=[CATALOG_PATH], allow_io=True, # We provide the top level variables in a 'vars' # argument. To bind 'parsec2ly' to the function of # the same name, we have to also wrap it in the # EFILTER user_func. This prevents EFILTER from # accidentally calling regular Python functions. vars={"parsec2ly": api.user_func(parsec2ly)}) # Because we don't know the cardinality of the query in 'query' we can # use 'getvalues' to always receive an iterator of results. This is just # a convenience function. 
for n, result in enumerate(api.getvalues(results)): print("%d - %r" % (n + 1, result)) print("\n\n") if __name__ == "__main__": main() efilter-1-1.5/sample_projects/__init__.py0000664066434000116100000000003712713157120020511 0ustar adamsheng00000000000000"""EFILTER sample projects.""" efilter-1-1.5/setup.py0000755066434000116100000001037212722027602014725 0ustar adamsheng00000000000000#!/usr/bin/env python import sys try: from setuptools import find_packages, setup except ImportError: from distutils.core import find_packages, setup try: from setuptools.commands.bdist_rpm import bdist_rpm except ImportError: from distutils.command.bdist_rpm import bdist_rpm try: from setuptools.command.sdist import sdist except ImportError: from distutils.command.sdist import sdist # Change PYTHONPATH to include efilter so that we can get the version. sys.path.insert(0, ".") try: from efilter.version import get_txt_version, get_version except ImportError: # If we can't import EFILTER then we can't generate version from git, but # can still just read the version file. def get_version(_=None): return get_txt_version() def get_txt_version(): try: with open("version.txt", "r") as fp: return fp.read().strip() except IOError: return None __version__ = get_txt_version() class BdistRPMCommand(bdist_rpm): """Custom handler for the bdist_rpm command.""" def _make_spec_file(self): """Generates the text of an RPM spec file. Returns: A list of strings containing the lines of text. """ # Note that bdist_rpm can be an old style class. if issubclass(BdistRPMCommand, object): spec_file = super(BdistRPMCommand, self)._make_spec_file() else: spec_file = bdist_rpm._make_spec_file(self) if sys.version_info[0] < 3: python_package = "python" else: python_package = "python3" description = [] summary = "" in_description = False python_spec_file = [] for line in spec_file: if line.startswith("Summary: "): summary = line elif line.startswith("BuildRequires: "): line = "BuildRequires: {0:s}-setuptools".format(python_package) elif line.startswith("Requires: "): if python_package == "python3": line = line.replace("python", "python3") elif line.startswith("%description"): in_description = True elif line.startswith("%files"): line = "%files -f INSTALLED_FILES -n {0:s}-%{{name}}".format( python_package) elif line.startswith("%prep"): in_description = False python_spec_file.append( "%package -n {0:s}-%{{name}}".format(python_package)) python_spec_file.append("{0:s}".format(summary)) python_spec_file.append("") python_spec_file.append( "%description -n {0:s}-%{{name}}".format(python_package)) python_spec_file.extend(description) elif in_description: # Ignore leading white lines in the description. if not description and not line: continue description.append(line) python_spec_file.append(line) return python_spec_file class SDistCommand(sdist): """Custom handler for the sdist command.""" def run(self): global __version__ __version__ = get_version(False) with open("version.txt", "w") as fd: fd.write(__version__) # Need to use old style super class invocation here for # backwards compatibility. 
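        # (distutils command classes are old-style classes on Python 2, so
        # super(SDistCommand, self).run() would not work there.)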
sdist.run(self) setup(name="efilter", version=__version__, description="EFILTER query language", long_description=( "EFILTER is a general-purpose destructuring and search language " "implemented in Python, and suitable for integration with any " "Python project that requires a search function for some of its " "data."), license="Apache 2.0", author="Adam Sindelar", author_email="adam.sindelar@gmail.com", url="https://github.com/google/dotty/", packages=find_packages(exclude=["efilter_tests*"]), package_dir={"efilter": "efilter"}, cmdclass={ "bdist_rpm": BdistRPMCommand, "sdist": SDistCommand}, install_requires=[ "python-dateutil > 2", "pytz >= 2011k", "six >= 1.4.0"]) efilter-1-1.5/MANIFEST.in0000664066434000116100000000011712713157120014743 0ustar adamsheng00000000000000include AUTHORS.txt LICENSE.txt README.md version.txt recursive-include dpkg * efilter-1-1.5/efilter/0000750066434000116100000000000012762014475014642 5ustar adamsheng00000000000000efilter-1-1.5/efilter/stdlib/0000750066434000116100000000000012762014475016123 5ustar adamsheng00000000000000efilter-1-1.5/efilter/stdlib/core.py0000644066434000116100000002343512722037003017425 0ustar adamsheng00000000000000# EFILTER Forensic Query Language # # Copyright 2015 Google Inc. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. """ EFILTER stdlib - core module. This module defines functions that are always included in every query, as well as the base classes TypedFunction and LibraryModule, which are used to represent stdlib functions and modules. """ __author__ = "Adam Sindelar " import itertools import six import threading from efilter import protocol from efilter.protocols import applicative from efilter.protocols import counted from efilter.protocols import reducer from efilter.protocols import repeated from efilter.protocols import structured class TypedFunction(object): """Represents an EFILTER-callable function with reflection support. Each function in the standard library is an instance of a subclass of this class. Subclasses override __call__ and the reflection API. """ name = None def apply(self, args, kwargs): return self(*args, **kwargs) def __call__(self): raise NotImplementedError() @classmethod def reflect_static_args(cls): return itertools.repeat(protocol.AnyType) @classmethod def reflect_static_return(cls): return protocol.AnyType applicative.IApplicative.implicit_dynamic(TypedFunction) class TypedReducer(object): """Represents an EFILTER-callable reducer function. TypedReducer supports the IReducer protocol, but also works as a function (IApplicative), to allow it to reduce values inside rows in a query. 
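
    As an illustration, a reducer that sums numbers (see Sum in
    efilter.stdlib.math) folds a chunk with sum(), merges two partial
    results by adding them, and returns the intermediate value unchanged
    from finalize.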
""" name = None # IApplicative def apply(self, args, kwargs): return self(*args, **kwargs) def __call__(self, data, chunk_size=None): return reducer.reduce(self, data, chunk_size) @classmethod def reflect_static_args(cls): return (repeated.IRepeated,) @classmethod def reflect_static_return(cls): return protocol.AnyType # IReducer def fold(self, chunk): raise NotImplementedError() def merge(self, left, right): raise NotImplementedError() def finalize(self, intermediate): raise NotImplementedError() applicative.IApplicative.implicit_dynamic(TypedReducer) reducer.IReducer.implicit_dynamic(TypedReducer) class SingletonReducer(object): """Preserves a literal value and ensures it's a singleton.""" name = "singleton" def fold(self, chunk): iterator = iter(chunk) first = next(iterator) for item in iterator: if item != first: raise ValueError("All values in a singleton reducer must be " "equal to each other. Got %r != %r." % ( first, item)) return first def merge(self, left, right): if left != right: raise ValueError("All values in a singleton reducer must be " "equal to each other. Got %r != %r." % ( left, right)) return left def finalize(self, intermediate): return intermediate class LibraryModule(object): """Represents a part of the standard library. Each library module consists of a collection of vars, which are mostly instances of TypedFunction. The stdcore module also contains basic types, such as 'str' or 'int', in addition to functions. """ vars = None name = None # This is a class-level global storing all instances by their name. ALL_MODULES = {} _all_modules_lock = threading.Lock() def __init__(self, vars, name): self.vars = vars self.name = name self._all_modules_lock.acquire() try: if name in self.ALL_MODULES: raise ValueError("Duplicate module name %r." % name) self.ALL_MODULES[name] = self finally: self._all_modules_lock.release() def __del__(self): """If modules are being used properly this will only happen on exit.""" self._all_modules_lock.acquire() try: del self.ALL_MODULES[self.name] finally: self._all_modules_lock.release() def __repr__(self): return "LibraryModule(name=%r, vars=%r)" % (self.name, self.vars) def getmembers_runtime(self): return self.vars.keys() def resolve(self, name): return self.vars[name] def reflect_runtime_member(self, name): return type(self.vars[name]) structured.IStructured.implicit_static(LibraryModule) class First(TypedFunction): """Return the first value from an IRepeated.""" name = "first" def __call__(self, x): for value in repeated.getvalues(x): return value @classmethod def reflect_static_args(cls): return (repeated.IRepeated,) @classmethod def reflect_static_return(cls): return protocol.AnyType class Take(TypedFunction): """Take only the first 'count' elements from 'x' (tuple or IRepeated). This implementation is lazy. Example: take(2, (1, 2, 3, 4)) -> (1, 2) Arguments: count: How many elements to return. x: The tuple or IRepeated to take from. Returns: A lazy IRepeated. """ name = "take" def __call__(self, count, x): def _generator(): if isinstance(x, tuple): values = x else: values = repeated.getvalues(x) for idx, value in enumerate(values): if idx == count: break yield value return repeated.lazy(_generator) @classmethod def reflect_static_args(cls): return (int, repeated.IRepeated) @classmethod def reflect_static_return(cls): return repeated.IRepeated class Drop(TypedFunction): """Drop the first 'count' elements from 'x' (tuple or IRepeated). This implementation is lazy. 
Example: drop(2, (1, 2, 3, 4)) -> (3, 4) Arguments: count: How many elements to drop. x: The tuple or IRepeated to drop from. Returns: A lazy IRepeated. """ name = "drop" def __call__(self, count, x): def _generator(): if isinstance(x, tuple): values = x else: values = repeated.getvalues(x) for idx, value in enumerate(values): if idx < count: continue yield value return repeated.lazy(_generator) @classmethod def reflect_static_args(cls): return (int, repeated.IRepeated) @classmethod def reflect_static_return(cls): return repeated.IRepeated class Lower(TypedFunction): """Make a string lowercase.""" name = "lower" def __call__(self, x): return x.lower() @classmethod def reflect_static_args(cls): return (six.string_types[0],) @classmethod def reflect_static_return(cls): return six.string_types[0] class Find(TypedFunction): """Returns the position of 'needle' in 'string', or -1 if not found.""" name = "find" def __call__(self, string, needle): return string.find(needle) @classmethod def reflect_static_args(cls): return (six.string_types[0], six.string_types[0]) @classmethod def reflect_static_return(cls): return int class Count(TypedReducer): """Counts the number of elements in a tuple or of values in a repeated.""" name = "count" def fold(self, chunk): return counted.count(chunk) def merge(self, left, right): return left + right def finalize(self, intermediate): return intermediate @classmethod def reflect_static_return(cls): return int class Reverse(TypedFunction): """Reverse a tuple of a repeated and maintains the type.""" name = "reverse" def __call__(self, x): if isinstance(x, tuple): return tuple(reversed(x)) return repeated.meld(*reversed(repeated.getvalues(x))) @classmethod def reflect_static_args(cls): return (repeated.IRepeated,) @classmethod def reflect_static_return(cls): return repeated.IRepeated class Materialize(TypedFunction): """Force a repeated value (e.g. output of map) to materialize in memory.""" name = "materialize" def __call__(self, rv): return repeated.repeated(*list(rv)) @classmethod def reflect_static_args(cls): return (repeated.IRepeated,) @classmethod def reflect_static_return(cls): return repeated.IRepeated MODULE = LibraryModule(name="stdcore", vars={Take.name: Take(), Drop.name: Drop(), Count.name: Count(), Reverse.name: Reverse(), Lower.name: Lower(), Find.name: Find(), SingletonReducer.name: SingletonReducer(), First.name: First(), Materialize.name: Materialize(), # Built-in types below: "int": int, "str": six.text_type, "bytes": six.binary_type, "float": float}) efilter-1-1.5/efilter/stdlib/math.py0000644066434000116100000001137412722075331017433 0ustar adamsheng00000000000000# EFILTER Forensic Query Language # # Copyright 2015 Google Inc. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. """ (EXPERIMENTAL) EFILTER stdlib - math module. 
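
Provides the 'levenshtein' string-distance function, along with the
experimental 'mean', 'sum' and 'vector_sum' reducers, exported together as
the 'stdmath' library module.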
""" __author__ = "Adam Sindelar " import six from six.moves import xrange from efilter.protocols import counted from efilter.protocols import number from efilter.stdlib import core # Analytical functions: class LevenshteinDistance(core.TypedFunction): """Compute Levenshtein distance between 'x' and 'y'. Levenshtein distance is, informally, the number of insert/delete/substitute operations needed to transform 'x' to 'y'. Computing the distance takes O(N * M) steps using the bottom-up dynamic programming approach below. See: https://en.wikipedia.org/wiki/Levenshtein_distance. """ name = "levenshtein" def __call__(self, x, y): lx = len(x) ly = len(y) # Base cases: if not lx: return ly if not ly: return lx if lx > ly: # This saves space, because the rows are shorter. return self(y, x) # Conceptually, this is a matrix of edit distances between prefixes of # x and y, arranged so that every coordinate pair into the matrix is # the levenshtein distance between the first 'i' characters of 'x' and # first 'j' characters of 'y'. To compute the distance from x to y we # need all intermediate results, but only the last two rows at a time. # The first row of edit distances: an empty string can be transformed # into a string of length N in N steps. current_row = list(xrange(lx)) for i in xrange(1, ly): previous_row = current_row current_row = [0] * lx current_row[0] = i for j in xrange(1, lx): if x[j - 1] == y[i - 1]: substitution_cost = 0 else: substitution_cost = 1 # One of three operations will have to lowest cost. They are, # in order, substitution (or nop), deletion and insertion. current_row[j] = min( previous_row[j - 1] + substitution_cost, previous_row[j] + 1, current_row[j - 1] + 1) return current_row[-1] @classmethod def reflect_static_args(cls): return (six.string_types[0], six.string_types[0]) @classmethod def reflect_static_return(cls): return int # Aggregate functions (reducers): class Mean(core.TypedReducer): """(EXPERIMENTAL) Computes the mean.""" name = "mean" def fold(self, chunk): return (sum(chunk), counted.count(chunk)) def merge(self, left, right): return (left[0] + right[0], left[1] + right[1]) def finalize(self, intermediate): total, count = intermediate return float(total) / count @classmethod def reflect_static_return(cls): return int class Sum(core.TypedReducer): """(EXPERIMENTAL) Computes a sum of numbers.""" name = "sum" def fold(self, chunk): return sum(chunk) def merge(self, left, right): return left + right def finalize(self, intermediate): return intermediate @classmethod def reflect_static_return(cls): return number.INumber class VectorSum(core.TypedReducer): """(EXPERIMENTAL) Computes a sum of vectors of numbers of constant size.""" name = "vector_sum" def fold(self, chunk): iterator = iter(chunk) running_sum = list(next(chunk)) expected_len = len(running_sum) for row in iterator: if len(row) != expected_len: raise ValueError( "vector_sum can only add up vectors of same size.") for idx, col in enumerate(row): running_sum[idx] += col def merge(self, left, right): return self.fold([left, right]) def finalize(self, intermediate): return intermediate @classmethod def reflect_static_return(cls): return list MODULE = core.LibraryModule( name="stdmath", vars={ Mean.name: Mean(), Sum.name: Sum(), LevenshteinDistance.name: LevenshteinDistance() } ) efilter-1-1.5/efilter/stdlib/__init__.py0000664066434000116100000000022412713157120020230 0ustar adamsheng00000000000000"""EFILTER tests.""" from efilter.stdlib import io as std_core from efilter.stdlib import io as std_io from 
efilter.stdlib import math as std_math efilter-1-1.5/efilter/stdlib/io.py0000664066434000116100000000623312713157120017106 0ustar adamsheng00000000000000# EFILTER Forensic Query Language # # Copyright 2015 Google Inc. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. """ EFILTER stdlib - IO module. """ __author__ = "Adam Sindelar " import six from efilter.ext import csv_reader from efilter.protocols import repeated from efilter.stdlib import core class Lines(core.TypedFunction): """Return an IRepeated with lines from text file 'path'. Arguments: path: String with the path to the file to read in. Raises: IOError if the file can't be opened for whatever reason. Returns: An object implementing IRepeated containing the lines of in the file as strings. """ name = "lines" def __call__(self, path): fd = open(path, "r") # We don't close fd here, because repeated.lines is lazy and will read # on demand. The descriptor will be closed in the repeated value's # destructor. return repeated.lines(fd) @classmethod def reflect_static_args(cls): return (six.string_types[0],) @classmethod def reflect_static_return(cls): return repeated.IRepeated class CSV(core.TypedFunction): """Return an IRepeated with file at 'path' decoded as CSV. Arguments: path: Same as 'Lines' decode_header: Use the first line in the file for column names and return a dict per line, instead of tuple per line. (default: False.) delim: Column separator (default: ","). quote: Quote character (defalt: double quote). trim: Eliminate leading whitespace (default: True). Raises: IOError if the file can't be opened for whatever reason. Returns: An IRepeated containing the lines in the CSV file decoded as either a tuple of values per line, or a dict of values per line, if 'decode_header' is True. """ name = "csv" def __call__(self, path, decode_header=False, delim=",", quote="\"", trim=True): fd = open(path, "r") # We don't close fd here, because repeated.lines is lazy and will read # on demand. The descriptor will be closed in the repeated value's # destructor. return csv_reader.LazyCSVReader(fd=fd, output_dicts=decode_header, delim=delim, quote=quote, trim=trim) @classmethod def reflect_static_args(cls): return (six.string_types[0], bool) @classmethod def reflect_static_return(cls): return repeated.IRepeated MODULE = core.LibraryModule(name="stdio", vars={CSV.name: CSV(), Lines.name: Lines()}) efilter-1-1.5/efilter/version.py0000640066434000116100000000636112762014472016705 0ustar adamsheng00000000000000# EFILTER Forensic Query Language # # Copyright 2015 Google Inc. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
# See the License for the specific language governing permissions and # limitations under the License. """ EFILTER versioning scheme. EFILTER version is in the following format: YEAR.MONTH.REVCOUNT, where revcount is the number of commits since initial commit on the master branch. This we believe strikes a good balance between human readable strings, and ability to tie a release to the git revision it was built from. """ __author__ = "Adam Sindelar " import logging import re RELEASE = "Awesome Sauce" MAJOR = 1 MINOR = 5 ANCHOR_TAG = "v%d.%d" % (MAJOR, MINOR) try: import datetime import pytz import subprocess # The below functionality is only available if dateutil is installed. from dateutil import parser def git_commits_since_tag(tag): try: p = subprocess.Popen( ["git", "log", "%s..master" % tag, "--oneline"], stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=False) errors = p.stderr.read() p.stderr.close() commits = p.stdout.readlines() return commits except (OSError, IndexError): if errors: logging.warn("git log failed with %r" % errors) return None def git_dev_version(): commits = git_commits_since_tag(ANCHOR_TAG) if not commits: return "1!%d.%d.dev0" % (MAJOR, MINOR) return "1!%d.%d.dev%d" % (MAJOR, MINOR, len(commits)) except ImportError: logging.warn("pytz or dateutil are not available - getting a version " "number from git won't work.") # If there's no dateutil then doing the git tango is pointless. def git_verbose_version(): pass def get_pkg_version(): """Get version string by parsing PKG-INFO.""" try: with open("PKG-INFO", "r") as fp: rgx = re.compile(r"Version: (\d+)") for line in fp.readlines(): match = rgx.match(line) if match: return match.group(1) except IOError: return None def get_txt_version(): """Get version string from version.txt.""" try: with open("version.txt", "r") as fp: return fp.read().strip() except IOError: return None def get_version(dev_version=False): """Generates a version string. Arguments: dev_version: Generate a verbose development version from git commits. Examples: 1.1 1.1.dev43 # If 'dev_version' was passed. """ if dev_version: version = git_dev_version() if not version: raise RuntimeError("Could not generate dev version from git.") return version return "1!%d.%d" % (MAJOR, MINOR) efilter-1-1.5/efilter/dispatch.py0000664066434000116100000003435212713157120017020 0ustar adamsheng00000000000000# -*- coding: utf-8 -*- # EFILTER Forensic Query Language # # Copyright 2015 Google Inc. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. """ EFILTER type system. This module implements multimethod function dispatch. """ __author__ = "Adam Sindelar " import functools import six import threading def memoize(func): # Declare the class in this lexical scope so 'func' is bound to the # decorated callable. 
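    #
    # Illustrative usage - 'func' is evaluated at most once per distinct
    # tuple of positional arguments (which must therefore be hashable):
    #
    #   @memoize
    #   def fib(n):
    #       return n if n < 2 else fib(n - 1) + fib(n - 2)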
class memdict(dict): """Calls 'func' for missing keys in this dict subclass.""" def __missing__(self, args): result = func(*args) self[args] = result return result cache = memdict() def memoized(*args): return cache[args] return memoized def call_audit(func): """Print a detailed audit of all calls to this function.""" def audited_func(*args, **kwargs): import traceback stack = traceback.extract_stack() r = func(*args, **kwargs) func_name = func.__name__ print("@depth %d, trace %s -> %s(*%r, **%r) => %r" % ( len(stack), " -> ".join("%s:%d:%s" % x[0:3] for x in stack[-5:-2]), func_name, args, kwargs, r)) return r return audited_func def _class_dispatch(args, kwargs): """See 'class_multimethod'.""" _ = kwargs if not args: raise ValueError( "Multimethods must be passed at least one positional arg.") if not isinstance(args[0], type): raise TypeError( "class_multimethod must be called with a type, not instance.") return args[0] def class_multimethod(func): """Declare a multimethod that dispatches on the first arg. If you think of 'multimethod' as working on the instances of classes then this would work on the classes. """ return multimethod(func, dispatch_function=_class_dispatch) class multimethod(object): """Multimethod that dispatches on the type of the first arg. This function decorator can be used on instance methods as well as regular functions. It allows the function to dispatch on the type of its first argument, much like standard python instance methods dispatch on the type of self (conceptually, not in actuality). This enables us to define arbitrary interfaces and have already existing types participate in those interfaces, without having to actually alter the existing type hierarchy or monkey-patch additional functions into their namespaces. This approach is used in EFILTER to enable it to be easily added to existing codebases, which may already overload many operators and have their own conventions about how members of objects are accessed and types interact. Arguments: func: The original function passed to the decorator. Usually 'func' should just raise NotImplementedError, but if not, it can be used as sort of a default behavior. dispatch_function: Optional. Can override the dispatch type derivation function, which takes the type of the first arg by default. Examples: @multimethod def say_moo(bovine): raise NotImplementedError() class Cow(): pass say_moo.implement(for_type=Cow, implementation=lambda x: "Moo!") class Sheep(): pass say_moo.implement(for_type=Sheep, implementation=lambda x: "Baah!") shaun = Sheep() bessy = Cow() say_moo(shaun) # => "Baah!" say_moo(bessy) # => "Moo!" """ # Locks _dispatch_table and implementations. _write_lock = None # Cache of type -> implementation. _dispatch_table = None # Table of which dispatch type is preferred over which other type in # cases that benefit from disambiguation. _prefer_table = None implementations = None func = None is_multimethod = True # Can override behavior of default_dispatch to derive the dispatch type # some other way. For example, using types of more than just the first # argument, or by using the argument itself, in case of functions that # take classes as parameters. 
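    # (class_multimethod and _class_dispatch above do exactly that: they
    # dispatch on the first argument itself, which must be a type.)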
dispatch_function = None def __init__(self, func, dispatch_function=None): self._write_lock = threading.Lock() self.func = func self._dispatch_table = {} self._prefer_table = {} self.implementations = [] self.dispatch_function = dispatch_function or self.default_dispatch functools.update_wrapper(self, func) @staticmethod def default_dispatch(args, kwargs): """Returns the type of the first argument as dispatch key.""" _ = kwargs if not args: raise ValueError( "Multimethods must be passed at least one positional arg.") return type(args[0]) @property def func_name(self): return self.func.__name__ def __repr__(self): return "multimethod(%s)" % self.func_name def __call__(self, *args, **kwargs): """Pick the appropriate overload based on args and call it.""" dispatch_type = self.dispatch_function(args, kwargs) implementation = self._find_and_cache_best_function(dispatch_type) if implementation: return implementation(*args, **kwargs) # Fall-through to calling default implementation. By convention, the # default will usually raise a NotImplemented exception, but there # may be times when it will actually do something useful (good example # are convenience type checking functions, such as isrepeated). try: return self.func(*args, **kwargs) except NotImplementedError: # Throw a better exception. if isinstance(None, dispatch_type): raise TypeError( "%r was passed None for first argument, which was " "unexpected." % self.func_name) implemented_types = [t for t, _ in self.implementations] raise NotImplementedError( "Multimethod %r is not implemented for type %r and has no " "default behavior. Overloads are defined for %r." % (self.func_name, dispatch_type, implemented_types)) def implemented_for_type(self, dispatch_type): candidate = self._find_and_cache_best_function(dispatch_type) return candidate is not None def _preferred(self, preferred, over): prefs = self._prefer_table.get(preferred) if prefs and over in prefs: return True return False def prefer_type(self, prefer, over): """Prefer one type over another type, all else being equivalent. With abstract base classes (Python's abc module) it is possible for a type to appear to be a subclass of another type without the supertype appearing in the subtype's MRO. As such, the supertype has no order with respect to other supertypes, and this may lead to amguity if two implementations are provided for unrelated abstract types. In such cases, it is possible to disambiguate by explictly telling the function to prefer one type over the other. Arguments: prefer: Preferred type (class). over: The type we don't like (class). Raises: ValueError: In case of logical conflicts. """ self._write_lock.acquire() try: if self._preferred(preferred=over, over=prefer): raise ValueError( "Type %r is already preferred over %r." % (over, prefer)) prefs = self._prefer_table.setdefault(prefer, set()) prefs.add(over) finally: self._write_lock.release() def _find_and_cache_best_function(self, dispatch_type): """Finds the best implementation of this function given a type. This function caches the result, and uses locking for thread safety. Returns: Implementing function, in below order of preference: 1. Explicitly registered implementations (through multimethod.implement) for types that 'dispatch_type' either is or inherits from directly. 2. Explicitly registered implementations accepting an abstract type (interface) in which dispatch_type participates (through abstract_type.register() or the convenience methods). 3. Default behavior of the multimethod function. 
This will usually raise a NotImplementedError, by convention. Raises: TypeError: If two implementing functions are registered for different abstract types, and 'dispatch_type' participates in both, and no order of preference was specified using prefer_type. """ result = self._dispatch_table.get(dispatch_type) if result: return result # The outer try ensures the lock is always released. with self._write_lock: try: dispatch_mro = dispatch_type.mro() except TypeError: # Not every type has an MRO. dispatch_mro = () best_match = None result_type = None for candidate_type, candidate_func in self.implementations: if not issubclass(dispatch_type, candidate_type): # Skip implementations that are obviously unrelated. continue try: # The candidate implementation may be for a type that's # actually in the MRO, or it may be for an abstract type. match = dispatch_mro.index(candidate_type) except ValueError: # This means we have an implementation for an abstract # type, which ranks below all concrete types. match = None if best_match is None: if result and match is None: # Already have a result, and no order of preference. # This is probably because the type is a member of two # abstract types and we have separate implementations # for those two abstract types. if self._preferred(candidate_type, over=result_type): result = candidate_func result_type = candidate_type elif self._preferred(result_type, over=candidate_type): # No need to update anything. pass else: raise TypeError( "Two candidate implementations found for " "multimethod function %s (dispatch type %s) " "and neither is preferred." % (self.func_name, dispatch_type)) else: result = candidate_func result_type = candidate_type best_match = match if (match or 0) < (best_match or 0): result = candidate_func result_type = candidate_type best_match = match self._dispatch_table[dispatch_type] = result return result @staticmethod def __get_types(for_type=None, for_types=None): """Parse the arguments and return a tuple of types to implement for. Raises: ValueError or TypeError as appropriate. """ if for_type: if for_types: raise ValueError("Cannot pass both for_type and for_types.") for_types = (for_type,) elif for_types: if not isinstance(for_types, tuple): raise TypeError("for_types must be passed as a tuple of " "types (classes).") else: raise ValueError("Must pass either for_type or for_types.") return for_types def implementation(self, for_type=None, for_types=None): """Return a decorator that will register the implementation. Example: @multimethod def add(x, y): pass @add.implementation(for_type=int) def add(x, y): return x + y @add.implementation(for_type=SomeType) def add(x, y): return int(x) + int(y) """ for_types = self.__get_types(for_type, for_types) def _decorator(implementation): self.implement(implementation, for_types=for_types) return self return _decorator @staticmethod def __get_unbound_function(method): try: return six.get_method_function(method) except AttributeError: return method def implement(self, implementation, for_type=None, for_types=None): """Registers an implementing function for for_type. Arguments: implementation: Callable implementation for this type. for_type: The type this implementation applies to. for_types: Same as for_type, but takes a tuple of types. for_type and for_types cannot both be passed (for obvious reasons.) 
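
        Example:
            say_moo.implement(for_type=Cow,
                              implementation=lambda x: "Moo!")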
Raises: ValueError """ unbound_implementation = self.__get_unbound_function(implementation) for_types = self.__get_types(for_type, for_types) for t in for_types: self._write_lock.acquire() try: self.implementations.append((t, unbound_implementation)) finally: self._write_lock.release() efilter-1-1.5/efilter/errors.py0000664066434000116100000000744512713157120016540 0ustar adamsheng00000000000000# EFILTER Forensic Query Language # # Copyright 2015 Google Inc. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. """ EFILTER abstract syntax. """ __author__ = "Adam Sindelar " class EfilterError(Exception): query = None _root = None message = None start = None end = None def __init__(self, query=None, message=None, root=None, start=None, end=None): super(EfilterError, self).__init__(message) self.query = query self.message = message self.root = root if start is not None: self.start = start if end is not None: self.end = end @property def root(self): return self._root @root.setter def root(self, value): self._root = value try: self.start = value.start self.end = value.end except AttributeError: self.start = None self.end = None @property def text(self): return self.message @property def adjusted_start(self): """Start of the error in self.source (with the >>> and <<< delims).""" if self.start is not None: return self.start @property def adjusted_end(self): """End of the error in self.source (with the >>> and <<< delims).""" if self.end is not None: return self.end + 9 @property def source(self): if not self.query: return None if self.start is not None and self.end is not None: return "%s >>> %s <<< %s" % ( self.query[0:self.start], self.query[self.start:self.end], self.query[self.end:]) elif self.query: return self.query def __str__(self): return "%s (%s) in query %r" % ( type(self).__name__, self.text, self.source) def __repr__(self): return "%s(message=%r, start=%r, end=%r)" % ( type(self), self.message, self.start, self.end) class EfilterLogicError(EfilterError): pass class EfilterNoneError(EfilterError): pass class EfilterParseError(EfilterError): token = None def __init__(self, *args, **kwargs): self.token = kwargs.pop("token", None) super(EfilterParseError, self).__init__(*args, **kwargs) class EfilterKeyError(EfilterError): key = None @property def text(self): if self.message: return self.message if self.key: return "No such key %r." % self.key return None def __init__(self, *args, **kwargs): self.key = kwargs.pop("key", None) super(EfilterKeyError, self).__init__(*args, **kwargs) class EfilterTypeError(EfilterError): expected = None actual = None @property def text(self): if self.message: return self.message if self.expected and self.actual: return "Expected type %r, got %r instead." 
% (self.expected, self.actual) return None def __init__(self, *args, **kwargs): self.expected = kwargs.pop("expected", None) self.actual = kwargs.pop("actual", None) super(EfilterTypeError, self).__init__(*args, **kwargs) efilter-1-1.5/efilter/query.py0000664066434000116100000000761012713157120016363 0ustar adamsheng00000000000000# EFILTER Forensic Query Language # # Copyright 2015 Google Inc. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. """ EFILTER query wrapper. """ __author__ = "Adam Sindelar " import six from efilter import syntax as s from efilter import ast def guess_source_syntax(source): if isinstance(source, ast.Expression): return "expression" if isinstance(source, six.string_types): return "dottysql" if isinstance(source, tuple): return "lisp" return None class Query(object): source = None root = None syntax = None application_delegate = None params = None def __init__(self, source, root=None, params=None, syntax=None, application_delegate=None): super(Query, self).__init__() if isinstance(source, Query): # Run as a copy constructor with optional overrides. self.root = source.root self.source = source.source self.application_delegate = source.application_delegate self.syntax = source.syntax self.params = source.params elif isinstance(source, ast.Expression): # TODO: This will go away when other stops relying on it. self.root = source else: self.source = source # Override anything set by above code with explicit args. if syntax is not None: self.syntax = syntax if application_delegate is not None: self.application_delegate = application_delegate if params is not None: self.params = params if root is not None: if root != self.root: self.source = None # No longer valid. self.root = root # Generate missing information. if not self.source and not self.root: raise ValueError("Must pass at least 'source' or 'root'.") if self.source and not self.root: # Run parser to generate AST. if not self.syntax: self.syntax = guess_source_syntax(self.source) parser_cls = s.Syntax.get_syntax(self.syntax) if not parser_cls: raise ValueError( "Cannot find parser for syntax %r. Source was %r." % (self.syntax, self.source)) parser = parser_cls(original=self.source, params=self.params) self.root = parser.root elif self.root and not self.source: # Run formatter to generate the source. if not self.syntax: # Good, fully expressive default. self.syntax = "dottysql" formatter = s.Syntax.get_formatter(self.syntax) if not formatter: # If we don't have a formatter for the explicit syntax, just # generate at least /something/. 
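                # DottySQL is the project's fully-expressive default syntax,
                # so its formatter is a safe fallback.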
formatter = s.Syntax.get_formatter("dottysql") self.source = formatter(self.root) def __str__(self): return self.__unicode__() def __unicode__(self): return six.text_type(self.source) def __repr__(self): return "Query(%s)" % repr(self.source) def __hash__(self): return hash(self.root) def __eq__(self, other): if not isinstance(other, Query): return False return self.root == other.root def __ne__(self, other): return not self.__eq__(other) efilter-1-1.5/efilter/parsers/0000750066434000116100000000000012762014475016321 5ustar adamsheng00000000000000efilter-1-1.5/efilter/parsers/literal.py0000664066434000116100000000255312713157120020332 0ustar adamsheng00000000000000# EFILTER Forensic Query Language # # Copyright 2015 Google Inc. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. """ EFILTER special syntaxes. Syntaxes in this module don't really implement a language - they're special cases that just pass literals and expressions through. """ __author__ = "Adam Sindelar <adam.sindelar@gmail.com>" from efilter import ast from efilter import syntax class LiteralSyntax(syntax.Syntax): """This is basically an identity function for literals.""" @property def root(self): return ast.Literal(self.original) syntax.Syntax.register_parser(LiteralSyntax, shorthand="literal") class PassthroughSyntax(syntax.Syntax): """This is basically an identity function for expressions.""" @property def root(self): return self.original syntax.Syntax.register_parser(PassthroughSyntax, shorthand="expression") efilter-1-1.5/efilter/parsers/__init__.py0000664066434000116100000000026412713157120020432 0ustar adamsheng00000000000000"""EFILTER Forensic Query Language""" from efilter.parsers import dottysql from efilter.parsers import legacy from efilter.parsers import lisp from efilter.parsers import literal efilter-1-1.5/efilter/parsers/legacy/0000750066434000116100000000000012762014475017565 5ustar adamsheng00000000000000efilter-1-1.5/efilter/parsers/legacy/objectfilter.py0000664066434000116100000002043712713157120022617 0ustar adamsheng00000000000000# EFILTER Forensic Query Language # # Copyright 2015 Google Inc. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. """ This module implements a syntax similar to objectfilter [1], with the following differences: - The context operator (@) isn't implemented. It would probably not be too difficult to add, but isn't actually being used by any of the objectfilter projects, as far as I know. - The handling of list literals is different: - Nested lists ARE supported (the grammar is fully recursive).
- Elements in lists MUST be separated by commas, and commas MUST separate elements in lists (so, "[,,]" isn't allowed). There are probably other subtle differences owing to the very different design of the canonical objectfilter parser. The below seems to work well enough in all the cases I've tested, though. 1: https://github.com/google/objectfilter/ """ from efilter import ast from efilter import syntax from efilter.parsers.common import ast_transforms from efilter.parsers.common import grammar from efilter.parsers.common import parser from efilter.parsers.common import tokenizer class ObjectFilterSyntax(syntax.Syntax): OPERATORS = [ # Aliases for equivalence: grammar.Operator(name="equals", precedence=3, assoc="left", handler=ast.Equivalence, docstring=None, prefix=None, infix=grammar.Token("symbol", "equals"), suffix=None), grammar.Operator(name="is", precedence=3, assoc="left", handler=ast.Equivalence, docstring=None, prefix=None, infix=grammar.Token("symbol", "is"), suffix=None), grammar.Operator(name="==", precedence=3, assoc="left", handler=ast.Equivalence, docstring=None, prefix=None, infix=grammar.Token("symbol", "=="), suffix=None), grammar.Operator(name="notequals", precedence=3, assoc="left", handler=ast_transforms.ComplementEquivalence, docstring=None, prefix=None, infix=grammar.Token("symbol", "notequals"), suffix=None), grammar.Operator(name="isnot", precedence=3, assoc="left", handler=ast_transforms.ComplementEquivalence, docstring=None, prefix=None, infix=grammar.Token("symbol", "isnot"), suffix=None), grammar.Operator(name="!=", precedence=3, assoc="left", handler=ast_transforms.ComplementEquivalence, docstring=None, prefix=None, infix=grammar.Token("symbol", "!="), suffix=None), # Logical: grammar.Operator(name="or", precedence=0, assoc="left", handler=ast.Union, docstring="Logical OR.", prefix=None, suffix=None, infix=grammar.Token("symbol", "or")), grammar.Operator(name="and", precedence=1, assoc="left", handler=ast.Intersection, docstring="Logical AND.", prefix=None, suffix=None, infix=grammar.Token("symbol", "and")), grammar.Operator(name="||", precedence=0, assoc="left", handler=ast.Union, docstring="Logical OR.", prefix=None, suffix=None, infix=grammar.Token("symbol", "||")), grammar.Operator(name="&&", precedence=1, assoc="left", handler=ast.Intersection, docstring="Logical AND.", prefix=None, suffix=None, infix=grammar.Token("symbol", "&&")), # Comparisons: grammar.Operator(name=">=", precedence=3, assoc="left", handler=ast.PartialOrderedSet, docstring="Equal-or-greater-than.", prefix=None, suffix=None, infix=grammar.Token("symbol", ">=")), grammar.Operator(name="<=", precedence=3, assoc="left", handler=ast_transforms.ReversePartialOrderedSet, docstring="Equal-or-less-than.", prefix=None, suffix=None, infix=grammar.Token("symbol", "<=")), grammar.Operator(name=">", precedence=3, assoc="left", handler=ast.StrictOrderedSet, docstring="Greater-than.", prefix=None, suffix=None, infix=grammar.Token("symbol", ">")), grammar.Operator(name="<", precedence=3, assoc="left", handler=ast_transforms.ReverseStrictOrderedSet, docstring="Less-than.", prefix=None, suffix=None, infix=grammar.Token("symbol", "<")), # Set ops: grammar.Operator(name="notinset", precedence=3, assoc="left", handler=ast_transforms.ComplementMembership, docstring="Left-hand operand is not in list.", prefix=None, suffix=None, infix=(grammar.Token("symbol", "notinset"))), grammar.Operator(name="inset", precedence=3, assoc="left", handler=ast.Membership, docstring="Left-hand operand is in list.",
prefix=None, suffix=None, infix=grammar.Token("symbol", "inset")), grammar.Operator(name="notcontains", precedence=3, assoc="left", handler=ast_transforms.ReverseComplementMembership, docstring="Right-hand operand is not in list.", prefix=None, suffix=None, infix=(grammar.Token("symbol", "notcontains"))), grammar.Operator(name="contains", precedence=3, assoc="left", handler=ast_transforms.ReverseMembership, docstring="Right-hand operand is in list.", prefix=None, suffix=None, infix=grammar.Token("symbol", "contains")), # Miscellaneous: grammar.Operator(name="unary -", precedence=5, assoc="right", handler=ast_transforms.NegateValue, docstring=None, infix=None, suffix=None, prefix=grammar.Token("symbol", "-")), grammar.Operator(name="list builder", precedence=14, assoc="left", handler=ast.Tuple, docstring=None, prefix=grammar.Token("lbracket", "["), infix=grammar.Token("comma", ","), suffix=grammar.Token("rbracket", "]")), grammar.Operator(name="regexp", precedence=3, assoc="left", handler=ast.RegexFilter, docstring="Match LHS against regex on RHS.", prefix=None, suffix=None, infix=grammar.Token("symbol", "regexp")), grammar.Operator(name=".", precedence=12, assoc="left", handler=ast_transforms.NormalizeResolve, docstring="OBJ.MEMBER -> return MEMBER of OBJ.", prefix=None, suffix=None, infix=grammar.Token("symbol", ".")), ] def __init__(self, original, params=None): super(ObjectFilterSyntax, self).__init__(original) if params is not None: raise ValueError("ObjectFilterSyntax doesn't support parameters.") t = tokenizer.LazyTokenizer(original) self.parser = parser.ExpressionParser(operators=self.OPERATORS, tokenizer=t) @property def root(self): return self.parser.parse() syntax.Syntax.register_parser(ObjectFilterSyntax, shorthand="objectfilter") efilter-1-1.5/efilter/parsers/legacy/__init__.py0000664066434000116100000000020612713157120021672 0ustar adamsheng00000000000000"""EFILTER Forensic Query Language This module implements some legacy syntaxes. """ from efilter.parsers.legacy import objectfilter efilter-1-1.5/efilter/parsers/common/0000750066434000116100000000000012762014475017611 5ustar adamsheng00000000000000efilter-1-1.5/efilter/parsers/common/parser.py0000664066434000116100000001461412713157120021463 0ustar adamsheng00000000000000# EFILTER Forensic Query Language # # Copyright 2015 Google Inc. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. """ This module implements a customizable precedence-climbing parser. """ __author__ = "Adam Sindelar <adam.sindelar@gmail.com>" from efilter import ast from efilter import errors from efilter.parsers.common import grammar from efilter.parsers.common import token_stream class ExpressionParser(object): """Precedence-climbing parser with support for *fix operators. Precedence-climbing parsers refer to an operator precedence table which can be modified at runtime. This implementation supports prefix, infix, suffix and mixfix operators and can be used to support grammars that aren't known ahead of time.
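As a worked example (with illustrative precedence numbers, not the ones any particular grammar uses): given '+' at precedence 4 and '*' at precedence 6, parsing "1 + 2 * 3" proceeds as follows - the parser reads the atom 1, accepts '+', reads the atom 2, then notices that the upcoming '*' binds tighter than '+', so it recursively folds 2 * 3 into a single operand before applying '+'. The result is Sum(1, Product(2, 3)) rather than Product(Sum(1, 2), 3).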
This parser also supports circumfix operators with repeated infix separators, which allows for list builders and the like. For example: # This builds a list: Operator(prefix=Token("lbracket", "["), infix=Token("comma", ","), suffix=Token("rbracket", "]"), handler=ast.Tuple) # The above doesn't conflict with, for example, array subscription # because mixfix and circumfix operators are non-ambiguous: Operator(prefix=None, infix=Token("lbracket", "["), suffix=Token("rbracket", "]"), handler=ast.Select) Precedence-climbing is particularly suitable for atom/operator expressions, but doesn't extend well to more complex grammars, such as SQL, function application, C-like languages, etc. For those more complex use-cases, this class can still be invoked for the subsections that are pure expression syntax. * Sometimes called postcircumfix: infix + suffix part, like x[y]. """ operators = None @property def original(self): return self.tokens.tokenizer.source def __init__(self, operators, tokenizer): self.operators = grammar.OperatorTable(*operators) self.tokens = token_stream.TokenStream(tokenizer) def parse(self): result = self.expression() if self.tokens.peek(0): token = self.tokens.peek(0) raise errors.EfilterParseError( message="Unexpected %s '%s' here." % (token.name, token.value), query=self.original, token=token) if result is None: raise errors.EfilterParseError( message="Query %r is empty." % self.original) return result def expression(self, previous_precedence=0): lhs = self.atom() return self.operator(lhs, previous_precedence) def atom(self): # Unary operator. if self.tokens.accept(grammar.prefix, self.operators): operator = self.tokens.matched.operator start = self.tokens.matched.start children = [self.expression(operator.precedence)] # Allow infix to be repeated in circumfix operators. if operator.infix: while self.tokens.accept(grammar.match_tokens(operator.infix)): children.append(self.expression()) # If we have a suffix, expect it now. if operator.suffix: self.tokens.expect(grammar.match_tokens(operator.suffix)) return operator.handler(*children, start=start, end=self.tokens.matched.end, source=self.original) if self.tokens.accept(grammar.literal): return ast.Literal(self.tokens.matched.value, source=self.original, start=self.tokens.matched.start, end=self.tokens.matched.end) if self.tokens.accept(grammar.symbol): return ast.Var(self.tokens.matched.value, source=self.original, start=self.tokens.matched.start, end=self.tokens.matched.end) if self.tokens.accept(grammar.lparen): expr = self.expression() self.tokens.expect(grammar.rparen) return expr if self.tokens.peek(0): raise errors.EfilterParseError( message="Was not expecting %r here."
% self.tokens.peek(0).name, token=self.tokens.peek(0)) else: raise errors.EfilterParseError("Unexpected end of input.") def _infix_of_min_precedence(self, tokens, precedence): match = grammar.infix(tokens, self.operators) if not match: return if match.operator.precedence < precedence: return return match def operator(self, lhs, min_precedence): while self.tokens.accept(self._infix_of_min_precedence, min_precedence): operator = self.tokens.matched.operator if operator.prefix: raise ValueError("infix+prefix operators aren't supported.") if operator.suffix: rhs = self.expression() self.tokens.expect(grammar.match_tokens(operator.suffix)) rhs.end = self.tokens.matched.end else: rhs = self.atom() next_min_precedence = operator.precedence if operator.assoc == "left": next_min_precedence += 1 while self.tokens.match(grammar.infix, self.operators): if (self.tokens.matched.operator.precedence < next_min_precedence): break rhs = self.operator(rhs, self.tokens.matched.operator.precedence) if not rhs: raise errors.EfilterParseError( message="Expecting the operator RHS here.", token=self.tokens.peek(0)) lhs = operator.handler(lhs, rhs, start=lhs.start, end=rhs.end, source=self.original) return lhs efilter-1-1.5/efilter/parsers/common/grammar.py0000640066434000116100000002742712736453202021622 0ustar adamsheng00000000000000# EFILTER Forensic Query Language # # Copyright 2015 Google Inc. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. """ This module provides grammar primitives common across most parsers. ### What is a grammar? In the EFILTER world, we use the word 'grammar' to mean a collection of stateless functions that take an iterable of tokens and return a TokenMatch if the tokens match the grammatical construct the function represents. These functions are called 'grammar functions' (gasp!). For example, a grammar function that matches a parenthesis would be: def lparen(tokens): first_token = next(iter(tokens)) if first_token.name == "lparen": return TokenMatch(operator=None, value=None, tokens=[first_token]) To make writing grammar functions easier, this module provides a number of primitives that largely insulate the programmer from having to write all of the above. The real world lparen function actually looks like this: def lparen(tokens): return token_name(tokens, "lparen") ### Operators A common theme across most grammars are operators, and this module provides a convenient container to group grammar functions related to operators. The 'Operator' container groups basic information about an operator, starting with its name and docstring, its suffix, infix and prefix parts, and also an AST construct that the operator maps onto. For example, the addition operator would look like this: plus = Operator( handler=ast.Sum, assoc="left", infix="+" ...) 
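A complete (hypothetical) declaration would spell out every field of the namedtuple - note that the *fix parts are Token instances, not bare strings, and that the precedence number shown here is illustrative:

    plus = Operator(name="+", precedence=4, assoc="left",
                    handler=ast.Sum, docstring="Arithmetic addition.",
                    prefix=None, infix=Token("symbol", "+"),
                    suffix=None)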
""" __author__ = "Adam Sindelar " import collections import itertools import six class Token(collections.namedtuple("Token", "name value start end")): """Represents one token, which is what grammars operate on.""" def __new__(cls, name, value, start=None, end=None): return super(Token, cls).__new__(cls, name, value, start, end) def __repr__(self): return "Token(name='%s', value='%s', start=%d, end=%d)" % ( self.name, self.value, self.start or 0, self.end or 0) def __eq__(self, other): """Tokens compare on name and value, not on position.""" return (self.name, self.value) == (other.name, other.value) def __hash__(self): """Tokens hash on name and value, not on position.""" return hash((self.name, self.value)) class Operator(collections.namedtuple( "Operator", "name precedence assoc handler docstring prefix infix suffix")): """Declares an operator in a grammar with functions to match it. Operators can have prefix, infix and suffix parts, each of which is represented by the token (like Token("keyword", "+", None)). Each operator must have at least one of the *fixes. This class has no restriction on which *fixes can be used together, but the individual parsers may not support every combination. For example, DottySQL doesn't parse circumfix (prefix + suffix) operators. Previously, DottySQL used grammar functions for suffixes, which works well when there is only a small number of them, but is very slow if there are many operators. In practice, the grammar functions matching *fixes almost always just call _keyword, which means they can be replaced with a lookup in the operator table. Arguments: name: The literal name of the operator, such as "+" or "not". precedence: Integer precedence with operators of the same arity. handler: Callable that emits AST for this operator. docstring: Documentation for the operator. assoc: Associativity - can be left or right for infix operators. suffix: (OPTIONAL) The token (not grammar function) of the suffix. prefix: (OPTIONAL) The token (not grammar function) of the prefix. infix: (OPTIONAL) The token (not grammar function) of the infix. """ class TokenLookupTable(object): """Ordered associative container where tokens are keys. Public properties: case_sensitive (default False): If set to False, all lookups will be converted to lower case. NOTE: Does not affect insertion: case-insensitive grammar should insert operators in lower case. """ _max_len = 1 # Longest match so far. _table = None # Ordered dict keyed on tokens. # This affects only lookups, not insertion. case_sensitive = False def __init__(self, *entries): self._table = collections.OrderedDict() for tokens, entry in entries: self.set(tokens, entry) def set(self, tokens, entry): if isinstance(tokens, Token): tokens = (tokens,) elif isinstance(tokens, tuple): self._max_len = max(self._max_len, len(tokens)) else: raise TypeError( "TokenLookupTable only supports instances of Token or " "tuples thereof for keys. Got %r." % tokens) if tokens in self._table: raise ValueError("Duplicate token key %r for %r." % ( tokens, entry)) self._table[tokens] = entry def _normalize_token(self, token): if (isinstance(token.value, six.string_types) and not self.case_sensitive): return token._replace(value=token.value.lower()) return token def match(self, tokens): # Try to match longest known match first. 
for match_len in range(self._max_len, 0, -1): needle = tuple((self._normalize_token(t) for t in itertools.islice(tokens, match_len))) result = self._table.get(needle) if result: return result, needle return None, None class OperatorTable(object): """A complete set of operators in a grammar, keyed on their *fix tokens.""" prefix = None infix = None suffix = None by_name = None by_handler = None def __init__(self, *operators): self.prefix = TokenLookupTable() self.infix = TokenLookupTable() self.suffix = TokenLookupTable() self.by_name = dict() self.by_handler = dict() for operator in operators: if operator.name in self.by_name: raise ValueError("Duplicate operator name %r." % operator.name) self.by_name[operator.name] = operator if operator.handler not in self.by_handler: # Multiple operators can have the same handler, in which case # they are probably aliases that mean the same thing. In that # case the first operator "wins" and will likely be what # the formatter for this syntax ends up using as default when # it formats this AST. self.by_handler[operator.handler] = operator # An operator can have multiple components, but it is only indexed # by the first one to prevent ambiguity. if operator.prefix: self.prefix.set(operator.prefix, operator) elif operator.infix: self.infix.set(operator.infix, operator) elif operator.suffix: self.suffix.set(operator.suffix, operator) # Grammar primitives and helpers. (No grammar functions until the end of file.) class TokenMatch(collections.namedtuple( "TokenMatch", "operator value tokens")): """Represents one or more matching tokens and, optionally, their contents. Arguments: operator: The Operator instance that matched, if any. value: The literal value that matched, if any. tokens: The actual tokens the match consumed. """ @property def start(self): return self.tokens[0].start @property def end(self): return self.tokens[-1].end @property def first(self): return self.tokens[0] def keyword(tokens, expected): """Case-insensitive keyword match.""" try: token = next(iter(tokens)) except StopIteration: return if token and token.name == "symbol" and token.value.lower() == expected: return TokenMatch(None, token.value, (token,)) def multi_keyword(tokens, keyword_parts): """Match a case-insensitive keyword consisting of multiple tokens.""" tokens = iter(tokens) matched_tokens = [] limit = len(keyword_parts) for idx in six.moves.range(limit): try: token = next(tokens) except StopIteration: return if (not token or token.name != "symbol" or token.value.lower() != keyword_parts[idx]): return matched_tokens.append(token) return TokenMatch(None, token.value, matched_tokens) def keywords(tokens, expected): """Match against any of a set/dict of keywords. Note that this doesn't support multi-part keywords. Any multi-part keywords must be special-cased in their grammar function.
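For example (illustrative only): keywords(tokens, {"inset", "contains"}) returns a TokenMatch when the next token is the symbol 'inset' or 'contains'; a two-word keyword such as 'not in' would need multi_keyword instead.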
""" try: token = next(iter(tokens)) except StopIteration: return if token and token.name == "symbol" and token.value.lower() in expected: return TokenMatch(None, token.value, (token,)) def prefix(tokens, operator_table): """Match a prefix of an operator.""" operator, matched_tokens = operator_table.prefix.match(tokens) if operator: return TokenMatch(operator, None, matched_tokens) def infix(tokens, operator_table): """Match an infix of an operator.""" operator, matched_tokens = operator_table.infix.match(tokens) if operator: return TokenMatch(operator, None, matched_tokens) def suffix(tokens, operator_table): """Match a suffix of an operator.""" operator, matched_tokens = operator_table.suffix.match(tokens) if operator: return TokenMatch(operator, None, matched_tokens) def token_name(tokens, expected): """Match a token name (type).""" try: token = next(iter(tokens)) except StopIteration: return if token and token.name == expected: return TokenMatch(None, token.value, (token,)) def match_tokens(expected_tokens): """Generate a grammar function that will match 'expected_tokens' only.""" if isinstance(expected_tokens, Token): # Match a single token. def _grammar_func(tokens): try: next_token = next(iter(tokens)) except StopIteration: return if next_token == expected_tokens: return TokenMatch(None, next_token.value, (next_token,)) elif isinstance(expected_tokens, tuple): # Match multiple tokens. match_len = len(expected_tokens) def _grammar_func(tokens): upcoming = tuple(itertools.islice(tokens, match_len)) if upcoming == expected_tokens: return TokenMatch(None, None, upcoming) else: raise TypeError( "'expected_tokens' must be an instance of Token or a tuple " "thereof. Got %r." % expected_tokens) return _grammar_func # Some common grammar functions: def literal(tokens): return token_name(tokens, "literal") def symbol(tokens): return token_name(tokens, "symbol") def lparen(tokens): return token_name(tokens, "lparen") def rparen(tokens): return token_name(tokens, "rparen") def lbracket(tokens): return token_name(tokens, "lbracket") def rbracket(tokens): return token_name(tokens, "rbracket") def comma(tokens): return token_name(tokens, "comma") efilter-1-1.5/efilter/parsers/common/__init__.py0000664066434000116100000000014112713157120021714 0ustar adamsheng00000000000000"""EFILTER Forensic Query Language This module provides building blocks for other grammars. """ efilter-1-1.5/efilter/parsers/common/ast_transforms.py0000640066434000116100000000476412744402773023246 0ustar adamsheng00000000000000# EFILTER Forensic Query Language # # Copyright 2015 Google Inc. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. """ This module contains functions that map consructs in common grammars to the AST. Most constructs in expression grammars map directly to something in the EFILTER AST, but some constructs don't. 
For example, EFILTER supports a Complement ('NOT') and an Equivalence ('=='), but not a non-Equivalence ('!='); therefore, this module contains a function that simulates a non-Equivalence AST node by transforming it to a Complement of Equivalence. """ __author__ = "Adam Sindelar <adam.sindelar@gmail.com>" from efilter import ast def NormalizeResolve(x, y, **kwargs): if isinstance(y, ast.Var): literal_y = ast.Literal(y.value, start=y.start, end=y.end, source=y.source) else: raise TypeError("Type of RHS must be Var. Got %r." % y) return ast.Resolve(x, literal_y, **kwargs) def ComplementEquivalence(*args, **kwargs): """Change x != y to not(x == y).""" return ast.Complement( ast.Equivalence(*args, **kwargs), **kwargs) def ComplementMembership(*args, **kwargs): """Change (x not in y) to not(x in y).""" return ast.Complement( ast.Membership(*args, **kwargs), **kwargs) def ReverseMembership(x, y, **kwargs): """Change (x contains y) to y in x.""" return ast.Membership(y, x, **kwargs) def ReverseComplementMembership(x, y, **kwargs): """Change (x doesn't contain y) to not(y in x).""" return ast.Complement( ast.Membership(y, x, **kwargs), **kwargs) def ReverseStrictOrderedSet(*args, **kwargs): """Change x < y to y > x.""" return ast.StrictOrderedSet(*reversed(args), **kwargs) def ReversePartialOrderedSet(*args, **kwargs): """Change x <= y to y >= x.""" return ast.PartialOrderedSet(*reversed(args), **kwargs) def NegateValue(*args, **kwargs): """Change -x to (-1 * x).""" return ast.Product( ast.Literal(-1), *args, **kwargs) efilter-1-1.5/efilter/parsers/common/tokenizer.py0000640066434000116100000003271512736453202022200 0ustar adamsheng00000000000000# EFILTER Forensic Query Language # # Copyright 2015 Google Inc. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. """ This module implements a reusable expression tokenizer. """ __author__ = "Adam Sindelar <adam.sindelar@gmail.com>" import collections import re import six from efilter import errors from efilter.parsers.common import grammar class Pattern(collections.namedtuple("Pattern", "name states regex action next_state")): """Defines a token pattern for the tokenizer. Arguments: name: The name of the pattern will be used to name the token. states: The pattern will only be applied if we're in one of these states. regex: A regular expression to try and match from the current point. A named matched group 'token' will be saved in Token.value. action: The handler to call. next_state: The next state we transition to if this pattern matched. """ def __new__(cls, name, states, regex, action, next_state): return super(Pattern, cls).__new__( cls, name, states, re.compile(regex, re.DOTALL | re.M | re.S | re.U), action, next_state) class LazyTokenizer(object): """Configurable tokenizer usable with most expression grammars. This class is directly usable, and will, by default, produce tokens for string and number literals, parens, brackets, commas and words (symbols). Notes on performance: The runtime complexity grows with the number of patterns (m) and the number of tokens (n) in source.
It is O(n*m) in the worst case. The tokenizer is lazy, and uses a deque to cache parsed tokens which haven't been skipped yet. When using 'peek' without 'skip' all tokens have to be cached, and this leads to O(n) memory complexity! Extending the tokenizer: This class is capable of tokenizing almost any sane expression language, but can be further extended to (1) yield more specific token names for certain grammars (e.g. distinguishing between symbols and operators), as well as (2) supporting further tokens, such as curly braces. In the majority of cases, adding more patterns will be sufficient. For example, to support curly braces, one would add the following to DEFAULT_PATTERNS: Pattern("lbrace", # Give the token a new name. ("INITIAL",), # Match this only if you're not in a string. r"(?P<token>\\{)", # The regex should match an lbrace, and # capture it in the group named 'token'. "emit", # This will yield a Token(name='lbrace', value='{'). None, # Matching an lbrace doesn't change the state. ), Pattern("rbrace", ("INITIAL",), r"(?P<token>\\})", "emit", None) For more complex use cases, it may be necessary to implement additional actions, which are just instance methods. Take a look at how string literals are implemented (string_start, string_end) for an example. Built-in actions: emit: Will emit a token with the supplied name and value set to whatever the named match group 'token' contains. emit_param: The tokenizer will emit a parse-time parameter for interpolation by a parser. The parameter token can be indexed, keyed on a string, or both. Indexing happens automatically, starting from 0. emit_int: The tokenizer will emit an integer obtained by interpreting the matched substring as an integer in base 10. emit_hex: Same as 'emit_int' but base 16. emit_oct: Same as 'emit_int' but base 8. emit_float: Same as 'emit_int' but emits a base 10 float. string_end: Emits a token with the last matched string. Public interface: next_token: Returns the next token and advances the tokenizer. skip: Skips N tokens ahead, without returning them. peek: Looks ahead over N tokens WITHOUT advancing the tokenizer. This fills up the token lookahead queue with N tokens - avoid supplying large values of N. __iter__: Returns an iterator that doesn't advance the tokenizer ( same as calling peek() with increasing values of N). This can fill up the token queue quickly and should not be the primary interface. Arguments: source: Source string that will be lexed. patterns (optional): Overrides self.DEFAULT_PATTERNS """ # Used if no patterns are supplied to the constructor. Subclasses can # override. DEFAULT_PATTERNS = ( # Parens/brackets and separators. Pattern("lparen", ("INITIAL",), r"(?P<token>\()", "emit", None), Pattern("rparen", ("INITIAL",), r"(?P<token>\))", "emit", None), Pattern("lbracket", ("INITIAL",), r"(?P<token>\[)", "emit", None), Pattern("rbracket", ("INITIAL",), r"(?P<token>\])", "emit", None), Pattern("comma", ("INITIAL",), r"(?P<token>,)", "emit", None), # Build-time parameters. Pattern("param", ("INITIAL",), r"\{(?P<token>[a-z_0-9]*)\}", "emit_param", None), Pattern("param", ("INITIAL",), r"(?P<token>\?)", "emit_param", None), # Numeric literals. Pattern("literal", ("INITIAL",), r"(?P<token>\d+\.\d+)", "emit_float", None), Pattern("literal", ("INITIAL",), r"(?P<token>0\d+)", "emit_oct", None), Pattern("literal", ("INITIAL",), r"(?P<token>0x[0-9a-fA-F]+)", "emit_hex", None), Pattern("literal", ("INITIAL",), r"(?P<token>\d+)", "emit_int", None), # String literals.
Pattern(None, ("INITIAL",), r"\"", "string_start", "STRING"), Pattern(None, ("INITIAL",), r"'", "string_start", "SQ_STRING"), Pattern("literal", ("STRING",), "\"", "string_end", None), Pattern(None, ("STRING",), r"\\(.)", "string_escape", None), Pattern(None, ("STRING",), r"([^\\\"]+)", "string_append", None), Pattern("literal", ("SQ_STRING",), "'", "string_end", None), Pattern(None, ("SQ_STRING",), r"\\(.)", "string_escape", None), Pattern(None, ("SQ_STRING",), r"([^\\']+)", "string_append", None), # Prefer to match symbols only as far as they go, should they be # followed by a literal with no whitespace in between. Pattern("symbol", ("INITIAL",), r"([a-zA-Z_][\w_]*)", "emit", None), # Special characters are also valid as a symbol, but we won't match them # eagerly so as to not swallow valid literals that follow. Pattern("symbol", ("INITIAL",), r"([-+*\/=~\.><\[\]!:]+)", "emit", None), # Whitespace is ignored. Pattern(None, ("INITIAL",), r"(\s+)", None, None), ) # Ordered instances of Pattern. patterns = None # A deque with tokens that have been parsed, but haven't been skipped yet. lookahead = None # The input string being tokenized. source = None # List of states, as determined by rules in 'patterns'. state_stack = None # The length of 'source'. limit = None # The latest matched literal string. string = None def __init__(self, source, patterns=None): self.source = source self.state_stack = ["INITIAL"] self.current_token = None self._position = 0 self.limit = len(source) self.lookahead = collections.deque() self._param_idx = 0 self.patterns = patterns or self.DEFAULT_PATTERNS # Make sure current_token starts containing something. self.next_token() # API for the parser: @property def position(self): """Returns the logical position (unaffected by lookahead).""" if self.lookahead: return self.lookahead[0].start return self._position def __iter__(self): """Look ahead from current position.""" if self.current_token is not None: yield self.current_token for token in self.lookahead: yield token while True: token = self._parse_next_token() if not token: return self.lookahead.append(token) yield token def peek(self, steps=1): """Look ahead, doesn't affect current_token and next_token.""" try: tokens = iter(self) for _ in six.moves.range(steps): next(tokens) return next(tokens) except StopIteration: return None def skip(self, steps=1): """Skip ahead by 'steps' tokens.""" for _ in six.moves.range(steps): self.next_token() def next_token(self): """Returns the next logical token, advancing the tokenizer.""" if self.lookahead: self.current_token = self.lookahead.popleft() return self.current_token self.current_token = self._parse_next_token() return self.current_token # Implementation: def _pop_state(self, **_): try: self.state_stack.pop() except IndexError: self._error("Pop state called on an empty stack.", self.position) def _parse_next_token(self): """Will parse patterns until it gets to the next token or EOF.""" while self._position < self.limit: token = self._next_pattern() if token: return token return None def _next_pattern(self): """Parses the next pattern by matching each in turn.""" current_state = self.state_stack[-1] position = self._position for pattern in self.patterns: if current_state not in pattern.states: continue m = pattern.regex.match(self.source, position) if not m: continue position = m.end() token = None if pattern.next_state: self.state_stack.append(pattern.next_state) if pattern.action: callback = getattr(self, pattern.action, None) if callback is None: raise RuntimeError( 
"No method defined for pattern action %s!" % pattern.action) if "token" in m.groups(): value = m.group("token") else: value = m.group(0) token = callback(string=value, match=m, pattern=pattern) self._position = position return token self._error("Don't know how to match next. Did you forget quotes?", start=self._position, end=self._position + 1) def _error(self, message, start, end=None): """Raise a nice error, with the token highlighted.""" raise errors.EfilterParseError( source=self.source, start=start, end=end, message=message) # Actions: def emit(self, string, match, pattern, **_): """Emits a token using the current pattern match and pattern label.""" return grammar.Token(name=pattern.name, value=string, start=match.start(), end=match.end()) def emit_param(self, match, pattern, **_): param_name = match.group(1) if not param_name or param_name == "?": param_name = self._param_idx self._param_idx += 1 elif param_name and re.match(r"^\d+$", param_name): param_name = int(param_name) return grammar.Token(name=pattern.name, value=param_name, start=match.start(), end=match.end()) def emit_int(self, string, match, pattern, **_): return grammar.Token(name=pattern.name, value=int(string), start=match.start(), end=match.end()) def emit_oct(self, string, match, pattern, **_): return grammar.Token(name=pattern.name, value=int(string, 8), start=match.start(), end=match.end()) def emit_hex(self, string, match, pattern, **_): return grammar.Token(name=pattern.name, value=int(string, 16), start=match.start(), end=match.end()) def emit_float(self, string, match, pattern, **_): return grammar.Token(name=pattern.name, value=float(string), start=match.start(), end=match.end()) # String parsing: def string_start(self, match, **_): self.string = "" self.string_position = match.start() def string_escape(self, string, match, **_): if match.group(1) in "'\"rnbt": self.string += string.decode("string_escape") else: self.string += string def string_append(self, string="", **_): self.string += string def string_end(self, pattern, match, **_): self._pop_state() return grammar.Token(name=pattern.name, value=self.string, start=self.string_position, end=match.end()) efilter-1-1.5/efilter/parsers/common/token_stream.py0000664066434000116100000001040712713157120022656 0ustar adamsheng00000000000000# EFILTER Forensic Query Language # # Copyright 2015 Google Inc. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. """ This module implements a parser that manages tokenizer output based on rules. """ __author__ = "Adam Sindelar " from efilter import errors from efilter.parsers.common import grammar class TokenStream(object): """Manages and enforces grammar over tokenizer output. Most recursive descent parsers need a mechanism to accept, reject, expect or peek at the next token based on matching loging supplied by grammar functions. This class manages the tokenizer for the parser, and enforces the expectations set by grammar. Arguments: tokenizer: Must support the tokenizer interface (skip and peek). 
""" tokenizer = None matched = None def __init__(self, tokenizer=None): self.tokenizer = tokenizer def match(self, f, *args): """Match grammar function 'f' against next token and set 'self.matched'. Arguments: f: A grammar function - see efilter.parsers.common.grammar. Must return TokenMatch or None. args: Passed to 'f', if any. Returns: Instance of efilter.parsers.common.grammar.TokenMatch or None. Comment: If a match is returned, it will also be stored in self.matched. """ try: match = f(self.tokenizer, *args) except StopIteration: # The grammar function might have tried to access more tokens than # are available. That's not really an error, it just means it didn't # match. return if match is None: return if not isinstance(match, grammar.TokenMatch): raise TypeError("Invalid grammar function %r returned %r." % (f, match)) self.matched = match return match def accept(self, f, *args): """Like 'match', but consume the token (tokenizer advances.)""" match = self.match(f, *args) if match is None: return self.tokenizer.skip(len(match.tokens)) return match def reject(self, f, *args): """Like 'match', but throw a parse error if 'f' matches. This is useful when a parser wants to be strict about specific things being prohibited. For example, DottySQL bans the use of SQL keywords as variable names. """ match = self.match(f, *args) if match: token = self.peek(0) raise errors.EfilterParseError( query=self.tokenizer.source, token=token, message="Was not expecting a %s here." % token.name) def expect(self, f, *args): """Like 'accept' but throws a parse error if 'f' doesn't match.""" match = self.accept(f, *args) if match: return match try: func_name = f.func_name except AttributeError: func_name = "" start, end = self.current_position() raise errors.EfilterParseError( query=self.tokenizer.source, start=start, end=end, message="Was expecting %s here." % (func_name)) def current_position(self): """Return a tuple of (start, end).""" token = self.tokenizer.peek(0) if token: return token.start, token.end return self.tokenizer.position, self.tokenizer.position + 1 def peek(self, n): """Same as self.tokenizer.peek.""" return self.tokenizer.peek(n) def skip(self, n): """Same as self.tokenizer.skip.""" return self.tokenizer.skip(n) def __iter__(self): """Self as iter(self.tokenizer).""" return iter(self.tokenizer) efilter-1-1.5/efilter/parsers/lisp.py0000664066434000116100000000472512713157120017650 0ustar adamsheng00000000000000# EFILTER Forensic Query Language # # Copyright 2015 Google Inc. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. """ Lisp-like EFILTER syntax. This is mostly used in tests, in situations where dotty doesn't make it obvious what the AST is going to look like, and manually setting up expression classes is too verbose. 
""" __author__ = "Adam Sindelar " from efilter import ast from efilter import syntax EXPRESSIONS = { "var": ast.Var, "!": ast.Complement, "select": ast.Select, "cast": ast.Cast, "isa": ast.IsInstance, "map": ast.Map, "filter": ast.Filter, "reducer": ast.Reducer, "group": ast.Group, "sort": ast.Sort, "any": ast.Any, "each": ast.Each, "in": ast.Membership, "apply": ast.Apply, "repeat": ast.Repeat, "tuple": ast.Tuple, "bind": ast.Bind, "if": ast.IfElse, ":": ast.Pair, ".": ast.Resolve, "|": ast.Union, "&": ast.Intersection, ">": ast.StrictOrderedSet, ">=": ast.PartialOrderedSet, "==": ast.Equivalence, "=~": ast.RegexFilter, "+": ast.Sum, "-": ast.Difference, "*": ast.Product, "/": ast.Quotient, "literal": ast.Literal, } class Parser(syntax.Syntax): """Parses the lisp expression language into the query AST.""" @property def root(self): return self._parse_atom(self.original) def _parse_atom(self, atom): if isinstance(atom, tuple): return self._parse_s_expression(atom) return ast.Literal(atom) def _parse_s_expression(self, atom): car = atom[0] cdr = atom[1:] # Vars are a little special. Don't make the value a Literal. if car == "var": return ast.Var(cdr[0]) # Params are interpolated right away. if car == "param": return ast.Literal(self.params[cdr[0]]) return EXPRESSIONS[car](*[self._parse_atom(a) for a in cdr]) syntax.Syntax.register_parser(Parser, shorthand="lisp") efilter-1-1.5/efilter/parsers/dottysql/0000750066434000116100000000000012762014475020204 5ustar adamsheng00000000000000efilter-1-1.5/efilter/parsers/dottysql/parser.py0000644066434000116100000006567212747672347022112 0ustar adamsheng00000000000000# EFILTER Forensic Query Language # # Copyright 2015 Google Inc. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. """ This module implements the DottySQL language. Sketch of the DottySQL grammar follows, in pseudo-EBNF. This is not meant to be correct, by the way - or exhaustive - but to give the reader a sense of what the parser is doing. # Simplified - the actual binary_expressions are parsed using # precedence-climbing. expression = atom | binary_expression . binary_expression = atom { [ infix_operator atom ] } | atom { [ mixfix_operator expression suffix ] } atom = [ prefix ] ( select_expression | any_expression | func_application | let_expr | var | literal | list | "(" expression ["," expression ] ")" ). list = "[" literal [ { "," literal } ] "]" . let_expr = "let" var "=" expression [ "," var "=" expression ] expression . select_expression = "select" ("*" | "any" | what_expression ) from_expression . what_expression = expression ["as" var ] { "," expression ["as" var ] } . from_expression = expression [ ( where_expression | order_expression ) ] . where_expression = expression [ order_expression ] . order_expression = expression [ ( "asc" | "desc" ) ] . any_expression = [ "select" ] "any" [ "from" ] expression . func_application = var "(" [ expression [ { "," expression } ] ] ")" . # infix, prefix, literal and var should be obvious. 
""" __author__ = "Adam Sindelar " import itertools import six from efilter import ast from efilter import errors from efilter import syntax from efilter.parsers.dottysql import grammar from efilter.parsers.common import grammar as common_grammar from efilter.parsers.common import tokenizer from efilter.parsers.common import token_stream class Parser(syntax.Syntax): """Parses DottySQL and produces an efilter AST. This is a basic recursive descend parser that handles infix expressions by precedence climbing. """ last_match = common_grammar.TokenMatch(None, None, None) last_param = 0 tokens = None def __init__(self, original, params=None): super(Parser, self).__init__(original) self.tokens = token_stream.TokenStream( tokenizer.LazyTokenizer(self.original)) if isinstance(params, list): self.params = {} for idx, val in enumerate(params): self.params[idx] = val elif isinstance(params, dict): self.params = params elif params is None: self.params = {} else: raise TypeError("Params must be a list or a dict, not %r." % type(params)) def parse(self): # If we get any exceptions, make sure they have access to the query # source code. try: result = self.expression() except errors.EfilterError as e: e.query = self.original raise if self.tokens.peek(0): token = self.tokens.peek(0) return self.error( "Unexpected %s '%s'. Were you looking for an operator?" % (token.name, token.value), token) return result @property def root(self): return self.parse() def error(self, message=None, start_token=None, end_token=None): start = self.tokens.tokenizer.position end = start + 20 if start_token: start = start_token.start end = start_token.end if end_token: end = end_token.end raise errors.EfilterParseError( query=self.original, start=start, end=end, message=message, token=start_token) # Recursive grammar. def expression(self, previous_precedence=0): """An expression is an atom or an infix expression. Grammar (sort of, actually a precedence-climbing parser): expression = atom [ binary_operator expression ] . Args: previous_precedence: What operator precedence should we start with? """ lhs = self.atom() return self.operator(lhs, previous_precedence) def atom(self): """Parse an atom, which is most things. Grammar: atom = [ prefix ] ( select_expression | any_expression | func_application | let_expr | var | literal | list | "(" expression ")" ) . """ # Parameter replacement with literals. if self.tokens.accept(grammar.param): return self.param() # Let expressions (let(x = 5, y = 10) x + y) if self.tokens.accept(grammar.let): return self.let() # At the top level, we try to see if we are recursing into an SQL query. if self.tokens.accept(grammar.select): return self.select() # A SELECT query can also start with 'ANY'. if self.tokens.accept(grammar.select_any): return self.select_any() # Explicitly reject any keywords from SQL other than SELECT and ANY. # If we don't do this they will match as valid symbols (variables) # and that might be confusing to the user. self.tokens.reject(grammar.sql_keyword) # Match if-else before other things that consume symbols. if self.tokens.accept(grammar.if_if): return self.if_if() # Operators must be matched first because the same symbols could also # be vars or applications. 
if self.tokens.accept(grammar.prefix): operator = self.tokens.matched.operator start = self.tokens.matched.start expr = self.expression(operator.precedence) return operator.handler(expr, start=start, end=expr.end, source=self.original) if self.tokens.accept(grammar.literal): return ast.Literal(self.tokens.matched.value, source=self.original, start=self.tokens.matched.start, end=self.tokens.matched.end) # Match builtin pseudo-functions before functions and vars to prevent # overrides. if self.tokens.accept(grammar.builtin): return self.builtin(self.tokens.matched.value) # Match applications before vars, because obviously. if self.tokens.accept(grammar.application): return self.application( ast.Var(self.tokens.matched.value, source=self.original, start=self.tokens.matched.start, end=self.tokens.matched.first.end)) if self.tokens.accept(common_grammar.symbol): return ast.Var(self.tokens.matched.value, source=self.original, start=self.tokens.matched.start, end=self.tokens.matched.end) if self.tokens.accept(common_grammar.lparen): # Parens will contain one or more expressions. If there are several # expressions, separated by commas, then they are a repeated value. # # Unlike lists, repeated values must all be of the same type, # otherwise evaluation of the query will fail at runtime (or # type-check time, for simple cases.) start = self.tokens.matched.start expressions = [self.expression()] while self.tokens.accept(common_grammar.comma): expressions.append(self.expression()) self.tokens.expect(common_grammar.rparen) if len(expressions) == 1: return expressions[0] else: return ast.Repeat(*expressions, source=self.original, start=start, end=self.tokens.matched.end) if self.tokens.accept(common_grammar.lbracket): return self.list() # We've run out of things we know the next atom could be. If there is # still input left then it's illegal syntax. If there is nothing then # the input cuts off when we still need an atom. Either is an error. if self.tokens.peek(0): return self.error( "Was not expecting %r here." % self.tokens.peek(0).name, start_token=self.tokens.peek(0)) else: return self.error("Unexpected end of input.") def let(self): saved_start = self.tokens.matched.start expect_rparens = 0 while self.tokens.accept(common_grammar.lparen): expect_rparens += 1 bindings = [] while True: symbol = self.tokens.expect(common_grammar.symbol) binding = ast.Literal(symbol.value, start=symbol.start, end=symbol.end, source=self.original) self.tokens.expect(grammar.let_assign) value = self.expression() bindings.append(ast.Pair(binding, value, start=binding.start, end=value.end, source=self.original)) if not self.tokens.accept(common_grammar.comma): break bind = ast.Bind(*bindings, start=bindings[0].start, end=bindings[-1].end, source=self.original) while expect_rparens: self.tokens.expect(common_grammar.rparen) expect_rparens -= 1 nested_expression = self.expression() return ast.Let(bind, nested_expression, start=saved_start, end=nested_expression.end, source=self.original) def param(self): if self.tokens.matched.value is None: param = self.last_param self.last_param += 1 elif isinstance(self.tokens.matched.value, int): param = self.last_param = self.tokens.matched.value elif isinstance(self.tokens.matched.value, six.string_types): param = self.tokens.matched.value else: return self.error( "Invalid param %r." % self.tokens.matched.value, start_token=self.tokens.matched.first) if param not in self.params: return self.error( "Param %r unavailable. 
(Available: %r)" % (param, self.params), start_token=self.tokens.matched.first) return ast.Literal(self.params[param], start=self.tokens.matched.start, end=self.tokens.matched.end, source=self.original) def accept_operator(self, precedence): """Accept the next binary operator only if it's of higher precedence.""" match = grammar.infix(self.tokens) if not match: return if match.operator.precedence < precedence: return # The next thing is an operator that we want. Now match it for real. return self.tokens.accept(grammar.infix) def operator(self, lhs, min_precedence): """Climb operator precedence as long as there are operators. This function implements a basic precedence climbing parser to deal with binary operators in a sane fashion. The outer loop will keep spinning as long as the next token is an operator with a precedence of at least 'min_precedence', parsing operands as atoms (which, in turn, recurse into 'expression' which recurses back into 'operator'). This supports both left- and right-associativity. The only part of the code that's not a regular precedence-climber deals with mixfix operators. A mixfix operator in DottySQL consists of an infix part and a suffix (they are still binary, they just have a terminator). """ # Spin as long as the next token is an operator of higher # precedence. (This may not do anything, which is fine.) while self.accept_operator(precedence=min_precedence): operator = self.tokens.matched.operator # If we're parsing a mixfix operator we can keep going until # the suffix. if operator.suffix: rhs = self.expression() self.tokens.expect(common_grammar.match_tokens(operator.suffix)) rhs.end = self.tokens.matched.end elif operator.name == ".": # The dot operator changes the meaning of RHS. rhs = self.dot_rhs() else: # The right hand side is an atom, which might turn out to be # an expression. Isn't recursion exciting? rhs = self.atom() # Keep going as long as the next token is an infix operator of # higher precedence. next_min_precedence = operator.precedence if operator.assoc == "left": next_min_precedence += 1 while self.tokens.match(grammar.infix): if (self.tokens.matched.operator.precedence < next_min_precedence): break rhs = self.operator(rhs, self.tokens.matched.operator.precedence) lhs = operator.handler(lhs, rhs, start=lhs.start, end=rhs.end, source=self.original) return lhs def dot_rhs(self): """Match the right-hand side of a dot (.) operator. The RHS must be a symbol token, but it is interpreted as a literal string (because that's what goes in the AST of Resolve.) """ self.tokens.expect(common_grammar.symbol) return ast.Literal(self.tokens.matched.value, start=self.tokens.matched.start, end=self.tokens.matched.end, source=self.original) # SQL subgrammar: def select(self): """First part of an SQL query.""" # Try to match the asterisk, any or list of vars. if self.tokens.accept(grammar.select_any): return self.select_any() if self.tokens.accept(grammar.select_all): # The FROM after SELECT * is required. self.tokens.expect(grammar.select_from) return self.select_from() return self.select_what() def select_any(self): saved_match = self.tokens.matched # Any can be either a start of a pseudosql query or the any builtin. if self.tokens.match(common_grammar.lparen): self.tokens.matched = saved_match # The paren means we're calling 'any(...)' - the builtin. return self.builtin(self.tokens.matched.value) # An optional FROM can go after ANY. # "SELECT ANY FROM", "ANY FROM", "SELECT ANY" and just "ANY" all mean # the exact same thing. 
The full form of SELECT ANY FROM is preferred # but the shorthand is very useful for writing boolean indicators and # so it's worth allowing it. start = self.tokens.matched.start self.tokens.accept(grammar.select_from) source_expression = self.expression() if self.tokens.accept(grammar.select_where): map_expression = self.expression() else: map_expression = None # ORDER after ANY doesn't make any sense. self.tokens.reject(grammar.select_order) if map_expression: return ast.Any(source_expression, map_expression, start=start, end=map_expression.end, source=self.original) return ast.Any(source_expression, start=start, end=self.tokens.matched.end, source=self.original) def _guess_name_of(self, expr): """Tries to guess what variable name 'expr' ends in. This is a heuristic that roughly emulates what most SQL databases name columns, based on selected variable names or applied functions. """ if isinstance(expr, ast.Var): return expr.value if isinstance(expr, ast.Resolve): # We know the RHS of resolve is a Literal because that's what # Parser.dot_rhs does. return expr.rhs.value if isinstance(expr, ast.Select) and isinstance(expr.rhs, ast.Literal): name = self._guess_name_of(expr.lhs) if name is not None: return "%s_%s" % (name, expr.rhs.value) if isinstance(expr, ast.Apply) and isinstance(expr.func, ast.Var): return expr.func.value def select_what(self): # Each value we select is in form EXPRESSION [AS SYMBOL]. Values are # separated by commas. start = self.tokens.matched.start used_names = set() # Keeps track of named values to prevent duplicates. vars = [] for idx in itertools.count(): value_expression = self.expression() if self.tokens.accept(grammar.select_as): # If there's an AS then we have an explicit name for this value. self.tokens.expect(common_grammar.symbol) if self.tokens.matched.value in used_names: return self.error( "Duplicate 'AS' name %r." % self.tokens.matched.value) key_expression = ast.Literal(self.tokens.matched.value, start=self.tokens.matched.start, end=self.tokens.matched.end, source=self.original) used_names.add(self.tokens.matched.value) else: # Try to guess the appropriate name of the column based on what # the expression is. name = self._guess_name_of(value_expression) if not name or name in used_names: # Give up and just use the current idx for key. name = "column_%d" % (idx,) else: used_names.add(name) key_expression = ast.Literal(name) end = key_expression.end or value_expression.end vars.append(ast.Pair(key_expression, value_expression, start=value_expression.start, end=end, source=self.original)) if self.tokens.accept(grammar.select_from): # Make ast.Bind here. 
source_expression = self.select_from() return ast.Map( source_expression, ast.Bind(*vars, start=start, end=vars[-1].end, source=self.original), start=start, end=self.tokens.matched.end, source=self.original) self.tokens.expect(common_grammar.comma)
def select_from(self): source_expression = self.expression() if self.tokens.accept(grammar.select_where): return self.select_where(source_expression) if self.tokens.accept(grammar.select_order): return self.select_order(source_expression) if self.tokens.accept(grammar.select_limit): return self.select_limit(source_expression) return source_expression
def select_where(self, source_expression): start = self.tokens.matched.start filter_expression = ast.Filter(source_expression, self.expression(), start=start, end=self.tokens.matched.end, source=self.original) if self.tokens.accept(grammar.select_order): return self.select_order(filter_expression) if self.tokens.accept(grammar.select_limit): return self.select_limit(filter_expression) return filter_expression
def select_order(self, source_expression): start = self.tokens.matched.start sort_expression = ast.Sort(source_expression, self.expression(), start=start, end=self.tokens.matched.end, source=self.original) if self.tokens.accept(grammar.select_asc): sort_expression.end = self.tokens.matched.end return sort_expression if self.tokens.accept(grammar.select_desc): # Descending sort uses the stdlib function 'reverse' on the sorted # results. Standard library's core functions should ALWAYS be # available. sort_expression = ast.Apply( ast.Var("reverse", start=sort_expression.start, end=self.tokens.matched.end, source=self.original), sort_expression, start=sort_expression.start, end=self.tokens.matched.end, source=self.original) if self.tokens.accept(grammar.select_limit): return self.select_limit(sort_expression) return sort_expression
def select_limit(self, source_expression): """Match LIMIT take [OFFSET drop].""" start = self.tokens.matched.start # The expression right after LIMIT is the count to take. limit_count_expression = self.expression() # Optional OFFSET follows. if self.tokens.accept(grammar.select_offset): offset_start = self.tokens.matched.start offset_end = self.tokens.matched.end # Next thing is the count to drop. offset_count_expression = self.expression() # We have a new source expression, which is drop(count, original). offset_source_expression = ast.Apply( ast.Var("drop", start=offset_start, end=offset_end, source=self.original), offset_count_expression, source_expression, start=offset_start, end=offset_count_expression.end, source=self.original) # Drop must be applied before take, so that OFFSET skips rows before # LIMIT starts counting them. source_expression = offset_source_expression limit_expression = ast.Apply( ast.Var("take", start=start, end=limit_count_expression.end, source=self.original), limit_count_expression, source_expression, start=start, end=self.tokens.matched.end, source=self.original) return limit_expression
# Builtin pseudo-function application subgrammar. def builtin(self, keyword): """Parse the pseudo-function application subgrammar.""" # The match includes the lparen token, so the keyword is just the first # token in the match, not the whole thing.
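# Illustrative example (not in the original source): for the query
# "map(xs, f)", 'matched.first' below is the "map" keyword token, so
# keyword_start and keyword_end delimit just the word "map".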
keyword_start = self.tokens.matched.first.start keyword_end = self.tokens.matched.first.end self.tokens.expect(common_grammar.lparen) if self.tokens.matched.start != keyword_end: return self.error( "No whitespace allowed between function and lparen.", start_token=self.tokens.matched.first) expr_type = grammar.BUILTINS[keyword.lower()] arguments = [self.expression()] while self.tokens.accept(common_grammar.comma): arguments.append(self.expression()) self.tokens.expect(common_grammar.rparen) if expr_type.arity and expr_type.arity != len(arguments): return self.error( "%s expects %d arguments, but was passed %d." % ( keyword, expr_type.arity, len(arguments)), start_token=self.tokens.matched.first) return expr_type(*arguments, start=keyword_start, end=self.tokens.matched.end, source=self.original) # If-else if-else grammar. def if_if(self): start = self.tokens.matched.start # Even-numbered children are conditions; odd-numbered are results. # Last child is the else expression. children = [self.expression()] self.tokens.expect(grammar.if_then) children.append(self.expression()) while self.tokens.accept(grammar.if_else_if): children.append(self.expression()) self.tokens.expect(grammar.if_then) children.append(self.expression()) if self.tokens.accept(grammar.if_else): children.append(self.expression()) else: children.append(ast.Literal(None)) return ast.IfElse(*children, start=start, end=self.tokens.matched.end, source=self.original) # Function application subgrammar. def application(self, func): """Parse the function application subgrammar. Function application can, conceptually, be thought of as a mixfix operator, similar to the way array subscripting works. However, it is not clear at this point whether we want to allow it to work as such, because doing so would permit queries to, at runtime, select methods out of an arbitrary object and then call them. While there is a function whitelist and preventing this sort of thing in the syntax isn't a security feature, it still seems like the syntax should make it clear what the intended use of application is. If we later decide to extend DottySQL to allow function application over an arbitrary LHS expression then that syntax would be a strict superset of the current syntax and backwards compatible. """ start = self.tokens.matched.start if self.tokens.accept(common_grammar.rparen): # That was easy. return ast.Apply(func, start=start, end=self.tokens.matched.end, source=self.original) arguments = [self.expression()] while self.tokens.accept(common_grammar.comma): arguments.append(self.expression()) self.tokens.expect(common_grammar.rparen) return ast.Apply(func, *arguments, start=start, end=self.tokens.matched.end, source=self.original) # Tuple grammar. def list(self): """Parse a list (tuple) which can contain any combination of types.""" start = self.tokens.matched.start if self.tokens.accept(common_grammar.rbracket): return ast.Tuple(start=start, end=self.tokens.matched.end, source=self.original) elements = [self.expression()] while self.tokens.accept(common_grammar.comma): elements.append(self.expression()) self.tokens.expect(common_grammar.rbracket) return ast.Tuple(*elements, start=start, end=self.tokens.matched.end, source=self.original) syntax.Syntax.register_parser(Parser, shorthand="dottysql") efilter-1-1.5/efilter/parsers/dottysql/grammar.py0000640066434000116100000002447112746111034022204 0ustar adamsheng00000000000000# EFILTER Forensic Query Language # # Copyright 2015 Google Inc. All Rights Reserved. 
# # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. """ This module implements the DottySQL grammar (on tokens, not on a query string). """ __author__ = "Adam Sindelar " from efilter import ast from efilter import errors from efilter.parsers.common import ast_transforms as transforms from efilter.parsers.common import grammar as common # DottySQL's operator table. The parser only supports pure prefix and infix # operators, as well as infix operators that have a suffix (like x[y]). # # Circumfix and pure suffix operators can be declared, but won't do anything. OPERATORS = common.OperatorTable( # Infix operators: common.Operator(name="or", precedence=0, assoc="left", handler=ast.Union, docstring="Logical OR.", prefix=None, suffix=None, infix=common.Token("symbol", "or")), common.Operator(name="and", precedence=1, assoc="left", handler=ast.Intersection, docstring="Logical AND.", prefix=None, suffix=None, infix=common.Token("symbol", "and")), common.Operator(name="==", precedence=3, assoc="left", handler=ast.Equivalence, docstring="Equivalence.", prefix=None, suffix=None, infix=common.Token("symbol", "==")), common.Operator(name="=", precedence=3, assoc="left", handler=ast.Equivalence, docstring="Equivalence (same as '==').", prefix=None, suffix=None, infix=common.Token("symbol", "=")), common.Operator(name="=~", precedence=3, assoc="left", handler=ast.RegexFilter, docstring="Left-hand operand where regex.", prefix=None, suffix=None, infix=common.Token("symbol", "=~")), common.Operator(name="!=", precedence=3, assoc="left", handler=transforms.ComplementEquivalence, docstring="Inequivalence (same as not (x == y)).", prefix=None, suffix=None, infix=common.Token("symbol", "!=")), common.Operator(name="not in", precedence=3, assoc="left", handler=transforms.ComplementMembership, docstring="Left-hand operand is not in list.", prefix=None, suffix=None, infix=(common.Token("symbol", "not"), common.Token("symbol", "in"))), common.Operator(name="in", precedence=3, assoc="left", handler=ast.Membership, docstring="Left-hand operand is in list.", prefix=None, suffix=None, infix=common.Token("symbol", "in")), common.Operator(name="isa", precedence=3, assoc="left", handler=ast.IsInstance, docstring="LHS must be instance of RHS.", prefix=None, suffix=None, infix=common.Token("symbol", "isa")), common.Operator(name=">=", precedence=3, assoc="left", handler=ast.PartialOrderedSet, docstring="Equal-or-greater-than.", prefix=None, suffix=None, infix=common.Token("symbol", ">=")), common.Operator(name="<=", precedence=3, assoc="left", handler=transforms.ReversePartialOrderedSet, docstring="Equal-or-less-than.", prefix=None, suffix=None, infix=common.Token("symbol", "<=")), common.Operator(name=">", precedence=3, assoc="left", handler=ast.StrictOrderedSet, docstring="Greater-than.", prefix=None, suffix=None, infix=common.Token("symbol", ">")), common.Operator(name="<", precedence=3, assoc="left", handler=transforms.ReverseStrictOrderedSet, docstring="Less-than.", prefix=None, suffix=None, infix=common.Token("symbol", "<")), 
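# Note: the arithmetic operators below bind tighter (precedence 4-6) than
# the comparisons above (precedence 3), so e.g. "1 + 2 * 3 > 5" parses as
# ((1 + (2 * 3)) > 5).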
common.Operator(name="+", precedence=4, assoc="left", handler=ast.Sum, docstring="Arithmetic addition.", prefix=None, suffix=None, infix=common.Token("symbol", "+")), common.Operator(name="-", precedence=4, assoc="left", handler=ast.Difference, docstring="Arithmetic subtraction.", prefix=None, suffix=None, infix=common.Token("symbol", "-")), common.Operator(name="*", precedence=6, assoc="left", handler=ast.Product, docstring="Arithmetic multiplication.", prefix=None, suffix=None, infix=common.Token("symbol", "*")), common.Operator(name="/", precedence=6, assoc="left", handler=ast.Quotient, docstring="Arithmetic division.", prefix=None, suffix=None, infix=common.Token("symbol", "/")), common.Operator(name=":", precedence=10, assoc="left", handler=ast.Pair, docstring="Key/value pair.", prefix=None, suffix=None, infix=common.Token("symbol", ":")), common.Operator(name=".", precedence=12, assoc="left", handler=ast.Resolve, docstring="OBJ.MEMBER -> return MEMBER of OBJ.", prefix=None, suffix=None, infix=common.Token("symbol", ".")), # Mixfix: common.Operator(name="[]", precedence=12, assoc="left", handler=ast.Select, docstring="Array subscript.", prefix=None, infix=common.Token("lbracket", "["), suffix=common.Token("rbracket", "]")), common.Operator(name="()", precedence=11, assoc="left", handler=ast.Apply, docstring="Function application.", prefix=None, infix=common.Token("lparen", "("), suffix=common.Token("rparen", ")")), # Prefix: common.Operator(name="not", precedence=6, assoc="right", handler=ast.Complement, docstring="Logical NOT.", suffix=None, infix=None, prefix=common.Token("symbol", "not")), common.Operator(name="unary -", precedence=5, assoc="right", handler=transforms.NegateValue, docstring="Unary -.", infix=None, suffix=None, prefix=common.Token("symbol", "-")), ) # These keywords are not allowed outside of the SELECT expression. They are not # the full list of SQL keywords (for example LIMIT and OFFSET are not included), # just ones that will be rejected by the parser unless they follow in proper # order after SELECT. SQL_RESERVED_KEYWORDS = frozenset([ "SELECT", "FROM", "ANY", "WHERE", "DESC", "ASC", "ORDER BY", ]) # Builtin pseudo-functions which cannot be overriden. BUILTINS = { "map": ast.Map, "sort": ast.Sort, "filter": ast.Filter, "bind": ast.Bind, "any": ast.Any, "each": ast.Each, "cast": ast.Cast } # Additional grammar used by the parser. def bool_literal(tokens): match = common.keyword(tokens, "true") if match: return match._replace(value=True) match = common.keyword(tokens, "false") if match: return match._replace(value=False) def literal(tokens): return bool_literal(tokens) or common.literal(tokens) def prefix(tokens): return common.prefix(tokens, OPERATORS) def infix(tokens): return common.infix(tokens, OPERATORS) def param(tokens): return common.token_name(tokens, "param") def builtin(tokens): """Matches a call to a builtin pseudo-function (like map or sort).""" return common.keywords(tokens, BUILTINS) def let(tokens): """Matches a let expression.""" return common.keyword(tokens, "let") def let_assign(tokens): """Matches a '=' in the let expression.""" return common.keyword(tokens, "=") def application(tokens): """Matches function call (application).""" tokens = iter(tokens) func = next(tokens) paren = next(tokens) if func and func.name == "symbol" and paren.name == "lparen": # We would be able to unambiguously parse function application with # whitespace between the function name and the lparen, but let's not # do that because it's unexpected in most languages. 
if func.end != paren.start: raise errors.EfilterParseError( start=func.start, end=paren.end, message="No whitespace allowed between function and paren.") return common.TokenMatch(None, func.value, (func, paren)) def if_if(tokens): """Matches an if-else block.""" return common.keyword(tokens, "if") def if_then(tokens): return common.keyword(tokens, "then") def if_else_if(tokens): return common.multi_keyword(tokens, ("else", "if")) def if_else(tokens): return common.keyword(tokens, "else") # SQL subgrammar: def select(tokens): return common.keyword(tokens, "select") def select_any(tokens): return common.keyword(tokens, "any") def select_all(tokens): return common.keyword(tokens, "*") def select_as(tokens): return common.keyword(tokens, "as") def select_from(tokens): return common.keyword(tokens, "from") def select_where(tokens): return common.keyword(tokens, "where") def select_limit(tokens): return common.keyword(tokens, "limit") def select_offset(tokens): return common.keyword(tokens, "offset") def select_order(tokens): return common.multi_keyword(tokens, ("order", "by")) def select_asc(tokens): return common.keyword(tokens, "asc") def select_desc(tokens): return common.keyword(tokens, "desc") def sql_keyword(tokens): return (common.keywords(tokens, SQL_RESERVED_KEYWORDS) or common.multi_keyword(tokens, ("order", "by"))) efilter-1-1.5/efilter/parsers/dottysql/__init__.py0000664066434000116100000000030212713157120022306 0ustar adamsheng00000000000000"""EFILTER Forensic Query Language This module implements the DottySQL language, which is an extension of the original Dotty syntax of EFILTER. """ from efilter.parsers.dottysql import parser efilter-1-1.5/efilter/api.py0000664066434000116100000002164612713157120015774 0ustar adamsheng00000000000000# EFILTER Forensic Query Language # # Copyright 2015 Google Inc. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. """ EFILTER convenience API. """ __author__ = "Adam Sindelar " from efilter import query as q from efilter import scope from efilter.protocols import repeated from efilter.transforms import solve from efilter.transforms import infer_type from efilter.stdlib import core as std_core def apply(query, replacements=None, vars=None, allow_io=False, libs=("stdcore", "stdmath")): """Run 'query' on 'vars' and return the result(s). Arguments: query: A query object or string with the query. replacements: Built-time parameters to the query, either as dict or as an array (for positional interpolation). vars: The variables to be supplied to the query solver. allow_io: (Default: False) Include 'stdio' and allow IO functions. libs: Iterable of library modules to include, given as strings. Default: ('stdcore', 'stdmath') For full list of bundled libraries, see efilter.stdlib. Note: 'stdcore' must always be included. WARNING: Including 'stdio' must be done in conjunction with 'allow_io'. This is to make enabling IO explicit. 'allow_io' implies that 'stdio' should be included and so adding it to libs is actually not required. 
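        Illustrative example (the 'csv' function is assumed here to be
        provided by the 'stdio' module):

            apply("SELECT * FROM csv(?)", replacements=["/tmp/people.csv"],
                  allow_io=True)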
Notes on IO: If allow_io is set to True then 'stdio' will be included and the EFILTER query will be allowed to read files from disk. Use this with caution. If the query returns a lazily-evaluated result that depends on reading from a file (for example, filtering a CSV file) then the file descriptor will remain open until the returned result is deallocated. The caller is responsible for releasing the result when it's no longer needed.
Returns: The result of evaluating the query. The type of the output will depend on the query, and can be predicted using 'infer' (provided reflection callbacks are implemented). In the common case of a SELECT query the return value will be an iterable of filtered data (actually an object implementing IRepeated, as well as __iter__.) A word on cardinality of the return value: Types in EFILTER always refer to a scalar. If apply returns more than one value, the type returned by 'infer' will refer to the type of the value inside the returned container. If you're unsure whether your query returns one or more values (rows), use the 'getvalues' function.
Raises: efilter.errors.EfilterError if there are issues with the query.
Examples: apply("5 + 5") # -> 10 apply("SELECT * FROM people WHERE age > 10", vars={"people":({"age": 10, "name": "Bob"}, {"age": 20, "name": "Alice"}, {"age": 30, "name": "Eve"})}) # This will replace the question mark (?) with the string "Bob" in a # safe manner, preventing SQL injection. apply("SELECT * FROM people WHERE name = ?", replacements=["Bob"], ...) """
if vars is None: vars = {} if allow_io: libs = list(libs) libs.append("stdio") query = q.Query(query, params=replacements) stdcore_included = False for lib in libs: if lib == "stdcore": stdcore_included = True # 'solve' always includes this automatically - we don't have a say # in the matter. continue if lib == "stdio" and not allow_io: raise ValueError("Attempting to include 'stdio' but IO not " "enabled. Pass allow_io=True.") module = std_core.LibraryModule.ALL_MODULES.get(lib) if not module: raise ValueError("There is no standard library module %r." % lib) vars = scope.ScopeStack(module, vars) if not stdcore_included: raise ValueError("EFILTER cannot work without standard lib 'stdcore'.") results = solve.solve(query, vars).value return results
def getvalues(result): """Return an iterator of results of 'apply'. The 'apply' function can return one or more values, depending on the query. If you are unsure whether your query evaluates to a scalar or a collection of scalars, 'getvalues' will always return an iterator with one or more elements. Arguments: result: Anything. If it's an instance of IRepeated, all values will be returned. Returns: An iterator of at least one element. """ return repeated.getvalues(result)
def user_func(func, arg_types=None, return_type=None): """Create an EFILTER-callable version of function 'func'. As a security precaution, EFILTER will not execute Python callables unless they implement the IApplicative protocol. There is a perfectly good implementation of this protocol in the standard library and user functions can inherit from it. This will declare a subclass of the standard library TypedFunction and return an instance of it that EFILTER will happily call. Arguments: func: A Python callable that will serve as the implementation. arg_types (optional): A tuple of argument types. If the function takes keyword arguments, they must still have a defined order. return_type (optional): The type the function returns.
Returns: An instance of a custom subclass of efilter.stdlib.core.TypedFunction. Examples: def my_callback(tag): print("I got %r" % tag) api.apply("if True then my_callback('Hello World!')", vars={ "my_callback": api.user_func(my_callback) }) # This should print "I got 'Hello World!'". """ class UserFunction(std_core.TypedFunction): name = func.__name__ def __call__(self, *args, **kwargs): return func(*args, **kwargs) @classmethod def reflect_static_args(cls): return arg_types @classmethod def reflect_static_return(cls): return return_type return UserFunction() def infer(query, replacements=None, root_type=None, libs=("stdcore", "stdmath")): """Determine the type of the query's output without actually running it. Arguments: query: A query object or string with the query. replacements: Built-time parameters to the query, either as dict or as an array (for positional interpolation). root_type: The types of variables to be supplied to the query inference. libs: What standard libraries should be taken into account for the inference. Returns: The type of the query's output, if it can be determined. If undecidable, returns efilter.protocol.AnyType. NOTE: The inference returns the type of a row in the results, not of the actual Python object returned by 'apply'. For example, if a query returns multiple rows, each one of which is an integer, the type of the output is considered to be int, not a collection of rows. Examples: infer("5 + 5") # -> INumber infer("SELECT * FROM people WHERE age > 10") # -> AnyType # If root_type implements the IStructured reflection API: infer("SELECT * FROM people WHERE age > 10", root_type=...) # -> dict """ # Always make the scope stack start with stdcore. if root_type: type_scope = scope.ScopeStack(std_core.MODULE, root_type) else: type_scope = scope.ScopeStack(std_core.MODULE) stdcore_included = False for lib in libs: if lib == "stdcore": stdcore_included = True continue module = std_core.LibraryModule.ALL_MODULES.get(lib) if not module: raise TypeError("No standard library module %r." % lib) type_scope = scope.ScopeStack(module, type_scope) if not stdcore_included: raise TypeError("'stdcore' must always be included.") query = q.Query(query, params=replacements) return infer_type.infer_type(query, type_scope) def search(query, data, replacements=None): """Yield objects from 'data' that match the 'query'.""" query = q.Query(query, params=replacements) for entry in data: if solve.solve(query, entry).value: yield entry efilter-1-1.5/efilter/protocols/0000750066434000116100000000000012762014475016666 5ustar adamsheng00000000000000efilter-1-1.5/efilter/protocols/repeated.py0000664066434000116100000001265712720562422021045 0ustar adamsheng00000000000000# -*- coding: utf-8 -*- # EFILTER Forensic Query Language # # Copyright 2015 Google Inc. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. """EFILTER abstract type system. The repeated protocol concerns itself with variables that have more than one value, such as repeated fields on protocol buffers. 
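A minimal sketch of the idea, using the functions defined below:

    repeated(1, 2, 3)    # a single variable with three int values
    meld("foo", None)    # => "foo" (None values are skipped)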
""" from efilter import dispatch from efilter import protocol from efilter.protocols import counted from efilter.protocols import eq from efilter.protocols import ordered # Declarations: # pylint: disable=unused-argument @dispatch.multimethod def repeated(first_value, *values): """Build a repeated variable from values, all of which are the same type. Repeated values usually [1] preserve order and always allow a single value to appear more than once. Order of repeated values is NOT significant even when it is preserved. Any repeated values passed to this function will be flattened (repeated values do not nest). If you pass a repeated value in the arguments its value type (as determined by IRepeated.value_type) must be the same as the type of the other arguments. 1: Order is always preserved for repetead values created with 'repeated' or 'meld' but not for repeated values created with other functions. """ raise NotImplementedError() def meld(*values): """Return the repeated value, or the first value if there's only one. This is a convenience function, equivalent to calling getvalue(repeated(x)) to get x. This function skips over instances of None in values (None is not allowed in repeated variables). Examples: meld("foo", "bar") # => ListRepetition("foo", "bar") meld("foo", "foo") # => ListRepetition("foo", "foo") meld("foo", None) # => "foo" meld(None) # => None """ values = [x for x in values if x is not None] if not values: return None result = repeated(*values) if isrepeating(result): return result return getvalue(result) @dispatch.multimethod def lazy(generator_func): """Return a lazy repeated value of 'generator_func', which must be stable. For large datasets, it's useful to use lazy repeated values, because they avoid storing all the values of the repetition in memory. EFILTER ships a default implementation of this multimethod, found in efilter.ext.lazy_repetition. Arguments: generator_func: A function that returns a generator of the values that constitute this repeated value. IMPORTANT: This function MUST be stable, meaning the values in the generator MUST be the same each time the function is called. """ raise NotImplementedError() @dispatch.multimethod def lines(fd): """Return a lazy repeated value of lines in 'fd' which is a File object. EFILTER ships a default implementation of this multimethod, found in efilter.ext.line_reader. Argument: fd: A File object that represents a text file. """ raise NotImplementedError() @dispatch.multimethod def getvalues(x): """Return a collection of the values of x.""" raise NotImplementedError() def getvalue(x): """Return the single value of x or raise TypError if more than one value.""" if isrepeating(x): raise TypeError( "Ambiguous call to getvalue for %r which has more than one value." 
% x) for value in getvalues(x): return value
@dispatch.multimethod def value_type(x): """Return the type (class) of the values of x.""" raise NotImplementedError()
@dispatch.multimethod def value_eq(x, y): """Sorted comparison between the values in x and y.""" raise NotImplementedError()
@dispatch.multimethod def value_apply(x, f): """Apply f to each value of x and return a new repeated var of results.""" raise NotImplementedError()
@dispatch.multimethod def isrepeating(x): """Optional: Is x a repeated var AND does it have more than one value?""" return isinstance(x, IRepeated) and counted.count(x) > 1
class IRepeated(protocol.Protocol): _required_functions = (getvalues, value_type, value_eq, value_apply) _optional_functions = (isrepeating,)
def _scalar_value_eq(x, y): if isrepeating(y): return False return eq.eq(x, getvalue(y))
# If you're repeated, you automatically implement ICounted. counted.ICounted.implement( for_type=IRepeated, implementations={ counted.count: lambda r: len(getvalues(r)) } )
# Repeated values should sort as a tuple of themselves. ordered.IOrdered.implement( for_type=IRepeated, implementations={ ordered.assortkey: getvalues } )
# Implementation for scalars: # pylint: disable=unnecessary-lambda IRepeated.implement( for_type=protocol.AnyType, implementations={ getvalues: lambda x: (x,) if x is not None else (), value_type: lambda x: type(x), value_eq: _scalar_value_eq, value_apply: lambda x, f: f(x) } ) efilter-1-1.5/efilter/protocols/structured.py0000664066434000116100000000503112713157120021441 0ustar adamsheng00000000000000# -*- coding: utf-8 -*- # EFILTER Forensic Query Language # # Copyright 2015 Google Inc. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. """EFILTER abstract type system.""" from efilter import dispatch from efilter import protocol # Declarations: # pylint: disable=unused-argument
@dispatch.multimethod def resolve(structured, key): raise NotImplementedError()
def getmembers(structured): if isinstance(structured, type): return getmembers_static(structured) return getmembers_runtime(structured)
@dispatch.class_multimethod def getmembers_static(structured_cls): raise NotImplementedError()
@dispatch.multimethod def getmembers_runtime(structured): return getmembers_static(type(structured))
def reflect(structured, name): if isinstance(structured, type): return reflect_static_member(structured, name) return reflect_runtime_member(structured, name)
@dispatch.class_multimethod def reflect_static_member(structured_cls, name): """Provide the type of 'name' which is a member of 'structured_cls'. Arguments: structured_cls: The type of the structured object (like a dict). name: The name to be reflected. Must be a member of 'structured_cls'. Returns: The type of 'name' or None. Invalid names should return None, whereas valid names with unknown type should return AnyType.
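    Illustrative example: a Person type that declares an int member
    "age" should return int for reflect_static_member(Person, "age"),
    and None for a member it doesn't declare.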
""" raise NotImplementedError() @dispatch.multimethod def reflect_runtime_member(structured, name): return reflect_static_member(type(structured), name) class IStructured(protocol.Protocol): _required_functions = (resolve,) _optional_functions = (reflect_runtime_member, reflect_static_member, getmembers_static, getmembers_runtime) # Lets us pretend that dicts are objects, which makes it easy for users to # declare variables. IStructured.implement(for_type=dict, implementations={ resolve: lambda d, m: d[m], getmembers_runtime: lambda d: d.keys()}) efilter-1-1.5/efilter/protocols/reducer.py0000664066434000116100000001003012713157120020661 0ustar adamsheng00000000000000# -*- coding: utf-8 -*- # EFILTER Forensic Query Language # # Copyright 2015 Google Inc. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. """(EXPERIMENTAL) EFILTER abstract type system.""" import itertools from efilter import dispatch from efilter import protocol from efilter.protocols import repeated # Declarations: # pylint: disable=unused-argument # Determined as good trade-off between memory usage and speed based on the # hygdata_v3 benchmark. DEFAULT_CHUNK_SIZE = 4096 @dispatch.multimethod def fold(reducer, chunk): """Reduce 'chunk' into an intermediate value.""" raise NotImplementedError() @dispatch.multimethod def merge(reducer, left, right): """Merge two intermediate values (from 'merge' or 'fold'). Returns: Intermediate value merged from 'left' and 'right'. """ raise NotImplementedError() @dispatch.multimethod def finalize(reducer, intermediate): """Convert the 'intermediate' to the final result of the reducer.""" raise NotImplementedError() def generate_chunks(data, chunk_size=DEFAULT_CHUNK_SIZE): """Yield 'chunk_size' items from 'data' at a time.""" iterator = iter(repeated.getvalues(data)) while True: chunk = list(itertools.islice(iterator, chunk_size)) if not chunk: return yield chunk def reduce(reducer, data, chunk_size=DEFAULT_CHUNK_SIZE): """Repeatedly call fold and merge on data and then finalize. Arguments: data: Input for the fold function. reducer: The IReducer to use. chunk_size: How many items should be passed to fold at a time? Returns: Return value of finalize. """ if not chunk_size: return finalize(reducer, fold(reducer, data)) # Splitting the work up into chunks allows us to, e.g. reduce a large file # without loading everything into memory, while still being significantly # faster than repeatedly calling the fold function for every element. 
chunks = generate_chunks(data, chunk_size) intermediate = fold(reducer, next(chunks)) for chunk in chunks: intermediate = merge(reducer, intermediate, fold(reducer, chunk)) return finalize(reducer, intermediate) class IReducer(protocol.Protocol): _required_functions = (fold, finalize, merge) class Compose(object): """Reducer that runs multiple other reducers on the same input.""" reducers = None def __init__(self, *reducers): self.reducers = reducers def fold(self, chunk): return [fold(r, chunk) for r in self.reducers] def merge(self, left, right): result = [] for idx, r in enumerate(self.reducers): result.append(merge(r, left[idx], right[idx])) return result def finalize(self, intermediate): result = [] for idx, r in enumerate(self.reducers): result.append(finalize(r, intermediate[idx])) return result IReducer.implicit_static(Compose) class Map(object): """Reducer that converts the input before calling the delegate.""" delegate = None mapper = None def __init__(self, delegate, mapper): if not callable(mapper): raise TypeError("Mapper must be callable.") self.mapper = mapper self.delegate = delegate def fold(self, chunk): return self.delegate.fold(tuple(self.mapper(chunk))) def merge(self, left, right): return self.delegate.merge(left, right) def finalize(self, intermediate): return self.delegate.finalize(intermediate) IReducer.implicit_static(Map) efilter-1-1.5/efilter/protocols/indexable.py0000664066434000116100000000271512713157120021176 0ustar adamsheng00000000000000# -*- coding: utf-8 -*- # EFILTER Forensic Query Language # # Copyright 2015 Google Inc. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. """EFILTER abstract type system. DEPRECATION NOTICE IIndexable is no longer used by any parts of EFILTER and user software no longer needs to implement it for any purpose. Because some applications do implement IIndexable it continues to be around but will eventually be removed. Implementations of IIndexable can be safely removed and do not need to be replaced with anything. """ from efilter import dispatch from efilter import protocol from efilter.protocols import hashable # Declarations: # pylint: disable=unused-argument @dispatch.multimethod def indices(x): """DEPRECATED: Return a list of keys to represent 'self' in maps.""" raise NotImplementedError() class IIndexable(protocol.Protocol): """DEPRECATED: if you're still using this you can safely remove.""" _required_functions = (indices,) efilter-1-1.5/efilter/protocols/hashable.py0000664066434000116100000000257512713157120021016 0ustar adamsheng00000000000000# -*- coding: utf-8 -*- # EFILTER Forensic Query Language # # Copyright 2015 Google Inc. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
# You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. """EFILTER abstract type system.""" import datetime import numbers import six from efilter import dispatch from efilter import protocol # Declarations: # pylint: disable=unused-argument @dispatch.multimethod def hashed(x): raise NotImplementedError() class IHashable(protocol.Protocol): _required_functions = (hashed,) # Default implementations: IHashable.implement(for_types=six.string_types, implementations={hashed: hash}) IHashable.implement(for_types=six.integer_types, implementations={hashed: hash}) IHashable.implement(for_types=(numbers.Number, type(None), tuple, frozenset, datetime.datetime), implementations={hashed: hash}) efilter-1-1.5/efilter/protocols/__init__.py0000664066434000116100000000003512713157120020773 0ustar adamsheng00000000000000"""EFILTER Type Protocols""" efilter-1-1.5/efilter/protocols/applicative.py0000664066434000116100000000377012713157120021546 0ustar adamsheng00000000000000# -*- coding: utf-8 -*- # EFILTER Forensic Query Language # # Copyright 2015 Google Inc. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. """EFILTER abstract type system.""" from efilter import dispatch from efilter import protocol # Declarations: # pylint: disable=unused-argument @dispatch.multimethod def apply(applicative, args, kwargs): """Return the result of calling 'applicative' with 'args'. Host program should implement function whitelisting in this function! """ raise NotImplementedError() def reflect_return(applicative): if isinstance(applicative, type): return reflect_static_return(applicative) return reflect_runtime_return(applicative) def reflect_args(applicative): if isinstance(applicative, type): return reflect_static_args(applicative) return reflect_runtime_args(applicative) @dispatch.class_multimethod def reflect_static_args(applicative_cls): raise NotImplementedError() @dispatch.multimethod def reflect_runtime_args(applicative): return reflect_static_args(type(applicative)) @dispatch.class_multimethod def reflect_static_return(applicative_cls): raise NotImplementedError() @dispatch.multimethod def reflect_runtime_return(applicative): return reflect_static_return(type(applicative)) class IApplicative(protocol.Protocol): _required_functions = (apply,) _optional_functions = (reflect_static_return, reflect_runtime_return, reflect_static_args, reflect_runtime_args) efilter-1-1.5/efilter/protocols/number.py0000664066434000116100000000272612713157120020535 0ustar adamsheng00000000000000# -*- coding: utf-8 -*- # EFILTER Forensic Query Language # # Copyright 2015 Google Inc. All Rights Reserved. 
# # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. """EFILTER abstract type system.""" import six from efilter import dispatch from efilter import protocol # Declarations: # pylint: disable=unused-argument @dispatch.multimethod def sum(x, y): raise NotImplementedError() @dispatch.multimethod def product(x, y): raise NotImplementedError() @dispatch.multimethod def difference(x, y): raise NotImplementedError() @dispatch.multimethod def quotient(x, y): raise NotImplementedError() class INumber(protocol.Protocol): _required_functions = (sum, product, difference, quotient) # Default implementations: INumber.implement( for_types=(float, complex) + six.integer_types, implementations={ sum: lambda x, y: x + y, product: lambda x, y: x * y, difference: lambda x, y: x - y, quotient: lambda x, y: x / y } ) efilter-1-1.5/efilter/protocols/counted.py0000664066434000116100000000226312713157120020702 0ustar adamsheng00000000000000# -*- coding: utf-8 -*- # EFILTER Forensic Query Language # # Copyright 2015 Google Inc. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. """EFILTER abstract type system.""" import six from efilter import dispatch from efilter import protocol # Declarations: # pylint: disable=unused-argument @dispatch.multimethod def count(coll): raise NotImplementedError() class ICounted(protocol.Protocol): _required_functions = (count,) # Default implementations: ICounted.implement(for_types=(list, tuple, set, frozenset, dict), implementations={count: len}) ICounted.implement(for_types=six.string_types, implementations={count: len}) efilter-1-1.5/efilter/protocols/ordered.py0000664066434000116100000000305512713157120020665 0ustar adamsheng00000000000000# -*- coding: utf-8 -*- # EFILTER Forensic Query Language # # Copyright 2015 Google Inc. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. 
"""EFILTER abstract type system.""" import six from efilter import dispatch from efilter import protocol # Declarations: # pylint: disable=unused-argument def ordered(collection, key_func=None): if callable(key_func): def key_for_sorted(x): return assortkey(key_func(x)) else: key_for_sorted = assortkey return sorted(collection, key=key_for_sorted) @dispatch.multimethod def assortkey(x): raise NotImplementedError() class IOrdered(protocol.Protocol): _required_functions = (assortkey,) # Default implementations: IOrdered.implement( for_type=protocol.AnyType, implementations={ assortkey: lambda x: x } ) IOrdered.implement( for_type=dict, implementations={ assortkey: lambda x: ordered(six.iteritems(x)) } ) IOrdered.implement( for_type=type(None), implementations={ assortkey: lambda _: 0 } ) efilter-1-1.5/efilter/protocols/associative.py0000664066434000116100000000520712713157120021554 0ustar adamsheng00000000000000# -*- coding: utf-8 -*- # EFILTER Forensic Query Language # # Copyright 2015 Google Inc. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. """EFILTER abstract type system.""" import six from efilter import dispatch from efilter import protocol from efilter.protocols import counted # Declarations: # pylint: disable=unused-argument @dispatch.multimethod def select(associative, key): raise NotImplementedError() def getkeys(associative): if isinstance(associative, type): return getkeys_static(associative) return getkeys_runtime(associative) @dispatch.class_multimethod def getkeys_static(associative_cls): raise NotImplementedError() @dispatch.multimethod def getkeys_runtime(associative): return getkeys_static(type(associative)) def reflect(associative, key): if isinstance(associative, type): return reflect_static_key(associative, key) return reflect_runtime_key(associative, key) @dispatch.class_multimethod def reflect_static_key(associative_cls, key): """Provide the type of 'key' which is a member of 'associative_cls'. Arguments: associative_cls: The type of the associative object (like a dict). key: The name to be reflected. Must be a member of 'associative_cls'. Returns: The type of 'name' or None. Invalid names should return None, whereas valid names with unknown type should return AnyType. 
""" raise NotImplementedError() @dispatch.multimethod def reflect_runtime_key(associative, key): return reflect_static_key(type(associative), key) class IAssociative(protocol.Protocol): _required_functions = (select,) _optional_functions = (reflect_runtime_key, reflect_static_key, getkeys_runtime, getkeys_static) IAssociative.implement(for_type=dict, implementations={ select: lambda d, key: d[key], getkeys_runtime: lambda d: d.keys()}) IAssociative.implement( for_types=(list, tuple), implementations={ select: lambda c, idx: c[idx], getkeys_runtime: lambda c: six.moves.range(counted.count(c))}) efilter-1-1.5/efilter/protocols/boolean.py0000664066434000116100000000206312713157120020656 0ustar adamsheng00000000000000# -*- coding: utf-8 -*- # EFILTER Forensic Query Language # # Copyright 2015 Google Inc. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. """EFILTER abstract type system.""" from efilter import dispatch from efilter import protocol # Declarations: # pylint: disable=unused-argument @dispatch.multimethod def asbool(x): raise NotImplementedError() class IBoolean(protocol.Protocol): _required_functions = (asbool,) # Default implementations: IBoolean.implement(for_type=protocol.AnyType, implementations={asbool: bool}) efilter-1-1.5/efilter/protocols/iset.py0000664066434000116100000000442612713157120020210 0ustar adamsheng00000000000000# -*- coding: utf-8 -*- # EFILTER Forensic Query Language # # Copyright 2015 Google Inc. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. 
"""EFILTER abstract type system.""" from efilter import dispatch from efilter import protocol from efilter.protocols import eq # Declarations: # pylint: disable=unused-argument @dispatch.multimethod def union(x, y): raise NotImplementedError() @dispatch.multimethod def intersection(x, y): raise NotImplementedError() @dispatch.multimethod def difference(x, y): raise NotImplementedError() @dispatch.multimethod def issuperset(x, y): raise NotImplementedError() @dispatch.multimethod def issubset(x, y): return issuperset(y, x) @dispatch.multimethod def isstrictsuperset(x, y): return issuperset(x, y) and eq.ne(x, y) @dispatch.multimethod def isstrictsubset(x, y): return isstrictsuperset(y, x) @dispatch.multimethod def contains(s, e): raise NotImplementedError() class ISet(protocol.Protocol): _required_functions = (union, intersection, difference, issuperset, contains) # Default implementations: ISet.implement( for_types=(set, frozenset), implementations={ union: lambda x, y: x | frozenset(y), intersection: lambda x, y: x & frozenset(y), difference: lambda x, y: x - frozenset(y), issuperset: lambda x, y: x >= frozenset(y), contains: lambda s, e: e in s } ) ISet.implement( for_types=(list, tuple), implementations={ union: lambda x, y: frozenset(x) | frozenset(y), intersection: lambda x, y: frozenset(x) & frozenset(y), difference: lambda x, y: frozenset(x) - frozenset(y), issuperset: lambda x, y: frozenset(x) >= frozenset(y), contains: lambda s, e: e in s } ) efilter-1-1.5/efilter/protocols/eq.py0000664066434000116100000000223612713157120017646 0ustar adamsheng00000000000000# -*- coding: utf-8 -*- # EFILTER Forensic Query Language # # Copyright 2015 Google Inc. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. """EFILTER abstract type system.""" from efilter import dispatch from efilter import protocol # Declarations: # pylint: disable=unused-argument @dispatch.multimethod def eq(x, y): raise NotImplementedError() @dispatch.multimethod def ne(x, y): raise NotImplementedError() class IEq(protocol.Protocol): _required_functions = (eq, ne) # Default implementations: IEq.implement( for_type=protocol.AnyType, implementations={ eq: lambda x, y: x == y, ne: lambda x, y: x != y } ) efilter-1-1.5/efilter/syntax.py0000664066434000116100000000417112713157120016543 0ustar adamsheng00000000000000# EFILTER Forensic Query Language # # Copyright 2015 Google Inc. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. """ EFILTER syntax base class. 
""" __author__ = "Adam Sindelar " import abc import six class Syntax(six.with_metaclass(abc.ABCMeta, object)): """Base class representing parsers or generators of the EFILTER AST.""" FRONTENDS = {} FORMATTERS = {} def __init__(self, original, params=None): """Create a syntax parser for this dialect. Arguments: original: The source code of this query. Most often this is a string type, but there are exceptions (e.g. lisp) params: Some dialects support parametric queries (for safety) - if used, pass them as params. This should be a dict for keywords or a tuple for positional. """ super(Syntax, self).__init__() self.params = params self.original = original @abc.abstractproperty def root(self): """The root of the resultant AST. Subclasses MUST implement parsing here. """ @classmethod def register_parser(cls, subcls, shorthand=None): cls.register(subcls) if shorthand is None: shorthand = repr(subcls) cls.FRONTENDS[shorthand] = subcls @classmethod def register_formatter(cls, shorthand, formatter): cls.FORMATTERS[shorthand] = formatter @classmethod def get_syntax(cls, shorthand): return cls.FRONTENDS.get(shorthand) @classmethod def get_formatter(cls, shorthand): return cls.FORMATTERS.get(shorthand) efilter-1-1.5/efilter/__init__.py0000664066434000116100000000013312713157120016746 0ustar adamsheng00000000000000"""EFILTER Forensic Query Language""" from efilter import ext from efilter import parsers efilter-1-1.5/efilter/ext/0000750066434000116100000000000012762014475015442 5ustar adamsheng00000000000000efilter-1-1.5/efilter/ext/__init__.py0000644066434000116100000000022612722027602017550 0ustar adamsheng00000000000000from efilter.ext import lazy_repetition from efilter.ext import line_reader from efilter.ext import list_repetition from efilter.ext import row_tuple efilter-1-1.5/efilter/ext/list_repetition.py0000664066434000116100000000637612720544207021247 0ustar adamsheng00000000000000# EFILTER Forensic Query Language # # Copyright 2015 Google Inc. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. """ Implements IRepeated using a list container. """ __author__ = "Adam Sindelar " from efilter.protocols import repeated class ListRepetition(object): """Repeated variable backed by a list.""" _delegate = None def __init__(self, first_value=None, *values): self._delegate = [] if first_value is None: return self._value_type = repeated.value_type(first_value) self.add_value(first_value) for value in values: if repeated.value_type(value) != self.value_type(): raise TypeError( "All values of a repeated var must be the of same type." " First argument was of type %r, but argument %r is of" " type %r." % (self.value_type(), value, repeated.value_type(value))) self.add_value(value) def __iter__(self): return iter(self._delegate) def add_value(self, value): """Add a value to this repeated var. WARNING: this mutates the object (it's NOT copy on write). Unless you're absolutely certain of what you're doing, you most likely want to call repeated.meld(x, y) instead. 
""" self._delegate.extend(repeated.getvalues(value)) def add_single_value(self, value): """Same as 'add_value' but even more dangerous. Same caveats apply as with 'add_value' but also, the caller is responsible for ensuring 'value' is a scalar (not another repetition). """ self._delegate.append(value) def value_type(self): return self._value_type def getvalues(self): # Return a copy because delegate is mutable and we don't want things # to blow up. return self._delegate[:] def value_eq(self, other): if isinstance(other, type(self)): # pylint: disable=protected-access return sorted(self._delegate) == sorted(other._delegate) return sorted(self._delegate) == sorted(repeated.getvalues(other)) def __eq__(self, other): if not isinstance(other, repeated.IRepeated): return False return self.value_eq(other) def __ne__(self, other): return not self == other def value_apply(self, f): return repeated.repeated(*[f(x) for x in self.getvalues()]) def __repr__(self): return "%s(%s)" % (type(self).__name__, ", ".join([repr(x) for x in self.getvalues()])) repeated.IRepeated.implicit_static(ListRepetition) repeated.repeated.implement(for_type=object, implementation=ListRepetition) efilter-1-1.5/efilter/ext/row_tuple.py0000644066434000116100000001666512722027602020047 0ustar adamsheng00000000000000# EFILTER Forensic Query Language # # Copyright 2015 Google Inc. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. """ Implements IStructured with a RowTuple to represent rows of output from SELECTS. """ __author__ = "Adam Sindelar " import collections import six from efilter.protocols import associative from efilter.protocols import counted from efilter.protocols import structured class RowTuple(object): """Represents a row of output where column names are significant. The Bind expression, which is how DottySQL represents "SELECT AS" queries, assigns variable names to columns in output of the SELECT statement. This is the type we use to represent that row. It has several functions: - Preserves the order of columns in the output. - Implements a way to get values by either their index or by their name. - Implements IStructured, so the resulting row can be used as lexical scope for subexpressions. - Makes it possible to select a single column from a subselect and use it as a scalar value, while still retaining the column name, for example, "SELECT proc.pid AS pid FROM pslist" will return a RowTuple() with members set to ['pid'] and _vars set to {'pid': some_value}, which can be accessed as row_tuple["pid"], row_tuple[0] or row_tuple.get_singleton. The RowTuple can be treated as both a structured container and a tuple. Using the IStructured protocol, values at named columns can be obtained with the 'resolve' function; 'getmembers' is also supported. Using IAssociative, values at numerical indices in the conceptual tuple can be obtained. The number of columns can be obtained using ICounted. The python __getitem__ magic method supports both numeric indices and column names. 
efilter-1-1.5/efilter/__init__.py0000664066434000116100000000013312713157120016746 0ustar adamsheng00000000000000"""EFILTER Forensic Query Language"""

from efilter import ext
from efilter import parsers
efilter-1-1.5/efilter/ext/0000750066434000116100000000000012762014475015442 5ustar adamsheng00000000000000efilter-1-1.5/efilter/ext/__init__.py0000644066434000116100000000022612722027602017550 0ustar adamsheng00000000000000from efilter.ext import lazy_repetition
from efilter.ext import line_reader
from efilter.ext import list_repetition
from efilter.ext import row_tuple
efilter-1-1.5/efilter/ext/list_repetition.py0000664066434000116100000000637612720544207021243 0ustar adamsheng00000000000000# EFILTER Forensic Query Language
#
# Copyright 2015 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""
Implements IRepeated using a list container.
"""

__author__ = "Adam Sindelar <adam.sindelar@gmail.com>"

from efilter.protocols import repeated


class ListRepetition(object):
    """Repeated variable backed by a list."""

    _delegate = None

    def __init__(self, first_value=None, *values):
        self._delegate = []
        if first_value is None:
            return

        self._value_type = repeated.value_type(first_value)
        self.add_value(first_value)

        for value in values:
            if repeated.value_type(value) != self.value_type():
                raise TypeError(
                    "All values of a repeated var must be of the same type."
                    " First argument was of type %r, but argument %r is of"
                    " type %r." %
                    (self.value_type(), value, repeated.value_type(value)))
            self.add_value(value)

    def __iter__(self):
        return iter(self._delegate)

    def add_value(self, value):
        """Add a value to this repeated var.

        WARNING: this mutates the object (it's NOT copy on write). Unless
        you're absolutely certain of what you're doing, you most likely want
        to call repeated.meld(x, y) instead.
        """
        self._delegate.extend(repeated.getvalues(value))

    def add_single_value(self, value):
        """Same as 'add_value' but even more dangerous.

        Same caveats apply as with 'add_value' but also, the caller is
        responsible for ensuring 'value' is a scalar (not another
        repetition).
        """
        self._delegate.append(value)

    def value_type(self):
        return self._value_type

    def getvalues(self):
        # Return a copy because delegate is mutable and we don't want things
        # to blow up.
        return self._delegate[:]

    def value_eq(self, other):
        if isinstance(other, type(self)):
            # pylint: disable=protected-access
            return sorted(self._delegate) == sorted(other._delegate)

        return sorted(self._delegate) == sorted(repeated.getvalues(other))

    def __eq__(self, other):
        if not isinstance(other, repeated.IRepeated):
            return False

        return self.value_eq(other)

    def __ne__(self, other):
        return not self == other

    def value_apply(self, f):
        return repeated.repeated(*[f(x) for x in self.getvalues()])

    def __repr__(self):
        return "%s(%s)" % (type(self).__name__,
                           ", ".join([repr(x) for x in self.getvalues()]))


repeated.IRepeated.implicit_static(ListRepetition)
repeated.repeated.implement(for_type=object, implementation=ListRepetition)
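# Illustrative usage (not part of the library): because repeated.repeated is
# implemented for 'object' above, it is the constructor most callers use:
#
#   r = repeated.repeated(1, 2, 3)          # => ListRepetition(1, 2, 3)
#   list(r)                                 # => [1, 2, 3]
#   r.value_eq(repeated.repeated(3, 2, 1))  # => True (order-insensitive)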
efilter-1-1.5/efilter/ext/row_tuple.py0000644066434000116100000001666512722027602020047 0ustar adamsheng00000000000000# EFILTER Forensic Query Language
#
# Copyright 2015 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""
Implements IStructured with a RowTuple to represent rows of output from
SELECTS.
"""

__author__ = "Adam Sindelar <adam.sindelar@gmail.com>"

import collections

import six

from efilter.protocols import associative
from efilter.protocols import counted
from efilter.protocols import structured


class RowTuple(object):
    """Represents a row of output where column names are significant.

    The Bind expression, which is how DottySQL represents "SELECT AS"
    queries, assigns variable names to columns in output of the SELECT
    statement. This is the type we use to represent that row. It has several
    functions:

    - Preserves the order of columns in the output.
    - Implements a way to get values by either their index or by their name.
    - Implements IStructured, so the resulting row can be used as lexical
      scope for subexpressions.
    - Makes it possible to select a single column from a subselect and use it
      as a scalar value, while still retaining the column name, for example,
      "SELECT proc.pid AS pid FROM pslist" will return a RowTuple() with
      members set to ['pid'] and _vars set to {'pid': some_value}, which can
      be accessed as row_tuple["pid"], row_tuple[0] or
      row_tuple.get_singleton.

    The RowTuple can be treated as both a structured container and a tuple.
    Using the IStructured protocol, values at named columns can be obtained
    with the 'resolve' function; 'getmembers' is also supported. Using
    IAssociative, values at numerical indices in the conceptual tuple can be
    obtained. The number of columns can be obtained using ICounted.

    The python __getitem__ magic method supports both numeric indices and
    column names.

    Iterating the RowTuple yields the values in order of columns.
    """

    ordered_dict = None

    class __UnsetSentinel(object):
        """This is a sentinel value for columns that haven't been initialized.

        Because order of columns is significant, we want to always initialize
        the ordered_dict container with the final list of columns in the
        constructor. If values of those columns are not yet available, we can
        set them to a sentinel value (this class) that signifies a KeyError
        should be raised if someone attempts to access the column before it's
        been set.
        """
        pass

    def __init__(self, values=None, ordered_columns=None):
        if ordered_columns is not None and values is not None:
            if sorted(values.keys()) != sorted(ordered_columns):
                raise ValueError(
                    "Bad arguments to RowTuple: ordered_columns were %r but "
                    "values had keys for %r." % (ordered_columns,
                                                 list(values.keys())))

            self.ordered_dict = collections.OrderedDict(
                [(c, values[c]) for c in ordered_columns])
        elif ordered_columns is not None:
            self.ordered_dict = collections.OrderedDict(
                [(c, self.__UnsetSentinel) for c in ordered_columns])
        elif values is not None:
            self.ordered_dict = collections.OrderedDict(
                sorted(values.items(), key=lambda t: t[0]))
        else:
            raise ValueError(
                "RowTuple must be instantiated with values, columns or both.")

    def get_singleton(self):
        """If the row only has one column, return that value; otherwise raise.

        Raises:
            ValueError, if count of columns is not 1.
        """
        only_value = None
        for value in six.itervalues(self.ordered_dict):
            # This loop will raise if it runs more than once.
            if only_value is not None:
                raise ValueError("%r is not a singleton." % self)

            only_value = value

        if only_value is self.__UnsetSentinel or only_value is None:
            raise ValueError("%r is empty." % self)

        return only_value

    @property
    def ordered_values(self):
        """Return a tuple of values in the order columns were specified."""
        return tuple(iter(self))

    # Implementing IAssociative:

    def select(self, idx):
        try:
            key = tuple(self.ordered_dict.keys())[idx]
        except TypeError:
            # Select should only raise KeyError or AttributeError.
            raise KeyError(idx)

        return self.resolve(key)

    # Implementing ICounted:

    def count(self):
        return len(self)

    # Implementing IStructured:

    def resolve(self, name):
        value = self.ordered_dict[name]
        if value is self.__UnsetSentinel:
            # Resolve should raise, not return None.
            raise KeyError(name)

        return value

    def getmembers_runtime(self):
        return tuple(self.ordered_dict.keys())

    # Magic methods:

    def get(self, key, default=None):
        try:
            return self[key]
        except (KeyError, IndexError):
            return default

    def __getitem__(self, key):
        if isinstance(key, six.integer_types):
            try:
                return self.select(key)
            except KeyError:
                # By convention, [] with numeric key should raise an
                # IndexError.
                raise IndexError(key)

        return self.resolve(key)

    def __setitem__(self, key, value):
        if isinstance(key, six.integer_types):
            if key >= len(self):
                raise IndexError(key)

            key = tuple(self.ordered_dict.keys())[key]

        if key not in self.ordered_dict:
            raise KeyError("%r doesn't contain var %r." % (self, key))

        self.ordered_dict[key] = value

    def __repr__(self):
        return "RowTuple(%r)" % (self.ordered_dict)

    def __iter__(self):
        for value in six.itervalues(self.ordered_dict):
            if value is self.__UnsetSentinel:
                yield None
            else:
                yield value

    def __len__(self):
        return len(self.ordered_dict)

    def __eq__(self, other):
        if isinstance(other, type(self)):
            return self.ordered_dict == other.ordered_dict
        elif isinstance(other, structured.IStructured):
            try:
                other_members = structured.getmembers(other)
            except NotImplementedError:
                return None

            members = sorted(self.ordered_dict.keys())
            if members != sorted(other_members):
                return False

            vals = tuple([self.get(m) for m in members])
            other_vals = tuple([structured.resolve(other, m)
                                for m in members])

            return vals == other_vals
        elif isinstance(other, (tuple, list)):
            return list(self) == list(other)
        else:
            return None

    def __ne__(self, other):
        return not self.__eq__(other)


associative.IAssociative.implicit_static(RowTuple)
counted.ICounted.implicit_static(RowTuple)
structured.IStructured.implicit_static(RowTuple)
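# Illustrative usage (not part of the library):
#
#   row = RowTuple(values={"pid": 1, "name": "init"},
#                  ordered_columns=["pid", "name"])
#   row["pid"]          # => 1 (by column name, via IStructured)
#   row[1]              # => "init" (by position, via IAssociative)
#   row.ordered_values  # => (1, "init")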
""" idx = 0 generator = self._generator_func() first_value = next(generator) self._value_type = type(first_value) yield first_value for idx, value in enumerate(generator): if not isinstance(value, self._value_type): raise TypeError( "All values of a repeated var must be of the same type." " First argument was of type %r, but argument %r is of" " type %r." % (self._value_type, value, repeated.value_type(value))) self._watermark = max(self._watermark, idx + 1) yield value # Iteration stopped - check if we're at the previous watermark and raise # if not. if idx + 1 < self._watermark: raise ValueError( "LazyRepetition %r was previously able to iterate its" " generator up to idx %d, but this time iteration stopped after" " idx %d! Generator function %r is not stable." % (self, self._watermark, idx + 1, self._generator_func)) # Watermark is higher than previous count! Generator function returned # more values this time than last time. if self._count is not None and self._watermark >= self._count: raise ValueError( "LazyRepetition %r previously iterated only up to idx %d but" " was now able to reach idx %d! Generator function %r is not" " stable." % (self, self._count - 1, idx + 1, self._generator_func)) # We've finished iteration - cache count. After this the count will be # watermark + 1 forever. self._count = self._watermark + 1 def value_type(self): if self._value_type is None: for value in self.getvalues(): self._value_type = type(value) break return self._value_type def value_eq(self, other): """Sorted comparison of values.""" self_sorted = ordered.ordered(self.getvalues()) other_sorted = ordered.ordered(repeated.getvalues(other)) return self_sorted == other_sorted def value_apply(self, f): def _generator(): for value in self.getvalues(): yield f(value) return LazyRepetition(_generator) # ICounted implementation: def count(self): if not self._count: # Do a complete pass over the generator to cause _count to be set. for _ in self.getvalues(): pass return self._count repeated.IRepeated.implicit_static(LazyRepetition) repeated.lazy.implement(for_type=object, implementation=LazyRepetition) counted.ICounted.implicit_static(LazyRepetition) efilter-1-1.5/efilter/ext/csv_reader.py0000664066434000116100000000420612713157120020131 0ustar adamsheng00000000000000# EFILTER Forensic Query Language # # Copyright 2015 Google Inc. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. """ Implements IRepeated for text files and some common formats. """ __author__ = "Adam Sindelar " import csv from efilter.protocols import counted from efilter.protocols import repeated class LazyCSVReader(object): source = None delim = "," quote = "\"" output_dicts = False trim = True def __init__(self, fd, delim=",", quote="\"", output_dicts=False, trim=True): self.source = repeated.lines(fd) self.delim = delim self.quote = quote self.output_dicts = output_dicts self.trim = trim def __iter__(self): return self.getvalues() # IRepeated implementation. 
efilter-1-1.5/efilter/ext/csv_reader.py0000664066434000116100000000420612713157120020131 0ustar adamsheng00000000000000# EFILTER Forensic Query Language
#
# Copyright 2015 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""
Implements IRepeated for text files and some common formats.
"""

__author__ = "Adam Sindelar <adam.sindelar@gmail.com>"

import csv

from efilter.protocols import counted
from efilter.protocols import repeated


class LazyCSVReader(object):
    source = None
    delim = ","
    quote = "\""
    output_dicts = False
    trim = True

    def __init__(self, fd, delim=",", quote="\"", output_dicts=False,
                 trim=True):
        self.source = repeated.lines(fd)
        self.delim = delim
        self.quote = quote
        self.output_dicts = output_dicts
        self.trim = trim

    def __iter__(self):
        return self.getvalues()

    # IRepeated implementation.

    def getvalues(self):
        reader_cls = csv.DictReader if self.output_dicts else csv.reader

        return reader_cls(iter(self.source),
                          delimiter=self.delim,
                          quotechar=self.quote,
                          skipinitialspace=self.trim,
                          escapechar="\\")

    def value_type(self):
        return dict if self.output_dicts else list

    def value_eq(self, other):
        if isinstance(other, type(self)):
            return self.source.fd == other.source.fd

        return list(self) == list(other)

    def value_apply(self, f):
        for value in self:
            yield f(value)

    # ICounted implementation.

    def count(self):
        return counted.count(self.source)


counted.ICounted.implicit_static(LazyCSVReader)
repeated.IRepeated.implicit_static(LazyCSVReader)
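# Illustrative usage on Python 3 (not part of the library; io.StringIO works
# here because line_reader, imported by efilter.ext's __init__, registers io
# objects with repeated.lines):
#
#   import io
#   reader = LazyCSVReader(io.StringIO("a,b\n1,2\n"))
#   list(reader)  # => [["a", "b"], ["1", "2"]]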
efilter-1-1.5/efilter/ext/line_reader.py0000664066434000116100000000541212713157120020265 0ustar adamsheng00000000000000# EFILTER Forensic Query Language
#
# Copyright 2015 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""
Implements IRepeated for text files and some common formats.
"""

__author__ = "Adam Sindelar <adam.sindelar@gmail.com>"

import six
import threading

from efilter.protocols import counted
from efilter.protocols import repeated


class LazyLineReader(object):
    """Reads in a line at a time and supports restarting."""

    fd = None
    _seek_lock = None

    def __init__(self, fd):
        self.fd = fd
        self._seek_lock = threading.Lock()

    def __iter__(self):
        return self.getvalues()

    def __del__(self):
        """Close 'fd' if it hasn't been closed already.

        If LazyLineReader was instantiated using EFILTER's stdlib.io
        functions then it won't be inside of a with block and we need to
        close fd when the repeated is deallocated.
        """
        if not self.fd.closed:
            self.fd.close()

    # IRepeated implementation.

    def readline_at_offset(self, offset):
        self._seek_lock.acquire()
        self.fd.seek(offset)
        line = self.fd.readline()
        new_offset = self.fd.tell()
        self._seek_lock.release()

        return line, new_offset

    def getvalues(self):
        line, offset = self.readline_at_offset(0)
        while line:
            yield line
            line, offset = self.readline_at_offset(offset)

    def value_type(self):
        return six.string_types[0]

    def value_eq(self, other):
        if isinstance(other, type(self)):
            return self.fd == other.fd

        return list(self) == list(other)

    def value_apply(self, f):
        for value in self:
            yield f(value)

    # Counted implementation.

    def count(self):
        c = 0
        for _ in self:
            c += 1

        return c


counted.ICounted.implicit_static(for_type=LazyLineReader)
repeated.IRepeated.implicit_static(LazyLineReader)

if six.PY2:
    # Python 3 doesn't have a builtin 'file' type - open() returns io
    # objects, which derive from io.IOBase and are handled below.
    repeated.lines.implement(for_type=file, implementation=LazyLineReader)

if six.PY3:
    import io
    repeated.lines.implement(for_type=io.IOBase,
                             implementation=LazyLineReader)

repeated.lines.implement(for_type=six.StringIO,
                         implementation=LazyLineReader)
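# Illustrative usage (not part of the library):
#
#   import io
#   reader = LazyLineReader(io.StringIO("foo\nbar\n"))
#   list(reader)    # => ["foo\n", "bar\n"]
#   reader.count()  # => 2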
efilter-1-1.5/efilter/ast.py0000664066434000116100000003124112713157120016002 0ustar adamsheng00000000000000# EFILTER Forensic Query Language
#
# Copyright 2015 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""
EFILTER Abstract Syntax Tree.

The AST represents the actual canonical syntax of EFILTER, as understood by
all the behavior implementations and transformations. The string and
lisp-based syntaxes are frontends that translate into this AST, which is
what is actually interpreted.
"""

__author__ = "Adam Sindelar <adam.sindelar@gmail.com>"

import six

from efilter import protocol

from efilter.protocols import associative
from efilter.protocols import boolean
from efilter.protocols import eq
from efilter.protocols import iset
from efilter.protocols import number
from efilter.protocols import ordered

# This is not actually an unused import (see the Reducer class). Pylint is
# just broken.
from efilter.protocols import reducer  # pylint: disable=unused-import

from efilter.protocols import repeated
from efilter.protocols import structured


class Expression(object):
    """Base class of the query AST.

    Behavior of the query language is encoded in the various transform
    functions. Expressions themselves have no behavior, and only contain
    children and type and arity information.
    """

    __abstract = True

    children = ()
    arity = 0
    start = None  # Start of the expression's source code in 'source'.
    end = None  # End of the expression's source code in 'source'.
    source = None  # The source code of the query this expression belongs to.

    type_signature = (protocol.AnyType,)
    return_signature = protocol.AnyType

    def __hash__(self):
        return hash((type(self), self.children))

    def __eq__(self, other):
        return (isinstance(other, type(self))
                and self.children == other.children)

    def __ne__(self, other):
        return not self.__eq__(other)

    def __init__(self, *children, **kwargs):
        super(Expression, self).__init__()
        self.start = kwargs.pop("start", None)
        self.end = kwargs.pop("end", None)
        self.source = kwargs.pop("source", None)

        if kwargs:
            raise ValueError("Unexpected argument(s) %s" % kwargs.keys())

        if self.arity and len(children) != self.arity:
            raise ValueError("%d-ary expression %s passed %d children." % (
                self.arity, type(self).__name__, len(children)))

        self.children = children

    def __repr__(self):
        if len(self.children) == 1:
            return "%s(%r)" % (type(self).__name__, self.children[0])

        lines = []
        for child in self.children:
            if isinstance(child, Expression):
                clines = ["    %s" % line
                          for line in repr(child).split("\n")]
            else:
                clines = repr(child).split("\n")
            lines.extend(clines)

        return "%s(\n%s)" % (type(self).__name__, "\n".join(lines))


class ValueExpression(Expression):
    """Unary expression."""
    arity = 1
    __abstract = True

    return_signature = protocol.AnyType

    @property
    def value(self):
        return self.children[0]


class BinaryExpression(Expression):
    arity = 2
    __abstract = True

    @property
    def lhs(self):
        return self.children[0]

    @property
    def rhs(self):
        return self.children[1]


class VariadicExpression(Expression):
    """Represents an expression with variable arity."""
    type_signature = protocol.AnyType
    arity = None
    __abstract = True
# Value (unary) expressions ###


class Literal(ValueExpression):
    """Represents a literal, which is to say not-an-expression."""
    type_signature = None  # Depends on literal.


class Var(ValueExpression):
    """Represents a member of the evaluated object - attributes of entity."""
    type_signature = (six.string_types[0],)


class UnaryOperation(ValueExpression):
    """Represents an operation on a single operand (subexpression)."""
    __abstract = True


class Complement(UnaryOperation):
    """Logical NOT."""
    type_signature = (boolean.IBoolean,)
    return_signature = boolean.IBoolean


# Binary expressions ###


class Pair(BinaryExpression):
    """Represents a key/value pair."""
    type_signature = (protocol.AnyType, protocol.AnyType)
    return_signature = tuple

    @property
    def key(self):
        return self.lhs

    @property
    def value(self):
        return self.rhs


class Select(BinaryExpression):
    """Represents a selection of the key (rhs) from the value (lhs).

    This usually roughly corresponds to array subscription (a[i]).
    """
    type_signature = (associative.IAssociative, protocol.AnyType)
    return_signature = None

    @property
    def value(self):
        return self.lhs

    @property
    def key(self):
        return self.rhs


class Resolve(BinaryExpression):
    """Represents the resolution of the member (rhs) from the object (lhs).

    This is analogous to the dot (.) operator in most languages. A similar
    result can be achieved using map(value, var(...)) but the map construct
    is subject to lexical scoping rules and could end up returning something
    that was available in the outside scope, but wasn't a member of the
    object.
    """
    type_signature = (structured.IStructured, protocol.AnyType)
    return_signature = None

    @property
    def obj(self):
        return self.lhs

    @property
    def member(self):
        return self.rhs


class IsInstance(BinaryExpression):
    """Evaluates to True if the current scope is an instance of type."""
    type_signature = (protocol.AnyType, type)
    return_signature = bool


class Cast(BinaryExpression):
    """Represents a typecast."""
    type_signature = (protocol.AnyType, type)
    return_signature = protocol.AnyType
class Within(BinaryExpression):
    """Uses left side as new vars and evaluates right side as a subquery.

    Concrete behavior depends on the various subclasses, such as Filter and
    Map, but each one of them will expect left hand side to be an
    associative object holding the new vars, or a repeated variable of
    associative objects.
    """
    __abstract = True
    type_signature = (structured.IStructured, protocol.AnyType)
    return_signature = None  # Depends on RHS.

    @property
    def context(self):
        return self.lhs

    @property
    def expression(self):
        return self.rhs


class Map(Within):
    """Returns the result of applying right side to the values on left side.

    If left is a repeated value then this will return another repeated
    value.
    """


class Let(Within):
    """Works like Map, but over a single value on the LHS."""


class Filter(Within):
    """Filters (repeated) values on left side using expression on right side.

    Will return a repeated variable containing only the values for which the
    expression on the right evaluated to true.
    """


class Reducer(BinaryExpression):
    """(EXPERIMENTAL) Evaluates to an IReducer on the LHS with a mapper.

    The LHS should return an IReducer. The RHS is a mapper expression that
    supplies data to the reducer. Can be used in conjunction with 'Group'.

    Types that implement IReducer can also be applied as functions using
    IApplicative, but when used using the IReducer protocol, typically
    exhibit better performance.
    """
    return_signature = reducer.IReducer
    type_signature = (reducer.IReducer, repeated.IRepeated)

    @property
    def reducer(self):
        return self.lhs

    @property
    def mapper(self):
        return self.rhs


class Group(Within):
    """(EXPERIMENTAL) Reduces repeated values into groups by applying
    reducers.

    This is analogous to the SQL GROUP BY statement. The LHS must evaluate
    to a repeated value of rows. The grouper maps each row to a group, and
    at least one reducer applies an IReducer instance to the data. Use
    'Reducer' to instantiate IReducers with mappers attached.
    """
    arity = None
    type_signature = protocol.AnyType
    return_signature = list

    @property
    def lhs(self):
        return self.children[0]

    @property
    def grouper(self):
        return self.children[1]

    @property
    def reducers(self):
        return self.children[2:]


class Sort(Within):
    """Sorts the left hand side using the right hand side return."""


class Any(Within):
    """Returns true if the rhs evaluates as true for any value of lhs."""
    return_signature = bool
    arity = None  # RHS is allowed to be omitted.


class Each(Within):
    """Returns true if the rhs evaluates as true for every value of lhs."""
    return_signature = bool


class Membership(BinaryExpression):
    """Membership of element in set."""
    type_signature = (eq.IEq, iset.ISet)
    return_signature = boolean.IBoolean

    @property
    def element(self):
        return self.lhs

    @property
    def set(self):
        return self.rhs


class RegexFilter(BinaryExpression):
    type_signature = (six.string_types[0], six.string_types[0])
    return_signature = boolean.IBoolean

    @property
    def string(self):
        return self.lhs

    @property
    def regex(self):
        return self.rhs
# Variadic Expressions ###


class Apply(VariadicExpression):
    """Represents application of arguments to a function."""
    type_signature = protocol.AnyType
    return_signature = protocol.AnyType

    @property
    def func(self):
        return self.children[0]

    @property
    def args(self):
        return self.children[1:]


class Bind(VariadicExpression):
    """Creates a new IAssociative of vars."""
    type_signature = protocol.AnyType
    return_signature = associative.IAssociative


class Repeat(VariadicExpression):
    """Creates a new IRepeated of values."""
    type_signature = protocol.AnyType
    return_signature = repeated.IRepeated


class Tuple(VariadicExpression):
    """Create a new tuple of values."""
    type_signature = protocol.AnyType
    return_signature = tuple


# Conditionals ###


class IfElse(VariadicExpression):
    """Evaluates as if-else if-else if-else blocks.

    Subexpressions are arranged as follows:

    - Children with an even ordinal number (0, 2, 4...) are conditions and
      must evaluate to an IBoolean.
    - Children with an odd ordinal number (1, 3, 5...) are the block that
      will be returned if the previous condition returned true.
    - The last child is the else block.
    """

    def conditions(self):
        """The if-else pairs."""
        for idx in six.moves.range(1, len(self.children), 2):
            yield (self.children[idx - 1], self.children[idx])

    def default(self):
        """The else block."""
        if len(self.children) % 2:
            return self.children[-1]


# Logical Variadic ###


class LogicalOperation(VariadicExpression):
    type_signature = boolean.IBoolean
    return_signature = boolean.IBoolean
    __abstract = True


class Union(LogicalOperation):
    """Logical OR (variadic)."""


class Intersection(LogicalOperation):
    """Logical AND (variadic)."""

    # Subtle difference - this is /actually/ required to be a bool, as
    # opposed to a Union, where the return signature is only required to
    # support the boolean protocol.
    return_signature = bool


# Variadic Relations ###


class Relation(VariadicExpression):
    return_signature = boolean.IBoolean
    __abstract = True


class OrderedSet(Relation):
    """Abstract class to represent strict and non-strict ordering."""
    type_signature = ordered.IOrdered
    __abstract = True


class StrictOrderedSet(OrderedSet):
    """Greater than relation."""
    type_signature = ordered.IOrdered


class PartialOrderedSet(OrderedSet):
    """Greater-than-or-equal relation."""
    type_signature = ordered.IOrdered


class Equivalence(Relation):
    """Logical == (variadic)."""
    type_signature = eq.IEq


# Variadic Arithmetic ###


class NumericExpression(VariadicExpression):
    """Arithmetic expressions."""
    return_signature = number.INumber
    __abstract = True


class Sum(NumericExpression):
    """Arithmetic + (variadic)."""
    type_signature = number.INumber


class Difference(NumericExpression):
    """Arithmetic - (variadic)."""
    type_signature = number.INumber


class Product(NumericExpression):
    """Arithmetic * (variadic)."""
    type_signature = number.INumber


class Quotient(NumericExpression):
    """Arithmetic / (variadic)."""
    type_signature = number.INumber
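# An illustrative AST for a condition like "name == 'init' and pid > 1"
# (a sketch, not part of the library). Note that the nodes only carry
# structure - evaluation happens in the transforms:
#
#   Intersection(
#       Equivalence(Var("name"), Literal("init")),
#       StrictOrderedSet(Var("pid"), Literal(1)))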
efilter-1-1.5/efilter/protocol.py0000664066434000116100000002547312713157120017066 0ustar adamsheng00000000000000# -*- coding: utf-8 -*-

# EFILTER Forensic Query Language
#
# Copyright 2015 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""
EFILTER abstract type system.

The type protocols defined under efilter.protocols.* provide a very thin
layer over Python's builtin types, defined as collections of related
functions with defined semantics. Each type protocol is intended to
uniformly support a specific behavior across any type that participates in
the protocol.

To participate in a protocol, two things are required:
1) Implementations of each of the member functions must be provided.
2) The type must be formally added to the protocol.

In this manner, we are able to declare strict compositional types on atoms
and expressions in the EFILTER AST and allow type hierarchies external to
EFILTER (Plaso Events, Rekall Entities) to be passed to the EFILTER
transforms without casting or wrapping.

The compositional, flat nature of the type protocols makes it simple to
support basic type inference, by annotating each expression type with sets
of protocols it requires on its children and guarantees on its return type.
"""

__author__ = "Adam Sindelar <adam.sindelar@gmail.com>"

import abc

import six


class AnyType(object):
    """Sentinel used to provide a default implementation of a protocol.

    If you need to provide a default implementation of functions in a
    protocol (for example, providing fall-through behavior for objects that
    don't participate in the protocol) you may pass this type in place of
    'object'. This will cause the multimethod functions to fall through to
    this default implementation, but won't cause 'object' to be a subclass
    of the protocol.

    Example:
        MyProtocol.implement(for_type=AnyType,
                             implementations={foo: lambda x: "foo"})

        foo(5)  # => "foo"
        isinstance(5, MyProtocol)  # => False
        implements(5, MyProtocol)  # => True
    """


BUILTIN_TYPES = [float, complex, type(None), AnyType, set, frozenset, list,
                 dict, tuple]
BUILTIN_TYPES.extend(six.integer_types)
BUILTIN_TYPES.extend(six.string_types)


def implements(obj, protocol):
    """Does the object 'obj' implement the 'protocol'?"""
    if isinstance(obj, type):
        raise TypeError("First argument to implements must be an instance. "
                        "Got %r." % obj)

    return isinstance(obj, protocol) or issubclass(AnyType, protocol)


def isa(cls, protocol):
    """Does the type 'cls' participate in the 'protocol'?"""
    if not isinstance(cls, type):
        raise TypeError("First argument to isa must be a type. Got %s." %
                        repr(cls))

    if not isinstance(protocol, type):
        raise TypeError(("Second argument to isa must be a type or a "
                         "Protocol. Got an instance of %r.") %
                        type(protocol))

    return issubclass(cls, protocol) or issubclass(AnyType, protocol)
""" for type_ in cls.__get_type_args(for_type, for_types): cls._implement_for_type(for_type=type_, implementations=implementations) efilter-1-1.5/efilter/scope.py0000664066434000116100000001474112713157120016332 0ustar adamsheng00000000000000# EFILTER Forensic Query Language # # Copyright 2015 Google Inc. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. """ EFILTER lexical scope container. """ __author__ = "Adam Sindelar " from efilter import protocol from efilter.protocols import structured class ScopeStack(object): """Stack of IStructured scopes from global to local. Arguments: scopes: A flat list of scopes from local (idx -1) to global (idx 0). Note that ScopeStackStack instances passed to the constructor are flattened. Each scope is either a subclass of IStructured or an instance of such subclass. When the ScopeStack is used in type inference the individual scopes are usually instances of type, or whatever objects the host application uses to emulate types. When used at runtime, they are, of course, instances. """ scopes = () @property def globals(self): return self.scopes[0] @property def locals(self): return self.scopes[-1] def __repr__(self): return "ScopeStack(%s)" % ", ".join((repr(s) for s in self.scopes)) def __init__(self, *scopes): flattened_scopes = [] for scope in scopes: if isinstance(scope, type(self)): flattened_scopes.extend(scope.scopes) elif isinstance(scope, type): flattened_scopes.append(scope) elif protocol.implements(scope, structured.IStructured): flattened_scopes.append(scope) else: raise TypeError("Scopes must be instances or subclasses of " "IStructured; got %r." % (scope,)) self.scopes = flattened_scopes # IStructured implementation. def resolve(self, name): """Call IStructured.resolve across all scopes and return first hit.""" for scope in reversed(self.scopes): try: return structured.resolve(scope, name) except (KeyError, AttributeError): continue raise AttributeError(name) def getmembers(self): """Gets members (vars) from all scopes, using both runtime and static. This method will attempt both static and runtime getmembers. This is the recommended way of getting available members. Returns: Set of available vars. Raises: NotImplementedError if any scope fails to implement 'getmembers'. """ names = set() for scope in self.scopes: if isinstance(scope, type): names.update(structured.getmembers_static(scope)) else: names.update(structured.getmembers_runtime(scope)) return names def getmembers_runtime(self): """Gets members (vars) from all scopes using ONLY runtime information. You most likely want to use ScopeStack.getmembers instead. Returns: Set of available vars. Raises: NotImplementedError if any scope fails to implement 'getmembers'. """ names = set() for scope in self.scopes: names.update(structured.getmembers_runtime(scope)) return names @classmethod def getmembers_static(cls): """Gets members (vars) from all scopes using ONLY static information. You most likely want to use ScopeStack.getmembers instead. Returns: Set of available vars. 
efilter-1-1.5/efilter/scope.py0000664066434000116100000001474112713157120016332 0ustar adamsheng00000000000000# EFILTER Forensic Query Language
#
# Copyright 2015 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""
EFILTER lexical scope container.
"""

__author__ = "Adam Sindelar <adam.sindelar@gmail.com>"

from efilter import protocol

from efilter.protocols import structured


class ScopeStack(object):
    """Stack of IStructured scopes from global to local.

    Arguments:
        scopes: A flat list of scopes from local (idx -1) to global (idx 0).
            Note that ScopeStack instances passed to the constructor are
            flattened.

    Each scope is either a subclass of IStructured or an instance of such
    subclass. When the ScopeStack is used in type inference the individual
    scopes are usually instances of type, or whatever objects the host
    application uses to emulate types. When used at runtime, they are, of
    course, instances.
    """

    scopes = ()

    @property
    def globals(self):
        return self.scopes[0]

    @property
    def locals(self):
        return self.scopes[-1]

    def __repr__(self):
        return "ScopeStack(%s)" % ", ".join((repr(s) for s in self.scopes))

    def __init__(self, *scopes):
        flattened_scopes = []
        for scope in scopes:
            if isinstance(scope, type(self)):
                flattened_scopes.extend(scope.scopes)
            elif isinstance(scope, type):
                flattened_scopes.append(scope)
            elif protocol.implements(scope, structured.IStructured):
                flattened_scopes.append(scope)
            else:
                raise TypeError("Scopes must be instances or subclasses of "
                                "IStructured; got %r." % (scope,))

        self.scopes = flattened_scopes

    # IStructured implementation.

    def resolve(self, name):
        """Call IStructured.resolve across all scopes and return first hit."""
        for scope in reversed(self.scopes):
            try:
                return structured.resolve(scope, name)
            except (KeyError, AttributeError):
                continue

        raise AttributeError(name)

    def getmembers(self):
        """Gets members (vars) from all scopes, using both runtime and static.

        This method will attempt both static and runtime getmembers. This is
        the recommended way of getting available members.

        Returns:
            Set of available vars.

        Raises:
            NotImplementedError if any scope fails to implement
            'getmembers'.
        """
        names = set()
        for scope in self.scopes:
            if isinstance(scope, type):
                names.update(structured.getmembers_static(scope))
            else:
                names.update(structured.getmembers_runtime(scope))

        return names

    def getmembers_runtime(self):
        """Gets members (vars) from all scopes using ONLY runtime information.

        You most likely want to use ScopeStack.getmembers instead.

        Returns:
            Set of available vars.

        Raises:
            NotImplementedError if any scope fails to implement
            'getmembers'.
        """
        names = set()
        for scope in self.scopes:
            names.update(structured.getmembers_runtime(scope))

        return names

    @classmethod
    def getmembers_static(cls):
        """Gets members (vars) from all scopes using ONLY static information.

        You most likely want to use ScopeStack.getmembers instead.

        Returns:
            Set of available vars.

        Raises:
            NotImplementedError if any scope fails to implement
            'getmembers'.
        """
        names = set()
        for scope in cls.scopes:
            names.update(structured.getmembers_static(scope))

        return names

    def reflect(self, name):
        """Reflect 'name' starting with local scope all the way up to global.

        This method will attempt both static and runtime reflection. This is
        the recommended way of using reflection.

        Returns:
            Type of 'name', or protocol.AnyType.

        Caveat:
            The type of 'name' does not necessarily have to be an instance
            of Python's type - it depends on what the host application
            returns through the reflection API. For example, Rekall uses
            objects generated at runtime to simulate a native (C/C++) type
            system.
        """
        # Return whatever the most local scope defines this as, or bubble
        # all the way to the top.
        result = None
        for scope in reversed(self.scopes):
            try:
                if isinstance(scope, type):
                    result = structured.reflect_static_member(scope, name)
                else:
                    result = structured.reflect_runtime_member(scope, name)

                if result is not None:
                    return result
            except (NotImplementedError, KeyError, AttributeError):
                continue

        return protocol.AnyType

    def reflect_runtime_member(self, name):
        """Reflect 'name' using ONLY runtime reflection.

        You most likely want to use ScopeStack.reflect instead.

        Returns:
            Type of 'name', or protocol.AnyType.
        """
        for scope in reversed(self.scopes):
            try:
                return structured.reflect_runtime_member(scope, name)
            except (NotImplementedError, KeyError, AttributeError):
                continue

        return protocol.AnyType

    @classmethod
    def reflect_static_member(cls, name):
        """Reflect 'name' using ONLY static reflection.

        You most likely want to use ScopeStack.reflect instead.

        Returns:
            Type of 'name', or protocol.AnyType.
        """
        for scope in reversed(cls.scopes):
            try:
                return structured.reflect_static_member(scope, name)
            except (NotImplementedError, KeyError, AttributeError):
                continue

        return protocol.AnyType


structured.IStructured.implicit_static(ScopeStack)
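# Illustrative usage (a sketch - it assumes the host application has
# registered its scope objects, e.g. dict-like containers, with IStructured,
# which EFILTER itself doesn't do by default):
#
#   stack = ScopeStack(globals_scope, locals_scope)
#   stack.resolve("x")  # Tries locals_scope first, then globals_scope.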
efilter-1-1.5/efilter/transforms/0000750066434000116100000000000012762014475017040 5ustar adamsheng00000000000000efilter-1-1.5/efilter/transforms/normalize.py0000664066434000116100000000610512713157120021412 0ustar adamsheng00000000000000# EFILTER Forensic Query Language
#
# Copyright 2015 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""
EFILTER query normalizer.
"""

__author__ = "Adam Sindelar <adam.sindelar@gmail.com>"

from efilter import dispatch
from efilter import ast
from efilter import query as q


@dispatch.multimethod
def normalize(expr):
    """Optimizes the AST for better performance and simpler structure.

    The returned query will be logically equivalent to what was provided but
    transformations will be made to flatten and optimize the structure. This
    works by recognizing certain patterns and replacing them with nicer
    ones, eliminating pointless expressions, and so on.

    # Collapsing nested variadic expressions:
    Example:
        Intersection(x, Intersection(y, z)) => Intersection(x, y, z)

    # Empty branch elimination:
    Example:
        Intersection(x) => x
    """
    _ = expr
    raise NotImplementedError()


@normalize.implementation(for_type=q.Query)
def normalize(query):
    new_root = normalize(query.root)

    return q.Query(query, root=new_root)


@normalize.implementation(for_type=ast.Expression)
def normalize(expr):
    return expr


@normalize.implementation(for_type=ast.BinaryExpression)
def normalize(expr):
    """Normalize both sides, but don't eliminate the expression."""
    lhs = normalize(expr.lhs)
    rhs = normalize(expr.rhs)

    return type(expr)(lhs, rhs, start=lhs.start, end=rhs.end)


@normalize.implementation(for_type=ast.Apply)
def normalize(expr):
    """No elimination, but normalize arguments."""
    args = [normalize(arg) for arg in expr.args]

    return type(expr)(expr.func, *args, start=expr.start, end=expr.end)


@normalize.implementation(for_type=ast.VariadicExpression)
def normalize(expr):
    """Pass through n-ary expressions, and eliminate empty branches.

    Variadic and binary expressions recursively visit all their children. If
    all children are eliminated then the parent expression is also
    eliminated:

    (& [removed] [removed]) => [removed]

    If only one child is left, it is promoted to replace the parent node:

    (& True) => True
    """
    children = []
    for child in expr.children:
        branch = normalize(child)
        if branch is None:
            continue

        if type(branch) is type(expr):
            children.extend(branch.children)
        else:
            children.append(branch)

    if len(children) == 0:
        return None

    if len(children) == 1:
        return children[0]

    return type(expr)(*children, start=children[0].start,
                      end=children[-1].end)
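# Illustrative effect (not part of the library):
#
#   normalize(ast.Intersection(
#       ast.Var("x"),
#       ast.Intersection(ast.Var("y"), ast.Var("z"))))
#   # => Intersection(Var('x'), Var('y'), Var('z'))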
efilter-1-1.5/efilter/transforms/validate.py0000664066434000116100000000715612713157120021210 0ustar adamsheng00000000000000# EFILTER Forensic Query Language
#
# Copyright 2015 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""
EFILTER query validator.
"""

__author__ = "Adam Sindelar <adam.sindelar@gmail.com>"

from efilter import dispatch
from efilter import errors
from efilter import ast
from efilter import protocol
from efilter import query as q

from efilter.protocols import boolean

from efilter.transforms import infer_type


@dispatch.multimethod
def validate(expr, scope=None):
    """Use infer_type to get actual types for 'expr' and validate sanity."""
    _ = expr, scope
    raise NotImplementedError()


@validate.implementation(for_type=q.Query)
def validate(query, scope=None):
    try:
        return validate(query.root, scope)
    except errors.EfilterError as error:
        error.query = query.source
        raise


@validate.implementation(for_type=ast.ValueExpression)
def validate(expr, scope):
    _ = expr, scope
    return True


@validate.implementation(for_type=ast.IfElse)
def validate(expr, scope):
    # Make sure there's an ELSE block.
    if expr.default() is None:
        raise errors.EfilterLogicError(
            root=expr,
            message="Else blocks are required in EFILTER.")

    # Make sure conditions evaluate to IBoolean.
    for condition, _ in expr.conditions():
        t = infer_type.infer_type(condition, scope)
        if not protocol.isa(t, boolean.IBoolean):
            raise errors.EfilterTypeError(root=expr,
                                          actual=t,
                                          expected=boolean.IBoolean)


@validate.implementation(for_type=ast.Complement)
def validate(expr, scope):
    t = infer_type.infer_type(expr.value, scope)
    if not protocol.isa(t, boolean.IBoolean):
        raise errors.EfilterTypeError(root=expr,
                                      actual=t,
                                      expected=boolean.IBoolean)

    return True


@validate.implementation(for_type=ast.BinaryExpression)
def validate(expr, scope):
    lhs_type = infer_type.infer_type(expr.lhs, scope)
    if not (lhs_type is protocol.AnyType
            or protocol.isa(lhs_type, expr.type_signature[0])):
        raise errors.EfilterTypeError(root=expr.lhs,
                                      expected=expr.type_signature[0],
                                      actual=lhs_type)

    rhs_type = infer_type.infer_type(expr.rhs, scope)
    if not (rhs_type is protocol.AnyType
            or protocol.isa(rhs_type, expr.type_signature[1])):
        raise errors.EfilterTypeError(root=expr.rhs,
                                      expected=expr.type_signature[1],
                                      actual=rhs_type)

    return True


@validate.implementation(for_type=ast.VariadicExpression)
def validate(expr, scope):
    for subexpr in expr.children:
        validate(subexpr, scope)

        t = infer_type.infer_type(subexpr, scope)
        if not (t is protocol.AnyType
                or protocol.isa(t, expr.type_signature)):
            raise errors.EfilterTypeError(root=subexpr,
                                          expected=expr.type_signature,
                                          actual=t)

    return True
efilter-1-1.5/efilter/transforms/aslisp.py0000664066434000116100000000312012713157120020703 0ustar adamsheng00000000000000# EFILTER Forensic Query Language
#
# Copyright 2015 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""
EFILTER lisp syntax output.
"""

__author__ = "Adam Sindelar <adam.sindelar@gmail.com>"

import six

from efilter import dispatch
from efilter import ast
from efilter import syntax
from efilter import query as q

from efilter.parsers import lisp

EXPRESSIONS = dict((v, k) for k, v in six.iteritems(lisp.EXPRESSIONS))


@dispatch.multimethod
def aslisp(expr):
    """Produces equivalent lisp output to the AST."""
    _ = expr
    raise NotImplementedError()


syntax.Syntax.register_formatter(shorthand="lisp", formatter=aslisp)


@aslisp.implementation(for_type=ast.Expression)
def aslisp(expr):
    expr_name = EXPRESSIONS[type(expr)]

    return tuple([expr_name] + [aslisp(child) for child in expr.children])


@aslisp.implementation(for_type=ast.Literal)
def aslisp(expr):
    return expr.value


@aslisp.implementation(for_type=ast.Var)
def aslisp(expr):
    return ("var", expr.value)


@aslisp.implementation(for_type=q.Query)
def aslisp(query):
    return aslisp(query.root)
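# Illustrative output (not part of the library):
#
#   aslisp(ast.Var("pid"))   # => ("var", "pid")
#   aslisp(ast.Literal(42))  # => 42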
efilter-1-1.5/efilter/transforms/infer_type.py0000664066434000116100000001261012713157120021554 0ustar adamsheng00000000000000# EFILTER Forensic Query Language
#
# Copyright 2015 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""
EFILTER query type inference.
"""

__author__ = "Adam Sindelar <adam.sindelar@gmail.com>"

from efilter import ast
from efilter import dispatch
from efilter import errors
from efilter import protocol
from efilter import query as q
from efilter import scope as s

from efilter.stdlib import core as std_core

from efilter.protocols import applicative
from efilter.protocols import associative
from efilter.protocols import structured


@dispatch.multimethod
def infer_type(expr, scope=None):
    """Determine the return type of 'expr'.

    If 'expr' is evaluated with solve, what will be the type of the result?

    This employs two strategies to determine the types:

    1) Some expression types have a return signature that never changes. For
       example, intersection (AND) or unions (OR) always return a boolean.

    2) For types that are dependent on values (such as variables or user
       functions), IReflective.reflect is run on the 'scope' argument.

    Arguments:
        expr: The expression or query to infer return type of.
        scope (OPTIONAL): An instance of ScopeStack.

    Returns:
        A type, if known. On failure or in undecidable cases, returns
        AnyType.
    """
    _ = expr, scope
    raise NotImplementedError()


@infer_type.implementation(for_type=q.Query)
def infer_type(query, scope=None):
    # Always include stdcore at the top level.
    if scope:
        scope = s.ScopeStack(std_core.MODULE, scope)
    else:
        scope = s.ScopeStack(std_core.MODULE)

    try:
        return infer_type(query.root, scope)
    except errors.EfilterError as error:
        error.query = query.source
        raise


@infer_type.implementation(for_type=ast.Literal)
def infer_type(expr, scope):
    _ = scope
    return type(expr.value)


@infer_type.implementation(for_type=ast.Var)
def infer_type(expr, scope):
    if not isinstance(scope, s.ScopeStack):
        scope = s.ScopeStack(scope)

    return scope.reflect(expr.value) or protocol.AnyType


@infer_type.implementation(for_type=ast.Complement)
def infer_type(expr, scope):
    _ = expr, scope
    return bool


@infer_type.implementation(for_type=ast.IsInstance)
def infer_type(expr, scope):
    _ = expr, scope
    return bool


@infer_type.implementation(for_type=ast.BinaryExpression)
def infer_type(expr, scope):
    _ = scope
    return expr.return_signature


@infer_type.implementation(for_type=ast.Select)
def infer_type(expr, scope):
    """Try to infer the type of x[y] if y is a known value (literal)."""
    # Do we know what the key even is?
    if isinstance(expr.key, ast.Literal):
        key = expr.key.value
    else:
        return protocol.AnyType

    container_type = infer_type(expr.value, scope)

    try:
        # Associative types are not subject to scoping rules so we can just
        # reflect using IAssociative.
        return associative.reflect(container_type, key) or protocol.AnyType
    except NotImplementedError:
        return protocol.AnyType


@infer_type.implementation(for_type=ast.Resolve)
def infer_type(expr, scope):
    """Try to infer the type of x.y if y is a known value (literal)."""
    # Do we know what the member is?
    if isinstance(expr.member, ast.Literal):
        member = expr.member.value
    else:
        return protocol.AnyType

    container_type = infer_type(expr.obj, scope)

    try:
        # We are not using lexical scope here on purpose - we want to see
        # what the type of the member is only on the container_type.
        return structured.reflect(container_type, member) or protocol.AnyType
    except NotImplementedError:
        return protocol.AnyType


@infer_type.implementation(for_type=ast.VariadicExpression)
def infer_type(expr, scope):
    _ = scope
    return expr.return_signature


@infer_type.implementation(for_type=ast.Apply)
def infer_type(expr, scope):
    func_type = infer_type(expr.func, scope)

    try:
        return applicative.reflect_return(func_type) or protocol.AnyType
    except NotImplementedError:
        return protocol.AnyType


@infer_type.implementation(for_type=ast.Repeat)
def infer_type(expr, scope):
    """Check the type of the repeated value (all members have the same
    type)."""
    return infer_type(expr.children[0], scope)


@infer_type.implementation(for_type=ast.Map)
def infer_type(expr, scope):
    t = infer_type(expr.context, scope)

    return infer_type(expr.expression, s.ScopeStack(scope, t))


@infer_type.implementation(for_type=ast.Filter)
def infer_type(expr, scope):
    return infer_type(expr.lhs, scope)


@infer_type.implementation(for_type=ast.Sort)
def infer_type(expr, scope):
    return infer_type(expr.lhs, scope)


@infer_type.implementation(for_type=ast.Any)
def infer_type(expr, scope):
    _ = expr, scope
    return bool


@infer_type.implementation(for_type=ast.Each)
def infer_type(expr, scope):
    _ = expr, scope
    return bool
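# Illustrative behavior (not part of the library): some node types have a
# fixed return type, so no scope information is needed:
#
#   infer_type(ast.Literal(42), {})                    # => int
#   infer_type(ast.Complement(ast.Literal(True)), {})  # => bool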
""" _ = expr raise NotImplementedError() @asdottysql.implementation(for_type=q.Query) def asdottysql(query): return asdottysql(query.root) @asdottysql.implementation(for_types=(ast.Within, ast.Cast, ast.Reducer)) def asdottysql_builtin(expr): if not type(expr) in BUILTINS: return "" body = ", ".join([asdottysql(x) for x in expr.children]) return "%s(%s)" % (BUILTINS[type(expr)], body) @asdottysql.implementation(for_type=ast.Map) def asdottysql_map(expr): lhs = asdottysql(expr.lhs) rhs = asdottysql(expr.rhs) if (isinstance(expr.lhs, (ast.Map, ast.Var)) and isinstance(expr.rhs, (ast.Map, ast.Var))): return "%s.%s" % (lhs, rhs) return "map(%s, %s)" % (lhs, rhs) @asdottysql.implementation(for_type=ast.Let) def asdottysql_let(expr): if not isinstance(expr.lhs, ast.Bind): return "" pairs = [] for pair in expr.lhs.children: if not isinstance(pair.lhs, ast.Literal): return "" pairs.append("%s = %s" % (pair.lhs.value, asdottysql(pair.rhs))) return "let(%s) %s" % (", ".join(pairs), asdottysql(expr.rhs)) @asdottysql.implementation(for_types=(ast.NumericExpression, ast.Relation, ast.LogicalOperation)) def asdottysql_operator(expr): operator = grammar.OPERATORS.by_handler[type(expr)] children = [] for child in expr.children: precedence, _ = __expression_precedence(child) if precedence is not None and precedence < operator.precedence: children.append("(%s)" % asdottysql(child)) else: children.append(asdottysql(child)) separator = " %s " % operator.name return separator.join(children) def _format_binary(lhs, rhs, operator, lspace=" ", rspace=" "): left = asdottysql(lhs) right = asdottysql(rhs) lhs_precedence, lassoc = __expression_precedence(lhs) if lassoc == "left" and lhs_precedence is not None: lhs_precedence += 1 if lhs_precedence is not None and lhs_precedence < operator.precedence: left = "(%s)" % left rhs_precedence, rassoc = __expression_precedence(rhs) if rassoc == "right" and rhs_precedence is not None: rhs_precedence += 1 if rhs_precedence is not None and rhs_precedence < operator.precedence: right = "(%s)" % right return "".join((left, lspace, operator.name, rspace, right)) @asdottysql.implementation(for_type=ast.Complement) def asdottysql(expr): if (isinstance(expr.value, ast.Equivalence) and len(expr.value.children) == 2): return _format_binary(expr.value.children[0], expr.value.children[1], grammar.OPERATORS.by_name["!="]) if isinstance(expr.value, ast.Membership): return _format_binary(expr.value.children[0], expr.value.children[1], grammar.OPERATORS.by_name["not in"]) child_precedence, assoc = __expression_precedence(expr.value) if assoc == "left" and child_precedence: child_precedence += 1 if (child_precedence is not None and child_precedence < __expression_precedence(expr)[0]): return "not (%s)" % asdottysql(expr.value) return "not %s" % asdottysql(expr.value) @asdottysql.implementation(for_type=ast.Bind) def asdottysql(expr): return "bind(%s)" % ", ".join(asdottysql(x) for x in expr.children) @asdottysql.implementation(for_type=ast.Pair) def asdottysql(expr): return _format_binary(expr.lhs, expr.rhs, grammar.OPERATORS.by_name[":"], lspace="") @asdottysql.implementation(for_types=(ast.IsInstance, ast.RegexFilter, ast.Membership)) def asdottysql(expr): return _format_binary(expr.lhs, expr.rhs, grammar.OPERATORS.by_handler[type(expr)]) @asdottysql.implementation(for_type=ast.Apply) def asdottysql(expr): arguments = iter(expr.children) func = next(arguments) return "%s(%s)" % (asdottysql(func), ", ".join([asdottysql(arg) for arg in arguments])) 
@asdottysql.implementation(for_type=ast.Select) def asdottysql(expr): arguments = iter(expr.children) source = asdottysql(next(arguments)) if not isinstance(expr.lhs, (ast.ValueExpression, ast.Repeat, ast.Tuple, ast.Map, ast.Select, ast.Apply, ast.Bind)): source = "(%s)" % source return "%s[%s]" % (source, ", ".join([asdottysql(arg) for arg in arguments])) @asdottysql.implementation(for_type=ast.Resolve) def asdottysql(expr): if not isinstance(expr.rhs, ast.Literal): return "" return _format_binary(expr.lhs, ast.Var(expr.rhs.value), grammar.OPERATORS.by_handler[ast.Resolve], lspace="", rspace="") @asdottysql.implementation(for_type=ast.Repeat) def asdottysql(expr): return "(%s)" % ", ".join(asdottysql(x) for x in expr.children) @asdottysql.implementation(for_type=ast.Tuple) def asdottysql(expr): return "[%s]" % ", ".join(asdottysql(x) for x in expr.children) @asdottysql.implementation(for_type=ast.IfElse) def asdottysql(expr): branches = ["if %s then %s" % (asdottysql(c), asdottysql(v)) for c, v in expr.conditions()] if_ = " else ".join(branches) else_ = expr.default() if not else_ or else_ == ast.Literal(None): return if_ return "%s else %s" % (if_, asdottysql(else_)) @asdottysql.implementation(for_type=ast.Literal) def asdottysql(expr): return repr(expr.value) @asdottysql.implementation(for_type=ast.Var) def asdottysql(expr): return expr.value syntax.Syntax.register_formatter(shorthand="dottysql", formatter=asdottysql) efilter-1-1.5/efilter/transforms/__init__.py0000640066434000116100000000016712746111034021145 0ustar adamsheng00000000000000"""EFILTER Forensic Query Language""" from efilter.transforms import asdottysql from efilter.transforms import aslisp efilter-1-1.5/efilter/transforms/solve.py0000640066434000116100000007254312762014245020551 0ustar adamsheng00000000000000# EFILTER Forensic Query Language # # Copyright 2015 Google Inc. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. """ EFILTER individual object filter and matcher. """ __author__ = "Adam Sindelar <adam.sindelar@gmail.com>" # pylint: disable=function-redefined import collections import re import six from efilter import ast from efilter import dispatch from efilter import errors from efilter import protocol from efilter import query as q from efilter import scope from efilter.ext import row_tuple from efilter.protocols import applicative from efilter.protocols import associative from efilter.protocols import boolean from efilter.protocols import counted from efilter.protocols import number from efilter.protocols import ordered from efilter.protocols import reducer from efilter.protocols import repeated from efilter.protocols import structured from efilter.stdlib import core as std_core Result = collections.namedtuple("Result", ["value", "branch"]) @dispatch.multimethod def solve(query, vars): """Evaluate the 'query' using variables in 'vars'. Canonical implementation of the EFILTER AST's actual behavior.
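For example (an illustrative sketch - plain dicts work as 'vars', as in the README examples): >>> solve(q.Query("5 + 5"), {}).value 10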
This may not be the most optimal way of executing the query, but it is guaranteed to have full coverage without falling through to some other implementation. Arguments: query: The instance of Query to evaluate against data in vars. vars: An object implementing IStructured (like a dict) containing pairs of variable -> value. The best thing to pass is an instance of efilter.scope.ScopeStack, which is what the solver will convert 'vars' to anyway, eventually. Returns: Instance of Result, with members set as follows: value: The result of evaluation. The type of the result can be determined by calling infer_type on 'query'. branch: An instance of Expression, representing a subtree of 'query' that was the last branch evaluated before a match was produced. This only applies to simple queries using AND/OR and NOT operators, which evaluate to booleans and can terminate early. For other queries this will be set to None. """ _ = query, vars raise NotImplementedError() def __solve_for_repeated(expr, vars): """Helper: solve 'expr' always returning an IRepeated. If the result of solving 'expr' is a list or a tuple of IStructured objects then treat it as a repeated value of IStructured objects, because that's what the caller meant to do. This is a convenience helper so users of the API don't have to create IRepeated objects. If the result of solving 'expr' is a scalar then return it as a repeated value of one element. Arguments: expr: Expression to solve. vars: The scope. Returns: IRepeated result of solving 'expr'. A boolean to indicate whether the original value was repeating. """ var = solve(expr, vars).value if (var and isinstance(var, (tuple, list)) and protocol.implements(var[0], structured.IStructured)): return repeated.meld(*var), False return var, repeated.isrepeating(var) def __solve_for_scalar(expr, vars): """Helper: solve 'expr' always returning a scalar (not IRepeated). If the output of 'expr' is a single value or a single RowTuple with a single column then return the value in that column. Otherwise raise. Arguments: expr: Expression to solve. vars: The scope. Returns: A scalar value (not an IRepeated). Raises: EfilterTypeError if it cannot get a scalar. """ var = solve(expr, vars).value try: scalar = repeated.getvalue(var) except TypeError: raise errors.EfilterTypeError( root=expr, query=expr.source, message="Wasn't expecting more than one value here. Got %r." % (var,)) if isinstance(scalar, row_tuple.RowTuple): try: return scalar.get_singleton() except ValueError: raise errors.EfilterTypeError( root=expr, query=expr.source, message="Was expecting a scalar value here. Got %r." % (scalar,)) else: return scalar def __solve_and_destructure_repeated(expr, vars): """Helper: solve 'expr' always returning a list of scalars. If the output of 'expr' is one or more row tuples with only a single column then return a repeated value of the values in that column. If there is more than one column per row then raise. This returns a list because there's no point in wrapping the scalars in a repeated value for internal use by the implementing solver. Returns: Two values: - An iterator (not an IRepeated!) of scalars. - A boolean to indicate whether the original value was repeating. Raises: EfilterTypeError if the values don't conform.
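For example (illustrative): a repeated value of single-column row tuples, each holding one number, destructures into a plain list of those numbers.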
""" iterable, isrepeating = __solve_for_repeated(expr, vars) if iterable is None: return (), isrepeating if not isrepeating: return [iterable], False values = iter(iterable) try: value = next(values) except StopIteration: return (), True if not isinstance(value, row_tuple.RowTuple): result = [value] # We skip type checking the remaining values because it'd be slow. result.extend(values) return result, True try: result = [value.get_singleton()] for value in values: result.append(value.get_singleton()) return result, True except ValueError: raise errors.EfilterTypeError( root=expr, query=expr.source, message="Was expecting exactly one column in %r." % (value,)) def __nest_scope(expr, outer, inner): try: return scope.ScopeStack(outer, inner) except TypeError: if protocol.implements(inner, applicative.IApplicative): raise errors.EfilterTypeError( root=expr, query=expr.source, message="Attempting to use a function %r as an object." % inner) raise errors.EfilterTypeError( root=expr, query=expr.source, message="Attempting to use %r as an object (IStructured)." % inner) @solve.implementation(for_type=q.Query) def solve_query(query, vars): # Standard library must always be included. Others are optional, and the # caller can add them to vars using ScopeStack. vars = scope.ScopeStack(std_core.MODULE, vars) try: return solve(query.root, vars) except errors.EfilterError as error: if not error.query: error.query = query.source raise @solve.implementation(for_type=ast.Literal) def solve_literal(expr, vars): """Returns just the value of literal.""" _ = vars return Result(expr.value, ()) @solve.implementation(for_type=ast.Var) def solve_var(expr, vars): """Returns the value of the var named in the expression.""" try: return Result(structured.resolve(vars, expr.value), ()) except (KeyError, AttributeError) as e: # Raise a better exception for accessing a non-existent member. raise errors.EfilterKeyError(root=expr, key=expr.value, message=e, query=expr.source) except (TypeError, ValueError) as e: # Raise a better exception for what is probably a null pointer error. if vars.locals is None: raise errors.EfilterNoneError( root=expr, query=expr.source, message="Trying to access member %r of a null." % expr.value) else: raise errors.EfilterTypeError( root=expr, query=expr.source, message="%r (vars: %r)" % (e, vars)) except NotImplementedError as e: raise errors.EfilterError( root=expr, query=expr.source, message="Trying to access member %r of an instance of %r." % (expr.value, type(vars))) @solve.implementation(for_type=ast.Select) def solve_select(expr, vars): """Use IAssociative.select to get key (rhs) from the data (lhs). This operation supports both scalars and repeated values on the LHS - selecting from a repeated value implies a map-like operation and returns a new repeated value. """ data, _ = __solve_for_repeated(expr.lhs, vars) key = solve(expr.rhs, vars).value try: results = [associative.select(d, key) for d in repeated.getvalues(data)] except (KeyError, AttributeError): # Raise a better exception for accessing a non-existent key. raise errors.EfilterKeyError(root=expr, key=key, query=expr.source) except (TypeError, ValueError): # Raise a better exception for what is probably a null pointer error. if vars.locals is None: raise errors.EfilterNoneError( root=expr, query=expr.source, message="Cannot select key %r from a null." 
% key) else: raise except NotImplementedError: raise errors.EfilterError( root=expr, query=expr.source, message="Cannot select keys from a non-associative value.") return Result(repeated.meld(*results), ()) @solve.implementation(for_type=ast.Resolve) def solve_resolve(expr, vars): """Use IStructured.resolve to get member (rhs) from the object (lhs). This operation supports both scalars and repeated values on the LHS - resolving from a repeated value implies a map-like operation and returns a new repeated value. """ objs, _ = __solve_for_repeated(expr.lhs, vars) member = solve(expr.rhs, vars).value try: results = [structured.resolve(o, member) for o in repeated.getvalues(objs)] except (KeyError, AttributeError): # Raise a better exception for the non-existent member. raise errors.EfilterKeyError(root=expr.rhs, key=member, query=expr.source) except (TypeError, ValueError): # Is this a null object error? if vars.locals is None: raise errors.EfilterNoneError( root=expr, query=expr.source, message="Cannot resolve member %r from a null." % member) else: raise except NotImplementedError: raise errors.EfilterError( root=expr, query=expr.source, message="Cannot resolve members from a non-structured value.") return Result(repeated.meld(*results), ()) @solve.implementation(for_type=ast.Apply) def solve_apply(expr, vars): """Returns the result of applying the function (lhs) to its arguments (rest). We use IApplicative to apply the function, because that gives the host application an opportunity to compare the function being called against a whitelist. EFILTER will never directly call a function that wasn't provided through a protocol implementation. """ func = __solve_for_scalar(expr.func, vars) args = [] kwargs = {} for arg in expr.args: if isinstance(arg, ast.Pair): if not isinstance(arg.lhs, ast.Var): raise errors.EfilterError( root=arg.lhs, message="Invalid argument name.") kwargs[arg.key.value] = solve(arg.value, vars).value else: args.append(solve(arg, vars).value) result = applicative.apply(func, args, kwargs) return Result(result, ()) @solve.implementation(for_type=ast.Bind) def solve_bind(expr, vars): """Build a RowTuple from key/value pairs under the bind. The Bind subtree is arranged as follows: Bind | First KV Pair | | First Key Expression | | First Value Expression | Second KV Pair | | Second Key Expression | | Second Value Expression Etc... As we evaluate the subtree, each subsequent KV pair is evaluated with all previous bindings already in scope. For example: bind(x: 5, y: x + 5) # Will bind y = 10 because x is already available. """ value_expressions = [] keys = [] for pair in expr.children: keys.append(solve(pair.key, vars).value) value_expressions.append(pair.value) result = row_tuple.RowTuple(ordered_columns=keys) intermediate_scope = scope.ScopeStack(vars, result) for idx, value_expression in enumerate(value_expressions): value = solve(value_expression, intermediate_scope).value # Update the intermediate bindings so as to make earlier bindings # already available to the next child-expression.
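# E.g. while solving bind(x: 5, y: x + 5), x = 5 is stored below at the end # of the first iteration, so it is already visible through # intermediate_scope by the time y's expression is solved.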
result[keys[idx]] = value return Result(result, ()) @solve.implementation(for_type=ast.Repeat) def solve_repeat(expr, vars): """Build a repeated value from subexpressions.""" try: result = repeated.meld(*[solve(x, vars).value for x in expr.children]) return Result(result, ()) except TypeError: raise errors.EfilterTypeError( root=expr, query=expr.source, message="All values in a repeated value must be of the same type.") @solve.implementation(for_type=ast.Tuple) def solve_tuple(expr, vars): """Build a tuple from subexpressions.""" result = tuple(solve(x, vars).value for x in expr.children) return Result(result, ()) @solve.implementation(for_type=ast.IfElse) def solve_ifelse(expr, vars): """Evaluate conditions and return the one that matches.""" for condition, result in expr.conditions(): if boolean.asbool(solve(condition, vars).value): return solve(result, vars) return solve(expr.default(), vars) @solve.implementation(for_type=ast.Map) def solve_map(expr, vars): """Solves the map-form by recursively calling its RHS with new vars. Map-forms are binary expressions. The LHS should evaluate to an IAssociative that can be used as new vars with which to solve a new query, of which the RHS is the root. In most cases, the LHS will be a Var (var). Typically, map-forms result from the dotty "dot" (.) operator. For example, the query "User.name" will translate to a map-form with the var "User" on the LHS and the var "name" on the RHS. With top-level vars being something like {"User": {"name": "Bob"}}, the Var on the LHS will evaluate to {"name": "Bob"}, and that subdict will then be used on the RHS as new vars, so the whole form will evaluate to "Bob". """ lhs_values, _ = __solve_for_repeated(expr.lhs, vars) def lazy_map(): try: for lhs_value in repeated.getvalues(lhs_values): yield solve(expr.rhs, __nest_scope(expr.lhs, vars, lhs_value)).value except errors.EfilterNoneError as error: error.root = expr raise return Result(repeated.lazy(lazy_map), ()) @solve.implementation(for_type=ast.Let) def solve_let(expr, vars): """Solves a let-form by calling RHS with nested scope.""" lhs_value = solve(expr.lhs, vars).value if not isinstance(lhs_value, structured.IStructured): raise errors.EfilterTypeError( root=expr.lhs, query=expr.original, message="The LHS of 'let' must evaluate to an IStructured. Got %r." % (lhs_value,)) return solve(expr.rhs, __nest_scope(expr.lhs, vars, lhs_value)) @solve.implementation(for_type=ast.Filter) def solve_filter(expr, vars): """Filter values on the LHS by evaluating RHS with each value. Returns any LHS values for which RHS evaluates to a true value. """ lhs_values, _ = __solve_for_repeated(expr.lhs, vars) def lazy_filter(): for lhs_value in repeated.getvalues(lhs_values): if solve(expr.rhs, __nest_scope(expr.lhs, vars, lhs_value)).value: yield lhs_value return Result(repeated.lazy(lazy_filter), ()) @solve.implementation(for_type=ast.Reducer) def solve_reducer(expr, vars): def _mapper(rows): mapper = expr.mapper for row in rows: yield solve(mapper, __nest_scope(expr.lhs, vars, row)).value delegate = solve(expr.reducer, vars).value return Result(reducer.Map(delegate=delegate, mapper=_mapper), ()) @solve.implementation(for_type=ast.Group) def solve_group(expr, vars): rows, _ = __solve_for_repeated(expr.lhs, vars) reducers = [solve(child, vars).value for child in expr.reducers] r = reducer.Compose(*reducers) intermediates = {} # To avoid loading too much data into memory we segment the input rows.
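# Hypothetical illustration (sizes invented for the example): 5,000 input # rows with a chunk size of 1,000 mean each group key is folded up to five # times, with per-chunk intermediates merged as we go, so memory scales # with the number of groups rather than the number of rows.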
for chunk in reducer.generate_chunks(rows, reducer.DEFAULT_CHUNK_SIZE): # Group rows based on the output of the grouper expression. groups = {} for value in chunk: key = solve(expr.grouper, __nest_scope(expr.lhs, vars, value)).value grouped_values = groups.setdefault(key, []) grouped_values.append(value) # Fold each group in this chunk, merge with previous intermediate, if # any. for key, group in six.iteritems(groups): intermediate = reducer.fold(r, group) previous = intermediates.get(key) if previous: intermediate = reducer.merge(r, intermediate, previous) intermediates[key] = intermediate # This could equally well return a lazy repeated value to avoid finalizing # right away. The assumption here is that finalize is cheap, at least # compared to fold and merge, which already have to run eagerly. Using a # lazy value here would keep the intermediates around in memory, and just # doesn't seem worth it. results = [reducer.finalize(r, intermediate) for intermediate in six.itervalues(intermediates)] return Result(repeated.meld(*results), ()) @solve.implementation(for_type=ast.Sort) def solve_sort(expr, vars): """Sort values on the LHS by the value they yield when passed to RHS.""" lhs_values = repeated.getvalues(__solve_for_repeated(expr.lhs, vars)[0]) sort_expression = expr.rhs def _key_func(x): return solve(sort_expression, __nest_scope(expr.lhs, vars, x)).value results = ordered.ordered(lhs_values, key_func=_key_func) return Result(repeated.meld(*results), ()) @solve.implementation(for_type=ast.Each) def solve_each(expr, vars): """Return True if RHS evaluates to a true value with each state of LHS. If LHS evaluates to a normal IAssociative object then this is the same as a regular let-form, except the return value is always a boolean. If LHS evaluates to a repeated var (see efilter.protocols.repeated) of IAssociative objects then RHS will be evaluated with each state and True will be returned only if each result is true. """ lhs_values, _ = __solve_for_repeated(expr.lhs, vars) for lhs_value in repeated.getvalues(lhs_values): result = solve(expr.rhs, __nest_scope(expr.lhs, vars, lhs_value)) if not result.value: # Each is required to return an actual boolean. return result._replace(value=False) return Result(True, ()) @solve.implementation(for_type=ast.Any) def solve_any(expr, vars): """Same as Each, except it returns True on the first true result from the LHS.""" lhs_values, _ = __solve_for_repeated(expr.lhs, vars) try: rhs = expr.rhs except IndexError: # Child 1 is out of range. There is no condition on the RHS. # Just see if we have anything on the LHS. return Result(len(repeated.getvalues(lhs_values)) > 0, ()) result = Result(False, ()) for lhs_value in repeated.getvalues(lhs_values): result = solve(rhs, __nest_scope(expr.lhs, vars, lhs_value)) if result.value: # Any is required to return an actual boolean. return result._replace(value=True) return result @solve.implementation(for_type=ast.Cast) def solve_cast(expr, vars): """Cast the LHS to the type given by the RHS.""" lhs = solve(expr.lhs, vars).value t = solve(expr.rhs, vars).value if t is None: raise errors.EfilterTypeError( root=expr, query=expr.source, message="Cannot find type named %r." % expr.rhs.value) if not isinstance(t, type): raise errors.EfilterTypeError( root=expr.rhs, query=expr.source, message="%r is not a type and cannot be used with 'cast'." % (t,)) try: cast_value = t(lhs) except TypeError: raise errors.EfilterTypeError( root=expr, query=expr.source, message="Invalid cast %s -> %s."
% (type(lhs), t)) return Result(cast_value, ()) @solve.implementation(for_type=ast.IsInstance) def solve_isinstance(expr, vars): """Typecheck whether the LHS is an instance of the type on the RHS.""" lhs = solve(expr.lhs, vars) try: t = solve(expr.rhs, vars).value except errors.EfilterKeyError: t = None if t is None: raise errors.EfilterTypeError( root=expr.rhs, query=expr.source, message="Cannot find type named %r." % expr.rhs.value) if not isinstance(t, type): raise errors.EfilterTypeError( root=expr.rhs, query=expr.source, message="%r is not a type and cannot be used with 'isa'." % (t,)) return Result(protocol.implements(lhs.value, t), ()) @solve.implementation(for_type=ast.Complement) def solve_complement(expr, vars): result = solve(expr.value, vars) return result._replace(value=not result.value) @solve.implementation(for_type=ast.Intersection) def solve_intersection(expr, vars): result = Result(False, ()) for child in expr.children: result = solve(child, vars) if not result.value: # Intersections don't preserve the last value the way Unions do. return result._replace(value=False) return result @solve.implementation(for_type=ast.Union) def solve_union(expr, vars): for child in expr.children: result = solve(child, vars) if result.value: # Don't replace a matched child branch. Also, preserve the actual # value of the last subexpression (as opposed to just returning a # boolean). if result.branch: return result return result._replace(branch=child) return Result(False, ()) @solve.implementation(for_type=ast.Pair) def solve_pair(expr, vars): return Result((solve(expr.lhs, vars).value, solve(expr.rhs, vars).value), ()) @solve.implementation(for_type=ast.Sum) def solve_sum(expr, vars): total = 0 for child in expr.children: val = __solve_for_scalar(child, vars) try: total += val except TypeError: raise errors.EfilterTypeError(expected=number.INumber, actual=type(val), root=child, query=expr.source) return Result(total, ()) @solve.implementation(for_type=ast.Difference) def solve_difference(expr, vars): children = enumerate(expr.children) _, first_child = next(children) difference = __solve_for_scalar(first_child, vars) for idx, child in children: val = __solve_for_scalar(child, vars) try: difference -= val except TypeError: # Identify the type that caused the error. if idx == 1: actual_t = type(difference) else: actual_t = type(val) raise errors.EfilterTypeError(expected=number.INumber, actual=actual_t, root=expr.children[idx - 1], query=expr.source) return Result(difference, ()) @solve.implementation(for_type=ast.Product) def solve_product(expr, vars): product = 1 for child in expr.children: val = __solve_for_scalar(child, vars) try: product *= val except TypeError: raise errors.EfilterTypeError(expected=number.INumber, actual=type(val), root=child, query=expr.source) return Result(product, ()) @solve.implementation(for_type=ast.Quotient) def solve_quotient(expr, vars): children = enumerate(expr.children) _, first_child = next(children) quotient = __solve_for_scalar(first_child, vars) for idx, child in children: val = __solve_for_scalar(child, vars) try: quotient /= val except TypeError: # Identify the type that caused the error.
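# When the very first division fails (idx == 1), the running quotient still # holds the first child's raw value, so report that type; otherwise report # the type of the newest operand.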
if idx == 1: actual_t = type(quotient) else: actual_t = type(val) raise errors.EfilterTypeError(expected=number.INumber, actual=actual_t, root=expr.children[idx - 1], query=expr.source) return Result(quotient, ()) @solve.implementation(for_type=ast.Equivalence) def solve_equivalence(expr, vars): children = iter(expr.children) first_value = __solve_for_scalar(next(children), vars) for child in children: value = __solve_for_scalar(child, vars) if value != first_value: return Result(False, ()) return Result(True, ()) @solve.implementation(for_type=ast.Membership) def solve_membership(expr, vars): # There is an expectation that "foo" in "foobar" will be true, and, # simultaneously, that "foo" in ["foobar"] will be false. This is how the # analogous operator works in Python, among other languages. Where this # mental model breaks down is around repeated values, because, in EFILTER, # there is no difference between a tuple of one value and the one value, # so that "foo" in ("foobar") is true, while "foo" in ("foobar", "bar") is # false and "foo" in ("foo", "bar") is again true. These semantics are a # little unfortunate, and it may be that, in the future, the in operator # is disallowed on repeated values to prevent ambiguity. needle = solve(expr.element, vars).value if repeated.isrepeating(needle): raise errors.EfilterError( root=expr.element, query=expr.source, message=("More than one value not allowed in the needle. " "Got %d values.") % counted.count(needle)) # We need to fall through to __solve_and_destructure_repeated to handle # row tuples correctly. haystack, isrepeating = __solve_and_destructure_repeated(expr.set, vars) # For non-repeated values just use the first (singleton) value. if not isrepeating: for straw in haystack: haystack = straw break if isinstance(haystack, six.string_types): return Result(needle in haystack, ()) # Repeated values of more than one value and collections behave the same. # There are no proper sets in EFILTER so O(N) is what we get. if isrepeating or isinstance(haystack, (tuple, list)): for straw in haystack: # We're all farmers here. if straw == needle: return Result(True, ()) return Result(False, ()) # If haystack is not a repeating value, but it is iterable then it must # have originated from outside EFILTER. Let's try to do the right thing and # delegate to Python.
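# Note that only the first element is consulted - the loop below returns on # its first iteration.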
for straw in haystack: return Result(needle in straw, None) return Result(False, ()) @solve.implementation(for_type=ast.RegexFilter) def solve_regexfilter(expr, vars): string = __solve_for_scalar(expr.string, vars) pattern = __solve_for_scalar(expr.regex, vars) return Result(re.compile(pattern).search(six.text_type(string)), ()) @solve.implementation(for_type=ast.StrictOrderedSet) def solve_strictorderedset(expr, vars): iterator = iter(expr.children) min_ = __solve_for_scalar(next(iterator), vars) if min_ is None: return Result(False, ()) for child in iterator: val = __solve_for_scalar(child, vars) try: if not min_ > val or val is None: return Result(False, ()) except TypeError: raise errors.EfilterTypeError(expected=type(min_), actual=type(val), root=child, query=expr.source) min_ = val return Result(True, ()) @solve.implementation(for_type=ast.PartialOrderedSet) def solve_partialorderedset(expr, vars): iterator = iter(expr.children) min_ = __solve_for_scalar(next(iterator), vars) if min_ is None: return Result(False, ()) for child in iterator: val = __solve_for_scalar(child, vars) try: if min_ < val or val is None: return Result(False, ()) except TypeError: raise errors.EfilterTypeError(expected=type(min_), actual=type(val), root=child, query=expr.source) min_ = val return Result(True, ()) efilter-1-1.5/version.txt0000640066434000116100000000000512762014475015432 0ustar adamsheng000000000000001!1.5efilter-1-1.5/README.md0000664066434000116100000000636512713157120014477 0ustar adamsheng00000000000000# EFILTER Query Language EFILTER is a general-purpose query language designed to be embedded in Python applications and libraries. It supports SQL-like syntax to filter your application's data and provides a convenient way to directly search through the objects your application manages. A second use case for EFILTER is to translate queries from one query language to another, such as from SQL to OpenIOC. A basic SQL-like syntax and a POC lisp implementation are included with the language, and others are relatively simple to add. ## Projects using EFILTER: - [Rekall](https://github.com/google/rekall) ## Quick examples of integration. from efilter import api api.apply("5 + 5") # => 10 # Returns [{"name": "Alice"}, {"name": "Eve"}] api.apply("SELECT name FROM users WHERE age > 10", vars={"users": ({"age": 10, "name": "Bob"}, {"age": 20, "name": "Alice"}, {"age": 30, "name": "Eve"})}) ### You can also filter custom objects: # Step 1: have a custom class. class MyUser(object): ... # Step 2: Implement a protocol (like an interface). from efilter.protocols import structured structured.IStructured.implement( for_type=MyUser, implementations={ structured.resolve: lambda user, key: getattr(user, key) } ) # Step 3: EFILTER can now use my class! from efilter import api api.apply("SELECT name FROM users WHERE age > 10 ORDER BY age", vars={"users": [MyUser(...), MyUser(...)]}) ### Avoid SQL injection. EFILTER supports query templates, which can interpolate unescaped strings safely. # Replacements are applied before the query is compiled. search_term = dangerous_user_input["name"] api.apply("SELECT * FROM users WHERE name = ?", vars={"users": [...]}, replacements=[search_term]) # We also support keyword replacements. api.apply("SELECT * FROM users WHERE name = {name}", vars={"users": [...]}, replacements={"name": search_term}) ### Basic IO is supported, including CSV data sets. # Builtin IO functions need to be explicitly enabled.
api.apply("SELECT * FROM csv(users.csv) WHERE name = 'Bob'", allow_io=True) ## Language Reference Work in progress. ## Protocol documentation Work in progress. ## Example projects Several sample projects are provided. - examples/star_catalog: filters a large CSV file with nearby star systems - examples/tagging: use a custom query format ## License and Copyright Copyright 2015 Google Inc. All Rights Reserved Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at [http://www.apache.org/licenses/LICENSE-2.0](http://www.apache.org/licenses/LICENSE-2.0). Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. ## Contributors [Adam Sindelar](https://github.com/the80srobot) efilter-1-1.5/AUTHORS.txt0000664066434000116100000000011112713157120015065 0ustar adamsheng00000000000000EFILTER Query Language Copyright 2015 Google Inc. All rights reserved. efilter-1-1.5/setup.cfg0000664066434000116100000000042512762014475015041 0ustar adamsheng00000000000000[bdist_rpm] release = 1 packager = Adam Sindelar doc_files = AUTHORS.txt LICENSE.txt README.md build_requires = python-setuptools requires = python-dateutil python-six >= 1.4.0 python-tz [egg_info] tag_build = tag_date = 0 tag_svn_revision = 0 efilter-1-1.5/LICENSE.txt0000664066434000116100000002613612713157120015041 0ustar adamsheng00000000000000 Apache License Version 2.0, January 2004 http://www.apache.org/licenses/ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 1. Definitions. "License" shall mean the terms and conditions for use, reproduction, and distribution as defined by Sections 1 through 9 of this document. "Licensor" shall mean the copyright owner or entity authorized by the copyright owner that is granting the License. "Legal Entity" shall mean the union of the acting entity and all other entities that control, are controlled by, or are under common control with that entity. For the purposes of this definition, "control" means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity. "You" (or "Your") shall mean an individual or Legal Entity exercising permissions granted by this License. "Source" form shall mean the preferred form for making modifications, including but not limited to software source code, documentation source, and configuration files. "Object" form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation, and conversions to other media types. "Work" shall mean the work of authorship, whether in Source or Object form, made available under the License, as indicated by a copyright notice that is included in or attached to the work (an example is provided in the Appendix below). "Derivative Works" shall mean any work, whether in Source or Object form, that is based on (or derived from) the Work and for which the editorial revisions, annotations, elaborations, or other modifications represent, as a whole, an original work of authorship. 
For the purposes of this License, Derivative Works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work and Derivative Works thereof. "Contribution" shall mean any work of authorship, including the original version of the Work and any modifications or additions to that Work or Derivative Works thereof, that is intentionally submitted to Licensor for inclusion in the Work by the copyright owner or by an individual or Legal Entity authorized to submit on behalf of the copyright owner. For the purposes of this definition, "submitted" means any form of electronic, verbal, or written communication sent to the Licensor or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, the Licensor for the purpose of discussing and improving the Work, but excluding communication that is conspicuously marked or otherwise designated in writing by the copyright owner as "Not a Contribution." "Contributor" shall mean Licensor and any individual or Legal Entity on behalf of whom a Contribution has been received by Licensor and subsequently incorporated within the Work. 2. Grant of Copyright License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense, and distribute the Work and such Derivative Works in Source or Object form. 3. Grant of Patent License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Work, where such license applies only to those patent claims licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their Contribution(s) with the Work to which such Contribution(s) was submitted. If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work or a Contribution incorporated within the Work constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for that Work shall terminate as of the date such litigation is filed. 4. Redistribution. 
You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without modifications, and in Source or Object form, provided that You meet the following conditions: (a) You must give any other recipients of the Work or Derivative Works a copy of this License; and (b) You must cause any modified files to carry prominent notices stating that You changed the files; and (c) You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works; and (d) If the Work includes a "NOTICE" text file as part of its distribution, then any Derivative Works that You distribute must include a readable copy of the attribution notices contained within such NOTICE file, excluding those notices that do not pertain to any part of the Derivative Works, in at least one of the following places: within a NOTICE text file distributed as part of the Derivative Works; within the Source form or documentation, if provided along with the Derivative Works; or, within a display generated by the Derivative Works, if and wherever such third-party notices normally appear. The contents of the NOTICE file are for informational purposes only and do not modify the License. You may add Your own attribution notices within Derivative Works that You distribute, alongside or as an addendum to the NOTICE text from the Work, provided that such additional attribution notices cannot be construed as modifying the License. You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Derivative Works as a whole, provided Your use, reproduction, and distribution of the Work otherwise complies with the conditions stated in this License. 5. Submission of Contributions. Unless You explicitly state otherwise, any Contribution intentionally submitted for inclusion in the Work by You to the Licensor shall be under the terms and conditions of this License, without any additional terms or conditions. Notwithstanding the above, nothing herein shall supersede or modify the terms of any separate license agreement you may have executed with Licensor regarding such Contributions. 6. Trademarks. This License does not grant permission to use the trade names, trademarks, service marks, or product names of the Licensor, except as required for reasonable and customary use in describing the origin of the Work and reproducing the content of the NOTICE file. 7. Disclaimer of Warranty. Unless required by applicable law or agreed to in writing, Licensor provides the Work (and each Contributor provides its Contributions) on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the Work and assume any risks associated with Your exercise of permissions under this License. 8. Limitation of Liability. 
In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Work (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if such Contributor has been advised of the possibility of such damages. 9. Accepting Warranty or Additional Liability. While redistributing the Work or Derivative Works thereof, You may choose to offer, and charge a fee for, acceptance of support, warranty, indemnity, or other liability obligations and/or rights consistent with this License. However, in accepting such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any other Contributor, and only if You agree to indemnify, defend, and hold each Contributor harmless for any liability incurred by, or claims asserted against, such Contributor by reason of your accepting any such warranty or additional liability. END OF TERMS AND CONDITIONS APPENDIX: How to apply the Apache License to your work. To apply the Apache License to your work, attach the following boilerplate notice, with the fields enclosed by brackets "[]" replaced with your own identifying information. (Don't include the brackets!) The text should be enclosed in the appropriate comment syntax for the file format. We also recommend that a file or class name and description of purpose be included on the same "printed page" as the copyright notice for easier identification within third-party archives. Copyright [yyyy] [name of copyright owner] Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. efilter-1-1.5/PKG-INFO0000640066434000116100000000065712762014475014316 0ustar adamsheng00000000000000Metadata-Version: 1.0 Name: efilter Version: 1-1.5 Summary: EFILTER query language Home-page: https://github.com/google/dotty/ Author: Adam Sindelar Author-email: adam.sindelar@gmail.com License: Apache 2.0 Description: EFILTER is a general-purpose destructuring and search language implemented in Python, and suitable for integration with any Python project that requires a search function for some of its data. Platform: UNKNOWN