comment=d29730e1cb9daaa66bda63426cdce83505d2c809

vectorscan-5.4.11/.clang-format
BasedOnStyle: LLVM
IndentWidth: 4
UseTab: false
AllowShortIfStatementsOnASingleLine: false
IndentCaseLabels: false
AccessModifierOffset: -4

vectorscan-5.4.11/.gitignore
##
## There are some more .gitignore files in subdirs, but this is the main place
## to add new entries. These are mostly for the common case when ue2 is built
## in place
##

# Autogenerated stuff that we don't want to know about
.deps
autom4te.cache
autojunk
.dirstamp

# Temp and swap files
*~
.*.swp
.sw?

# compiler output and binaries
*.a
*.o
*.lo
*.la
*.so
*.pyc
.libs
bin

# Merge files created by git.
*.orig

# sigs dir is handled externally
signatures
# but not the regression tests
!tools/hscollider/test_cases/signatures

# ignore pcre symlink if it exists
pcre
# but not pcre subdirs!
!pcre/

# ignore boost symlink if it exists
include/boost

# ignore sqlite3 symlink if it exists
sqlite3

# Generated files
src/config.h
src/config.h.in
src/hs_version.h
src/parser/Parser.cpp

# Generated PCRE files
pcre/pcre_chartables.c
pcre/pcregrep
pcre/pcretest

# Autoconf/automake/libtool noise
Makefile
Makefile.in
aclocal.m4
config.cache
config.log
config.status
configure
libhs.pc
libtool
m4/libtool.m4
m4/ltoptions.m4
m4/ltsugar.m4
m4/ltversion.m4
m4/lt~obsolete.m4
src/stamp-h1

# Docs
# the docs Makefile is not generated, so keep it tracked
!doc/dev-reference/Makefile
doc/dev-reference/doxygen_sqlite3.db
doc/dev-reference/doxygen_xml/
doc/dev-reference/_build/

# Autotools noise in pcre
pcre/INSTALL
pcre/Makefile
pcre/Makefile.in
pcre/aclocal.m4
pcre/ar-lib
pcre/compile
pcre/config.*
pcre/configure
pcre/depcomp
pcre/install-sh
pcre/*.pc
pcre/libtool
pcre/ltmain.sh
pcre/missing
pcre/pcre-config
pcre/pcre.h
pcre/pcre_stringpiece.h
pcre/pcrecpparg.h
pcre/stamp-h1
pcre/test-driver

vectorscan-5.4.11/CHANGELOG-vectorscan.md
# Vectorscan Change Log

This is a list of notable changes to Vectorscan, in reverse chronological order. For the Hyperscan change log, see CHANGELOG.md.

## [5.4.11] 2023-11-19

- Refactor CMake build system to be much more modular.
- version in hs.h fell out of sync again #175
- Fix compile failures with recent compilers, namely clang-15 and gcc-13
- Fix clang 15,16 compilation errors on all platforms, refactor CMake build system #181
- Fix signed/unsigned char issue on Arm with Ragel generated code.
- Correct set_source_files_properties usage #189
- Fix build failure on Ubuntu 20.04
- Support building on Ubuntu 20.04 #180
- Require pkg-config during CMake
- make pkgconfig a requirement #188
- Fix segfault on Fat runtimes with SVE2 code
- Move VERM16 enums to the end of the list #191
- Update README.md, add CHANGELOG-vectorscan.md and Contributors-vectorscan.md files

## [5.4.10] 2023-09-23

- Fix compilation with libcxx 16 by @rschu1ze in #144
- Fix use-of-uninitialized-value due to getData128() by @azat in #148
- Use std::vector instead of boost::container::small_vector under MSan by @azat in #149
- Feature/enable fat runtime arm by @markos in #165
- adding ifndef around HS_PUBLIC_API definition so that vectorscan can be statically linked into another shared library without exporting symbols by @jeffplaisance in #164
- Feature/backport hyperscan 2023 q3 by @markos in #169
- Prepare for 5.4.10 by @markos in #167

## [5.4.9] 2023-03-23

- Major change: Enable SVE & SVE2 builds and make it a supported architecture! (thanks to @abondarev84)
- Fix various clang-related bugs
- Fix AArch64 bug in Parser.rl caused by char signedness. Make unsigned char the default in the Parser for all architectures.
- Fix Power bug, multiple tests were failing.
- C++20 related change, use prefixed assume_aligned to avoid conflict with C++20 std::assume_aligned.

## [5.4.8] 2022-09-13

- CMake: Use non-deprecated method for finding python by @jth in #108
- Optimize vectorscan for aarch64 by using shrn instruction by @danlark1 in #113
- Fixed the PCRE download location by @pareenaverma in #116
- Bugfix/hyperscan backport 202208 by @markos in #118
- VSX optimizations by @markos in #119
- when compiling with mingw64, use __mingw_aligned_malloc() and __mingw_aligned_free() by @liquidaty in #121
- [NEON] simplify/optimize shift/align primitives by @markos in #123
- Merge develop to master by @markos in #124

## [5.4.7] 2022-05-05

- Fix word boundary assertions under C++20 by @BigRedEye in #90
- Fix all ASAN issues in vectorscan by @danlark1 in #93
- change FAT_RUNTIME to a normal option so it can be set to off by @a16bitsysop in #94
- Optimized and correct version of movemask128 for ARM by @danlark1 in #102

## [5.4.6] 2022-01-21

- Major refactoring of many engines to use internal SuperVector C++ templates library. Code size reduced to 1/3rd with no loss of performance in most cases.
- Microbenchmarking tool added for performance fine-tuning
- Arm Advanced SIMD/NEON fully ported. Initial work on SVE2 for a couple of engines.
- Power9 VSX ppc64le fully ported. Initial port needs some optimization.
- Clang compiler support added.
- Apple M1 support added.
- CI added, the following configurations are tested on every PR:
  - gcc-debug, gcc-release, clang-debug, clang-release:
    - Linux Intel: SSE4.2, AVX2, AVX512, FAT
    - Linux Arm
    - Linux Power9
  - clang-debug, clang-release:
    - MacOS Apple M1

vectorscan-5.4.11/CHANGELOG.md
# Hyperscan Change Log

This is a list of notable changes to Hyperscan, in reverse chronological order.

## [5.4.2] 2023-04-19

- Roll back bugfix for github issue #350: besides using the scratch allocated for the corresponding database, Hyperscan also allows users to use a larger scratch allocated for another database.
  Users can leverage this property to achieve safe scratch usage in multi-database scenarios. Behaviors beyond these are discouraged and results are undefined.
- Fix hsdump issue due to invalid nfa type.

## [5.4.1] 2023-02-20

- The Intel Hyperscan team is pleased to provide a bug fix release to our open source library. Intel also maintains an upgraded version available through your Intel sales representative.
- Bugfix for issue #184: fix random char value of UTF-8.
- Bugfix for issue #291: bypass logical combination flag in hs_expression_info().
- Bugfix for issue #292: fix build error due to libc symbol parsing.
- Bugfix for issue #302/304: add empty string check for pure literal API.
- Bugfix for issue #303: fix unknown instruction error in pure literal API.
- Bugfix for issue #303: avoid memory leak in stream close stage.
- Bugfix for issue #305: fix assertion failure in DFA construction.
- Bugfix for issue #317: fix aligned allocator segmentation faults.
- Bugfix for issue #350: add quick validity check for scratch.
- Bugfix for issue #359: fix glibc-2.34 stack size issue.
- Bugfix for issue #360: fix SKIP flag issue in chimera.
- Bugfix for issue #362: fix one cotec check corner issue in UTF-8 validation.
- Fix other compile issues.

## [5.4.0] 2020-12-31

- Improvement on literal matcher "Fat Teddy" performance, including support for Intel(R) AVX-512 Vector Byte Manipulation Instructions (Intel(R) AVX-512 VBMI).
- Introduce a new 32-state shuffle-based DFA engine ("Sheng32"). This improves scanning performance by leveraging AVX-512 VBMI.
- Introduce a new 64-state shuffle-based DFA engine ("Sheng64"). This improves scanning performance by leveraging AVX-512 VBMI.
- Introduce a new shuffle-based hybrid DFA engine ("McSheng64"). This improves scanning performance by leveraging AVX-512 VBMI.
- Improvement on exceptional state handling performance for LimEx NFA, including support for AVX-512 VBMI.
- Improvement on lookaround performance with new models, including support for AVX-512.
- Improvement on DFA state space efficiency.
- Optimization on decision of NFA/DFA generation.
- hsbench: add CSV dump support for hsbench.
- Bugfix for cmake error on Icelake under release mode.
- Bugfix in find_vertices_in_cycles() to avoid self-loop checking in SCC.
- Bugfix for issue #270: fix return value handling in chimera.
- Bugfix for issue #284: use correct free function in logical combination.
- Add BUILD_EXAMPLES cmake option to enable example code compilation. (#260)
- Some typo fixes. (#242, #259)

## [5.3.0] 2020-05-15

- Improvement on literal matcher "Teddy" performance, including support for Intel(R) AVX-512 Vector Byte Manipulation Instructions (Intel(R) AVX-512 VBMI).
- Improvement on single-byte/two-byte matching performance, including support for Intel(R) Advanced Vector Extensions 512 (Intel(R) AVX-512).
- hsbench: add hyphen support for -T option.
- tools/fuzz: add test scripts for synthetic pattern generation.
- Bugfix for acceleration path analysis in LimEx NFA.
- Bugfix for duplicate matches for Small-write engine.
- Bugfix for UTF8 checking problem for hscollider.
- Bugfix for issue #205: avoid crash of `hs_compile_lit_multi()` with clang and ASAN.
- Bugfix for issue #211: fix error in `db_check_platform()` function.
- Bugfix for issue #217: fix cmake parsing issue of CPU arch for non-English locale.
- Bugfix for issue #228: avoid undefined behavior when calling `close()` after `fdopendir()` in `loadExpressions()`.
- Bugfix for issue #239: fix hyperscan compile issue under gcc-10.
- Add VLAN packets processing capability in pcap analysis script. (#214)
- Avoid extra convert instruction for "Noodle". (#221)
- Add Hyperscan version macro in `hs.h`. (#222)

## [5.2.1] 2019-10-13

- Bugfix for issue #186: fix compile issue when `BUILD_SHARED_LIBS` is on in release mode.
- Disable redundant move check for older compiler versions.

## [5.2.0] 2019-07-12

- Literal API: add new API `hs_compile_lit()` and `hs_compile_lit_multi()` to process pure literal rule sets. The two literal APIs treat each expression text in a literal sense without recognizing any regular expression grammar.
- Logical combination: add support for purely negative combinations, which report a match at EOD if no sub-expressions matched.
- Windows porting: support shared library (DLL) on Windows with available tools hscheck, hsbench and hsdump.
- Bugfix for issue #148: fix uninitialized use of `scatter_unit_uX` due to padding.
- Bugfix for issue #155: fix numerical result out of range error.
- Bugfix for issue #165: avoid corruption of pending combination report in streaming mode.
- Bugfix for issue #174: fix scratch free issue when memory allocation fails.

## [5.1.1] 2019-04-03

- Add extra detection and handling when invalid rose programs are triggered.
- Bugfix for issue #136: fix CMake parsing of CPU architecture for GCC-9.
- Bugfix for issue #137: avoid file path impact on fat runtime build.
- Bugfix for issue #141: fix rose literal programs for multi-pattern matching when no pattern ids are provided.
- Bugfix for issue #144: fix library install path in pkg-config files.

## [5.1.0] 2019-01-17

- Improve DFA state compression by wide-state optimization to reduce bytecode size.
- Create specific interpreter runtime handling to boost the performance of pure literal matching.
- Optimize original presentation of interpreter (the "Rose" engine) to increase overall performance.
- Bugfix for logical combinations: fix erroneous reporting of a combination's match when a sub-expression has an EOD match in streaming mode.
- Bugfix for logical combinations: fix missed reporting of a combination's match on vacuous input.
- Bugfix for issue #104: fix compile error with Boost 1.68.0.
- Bugfix for issue #127: avoid pcre error for hscollider with installed PCRE package.
- Update version of PCRE used by testing tools as a syntax and semantic reference to PCRE 8.41 or above.
- Fix github repo address in doc.

## [5.0.0] 2018-07-09

- Introduce chimera hybrid engine of Hyperscan and PCRE, to fully support PCRE syntax as well as to take advantage of the high performance nature of Hyperscan.
- New API feature: logical combinations (AND, OR and NOT) of patterns in a given pattern set.
- Windows porting: hsbench, hscheck, hscollider and hsdump tools now available on Windows 8 or newer.
- Improve undirected graph implementation to avoid graph copy and reduce compile time.
- Bugfix for issue #86: enable hscollider for installed PCRE package.

## [4.7.0] 2018-01-24

- Introduced hscollider pattern testing tool, for validating Hyperscan match behaviour against PCRE.
- Introduced hscheck pattern compilation tool.
- Introduced hsdump development tool for producing information about Hyperscan pattern compilation.
- New API feature: extended approximate matching support for Hamming distance.
- Bugfix for issue #69: Force C++ linkage in Xcode.
- Bugfix for issue #73: More documentation for `hs_close_stream()`.
- Bugfix for issue #78: Fix for fat runtime initialisation when used as a shared library.

## [4.6.0] 2017-09-22

- New API feature: stream state compression. This allows the user to compress and restore state for streams to reduce memory usage.
- Many improvements to literal matching performance, including more support for Intel(R) Advanced Vector Extensions 512 (Intel(R) AVX-512).
- Compile time improvements, mainly reducing compiler memory allocation. Also results in reduced compile time for some pattern sets.
- Bugfix for issue #62: fix error building Hyperscan using older versions of Boost.
- Small updates to fix warnings identified by Coverity.

## [4.5.2] 2017-07-26

- Bugfix for issue #57: Treat characters between `\Q...\E` as codepoints in UTF8 mode.
- Bugfix for issue #60: Use a portable flag for mktemp for fat runtime builds.
- Bugfix for fat runtime builds on AVX-512 capable machines with Hyperscan's AVX-512 support disabled.

## [4.5.1] 2017-06-16

- Bugfix for issue #56: workaround for gcc-4.8 C++11 defect.
- Bugfix for literal matching table generation, reversing a regression in performance for some literal matching cases.
- Bugfixes for hsbench, related to multicore benchmarking, portability fixes for FreeBSD, and clarifying output results.
- CMake: removed a duplicate else branch that caused very recent (v3.9) builds of CMake to fail.

## [4.5.0] 2017-06-09

- New API feature: approximate matching using the "edit distance" extended parameter. This allows the user to request all matches that are within a given edit distance of an exact match for a pattern.
- Initial support for Intel(R) Advanced Vector Extensions 512 (Intel(R) AVX-512), disabled by default. To enable it, pass `-DBUILD_AVX512=1` to `cmake`.
- Major compile time improvements in many subsystems, reducing compile time significantly for many large pattern sets.
- Internal reworking of literal matchers to operate on literals of at most eight characters, with subsequent confirmation done in the Rose interpreter. This reduces complexity and bytecode size and improves performance for many pattern sets.
- Improve performance of the FDR literal matcher front end.
- Improve bucket assignment and other heuristics governing the FDR literal matcher.
- Improve optimisation passes that take advantage of extended parameter constraints (`min_offset`, etc.).
- Introduce further lookaround specialisations to improve scanning performance.
- Optimise Rose interpreter construction to reduce the length of programs generated in some situations.
- Remove the old "Rose" pattern decomposition analysis pass in favour of the new "Violet" pass introduced in Hyperscan 4.3.0.
- In streaming mode, allow exhaustion (where the stream can no longer produce matches) to be detected in more situations, improving scanning performance.
- Improve parsing of control verbs (such as `(*UTF8)`) that can only occur at the beginning of the pattern. Combinations of supported verbs in any order are now permitted.
- Update version of PCRE used by testing tools as a syntax and semantic reference to PCRE 8.40.
- Tuning support for Intel(R) microarchitecture code names Skylake, Skylake Server, Goldmont.
- CMake: when building a native build with a version of GCC that doesn't recognise the host compiler, tune for the microarch selected by `-march=native`.
- CMake: don't fail if SQLite (which is only required to build the `hsbench` tool) is not present.
- CMake: detect libc++ directly and use that to inform the Boost version requirement.
- Bugfix for issue #51: make the fat runtime build wrapper less fragile.
- Bugfix for issues #46, #52: use `sqlite3_errmsg()` to allow SQLite 3.6.x to be used. Thanks to @EaseTheWorld for the PR.

## [4.4.1] 2017-02-28

- Bugfixes to fix issues where stale data was being referenced in scratch memory. In particular this may have resulted in `hs_close_stream()` referencing data from other previously scanned streams. This may have resulted in incorrect matches being reported.

## [4.4.0] 2017-01-20

- Introduce the "fat runtime" build. This will build several variants of the Hyperscan scanning engine specialised for different processor feature sets, and use the appropriate one for the host at runtime. This uses the "ifunc" indirect function attribute provided by GCC and is currently available on Linux only, where it is the default for release builds.
- New API function: add the `hs_valid_platform()` function. This function tests whether the host provides the SSSE3 instruction set required by Hyperscan.
- Introduce a new standard benchmarking tool, "hsbench". This provides an easy way to measure Hyperscan's performance for a particular set of patterns and corpus of data to be scanned.
- Introduce a 64-bit GPR LimEx NFA model, which uses 64-bit GPRs on 64-bit hosts and SSE registers on 32-bit hosts.
- Introduce a new DFA model ("McSheng") which is a hybrid of the existing McClellan and Sheng models. This improves scanning performance for some cases.
- Introduce lookaround specialisations to improve scanning performance.
- Improve the handling of long literals by moving confirmation to the Rose interpreter and simplifying the hash table used to track them in streaming mode.
- Improve compile time optimisation for removing redundant paths from expression graphs.
- Build: improve support for building with MSVC toolchain.
- Reduce the size of small write DFAs used for small scans in block mode.
- Introduce a custom graph type (`ue2_graph`) used in place of the Boost Graph Library's `adjacency_list` type. Improves compile time performance and type safety.
- Improve scanning performance of the McClellan DFA.
- Bugfix for a very unusual SOM case where the incorrect start offset was reported for a match.
- Bugfix for issue #37, removing execute permissions from some source files.
- Bugfix for issue #41, handle Windows line endings in pattern files.

## [4.3.2] 2016-11-15

- Bugfix for issue #39. This small change is a workaround for an issue in Boost 1.62. The fix has been submitted to Boost for inclusion in a future release.

## [4.3.1] 2016-08-29

- Bugfix for issue #30. In recent versions of Clang, a write to a variable was being elided, resulting in corrupted stream state after calling `hs_reset_stream()`.

## [4.3.0] 2016-08-24

- Introduce a new analysis pass ("Violet") used for decomposition of patterns into literals and smaller engines.
- Introduce a new container engine ("Tamarama") for infix and suffix engines that can be proven to run exclusively of one another. This reduces stream state for pattern sets with many such engines.
- Introduce a new shuffle-based DFA engine ("Sheng").
  This improves scanning performance for pattern sets where small engines are generated.
- Improve the analysis used to extract extra mask information from short literals.
- Reduced compile time spent in equivalence class analysis.
- Build: frame pointers are now only omitted for 32-bit release builds.
- Build: Workaround for C++ issues reported on FreeBSD/libc++ platforms. (github issue #27)
- Simplify the LimEx NFA with a unified "variable shift" model, which reduces the number of different NFA code paths to one per model size.
- Allow some anchored prefixes that may squash the literal to which they are attached to run eagerly. This improves scanning performance for some patterns.
- Simplify and improve EOD ("end of data") matching, using the interpreter for all operations.
- Elide unnecessary instructions in the Rose interpreter at compile time.
- Reduce the number of inlined instantiations of the Rose interpreter in order to reduce instruction cache pressure.
- Small improvements to literal matcher acceleration.
- Parser: ignore `\E` metacharacters that are not preceded by `\Q`. This conforms to PCRE's behaviour, rather than returning a compile error.
- Check for misaligned memory when allocating an error structure in Hyperscan's compile path and return an appropriate error if detected.

## [4.2.0] 2016-05-31

- Introduce an interpreter for many complex actions to replace the use of internal reports within the core of Hyperscan (the "Rose" engine). This improves scanning performance and reduces database size for many pattern sets.
- Many enhancements to the acceleration framework used by NFA and DFA engines, including more flexible multibyte implementations and more AVX2 support. This improves scanning performance for many pattern sets.
- Improved prefiltering support for complex patterns containing very large bounded repeats (`R{M,N}` with large `N`).
- Improve scanning performance of pattern sets with a very large number of EOD-anchored patterns.
- Improve scanning performance of large pattern sets that use the `HS_FLAG_SINGLEMATCH` flag.
- Improve scanning performance of pattern sets that contain a single literal by improving the "Noodle" literal matcher.
- Small reductions in total stream state for many pattern sets.
- Improve runtime detection of AVX2 support.
- Disable -Werror for release builds, in order to behave better for packagers and users with different compiler combinations than those that we test.
- Improve support for building on Windows with MSVC 2015 (github issue #14). Support for Hyperscan on Windows is still experimental.
- Small updates to fix warnings identified by Coverity.
- Remove Python codegen for the "FDR" and "Teddy" literal matchers. These are now implemented directly in C code.
- Remove the specialist "Sidecar" engine in favour of using our more general repeat engines.
- New API function: add the `hs_expression_ext_info()` function. This is a variant of `hs_expression_info()` that can accept patterns with extended parameters.
- New API error value: add the `HS_SCRATCH_IN_USE` error, which is returned when Hyperscan detects that a scratch region is already in use on entry to an API function.

## [4.1.0] 2015-12-18

- Update version of PCRE used by testing tools as a syntax and semantic reference to PCRE 8.38.
- Small updates to fix warnings identified by Coverity.
- Clean up and unify exception handling behaviour across GPR and SIMD NFA models.
- Fix bug in handling of bounded repeat triggers with large gaps between them for sparse repeat model.
- Correctly reject POSIX collating elements (`[.ch.]`, `[=ch=]`) in the parser. These are not supported by Hyperscan.
- Add support for quoted sequences (`\Q...\E`) inside character classes.
- Simplify FDR literal matcher runtime by removing some static specialization.
- Fix handling of the POSIX `[:graph:]`, `[:print:]` and `[:punct:]` character classes to match the behaviour of PCRE 8.38 in both standard operation and with the UCP flag set. (Note: some bugs were fixed in this area in PCRE 8.38.) Previously Hyperscan's behaviour was the same as versions of PCRE before 8.34.
- Improve performance when compiling pattern sets that include a large number of similar bounded repeat constructs. (github issue #9)

## [4.0.1] 2015-10-30

- Minor cleanups to test code.
- CMake and other build system improvements.
- API update: allow `hs_reset_stream()` and `hs_reset_and_copy_stream()` to be supplied with a NULL scratch pointer if no matches are required. This is in line with the behaviour of `hs_close_stream()`.
- Disallow bounded repeats with a very large minimum repeat but no maximum, i.e. `{N,}` for very large N.
- Reduce compile memory usage in literal set expansion for some large cases.

## [4.0.0] 2015-10-20

- Original release of Hyperscan as open-source software.

vectorscan-5.4.11/CMakeLists.txt
cmake_minimum_required (VERSION 3.18.4)
project (vectorscan C CXX)

set (HS_MAJOR_VERSION 5)
set (HS_MINOR_VERSION 4)
set (HS_PATCH_VERSION 11)
set (HS_VERSION ${HS_MAJOR_VERSION}.${HS_MINOR_VERSION}.${HS_PATCH_VERSION})

string (TIMESTAMP BUILD_DATE "%Y-%m-%d")
message(STATUS "Build date: ${BUILD_DATE}")

# Dependencies check

set(CMAKE_MODULE_PATH ${PROJECT_SOURCE_DIR}/cmake)
include(CheckCCompilerFlag)
include(CheckCXXCompilerFlag)
include(CheckCXXSymbolExists)
INCLUDE (CheckFunctionExists)
INCLUDE (CheckIncludeFiles)
INCLUDE (CheckIncludeFileCXX)
INCLUDE (CheckLibraryExists)
INCLUDE (CheckSymbolExists)
include (CMakeDependentOption)
include (GNUInstallDirs)
include (${CMAKE_MODULE_PATH}/platform.cmake)
include (${CMAKE_MODULE_PATH}/boost.cmake)
include (${CMAKE_MODULE_PATH}/ragel.cmake)

find_package(PkgConfig REQUIRED)

find_program(RAGEL ragel)

if(${RAGEL} STREQUAL "RAGEL-NOTFOUND")
    message(FATAL_ERROR "Ragel state machine compiler not found")
endif()

# Build type check

if (NOT CMAKE_BUILD_TYPE)
    message(STATUS "Default build type 'Release with debug info'")
    set(CMAKE_BUILD_TYPE RELWITHDEBINFO CACHE STRING "" FORCE )
else()
    string(TOUPPER ${CMAKE_BUILD_TYPE} CMAKE_BUILD_TYPE)
    message(STATUS "Build type ${CMAKE_BUILD_TYPE}")
endif()

if(CMAKE_BUILD_TYPE MATCHES NONE|RELEASE|RELWITHDEBINFO|MINSIZEREL)
    message(STATUS "using release build")
    set(RELEASE_BUILD TRUE)
else()
    set(RELEASE_BUILD FALSE)
endif()

set(BINDIR "${PROJECT_BINARY_DIR}/bin")
set(LIBDIR "${PROJECT_BINARY_DIR}/lib")

set(INCLUDE_INSTALL_DIR ${CMAKE_INSTALL_INCLUDEDIR})

# First for the generic no-config case
set(CMAKE_RUNTIME_OUTPUT_DIRECTORY "${BINDIR}")
set(CMAKE_LIBRARY_OUTPUT_DIRECTORY "${LIBDIR}")
set(CMAKE_ARCHIVE_OUTPUT_DIRECTORY "${LIBDIR}")
# Second, for multi-config builds (e.g. msvc)
foreach (OUTPUTCONFIG ${CMAKE_CONFIGURATION_TYPES})
    string (TOUPPER ${OUTPUTCONFIG} OUTPUTCONFIG)
    set(CMAKE_RUNTIME_OUTPUT_DIRECTORY_${OUTPUTCONFIG} "${BINDIR}")
    set(CMAKE_LIBRARY_OUTPUT_DIRECTORY_${OUTPUTCONFIG} "${LIBDIR}")
    set(CMAKE_ARCHIVE_OUTPUT_DIRECTORY_${OUTPUTCONFIG} "${LIBDIR}")
endforeach (OUTPUTCONFIG CMAKE_CONFIGURATION_TYPES)

set(CMAKE_INCLUDE_CURRENT_DIR 1)
include_directories(${PROJECT_SOURCE_DIR}/src)
include_directories(${PROJECT_BINARY_DIR})
include_directories(SYSTEM include)

# Compiler detection

include (${CMAKE_MODULE_PATH}/compiler.cmake)

# CMake options

if (BUILD_STATIC_AND_SHARED)
    message(FATAL_ERROR "This option is no longer supported, please set at least one of BUILD_STATIC_LIBS and BUILD_SHARED_LIBS")
endif()

option(BUILD_SHARED_LIBS "Build shared libs" OFF)
option(BUILD_STATIC_LIBS "Build static libs" OFF)

if (BUILD_SHARED_LIBS)
    message(STATUS "Building shared libraries")
endif()
if (BUILD_STATIC_LIBS)
    message(STATUS "Building static libraries")
endif()

if (NOT BUILD_STATIC_LIBS AND NOT BUILD_SHARED_LIBS)
    # if none are set build static libs
    message(STATUS "Neither shared nor static libraries were requested, building static libraries")
    set(BUILD_STATIC_LIBS ON)
endif ()

CMAKE_DEPENDENT_OPTION(DUMP_SUPPORT "Dump code support; normally on, except in release builds" ON "NOT RELEASE_BUILD" OFF)
CMAKE_DEPENDENT_OPTION(DISABLE_ASSERTS "Disable assert(); Asserts are enabled in debug builds, disabled in release builds" OFF "NOT RELEASE_BUILD" ON)

option(DEBUG_OUTPUT "Enable debug output (warning: very verbose)" OFF)
if(DEBUG_OUTPUT)
    add_definitions(-DDEBUG)
    set(RELEASE_BUILD FALSE)
endif(DEBUG_OUTPUT)

#for config
if (RELEASE_BUILD)
    set(HS_OPTIMIZE ON)
    add_definitions(-DNDEBUG)
endif()

# Detect OS and if Fat Runtime is available
include (${CMAKE_MODULE_PATH}/osdetection.cmake)

if (ARCH_IA32 OR ARCH_X86_64)
    include (${CMAKE_MODULE_PATH}/cflags-x86.cmake)
    set(ARCH_FLAG march)
elseif (ARCH_ARM32 OR ARCH_AARCH64)
    include (${CMAKE_MODULE_PATH}/cflags-arm.cmake)
    set(ARCH_FLAG march)
elseif (ARCH_PPC64EL)
    include (${CMAKE_MODULE_PATH}/cflags-ppc64le.cmake)
    set(ARCH_FLAG mcpu)
endif ()

# Detect Native arch flags if requested
include (${CMAKE_MODULE_PATH}/archdetect.cmake)

# Configure Compiler flags (Generic)

include (${CMAKE_MODULE_PATH}/sanitize.cmake)

if (NOT FAT_RUNTIME)
    if (GNUCC_TUNE)
        set(ARCH_C_FLAGS "-${ARCH_FLAG}=${GNUCC_ARCH} -${TUNE_FLAG}=${GNUCC_TUNE}")
        set(ARCH_CXX_FLAGS "-${ARCH_FLAG}=${GNUCC_ARCH} -${TUNE_FLAG}=${GNUCC_TUNE}")
    else()
        set(ARCH_C_FLAGS "-${ARCH_FLAG}=${GNUCC_ARCH} -mtune=${TUNE_FLAG} ${ARCH_C_FLAGS}")
        set(ARCH_CXX_FLAGS "-${ARCH_FLAG}=${GNUCC_ARCH} -mtune=${TUNE_FLAG} ${ARCH_CXX_FLAGS}")
    endif()
endif()

# remove CMake's idea of optimisation
foreach (CONFIG ${CMAKE_BUILD_TYPE} ${CMAKE_CONFIGURATION_TYPES})
    string(REGEX REPLACE "-O[^ ]*" "" CMAKE_C_FLAGS_${CONFIG} "${CMAKE_C_FLAGS_${CONFIG}}")
    string(REGEX REPLACE "-O[^ ]*" "" CMAKE_CXX_FLAGS_${CONFIG} "${CMAKE_CXX_FLAGS_${CONFIG}}")
endforeach ()

message(STATUS "ARCH_C_FLAGS : ${ARCH_C_FLAGS}")
message(STATUS "ARCH_CXX_FLAGS : ${ARCH_CXX_FLAGS}")

if(RELEASE_BUILD)
    if (NOT CMAKE_BUILD_TYPE MATCHES MINSIZEREL)
        set(OPT_C_FLAG "-O3")
        set(OPT_CXX_FLAG "-O3")
    else ()
        set(OPT_C_FLAG "-Os")
        set(OPT_CXX_FLAG "-Os")
    endif ()
else()
    set(OPT_C_FLAG "-O0")
    set(OPT_CXX_FLAG "-O0")
endif(RELEASE_BUILD)

include (${CMAKE_MODULE_PATH}/cflags-generic.cmake)

include_directories(SYSTEM ${Boost_INCLUDE_DIRS})

set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${ARCH_C_FLAGS}")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${ARCH_CXX_FLAGS}")

# PCRE check, we have a fixed requirement for PCRE to use Chimera
# and hscollider
set(PCRE_REQUIRED_MAJOR_VERSION 8)
set(PCRE_REQUIRED_MINOR_VERSION 41)
set(PCRE_REQUIRED_VERSION ${PCRE_REQUIRED_MAJOR_VERSION}.${PCRE_REQUIRED_MINOR_VERSION})
include (${CMAKE_MODULE_PATH}/pcre.cmake)
if (NOT CORRECT_PCRE_VERSION)
    message(STATUS "PCRE ${PCRE_REQUIRED_VERSION} or above not found")
endif()

# we need static libs for Chimera - too much deep magic for shared libs
if (CORRECT_PCRE_VERSION AND PCRE_BUILD_SOURCE AND BUILD_STATIC_LIBS)
    set(BUILD_CHIMERA TRUE)
endif()

set(RAGEL_C_FLAGS "-Wno-unused -funsigned-char")

set_source_files_properties(
    src/parser/Parser.cpp
    PROPERTIES
    COMPILE_FLAGS "${RAGEL_C_FLAGS}")

ragelmaker(src/parser/Parser.rl)

set_source_files_properties(
    src/parser/control_verbs.cpp
    PROPERTIES
    COMPILE_FLAGS "${RAGEL_C_FLAGS}")

ragelmaker(src/parser/control_verbs.rl)

# do substitutions
configure_file(${CMAKE_MODULE_PATH}/config.h.in ${PROJECT_BINARY_DIR}/config.h)
configure_file(src/hs_version.h.in ${PROJECT_BINARY_DIR}/hs_version.h)

configure_file(libhs.pc.in libhs.pc @ONLY) # only replace @ quoted vars
install(FILES ${CMAKE_BINARY_DIR}/libhs.pc
    DESTINATION "${CMAKE_INSTALL_LIBDIR}/pkgconfig")

# only set these after all tests are done
if (NOT FAT_RUNTIME)
    set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${EXTRA_C_FLAGS} ${HS_C_FLAGS}")
    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${EXTRA_CXX_FLAGS} ${HS_CXX_FLAGS}")
else()
    set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${EXTRA_C_FLAGS}")
    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${EXTRA_CXX_FLAGS}")
endif()

SET(hs_HEADERS
    ${PROJECT_BINARY_DIR}/hs_version.h
    src/hs.h
    src/hs_common.h
    src/hs_compile.h
    src/hs_runtime.h
)
install(FILES ${hs_HEADERS} DESTINATION "${CMAKE_INSTALL_INCLUDEDIR}/hs")

set (hs_exec_common_SRCS
    src/alloc.c
    src/scratch.c
    src/util/arch/common/cpuid_flags.h
    src/util/multibit.c
)
if (ARCH_IA32 OR ARCH_X86_64)
    set (hs_exec_common_SRCS
        ${hs_exec_common_SRCS}
        src/util/arch/x86/cpuid_flags.c
    )
elseif (ARCH_ARM32 OR ARCH_AARCH64)
    set (hs_exec_common_SRCS
        ${hs_exec_common_SRCS}
        src/util/arch/arm/cpuid_flags.c
    )
elseif (ARCH_PPC64EL)
    set (hs_exec_common_SRCS
        ${hs_exec_common_SRCS}
        src/util/arch/ppc64el/cpuid_flags.c)
endif ()

set (hs_exec_SRCS
    ${hs_HEADERS}
    src/hs_version.h.in
    src/ue2common.h
    src/allocator.h
    src/crc32.c
    src/crc32.h
    src/report.h
    src/runtime.c
    src/stream_compress.c
    src/stream_compress.h
    src/stream_compress_impl.h
src/fdr/fdr.c src/fdr/fdr.h src/fdr/fdr_internal.h src/fdr/fdr_confirm.h src/fdr/fdr_confirm_runtime.h src/fdr/flood_runtime.h src/fdr/fdr_loadval.h src/fdr/teddy.c src/fdr/teddy.h src/fdr/teddy_internal.h src/fdr/teddy_runtime_common.h src/hwlm/hwlm.c src/hwlm/hwlm.h src/hwlm/hwlm_internal.h src/hwlm/noodle_engine.cpp src/hwlm/noodle_engine.h src/hwlm/noodle_internal.h src/nfa/accel.c src/nfa/accel.h src/nfa/castle.c src/nfa/castle.h src/nfa/castle_internal.h src/nfa/gough.c src/nfa/gough_internal.h src/nfa/lbr.c src/nfa/lbr.h src/nfa/lbr_common_impl.h src/nfa/lbr_internal.h src/nfa/limex_accel.c src/nfa/limex_accel.h src/nfa/limex_exceptional.h src/nfa/limex_native.c src/nfa/limex_ring.h src/nfa/limex_64.c src/nfa/limex_simd128.c src/nfa/limex_simd256.c src/nfa/limex_simd384.c src/nfa/limex_simd512.c src/nfa/limex.h src/nfa/limex_common_impl.h src/nfa/limex_context.h src/nfa/limex_internal.h src/nfa/limex_runtime.h src/nfa/limex_runtime_impl.h src/nfa/limex_shuffle.h src/nfa/limex_state_impl.h src/nfa/mcclellan.c src/nfa/mcclellan.h src/nfa/mcclellan_common_impl.h src/nfa/mcclellan_internal.h src/nfa/mcsheng.c src/nfa/mcsheng_data.c src/nfa/mcsheng.h src/nfa/mcsheng_internal.h src/nfa/mpv.h src/nfa/mpv.c src/nfa/mpv_internal.h src/nfa/nfa_api.h src/nfa/nfa_api_dispatch.c src/nfa/nfa_internal.h src/nfa/nfa_rev_api.h src/nfa/repeat.c src/nfa/repeat.h src/nfa/repeat_internal.h src/nfa/sheng.c src/nfa/sheng.h src/nfa/sheng_defs.h src/nfa/sheng_impl.h src/nfa/sheng_impl4.h src/nfa/sheng_internal.h src/nfa/shufti.cpp src/nfa/shufti.h src/nfa/tamarama.c src/nfa/tamarama.h src/nfa/tamarama_internal.h src/nfa/truffle.cpp src/nfa/truffle.h src/nfa/vermicelli.hpp src/nfa/vermicelli_run.h src/som/som.h src/som/som_operation.h src/som/som_runtime.h src/som/som_runtime.c src/som/som_stream.c src/som/som_stream.h src/rose/block.c src/rose/catchup.h src/rose/catchup.c src/rose/infix.h src/rose/init.h src/rose/init.c src/rose/stream.c src/rose/stream_long_lit.h 
src/rose/stream_long_lit_hash.h src/rose/match.h src/rose/match.c src/rose/miracle.h src/rose/program_runtime.c src/rose/program_runtime.h src/rose/runtime.h src/rose/rose.h src/rose/rose_internal.h src/rose/rose_program.h src/rose/rose_types.h src/rose/rose_common.h src/rose/validate_mask.h src/rose/validate_shufti.h src/util/bitutils.h src/util/copybytes.h src/util/exhaust.h src/util/fatbit.h src/util/join.h src/util/multibit.h src/util/multibit.c src/util/multibit_compress.h src/util/multibit_internal.h src/util/pack_bits.h src/util/popcount.h src/util/pqueue.h src/util/scatter.h src/util/scatter_runtime.h src/util/simd_utils.h src/util/state_compress.h src/util/state_compress.c src/util/unaligned.h src/util/uniform_ops.h src/database.c src/database.h ) if (ARCH_IA32 OR ARCH_X86_64) set (hs_exec_SRCS ${hs_exec_SRCS} src/nfa/vermicelli_simd.cpp src/util/supervector/arch/x86/impl.cpp) elseif (ARCH_ARM32 OR ARCH_AARCH64) set (hs_exec_SRCS ${hs_exec_SRCS} src/util/supervector/arch/arm/impl.cpp) elseif (ARCH_PPC64EL) set (hs_exec_SRCS ${hs_exec_SRCS} src/nfa/vermicelli_simd.cpp src/util/supervector/arch/ppc64el/impl.cpp) endif() if (ARCH_IA32 OR ARCH_X86_64) set (hs_exec_avx2_SRCS src/fdr/teddy_avx2.c src/util/arch/x86/masked_move.c src/util/arch/x86/masked_move.h ) endif() if (ARCH_ARM32 OR ARCH_AARCH64) set (hs_exec_neon_SRCS src/nfa/vermicelli_simd.cpp) set (hs_exec_sve_SRCS src/nfa/vermicelli_simd.cpp) endif() SET (hs_compile_SRCS ${hs_HEADERS} src/crc32.h src/database.h src/grey.cpp src/grey.h src/hs.cpp src/hs_internal.h src/hs_version.h.in src/scratch.h src/state.h src/ue2common.h src/compiler/asserts.cpp src/compiler/asserts.h src/compiler/compiler.cpp src/compiler/compiler.h src/compiler/error.cpp src/compiler/error.h src/compiler/expression_info.h src/fdr/engine_description.cpp src/fdr/engine_description.h src/fdr/fdr_compile.cpp src/fdr/fdr_compile.h src/fdr/fdr_compile_internal.h src/fdr/fdr_compile_util.cpp src/fdr/fdr_confirm_compile.cpp 
src/fdr/fdr_confirm.h src/fdr/fdr_engine_description.cpp src/fdr/fdr_engine_description.h src/fdr/fdr_internal.h src/fdr/flood_compile.cpp src/fdr/teddy_compile.cpp src/fdr/teddy_compile.h src/fdr/teddy_engine_description.cpp src/fdr/teddy_engine_description.h src/fdr/teddy_internal.h src/hwlm/hwlm_build.cpp src/hwlm/hwlm_build.h src/hwlm/hwlm_internal.h src/hwlm/hwlm_literal.cpp src/hwlm/hwlm_literal.h src/hwlm/noodle_build.cpp src/hwlm/noodle_build.h src/hwlm/noodle_internal.h src/nfa/accel.h src/nfa/accel_dfa_build_strat.cpp src/nfa/accel_dfa_build_strat.h src/nfa/accelcompile.cpp src/nfa/accelcompile.h src/nfa/callback.h src/nfa/castlecompile.cpp src/nfa/castlecompile.h src/nfa/dfa_build_strat.cpp src/nfa/dfa_build_strat.h src/nfa/dfa_min.cpp src/nfa/dfa_min.h src/nfa/goughcompile.cpp src/nfa/goughcompile.h src/nfa/goughcompile_accel.cpp src/nfa/goughcompile_internal.h src/nfa/goughcompile_reg.cpp src/nfa/mcclellan.h src/nfa/mcclellan_internal.h src/nfa/mcclellancompile.cpp src/nfa/mcclellancompile.h src/nfa/mcclellancompile_util.cpp src/nfa/mcclellancompile_util.h src/nfa/mcsheng_compile.cpp src/nfa/mcsheng_compile.h src/nfa/limex_compile.cpp src/nfa/limex_compile.h src/nfa/limex_accel.h src/nfa/limex_internal.h src/nfa/mpv_internal.h src/nfa/mpvcompile.cpp src/nfa/mpvcompile.h src/nfa/nfa_api.h src/nfa/nfa_api_queue.h src/nfa/nfa_api_util.h src/nfa/nfa_build_util.cpp src/nfa/nfa_build_util.h src/nfa/nfa_internal.h src/nfa/nfa_kind.h src/nfa/rdfa.cpp src/nfa/rdfa.h src/nfa/rdfa_graph.cpp src/nfa/rdfa_graph.h src/nfa/rdfa_merge.cpp src/nfa/rdfa_merge.h src/nfa/repeat_internal.h src/nfa/repeatcompile.cpp src/nfa/repeatcompile.h src/nfa/sheng_internal.h src/nfa/shengcompile.cpp src/nfa/shengcompile.h src/nfa/shufticompile.cpp src/nfa/shufticompile.h src/nfa/tamaramacompile.cpp src/nfa/tamaramacompile.h src/nfa/trufflecompile.cpp src/nfa/trufflecompile.h src/nfa/vermicellicompile.cpp src/nfa/vermicellicompile.h src/nfagraph/ng.cpp src/nfagraph/ng.h 
src/nfagraph/ng_anchored_acyclic.cpp src/nfagraph/ng_anchored_acyclic.h src/nfagraph/ng_anchored_dots.cpp src/nfagraph/ng_anchored_dots.h src/nfagraph/ng_asserts.cpp src/nfagraph/ng_asserts.h src/nfagraph/ng_builder.cpp src/nfagraph/ng_builder.h src/nfagraph/ng_calc_components.cpp src/nfagraph/ng_calc_components.h src/nfagraph/ng_cyclic_redundancy.cpp src/nfagraph/ng_cyclic_redundancy.h src/nfagraph/ng_depth.cpp src/nfagraph/ng_depth.h src/nfagraph/ng_dominators.cpp src/nfagraph/ng_dominators.h src/nfagraph/ng_edge_redundancy.cpp src/nfagraph/ng_edge_redundancy.h src/nfagraph/ng_equivalence.cpp src/nfagraph/ng_equivalence.h src/nfagraph/ng_execute.cpp src/nfagraph/ng_execute.h src/nfagraph/ng_expr_info.cpp src/nfagraph/ng_expr_info.h src/nfagraph/ng_extparam.cpp src/nfagraph/ng_extparam.h src/nfagraph/ng_fixed_width.cpp src/nfagraph/ng_fixed_width.h src/nfagraph/ng_fuzzy.cpp src/nfagraph/ng_fuzzy.h src/nfagraph/ng_haig.cpp src/nfagraph/ng_haig.h src/nfagraph/ng_holder.cpp src/nfagraph/ng_holder.h src/nfagraph/ng_is_equal.cpp src/nfagraph/ng_is_equal.h src/nfagraph/ng_lbr.cpp src/nfagraph/ng_lbr.h src/nfagraph/ng_literal_analysis.cpp src/nfagraph/ng_literal_analysis.h src/nfagraph/ng_literal_component.cpp src/nfagraph/ng_literal_component.h src/nfagraph/ng_literal_decorated.cpp src/nfagraph/ng_literal_decorated.h src/nfagraph/ng_mcclellan.cpp src/nfagraph/ng_mcclellan.h src/nfagraph/ng_mcclellan_internal.h src/nfagraph/ng_limex.cpp src/nfagraph/ng_limex.h src/nfagraph/ng_limex_accel.cpp src/nfagraph/ng_limex_accel.h src/nfagraph/ng_misc_opt.cpp src/nfagraph/ng_misc_opt.h src/nfagraph/ng_netflow.cpp src/nfagraph/ng_netflow.h src/nfagraph/ng_prefilter.cpp src/nfagraph/ng_prefilter.h src/nfagraph/ng_prune.cpp src/nfagraph/ng_prune.h src/nfagraph/ng_puff.cpp src/nfagraph/ng_puff.h src/nfagraph/ng_redundancy.cpp src/nfagraph/ng_redundancy.h src/nfagraph/ng_region.cpp src/nfagraph/ng_region.h src/nfagraph/ng_region_redundancy.cpp src/nfagraph/ng_region_redundancy.h 
src/nfagraph/ng_repeat.cpp src/nfagraph/ng_repeat.h src/nfagraph/ng_reports.cpp src/nfagraph/ng_reports.h src/nfagraph/ng_restructuring.cpp src/nfagraph/ng_restructuring.h src/nfagraph/ng_revacc.cpp src/nfagraph/ng_revacc.h src/nfagraph/ng_sep.cpp src/nfagraph/ng_sep.h src/nfagraph/ng_small_literal_set.cpp src/nfagraph/ng_small_literal_set.h src/nfagraph/ng_som.cpp src/nfagraph/ng_som.h src/nfagraph/ng_som_add_redundancy.cpp src/nfagraph/ng_som_add_redundancy.h src/nfagraph/ng_som_util.cpp src/nfagraph/ng_som_util.h src/nfagraph/ng_split.cpp src/nfagraph/ng_split.h src/nfagraph/ng_squash.cpp src/nfagraph/ng_squash.h src/nfagraph/ng_stop.cpp src/nfagraph/ng_stop.h src/nfagraph/ng_uncalc_components.cpp src/nfagraph/ng_uncalc_components.h src/nfagraph/ng_utf8.cpp src/nfagraph/ng_utf8.h src/nfagraph/ng_util.cpp src/nfagraph/ng_util.h src/nfagraph/ng_vacuous.cpp src/nfagraph/ng_vacuous.h src/nfagraph/ng_violet.cpp src/nfagraph/ng_violet.h src/nfagraph/ng_width.cpp src/nfagraph/ng_width.h src/parser/AsciiComponentClass.cpp src/parser/AsciiComponentClass.h src/parser/Component.cpp src/parser/Component.h src/parser/ComponentAlternation.cpp src/parser/ComponentAlternation.h src/parser/ComponentAssertion.cpp src/parser/ComponentAssertion.h src/parser/ComponentAtomicGroup.cpp src/parser/ComponentAtomicGroup.h src/parser/ComponentBackReference.cpp src/parser/ComponentBackReference.h src/parser/ComponentBoundary.cpp src/parser/ComponentBoundary.h src/parser/ComponentByte.cpp src/parser/ComponentByte.h src/parser/ComponentClass.cpp src/parser/ComponentClass.h src/parser/ComponentCondReference.cpp src/parser/ComponentCondReference.h src/parser/ComponentEUS.cpp src/parser/ComponentEUS.h src/parser/ComponentEmpty.cpp src/parser/ComponentEmpty.h src/parser/ComponentRepeat.cpp src/parser/ComponentRepeat.h src/parser/ComponentSequence.cpp src/parser/ComponentSequence.h src/parser/ComponentVisitor.cpp src/parser/ComponentVisitor.h src/parser/ComponentWordBoundary.cpp 
src/parser/ComponentWordBoundary.h src/parser/ConstComponentVisitor.cpp src/parser/ConstComponentVisitor.h src/parser/Parser.cpp src/parser/Parser.h src/parser/Utf8ComponentClass.cpp src/parser/Utf8ComponentClass.h src/parser/buildstate.cpp src/parser/buildstate.h src/parser/check_refs.cpp src/parser/check_refs.h src/parser/control_verbs.cpp src/parser/control_verbs.h src/parser/logical_combination.cpp src/parser/logical_combination.h src/parser/parse_error.cpp src/parser/parse_error.h src/parser/parser_util.cpp src/parser/position.h src/parser/position_info.h src/parser/prefilter.cpp src/parser/prefilter.h src/parser/shortcut_literal.cpp src/parser/shortcut_literal.h src/parser/ucp_table.cpp src/parser/ucp_table.h src/parser/unsupported.cpp src/parser/unsupported.h src/parser/utf8_validate.h src/parser/utf8_validate.cpp src/smallwrite/smallwrite_build.cpp src/smallwrite/smallwrite_build.h src/smallwrite/smallwrite_internal.h src/som/slot_manager.cpp src/som/slot_manager.h src/som/slot_manager_internal.h src/som/som.h src/som/som_operation.h src/rose/rose_build.h src/rose/rose_build_add.cpp src/rose/rose_build_add_internal.h src/rose/rose_build_add_mask.cpp src/rose/rose_build_anchored.cpp src/rose/rose_build_anchored.h src/rose/rose_build_bytecode.cpp src/rose/rose_build_castle.h src/rose/rose_build_castle.cpp src/rose/rose_build_compile.cpp src/rose/rose_build_convert.cpp src/rose/rose_build_convert.h src/rose/rose_build_dedupe.cpp src/rose/rose_build_engine_blob.cpp src/rose/rose_build_engine_blob.h src/rose/rose_build_exclusive.cpp src/rose/rose_build_exclusive.h src/rose/rose_build_groups.cpp src/rose/rose_build_groups.h src/rose/rose_build_impl.h src/rose/rose_build_infix.cpp src/rose/rose_build_infix.h src/rose/rose_build_instructions.cpp src/rose/rose_build_instructions.h src/rose/rose_build_lit_accel.cpp src/rose/rose_build_lit_accel.h src/rose/rose_build_long_lit.cpp src/rose/rose_build_long_lit.h src/rose/rose_build_lookaround.cpp 
src/rose/rose_build_lookaround.h src/rose/rose_build_matchers.cpp src/rose/rose_build_matchers.h src/rose/rose_build_merge.cpp src/rose/rose_build_merge.h src/rose/rose_build_misc.cpp src/rose/rose_build_misc.h src/rose/rose_build_program.cpp src/rose/rose_build_program.h src/rose/rose_build_resources.h src/rose/rose_build_role_aliasing.cpp src/rose/rose_build_scatter.cpp src/rose/rose_build_scatter.h src/rose/rose_build_util.h src/rose/rose_build_width.cpp src/rose/rose_build_width.h src/rose/rose_graph.h src/rose/rose_in_graph.h src/rose/rose_in_util.cpp src/rose/rose_in_util.h src/util/accel_scheme.h src/util/alloc.cpp src/util/alloc.h src/util/bitfield.h src/util/boundary_reports.h src/util/charreach.cpp src/util/charreach.h src/util/charreach_util.h src/util/clique.cpp src/util/clique.h src/util/compare.h src/util/compile_context.cpp src/util/compile_context.h src/util/compile_error.cpp src/util/compile_error.h src/util/container.h src/util/depth.cpp src/util/depth.h src/util/determinise.h src/util/dump_mask.cpp src/util/dump_mask.h src/util/fatbit_build.cpp src/util/fatbit_build.h src/util/flat_containers.h src/util/graph.h src/util/graph_range.h src/util/graph_small_color_map.h src/util/graph_undirected.h src/util/hash.h src/util/hash_dynamic_bitset.h src/util/insertion_ordered.h src/util/math.h src/util/multibit_build.cpp src/util/multibit_build.h src/util/noncopyable.h src/util/operators.h src/util/order_check.h src/util/partial_store.h src/util/partitioned_set.h src/util/popcount.h src/util/queue_index_factory.h src/util/report.h src/util/report_manager.cpp src/util/report_manager.h src/util/simd_utils.h src/util/small_vector.h src/util/target_info.cpp src/util/target_info.h src/util/ue2_graph.h src/util/ue2string.cpp src/util/ue2string.h src/util/unaligned.h src/util/unicode_def.h src/util/unicode_set.h src/util/uniform_ops.h src/util/unordered.h src/util/verify_types.h ) set(hs_dump_SRCS src/scratch_dump.cpp src/scratch_dump.h src/fdr/fdr_dump.cpp 
    src/hwlm/hwlm_dump.cpp
    src/hwlm/hwlm_dump.h
    src/nfa/accel_dump.cpp
    src/nfa/accel_dump.h
    src/nfa/castle_dump.cpp
    src/nfa/castle_dump.h
    src/nfagraph/ng_dump.cpp
    src/nfagraph/ng_dump.h
    src/nfa/goughcompile_dump.cpp
    src/nfa/goughcompile_dump.h
    src/nfa/goughdump.cpp
    src/nfa/goughdump.h
    src/nfa/lbr_dump.cpp
    src/nfa/limex_dump.cpp
    src/nfa/mcclellandump.cpp
    src/nfa/mcclellandump.h
    src/nfa/mcsheng_dump.cpp
    src/nfa/mcsheng_dump.h
    src/nfa/mpv_dump.cpp
    src/nfa/nfa_dump_api.h
    src/nfa/nfa_dump_dispatch.cpp
    src/nfa/nfa_dump_internal.cpp
    src/nfa/nfa_dump_internal.h
    src/nfa/shengdump.cpp
    src/nfa/shengdump.h
    src/nfa/tamarama_dump.cpp
    src/nfa/tamarama_dump.h
    src/parser/dump.cpp
    src/parser/dump.h
    src/parser/position_dump.h
    src/smallwrite/smallwrite_dump.cpp
    src/smallwrite/smallwrite_dump.h
    src/som/slot_manager_dump.cpp
    src/som/slot_manager_dump.h
    src/rose/rose_build_dump.cpp
    src/rose/rose_build_dump.h
    src/rose/rose_in_dump.cpp
    src/rose/rose_in_dump.h
    src/util/dump_charclass.cpp
    src/util/dump_charclass.h
    src/util/dump_util.cpp
    src/util/dump_util.h
)

if (DUMP_SUPPORT)
    set(hs_compile_SRCS ${hs_compile_SRCS} ${hs_dump_SRCS})
endif()

# we group things by sublibraries, specifying shared and static and then
# choose which ones to build

set (LIB_VERSION ${HS_VERSION})
set (LIB_SOVERSION ${HS_MAJOR_VERSION})

if (NOT FAT_RUNTIME)

    set(hs_exec_SRCS ${hs_exec_SRCS} ${hs_exec_common_SRCS})

    if (ARCH_IA32 OR ARCH_X86_64)
        if (BUILD_AVX2)
            set(hs_exec_SRCS ${hs_exec_SRCS} ${hs_exec_avx2_SRCS})
        endif()
    elseif (ARCH_AARCH64)
        if (BUILD_SVE2)
            set(hs_exec_SRCS ${hs_exec_SRCS} ${hs_exec_sve2_SRCS})
        elseif (BUILD_SVE)
            set(hs_exec_SRCS ${hs_exec_SRCS} ${hs_exec_sve_SRCS})
        else()
            set(hs_exec_SRCS ${hs_exec_SRCS} ${hs_exec_neon_SRCS})
        endif()
    endif()

    if (BUILD_STATIC_LIBS)
        add_library(hs_exec OBJECT ${hs_exec_SRCS})

        add_library(hs_runtime STATIC src/hs_version.c src/hs_valid_platform.c $)
        set_target_properties(hs_runtime PROPERTIES LINKER_LANGUAGE C)

        add_library(hs_compile OBJECT ${hs_compile_SRCS})
        add_library(hs STATIC src/hs_version.c src/hs_valid_platform.c $ $)
    endif (BUILD_STATIC_LIBS)

    if (BUILD_SHARED_LIBS)
        add_library(hs_exec_shared OBJECT ${hs_exec_SRCS})
        set_target_properties(hs_exec_shared PROPERTIES POSITION_INDEPENDENT_CODE TRUE)
        add_library(hs_compile_shared OBJECT ${hs_compile_SRCS})
        set_target_properties(hs_compile_shared PROPERTIES POSITION_INDEPENDENT_CODE TRUE)
    endif()

else ()
    if (ARCH_IA32 OR ARCH_X86_64)
        set(BUILD_WRAPPER "${PROJECT_SOURCE_DIR}/cmake/build_wrapper.sh")
        if (NOT BUILD_AVX512)
            set (DISPATCHER_DEFINE "-DDISABLE_AVX512_DISPATCH")
        endif (NOT BUILD_AVX512)
        if (NOT BUILD_AVX512VBMI)
            set (DISPATCHER_DEFINE "${DISPATCHER_DEFINE} -DDISABLE_AVX512VBMI_DISPATCH")
        endif (NOT BUILD_AVX512VBMI)
        set_source_files_properties(src/dispatcher.c PROPERTIES
            COMPILE_FLAGS "-Wno-unused-parameter -Wno-unused-function ${DISPATCHER_DEFINE}")

        if (BUILD_STATIC_LIBS)
            add_library(hs_exec_core2 OBJECT ${hs_exec_SRCS})
            list(APPEND RUNTIME_LIBS $)
            set_target_properties(hs_exec_core2 PROPERTIES
                COMPILE_FLAGS "-march=core2 -msse4.2"
                RULE_LAUNCH_COMPILE "${BUILD_WRAPPER} core2 ${CMAKE_MODULE_PATH}/keep.syms.in"
            )

            add_library(hs_exec_corei7 OBJECT ${hs_exec_SRCS})
            list(APPEND RUNTIME_LIBS $)
            set_target_properties(hs_exec_corei7 PROPERTIES
                COMPILE_FLAGS "-march=corei7 -msse4.2"
                RULE_LAUNCH_COMPILE "${BUILD_WRAPPER} corei7 ${CMAKE_MODULE_PATH}/keep.syms.in"
            )

            if (BUILD_AVX2)
                add_library(hs_exec_avx2 OBJECT ${hs_exec_SRCS} ${hs_exec_avx2_SRCS})
                list(APPEND RUNTIME_LIBS $)
                set_target_properties(hs_exec_avx2 PROPERTIES
                    COMPILE_FLAGS "-march=core-avx2 -mavx2"
                    RULE_LAUNCH_COMPILE "${BUILD_WRAPPER} avx2 ${CMAKE_MODULE_PATH}/keep.syms.in"
                )
            endif (BUILD_AVX2)
            if (BUILD_AVX512)
                add_library(hs_exec_avx512 OBJECT ${hs_exec_SRCS} ${hs_exec_avx2_SRCS})
                list(APPEND RUNTIME_LIBS $)
                set_target_properties(hs_exec_avx512 PROPERTIES
                    COMPILE_FLAGS "${SKYLAKE_FLAG}"
                    RULE_LAUNCH_COMPILE "${BUILD_WRAPPER} avx512 ${CMAKE_MODULE_PATH}/keep.syms.in"
                )
            endif (BUILD_AVX512)
            if (BUILD_AVX512VBMI)
                add_library(hs_exec_avx512vbmi OBJECT ${hs_exec_SRCS} ${hs_exec_avx2_SRCS})
                list(APPEND RUNTIME_LIBS $)
                set_target_properties(hs_exec_avx512vbmi PROPERTIES
                    COMPILE_FLAGS "${ICELAKE_FLAG}"
                    RULE_LAUNCH_COMPILE "${BUILD_WRAPPER} avx512vbmi ${CMAKE_MODULE_PATH}/keep.syms.in"
                )
            endif (BUILD_AVX512VBMI)

            add_library(hs_exec_common OBJECT
                ${hs_exec_common_SRCS}
                src/dispatcher.c
            )

            # hs_version.c is added explicitly to avoid some build systems that refuse to
            # create a lib without any src (I'm looking at you Xcode)

            add_library(hs_runtime STATIC src/hs_version.c $ ${RUNTIME_LIBS})
            set_target_properties(hs_runtime PROPERTIES LINKER_LANGUAGE C)
            add_library(hs_compile OBJECT ${hs_compile_SRCS})

            # we want the static lib for testing
            add_library(hs STATIC src/hs_version.c src/hs_valid_platform.c $ $ ${RUNTIME_LIBS})

        endif (BUILD_STATIC_LIBS)

        if (BUILD_SHARED_LIBS)
            # build shared libs
            add_library(hs_compile_shared OBJECT ${hs_compile_SRCS})
            set_target_properties(hs_compile_shared PROPERTIES POSITION_INDEPENDENT_CODE TRUE)
            add_library(hs_exec_shared_core2 OBJECT ${hs_exec_SRCS})
            list(APPEND RUNTIME_SHLIBS $)
            set_target_properties(hs_exec_shared_core2 PROPERTIES
                COMPILE_FLAGS "-march=core2 -msse4.2"
                POSITION_INDEPENDENT_CODE TRUE
                RULE_LAUNCH_COMPILE "${BUILD_WRAPPER} core2 ${CMAKE_MODULE_PATH}/keep.syms.in"
            )
            add_library(hs_exec_shared_corei7 OBJECT ${hs_exec_SRCS})
            list(APPEND RUNTIME_SHLIBS $)
            set_target_properties(hs_exec_shared_corei7 PROPERTIES
                COMPILE_FLAGS "-march=corei7 -msse4.2"
                POSITION_INDEPENDENT_CODE TRUE
                RULE_LAUNCH_COMPILE "${BUILD_WRAPPER} corei7 ${CMAKE_MODULE_PATH}/keep.syms.in"
            )

            if (BUILD_AVX2)
                add_library(hs_exec_shared_avx2 OBJECT ${hs_exec_SRCS} ${hs_exec_avx2_SRCS})
                list(APPEND RUNTIME_SHLIBS $)
                set_target_properties(hs_exec_shared_avx2 PROPERTIES
                    COMPILE_FLAGS "-march=core-avx2 -mavx2"
                    POSITION_INDEPENDENT_CODE TRUE
                    RULE_LAUNCH_COMPILE "${BUILD_WRAPPER} avx2 ${CMAKE_MODULE_PATH}/keep.syms.in"
                )
            endif (BUILD_AVX2)
            if (BUILD_AVX512)
                add_library(hs_exec_shared_avx512 OBJECT ${hs_exec_SRCS} ${hs_exec_avx2_SRCS})
                list(APPEND RUNTIME_SHLIBS $)
                set_target_properties(hs_exec_shared_avx512 PROPERTIES
                    COMPILE_FLAGS "${SKYLAKE_FLAG}"
                    POSITION_INDEPENDENT_CODE TRUE
                    RULE_LAUNCH_COMPILE "${BUILD_WRAPPER} avx512 ${CMAKE_MODULE_PATH}/keep.syms.in"
                )
            endif (BUILD_AVX512)
            if (BUILD_AVX512VBMI)
                add_library(hs_exec_shared_avx512vbmi OBJECT ${hs_exec_SRCS} ${hs_exec_avx2_SRCS})
                list(APPEND RUNTIME_SHLIBS $)
                set_target_properties(hs_exec_shared_avx512vbmi PROPERTIES
                    COMPILE_FLAGS "${ICELAKE_FLAG}"
                    POSITION_INDEPENDENT_CODE TRUE
                    RULE_LAUNCH_COMPILE "${BUILD_WRAPPER} avx512vbmi ${CMAKE_MODULE_PATH}/keep.syms.in"
                )
            endif (BUILD_AVX512VBMI)
            add_library(hs_exec_common_shared OBJECT
                ${hs_exec_common_SRCS}
                src/dispatcher.c
            )
            set_target_properties(hs_exec_common_shared PROPERTIES
                OUTPUT_NAME hs_exec_common
                POSITION_INDEPENDENT_CODE TRUE)

        endif() # SHARED
    endif (ARCH_IA32 OR ARCH_X86_64)
    if (ARCH_AARCH64)
        set(BUILD_WRAPPER "${PROJECT_SOURCE_DIR}/cmake/build_wrapper.sh")
        if (BUILD_STATIC_LIBS)
            add_library(hs_exec_neon OBJECT ${hs_exec_SRCS} ${hs_exec_neon_SRCS})
            list(APPEND RUNTIME_LIBS $)
            set_target_properties(hs_exec_neon PROPERTIES
                COMPILE_FLAGS "-march=${ARMV8_ARCH}"
                RULE_LAUNCH_COMPILE "${BUILD_WRAPPER} neon ${CMAKE_MODULE_PATH}/keep.syms.in"
            )
            add_library(hs_exec_sve OBJECT ${hs_exec_SRCS} ${hs_exec_sve_SRCS})
            list(APPEND RUNTIME_LIBS $)
            set_target_properties(hs_exec_sve PROPERTIES
                COMPILE_FLAGS "-march=${SVE_ARCH}"
                RULE_LAUNCH_COMPILE "${BUILD_WRAPPER} sve ${CMAKE_MODULE_PATH}/keep.syms.in"
            )
            add_library(hs_exec_sve2 OBJECT ${hs_exec_SRCS} ${hs_exec_sve2_SRCS})
            list(APPEND RUNTIME_LIBS $)
            set_target_properties(hs_exec_sve2 PROPERTIES
                COMPILE_FLAGS "-march=${SVE2_BITPERM_ARCH}"
                RULE_LAUNCH_COMPILE "${BUILD_WRAPPER} sve2 ${CMAKE_MODULE_PATH}/keep.syms.in"
            )

            add_library(hs_exec_common OBJECT
                ${hs_exec_common_SRCS}
                src/dispatcher.c
            )

            # hs_version.c is added explicitly to avoid some build systems that refuse to
            # create a lib without any src (I'm looking at you Xcode)

            add_library(hs_runtime STATIC src/hs_version.c $ ${RUNTIME_LIBS})
            set_target_properties(hs_runtime PROPERTIES LINKER_LANGUAGE C)
            add_library(hs_compile OBJECT ${hs_compile_SRCS})

            # we want the static lib for testing
            add_library(hs STATIC src/hs_version.c src/hs_valid_platform.c $ $ ${RUNTIME_LIBS})
        endif (BUILD_STATIC_LIBS)

        if (BUILD_SHARED_LIBS)
            # build shared libs
            add_library(hs_compile_shared OBJECT ${hs_compile_SRCS})
            set_target_properties(hs_compile_shared PROPERTIES POSITION_INDEPENDENT_CODE TRUE)
            add_library(hs_exec_shared_neon OBJECT ${hs_exec_SRCS} ${hs_exec_neon_SRCS})
            list(APPEND RUNTIME_SHLIBS $)
            set_target_properties(hs_exec_shared_neon PROPERTIES
                COMPILE_FLAGS "-march=${ARMV8_ARCH}"
                POSITION_INDEPENDENT_CODE TRUE
                RULE_LAUNCH_COMPILE "${BUILD_WRAPPER} neon ${CMAKE_MODULE_PATH}/keep.syms.in"
            )
            add_library(hs_exec_shared_sve OBJECT ${hs_exec_SRCS} ${hs_exec_sve_SRCS})
            list(APPEND RUNTIME_SHLIBS $)
            set_target_properties(hs_exec_shared_sve PROPERTIES
                COMPILE_FLAGS "-march=${SVE_ARCH}"
                POSITION_INDEPENDENT_CODE TRUE
                RULE_LAUNCH_COMPILE "${BUILD_WRAPPER} sve ${CMAKE_MODULE_PATH}/keep.syms.in"
            )
            add_library(hs_exec_shared_sve2 OBJECT ${hs_exec_SRCS} ${hs_exec_sve2_SRCS})
            list(APPEND RUNTIME_SHLIBS $)
            set_target_properties(hs_exec_shared_sve2 PROPERTIES
                COMPILE_FLAGS "-march=${SVE2_BITPERM_ARCH}"
                POSITION_INDEPENDENT_CODE TRUE
                RULE_LAUNCH_COMPILE "${BUILD_WRAPPER} sve2 ${CMAKE_MODULE_PATH}/keep.syms.in"
            )
            add_library(hs_exec_common_shared OBJECT
                ${hs_exec_common_SRCS}
                src/dispatcher.c
            )
            set_target_properties(hs_exec_common_shared PROPERTIES
                OUTPUT_NAME hs_exec_common
                POSITION_INDEPENDENT_CODE TRUE)

        endif() # SHARED
    endif (ARCH_AARCH64)
endif (NOT FAT_RUNTIME)

if (BUILD_STATIC_LIBS)
    install(TARGETS hs_runtime DESTINATION ${CMAKE_INSTALL_LIBDIR})
endif()

if (BUILD_SHARED_LIBS)
    if (NOT FAT_RUNTIME)
        add_library(hs_runtime_shared SHARED src/hs_version.c src/hs_valid_platform.c $ hs_runtime.def)
    else()
        add_library(hs_runtime_shared SHARED src/hs_version.c src/hs_valid_platform.c $ ${RUNTIME_SHLIBS} hs_runtime.def)
    endif()
    set_target_properties(hs_runtime_shared PROPERTIES
        VERSION ${LIB_VERSION}
        SOVERSION ${LIB_SOVERSION}
        OUTPUT_NAME hs_runtime
        MACOSX_RPATH ON
        LINKER_LANGUAGE C)
    install(TARGETS hs_runtime_shared
        RUNTIME DESTINATION ${CMAKE_INSTALL_BINDIR}
        ARCHIVE DESTINATION ${CMAKE_INSTALL_LIBDIR}
        LIBRARY DESTINATION ${CMAKE_INSTALL_LIBDIR})
endif()

if (BUILD_STATIC_LIBS)
    add_dependencies(hs ragel_Parser)
endif ()

if (BUILD_STATIC_LIBS)
    install(TARGETS hs DESTINATION ${CMAKE_INSTALL_LIBDIR})
endif()

if (BUILD_SHARED_LIBS)
    set(hs_shared_SRCS src/hs_version.c src/hs_valid_platform.c $)
    if (NOT FAT_RUNTIME)
        set(hs_shared_SRCS ${hs_shared_SRCS} $)
    else ()
        set(hs_shared_SRCS ${hs_shared_SRCS} $ ${RUNTIME_SHLIBS})
    endif ()
    add_library(hs_shared SHARED ${hs_shared_SRCS} hs.def)
    add_dependencies(hs_shared ragel_Parser)
    set_target_properties(hs_shared PROPERTIES
        OUTPUT_NAME hs
        VERSION ${LIB_VERSION}
        SOVERSION ${LIB_SOVERSION}
        MACOSX_RPATH ON)
    install(TARGETS hs_shared
        RUNTIME DESTINATION ${CMAKE_INSTALL_BINDIR}
        ARCHIVE DESTINATION ${CMAKE_INSTALL_LIBDIR}
        LIBRARY DESTINATION ${CMAKE_INSTALL_LIBDIR})
endif()

# used by tools and other targets
if (NOT BUILD_STATIC_LIBS)
    # use shared lib without having to change all the targets
    add_library(hs ALIAS hs_shared)
endif ()

add_subdirectory(util)
add_subdirectory(unit)

if (EXISTS ${CMAKE_SOURCE_DIR}/tools/CMakeLists.txt)
    add_subdirectory(tools)
endif()
if (EXISTS ${CMAKE_SOURCE_DIR}/chimera/CMakeLists.txt AND BUILD_CHIMERA)
    add_subdirectory(chimera)
endif()

option(BUILD_EXAMPLES "Build Hyperscan example code (default TRUE)" TRUE)
if(BUILD_EXAMPLES)
    add_subdirectory(examples)
endif()

option(BUILD_BENCHMARKS "Build benchmarks (default TRUE)" TRUE)
if(BUILD_BENCHMARKS)
    add_subdirectory(benchmarks)
endif()

add_subdirectory(doc/dev-reference)
vectorscan-5.4.11/COPYING

Copyright (c) 2015, Intel Corporation
Copyright (c) 2019-20, VectorCamp PC

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
* Neither the name of Intel Corporation nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
vectorscan-5.4.11/Contributors-vectorscan.md

394 Konstantinos Margaritis
59 apostolos
25 Hong, Yang A
19 George Wort
16 Chang, Harry
7 Danila Kutenin
7 Wang Xiang W
6 Alex Bondarev
5 Konstantinos Margaritis
3 Duncan Bellamy
2 Azat Khuzhin
2 Jan Henning
1 BigRedEye
1 Daniel Kutenin
1 Danila Kutenin
1 Liu Zixian
1 Mitchell Wasson
1 Piotr Skamruk
1 Robbie Williamson
1 Robert Schulze
1 Walt Stoneburner
1 Zhu,Wenjun
1 hongyang7
1 jplaisance
1 liquidaty

vectorscan-5.4.11/LICENSE

Hyperscan is licensed under the BSD License.

Copyright (c) 2015, Intel Corporation

Vectorscan is licensed under the BSD License.

Copyright (c) 2020, VectorCamp PC
Copyright (c) 2021, Arm Limited

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
* Neither the name of Intel Corporation nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

--------------------------------------------------------------------------------

This product also contains code from third parties, under the following licenses:

Intel's Slicing-by-8 CRC32 implementation
-----------------------------------------

Copyright (c) 2004-2006, Intel Corporation
All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Boost C++ Headers Library
-------------------------

Boost Software License - Version 1.0 - August 17th, 2003

Permission is hereby granted, free of charge, to any person or organization obtaining a copy of the software and accompanying documentation covered by this license (the "Software") to use, reproduce, display, distribute, execute, and transmit the Software, and to prepare derivative works of the Software, and to permit third-parties to whom the Software is furnished to do so, all subject to the following:

The copyright notices in the Software and this entire statement, including the above license grant, this restriction and the following disclaimer, must be included in all copies of the Software, in whole or in part, and all derivative works of the Software, unless such copies or derivative works are solely in the form of machine-executable object code generated by a source language processor.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT. IN NO EVENT SHALL THE COPYRIGHT HOLDERS OR ANYONE DISTRIBUTING THE SOFTWARE BE LIABLE FOR ANY DAMAGES OR OTHER LIABILITY, WHETHER IN CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
The Google C++ Testing Framework (Google Test) ---------------------------------------------- Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. * Neither the name of Google Inc. nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. vectorscan-5.4.11/README.md000066400000000000000000000166541452711272000152450ustar00rootroot00000000000000
# About Vectorscan

A fork of Intel's Hyperscan, modified to run on more platforms. Currently ARM NEON/ASIMD is 100% functional, and Power VSX support is in development. ARM SVE2 support is ongoing, now that we have access to hardware. More platforms will follow in the future.
Vectorscan will follow Intel's API and internal algorithms where possible, but will not hesitate to make code changes where they are expected to give better performance or better portability. In addition, the code will be gradually simplified and made more uniform, and all architecture-specific (currently Intel) `#ifdef`s will be removed and abstracted away.

# Why was there a need for a fork?

Originally, the ARM porting was intended to be merged into Intel's own Hyperscan, and relevant Pull Requests were made to the project for this reason. Unfortunately, the PRs were rejected for now and the foreseeable future, so we created Vectorscan for our own multi-architecture and open-source collaborative needs.

The recent license change of Hyperscan makes Vectorscan even more relevant for the FLOSS ecosystem.

# What is Vectorscan/Hyperscan?

Hyperscan, and by extension Vectorscan, is a high-performance multiple regex matching library. It follows the regular expression syntax of the commonly used libpcre library, but is a standalone library with its own C API.

Hyperscan/Vectorscan uses hybrid automata techniques to allow simultaneous matching of large numbers (up to tens of thousands) of regular expressions, and the matching of regular expressions across streams of data.

Vectorscan is typically used in a DPI library stack, just like Hyperscan.

# License

Vectorscan follows a BSD License like the original Hyperscan (up to 5.4).

Vectorscan continues to be an open source project and we are committed to keeping it that way. See the LICENSE file in the project repository.
## Hyperscan License Change after 5.4

According to [Accelerate Snort Performance with Hyperscan and Intel Xeon Processors on Public Clouds](https://networkbuilders.intel.com/docs/networkbuilders/accelerate-snort-performance-with-hyperscan-and-intel-xeon-processors-on-public-clouds-1680176363.pdf), versions of Hyperscan later than 5.4 are going to be closed-source:

> The latest open-source version (BSD-3 license) of Hyperscan on Github is 5.4. Intel conducts continuous internal
> development and delivers new Hyperscan releases under Intel Proprietary License (IPL) beginning from 5.5 for interested
> customers. Please contact authors to learn more about getting new Hyperscan releases.

# Versioning

The `master` branch on Github will always contain the most recent stable release of Vectorscan. Each version released to `master` goes through QA and testing before it is released; if you're a user, rather than a developer, this is the version you should be using.

Further development towards the next release takes place on the `develop` branch. All PRs are first made against the `develop` branch, and if they pass the [Vectorscan CI](https://buildbot-ci.vectorcamp.gr/#/grid), they get merged. The same applies to PRs from `develop` to `master`.

# Compatibility with Hyperscan

Vectorscan aims to be ABI and API compatible with the last open source version of Intel Hyperscan, 5.4. After careful consideration we decided that we will **NOT** aim to achieve compatibility with later Hyperscan versions 5.5/5.6, which have extended Hyperscan's API. If you need to keep up to date with the latest Hyperscan API, you should talk to Intel and obtain a license to use it.

However, we intend to extend Vectorscan's API with user-requested changes, as well as API extensions and improvements that we think are best for the project.
# Installation

## Debian/Ubuntu

On recent Debian/Ubuntu systems, Vectorscan should be directly available for installation:

```
$ sudo apt install libvectorscan5
```

Or, to install the development package, install `libvectorscan-dev`:

```
$ sudo apt install libvectorscan-dev
```

For other distributions/OSes please check the [Wiki](https://github.com/VectorCamp/vectorscan/wiki/Installation-from-package).

# Build Instructions

The build system has recently been refactored to be more modular and easier to extend. For that reason, some small but necessary changes were made that might break compatibility with how Hyperscan was built.

## Install Common Dependencies

### Debian/Ubuntu

In order to build on Debian/Ubuntu, make sure you install the following build dependencies:

```
$ sudo apt install build-essential cmake ragel pkg-config libsqlite3-dev libpcap-dev
```

### Other distributions

TBD

### MacOS X (M1/M2/M3 CPUs only)

Assuming an existing Homebrew installation:

```
% brew install boost cmake gcc libpcap pkg-config ragel sqlite
```

## Configure & build

In order to configure with `cmake`, first create and cd into a build directory:

```
$ mkdir build
$ cd build
```

Then call `cmake` from inside the `build` directory:

```
$ cmake ../
```

Common options for CMake are:

* `-DBUILD_STATIC_LIBS=[On|Off]` Build static libraries
* `-DBUILD_SHARED_LIBS=[On|Off]` Build shared libraries (if neither is set, static libraries are built by default)
* `-DCMAKE_BUILD_TYPE=[Release|Debug|RelWithDebInfo|MinSizeRel]` Configure the build type, which determines optimizations and certain features.
* `-DUSE_CPU_NATIVE=[On|Off]` Native CPU detection is off by default; however, it is possible to build a performance-oriented non-fat library tuned to your CPU.
* `-DFAT_RUNTIME=[On|Off]` Fat Runtime is only available for X86 32-bit/64-bit and AArch64 architectures and only on Linux. It is incompatible with the `Debug` build type and `USE_CPU_NATIVE`.
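The interplay between these common options can be summarised as a small predicate. The C sketch below is purely illustrative and hypothetical: `struct build_opts`, `apply_library_default()`, `fat_runtime_allowed()` and the architecture label strings are not part of Vectorscan, and the authoritative checks live in the project's CMake files. It encodes only the rules stated above: static libraries are the default when neither library option is set, and `FAT_RUNTIME` requires an x86/x86-64 or AArch64 Linux target and cannot be combined with a `Debug` build or `USE_CPU_NATIVE`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the documented CMake option rules.
 * NOT part of Vectorscan; the real logic is in its CMake files. */
struct build_opts {
    bool build_static_libs;  /* -DBUILD_STATIC_LIBS */
    bool build_shared_libs;  /* -DBUILD_SHARED_LIBS */
    bool use_cpu_native;     /* -DUSE_CPU_NATIVE */
    bool fat_runtime;        /* -DFAT_RUNTIME */
    const char *build_type;  /* -DCMAKE_BUILD_TYPE, e.g. "Release" */
    const char *arch;        /* illustrative labels: "x86", "x86_64", "aarch64" */
    bool is_linux;           /* target OS is Linux */
};

/* If neither library kind was requested, static libraries are built. */
static void apply_library_default(struct build_opts *o) {
    assert(o != NULL);
    if (!o->build_static_libs && !o->build_shared_libs) {
        o->build_static_libs = true;
    }
}

/* Returns true when the FAT_RUNTIME setting is consistent with the
 * documented restrictions (always true when FAT_RUNTIME is off). */
static bool fat_runtime_allowed(const struct build_opts *o) {
    assert(o != NULL);
    if (!o->fat_runtime) {
        return true; /* nothing to check */
    }
    bool arch_ok = strcmp(o->arch, "x86") == 0 ||
                   strcmp(o->arch, "x86_64") == 0 ||
                   strcmp(o->arch, "aarch64") == 0;
    return arch_ok && o->is_linux &&
           strcmp(o->build_type, "Debug") != 0 &&
           !o->use_cpu_native;
}
```

Under this model, `-DFAT_RUNTIME=On -DCMAKE_BUILD_TYPE=Release` for an x86-64 Linux target passes the check, while adding `-DUSE_CPU_NATIVE=On` or switching to `-DCMAKE_BUILD_TYPE=Debug` makes it fail.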
### Specific options for X86 32-bit/64-bit (Intel/AMD) CPUs

* `-DBUILD_AVX2=[On|Off]` Enable code for AVX2.
* `-DBUILD_AVX512=[On|Off]` Enable code for AVX512. Implies `BUILD_AVX2`.
* `-DBUILD_AVX512VBMI=[On|Off]` Enable code for AVX512 with the VBMI extension. Implies `BUILD_AVX512`.

### Specific options for Arm 64-bit CPUs

* `-DBUILD_SVE=[On|Off]` Enable code for SVE, as found on AWS Graviton3 CPUs. Not much code is ported specifically for SVE, but enabling SVE code generation does improve the generated code; see [Benchmarks](https://github.com/VectorCamp/vectorscan/wiki/Benchmarks).
* `-DBUILD_SVE2=[On|Off]` Enable code for SVE2. Implies `BUILD_SVE`. Most non-Neon code is written for SVE2.
* `-DBUILD_SVE2_BITPERM=[On|Off]` Enable code for the SVE2_BITPERM hardware feature. Implies `BUILD_SVE2`.

## Other options

* `-DSANITIZE=[address|memory|undefined]` (experimental) Build with the corresponding compiler sanitizer (`libasan` for `address`) to detect possible bugs. For now only `address` is tested. This will eventually be integrated into the CI.

## Build

If `cmake` has completed successfully, you can run `make` in the same directory. If you have a multi-core system with `N` cores, running

```
$ make -jN
```

will speed up the process. If all goes well, you should have the Vectorscan library compiled.

# Contributions

The official homepage for Vectorscan is at [www.github.com/VectorCamp/vectorscan](https://www.github.com/VectorCamp/vectorscan).

# Vectorscan Development

All development of Vectorscan is done in public.

# Original Hyperscan links

For reference, the official homepage for Hyperscan is at [www.hyperscan.io](https://www.hyperscan.io).

# Hyperscan Documentation

Information on building the Hyperscan library and using its API is available in the [Developer Reference Guide](http://intel.github.io/hyperscan/dev-reference/).

And you can find the source code [on Github](https://github.com/intel/hyperscan).
For Intel Hyperscan related issues and questions, please follow the relevant links there.vectorscan-5.4.11/benchmarks/000077500000000000000000000000001452711272000160675ustar00rootroot00000000000000vectorscan-5.4.11/benchmarks/CMakeLists.txt000066400000000000000000000004151452711272000206270ustar00rootroot00000000000000if (NOT FAT_RUNTIME AND (BUILD_STATIC_AND_SHARED OR BUILD_STATIC_LIBS)) add_executable(benchmarks benchmarks.cpp) set_source_files_properties(benchmarks.cpp PROPERTIES COMPILE_FLAGS "-Wall -Wno-unused-variable") target_link_libraries(benchmarks hs) endif() vectorscan-5.4.11/benchmarks/benchmarks.cpp000066400000000000000000000241111452711272000207070ustar00rootroot00000000000000/* * Copyright (c) 2020, 2021, VectorCamp PC * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions are met: * * * Redistributions of source code must retain the above copyright notice, * this list of conditions and the following disclaimer. * * Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * Neither the name of Intel Corporation nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE * POSSIBILITY OF SUCH DAMAGE. */
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <ctime>
#include <iterator>
#include <string>
#include <vector>
#include "benchmarks.hpp"
#define MAX_LOOPS 1000000000
#define MAX_MATCHES 5
#define N 8
struct hlmMatchEntry { size_t to; u32 id; hlmMatchEntry(size_t end, u32 identifier) : to(end), id(identifier) {} }; std::vector<hlmMatchEntry> ctxt; static hwlmcb_rv_t hlmSimpleCallback(size_t to, u32 id, UNUSED struct hs_scratch *scratch) { DEBUG_PRINTF("match @%zu = %u\n", to, id); ctxt.push_back(hlmMatchEntry(to, id)); return HWLM_CONTINUE_MATCHING; } template <typename InitFunc, typename BenchFunc> static void run_benchmarks(int size, int loops, int max_matches, bool is_reverse, MicroBenchmark &bench, InitFunc &&init, BenchFunc &&func) { init(bench); double total_sec = 0.0; u64a total_size = 0; double bw = 0.0; double avg_bw = 0.0; double max_bw = 0.0; double avg_time = 0.0; if (max_matches) { int pos = 0; for (int j = 0; j < max_matches - 1; j++) { bench.buf[pos] = 'b'; pos = (j + 1) * size / max_matches; bench.buf[pos] = 'a'; u64a actual_size = 0; auto start = std::chrono::steady_clock::now(); for (int i = 0; i < loops; i++) { const u8 *res = func(bench); if (is_reverse) actual_size += bench.buf.data() + size - res; else actual_size += res - bench.buf.data(); } auto end = std::chrono::steady_clock::now(); double dt = std::chrono::duration_cast<std::chrono::microseconds>(end - start).count(); total_sec += dt; /*calculate bandwidth in MB/s (dt is in microseconds)*/ bw = (actual_size / dt) * 1000000.0 / 1048576.0; /*std::cout << "act_size = " << act_size << std::endl;
std::cout << "dt = " << dt << std::endl; std::cout << "bw = " << bw << std::endl;*/ avg_bw += bw; /*track the maximum bandwidth*/ max_bw = std::max(bw, max_bw); /*accumulate the average time per call (microseconds) for this round*/ avg_time += dt / loops; } /*the loop above runs max_matches - 1 rounds*/ int rounds = std::max(max_matches - 1, 1); avg_time /= rounds; avg_bw /= rounds; /*convert microseconds to seconds*/ total_sec /= 1000000.0; printf(KMAG "%s: %d matches, %d * %d iterations," KBLU " total elapsed time =" RST " %.3f s, " KBLU "average time per call =" RST " %.3f μs," KBLU " max bandwidth = " RST " %.3f MB/s," KBLU " average bandwidth =" RST " %.3f MB/s \n", bench.label, max_matches, size, loops, total_sec, avg_time, max_bw, avg_bw); } else { auto start = std::chrono::steady_clock::now(); for (int i = 0; i < loops; i++) { const u8 *res = func(bench); } auto end = std::chrono::steady_clock::now(); total_sec += std::chrono::duration_cast<std::chrono::microseconds>(end - start).count(); /*calculate transferred size*/ total_size = (u64a)size * loops; /*calculate average time*/ avg_time = total_sec / loops; /*convert microseconds to seconds*/ total_sec /= 1000000.0; /*calculate maximum bandwidth*/ max_bw = total_size / total_sec; /*convert to MB/s*/ max_bw /= 1048576.0; printf(KMAG "%s: no matches, %d * %d iterations," KBLU " total elapsed time =" RST " %.3f s, " KBLU "average time per call =" RST " %.3f μs ," KBLU " bandwidth = " RST " %.3f MB/s \n", bench.label, size, loops, total_sec, avg_time, max_bw); } } int main() { int matches[] = {0, MAX_MATCHES}; std::vector<size_t> sizes; for (size_t i = 0; i < N; i++) sizes.push_back(16000 << i * 2); const char charset[] = "aAaAaAaAAAaaaaAAAAaaaaAAAAAAaaaAAaaa"; for (int m = 0; m < 2; m++) { for (size_t i = 0; i < std::size(sizes); i++) { MicroBenchmark bench("Shufti", sizes[i]); run_benchmarks(sizes[i], MAX_LOOPS / sizes[i], matches[m], false, bench, [&](MicroBenchmark &b) { b.chars.set('a'); ue2::shuftiBuildMasks(b.chars, (u8 *)&b.lo, (u8 *)&b.hi); memset(b.buf.data(), 'b', b.size); }, [&](MicroBenchmark &b) { return shuftiExec(b.lo, b.hi, b.buf.data(), b.buf.data() + b.size); } ); }
for (size_t i = 0; i < std::size(sizes); i++) { MicroBenchmark bench("Reverse Shufti", sizes[i]); run_benchmarks(sizes[i], MAX_LOOPS / sizes[i], matches[m], true, bench, [&](MicroBenchmark &b) { b.chars.set('a'); ue2::shuftiBuildMasks(b.chars, (u8 *)&b.lo, (u8 *)&b.hi); memset(b.buf.data(), 'b', b.size); }, [&](MicroBenchmark &b) { return rshuftiExec(b.lo, b.hi, b.buf.data(), b.buf.data() + b.size); } ); } for (size_t i = 0; i < std::size(sizes); i++) { MicroBenchmark bench("Truffle", sizes[i]); run_benchmarks(sizes[i], MAX_LOOPS / sizes[i], matches[m], false, bench, [&](MicroBenchmark &b) { b.chars.set('a'); ue2::truffleBuildMasks(b.chars, (u8 *)&b.lo, (u8 *)&b.hi); memset(b.buf.data(), 'b', b.size); }, [&](MicroBenchmark &b) { return truffleExec(b.lo, b.hi, b.buf.data(), b.buf.data() + b.size); } ); } for (size_t i = 0; i < std::size(sizes); i++) { MicroBenchmark bench("Reverse Truffle", sizes[i]); run_benchmarks(sizes[i], MAX_LOOPS / sizes[i], matches[m], true, bench, [&](MicroBenchmark &b) { b.chars.set('a'); ue2::truffleBuildMasks(b.chars, (u8 *)&b.lo, (u8 *)&b.hi); memset(b.buf.data(), 'b', b.size); }, [&](MicroBenchmark &b) { return rtruffleExec(b.lo, b.hi, b.buf.data(), b.buf.data() + b.size); } ); } for (size_t i = 0; i < std::size(sizes); i++) { MicroBenchmark bench("Vermicelli", sizes[i]); run_benchmarks(sizes[i], MAX_LOOPS / sizes[i], matches[m], false, bench, [&](MicroBenchmark &b) { b.chars.set('a'); ue2::truffleBuildMasks(b.chars, (u8 *)&b.lo, (u8 *)&b.hi); memset(b.buf.data(), 'b', b.size); }, [&](MicroBenchmark &b) { return vermicelliExec('a', 'b', b.buf.data(), b.buf.data() + b.size); } ); } for (size_t i = 0; i < std::size(sizes); i++) { MicroBenchmark bench("Reverse Vermicelli", sizes[i]); run_benchmarks(sizes[i], MAX_LOOPS / sizes[i], matches[m], true, bench, [&](MicroBenchmark &b) { b.chars.set('a'); ue2::truffleBuildMasks(b.chars, (u8 *)&b.lo, (u8 *)&b.hi); memset(b.buf.data(), 'b', b.size); }, [&](MicroBenchmark &b) { return 
rvermicelliExec('a', 'b', b.buf.data(), b.buf.data() + b.size); } ); } for (size_t i = 0; i < std::size(sizes); i++) {
// we imitate the noodle unit tests
std::string str; const size_t char_len = 5; str.resize(char_len + 1); for (size_t j = 0; j < char_len; j++) { srand(time(NULL)); int key = rand() % 36; str[char_len] = charset[key]; str[char_len + 1] = '\0'; } MicroBenchmark bench("Noodle", sizes[i]); run_benchmarks(sizes[i], MAX_LOOPS / sizes[i], matches[m], false, bench, [&](MicroBenchmark &b) { ctxt.clear(); memset(b.buf.data(), 'a', b.size); u32 id = 1000; ue2::hwlmLiteral lit(str, true, id); b.nt = ue2::noodBuildTable(lit); assert(b.nt != nullptr); }, [&](MicroBenchmark &b) { noodExec(b.nt.get(), b.buf.data(), b.size, 0, hlmSimpleCallback, &b.scratch); return b.buf.data() + b.size; } ); } } return 0; } vectorscan-5.4.11/benchmarks/benchmarks.hpp000066400000000000000000000046201452711272000207170ustar00rootroot00000000000000/* * Copyright (c) 2020, 2021, VectorCamp PC * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions are met: * * * Redistributions of source code must retain the above copyright notice, * this list of conditions and the following disclaimer. * * Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * Neither the name of Intel Corporation nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED.
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE * POSSIBILITY OF SUCH DAMAGE. */
#include <vector>
#include "nfa/shufti.h"
#include "nfa/shufticompile.h"
#include "nfa/truffle.h"
#include "nfa/trufflecompile.h"
#include "nfa/vermicelli.hpp"
#include "hwlm/noodle_build.h"
#include "hwlm/noodle_engine.h"
#include "hwlm/noodle_internal.h"
#include "hwlm/hwlm_literal.h"
#include "util/bytecode_ptr.h"
#include "scratch.h"
/*define colour control characters*/
#define RST "\x1B[0m"
#define KRED "\x1B[31m"
#define KGRN "\x1B[32m"
#define KYEL "\x1B[33m"
#define KBLU "\x1B[34m"
#define KMAG "\x1B[35m"
#define KCYN "\x1B[36m"
#define KWHT "\x1B[37m"
class MicroBenchmark { public: char const *label; size_t size;
// Shufti/Truffle
m128 lo, hi; ue2::CharReach chars; std::vector<u8> buf;
// Noodle
struct hs_scratch scratch; ue2::bytecode_ptr<noodTable> nt; MicroBenchmark(char const *label_, size_t size_) :label(label_), size(size_), buf(size_) { }; }; vectorscan-5.4.11/chimera/000077500000000000000000000000001452711272000153625ustar00rootroot00000000000000vectorscan-5.4.11/chimera/CMakeLists.txt000066400000000000000000000023651452711272000201300ustar00rootroot00000000000000
# Chimera lib
include_directories(${PCRE_INCLUDE_DIRS})
# only set these after all tests are done
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${EXTRA_C_FLAGS}") set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${EXTRA_CXX_FLAGS}") SET(chimera_HEADERS ch.h ch_common.h ch_compile.h ch_runtime.h ) install(FILES ${chimera_HEADERS} DESTINATION "${CMAKE_INSTALL_INCLUDEDIR}/hs") SET(chimera_SRCS ${chimera_HEADERS} ch_alloc.c
ch_alloc.h ch_compile.cpp ch_database.c ch_database.h ch_internal.h ch_runtime.c ch_scratch.h ch_scratch.c ) add_library(chimera STATIC ${chimera_SRCS}) add_dependencies(chimera hs pcre) target_link_libraries(chimera hs pcre) install(TARGETS chimera DESTINATION ${CMAKE_INSTALL_LIBDIR})
# expand out library names for pkgconfig static link info
foreach (LIB ${CMAKE_CXX_IMPLICIT_LINK_LIBRARIES})
# this is fragile, but protects us from toolchain specific files
if (NOT EXISTS ${LIB}) set(PRIVATE_LIBS "${PRIVATE_LIBS} -l${LIB}") endif() endforeach() set(PRIVATE_LIBS "${PRIVATE_LIBS} -L${LIBDIR} -lpcre") configure_file(libch.pc.in libch.pc @ONLY) # only replace @ quoted vars
install(FILES ${CMAKE_BINARY_DIR}/chimera/libch.pc DESTINATION "${CMAKE_INSTALL_LIBDIR}/pkgconfig")vectorscan-5.4.11/chimera/ch.h000066400000000000000000000035701452711272000161320ustar00rootroot00000000000000/* * Copyright (c) 2018, Intel Corporation * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions are met: * * * Redistributions of source code must retain the above copyright notice, * this list of conditions and the following disclaimer. * * Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * Neither the name of Intel Corporation nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED.
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE * POSSIBILITY OF SUCH DAMAGE. */
#ifndef CH_H_
#define CH_H_
/** * @file * @brief The complete Chimera API definition. * * Chimera is a hybrid solution of Hyperscan and PCRE. * * This header includes both the Chimera compiler and runtime components. See * the individual component headers for documentation. */
#include "ch_compile.h"
#include "ch_runtime.h"
#endif /* CH_H_ */
vectorscan-5.4.11/chimera/ch_alloc.c000066400000000000000000000072551452711272000173030ustar00rootroot00000000000000/* * Copyright (c) 2018, Intel Corporation * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions are met: * * * Redistributions of source code must retain the above copyright notice, * this list of conditions and the following disclaimer. * * Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * Neither the name of Intel Corporation nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED.
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE * POSSIBILITY OF SUCH DAMAGE. */ /** \file * \brief Runtime functions for setting custom allocators. */
#include "ch.h"
#include "ch_common.h"
#include "ch_internal.h"
#include "hs.h"
#include "ue2common.h"
#define default_malloc malloc
#define default_free free
ch_alloc_t ch_database_alloc = default_malloc; ch_alloc_t ch_misc_alloc = default_malloc; ch_alloc_t ch_scratch_alloc = default_malloc; ch_free_t ch_database_free = default_free; ch_free_t ch_misc_free = default_free; ch_free_t ch_scratch_free = default_free; static ch_alloc_t normalise_alloc(ch_alloc_t a) { if (!a) { return default_malloc; } else { return a; } } static ch_free_t normalise_free(ch_free_t f) { if (!f) { return default_free; } else { return f; } } HS_PUBLIC_API ch_error_t HS_CDECL ch_set_allocator(ch_alloc_t allocfunc, ch_free_t freefunc) { ch_set_database_allocator(allocfunc, freefunc); ch_set_misc_allocator(allocfunc, freefunc); ch_set_scratch_allocator(allocfunc, freefunc);
// Set core Hyperscan alloc/free.
hs_error_t ret = hs_set_allocator(allocfunc, freefunc); return ret; } HS_PUBLIC_API ch_error_t HS_CDECL ch_set_database_allocator(ch_alloc_t allocfunc, ch_free_t freefunc) { ch_database_alloc = normalise_alloc(allocfunc); ch_database_free = normalise_free(freefunc);
// Set Hyperscan database alloc/free.
return hs_set_database_allocator(allocfunc, freefunc); } HS_PUBLIC_API ch_error_t HS_CDECL ch_set_misc_allocator(ch_alloc_t allocfunc, ch_free_t freefunc) { ch_misc_alloc = normalise_alloc(allocfunc); ch_misc_free = normalise_free(freefunc);
// Set Hyperscan misc alloc/free.
return hs_set_misc_allocator(allocfunc, freefunc); } HS_PUBLIC_API ch_error_t HS_CDECL ch_set_scratch_allocator(ch_alloc_t allocfunc, ch_free_t freefunc) { ch_scratch_alloc = normalise_alloc(allocfunc); ch_scratch_free = normalise_free(freefunc);
// Set Hyperscan scratch alloc/free.
return hs_set_scratch_allocator(allocfunc, freefunc); } vectorscan-5.4.11/chimera/ch_alloc.h000066400000000000000000000045321452711272000173030ustar00rootroot00000000000000/* * Copyright (c) 2018, Intel Corporation * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions are met: * * * Redistributions of source code must retain the above copyright notice, * this list of conditions and the following disclaimer. * * Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * Neither the name of Intel Corporation nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED.
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE * POSSIBILITY OF SUCH DAMAGE. */
#ifndef CH_ALLOC_H
#define CH_ALLOC_H
#include "hs_common.h"
#include "ue2common.h"
#include "ch_common.h"
#ifdef __cplusplus
extern "C" {
#endif
extern hs_alloc_t ch_database_alloc; extern hs_alloc_t ch_misc_alloc; extern hs_alloc_t ch_scratch_alloc; extern hs_free_t ch_database_free; extern hs_free_t ch_misc_free; extern hs_free_t ch_scratch_free;
#ifdef __cplusplus
} /* extern C */
#endif
/** \brief Check the results of an alloc done with hs_alloc for alignment. * * If we have incorrect alignment, return an error. Caller should free the * offending block. */ static really_inline ch_error_t ch_check_alloc(const void *mem) { ch_error_t ret = CH_SUCCESS; if (!mem) { ret = CH_NOMEM; } else if (!ISALIGNED_N(mem, alignof(unsigned long long))) { ret = CH_BAD_ALLOC; } return ret; }
#endif
vectorscan-5.4.11/chimera/ch_common.h000066400000000000000000000273631452711272000175100ustar00rootroot00000000000000/* * Copyright (c) 2018-2020, Intel Corporation * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions are met: * * * Redistributions of source code must retain the above copyright notice, * this list of conditions and the following disclaimer. * * Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution.
* * Neither the name of Intel Corporation nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE * POSSIBILITY OF SUCH DAMAGE. */
#ifndef CH_COMMON_H_
#define CH_COMMON_H_
#include "hs_common.h"
#include <stdlib.h>
/** * @file * @brief The Chimera common API definition. * * Chimera is a hybrid of Hyperscan and PCRE. * * This header contains functions available to both the Chimera compiler and * runtime. */
#ifdef __cplusplus
extern "C" {
#endif
struct ch_database; /** * A Chimera pattern database. * * Generated by one of the Chimera compiler functions: * - @ref ch_compile() * - @ref ch_compile_multi() * - @ref ch_compile_ext_multi() */ typedef struct ch_database ch_database_t; /** * A type for errors returned by Chimera functions. */ typedef int ch_error_t; /** * Free a compiled pattern database. * * The free callback set by @ref ch_set_allocator() will be used by this * function. * * @param db * A compiled pattern database. NULL may also be safely provided, in which * case the function does nothing. * * @return * @ref CH_SUCCESS on success, other values on failure.
*/ ch_error_t HS_CDECL ch_free_database(ch_database_t *db); /** * Utility function for identifying this release version. * * @return * A string containing the version number of this release build and the * date of the build. It is allocated statically, so it does not need to * be freed by the caller. */ const char * HS_CDECL ch_version(void); /** * Returns the size of the given database. * * @param database * Pointer to compiled expression database. * * @param database_size * On success, the size of the compiled database in bytes is placed in this * parameter. * * @return * @ref CH_SUCCESS on success, other values on failure. */ ch_error_t HS_CDECL ch_database_size(const ch_database_t *database, size_t *database_size); /** * Utility function providing information about a database. * * @param database * Pointer to a compiled database. * * @param info * On success, a string containing the version and platform information for * the supplied database is placed in the parameter. The string is * allocated using the allocator supplied in @ref hs_set_allocator() * (or malloc() if no allocator was set) and should be freed by the caller. * * @return * @ref CH_SUCCESS on success, other values on failure. */ ch_error_t HS_CDECL ch_database_info(const ch_database_t *database, char **info); /** * The type of the callback function that will be used by Chimera to allocate * more memory at runtime as required. * * If Chimera is to be used in a multi-threaded, or similarly concurrent * environment, the allocation function will need to be re-entrant, or * similarly safe for concurrent use. * * @param size * The number of bytes to allocate. * @return * A pointer to the region of memory allocated, or NULL on error. */ typedef void *(HS_CDECL *ch_alloc_t)(size_t size); /** * The type of the callback function that will be used by Chimera to free * memory regions previously allocated using the @ref ch_alloc_t function. * * @param ptr * The region of memory to be freed. 
*/ typedef void (HS_CDECL *ch_free_t)(void *ptr); /** * Set the allocate and free functions used by Chimera for allocating * memory at runtime for stream state, scratch space, database bytecode, * and various other data structures returned by the Chimera API. * * The function is equivalent to calling @ref ch_set_scratch_allocator(), * @ref ch_set_database_allocator() and * @ref ch_set_misc_allocator() with the provided parameters. * * This call will override any previous allocators that have been set. * * Note: there is no way to change the allocator used for temporary objects * created during the various compile calls (@ref ch_compile() and @ref * ch_compile_multi()). * * @param alloc_func * A callback function pointer that allocates memory. This function must * return memory suitably aligned for the largest representable data type * on this platform. * * @param free_func * A callback function pointer that frees allocated memory. * * @return * @ref CH_SUCCESS on success, other values on failure. */ ch_error_t HS_CDECL ch_set_allocator(ch_alloc_t alloc_func, ch_free_t free_func); /** * Set the allocate and free functions used by Chimera for allocating memory * for database bytecode produced by the compile calls (@ref ch_compile() and @ref * ch_compile_multi()). * * If no database allocation functions are set, or if NULL is used in place of * both parameters, then memory allocation will default to standard methods * (such as the system malloc() and free() calls). * * This call will override any previous database allocators that have been set. * * Note: the database allocator may also be set by calling @ref * ch_set_allocator(). * * Note: there is no way to change how temporary objects created during the * various compile calls (@ref ch_compile() and @ref ch_compile_multi()) are * allocated. * * @param alloc_func * A callback function pointer that allocates memory.
This function must * return memory suitably aligned for the largest representable data type * on this platform. * * @param free_func * A callback function pointer that frees allocated memory. * * @return * @ref CH_SUCCESS on success, other values on failure. */ ch_error_t HS_CDECL ch_set_database_allocator(ch_alloc_t alloc_func, ch_free_t free_func); /** * Set the allocate and free functions used by Chimera for allocating memory * for items returned by the Chimera API such as @ref ch_compile_error_t. * * If no misc allocation functions are set, or if NULL is used in place of both * parameters, then memory allocation will default to standard methods (such as * the system malloc() and free() calls). * * This call will override any previous misc allocators that have been set. * * Note: the misc allocator may also be set by calling @ref ch_set_allocator(). * * @param alloc_func * A callback function pointer that allocates memory. This function must * return memory suitably aligned for the largest representable data type * on this platform. * * @param free_func * A callback function pointer that frees allocated memory. * * @return * @ref CH_SUCCESS on success, other values on failure. */ ch_error_t HS_CDECL ch_set_misc_allocator(ch_alloc_t alloc_func, ch_free_t free_func); /** * Set the allocate and free functions used by Chimera for allocating memory * for scratch space by @ref ch_alloc_scratch() and @ref ch_clone_scratch(). * * If no scratch allocation functions are set, or if NULL is used in place of * both parameters, then memory allocation will default to standard methods * (such as the system malloc() and free() calls). * * This call will override any previous scratch allocators that have been set. * * Note: the scratch allocator may also be set by calling @ref * ch_set_allocator(). * * @param alloc_func * A callback function pointer that allocates memory.
This function must * return memory suitably aligned for the largest representable data type * on this platform. * * @param free_func * A callback function pointer that frees allocated memory. * * @return * @ref CH_SUCCESS on success, other values on failure. */ ch_error_t HS_CDECL ch_set_scratch_allocator(ch_alloc_t alloc_func, ch_free_t free_func); /** * @defgroup CH_ERROR ch_error_t values * * @{ */ /** * The engine completed normally. */ #define CH_SUCCESS 0 /** * A parameter passed to this function was invalid. */ #define CH_INVALID (-1) /** * A memory allocation failed. */ #define CH_NOMEM (-2) /** * The engine was terminated by callback. * * This return value indicates that the target buffer was partially scanned, * but that the callback function requested that scanning cease after a match * was located. */ #define CH_SCAN_TERMINATED (-3) /** * The pattern compiler failed, and the @ref ch_compile_error_t should be * inspected for more detail. */ #define CH_COMPILER_ERROR (-4) /** * The given database was built for a different version of the Chimera matcher. */ #define CH_DB_VERSION_ERROR (-5) /** * The given database was built for a different platform (i.e., CPU type). */ #define CH_DB_PLATFORM_ERROR (-6) /** * The given database was built for a different mode of operation. This error * is returned when streaming calls are used with a non-streaming database and * vice versa. */ #define CH_DB_MODE_ERROR (-7) /** * A parameter passed to this function was not correctly aligned. */ #define CH_BAD_ALIGN (-8) /** * The memory allocator did not correctly return memory suitably aligned for * the largest representable data type on this platform. */ #define CH_BAD_ALLOC (-9) /** * The scratch region was already in use. * * This error is returned when Chimera is able to detect that the scratch * region given is already in use by another Chimera API call. 
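
Example (an illustrative sketch only; it does not describe how Chimera implements this check): a scratch object with a busy flag that a scan sets on entry and clears on return, so a scan started from inside its own callback with the same scratch is refused. The ex_ names are hypothetical; only the values -10 (CH_SCRATCH_IN_USE) and 0 (CH_SUCCESS) are taken from this header.

```c
#include <stdbool.h>
#include <stddef.h>

// Stand-ins for CH_SUCCESS and CH_SCRATCH_IN_USE, using the same values.
#define EX_SUCCESS 0
#define EX_SCRATCH_IN_USE (-10)

// Hypothetical scratch region: only the busy flag matters for this sketch.
struct ex_scratch {
    bool in_use;
};

typedef int (*ex_callback)(struct ex_scratch *scratch, void *ctx);

// Marks the scratch busy for the whole "scan", including any callback it
// delivers, and refuses to start if the scratch is already busy.
static int ex_scan(struct ex_scratch *scratch, ex_callback cb, void *ctx) {
    if (scratch->in_use) {
        return EX_SCRATCH_IN_USE;
    }
    scratch->in_use = true;
    int rv = cb ? cb(scratch, ctx) : EX_SUCCESS;
    scratch->in_use = false;
    return rv;
}

// A callback that (incorrectly) re-enters the scan with the same scratch and
// records what the nested call returned.
static int ex_reentrant_cb(struct ex_scratch *scratch, void *ctx) {
    int *inner_result = ctx;
    *inner_result = ex_scan(scratch, NULL, NULL);
    return EX_SUCCESS;
}
```

A plain bool like this only catches re-entry on the same thread; two racing threads could both observe the flag as clear, which is one reason detection of this kind can only be best-effort.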
* * A separate scratch region, allocated with @ref ch_alloc_scratch() or @ref * ch_clone_scratch(), is required for every concurrent caller of the Chimera * API. * * For example, this error might be returned when @ref ch_scan() has been * called inside a callback delivered by a currently-executing @ref ch_scan() * call using the same scratch region. * * Note: Not all concurrent uses of scratch regions may be detected. This error * is intended as a best-effort debugging tool, not a guarantee. */ #define CH_SCRATCH_IN_USE (-10) /** * Unexpected internal error from Hyperscan. * * This error indicates that Hyperscan exhibited unexpected matching behavior. * This could be caused by invalid use of scratch space or by invalid memory * operations in user code. * */ #define CH_UNKNOWN_HS_ERROR (-13) /** * Returned when pcre_exec (called internally from @ref ch_scan() for some * expressions) failed due to a fatal error. */ #define CH_FAIL_INTERNAL (-32) /** @} */ #ifdef __cplusplus } /* extern "C" */ #endif #endif /* CH_COMMON_H_ */ vectorscan-5.4.11/chimera/ch_compile.cpp000066400000000000000000000726071452711272000202040ustar00rootroot00000000000000/* * Copyright (c) 2018, Intel Corporation * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions are met: * * * Redistributions of source code must retain the above copyright notice, * this list of conditions and the following disclaimer. * * Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * Neither the name of Intel Corporation nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission.
* * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE * POSSIBILITY OF SUCH DAMAGE. */ /** \file * \brief Compiler front-end, including public API calls for compilation. */ #include "ch_compile.h" #include "ch_alloc.h" #include "ch_internal.h" #include "ch_database.h" #include "grey.h" #include "hs_common.h" #include "hs_internal.h" #include "ue2common.h" #include "util/compile_error.h" #include "util/multibit_build.h" #include "util/target_info.h" #include #include #include #include #include #include #include #include #include #include #include #define PCRE_ERROR_MSG "Internal error building PCRE pattern." 
using namespace std; using namespace ue2; static const char failureNoMemory[] = "Unable to allocate memory."; static const char failureInternal[] = "Internal error."; static const char failureBadAlloc[] = "Allocator returned misaligned memory."; static const ch_compile_error_t ch_enomem = { const_cast<char *>(failureNoMemory), 0 }; static const ch_compile_error_t ch_einternal = { const_cast<char *>(failureInternal), 0 }; static const ch_compile_error_t ch_badalloc = { const_cast<char *>(failureBadAlloc), 0 }; static ch_compile_error_t *generateChimeraCompileError(const string &err, int expression) { ch_compile_error_t *ret = (struct ch_compile_error *)ch_misc_alloc(sizeof(ch_compile_error_t)); if (ret) { ch_error_t e = ch_check_alloc(ret); if (e != CH_SUCCESS) { ch_misc_free(ret); return const_cast<ch_compile_error_t *>(&ch_badalloc); } char *msg = (char *)ch_misc_alloc(err.size() + 1); if (msg) { e = ch_check_alloc(msg); if (e != CH_SUCCESS) { // Free both allocations so the error struct is not leaked. ch_misc_free(msg); ch_misc_free(ret); return const_cast<ch_compile_error_t *>(&ch_badalloc); } memcpy(msg, err.c_str(), err.size() + 1); ret->message = msg; } else { ch_misc_free(ret); ret = nullptr; } } if (!ret || !ret->message) { return const_cast<ch_compile_error_t *>(&ch_enomem); } ret->expression = expression; return ret; } static void freeChimeraCompileError(ch_compile_error_t *error) { if (!error) { return; } if (error == &ch_enomem || error == &ch_einternal || error == &ch_badalloc) { // These are not allocated. return; } ch_misc_free(error->message); ch_misc_free(error); } static bool checkMode(unsigned int mode, ch_compile_error_t **comp_error) { static const unsigned int supported = CH_MODE_GROUPS; if (mode & ~supported) { *comp_error = generateChimeraCompileError("Invalid mode flag supplied.", -1); return false; } return true; } /** \brief Throw a compile error if we're passed some unsupported flags.
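
Example (an illustrative C sketch of the same mask test; the EX_ names are hypothetical and the bit values are assumed to match the HS_FLAG_* constants in hs_compile.h): a flag word is accepted only when every set bit belongs to the supported set.

```c
#include <stdbool.h>

// Bit values assumed to match HS_FLAG_* in hs_compile.h.
enum {
    EX_FLAG_CASELESS = 1,
    EX_FLAG_DOTALL = 2,
    EX_FLAG_MULTILINE = 4,
    EX_FLAG_SINGLEMATCH = 8,
    EX_FLAG_ALLOWEMPTY = 16,
    EX_FLAG_UTF8 = 32,
    EX_FLAG_UCP = 64,
    EX_FLAG_PREFILTER = 128,
};

// The same set of flags that checkFlags() accepts from callers.
static const unsigned int ex_supported_flags =
    EX_FLAG_DOTALL | EX_FLAG_MULTILINE | EX_FLAG_CASELESS |
    EX_FLAG_SINGLEMATCH | EX_FLAG_UCP | EX_FLAG_UTF8;

// True when no bit outside the supported set is present.
static bool ex_flags_supported(unsigned int flags) {
    return (flags & ~ex_supported_flags) == 0;
}
```

HS_FLAG_ALLOWEMPTY and HS_FLAG_PREFILTER fall outside the accepted set even though Chimera itself relies on them: checkFlags() runs on the caller's flags before ch_compile_multi_int() and PatternData OR those bits in.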
*/ static void checkFlags(const unsigned int flags) { static const unsigned int supported = HS_FLAG_DOTALL | HS_FLAG_MULTILINE | HS_FLAG_CASELESS | HS_FLAG_SINGLEMATCH | HS_FLAG_UCP | HS_FLAG_UTF8; if (flags & ~supported) { throw CompileError("Unrecognized flag used."); } } static bool isHyperscanSupported(const char *expression, unsigned int flags, const hs_platform_info *platform) { hs_database_t *db = nullptr; hs_compile_error *comp_error = nullptr; unsigned int id = 0; hs_error_t err = hs_compile_multi(&expression, &flags, &id, 1, HS_MODE_BLOCK, platform, &db, &comp_error); if (err != HS_SUCCESS) { assert(!db); assert(comp_error); DEBUG_PRINTF("unsupported: %s\n", comp_error->message); hs_free_compile_error(comp_error); return false; } assert(db); assert(!comp_error); hs_free_database(db); return true; } static bool writeHyperscanDatabase(char *ptr, hs_database_t *db) { // Note: we must use our serialization calls to re-home the database. char *serialized = nullptr; size_t slen = 0; hs_error_t err = hs_serialize_database(db, &serialized, &slen); if (err != HS_SUCCESS) { DEBUG_PRINTF("hs_serialize_database returned %d\n", err); assert(0); return false; } DEBUG_PRINTF("writing database to ptr %p\n", ptr); // deserialize_at without the platform tests. 
err = hs_deserialize_database_at(serialized, slen, (hs_database_t *)ptr); if (err != HS_SUCCESS) { DEBUG_PRINTF("hs_deserialize_database_at returned %d\n", err); assert(0); ch_misc_free(serialized); return false; } ch_misc_free(serialized); return true; } static bool writeHyperscanDatabase(ch_bytecode *db, hs_database_t *hs_db) { db->databaseOffset = ROUNDUP_CL(sizeof(*db)); char *ptr = (char *)db + db->databaseOffset; return writeHyperscanDatabase(ptr, hs_db); } static int convertFlagsToPcreOptions(unsigned int flags) { int options = 0; if (flags & HS_FLAG_CASELESS) { options |= PCRE_CASELESS; } if (flags & HS_FLAG_DOTALL) { options |= PCRE_DOTALL; } if (flags & HS_FLAG_MULTILINE) { options |= PCRE_MULTILINE; } if (flags & HS_FLAG_UTF8) { options |= PCRE_UTF8; } if (flags & HS_FLAG_UCP) { options |= PCRE_UCP; } // All other flags are meaningless to PCRE. return options; } namespace { /** \brief Data about a single pattern. */ struct PatternData : boost::noncopyable { PatternData(const char *pattern, u32 flags, u32 idx, u32 id_in, unsigned mode, unsigned long int match_limit, unsigned long int match_limit_recursion, const hs_platform_info *platform); ~PatternData() { pcre_free(compiled); pcre_free(extra); } void buildPcre(const char *pattern, u32 flags); size_t patternSize() const; void writePattern(ch_pattern *pattern) const; pcre *compiled; //!< pcre_compile output pcre_extra *extra; //!< pcre_study output size_t compiled_size; int study_size; int capture_cnt; bool utf8; u32 id; //!< ID from the user u32 expr_index; //!< index in the expression array bool singlematch; //!< pattern is in highlander mode bool guard; //!< this pattern should be guarded by the multimatcher u32 minWidth; //!< min match width u32 maxWidth; //!< max match width u32 fixedWidth; //!< fixed pattern width unsigned long int matchLimit; //! pcre match limit unsigned long int matchLimitRecursion; //! 
pcre match_limit_recursion }; PatternData::PatternData(const char *pattern, u32 flags, u32 idx, u32 id_in, unsigned mode, unsigned long int match_limit, unsigned long int match_limit_recursion, const hs_platform_info *platform) : compiled(nullptr), extra(nullptr), id(id_in), expr_index(idx), singlematch(flags & HS_FLAG_SINGLEMATCH), guard(false), minWidth(0), maxWidth(UINT_MAX), fixedWidth(UINT_MAX), matchLimit(match_limit), matchLimitRecursion(match_limit_recursion) { assert(pattern); flags |= HS_FLAG_ALLOWEMPTY; /* don't hand things off to pcre for no reason */ buildPcre(pattern, flags); // Fetch the expression info for a prefiltering, non-singlematch version of // this pattern, if possible. hs_expr_info *info = nullptr; hs_compile_error_t *error = nullptr; u32 infoflags = (flags | HS_FLAG_PREFILTER) & ~HS_FLAG_SINGLEMATCH; u32 rawflags = (flags | HS_FLAG_SOM_LEFTMOST) & ~HS_FLAG_SINGLEMATCH; hs_error_t err = hs_expression_info(pattern, infoflags, &info, &error); if (err == HS_SUCCESS) { assert(info); hs_expr_info *i = (hs_expr_info *)info; minWidth = i->min_width; maxWidth = i->max_width; bool ordered = i->unordered_matches ? false : true; // Only enable capturing if required u32 captureCnt = 0; if (mode & CH_MODE_GROUPS) { captureCnt = capture_cnt; } // No need to confirm with PCRE if: // 1) pattern is fixed width // 2) pattern isn't vacuous as it can't combine with start of match // 3) no capturing in this pattern // 4) no offset adjust in this pattern as hyperscan match callback // will arrive without order, i.e. 
[^a]\z has offset adjust // 5) hyperscan compile succeeds without prefiltering if (minWidth == maxWidth && minWidth && maxWidth != UINT_MAX && !captureCnt && ordered && isHyperscanSupported(pattern, rawflags, platform)) { fixedWidth = maxWidth; } DEBUG_PRINTF("gathered info: widths=[%u,%u]\n", minWidth, maxWidth); ch_misc_free(info); u32 guardflags; guardflags = flags | HS_FLAG_PREFILTER; guard = isHyperscanSupported(pattern, guardflags, platform); } else { // We can't even prefilter this pattern, so we're dependent on Big Dumb // Pcre Scans. DEBUG_PRINTF("hs_expression_info failed, falling back to pcre\n"); hs_free_compile_error(error); } } void PatternData::buildPcre(const char *pattern, u32 flags) { int options = convertFlagsToPcreOptions(flags); const char *errptr = nullptr; int erroffset = 0; compiled = pcre_compile(pattern, options, &errptr, &erroffset, nullptr); if (!compiled) { DEBUG_PRINTF("PCRE failed to compile: %s\n", pattern); string err("PCRE compilation failed: "); err += string(errptr); err += "."; throw CompileError(expr_index, err); } extra = pcre_study(compiled, PCRE_STUDY_JIT_COMPILE, &errptr); // Note that it's OK for pcre_study to return NULL if there's nothing // to be found, but a non-NULL error is always bad. if (errptr) { DEBUG_PRINTF("PCRE could not be studied: %s\n", errptr); string err("PCRE compilation failed: "); err += string(errptr); err += "."; throw CompileError(expr_index, err); } if (pcre_fullinfo(compiled, extra, PCRE_INFO_SIZE, &compiled_size)) { throw CompileError(PCRE_ERROR_MSG); } if (!extra) { study_size = 0; } else { if (pcre_fullinfo(compiled, extra, PCRE_INFO_STUDYSIZE, &study_size)) { throw CompileError(PCRE_ERROR_MSG); } } if (pcre_fullinfo(compiled, extra, PCRE_INFO_CAPTURECOUNT, &capture_cnt)) { throw CompileError(PCRE_ERROR_MSG); } /* We use the pcre rather than hs to get this information as we may need it * even in the pure unguarded pcre mode where there is no hs available. 
We * cannot use the compile flags due to the (*UTF8) verb */ unsigned long int opts = 0; // PCRE_INFO_OPTIONS demands an unsigned long if (pcre_fullinfo(compiled, extra, PCRE_INFO_OPTIONS, &opts)) { throw CompileError(PCRE_ERROR_MSG); } utf8 = opts & PCRE_UTF8; } size_t PatternData::patternSize() const { size_t len = 0; // ch_pattern header. len += sizeof(ch_pattern); len = ROUNDUP_N(len, 8); DEBUG_PRINTF("compiled pcre at %zu\n", len); len += compiled_size; // PCRE study data, which may be zero. if (study_size) { len = ROUNDUP_N(len, 8); DEBUG_PRINTF("study at %zu\n", len); len += (size_t)study_size; } DEBUG_PRINTF("pattern size %zu\n", len); return len; } /** \brief Write out a ch_pattern structure, which should already be sized * correctly according to PatternData::patternSize. */ void PatternData::writePattern(ch_pattern *pattern) const { assert(pattern); assert(ISALIGNED_CL(pattern)); pattern->id = id; u32 flags = 0; if (singlematch) { flags |= CHIMERA_PATTERN_FLAG_SINGLEMATCH; } if (utf8) { flags |= CHIMERA_PATTERN_FLAG_UTF8; } pattern->flags = flags; pattern->maxWidth = maxWidth; pattern->minWidth = minWidth == UINT_MAX ? 0 : minWidth; pattern->fixedWidth = fixedWidth; // Compiled PCRE pattern. char *ptr = (char *)pattern; ptr += ROUNDUP_N(sizeof(*pattern), 8); DEBUG_PRINTF("compiled pcre at %zu\n", (size_t)(ptr - (char *)pattern)); memcpy(ptr, compiled, compiled_size); ptr += compiled_size; // PCRE match limits pattern->extra.flags = PCRE_EXTRA_MATCH_LIMIT | PCRE_EXTRA_MATCH_LIMIT_RECURSION; pattern->extra.match_limit = matchLimit ? matchLimit : 10000000; // Set to avoid a segmentation fault from deep recursion pattern->extra.match_limit_recursion = matchLimitRecursion ? matchLimitRecursion : 1500; // PCRE study_data.
u32 studyOffset = 0; if (extra) { assert(extra->study_data); ptr = ROUNDUP_PTR(ptr, 8); DEBUG_PRINTF("study at %zu\n", (size_t)(ptr - (char *)pattern)); memcpy(ptr, extra->study_data, study_size); studyOffset = (size_t)(ptr - (char *)pattern); pattern->extra.flags |= PCRE_EXTRA_STUDY_DATA; pattern->extra.study_data = ptr; ptr += study_size; } else { pattern->extra.flags &= ~PCRE_EXTRA_STUDY_DATA; } pattern->studyOffset = studyOffset; size_t pcreLen = (ptr - (char *)pattern); assert(pcreLen <= patternSize()); pattern->length = (u32)pcreLen; // We shouldn't overrun the space we've allocated for this pattern. assert(patternSize() >= (size_t)(ptr - (char *)pattern)); } } // namespace namespace ch { static void ch_compile_multi_int(const char *const *expressions, const unsigned *flags, const unsigned *ids, unsigned elements, unsigned mode, unsigned long int match_limit, unsigned long int match_limit_recursion, const hs_platform_info_t *platform, ch_database_t **out) { vector<unique_ptr<PatternData>> pcres; pcres.reserve(elements); vector<u32> unguarded; // indices of unguarded PCREs. vector<const char *> multiExpr; vector<unsigned int> multiFlags; vector<unsigned int> multiIds; bool allConfirm = true; bool allSingleMatch = true; for (unsigned int i = 0; i < elements; i++) { const char *myExpr = expressions[i]; unsigned int myFlags = flags ? flags[i] : 0; unsigned int myId = ids ? ids[i] : 0; checkFlags(myFlags); // First, build with libpcre. A build failure from libpcre will throw // an exception up to the caller. auto patternData = std::make_unique<PatternData>(myExpr, myFlags, i, myId, mode, match_limit, match_limit_recursion, platform); pcres.push_back(move(patternData)); PatternData &curr = *pcres.back(); if (!(myFlags & HS_FLAG_SINGLEMATCH)) { allSingleMatch = false; } // in the multimatch, we always run in prefilter mode and accept vacuous // patterns.
myFlags |= HS_FLAG_ALLOWEMPTY | HS_FLAG_PREFILTER; if (curr.fixedWidth != UINT_MAX) { myFlags |= HS_FLAG_SOM_LEFTMOST; DEBUG_PRINTF("fixed width, turn off prefiltering\n"); myFlags &= ~HS_FLAG_PREFILTER; allConfirm = false; // Single match can't coexist with SOM. myFlags &= ~HS_FLAG_SINGLEMATCH; } if (curr.guard) { // We use the index into the PCREs array as the Hyperscan idx. multiExpr.push_back(myExpr); multiFlags.push_back(myFlags); multiIds.push_back(i); } else { // No Hyperscan support, PCRE is unguarded. unguarded.push_back(i); } } DEBUG_PRINTF("built %zu PCREs, %zu of which are unguarded\n", pcres.size(), unguarded.size()); // Work out our sizing for the output database. size_t patternSize = 0; for (unsigned int i = 0; i < elements; i++) { size_t len = pcres[i]->patternSize(); patternSize += ROUNDUP_CL(len); } DEBUG_PRINTF("pcre bytecode takes %zu bytes\n", patternSize); bool noMulti = multiExpr.empty(); size_t multiSize = 0; hs_database *multidb = nullptr; if (!noMulti) { hs_compile_error_t *hs_comp_error = nullptr; hs_error_t err = hs_compile_multi(&multiExpr[0], &multiFlags[0], &multiIds[0], multiExpr.size(), HS_MODE_BLOCK, platform, &multidb, &hs_comp_error); if (err != HS_SUCCESS) { assert(hs_comp_error); DEBUG_PRINTF("hs_compile_multi returned error: %s\n", hs_comp_error->message); assert(0); hs_free_compile_error(hs_comp_error); throw CompileError("Internal error."); } assert(multidb); err = hs_database_size(multidb, &multiSize); if (err != HS_SUCCESS) { assert(0); throw CompileError("Internal error."); } DEBUG_PRINTF("built hyperscan database with len %zu bytes\n", multiSize); } size_t bytecodeLen = sizeof(ch_bytecode) + multiSize + alignof(u32) + (sizeof(u32) * unguarded.size()) + (sizeof(u32) * elements) + patternSize + 128; // padding for alignment size_t totalSize = sizeof(ch_database) + bytecodeLen; DEBUG_PRINTF("allocating %zu bytes for database\n", totalSize); char *ptr = (char *)ch_database_alloc(totalSize); if (ch_check_alloc(ptr) != 
CH_SUCCESS) { ch_database_free(ptr); throw std::bad_alloc(); } memset(ptr, 0, totalSize); // First, the header. ch_database *hydb = (ch_database *)ptr; hydb->magic = CH_DB_MAGIC; hydb->version = HS_VERSION_32BIT; hydb->length = bytecodeLen; // Then, the bytecode. size_t shift = (size_t)hydb->bytes & 0x3f; hydb->bytecode = offsetof(struct ch_database, bytes) - shift; ch_bytecode *db = (ch_bytecode *)((char *)hydb + hydb->bytecode); db->patternCount = elements; db->activeSize = mmbit_size(elements); db->flags = 0; db->length = bytecodeLen; if (noMulti) { db->flags |= CHIMERA_FLAG_NO_MULTIMATCH; } if (mode & CH_MODE_GROUPS) { db->flags |= CHIMERA_FLAG_GROUPS; } if (allConfirm) { db->flags |= CHIMERA_FLAG_ALL_CONFIRM; } if (allSingleMatch) { db->flags |= CHIMERA_FLAG_ALL_SINGLE; } // Find and set the max ovector size by looking at the capture count for // each pcre. u32 maxCaptureGroups = 0; for (unsigned int i = 0; i < elements; i++) { maxCaptureGroups = max(maxCaptureGroups, (u32)pcres[i]->capture_cnt); } db->maxCaptureGroups = maxCaptureGroups; DEBUG_PRINTF("max capture groups is %u\n", maxCaptureGroups); if (!noMulti) { DEBUG_PRINTF("write hyperscan database\n"); // Write Hyperscan database directly after the header struct, then free it. if (!writeHyperscanDatabase(db, multidb)) { ch_database_free(hydb); hs_free_database(multidb); throw CompileError("Internal error."); } hs_free_database(multidb); } else { db->databaseOffset = ROUNDUP_CL(sizeof(*db)); } // Then, write our unguarded PCRE list. db->unguardedCount = unguarded.size(); db->unguardedOffset = ROUNDUP_N(db->databaseOffset + multiSize, 4); ptr = (char *)db + db->unguardedOffset; copy(unguarded.begin(), unguarded.end(), (u32 *)ptr); // Then, write all our compiled PCRE patterns and the lookup table for // them. 
db->patternOffset = db->unguardedOffset + unguarded.size() * sizeof(u32); u32 *patternOffset = (u32 *)((char *)db + db->patternOffset); u32 offset = ROUNDUP_CL(db->patternOffset + elements * sizeof(u32)); for (unsigned int i = 0; i < elements; i++) { *patternOffset = offset; size_t len = pcres[i]->patternSize(); ptr = (char *)db + offset; struct ch_pattern *pattern = (struct ch_pattern *)ptr; pcres[i]->writePattern(pattern); DEBUG_PRINTF("wrote pcre %u into offset %u, len %zu\n", i, offset, len); offset += ROUNDUP_CL(len); patternOffset++; } assert(offset <= totalSize); assert(hydb->magic == CH_DB_MAGIC); DEBUG_PRINTF("built hybrid database, size %zu bytes\n", totalSize); DEBUG_PRINTF("offset=%u\n", offset); *out = hydb; } } // namespace ch extern "C" HS_PUBLIC_API ch_error_t HS_CDECL ch_compile(const char *expression, unsigned flags, unsigned mode, const hs_platform_info_t *platform, ch_database_t **db, ch_compile_error_t **comp_error) { if (!comp_error) { if (db) { *db = nullptr; } // nowhere to write the string, but we can still report an error code return CH_COMPILER_ERROR; } if (!db) { *comp_error = generateChimeraCompileError("Invalid parameter: db is NULL", -1); return CH_COMPILER_ERROR; } if (!expression) { *db = nullptr; *comp_error = generateChimeraCompileError("Invalid parameter: expression is NULL", -1); return CH_COMPILER_ERROR; } if (!checkMode(mode, comp_error)) { *db = nullptr; assert(*comp_error); // set by checkMode return CH_COMPILER_ERROR; } try { unsigned id = 0; // single expressions get zero as an ID // Internal function to do all the work, now that we've handled all the // argument checking. ch::ch_compile_multi_int(&expression, &flags, &id, 1, mode, 0, 0, platform, db); } catch (const CompileError &e) { // Compiler error occurred *db = nullptr; *comp_error = generateChimeraCompileError(e.reason, e.hasIndex ?
(int)e.index : -1); return CH_COMPILER_ERROR; } catch (std::bad_alloc &) { *db = nullptr; *comp_error = const_cast<ch_compile_error_t *>(&ch_enomem); return CH_COMPILER_ERROR; } catch (...) { assert(!"Internal error, unexpected exception"); *db = nullptr; *comp_error = const_cast<ch_compile_error_t *>(&ch_einternal); return CH_COMPILER_ERROR; } DEBUG_PRINTF("success!\n"); return CH_SUCCESS; } extern "C" HS_PUBLIC_API ch_error_t HS_CDECL ch_compile_multi(const char *const *expressions, const unsigned *flags, const unsigned *ids, unsigned elements, unsigned mode, const hs_platform_info_t *platform, ch_database_t **db, ch_compile_error_t **comp_error) { if (!comp_error) { if (db) { *db = nullptr; } // nowhere to write the string, but we can still report an error code return CH_COMPILER_ERROR; } if (!db) { *comp_error = generateChimeraCompileError("Invalid parameter: db is NULL", -1); return CH_COMPILER_ERROR; } if (!expressions) { *db = nullptr; *comp_error = generateChimeraCompileError("Invalid parameter: expressions is NULL", -1); return CH_COMPILER_ERROR; } if (!elements) { *db = nullptr; *comp_error = generateChimeraCompileError("Invalid parameter: elements is zero", -1); return CH_COMPILER_ERROR; } if (!checkMode(mode, comp_error)) { *db = nullptr; assert(*comp_error); // set by checkMode return CH_COMPILER_ERROR; } try { // Internal function to do all the work, now that we've handled all the // argument checking. ch::ch_compile_multi_int(expressions, flags, ids, elements, mode, 0, 0, platform, db); } catch (const CompileError &e) { // Compiler error occurred *db = nullptr; *comp_error = generateChimeraCompileError(e.reason, e.hasIndex ? (int)e.index : -1); return CH_COMPILER_ERROR; } catch (std::bad_alloc &) { *db = nullptr; *comp_error = const_cast<ch_compile_error_t *>(&ch_enomem); return CH_COMPILER_ERROR; } catch (...)
{ assert(!"Internal error, unexpected exception"); *db = nullptr; *comp_error = const_cast<ch_compile_error_t *>(&ch_einternal); return CH_COMPILER_ERROR; } DEBUG_PRINTF("success!\n"); return CH_SUCCESS; } extern "C" HS_PUBLIC_API ch_error_t HS_CDECL ch_compile_ext_multi( const char *const *expressions, const unsigned *flags, const unsigned *ids, unsigned elements, unsigned mode, unsigned long int match_limit, unsigned long int match_limit_recursion, const hs_platform_info_t *platform, ch_database_t **db, ch_compile_error_t **comp_error) { if (!comp_error) { if (db) { *db = nullptr; } // nowhere to write the string, but we can still report an error code return CH_COMPILER_ERROR; } if (!db) { *comp_error = generateChimeraCompileError("Invalid parameter: db is NULL", -1); return CH_COMPILER_ERROR; } if (!expressions) { *db = nullptr; *comp_error = generateChimeraCompileError("Invalid parameter: expressions is NULL", -1); return CH_COMPILER_ERROR; } if (!elements) { *db = nullptr; *comp_error = generateChimeraCompileError("Invalid parameter: elements is zero", -1); return CH_COMPILER_ERROR; } if (!checkMode(mode, comp_error)) { *db = nullptr; assert(*comp_error); // set by checkMode return CH_COMPILER_ERROR; } try { // Internal function to do all the work, now that we've handled all the // argument checking. ch::ch_compile_multi_int(expressions, flags, ids, elements, mode, match_limit, match_limit_recursion, platform, db); } catch (const CompileError &e) { // Compiler error occurred *db = nullptr; *comp_error = generateChimeraCompileError(e.reason, e.hasIndex ? (int)e.index : -1); return CH_COMPILER_ERROR; } catch (std::bad_alloc &) { *db = nullptr; *comp_error = const_cast<ch_compile_error_t *>(&ch_enomem); return CH_COMPILER_ERROR; } catch (...)
{ assert(!"Internal error, unexpected exception"); *db = nullptr; *comp_error = const_cast<ch_compile_error_t *>(&ch_einternal); return CH_COMPILER_ERROR; } DEBUG_PRINTF("success!\n"); return CH_SUCCESS; } extern "C" HS_PUBLIC_API ch_error_t HS_CDECL ch_free_compile_error(ch_compile_error_t *error) { freeChimeraCompileError(error); return CH_SUCCESS; } vectorscan-5.4.11/chimera/ch_compile.h000066400000000000000000000376261452711272000176530ustar00rootroot00000000000000/* * Copyright (c) 2018, Intel Corporation * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions are met: * * * Redistributions of source code must retain the above copyright notice, * this list of conditions and the following disclaimer. * * Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * Neither the name of Intel Corporation nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE * POSSIBILITY OF SUCH DAMAGE.
*/ #ifndef CH_COMPILE_H_ #define CH_COMPILE_H_ /** * @file * @brief The Chimera compiler API definition. * * Chimera is a hybrid solution of Hyperscan and PCRE. * * This header contains functions for compiling regular expressions into * Chimera databases that can be used by the Chimera runtime. */ #include "ch_common.h" #include "hs_compile.h" #ifdef __cplusplus extern "C" { #endif /** * A type containing error details that is returned by the compile calls (@ref * ch_compile(), @ref ch_compile_multi() and @ref ch_compile_ext_multi()) on * failure. The caller may inspect the values returned in this type to * determine the cause of failure. */ typedef struct ch_compile_error { /** * A human-readable error message describing the error. */ char *message; /** * The zero-based number of the expression that caused the error (if this * can be determined). If the error is not specific to an expression, then * this value will be less than zero. */ int expression; } ch_compile_error_t; /** * The basic regular expression compiler. * * This is the function call with which an expression is compiled into a * Chimera database which can be passed to the runtime function (@ref * ch_scan()). * * @param expression * The NULL-terminated expression to parse. Note that this string must * represent ONLY the pattern to be matched, with no delimiters or flags; * any global flags should be specified with the @a flags argument. For * example, the expression `/abc?def/i` should be compiled by providing * `abc?def` as the @a expression, and @ref CH_FLAG_CASELESS as the @a * flags. * * @param flags * Flags which modify the behaviour of the expression. Multiple flags may * be used by ORing them together. Valid values are: * - CH_FLAG_CASELESS - Matching will be performed case-insensitively. * - CH_FLAG_DOTALL - Matching a `.` will not exclude newlines. * - CH_FLAG_MULTILINE - `^` and `$` anchors match any newlines in data. 
* - CH_FLAG_SINGLEMATCH - Only one match will be generated for the * expression per stream.
* - CH_FLAG_UTF8 - Treat this pattern as a sequence of UTF-8 characters. * - CH_FLAG_UCP - Use Unicode properties for character classes. * * @param mode * Compiler mode flag that affects the database as a whole, controlling * whether capturing groups are reported. One of CH_MODE_NOGROUPS or * CH_MODE_GROUPS must be supplied. See @ref CH_MODE_FLAG for more details. * * @param platform * If not NULL, the platform structure is used to determine the target * platform for the database. If NULL, a database suitable for running * on the current host platform is produced. * * @param db * On success, a pointer to the generated database will be returned in * this parameter, or NULL on failure. The caller is responsible for * deallocating the buffer using the @ref ch_free_database() function. * * @param compile_error * If the compile fails, a pointer to a @ref ch_compile_error_t will be * returned, providing details of the error condition. The caller is * responsible for deallocating the buffer using the @ref * ch_free_compile_error() function. * * @return * @ref CH_SUCCESS is returned on successful compilation; @ref * CH_COMPILER_ERROR on failure, with details provided in the @a * compile_error parameter. */ ch_error_t HS_CDECL ch_compile(const char *expression, unsigned int flags, unsigned int mode, const hs_platform_info_t *platform, ch_database_t **db, ch_compile_error_t **compile_error); /** * The multiple regular expression compiler. * * This is the function call with which a set of expressions is compiled into a * database which can be passed to the runtime function (@ref ch_scan()). * Each expression can be labelled with a unique integer which is passed into * the match callback to identify the pattern that has matched. * * @param expressions * Array of NULL-terminated expressions to compile. Note that (as for @ref * ch_compile()) these strings must contain only the pattern to be * matched, with no delimiters or flags.
For example, the expression * `/abc?def/i` should be compiled by providing `abc?def` as the first * string in the @a expressions array, and @ref CH_FLAG_CASELESS as the * first value in the @a flags array. * * @param flags * Array of flags which modify the behaviour of each expression. Multiple * flags may be used by ORing them together. Specifying the NULL pointer * in place of an array will set the flags value for all patterns to zero. * Valid values are: * - CH_FLAG_CASELESS - Matching will be performed case-insensitively. * - CH_FLAG_DOTALL - Matching a `.` will not exclude newlines. * - CH_FLAG_MULTILINE - `^` and `$` anchors match any newlines in data. * - CH_FLAG_SINGLEMATCH - Only one match will be generated by patterns * with this match id per stream. * - CH_FLAG_UTF8 - Treat this pattern as a sequence of UTF-8 characters. * - CH_FLAG_UCP - Use Unicode properties for character classes. * * @param ids * An array of integers specifying the ID number to be associated with the * corresponding pattern in the expressions array. Specifying the NULL * pointer in place of an array will set the ID value for all patterns to * zero. * * @param elements * The number of elements in the input arrays. * * @param mode * Compiler mode flag that affects the database as a whole, controlling * whether capturing groups are reported. One of CH_MODE_NOGROUPS or * CH_MODE_GROUPS must be supplied. See @ref CH_MODE_FLAG for more details. * * @param platform * If not NULL, the platform structure is used to determine the target * platform for the database. If NULL, a database suitable for running * on the current host platform is produced. * * @param db * On success, a pointer to the generated database will be returned in * this parameter, or NULL on failure. The caller is responsible for * deallocating the buffer using the @ref ch_free_database() function. * * @param compile_error * If the compile fails, a pointer to a @ref ch_compile_error_t will be * returned, providing details of the error condition.
The caller is * responsible for deallocating the buffer using the @ref * ch_free_compile_error() function. * * @return * @ref CH_SUCCESS is returned on successful compilation; @ref * CH_COMPILER_ERROR on failure, with details provided in the @a * compile_error parameter. * */ ch_error_t HS_CDECL ch_compile_multi(const char *const *expressions, const unsigned int *flags, const unsigned int *ids, unsigned int elements, unsigned int mode, const hs_platform_info_t *platform, ch_database_t **db, ch_compile_error_t **compile_error); /** * The multiple regular expression compiler with extended match limits support. * * This is the function call with which a set of expressions is compiled into a * database in the same way as @ref ch_compile_multi(), but allows additional * parameters to be specified via match_limit and match_limit_recursion to * define match limits for the PCRE runtime. * * @param expressions * Array of NULL-terminated expressions to compile. Note that (as for @ref * ch_compile()) these strings must contain only the pattern to be * matched, with no delimiters or flags. For example, the expression * `/abc?def/i` should be compiled by providing `abc?def` as the first * string in the @a expressions array, and @ref CH_FLAG_CASELESS as the * first value in the @a flags array. * * @param flags * Array of flags which modify the behaviour of each expression. Multiple * flags may be used by ORing them together. Specifying the NULL pointer * in place of an array will set the flags value for all patterns to zero. * Valid values are: * - CH_FLAG_CASELESS - Matching will be performed case-insensitively. * - CH_FLAG_DOTALL - Matching a `.` will not exclude newlines. * - CH_FLAG_MULTILINE - `^` and `$` anchors match any newlines in data. * - CH_FLAG_SINGLEMATCH - Only one match will be generated by patterns * with this match id per stream. * - CH_FLAG_UTF8 - Treat this pattern as a sequence of UTF-8 characters. * - CH_FLAG_UCP - Use Unicode properties for character classes.
* * @param ids * An array of integers specifying the ID number to be associated with the * corresponding pattern in the expressions array. Specifying the NULL * pointer in place of an array will set the ID value for all patterns to * zero. * * @param elements * The number of elements in the input arrays. * * @param mode * Compiler mode flag that affects the database as a whole, controlling * whether capturing groups are reported. One of CH_MODE_NOGROUPS or * CH_MODE_GROUPS must be supplied. See @ref CH_MODE_FLAG for more details. * * @param match_limit * The PCRE match limit (the @c match_limit field of pcre_extra), which caps * the number of times PCRE's internal match function may be called and so * bounds the amount of backtracking that can take place. * * @param match_limit_recursion * The PCRE recursion limit (the @c match_limit_recursion field of * pcre_extra), which caps the recursion depth of PCRE's internal match * function. * * @param platform * If not NULL, the platform structure is used to determine the target * platform for the database. If NULL, a database suitable for running * on the current host platform is produced. * * @param db * On success, a pointer to the generated database will be returned in * this parameter, or NULL on failure. The caller is responsible for * deallocating the buffer using the @ref ch_free_database() function. * * @param compile_error * If the compile fails, a pointer to a @ref ch_compile_error_t will be * returned, providing details of the error condition. The caller is * responsible for deallocating the buffer using the @ref * ch_free_compile_error() function. * * @return * @ref CH_SUCCESS is returned on successful compilation; @ref * CH_COMPILER_ERROR on failure, with details provided in the @a * compile_error parameter.
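 *
 * Example (a minimal sketch, not part of the original documentation; the
 * pattern strings, IDs and limit values are arbitrary illustrations rather
 * than recommended settings, and error handling is abbreviated):
 *
 * @code
 *     const char *exprs[] = {"foo(\\d+)", "bar.*baz"};
 *     const unsigned int flags[] = {CH_FLAG_CASELESS, CH_FLAG_DOTALL};
 *     const unsigned int ids[] = {10, 20};
 *     ch_database_t *db = NULL;
 *     ch_compile_error_t *err = NULL;
 *     // Cap each PCRE invocation at 100000 match function calls and a
 *     // recursion depth of 1000; a NULL platform targets the current host.
 *     if (ch_compile_ext_multi(exprs, flags, ids, 2, CH_MODE_GROUPS,
 *                              100000, 1000, NULL, &db,
 *                              &err) != CH_SUCCESS) {
 *         // err->expression is negative if the error is not tied to a
 *         // single pattern.
 *         fprintf(stderr, "pattern %d: %s\n", err->expression,
 *                 err->message);
 *         ch_free_compile_error(err);
 *         return -1;
 *     }
 *     // ... use db with ch_scan(), then release it:
 *     ch_free_database(db);
 * @endcode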
* */ ch_error_t HS_CDECL ch_compile_ext_multi(const char *const *expressions, const unsigned int *flags, const unsigned int *ids, unsigned int elements, unsigned int mode, unsigned long int match_limit, unsigned long int match_limit_recursion, const hs_platform_info_t *platform, ch_database_t **db, ch_compile_error_t **compile_error); /** * Free an error structure generated by @ref ch_compile(), @ref * ch_compile_multi() or @ref ch_compile_ext_multi(). * * @param error * The @ref ch_compile_error_t to be freed. NULL may also be safely * provided. * * @return * @ref CH_SUCCESS on success, other values on failure. */ ch_error_t HS_CDECL ch_free_compile_error(ch_compile_error_t *error); /** * @defgroup CH_PATTERN_FLAG Pattern flags * * @{ */ /** * Compile flag: Set case-insensitive matching. * * This flag sets the expression to be matched case-insensitively by default. * The expression may still use PCRE tokens (notably `(?i)` and * `(?-i)`) to switch case-insensitive matching on and off. */ #define CH_FLAG_CASELESS 1 /** * Compile flag: Matching a `.` will not exclude newlines. * * This flag sets any instances of the `.` token to match newline characters as * well as all other characters. The PCRE specification states that the `.` * token does not match newline characters by default, so without this flag the * `.` token will not cross line boundaries. */ #define CH_FLAG_DOTALL 2 /** * Compile flag: Set multi-line anchoring. * * This flag instructs the expression to make the `^` and `$` tokens match * newline characters as well as the start and end of the stream. If this flag * is not specified, the `^` token will only ever match at the start of a * stream, and the `$` token will only ever match at the end of a stream within * the guidelines of the PCRE specification. */ #define CH_FLAG_MULTILINE 4 /** * Compile flag: Set single-match only mode. 
* * This flag sets the expression's match ID to match at most once: only the * first match for each invocation of @ref ch_scan() will be returned.
* */ #define CH_FLAG_SINGLEMATCH 8 /** * Compile flag: Enable UTF-8 mode for this expression. * * This flag instructs Chimera to treat the pattern as a sequence of UTF-8 * characters. The results of scanning invalid UTF-8 sequences with a Chimera * database that has been compiled with one or more patterns using this flag are * undefined. */ #define CH_FLAG_UTF8 32 /** * Compile flag: Enable Unicode property support for this expression. * * This flag instructs Chimera to use Unicode properties, rather than the * default ASCII interpretations, for character mnemonics like `\w` and `\s` as * well as the POSIX character classes. It is only meaningful in conjunction * with @ref CH_FLAG_UTF8. */ #define CH_FLAG_UCP 64 /** @} */ /** * @defgroup CH_MODE_FLAG Compile mode flags * * The mode flags are used as values for the mode parameter of the various * compile calls (@ref ch_compile(), @ref ch_compile_multi() and @ref * ch_compile_ext_multi()). * * By default, the matcher will only supply the start and end offsets of the * match when the match callback is called. Using mode flag @ref CH_MODE_GROUPS * will also fill the `captured` array with the start and end offsets of all * the capturing groups specified by the pattern that has matched. * * @{ */ /** * Compiler mode flag: Disable capturing groups. */ #define CH_MODE_NOGROUPS 0 /** * Compiler mode flag: Enable capturing groups. */ #define CH_MODE_GROUPS 1048576 /** @} */ #ifdef __cplusplus } /* extern "C" */ #endif #endif /* CH_COMPILE_H_ */ vectorscan-5.4.11/chimera/ch_database.c000066400000000000000000000075321452711272000177530ustar00rootroot00000000000000/* * Copyright (c) 2018, Intel Corporation * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions are met: * * * Redistributions of source code must retain the above copyright notice, * this list of conditions and the following disclaimer.
* * Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * Neither the name of Intel Corporation nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE * POSSIBILITY OF SUCH DAMAGE. */ /** \file * \brief Chimera: database construction, etc. 
*/ #include #include #include #include #include "allocator.h" #include "database.h" #include "hs.h" #include "ch.h" #include "hs_internal.h" #include "ch_common.h" #include "ch_alloc.h" #include "ch_database.h" #include "ch_internal.h" static really_inline int db_correctly_aligned(const void *db) { return ISALIGNED_N(db, alignof(unsigned long long)); } HS_PUBLIC_API ch_error_t HS_CDECL ch_free_database(ch_database_t *hydb) { if (hydb && hydb->magic != CH_DB_MAGIC) { return CH_INVALID; } ch_database_free(hydb); return CH_SUCCESS; } HS_PUBLIC_API ch_error_t HS_CDECL ch_database_size(const ch_database_t *hydb, size_t *size) { if (!size) { return CH_INVALID; } ch_error_t ret = hydbIsValid(hydb); if (unlikely(ret != CH_SUCCESS)) { return ret; } *size = sizeof(struct ch_database) + hydb->length; return CH_SUCCESS; } /** \brief Identifier prepended to database info. */ static const char CHIMERA_IDENT[] = "Chimera "; HS_PUBLIC_API ch_error_t HS_CDECL ch_database_info(const ch_database_t *hydb, char **info) { if (!info) { return CH_INVALID; } *info = NULL; if (!hydb || !db_correctly_aligned(hydb) || hydb->magic != CH_DB_MAGIC) { return CH_INVALID; } const struct ch_bytecode *bytecode = ch_get_bytecode(hydb); char noMulti = (bytecode->flags & CHIMERA_FLAG_NO_MULTIMATCH); if (noMulti) { size_t len = strlen(CHIMERA_IDENT); *info = ch_misc_alloc(len + 1); if (!(*info)) { return CH_NOMEM; } memcpy((*info), CHIMERA_IDENT, len); (*info)[len] = '\0'; return CH_SUCCESS; } char *hsinfo = NULL; hs_error_t ret = hs_database_info(getHyperscanDatabase(bytecode), &hsinfo); if (ret != HS_SUCCESS) { assert(!hsinfo); return ret; } size_t hybridlen = strlen(CHIMERA_IDENT); size_t hslen = strlen(hsinfo); *info = ch_misc_alloc(hybridlen + hslen + 1); if (!(*info)) { ch_misc_free(hsinfo); return CH_NOMEM; } memcpy((*info), CHIMERA_IDENT, hybridlen); memcpy((*info) + hybridlen, hsinfo, hslen); (*info)[hybridlen + hslen] = '\0'; ch_misc_free(hsinfo); return CH_SUCCESS; }
vectorscan-5.4.11/chimera/ch_database.h000066400000000000000000000131551452711272000177560ustar00rootroot00000000000000/* * Copyright (c) 2018, Intel Corporation * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions are met: * * * Redistributions of source code must retain the above copyright notice, * this list of conditions and the following disclaimer. * * Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * Neither the name of Intel Corporation nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE * POSSIBILITY OF SUCH DAMAGE. */ /** \file * \brief Runtime code for ch_database manipulation. 
*/ #ifndef CH_DATABASE_H_ #define CH_DATABASE_H_ #ifdef __cplusplus extern "C" { #endif #define PCRE_STATIC #include <pcre.h> #include "ch_compile.h" // for CH_MODE_ flags #include "ue2common.h" #include "hs_version.h" #include "hs.h" #define CH_DB_MAGIC 0xdedededeU //!< Magic number stored in \ref ch_database /** \brief Main Chimera database header. */ struct ch_database { u32 magic; //!< must be \ref CH_DB_MAGIC u32 version; //!< release version u32 length; //!< total allocated length in bytes u32 reserved0; //!< unused u32 reserved1; //!< unused u32 bytecode; //!< offset relative to db start u32 padding[16]; //!< padding for alignment of rest of bytecode char bytes[]; }; /** \brief Chimera bytecode header, which follows the \ref ch_database and is * always 64-byte aligned. */ struct ch_bytecode { u32 length; //!< length of bytecode including this header struct u32 flags; //!< whole-database flags (CHIMERA_FLAG_NO_MULTIMATCH, // CHIMERA_FLAG_GROUPS) u32 patternCount; //!< total number of patterns u32 activeSize; //!< size of mmbit to store active pattern ids u32 databaseOffset; //!< offset for database following \ref ch_bytecode // header u32 patternOffset; //!< points to an array of u32 offsets, each pointing to // a \ref ch_pattern u32 unguardedOffset; //!< pointer to a list of unguarded pattern indices u32 unguardedCount; //!< number of unguarded patterns u32 maxCaptureGroups; //!< max number of capture groups used by any pattern }; /** \brief Per-pattern header. * * struct is followed in bytecode by: * 1. pcre bytecode (always present) * 2. pcre study data (sometimes) */ struct ch_pattern { u32 id; //!< pattern ID to report to the user u32 flags; //!< per-pattern flags (e.g. \ref CHIMERA_PATTERN_FLAG_UTF8) u32 maxWidth; //!< maximum width of a match, or UINT_MAX for inf. u32 minWidth; //!< minimum width of a match. u32 fixedWidth;//!< pattern has fixed width.
u32 studyOffset; //!< offset relative to struct start of study data, // or zero if there is none u32 length; //!< length of struct plus pcre bytecode and study data pcre_extra extra; //!< pcre_extra struct, used to store study data ptr for // the currently-running pcre at runtime. }; static really_inline const void *ch_get_bytecode(const struct ch_database *db) { assert(db); const void *bytecode = (const char *)db + db->bytecode; assert(ISALIGNED_16(bytecode)); return bytecode; } struct hs_database; static really_inline const struct hs_database *getHyperscanDatabase(const struct ch_bytecode *db) { assert(db); const char *ptr = (const char *)db; const struct hs_database *hs_db; hs_db = (const struct hs_database *)(ptr + db->databaseOffset); assert(ISALIGNED_CL(hs_db)); return hs_db; } static really_inline const u32 *getUnguarded(const struct ch_bytecode *db) { assert(db); const char *ptr = (const char *)db; const u32 *unguarded = (const u32 *)(ptr + db->unguardedOffset); assert(ISALIGNED_N(unguarded, sizeof(u32))); return unguarded; } static really_inline const struct ch_pattern *getPattern(const struct ch_bytecode *db, u32 i) { assert(db); assert(i < db->patternCount); const char *ptr = (const char *)db; const u32 *patternOffset = (const u32 *)(ptr + db->patternOffset); assert(patternOffset[i] < db->length); return (const struct ch_pattern *)(ptr + patternOffset[i]); } static really_inline ch_error_t hydbIsValid(const struct ch_database *hydb) { if (!hydb) { DEBUG_PRINTF("null database\n"); return CH_INVALID; } if (hydb->magic != CH_DB_MAGIC) { DEBUG_PRINTF("bad magic (%u != %u)\n", hydb->magic, CH_DB_MAGIC); return CH_INVALID; } if (hydb->version != HS_VERSION_32BIT) { DEBUG_PRINTF("bad version\n"); return CH_DB_VERSION_ERROR; } return CH_SUCCESS; } #ifdef __cplusplus } /* extern "C" */ #endif #endif /* CH_DATABASE_H_ */ vectorscan-5.4.11/chimera/ch_internal.h000066400000000000000000000040751452711272000200270ustar00rootroot00000000000000/* * Copyright (c) 2018, Intel 
Corporation * * Redistribution and use in source and
binary forms, with or without * modification, are permitted provided that the following conditions are met: * * * Redistributions of source code must retain the above copyright notice, * this list of conditions and the following disclaimer. * * Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * Neither the name of Intel Corporation nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE * POSSIBILITY OF SUCH DAMAGE. */ /** \file * \brief Chimera: data structures and internals. 
*/ #ifndef CH_INTERNAL_H #define CH_INTERNAL_H #define CHIMERA_FLAG_NO_MULTIMATCH 1 //!< Don't run a multimatch scan #define CHIMERA_FLAG_GROUPS 2 //!< Return capturing groups #define CHIMERA_FLAG_ALL_CONFIRM 4 //!< All patterns need confirm #define CHIMERA_FLAG_ALL_SINGLE 8 //!< All patterns need only one match #define CHIMERA_PATTERN_FLAG_SINGLEMATCH 1 //!< only report the first match #define CHIMERA_PATTERN_FLAG_UTF8 2 //!< pattern is in UTF-8 mode #endif vectorscan-5.4.11/chimera/ch_runtime.c000066400000000000000000000530651452711272000176740ustar00rootroot00000000000000/* * Copyright (c) 2018-2022, Intel Corporation * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions are met: * * * Redistributions of source code must retain the above copyright notice, * this list of conditions and the following disclaimer. * * Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * Neither the name of Intel Corporation nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE * POSSIBILITY OF SUCH DAMAGE. */ /** \file * \brief Chimera: main runtime. */ #include #include #include #include #include "ch.h" #include "hs.h" #include "hs_internal.h" #include "ue2common.h" #include "ch_database.h" #include "ch_internal.h" #include "ch_scratch.h" #include "util/multibit.h" #include "util/unicode_def.h" typedef struct queue_item PQ_T; static char PQ_COMP(PQ_T *pqc_items, int a, int b) { if ((pqc_items)[a].to != (pqc_items)[b].to) { return (pqc_items)[a].to < (pqc_items)[b].to; } else if ((pqc_items)[a].from != (pqc_items)[b].from) { return (pqc_items)[a].from < (pqc_items)[b].from; } else { return (pqc_items)[a].id < (pqc_items)[b].id; } } static char PQ_COMP_B(PQ_T *pqc_items, int a, PQ_T b_fixed) { if ((pqc_items)[a].to != (b_fixed).to) { return (pqc_items)[a].to < (b_fixed).to; } else if ((pqc_items)[a].from != (b_fixed).from) { return (pqc_items)[a].from < (b_fixed).from; } else { return (pqc_items)[a].id < b_fixed.id; } } #include "util/pqueue.h" static really_inline void pq_insert_with(struct match_pq *pq, int from, int to, u32 id) { DEBUG_PRINTF("inserting pattern%u in pq at %u\n", id, to); struct queue_item temp = { .from = from, .to = to, .id = id, }; pq_insert(pq->item, pq->size, temp); ++pq->size; } static really_inline void pq_pop_nice(struct match_pq *pq) { pq_pop(pq->item, pq->size); pq->size--; } /** dummy event handler for use when user does not provide one */ static int HS_CDECL null_onEvent(UNUSED unsigned id, UNUSED unsigned long long 
from, UNUSED unsigned long long to, UNUSED unsigned flags, UNUSED unsigned size, UNUSED const ch_capture_t *captured, UNUSED void *ctxt) { return 0; } /** \brief Chimera runtime context. */ struct HybridContext { const char *data; //!< buffer being scanned u32 length; //!< length of data buffer u32 valid_utf8_highwater; //!< UTF-8 has been validated up to here. const struct ch_bytecode *db; struct ch_scratch *scratch; struct match_pq *pq; /** \brief user-supplied match callback */ int (HS_CDECL *match_callback)(unsigned int id, unsigned long long from, unsigned long long to, unsigned int flags, unsigned int size, const ch_capture_t *capture, void *ctx); /** \brief user-supplied error callback */ int (HS_CDECL *error_callback)(ch_error_event_t error_type, unsigned int id, void *info, void *ctx); /** \brief user-supplied context */ void *context; }; // Internal PCRE func. extern int _pcre_valid_utf(const unsigned char *, int, int *); /** UTF-8 validity check. Returns >0 if the given region of the data is valid * UTF-8, 0 otherwise. */ static char isValidUTF8(struct HybridContext *hyctx, u32 end) { assert(hyctx); if (hyctx->valid_utf8_highwater >= end) { return 1; // Already validated. } const unsigned char *data = (const unsigned char *)hyctx->data + hyctx->valid_utf8_highwater; int validate_len = end - hyctx->valid_utf8_highwater; DEBUG_PRINTF("validating %d bytes\n", validate_len); int erroroffset = 0; if (_pcre_valid_utf(data, validate_len, &erroroffset)) { DEBUG_PRINTF("UTF8 invalid at offset %d\n", erroroffset); return 0; } hyctx->valid_utf8_highwater = end; return 1; } static const pcre *getPcre(const struct ch_pattern *pattern) { const char *ptr = (const char *)pattern; const pcre *p = (const pcre *)(ptr + ROUNDUP_N(sizeof(*pattern), 8)); assert(ISALIGNED_N(p, 8)); return p; } /** \brief Fill the Chimera groups array from a pcre_exec ovector. 
*/ static void fillGroupsFromOvector(ch_capture_t *groups, int numPairs, int *ovector) { assert(groups); assert(ISALIGNED_N(groups, alignof(ch_capture_t))); DEBUG_PRINTF("filling %d groups (@ %p) from pcre ovector\n", numPairs, groups); for (int i = 0; i < numPairs * 2; i += 2) { if (ovector[i] == -1) { groups->flags = CH_CAPTURE_FLAG_INACTIVE; } else { groups->flags = CH_CAPTURE_FLAG_ACTIVE; assert(ovector[i] <= ovector[i + 1]); groups->from = ovector[i]; groups->to = ovector[i + 1]; } ++groups; } } static ch_error_t handlePcreNonMatch(const struct ch_pattern *pattern, int rv, ch_error_event_handler onError, void *userContext) { assert(rv < 0); if (rv == PCRE_ERROR_NOMATCH) { DEBUG_PRINTF("no match found by libpcre\n"); return CH_SUCCESS; } else if (rv == PCRE_ERROR_MATCHLIMIT) { DEBUG_PRINTF("pcre hit match limit\n"); if (onError) { return onError(CH_ERROR_MATCHLIMIT, pattern->id, NULL, userContext); } return CH_SUCCESS; } else if (rv == PCRE_ERROR_RECURSIONLIMIT) { DEBUG_PRINTF("pcre hit recursion limit\n"); if (onError) { return onError(CH_ERROR_RECURSIONLIMIT, pattern->id, NULL, userContext); } return CH_SUCCESS; } // All other errors not handled above are fatal. return CH_FAIL_INTERNAL; } static ch_error_t scanPcre(struct HybridContext *hyctx, UNUSED unsigned int length, unsigned int offset, u32 id) { const char *data = hyctx->data; unsigned int full_length = hyctx->length; ch_error_event_handler onError = hyctx->error_callback; void *userContext = hyctx->context; const struct ch_pattern *pattern = getPattern(hyctx->db, id); const pcre *p = getPcre(pattern); // Set up the PCRE extra block. 
const pcre_extra *extra = &pattern->extra; int startoffset = offset; int *ovector = hyctx->scratch->ovector; int ovectorSize = (hyctx->scratch->maxCaptureGroups + 1) * 3; assert(ovectorSize >= 2); DEBUG_PRINTF("scanning %u bytes, pattern %u, startoffset %d\n", length, id, startoffset); int options = 0; if (pattern->flags & CHIMERA_PATTERN_FLAG_UTF8) { // We do our own UTF-8 validation. options |= PCRE_NO_UTF8_CHECK; if (!isValidUTF8(hyctx, full_length)) { return handlePcreNonMatch(pattern, PCRE_ERROR_BADUTF8, onError, userContext); } } int rv = pcre_exec(p, extra, data, full_length, startoffset, options, ovector, ovectorSize); DEBUG_PRINTF("pcre return code is %d\n", rv); // Handle all non-match and error cases; none of these produce a match // to record. if (rv < 0) { return handlePcreNonMatch(pattern, rv, onError, userContext); } // We've found a match, and we should always have room for at least the // start and end offsets in our ovector. Record it so that it can be // reported to the user in order. assert(rv >= 1); assert(rv < ovectorSize); int from = ovector[0]; int to = ovector[1]; DEBUG_PRINTF("match %d -> %d\n", from, to); struct ch_patterndata *pd = hyctx->scratch->patternData + id; if (hyctx->db->flags & CHIMERA_FLAG_GROUPS) { fillGroupsFromOvector(pd->match, rv, ovector); } else { rv = 0; } pd->groupCount = (u32)rv; // Insert the new match into the priority queue pq_insert_with(hyctx->pq, from, to, id); // Next scan starts at the first codepoint after the match. It's // possible that we have a vacuous match, in which case we must step // past it to ensure that we always progress.
    if (from != to) {
        startoffset = to;
    } else if (pattern->flags & CHIMERA_PATTERN_FLAG_UTF8) {
        startoffset = to + 1;
        while (startoffset < (int)full_length &&
               ((data[startoffset] & 0xc0) == UTF_CONT_BYTE_HEADER)) {
            ++startoffset;
        }
    } else {
        startoffset = to + 1;
    }

    pd->scanStart = startoffset;
    DEBUG_PRINTF("new offset %u\n", pd->scanStart);

    return CH_SUCCESS;
}

static
ch_error_t catchupPcre(struct HybridContext *hyctx, unsigned int id,
                       unsigned long long from, unsigned long long to) {
    ch_match_event_handler onEvent = hyctx->match_callback;
    void *userContext = hyctx->context;
    DEBUG_PRINTF("priority queue size %u\n", hyctx->pq->size);
    while (hyctx->pq->size) {
        u32 num_item = hyctx->pq->size;
        struct queue_item *item = pq_top(hyctx->pq->item);
        size_t top_from = item->from;
        size_t top_to = item->to;
        u32 top_id = item->id;

        if (top_to > to) {
            pq_insert_with(hyctx->pq, from, to, id);
            break;
        }
        pq_pop_nice(hyctx->pq);

        const struct ch_pattern *pattern = getPattern(hyctx->db, top_id);
        struct ch_patterndata *pd = hyctx->scratch->patternData + top_id;

        // Report match for pattern
        DEBUG_PRINTF("trigger match@%zu\n", top_to);
        ch_callback_t cbrv =
            onEvent(pattern->id, top_from, top_to, 0 /* flags */,
                    pd->groupCount, pd->match, userContext);

        if (cbrv == CH_CALLBACK_TERMINATE) {
            DEBUG_PRINTF("user callback told us to terminate scanning\n");
            return CH_SCAN_TERMINATED;
        } else if (cbrv == CH_CALLBACK_SKIP_PATTERN) {
            DEBUG_PRINTF("user callback told us to skip this pattern\n");
            pd->scanStart = hyctx->length;
            if (top_id == id) {
                break;
            }
            continue;
        }

        if (top_id == id) {
            break;
        }

        // Push a new match to replace the old one
        unsigned int start = pd->scanStart;
        unsigned int len = hyctx->length - pd->scanStart;
        if (hyctx->length >= pd->scanStart &&
            !(pattern->flags & CHIMERA_PATTERN_FLAG_SINGLEMATCH)) {
            DEBUG_PRINTF("get a new match item\n");
            int ret = scanPcre(hyctx, len, start, top_id);

            if (ret == CH_CALLBACK_TERMINATE) {
                DEBUG_PRINTF("user callback told us to terminate scanning\n");
                return CH_SCAN_TERMINATED;
            } else if (ret == CH_CALLBACK_SKIP_PATTERN) {
                DEBUG_PRINTF("user callback told us to skip this pattern\n");
                pd->scanStart = hyctx->length;
                ret = CH_SUCCESS;
            } else if (ret == CH_FAIL_INTERNAL) {
                return ret;
            }

            // No further match is found
            if (hyctx->pq->size == num_item - 1) {
                pd->scanStart = hyctx->length;
            }
        }
    }

    return CH_SUCCESS;
}

/** \brief Callback used for internal Hyperscan multi-matcher. */
static
int HS_CDECL multiCallback(unsigned int id, unsigned long long from,
                           unsigned long long to, UNUSED unsigned int flags,
                           void *ctx) {
    assert(ctx);
    struct HybridContext *hyctx = ctx;

    DEBUG_PRINTF("match for ID %u at offset %llu\n", id, to);
    assert(id < hyctx->db->patternCount);

    const struct ch_pattern *pattern = getPattern(hyctx->db, id);
    struct ch_patterndata *pd = hyctx->scratch->patternData + id;
    char needConfirm = pattern->fixedWidth == ~0U;

    if (needConfirm &&
        mmbit_isset(hyctx->scratch->active, hyctx->db->patternCount, id)) {
        if ((hyctx->db->flags & CHIMERA_FLAG_ALL_CONFIRM) &&
            mmbit_all(hyctx->scratch->active, hyctx->db->patternCount)) {
            return 1;
        }
        return 0;
    }
    // Store the fact that we've seen this bit.
    char already = mmbit_set(hyctx->scratch->active,
                             hyctx->db->patternCount, id);
    DEBUG_PRINTF("match from %u to %llu\n", pd->scanStart, to);

    if (!already) {
        pd->scanStart = 0;
    } else if (to < pd->scanStart + pattern->minWidth) {
        return 0;
    } else if (pattern->flags & CHIMERA_PATTERN_FLAG_SINGLEMATCH) {
        if ((hyctx->db->flags & CHIMERA_FLAG_ALL_SINGLE) &&
            mmbit_all(hyctx->scratch->active, hyctx->db->patternCount)) {
            return 1;
        }
        // Note: we may have unordered match from Hyperscan,
        // thus possibly get to < pd->scanStart.
        return 0;
    }

    int ret = HS_SUCCESS;
    unsigned int start = pd->scanStart;
    unsigned int len = hyctx->length - pd->scanStart;
    assert(hyctx->length >= pd->scanStart);
    const char *data = hyctx->data;
    if (needConfirm) {
        DEBUG_PRINTF("run confirm for the first time\n");
        ret = scanPcre(hyctx, len, start, id);
        hyctx->scratch->ret = ret;
        if (ret == CH_CALLBACK_TERMINATE) {
            DEBUG_PRINTF("user callback told us to terminate scanning\n");
            return HS_SCAN_TERMINATED;
        } else if (ret == CH_CALLBACK_SKIP_PATTERN) {
            DEBUG_PRINTF("user callback told us to skip this pattern\n");
            pd->scanStart = hyctx->length;
            ret = HS_SUCCESS;
            hyctx->scratch->ret = ret;
        } else if (ret == CH_FAIL_INTERNAL) {
            return ret;
        }
    } else {
        if (already) {
            DEBUG_PRINTF("catch up with new matches\n");
            ret = catchupPcre(hyctx, id, from, to);

            hyctx->scratch->ret = ret;
            if (pd->scanStart >= hyctx->length) {
                return ret;
            }
        }
        int startoffset = 0;
        // Next scan starts at the first codepoint after the match. It's
        // possible that we have a vacuous match, in which case we must step
        // past it to ensure that we always progress.
        if (from != to) {
            startoffset = to;
        } else if (pattern->flags & CHIMERA_PATTERN_FLAG_UTF8) {
            startoffset = to + 1;
            while (startoffset < (int)hyctx->length &&
                   ((data[startoffset] & 0xc0) == UTF_CONT_BYTE_HEADER)) {
                ++startoffset;
            }
        } else {
            startoffset = to + 1;
        }
        pd->scanStart = startoffset;
        int rv = 0;
        if (hyctx->db->flags & CHIMERA_FLAG_GROUPS) {
            ch_capture_t *groups = pd->match;
            groups->flags = CH_CAPTURE_FLAG_ACTIVE;
            groups->from = from;
            groups->to = to;
            rv = 1;
        }
        pd->groupCount = (u32)rv;
        pq_insert_with(hyctx->pq, from, to, id);
    }

    return ret;
}

static
hs_error_t scanHyperscan(struct HybridContext *hyctx, const char *data,
                         unsigned int length) {
    DEBUG_PRINTF("scanning %u bytes with Hyperscan\n", length);
    const struct ch_bytecode *hydb = hyctx->db;
    const hs_database_t *db = getHyperscanDatabase(hydb);
    hs_scratch_t *scratch = hyctx->scratch->multi_scratch;

    hs_error_t err = hs_scan(db, data, length, 0, scratch, multiCallback,
                             hyctx);

    return err;
}

/** \brief Init match priority queue.
 *
 * Add a first match offset for each pattern that is not supported by Hyperscan
 * with prefiltering.
 */
static really_inline
ch_error_t initQueue(struct HybridContext *hyctx, struct match_pq *pq) {
    const struct ch_bytecode *db = hyctx->db;

    u8 *active = hyctx->scratch->active;
    mmbit_clear(active, db->patternCount);

    // Init match queue size
    pq->size = 0;

    unsigned int length = hyctx->length;
    const u32 *unguarded = getUnguarded(db);
    for (u32 i = 0; i < db->unguardedCount; i++) {
        u32 patternId = unguarded[i];
        DEBUG_PRINTF("switch on unguarded pcre %u\n", patternId);
        mmbit_set(active, db->patternCount, patternId);

        DEBUG_PRINTF("get a new match item\n");
        int ret = scanPcre(hyctx, length, 0, patternId);

        struct ch_patterndata *pd = hyctx->scratch->patternData + patternId;
        if (ret == CH_CALLBACK_TERMINATE) {
            DEBUG_PRINTF("user callback told us to terminate scanning\n");
            return CH_SCAN_TERMINATED;
        } else if (ret == CH_CALLBACK_SKIP_PATTERN) {
            DEBUG_PRINTF("user callback told us to skip this pattern\n");
            pd->scanStart = length;
            ret = CH_SUCCESS;
        } else if (ret == CH_FAIL_INTERNAL) {
            return ret;
        }
    }

    return CH_SUCCESS;
}

static really_inline
ch_error_t ch_scan_i(const ch_database_t *hydb,
                     const char *data, unsigned int length,
                     UNUSED unsigned int flags,
                     ch_scratch_t *scratch,
                     ch_match_event_handler onEvent,
                     ch_error_event_handler onError,
                     void *userContext) {
    if (unlikely(!hydb || !scratch || !data)) {
        DEBUG_PRINTF("args invalid\n");
        return CH_INVALID;
    }
    ch_error_t ret = hydbIsValid(hydb);
    if (ret != CH_SUCCESS) {
        DEBUG_PRINTF("database invalid\n");
        return ret;
    }

    if (!ISALIGNED_CL(scratch)) {
        DEBUG_PRINTF("bad alignment %p\n", scratch);
        return CH_INVALID;
    }

    if (scratch->magic != CH_SCRATCH_MAGIC) {
        DEBUG_PRINTF("scratch invalid\n");
        return CH_INVALID;
    }

    if (unlikely(markScratchInUse(scratch))) {
        return CH_SCRATCH_IN_USE;
    }

    // Hyperscan underlying scratch and database validity will be checked by
    // the hs_scan() call, so no need to do it here.

    // PCRE takes the data region length in as an int, so this limits our block
    // size to INT_MAX.
    if (length > INT_MAX) {
        DEBUG_PRINTF("length invalid\n");
        unmarkScratchInUse(scratch);
        return CH_INVALID;
    }

    const struct ch_bytecode *db = ch_get_bytecode(hydb);

    scratch->pq.size = 0;
    scratch->ret = CH_SUCCESS;

    // Firstly, we run Hyperscan in block mode and add its matches into the
    // active list for subsequent confirmation with pcre.
    struct HybridContext hyctx = {
        .data = data,
        .length = length,
        .valid_utf8_highwater = 0,
        .db = db,
        .scratch = scratch,
        .pq = &scratch->pq,
        .match_callback = onEvent ? onEvent : null_onEvent,
        .error_callback = onError,
        .context = userContext
    };

    // Init priority queue.
    ret = initQueue(&hyctx, &scratch->pq);
    if (ret != CH_SUCCESS) {
        DEBUG_PRINTF("Chimera returned error %d\n", ret);
        unmarkScratchInUse(scratch);
        return ret;
    }

    if (!(db->flags & CHIMERA_FLAG_NO_MULTIMATCH)) {
        ret = scanHyperscan(&hyctx, data, length);
        // Errors from pcre scan.
        if (scratch->ret == CH_CALLBACK_TERMINATE) {
            DEBUG_PRINTF("Pcre terminates scan\n");
            unmarkScratchInUse(scratch);
            return CH_SCAN_TERMINATED;
        } else if (scratch->ret != CH_SUCCESS) {
            DEBUG_PRINTF("Pcre internal error\n");
            unmarkScratchInUse(scratch);
            return scratch->ret;
        }
        // Errors from Hyperscan scan. Note Chimera could terminate
        // Hyperscan callback on purpose so this is not counted as an error.
        if (ret != HS_SUCCESS && ret != HS_SCAN_TERMINATED) {
            assert(scratch->ret == CH_SUCCESS);
            DEBUG_PRINTF("Hyperscan returned error %d\n", ret);
            unmarkScratchInUse(scratch);
            return ret;
        }
    }

    DEBUG_PRINTF("Flush priority queue\n");
    // Catch up with PCRE and make up id and offsets as we don't really care
    // about their values
    ret = catchupPcre(&hyctx, ~0U, length, length);
    if (ret != CH_SUCCESS) {
        DEBUG_PRINTF("PCRE catch up returned error %d\n", ret);
        unmarkScratchInUse(scratch);
        return ret;
    }

    unmarkScratchInUse(scratch);
    return CH_SUCCESS;
}

HS_PUBLIC_API
ch_error_t HS_CDECL ch_scan(const ch_database_t *hydb, const char *data,
                            unsigned int length, unsigned int flags,
                            ch_scratch_t *scratch,
                            ch_match_event_handler onEvent,
                            ch_error_event_handler onError, void *userContext) {
    ch_error_t ret = ch_scan_i(hydb, data, length, flags, scratch, onEvent,
                               onError, userContext);

    return ret;
}

HS_PUBLIC_API
const char * HS_CDECL ch_version(void) {
    return HS_VERSION_STRING;
}
vectorscan-5.4.11/chimera/ch_runtime.h000066400000000000000000000302541452711272000176740ustar00rootroot00000000000000/*
 * Copyright (c) 2018, Intel Corporation
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions are met:
 *
 *  * Redistributions of source code must retain the above copyright notice,
 *    this list of conditions and the following disclaimer.
 *  * Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 *  * Neither the name of Intel Corporation nor the names of its contributors
 *    may be used to endorse or promote products derived from this software
 *    without specific prior written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
 * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
 * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
 * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
 * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
 * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
 * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
 * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
 * POSSIBILITY OF SUCH DAMAGE.
 */

#ifndef CH_RUNTIME_H_
#define CH_RUNTIME_H_

#include

/**
 * @file
 * @brief The Chimera runtime API definition.
 *
 * Chimera is a hybrid of the Hyperscan and PCRE regular expression engines.
 *
 * This header contains functions for using compiled Chimera databases for
 * scanning data at runtime.
 */

#include "hs_common.h"

#ifdef __cplusplus
extern "C" {
#endif

struct ch_scratch;

/**
 * A Chimera scratch space.
 */
typedef struct ch_scratch ch_scratch_t;

/**
 * Callback return value used to tell the Chimera matcher what to do after
 * processing this match.
 */
typedef int ch_callback_t;

/**
 * @defgroup CH_CALLBACK ch_callback_t values
 *
 * @{
 */

/**
 * Continue matching.
 */
#define CH_CALLBACK_CONTINUE 0

/**
 * Terminate matching.
 */
#define CH_CALLBACK_TERMINATE 1

/**
 * Skip remaining matches for this ID and continue.
 */
#define CH_CALLBACK_SKIP_PATTERN 2

/** @} */

/**
 * Type used to differentiate the errors raised with the @ref
 * ch_error_event_handler callback.
 */
typedef int ch_error_event_t;

/**
 * @defgroup CH_ERROR_EVENT ch_error_event_t values
 *
 * @{
 */

/**
 * PCRE hits its match limit and reports PCRE_ERROR_MATCHLIMIT.
 */
#define CH_ERROR_MATCHLIMIT 1

/**
 * PCRE hits its recursion limit and reports PCRE_ERROR_RECURSIONLIMIT.
 */
#define CH_ERROR_RECURSIONLIMIT 2

/** @} */

/**
 * Structure representing a captured subexpression within a match. An array of
 * these structures corresponding to capture groups in order is passed to the
 * callback on match, with active structures identified by the
 * CH_CAPTURE_FLAG_ACTIVE flag.
 */
typedef struct ch_capture {
    /**
     * The flags indicating if this structure is active.
     */
    unsigned int flags;

    /**
     * Offset at which this capture group begins.
     */
    unsigned long long from;

    /**
     * Offset at which this capture group ends.
     */
    unsigned long long to;
} ch_capture_t;

/**
 * @defgroup CH_CAPTURE ch_capture_t flags
 *
 * These flags are used in @ref ch_capture_t::flags to indicate if this
 * structure is active.
 *
 * @{
 */

/**
 * Flag indicating that a particular capture group is inactive, used in @ref
 * ch_capture_t::flags.
 */
#define CH_CAPTURE_FLAG_INACTIVE 0

/**
 * Flag indicating that a particular capture group is active, used in @ref
 * ch_capture_t::flags.
 */
#define CH_CAPTURE_FLAG_ACTIVE 1

/** @} */

/**
 * Definition of the match event callback function type.
 *
 * A callback function matching the defined type should be provided by the
 * application calling @ref ch_scan().
 *
 * This callback function will be invoked whenever a match is located in the
 * target data during the execution of a scan. The details of the match are
 * passed in as parameters to the callback function, and the callback function
 * should return a value indicating whether or not matching should continue on
 * the target data. If no callbacks are desired from a scan call, NULL may be
 * provided in order to suppress match production.
 *
 * @param id
 *      The ID number of the expression that matched. If the expression was a
 *      single expression compiled with @ref ch_compile(), this value will be
 *      zero.
 *
 * @param from
 *      The offset of the first byte that matches the expression.
 *
 * @param to
 *      The offset after the last byte that matches the expression.
 *
 * @param flags
 *      This is provided for future use and is unused at present.
 *
 * @param size
 *      The number of valid entries pointed to by the captured parameter.
 *
 * @param captured
 *      A pointer to an array of @ref ch_capture_t structures that
 *      contain the start and end offsets of the entire pattern match and
 *      of each captured subexpression.
 *
 * @param ctx
 *      The pointer supplied by the user to the @ref ch_scan() function.
 *
 * @return
 *      The callback can return @ref CH_CALLBACK_TERMINATE to stop matching.
 *      A return value of @ref CH_CALLBACK_CONTINUE continues matching (with
 *      the current pattern too, if it is configured to produce multiple
 *      matches per pattern), while a return value of
 *      @ref CH_CALLBACK_SKIP_PATTERN ceases matching this pattern but
 *      continues matching the next pattern.
 */
typedef ch_callback_t (HS_CDECL *ch_match_event_handler)(unsigned int id,
                                                 unsigned long long from,
                                                 unsigned long long to,
                                                 unsigned int flags,
                                                 unsigned int size,
                                                 const ch_capture_t *captured,
                                                 void *ctx);

/**
 * Definition of the Chimera error event callback function type.
 *
 * A callback function matching the defined type may be provided by the
 * application calling the @ref ch_scan function. This callback function
 * will be invoked when an error event occurs during matching; this indicates
 * that some matches for a given expression may not be reported.
 *
 * @param error_type
 *      The type of error event that occurred. Currently these errors
 *      correspond to resource limits on PCRE backtracking:
 *      @ref CH_ERROR_MATCHLIMIT and @ref CH_ERROR_RECURSIONLIMIT.
 *
 * @param id
 *      The ID number of the expression that raised the error.
 *
 * @param info
 *      Event-specific data, for future use. Currently unused.
 *
 * @param ctx
 *      The context pointer supplied by the user to the @ref ch_scan
 *      function.
 *
 * @return
 *      The callback can return @ref CH_CALLBACK_SKIP_PATTERN to cease matching
 *      this pattern but continue matching the next pattern. Otherwise,
 *      returning @ref CH_CALLBACK_TERMINATE stops matching for all patterns.
 */
typedef ch_callback_t (HS_CDECL *ch_error_event_handler)(
                                                ch_error_event_t error_type,
                                                unsigned int id, void *info,
                                                void *ctx);

/**
 * The block regular expression scanner.
 *
 * This is the function call in which the actual pattern matching takes place
 * for block-mode pattern databases.
 *
 * @param db
 *      A compiled pattern database.
 *
 * @param data
 *      Pointer to the data to be scanned.
 *
 * @param length
 *      The number of bytes to scan.
 *
 * @param flags
 *      Flags modifying the behaviour of this function. This parameter is
 *      provided for future use and is unused at present.
 *
 * @param scratch
 *      A per-thread scratch space allocated by @ref ch_alloc_scratch() for this
 *      database.
 *
 * @param onEvent
 *      Pointer to a match event callback function. If a NULL pointer is given,
 *      no matches will be returned.
 *
 * @param onError
 *      Pointer to an error event callback function. If a NULL pointer is
 *      given, @ref CH_ERROR_MATCHLIMIT and @ref CH_ERROR_RECURSIONLIMIT
 *      errors will be ignored and matching will continue.
 *
 * @param context
 *      The user-defined pointer which will be passed to the callback function.
 *
 * @return
 *      Returns @ref CH_SUCCESS on success; @ref CH_SCAN_TERMINATED if the
 *      match callback indicated that scanning should stop; other values on
 *      error.
 */
ch_error_t HS_CDECL ch_scan(const ch_database_t *db, const char *data,
                            unsigned int length, unsigned int flags,
                            ch_scratch_t *scratch,
                            ch_match_event_handler onEvent,
                            ch_error_event_handler onError,
                            void *context);

/**
 * Allocate a "scratch" space for use by Chimera.
 *
 * This is required for runtime use, and one scratch space per thread, or
 * concurrent caller, is required. Any allocator callback set by @ref
 * ch_set_scratch_allocator() or @ref ch_set_allocator() will be used by this
 * function.
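 *
 * A minimal usage sketch, for illustration only: it assumes @c database was
 * produced by @ref ch_compile(), and @c buf, @c buf_len and @c on_match (a
 * @ref ch_match_event_handler) are hypothetical names supplied by the
 * application. Error handling is abbreviated. A second concurrent caller
 * would obtain its own copy with @ref ch_clone_scratch().
 *
 * @code
 *     ch_scratch_t *scratch = NULL; // NULL requests a fresh allocation
 *     if (ch_alloc_scratch(database, &scratch) != CH_SUCCESS) {
 *         return; // allocation failed or the database was invalid
 *     }
 *
 *     // No error handler and no context pointer in this sketch.
 *     ch_error_t err = ch_scan(database, buf, buf_len, 0, scratch,
 *                              on_match, NULL, NULL);
 *     if (err != CH_SUCCESS && err != CH_SCAN_TERMINATED) {
 *         // handle the scan error here
 *     }
 *
 *     ch_free_scratch(scratch);
 * @endcode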
 *
 * @param db
 *      The database, as produced by @ref ch_compile().
 *
 * @param scratch
 *      On first allocation, a pointer to NULL should be provided so a new
 *      scratch can be allocated. If a scratch block has been previously
 *      allocated, then a pointer to it should be passed back in to check
 *      whether it is valid for this database. If a new scratch block is
 *      required, the original will be freed and the new one returned;
 *      otherwise the previous scratch block will be returned. On success, the
 *      scratch block will be suitable for use with the provided database in
 *      addition to any databases that the original scratch space was suitable
 *      for.
 *
 * @return
 *      @ref CH_SUCCESS on successful allocation; @ref CH_NOMEM if the
 *      allocation fails. Other errors may be returned if invalid parameters
 *      are specified.
 */
ch_error_t HS_CDECL ch_alloc_scratch(const ch_database_t *db,
                                     ch_scratch_t **scratch);

/**
 * Allocate a scratch space that is a clone of an existing scratch space.
 *
 * This is useful when multiple concurrent threads will be using the same set
 * of compiled databases, and another scratch space is required. Any allocator
 * callback set by @ref ch_set_scratch_allocator() or @ref ch_set_allocator()
 * will be used by this function.
 *
 * @param src
 *      The existing @ref ch_scratch_t to be cloned.
 *
 * @param dest
 *      A pointer to the new scratch space will be returned here.
 *
 * @return
 *      @ref CH_SUCCESS on success; @ref CH_NOMEM if the allocation fails.
 *      Other errors may be returned if invalid parameters are specified.
 */
ch_error_t HS_CDECL ch_clone_scratch(const ch_scratch_t *src,
                                     ch_scratch_t **dest);

/**
 * Provides the size of the given scratch space.
 *
 * @param scratch
 *      A per-thread scratch space allocated by @ref ch_alloc_scratch() or @ref
 *      ch_clone_scratch().
 *
 * @param scratch_size
 *      On success, the size of the scratch space in bytes is placed in this
 *      parameter.
 *
 * @return
 *      @ref CH_SUCCESS on success, other values on failure.
 */
ch_error_t HS_CDECL ch_scratch_size(const ch_scratch_t *scratch,
                                    size_t *scratch_size);

/**
 * Free a scratch block previously allocated by @ref ch_alloc_scratch() or @ref
 * ch_clone_scratch().
 *
 * The free callback set by @ref ch_set_scratch_allocator() or @ref
 * ch_set_allocator() will be used by this function.
 *
 * @param scratch
 *      The scratch block to be freed. NULL may also be safely provided.
 *
 * @return
 *      @ref CH_SUCCESS on success, other values on failure.
 */
ch_error_t HS_CDECL ch_free_scratch(ch_scratch_t *scratch);

#ifdef __cplusplus
} /* extern "C" */
#endif

#endif /* CH_RUNTIME_H_ */
vectorscan-5.4.11/chimera/ch_scratch.c000066400000000000000000000234721452711272000176370ustar00rootroot00000000000000/*
 * Copyright (c) 2018, Intel Corporation
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions are met:
 *
 *  * Redistributions of source code must retain the above copyright notice,
 *    this list of conditions and the following disclaimer.
 *  * Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 *  * Neither the name of Intel Corporation nor the names of its contributors
 *    may be used to endorse or promote products derived from this software
 *    without specific prior written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
 * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
 * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
 * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
 * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
 * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
 * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
 * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
 * POSSIBILITY OF SUCH DAMAGE.
 */

/** \file
 * \brief Chimera: scratch space alloc.
 */

#include

#include "allocator.h"
#include "ch.h"
#include "hs.h"
#include "hs_internal.h"
#include "ue2common.h"
#include "ch_alloc.h"
#include "ch_internal.h"
#include "ch_scratch.h"
#include "ch_database.h"

static
size_t getPatternDataSize(const ch_scratch_t *s) {
    size_t numCapturingStructs =
        s->patternCount * (s->maxCaptureGroups + 1);
    return (sizeof(struct ch_patterndata) * s->patternCount) +
           alignof(struct ch_capture) + // padding
           (sizeof(struct ch_capture) * numCapturingStructs);
}

static
void initPatternData(const ch_scratch_t *s) {
    // ch_capture array is aligned, directly after the patterndata array.
    char *ptr = (char *)s->patternData +
                (sizeof(struct ch_patterndata) * s->patternCount);
    struct ch_capture *cap = (struct ch_capture *)
        (ROUNDUP_PTR(ptr, alignof(struct ch_capture)));

    for (u32 i = 0; i < s->patternCount; i++) {
        struct ch_patterndata *pd = &s->patternData[i];
        pd->match = cap;
        DEBUG_PRINTF("pattern %u: pd=%p, match=%p\n", i, pd, pd->match);
        cap += (s->maxCaptureGroups + 1);
    }
}

static
ch_error_t alloc_scratch(const ch_scratch_t *proto, ch_scratch_t **scratch) {
    size_t ovectorSize = (proto->maxCaptureGroups + 1) * sizeof(int) * 3;
    size_t capturedSize =
        sizeof(struct ch_capture) * (proto->maxCaptureGroups + 1);
    size_t patternDataSize = getPatternDataSize(proto);
    size_t activeSize = proto->activeSize;
    size_t queueSize = proto->patternCount * sizeof(struct queue_item);

    // max padding for alignment below.
    size_t padding = alignof(int) + alignof(struct ch_capture) +
                     alignof(struct ch_patterndata) +
                     alignof(struct queue_item);

    size_t allocSize = sizeof(ch_scratch_t) + ovectorSize + capturedSize +
                       patternDataSize + activeSize + queueSize + padding +
                       256; /* padding for cacheline alignment */
    ch_scratch_t *s;
    ch_scratch_t *s_tmp = ch_scratch_alloc(allocSize);
    ch_error_t err = ch_check_alloc(s_tmp);
    if (err != CH_SUCCESS) {
        ch_scratch_free(s_tmp);
        *scratch = NULL;
        return err;
    }

    memset(s_tmp, 0, allocSize);
    s = ROUNDUP_PTR(s_tmp, 64);
    // Set ordinary members.
    *s = *proto;

    s->magic = CH_SCRATCH_MAGIC;
    s->in_use = 0;
    s->scratch_alloc = (char *)s_tmp;

    // Set pointers internal to allocation.
    char *ptr = (char *)s + sizeof(*s);
    ptr = ROUNDUP_PTR(ptr, alignof(int));
    s->ovector = (int *)ptr;
    ptr += ovectorSize;

    ptr = ROUNDUP_PTR(ptr, alignof(struct ch_capture));
    s->captured = (struct ch_capture *)ptr;
    ptr += capturedSize;

    ptr = ROUNDUP_PTR(ptr, alignof(struct ch_patterndata));
    s->patternData = (struct ch_patterndata *)ptr;
    ptr += patternDataSize;

    // Pre-fill pattern data, setting captureOffsets
    initPatternData(s);

    ptr = ROUNDUP_PTR(ptr, alignof(struct queue_item));
    s->pq.item = (struct queue_item *)ptr;
    ptr += queueSize;

    s->active = (u8 *)ptr;

    // Store size.
    s->scratchSize = allocSize;

    // We should never overrun our allocation.
    assert((ptr + activeSize) - (char *)s <= (ptrdiff_t)allocSize);

    *scratch = s;
    return CH_SUCCESS;
}

HS_PUBLIC_API
ch_error_t HS_CDECL ch_alloc_scratch(const ch_database_t *hydb,
                                     ch_scratch_t **scratch) {
    if (!hydb || !scratch) {
        DEBUG_PRINTF("invalid args\n");
        return CH_INVALID;
    }

    DEBUG_PRINTF("hydb=%p, &scratch=%p\n", hydb, scratch);
    ch_error_t rv = hydbIsValid(hydb);
    if (rv != CH_SUCCESS) {
        DEBUG_PRINTF("invalid database\n");
        return rv;
    }

    if (*scratch != NULL) {
        /* has to be aligned before we can do anything with it */
        if (!ISALIGNED_CL(*scratch)) {
            return CH_INVALID;
        }
        if ((*scratch)->magic != CH_SCRATCH_MAGIC) {
            return CH_INVALID;
        }
        if (markScratchInUse(*scratch)) {
            return CH_SCRATCH_IN_USE;
        }
    }

    // We allocate a prototype of the scratch header to do our sizing with.
    ch_scratch_t *proto;
    ch_scratch_t *proto_tmp = ch_scratch_alloc(sizeof(ch_scratch_t) + 256);
    ch_error_t proto_ret = ch_check_alloc(proto_tmp);
    if (proto_ret != CH_SUCCESS) {
        ch_scratch_free(proto_tmp);
        if (*scratch) {
            // *scratch is the cacheline-aligned view of the block; release
            // its Hyperscan scratch and the original allocation instead.
            hs_free_scratch((*scratch)->multi_scratch);
            ch_scratch_free((*scratch)->scratch_alloc);
        }
        *scratch = NULL;
        return proto_ret;
    }

    proto = ROUNDUP_PTR(proto_tmp, 64);

    int resize = 0;
    if (*scratch) {
        *proto = **scratch;
    } else {
        memset(proto, 0, sizeof(*proto));
        resize = 1;
    }
    proto->scratch_alloc = (char *)proto_tmp;

    const struct ch_bytecode *db = ch_get_bytecode(hydb);

    if (db->maxCaptureGroups > proto->maxCaptureGroups) {
        proto->maxCaptureGroups = db->maxCaptureGroups;
        resize = 1;
    }

    if (db->patternCount > proto->patternCount) {
        proto->patternCount = db->patternCount;
        proto->activeSize = db->activeSize;
        resize = 1;
    }

    if (resize) {
        if (*scratch) {
            ch_scratch_free((*scratch)->scratch_alloc);
        }

        ch_error_t alloc_ret = alloc_scratch(proto, scratch);
        ch_scratch_free(proto_tmp);
        if (alloc_ret != CH_SUCCESS) {
            *scratch = NULL;
            return alloc_ret;
        }
    } else {
        ch_scratch_free(proto_tmp);
        unmarkScratchInUse(*scratch);
    }

    if (db->flags & CHIMERA_FLAG_NO_MULTIMATCH) {
        return CH_SUCCESS;
    }

    // We may still have to realloc the underlying Hyperscan scratch.
    rv = hs_alloc_scratch(getHyperscanDatabase(db),
                          &(*scratch)->multi_scratch);
    if (rv != HS_SUCCESS) {
        DEBUG_PRINTF("hs_alloc_scratch for multi_scratch failed\n");
        hs_free_scratch((*scratch)->multi_scratch);
        ch_scratch_free((*scratch)->scratch_alloc);
        *scratch = NULL;
        return rv;
    }

    return CH_SUCCESS;
}

HS_PUBLIC_API
ch_error_t HS_CDECL ch_clone_scratch(const ch_scratch_t *src,
                                     ch_scratch_t **dest) {
    if (!dest || !src || !ISALIGNED_CL(src) ||
        src->magic != CH_SCRATCH_MAGIC) {
        DEBUG_PRINTF("scratch invalid\n");
        return CH_INVALID;
    }

    ch_error_t ret = alloc_scratch(src, dest);
    if (ret != CH_SUCCESS) {
        DEBUG_PRINTF("alloc_scratch failed\n");
        *dest = NULL;
        return ret;
    }

    if (src->multi_scratch) {
        (*dest)->multi_scratch = NULL;
        ret = hs_clone_scratch(src->multi_scratch, &(*dest)->multi_scratch);
        if (ret != HS_SUCCESS) {
            DEBUG_PRINTF("hs_clone_scratch(multi_scratch,...) failed\n");
            // Free the original allocation, not the cacheline-aligned view.
            ch_scratch_free((*dest)->scratch_alloc);
            *dest = NULL;
            return ret;
        }
    }

    return CH_SUCCESS;
}

HS_PUBLIC_API
ch_error_t HS_CDECL ch_free_scratch(ch_scratch_t *scratch) {
    ch_error_t ret = CH_SUCCESS;
    if (scratch) {
        /* has to be aligned before we can do anything with it */
        if (!ISALIGNED_CL(scratch)) {
            return CH_INVALID;
        }
        if (scratch->magic != CH_SCRATCH_MAGIC) {
            return CH_INVALID;
        }
        if (markScratchInUse(scratch)) {
            return CH_SCRATCH_IN_USE;
        }

        if (scratch->multi_scratch) {
            ret = hs_free_scratch(scratch->multi_scratch);
        }

        scratch->magic = 0;
        assert(scratch->scratch_alloc);
        DEBUG_PRINTF("scratch %p is really at %p : freeing\n", scratch,
                     scratch->scratch_alloc);
        ch_scratch_free(scratch->scratch_alloc);
    }

    return ret;
}

/** Not public, but used for info from our internal tools. Note that in the
 * hybrid matcher the scratch is definitely not a contiguous memory region.
 */
HS_PUBLIC_API
ch_error_t HS_CDECL ch_scratch_size(const ch_scratch_t *scratch, size_t *size) {
    ch_error_t ret = CH_SUCCESS;
    if (!size || !scratch || !ISALIGNED_CL(scratch) ||
        scratch->magic != CH_SCRATCH_MAGIC) {
        return CH_INVALID;
    } else {
        size_t multi_size = 0;

        if (scratch->multi_scratch) {
            ret = hs_scratch_size(scratch->multi_scratch, &multi_size);
        }
        if (ret) {
            multi_size = 0;
        }

        *size = scratch->scratchSize + multi_size;
    }

    return ret;
}
vectorscan-5.4.11/chimera/ch_scratch.h000066400000000000000000000104061452711272000176350ustar00rootroot00000000000000/*
 * Copyright (c) 2018, Intel Corporation
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions are met:
 *
 *  * Redistributions of source code must retain the above copyright notice,
 *    this list of conditions and the following disclaimer.
 *  * Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 *  * Neither the name of Intel Corporation nor the names of its contributors
 *    may be used to endorse or promote products derived from this software
 *    without specific prior written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
 * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
 * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
 * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
 * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
 * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
 * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
 * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
 * POSSIBILITY OF SUCH DAMAGE.
 */

/** \file
 * \brief Scratch and associated data structures.
 *
 * This header gets pulled into many places (many deep, slow to compile
 * places). Try to keep the included headers under control.
 */

#ifndef CH_SCRATCH_H_
#define CH_SCRATCH_H_

#include "ch_common.h"
#include "ch_runtime.h"

#ifdef __cplusplus
extern "C" {
#endif

#define CH_SCRATCH_MAGIC 0x554F4259 //!< Magic number stored in \ref ch_scratch

struct queue_item {
    int from; /**< used to store the start location. */
    int to; /**< used to store the current location. */
    u32 id; /**< pattern index. */
};

struct match_pq {
    struct queue_item *item;
    u32 size; /**< current size of the priority queue */
};

/** \brief Information about a pattern stored at runtime when a match is
 * encountered. */
struct ch_patterndata {
    struct ch_capture *match; //!< buffered group info
    u32 groupCount; //!< number of capturing groups
    u32 scanStart; //!< start of match window (still to be single-scanned).
};

/** \brief Scratch space header for Chimera. */
struct ch_scratch {
    u32 magic; //!< must be \ref CH_SCRATCH_MAGIC
    u8 in_use; /**< non-zero when being used by an API call. */
    struct hs_scratch *multi_scratch; //!< for Hyperscan scratch.
    int *ovector; //!< maximally-sized ovector for PCRE usage.
    struct ch_capture *captured; //!< max-sized capture group struct.
    u8 *active; //!< active multibit.
    struct ch_patterndata *patternData; //!< per-pattern match data, indexed by
                                        // pattern ID.
struct match_pq pq; //!< priority queue to ensure matching ordering u32 patternCount; //!< number of patterns, used to size active multibit u32 activeSize; //!< size of active multibit u32 maxCaptureGroups; //!< largest num of capturing groups required u32 scratchSize; //!< size of allocation int ret; //!< return value in Hyperscan callback char *scratch_alloc; /* user allocated scratch object */ }; /** * \brief Mark scratch as in use. * * Returns non-zero if it was already in use, zero otherwise. */ static really_inline char markScratchInUse(struct ch_scratch *scratch) { DEBUG_PRINTF("marking scratch as in use\n"); assert(scratch && scratch->magic == CH_SCRATCH_MAGIC); if (scratch->in_use) { DEBUG_PRINTF("scratch already in use!\n"); return 1; } scratch->in_use = 1; return 0; } /** * \brief Mark scratch as no longer in use. */ static really_inline void unmarkScratchInUse(struct ch_scratch *scratch) { DEBUG_PRINTF("marking scratch as not in use\n"); assert(scratch && scratch->magic == CH_SCRATCH_MAGIC); assert(scratch->in_use == 1); scratch->in_use = 0; } #ifdef __cplusplus } /* extern "C" */ #endif #endif /* CH_SCRATCH_H_ */ vectorscan-5.4.11/chimera/libch.pc.in000066400000000000000000000005501452711272000173740ustar00rootroot00000000000000prefix=@CMAKE_INSTALL_PREFIX@ exec_prefix=@CMAKE_INSTALL_PREFIX@ libdir=@CMAKE_INSTALL_PREFIX@/@CMAKE_INSTALL_LIBDIR@ includedir=@CMAKE_INSTALL_PREFIX@/@CMAKE_INSTALL_INCLUDEDIR@ Name: libch Description: Intel(R) Chimera Library Version: @HS_VERSION@ Requires.private: libhs Libs: -L${libdir} -lchimera Libs.private: @PRIVATE_LIBS@ Cflags: -I${includedir}/hs vectorscan-5.4.11/cmake/000077500000000000000000000000001452711272000150325ustar00rootroot00000000000000vectorscan-5.4.11/cmake/archdetect.cmake000066400000000000000000000071171452711272000201500ustar00rootroot00000000000000if (USE_CPU_NATIVE) # Detect best GNUCC_ARCH to tune for if (CMAKE_COMPILER_IS_GNUCC) message(STATUS "gcc version ${CMAKE_C_COMPILER_VERSION}") # If gcc 
doesn't recognise the host cpu, then mtune=native becomes # generic, which isn't very good in some cases. march=native looks at # cpuid info and then chooses the best microarch it can (and replaces # the flag), so use that for tune. set(TUNE_FLAG "mtune") set(GNUCC_TUNE "") message(STATUS "ARCH_FLAG '${ARCH_FLAG}' '${GNUCC_ARCH}', TUNE_FLAG '${TUNE_FLAG}' '${GNUCC_TUNE}' ") # arg1 might exist if using ccache string (STRIP "${CMAKE_C_COMPILER_ARG1}" CC_ARG1) set (EXEC_ARGS ${CC_ARG1} -c -Q --help=target -${ARCH_FLAG}=native -${TUNE_FLAG}=native) execute_process(COMMAND ${CMAKE_C_COMPILER} ${EXEC_ARGS} OUTPUT_VARIABLE _GCC_OUTPUT) set(_GCC_OUTPUT_TUNE ${_GCC_OUTPUT}) string(FIND "${_GCC_OUTPUT}" "${ARCH_FLAG}=" POS) string(SUBSTRING "${_GCC_OUTPUT}" ${POS} -1 _GCC_OUTPUT) string(REGEX REPLACE "${ARCH_FLAG}=[ \t]*([^ \n]*)[ \n].*" "\\1" GNUCC_ARCH "${_GCC_OUTPUT}") string(FIND "${_GCC_OUTPUT_TUNE}" "${TUNE_FLAG}=" POS_TUNE) string(SUBSTRING "${_GCC_OUTPUT_TUNE}" ${POS_TUNE} -1 _GCC_OUTPUT_TUNE) string(REGEX REPLACE "${TUNE_FLAG}=[ \t]*([^ \n]*)[ \n].*" "\\1" GNUCC_TUNE "${_GCC_OUTPUT_TUNE}") message(STATUS "ARCH_FLAG '${ARCH_FLAG}' '${GNUCC_ARCH}', TUNE_FLAG '${TUNE_FLAG}' '${GNUCC_TUNE}' ") # test the parsed flag set (EXEC_ARGS ${CC_ARG1} -E - -${ARCH_FLAG}=${GNUCC_ARCH} -${TUNE_FLAG}=${GNUCC_TUNE}) execute_process(COMMAND ${CMAKE_C_COMPILER} ${EXEC_ARGS} OUTPUT_QUIET ERROR_QUIET INPUT_FILE /dev/null RESULT_VARIABLE GNUCC_TUNE_TEST) if (NOT GNUCC_TUNE_TEST EQUAL 0) message(WARNING "Something went wrong determining gcc tune: -mtune=${GNUCC_TUNE} not valid, falling back to -mtune=native") set(GNUCC_TUNE native) else() set(GNUCC_TUNE ${GNUCC_TUNE}) message(STATUS "gcc will tune for ${GNUCC_ARCH}, ${GNUCC_TUNE}") endif() elseif (CMAKE_COMPILER_IS_CLANG) if (ARCH_IA32 OR ARCH_X86_64) set(GNUCC_ARCH x86_64_v2) set(TUNE_FLAG generic) elseif(ARCH_AARCH64) if (BUILD_SVE2_BITPERM) set(GNUCC_ARCH ${SVE2_BITPERM_ARCH}) elseif (BUILD_SVE2) set(GNUCC_ARCH ${SVE2_ARCH}) elseif 
(BUILD_SVE) set(GNUCC_ARCH ${SVE_ARCH}) else () set(GNUCC_ARCH ${ARMV8_ARCH}) endif() set(TUNE_FLAG generic) elseif(ARCH_ARM32) set(GNUCC_ARCH armv7a) set(TUNE_FLAG generic) else() set(GNUCC_ARCH native) set(TUNE_FLAG generic) endif() message(STATUS "clang will tune for ${GNUCC_ARCH}, ${TUNE_FLAG}") endif() else() if (ARCH_IA32 OR ARCH_X86_64) set(GNUCC_ARCH native) set(TUNE_FLAG generic) elseif(ARCH_AARCH64) if (BUILD_SVE2_BITPERM) set(GNUCC_ARCH ${SVE2_BITPERM_ARCH}) elseif (BUILD_SVE2) set(GNUCC_ARCH ${SVE2_ARCH}) elseif (BUILD_SVE) set(GNUCC_ARCH ${SVE_ARCH}) else () set(GNUCC_ARCH ${ARMV8_ARCH}) endif() set(TUNE_FLAG generic) elseif(ARCH_ARM32) set(GNUCC_ARCH armv7a) set(TUNE_FLAG generic) else() set(GNUCC_ARCH power9) set(TUNE_FLAG power9) endif() endif() vectorscan-5.4.11/cmake/attrib.cmake000066400000000000000000000005521452711272000173230ustar00rootroot00000000000000# tests for compiler properties # set -Werror so we can't ignore unused attribute warnings set (CMAKE_REQUIRED_FLAGS "-Werror") CHECK_C_SOURCE_COMPILES(" int foo(int) __attribute__ ((ifunc(\"foo_i\"))); int f1(int i) { return i; } void (*foo_i()) { return f1; } int main(void) { return 0; } " HAS_C_ATTR_IFUNC) unset(CMAKE_REQUIRED_FLAGS) vectorscan-5.4.11/cmake/backtrace.cmake000066400000000000000000000040041452711272000177510ustar00rootroot00000000000000# The `backtrace' function is available on Linux via glibc, and on FreeBSD if # the 'libexecinfo' package is installed. 
CHECK_C_SOURCE_COMPILES( "#include <execinfo.h>\n#include <stddef.h>\nint main () { backtrace(NULL, 0); }" BACKTRACE_LIBC) if(BACKTRACE_LIBC) set(HAVE_BACKTRACE TRUE) set(BACKTRACE_CFLAGS "") set(BACKTRACE_LDFLAGS "") endif() if(NOT BACKTRACE_LIBC) # FreeBSD 10 has backtrace but requires libexecinfo list(INSERT CMAKE_REQUIRED_LIBRARIES 0 "-lexecinfo") CHECK_C_SOURCE_COMPILES( "#include <execinfo.h>\n#include <stddef.h>\nint main () { backtrace(NULL, 0); }" BACKTRACE_LIBEXECINFO) list(REMOVE_ITEM CMAKE_REQUIRED_LIBRARIES "-lexecinfo") if(BACKTRACE_LIBEXECINFO) set(HAVE_BACKTRACE TRUE) set(BACKTRACE_CFLAGS "") set(BACKTRACE_LDFLAGS "-lexecinfo") else() # older FreeBSD requires it from ports list(INSERT CMAKE_REQUIRED_INCLUDES 0 "/usr/local/include") list(INSERT CMAKE_REQUIRED_LIBRARIES 0 "-L/usr/local/lib -lexecinfo") CHECK_C_SOURCE_COMPILES( "#include <execinfo.h>\n#include <stddef.h>\nint main () { backtrace(NULL, 0); }" BACKTRACE_LIBEXECINFO_LOCAL) list(REMOVE_ITEM CMAKE_REQUIRED_INCLUDES "/usr/local/include") list(REMOVE_ITEM CMAKE_REQUIRED_LIBRARIES "-L/usr/local/lib -lexecinfo") if(BACKTRACE_LIBEXECINFO_LOCAL) set(HAVE_BACKTRACE TRUE) set(BACKTRACE_CFLAGS "-I/usr/local/include") set(BACKTRACE_LDFLAGS "-L/usr/local/lib -lexecinfo") endif() endif() endif() if(HAVE_BACKTRACE) CHECK_C_COMPILER_FLAG(-rdynamic HAS_RDYNAMIC) if(HAS_RDYNAMIC) list(INSERT BACKTRACE_LDFLAGS 0 -rdynamic) endif() else() set(BACKTRACE_CFLAGS "") set(BACKTRACE_LDFLAGS "") endif() # cmake scope fun set(HAVE_BACKTRACE ${HAVE_BACKTRACE} CACHE BOOL INTERNAL) set(BACKTRACE_CFLAGS ${BACKTRACE_CFLAGS} CACHE STRING INTERNAL) set(BACKTRACE_LDFLAGS ${BACKTRACE_LDFLAGS} CACHE STRING INTERNAL) vectorscan-5.4.11/cmake/boost.cmake000066400000000000000000000045171452711272000171710ustar00rootroot00000000000000# Various checks related to Boost set(BOOST_USE_STATIC_LIBS OFF) set(BOOST_USE_MULTITHREADED OFF) set(BOOST_USE_STATIC_RUNTIME OFF) if (HAVE_LIBCPP) # we need a more recent boost for libc++ set(BOOST_MINVERSION 1.61.0) else () set(BOOST_MINVERSION 1.57.0) endif ()
set(BOOST_NO_BOOST_CMAKE ON) unset(Boost_INCLUDE_DIR CACHE) # we might have boost in tree, so provide a hint and try again set(BOOST_INCLUDEDIR "${PROJECT_SOURCE_DIR}/include") find_package(Boost ${BOOST_MINVERSION} QUIET) if(NOT Boost_FOUND) # otherwise check for Boost installed on the system unset(BOOST_INCLUDEDIR) find_package(Boost ${BOOST_MINVERSION} QUIET) if(NOT Boost_FOUND) message(FATAL_ERROR "Boost ${BOOST_MINVERSION} or later not found. Either install system packages if available, extract Boost headers to ${CMAKE_SOURCE_DIR}/include, or set the CMake BOOST_ROOT variable.") endif() endif() message(STATUS "Boost version: ${Boost_MAJOR_VERSION}.${Boost_MINOR_VERSION}.${Boost_SUBMINOR_VERSION}") # Boost 1.62 has a bug that we've patched around, check if it is required if (Boost_VERSION EQUAL 106200) set (CMAKE_REQUIRED_INCLUDES ${BOOST_INCLUDEDIR} "${PROJECT_SOURCE_DIR}/include") set (BOOST_REV_TEST " #include #include #include #include int main(int,char*[]) { using namespace boost; // Check const reverse_graph { typedef adjacency_list< vecS, vecS, bidirectionalS, property, property, property > AdjList; typedef reverse_graph Graph; BOOST_CONCEPT_ASSERT(( BidirectionalGraphConcept )); } return 0; } ") CHECK_CXX_SOURCE_COMPILES("${BOOST_REV_TEST}" BOOST_REVGRAPH_OK) if (NOT BOOST_REVGRAPH_OK) message(STATUS "trying patched") CHECK_CXX_SOURCE_COMPILES(" #include ${BOOST_REV_TEST}" BOOST_REVGRAPH_PATCH) endif() if (NOT BOOST_REVGRAPH_OK AND NOT BOOST_REVGRAPH_PATCH) message(FATAL_ERROR "Something is wrong with this copy of boost::reverse_graph") endif() unset (CMAKE_REQUIRED_INCLUDES) else () unset(BOOST_REVGRAPH_OK CACHE) unset(BOOST_REVGRAPH_PATCH CACHE) endif () # Boost 1.62.0 vectorscan-5.4.11/cmake/build_wrapper.sh000077500000000000000000000016461452711272000202370ustar00rootroot00000000000000#!/bin/sh -e # This is used for renaming symbols for the fat runtime, don't call directly # TODO: make this a lot less fragile! 
cleanup () { rm -f ${SYMSFILE} ${KEEPSYMS} } PREFIX=$1 KEEPSYMS_IN=$2 shift 2 # $@ contains the actual build command OUT=$(echo "$@" | rev | cut -d ' ' -f 2- | rev | sed 's/.* -o \(.*\.o\).*/\1/') trap cleanup INT QUIT EXIT SYMSFILE=$(mktemp -p /tmp ${PREFIX}_rename.syms.XXXXX) KEEPSYMS=$(mktemp -p /tmp keep.syms.XXXXX) # find the libc used by gcc LIBC_SO=$("$@" --print-file-name=libc.so.6) cp ${KEEPSYMS_IN} ${KEEPSYMS} # get all symbols from libc and turn them into patterns nm -f p -g -D ${LIBC_SO} | sed -s 's/\([^ @]*\).*/^\1$/' >> ${KEEPSYMS} # build the object "$@" # rename the symbols in the object nm -f p -g ${OUT} | cut -f1 -d' ' | grep -v -f ${KEEPSYMS} | sed -e "s/\(.*\)/\1\ ${PREFIX}_\1/" >> ${SYMSFILE} if test -s ${SYMSFILE} then objcopy --redefine-syms=${SYMSFILE} ${OUT} fi vectorscan-5.4.11/cmake/cflags-arm.cmake000066400000000000000000000052341452711272000200540ustar00rootroot00000000000000if (NOT FAT_RUNTIME) if (BUILD_SVE2_BITPERM) message (STATUS "SVE2_BITPERM implies SVE2, enabling BUILD_SVE2") set(BUILD_SVE2 ON) endif () if (BUILD_SVE2) message (STATUS "SVE2 implies SVE, enabling BUILD_SVE") set(BUILD_SVE ON) endif () endif () if (CMAKE_COMPILER_IS_GNUCXX) set(ARMV9BASE_MINVER "12") if (CMAKE_CXX_COMPILER_VERSION VERSION_LESS ARMV9BASE_MINVER) set(SVE2_ARCH "armv8-a+sve2") else() set(SVE2_ARCH "armv9-a") endif() else() set(SVE2_ARCH "armv9-a") endif() set(ARMV8_ARCH "armv8-a") set(SVE_ARCH "${ARMV8_ARCH}+sve") set(SVE2_BITPERM_ARCH "${SVE2_ARCH}+sve2-bitperm") CHECK_INCLUDE_FILE_CXX(arm_neon.h HAVE_C_ARM_NEON_H) if (BUILD_SVE OR BUILD_SVE2 OR BUILD_SVE2_BITPERM OR FAT_RUNTIME) set(CMAKE_REQUIRED_FLAGS "-march=${SVE_ARCH}") CHECK_INCLUDE_FILE_CXX(arm_sve.h HAVE_C_ARM_SVE_H) if (NOT HAVE_C_ARM_SVE_H) message(FATAL_ERROR "arm_sve.h is required to build for SVE.") endif() endif() CHECK_C_SOURCE_COMPILES("#include <arm_neon.h> int main() { int32x4_t a = vdupq_n_s32(1); (void)a; }" HAVE_NEON) if (BUILD_SVE2_BITPERM) set(CMAKE_REQUIRED_FLAGS
"-march=${SVE2_BITPERM_ARCH}") CHECK_C_SOURCE_COMPILES("#include <arm_sve.h> int main() { svuint8_t a = svbext(svdup_u8(1), svdup_u8(2)); (void)a; }" HAVE_SVE2_BITPERM) endif() if (BUILD_SVE2) set(CMAKE_REQUIRED_FLAGS "-march=${SVE2_ARCH}") CHECK_C_SOURCE_COMPILES("#include <arm_sve.h> int main() { svuint8_t a = svbsl(svdup_u8(1), svdup_u8(2), svdup_u8(3)); (void)a; }" HAVE_SVE2) endif() if (BUILD_SVE) set(CMAKE_REQUIRED_FLAGS "-march=${SVE_ARCH}") CHECK_C_SOURCE_COMPILES("#include <arm_sve.h> int main() { svuint8_t a = svdup_u8(1); (void)a; }" HAVE_SVE) endif () if (FAT_RUNTIME) if (NOT HAVE_NEON) message(FATAL_ERROR "NEON support required to build fat runtime") endif () if (BUILD_SVE AND NOT HAVE_SVE) message(FATAL_ERROR "SVE support required to build fat runtime") endif () if (BUILD_SVE2 AND NOT HAVE_SVE2) message(FATAL_ERROR "SVE2 support required to build fat runtime") endif () if (BUILD_SVE2_BITPERM AND NOT HAVE_SVE2_BITPERM) message(FATAL_ERROR "SVE2_BITPERM support required to build fat runtime") endif () else () if (NOT BUILD_SVE) message(STATUS "Building without SVE support") endif () if (NOT BUILD_SVE2) message(STATUS "Building without SVE2 support") endif () if (NOT HAVE_NEON) message(FATAL_ERROR "Neon/ASIMD support required for Arm support") endif () endif () vectorscan-5.4.11/cmake/cflags-generic.cmake000066400000000000000000000144671452711272000207210ustar00rootroot00000000000000# set compiler flags - more are tested and added later set(EXTRA_C_FLAGS "${OPT_C_FLAG} -std=c17 -Wall -Wextra -Wshadow -Wcast-qual -fno-strict-aliasing") set(EXTRA_CXX_FLAGS "${OPT_CXX_FLAG} -std=c++17 -Wall -Wextra -Wshadow -Wswitch -Wreturn-type -Wcast-qual -Wno-deprecated -Wnon-virtual-dtor -fno-strict-aliasing") if (NOT CMAKE_COMPILER_IS_CLANG) set(EXTRA_CXX_FLAGS "${EXTRA_CXX_FLAGS} -fno-new-ttp-matching") endif() if (NOT RELEASE_BUILD) # -Werror is most useful during development, don't potentially break # release builds set(EXTRA_C_FLAGS "${EXTRA_C_FLAGS} -Werror") set(EXTRA_CXX_FLAGS
"${EXTRA_CXX_FLAGS} -Werror") if (CMAKE_COMPILER_IS_CLANG) if (CMAKE_C_COMPILER_VERSION VERSION_GREATER "13.0") set(EXTRA_C_FLAGS "${EXTRA_C_FLAGS} -Wno-unused-but-set-variable") set(EXTRA_CXX_FLAGS "${EXTRA_CXX_FLAGS} -Wno-unused-but-set-variable") endif() endif() endif() if (DISABLE_ASSERTS) set(EXTRA_C_FLAGS "${EXTRA_C_FLAGS} -DNDEBUG") set(EXTRA_CXX_FLAGS "${EXTRA_CXX_FLAGS} -DNDEBUG") endif() if(CMAKE_COMPILER_IS_GNUCC) # spurious warnings? set(EXTRA_C_FLAGS "${EXTRA_C_FLAGS} -Wno-array-bounds -Wno-maybe-uninitialized") endif() if(CMAKE_COMPILER_IS_GNUCXX) set(EXTRA_CXX_FLAGS "${EXTRA_CXX_FLAGS} -Wno-maybe-uninitialized") if (CMAKE_CXX_COMPILER_VERSION VERSION_LESS 5.0) set(EXTRA_CXX_FLAGS "${EXTRA_CXX_FLAGS} -fabi-version=0") endif () # don't complain about abi set(EXTRA_C_FLAGS "${EXTRA_C_FLAGS} -Wno-abi") set(EXTRA_CXX_FLAGS "${EXTRA_CXX_FLAGS} -Wno-abi") endif() if (NOT(ARCH_IA32 AND RELEASE_BUILD)) set(EXTRA_C_FLAGS "${EXTRA_C_FLAGS} -fno-omit-frame-pointer") set(EXTRA_CXX_FLAGS "${EXTRA_CXX_FLAGS} -fno-omit-frame-pointer") endif() CHECK_INCLUDE_FILES(unistd.h HAVE_UNISTD_H) CHECK_FUNCTION_EXISTS(posix_memalign HAVE_POSIX_MEMALIGN) CHECK_FUNCTION_EXISTS(_aligned_malloc HAVE__ALIGNED_MALLOC) # these end up in the config file CHECK_C_COMPILER_FLAG(-fvisibility=hidden HAS_C_HIDDEN) CHECK_CXX_COMPILER_FLAG(-fvisibility=hidden HAS_CXX_HIDDEN) # are we using libc++ CHECK_CXX_SYMBOL_EXISTS(_LIBCPP_VERSION ciso646 HAVE_LIBCPP) if (RELEASE_BUILD) if (HAS_C_HIDDEN) set(EXTRA_C_FLAGS "${EXTRA_C_FLAGS} -fvisibility=hidden") endif() if (HAS_CXX_HIDDEN) set(EXTRA_CXX_FLAGS "${EXTRA_CXX_FLAGS} -fvisibility=hidden") endif() endif() # testing a builtin takes a little more work CHECK_C_SOURCE_COMPILES("void *aa_test(void *x) { return __builtin_assume_aligned(x, 16);}\nint main(void) { return 0; }" HAVE_CC_BUILTIN_ASSUME_ALIGNED) CHECK_CXX_SOURCE_COMPILES("void *aa_test(void *x) { return __builtin_assume_aligned(x, 16);}\nint main(void) { return 0; }" 
HAVE_CXX_BUILTIN_ASSUME_ALIGNED) # Clang does not use __builtin_constant_p() the same way as gcc if (NOT CMAKE_COMPILER_IS_CLANG) CHECK_C_SOURCE_COMPILES("int main(void) { __builtin_constant_p(0); }" HAVE__BUILTIN_CONSTANT_P) endif() set(C_FLAGS_TO_CHECK # Variable length arrays are way bad, most especially at run time "-Wvla" # Pointer arith on void pointers is doing it wrong. "-Wpointer-arith" # Build our C code with -Wstrict-prototypes -Wmissing-prototypes "-Wstrict-prototypes" "-Wmissing-prototypes" ) foreach (FLAG ${C_FLAGS_TO_CHECK}) # munge the name so it doesn't break things string(REPLACE "-" "_" FNAME C_FLAG${FLAG}) CHECK_C_COMPILER_FLAG("${FLAG}" ${FNAME}) if (${FNAME}) set(EXTRA_C_FLAGS "${EXTRA_C_FLAGS} ${FLAG}") endif() endforeach() # self-assign should be thrown away, but clang whinges CHECK_C_COMPILER_FLAG("-Wself-assign" CC_SELF_ASSIGN) if (CC_SELF_ASSIGN) set(EXTRA_C_FLAGS "${EXTRA_C_FLAGS} -Wno-self-assign") endif() CHECK_CXX_COMPILER_FLAG("-Wself-assign" CXX_SELF_ASSIGN) if (CXX_SELF_ASSIGN) set(EXTRA_CXX_FLAGS "${EXTRA_CXX_FLAGS} -Wno-self-assign") endif() # clang gets up in our face for going paren crazy with macros CHECK_C_COMPILER_FLAG("-Wparentheses-equality" CC_PAREN_EQUALITY) if (CC_PAREN_EQUALITY) set(EXTRA_C_FLAGS "${EXTRA_C_FLAGS} -Wno-parentheses-equality") endif() # clang complains about unused const vars in our Ragel-generated code. CHECK_CXX_COMPILER_FLAG("-Wunused-const-variable" CXX_UNUSED_CONST_VAR) if (CXX_UNUSED_CONST_VAR) set(EXTRA_CXX_FLAGS "${EXTRA_CXX_FLAGS} -Wno-unused-const-variable") endif() # clang-14 complains about unused-but-set variable. CHECK_CXX_COMPILER_FLAG("-Wunused-but-set-variable" CXX_UNUSED_BUT_SET_VAR) if (CXX_UNUSED_BUT_SET_VAR) set(EXTRA_CXX_FLAGS "${EXTRA_CXX_FLAGS} -Wno-unused-but-set-variable") endif() # clang-14 complains about using bitwise operator instead of logical ones. 
CHECK_CXX_COMPILER_FLAG("-Wbitwise-instead-of-logical" CXX_BITWISE_INSTEAD_OF_LOGICAL) if (CXX_BITWISE_INSTEAD_OF_LOGICAL) set(EXTRA_CXX_FLAGS "${EXTRA_CXX_FLAGS} -Wno-bitwise-instead-of-logical") endif() CHECK_CXX_COMPILER_FLAG("-Wignored-attributes" CXX_IGNORED_ATTR) if (CXX_IGNORED_ATTR) set(EXTRA_CXX_FLAGS "${EXTRA_CXX_FLAGS} -Wno-ignored-attributes") endif() # gcc 9 complains about redundant move for returned variable CHECK_CXX_COMPILER_FLAG("-Wredundant-move" CXX_REDUNDANT_MOVE) if (CXX_REDUNDANT_MOVE) set(EXTRA_CXX_FLAGS "${EXTRA_CXX_FLAGS} -Wno-redundant-move") endif() # note this for later, g++ doesn't have this flag but clang does CHECK_CXX_COMPILER_FLAG("-Wweak-vtables" CXX_WEAK_VTABLES) if (CXX_WEAK_VTABLES) set(EXTRA_CXX_FLAGS "${EXTRA_CXX_FLAGS} -Wweak-vtables") endif() CHECK_CXX_COMPILER_FLAG("-Wmissing-declarations" CXX_MISSING_DECLARATIONS) if (CXX_MISSING_DECLARATIONS) set(EXTRA_CXX_FLAGS "${EXTRA_CXX_FLAGS} -Wmissing-declarations") endif() CHECK_CXX_COMPILER_FLAG("-Wunused-local-typedefs" CXX_UNUSED_LOCAL_TYPEDEFS) CHECK_CXX_COMPILER_FLAG("-Wunused-variable" CXX_WUNUSED_VARIABLE) # gcc 10 complains about this CHECK_C_COMPILER_FLAG("-Wstringop-overflow" CC_STRINGOP_OVERFLOW) CHECK_CXX_COMPILER_FLAG("-Wstringop-overflow" CXX_STRINGOP_OVERFLOW) if(CC_STRINGOP_OVERFLOW OR CXX_STRINGOP_OVERFLOW) set(EXTRA_C_FLAGS "${EXTRA_C_FLAGS} -Wno-stringop-overflow") set(EXTRA_CXX_FLAGS "${EXTRA_CXX_FLAGS} -Wno-stringop-overflow") endif() vectorscan-5.4.11/cmake/cflags-ppc64le.cmake000066400000000000000000000006541452711272000205530ustar00rootroot00000000000000 CHECK_INCLUDE_FILE_CXX(altivec.h HAVE_C_PPC64EL_ALTIVEC_H) if (HAVE_C_PPC64EL_ALTIVEC_H) set (INTRIN_INC_H "altivec.h") else()
message (FATAL_ERROR "No intrinsics header found for VSX") endif () CHECK_C_SOURCE_COMPILES("#include <${INTRIN_INC_H}> int main() { vector int a = vec_splat_s32(1); (void)a; }" HAVE_VSX) if (NOT HAVE_VSX) message(FATAL_ERROR "VSX support required for Power support") endif () vectorscan-5.4.11/cmake/cflags-x86.cmake000066400000000000000000000075561452711272000177330ustar00rootroot00000000000000option(BUILD_AVX512 "Enabling support for AVX512" OFF) option(BUILD_AVX512VBMI "Enabling support for AVX512VBMI" OFF) set(SKYLAKE_FLAG "-march=skylake-avx512") set(ICELAKE_FLAG "-march=icelake-server") if (NOT FAT_RUNTIME) if (BUILD_AVX512VBMI) message (STATUS "AVX512VBMI implies AVX512, enabling BUILD_AVX512") set(BUILD_AVX512 ON) set(ARCH_C_FLAGS "${ICELAKE_FLAG}") set(ARCH_CXX_FLAGS "${ICELAKE_FLAG}") endif () if (BUILD_AVX512) message (STATUS "AVX512 implies AVX2, enabling BUILD_AVX2") set(BUILD_AVX2 ON) set(ARCH_C_FLAGS "${SKYLAKE_FLAG}") set(ARCH_CXX_FLAGS "${SKYLAKE_FLAG}") endif () if (BUILD_AVX2) message (STATUS "Enabling BUILD_AVX2") set(ARCH_C_FLAGS "-mavx2") set(ARCH_CXX_FLAGS "-mavx2") else() set(ARCH_C_FLAGS "-msse4.2") set(ARCH_CXX_FLAGS "-msse4.2") endif() else() set(ARCH_C_FLAGS "-msse4.2") set(ARCH_CXX_FLAGS "-msse4.2") endif() set(CMAKE_REQUIRED_FLAGS "${ARCH_C_FLAGS}") CHECK_INCLUDE_FILES(intrin.h HAVE_C_INTRIN_H) CHECK_INCLUDE_FILE_CXX(intrin.h HAVE_CXX_INTRIN_H) CHECK_INCLUDE_FILES(x86intrin.h HAVE_C_X86INTRIN_H) CHECK_INCLUDE_FILE_CXX(x86intrin.h HAVE_CXX_X86INTRIN_H) if (HAVE_C_X86INTRIN_H) set (INTRIN_INC_H "x86intrin.h") elseif (HAVE_C_INTRIN_H) set (INTRIN_INC_H "intrin.h") else() message (FATAL_ERROR "No intrinsics header found for SSE/AVX2/AVX512") endif () if (BUILD_AVX512) CHECK_C_COMPILER_FLAG(${SKYLAKE_FLAG} HAS_ARCH_SKYLAKE) if (NOT HAS_ARCH_SKYLAKE) message (FATAL_ERROR "AVX512 not supported by compiler") endif () endif () if (BUILD_AVX512VBMI) CHECK_C_COMPILER_FLAG(${ICELAKE_FLAG} HAS_ARCH_ICELAKE) if (NOT HAS_ARCH_ICELAKE) message 
(FATAL_ERROR "AVX512VBMI not supported by compiler") endif () endif () # ensure we have the minimum of SSE4.2 - call a SSE4.2 intrinsic CHECK_C_SOURCE_COMPILES("#include <${INTRIN_INC_H}> int main() { __m128i a = _mm_set1_epi8(1); (void)_mm_shuffle_epi8(a, a); }" HAVE_SSE42) # now look for AVX2 set(CMAKE_REQUIRED_FLAGS "-mavx2") CHECK_C_SOURCE_COMPILES("#include <${INTRIN_INC_H}> #if !defined(__AVX2__) #error no avx2 #endif int main(){ __m256i z = _mm256_setzero_si256(); (void)_mm256_xor_si256(z, z); }" HAVE_AVX2) # and now for AVX512 set(CMAKE_REQUIRED_FLAGS "${SKYLAKE_FLAG}") CHECK_C_SOURCE_COMPILES("#include <${INTRIN_INC_H}> #if !defined(__AVX512BW__) #error no avx512bw #endif int main(){ __m512i z = _mm512_setzero_si512(); (void)_mm512_abs_epi8(z); }" HAVE_AVX512) # and now for AVX512VBMI set(CMAKE_REQUIRED_FLAGS "${ICELAKE_FLAG}") CHECK_C_SOURCE_COMPILES("#include <${INTRIN_INC_H}> #if !defined(__AVX512VBMI__) #error no avx512vbmi #endif int main(){ __m512i a = _mm512_set1_epi8(0xFF); __m512i idx = _mm512_set_epi64(3ULL, 2ULL, 1ULL, 0ULL, 7ULL, 6ULL, 5ULL, 4ULL); (void)_mm512_permutexvar_epi8(idx, a); }" HAVE_AVX512VBMI) if (FAT_RUNTIME) if (NOT HAVE_SSE42) message(FATAL_ERROR "SSE4.2 support required to build fat runtime") endif () if (BUILD_AVX2 AND NOT HAVE_AVX2) message(FATAL_ERROR "AVX2 support required to build fat runtime") endif () if (BUILD_AVX512 AND NOT HAVE_AVX512) message(FATAL_ERROR "AVX512 support requested but not supported") endif () if (BUILD_AVX512VBMI AND NOT HAVE_AVX512VBMI) message(FATAL_ERROR "AVX512VBMI support requested but not supported") endif () else (NOT FAT_RUNTIME) if (NOT BUILD_AVX2) message(STATUS "Building without AVX2 support") endif () if (NOT HAVE_AVX512) message(STATUS "Building without AVX512 support") endif () if (NOT HAVE_AVX512VBMI) message(STATUS "Building without AVX512VBMI support") endif () if (NOT HAVE_SSE42) message(FATAL_ERROR "A minimum of SSE4.2 compiler support is required") endif () endif () 
vectorscan-5.4.11/cmake/compiler.cmake000066400000000000000000000013421452711272000176460ustar00rootroot00000000000000# determine compiler if (CMAKE_CXX_COMPILER_ID MATCHES "Clang") set(CMAKE_COMPILER_IS_CLANG TRUE) set(CLANGCXX_MINVER "5") message(STATUS "clang++ version ${CMAKE_CXX_COMPILER_VERSION}") if (CMAKE_CXX_COMPILER_VERSION VERSION_LESS CLANGCXX_MINVER) message(FATAL_ERROR "A minimum of clang++ ${CLANGCXX_MINVER} is required for C++17 support") endif() endif() # compiler version checks TODO: test more compilers if (CMAKE_COMPILER_IS_GNUCXX) set(GNUCXX_MINVER "9") message(STATUS "g++ version ${CMAKE_CXX_COMPILER_VERSION}") if (CMAKE_CXX_COMPILER_VERSION VERSION_LESS GNUCXX_MINVER) message(FATAL_ERROR "A minimum of g++ ${GNUCXX_MINVER} is required for C++17 support") endif() endif() vectorscan-5.4.11/cmake/config.h.in000066400000000000000000000066561452711272000170720ustar00rootroot00000000000000/* used by cmake */ #ifndef CONFIG_H_ #define CONFIG_H_ /* "Define if the build is 32 bit" */ #cmakedefine ARCH_32_BIT /* "Define if the build is 64 bit" */ #cmakedefine ARCH_64_BIT /* "Define if building for IA32" */ #cmakedefine ARCH_IA32 /* "Define if building for EM64T" */ #cmakedefine ARCH_X86_64 /* "Define if building for ARM32" */ #cmakedefine ARCH_ARM32 /* "Define if building for AARCH64" */ #cmakedefine ARCH_AARCH64 /* "Define if building for PPC64EL" */ #cmakedefine ARCH_PPC64EL /* "Define if cross compiling for AARCH64" */ #cmakedefine CROSS_COMPILE_AARCH64 /* Define if building SVE for AARCH64. */ #cmakedefine BUILD_SVE /* Define if building SVE2 for AARCH64. */ #cmakedefine BUILD_SVE2 /* Define if building SVE2+BITPERM for AARCH64. */ #cmakedefine BUILD_SVE2_BITPERM /* internal build, switch on dump support. */ #cmakedefine DUMP_SUPPORT /* Define if building "fat" runtime. */ #cmakedefine FAT_RUNTIME /* Define if building AVX2 in the fat runtime. */ #cmakedefine BUILD_AVX2 /* Define if building AVX-512 in the fat runtime. 
*/ #cmakedefine BUILD_AVX512 /* Define if building AVX512VBMI in the fat runtime. */ #cmakedefine BUILD_AVX512VBMI /* Define to 1 if `backtrace' works. */ #cmakedefine HAVE_BACKTRACE /* C compiler has __builtin_assume_aligned */ #cmakedefine HAVE_CC_BUILTIN_ASSUME_ALIGNED /* C++ compiler has __builtin_assume_aligned */ #cmakedefine HAVE_CXX_BUILTIN_ASSUME_ALIGNED /* C++ compiler has x86intrin.h */ #cmakedefine HAVE_CXX_X86INTRIN_H /* C compiler has x86intrin.h */ #cmakedefine HAVE_C_X86INTRIN_H /* C++ compiler has intrin.h */ #cmakedefine HAVE_CXX_INTRIN_H /* C compiler has intrin.h */ #cmakedefine HAVE_C_INTRIN_H /* C compiler has arm_neon.h */ #cmakedefine HAVE_C_ARM_NEON_H /* C compiler has arm_sve.h */ #cmakedefine HAVE_C_ARM_SVE_H /* C compiler has altivec.h */ #cmakedefine HAVE_C_PPC64EL_ALTIVEC_H /* Define to 1 if you have the declaration of `pthread_setaffinity_np', and to 0 if you don't. */ #cmakedefine HAVE_DECL_PTHREAD_SETAFFINITY_NP #cmakedefine HAVE_PTHREAD_NP_H /* Define to 1 if you have the `malloc_info' function. */ #cmakedefine HAVE_MALLOC_INFO /* Define to 1 if you have the `memmem' function. */ #cmakedefine HAVE_MEMMEM /* Define to 1 if you have a working `mmap' system call. */ #cmakedefine HAVE_MMAP /* Define to 1 if `posix_memalign' works. */ #cmakedefine HAVE_POSIX_MEMALIGN /* Define to 1 if you have the `setrlimit' function. */ #cmakedefine HAVE_SETRLIMIT /* Define to 1 if you have the `shmget' function. */ #cmakedefine HAVE_SHMGET /* Define to 1 if you have the `sigaction' function. */ #cmakedefine HAVE_SIGACTION /* Define to 1 if you have the `sigaltstack' function. */ #cmakedefine HAVE_SIGALTSTACK /* Define if the sqlite3_open_v2 call is available */ #cmakedefine HAVE_SQLITE3_OPEN_V2 /* Define to 1 if you have the <unistd.h> header file. */ #cmakedefine HAVE_UNISTD_H /* Define to 1 if you have the `_aligned_malloc' function.
*/ #cmakedefine HAVE__ALIGNED_MALLOC /* Define if compiler has __builtin_constant_p */ #cmakedefine HAVE__BUILTIN_CONSTANT_P /* Optimize, inline critical functions */ #cmakedefine HS_OPTIMIZE #cmakedefine HS_VERSION #cmakedefine HS_MAJOR_VERSION #cmakedefine HS_MINOR_VERSION #cmakedefine HS_PATCH_VERSION #cmakedefine BUILD_DATE /* define if this is a release build. */ #cmakedefine RELEASE_BUILD /* define if reverse_graph requires patch for boost 1.62.0 */ #cmakedefine BOOST_REVGRAPH_PATCH #endif /* CONFIG_H_ */ vectorscan-5.4.11/cmake/formatdate.py000077500000000000000000000004751452711272000175430ustar00rootroot00000000000000#!/usr/bin/env python import os import sys import datetime def usage(): print("Usage:", os.path.basename(sys.argv[0]), "") if len(sys.argv) != 2: usage() sys.exit(1) ts = sys.argv[1] build_date = datetime.datetime.utcfromtimestamp(int(ts)) print(build_date.strftime("%Y-%m-%d")) vectorscan-5.4.11/cmake/keep.syms.in000066400000000000000000000002441452711272000173000ustar00rootroot00000000000000# names to exclude hs_misc_alloc hs_misc_free hs_free_scratch hs_stream_alloc hs_stream_free hs_scratch_alloc hs_scratch_free hs_database_alloc hs_database_free ^_ vectorscan-5.4.11/cmake/osdetection.cmake000066400000000000000000000030311452711272000203510ustar00rootroot00000000000000if(CMAKE_SYSTEM_NAME MATCHES "Linux") set(LINUX TRUE) endif(CMAKE_SYSTEM_NAME MATCHES "Linux") if(CMAKE_SYSTEM_NAME MATCHES "FreeBSD") set(FREEBSD true) endif(CMAKE_SYSTEM_NAME MATCHES "FreeBSD") option(FAT_RUNTIME "Build a library that supports multiple microarchitectures" OFF) message("Checking Fat Runtime Requirements...") if (FAT_RUNTIME AND NOT LINUX) message(FATAL_ERROR "Fat runtime is only supported on Linux OS") endif() if (USE_CPU_NATIVE AND FAT_RUNTIME) message(FATAL_ERROR "Fat runtime is not compatible with Native CPU detection") endif() if (FAT_RUNTIME AND LINUX) if (NOT (ARCH_IA32 OR ARCH_X86_64 OR ARCH_AARCH64)) message(FATAL_ERROR "Fat runtime is only supported 
on Intel and Aarch64 architectures") else() message(STATUS "Building Fat runtime for multiple microarchitectures") message(STATUS "generator is ${CMAKE_GENERATOR}") if (NOT (CMAKE_GENERATOR MATCHES "Unix Makefiles" OR (CMAKE_VERSION VERSION_GREATER "3.0" AND CMAKE_GENERATOR MATCHES "Ninja"))) message (FATAL_ERROR "Building the fat runtime requires the Unix Makefiles generator, or Ninja with CMake v3.0 or higher") else() include (${CMAKE_MODULE_PATH}/attrib.cmake) if (NOT HAS_C_ATTR_IFUNC) message(FATAL_ERROR "Compiler does not support ifunc attribute, cannot build fat runtime") endif() endif() endif() if (NOT RELEASE_BUILD) message(FATAL_ERROR "Fat runtime is only built on Release builds") endif() endif () vectorscan-5.4.11/cmake/pcre.cmake000066400000000000000000000046041452711272000167710ustar00rootroot00000000000000# first look in pcre-$version or pcre subdirs if (PCRE_SOURCE) # either provided on cmdline or we've seen it already set (PCRE_BUILD_SOURCE TRUE) elseif (EXISTS ${PROJECT_SOURCE_DIR}/pcre-${PCRE_REQUIRED_VERSION}) set (PCRE_SOURCE ${PROJECT_SOURCE_DIR}/pcre-${PCRE_REQUIRED_VERSION}) set (PCRE_BUILD_SOURCE TRUE) elseif (EXISTS ${PROJECT_SOURCE_DIR}/pcre) set (PCRE_SOURCE ${PROJECT_SOURCE_DIR}/pcre) set (PCRE_BUILD_SOURCE TRUE) endif() if (PCRE_BUILD_SOURCE) if (NOT IS_ABSOLUTE ${PCRE_SOURCE}) set(PCRE_SOURCE "${CMAKE_BINARY_DIR}/${PCRE_SOURCE}") endif () set (saved_INCLUDES "${CMAKE_REQUIRED_INCLUDES}") list (APPEND CMAKE_REQUIRED_INCLUDES ${PCRE_SOURCE}) if (PCRE_CHECKED) set(PCRE_INCLUDE_DIRS ${PCRE_SOURCE} ${PROJECT_BINARY_DIR}/pcre) set(PCRE_LDFLAGS -L"${LIBDIR}" -lpcre) # already processed this file and set up pcre building return() endif () # first, check version number CHECK_C_SOURCE_COMPILES("#include <pcre.h> #if PCRE_MAJOR != ${PCRE_REQUIRED_MAJOR_VERSION} || PCRE_MINOR < ${PCRE_REQUIRED_MINOR_VERSION} #error Incorrect pcre version #endif int main(void) { return 0; }" CORRECT_PCRE_VERSION) set (CMAKE_REQUIRED_INCLUDES "${saved_INCLUDES}") if (NOT
CORRECT_PCRE_VERSION) unset(CORRECT_PCRE_VERSION CACHE) message(STATUS "Incorrect version of pcre - version ${PCRE_REQUIRED_VERSION} or above is required") return () else() message(STATUS "PCRE version ${PCRE_REQUIRED_VERSION} or above - building from source.") endif() # PCRE compile options option(PCRE_BUILD_PCRECPP OFF) option(PCRE_BUILD_PCREGREP OFF) option(PCRE_SHOW_REPORT OFF) set(PCRE_SUPPORT_UNICODE_PROPERTIES ON CACHE BOOL "Build pcre with unicode") add_subdirectory(${PCRE_SOURCE} ${PROJECT_BINARY_DIR}/pcre EXCLUDE_FROM_ALL) set(PCRE_INCLUDE_DIRS ${PCRE_SOURCE} ${PROJECT_BINARY_DIR}/pcre) set(PCRE_LDFLAGS -L"${LIBDIR}" -lpcre) else () # pkgconf should save us find_package(PkgConfig) pkg_check_modules(PCRE libpcre>=${PCRE_REQUIRED_VERSION}) if (PCRE_FOUND) set(CORRECT_PCRE_VERSION TRUE) message(STATUS "PCRE version ${PCRE_REQUIRED_VERSION} or above") else () message(STATUS "PCRE version ${PCRE_REQUIRED_VERSION} or above not found") return () endif () endif (PCRE_BUILD_SOURCE) vectorscan-5.4.11/cmake/platform.cmake000066400000000000000000000016041452711272000176610ustar00rootroot00000000000000# determine the target arch # really only interested in the preprocessor here CHECK_C_SOURCE_COMPILES("#if !(defined(__x86_64__) || defined(_M_X64))\n#error not 64bit\n#endif\nint main(void) { return 0; }" ARCH_X86_64) CHECK_C_SOURCE_COMPILES("#if !(defined(__i386__) || defined(_M_IX86))\n#error not 32bit\n#endif\nint main(void) { return 0; }" ARCH_IA32) CHECK_C_SOURCE_COMPILES("#if !defined(__ARM_ARCH_ISA_A64)\n#error not 64bit\n#endif\nint main(void) { return 0; }" ARCH_AARCH64) CHECK_C_SOURCE_COMPILES("#if !defined(__ARM_ARCH_ISA_ARM)\n#error not 32bit\n#endif\nint main(void) { return 0; }" ARCH_ARM32) CHECK_C_SOURCE_COMPILES("#if !defined(__PPC64__) && !(defined(__LITTLE_ENDIAN__) && defined(__VSX__))\n#error not ppc64el\n#endif\nint main(void) { return 0; }" ARCH_PPC64EL) if (ARCH_X86_64 OR ARCH_AARCH64 OR ARCH_PPC64EL) set(ARCH_64_BIT TRUE) else() set(ARCH_32_BIT 
TRUE) endif() vectorscan-5.4.11/cmake/ragel.cmake000066400000000000000000000014101452711272000171220ustar00rootroot00000000000000# function for doing all the dirty work in turning a .rl into C++ function(ragelmaker src_rl) get_filename_component(src_dir ${src_rl} PATH) # old cmake needs PATH get_filename_component(src_file ${src_rl} NAME_WE) set(rl_out ${CMAKE_CURRENT_BINARY_DIR}/${src_dir}/${src_file}.cpp) add_custom_command( OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/${src_dir}/${src_file}.cpp COMMAND ${CMAKE_COMMAND} -E make_directory ${CMAKE_CURRENT_BINARY_DIR}/${src_dir} COMMAND ${RAGEL} ${CMAKE_CURRENT_SOURCE_DIR}/${src_rl} -o ${rl_out} -G0 DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/${src_rl} ) add_custom_target(ragel_${src_file} DEPENDS ${rl_out}) set_source_files_properties(${rl_out} PROPERTIES GENERATED TRUE) endfunction(ragelmaker) vectorscan-5.4.11/cmake/sanitize.cmake000066400000000000000000000033521452711272000176650ustar00rootroot00000000000000# Possible values: # - `address` (ASan) # - `memory` (MSan) # - `undefined` (UBSan) # - "" (no sanitizing) option (SANITIZE "Enable one of the code sanitizers" "") set (SAN_FLAGS "${SAN_FLAGS} -g -fno-omit-frame-pointer -DSANITIZER") if (SANITIZE) if (SANITIZE STREQUAL "address") set (ASAN_FLAGS "-fsanitize=address -fsanitize-address-use-after-scope") set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${SAN_FLAGS} ${ASAN_FLAGS}") set (CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${SAN_FLAGS} ${ASAN_FLAGS}") if (CMAKE_CXX_COMPILER_ID STREQUAL "GNU") set (CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} ${ASAN_FLAGS}") endif() elseif (SANITIZE STREQUAL "memory") if (CMAKE_CXX_COMPILER_ID STREQUAL "GNU") message (FATAL_ERROR "GCC does not have memory sanitizer") endif() # MemorySanitizer flags are set according to the official documentation: # https://clang.llvm.org/docs/MemorySanitizer.html#usage set (MSAN_FLAGS "-fsanitize=memory -fsanitize-memory-use-after-dtor -fsanitize-memory-track-origins -fno-optimize-sibling-calls") set (CMAKE_CXX_FLAGS
"${CMAKE_CXX_FLAGS} ${SAN_FLAGS} ${MSAN_FLAGS}") set (CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${SAN_FLAGS} ${MSAN_FLAGS}") elseif (SANITIZE STREQUAL "undefined") set (UBSAN_FLAGS "-fsanitize=undefined") set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${SAN_FLAGS} ${UBSAN_FLAGS}") set (CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${SAN_FLAGS} ${UBSAN_FLAGS}") if (CMAKE_CXX_COMPILER_ID STREQUAL "GNU") set (CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} -fsanitize=undefined") endif() else () message (FATAL_ERROR "Unknown sanitizer type: ${SANITIZE}") endif () endif() vectorscan-5.4.11/cmake/sqlite3.cmake000066400000000000000000000024061452711272000174220ustar00rootroot00000000000000# # a lot of noise to find sqlite # option(SQLITE_PREFER_STATIC "Build sqlite3 statically instead of using an installed lib" OFF) if(NOT SQLITE_PREFER_STATIC) find_package(PkgConfig QUIET) # first check for sqlite on the system pkg_check_modules(SQLITE3 sqlite3) endif() # now do version checks if (SQLITE3_FOUND) list(INSERT CMAKE_REQUIRED_INCLUDES 0 "${SQLITE3_INCLUDE_DIRS}") if (SQLITE3_VERSION VERSION_LESS "3.8.10") message(FATAL_ERROR "sqlite3 is broken from 3.8.7 to 3.8.10 - please find a working version") endif() endif() if (NOT SQLITE3_BUILD_SOURCE) set(_SAVED_FLAGS ${CMAKE_REQUIRED_FLAGS}) list(INSERT CMAKE_REQUIRED_LIBRARIES 0 ${SQLITE3_LDFLAGS}) CHECK_SYMBOL_EXISTS(sqlite3_open_v2 sqlite3.h HAVE_SQLITE3_OPEN_V2) list(REMOVE_ITEM CMAKE_REQUIRED_INCLUDES "${SQLITE3_INCLUDE_DIRS}") list(REMOVE_ITEM CMAKE_REQUIRED_LIBRARIES ${SQLITE3_LDFLAGS}) else() if (NOT TARGET sqlite3_static) # build sqlite as a static lib to compile into our test programs add_library(sqlite3_static STATIC "${PROJECT_SOURCE_DIR}/sqlite3/sqlite3.c") set_target_properties(sqlite3_static PROPERTIES COMPILE_FLAGS "-Wno-error -Wno-extra -Wno-unused -Wno-cast-qual -DSQLITE_OMIT_LOAD_EXTENSION") endif() endif() # that's enough about sqlite
vectorscan-5.4.11/doc/000077500000000000000000000000001452711272000145175ustar00rootroot00000000000000vectorscan-5.4.11/doc/dev-reference/000077500000000000000000000000001452711272000172315ustar00rootroot00000000000000vectorscan-5.4.11/doc/dev-reference/CMakeLists.txt000066400000000000000000000022721452711272000217740ustar00rootroot00000000000000find_program(DOXYGEN doxygen) if (DOXYGEN STREQUAL DOXYGEN-NOTFOUND) message(STATUS "Doxygen not found, unable to generate API reference") else() configure_file("${CMAKE_CURRENT_SOURCE_DIR}/hyperscan.doxyfile.in" "${CMAKE_CURRENT_BINARY_DIR}/hyperscan.doxyfile" @ONLY) add_custom_target(dev-reference-doxygen ${DOXYGEN} ${CMAKE_CURRENT_BINARY_DIR}/hyperscan.doxyfile COMMENT "Building doxygen XML for API reference") endif() find_program(SPHINX_BUILD sphinx-build) if (SPHINX_BUILD STREQUAL SPHINX_BUILD-NOTFOUND) message(STATUS "Sphinx not found, unable to generate developer reference") else() set(SPHINX_BUILD_DIR "${CMAKE_CURRENT_BINARY_DIR}/_build") set(SPHINX_CACHE_DIR "${CMAKE_CURRENT_BINARY_DIR}/_doctrees") set(SPHINX_HTML_DIR "${CMAKE_CURRENT_BINARY_DIR}/html") configure_file("${CMAKE_CURRENT_SOURCE_DIR}/conf.py.in" "${CMAKE_CURRENT_BINARY_DIR}/conf.py" @ONLY) add_custom_target(dev-reference ${SPHINX_BUILD} -b html -c "${CMAKE_CURRENT_BINARY_DIR}" -d "${SPHINX_CACHE_DIR}" "${CMAKE_CURRENT_SOURCE_DIR}" "${SPHINX_HTML_DIR}" DEPENDS dev-reference-doxygen COMMENT "Building HTML dev reference with Sphinx") endif() vectorscan-5.4.11/doc/dev-reference/_static/000077500000000000000000000000001452711272000206575ustar00rootroot00000000000000vectorscan-5.4.11/doc/dev-reference/_static/hyperscan.css000066400000000000000000000004571452711272000233730ustar00rootroot00000000000000/* Differentiate the way we display regex fragments. */ .regexp { color: darkred !important; } /* Avoid (the alabaster theme default) Goudy Old Style, which renders in * italics on some Mac/Safari systems. 
*/ body { font-family: 'minion pro', 'bell mt', Georgia, 'Hiragino Mincho Pro', serif; } vectorscan-5.4.11/doc/dev-reference/api_constants.rst000066400000000000000000000014431452711272000226320ustar00rootroot00000000000000.. _api_constants: ######################## API Reference: Constants ######################## *********** Error Codes *********** .. doxygengroup:: HS_ERROR :content-only: :no-link: ***************** hs_expr_ext flags ***************** .. doxygengroup:: HS_EXT_FLAG :content-only: :no-link: ************* Pattern flags ************* .. doxygengroup:: HS_PATTERN_FLAG :content-only: :no-link: ************************* CPU feature support flags ************************* .. doxygengroup:: HS_CPU_FEATURES_FLAG :content-only: :no-link: **************** CPU tuning flags **************** .. doxygengroup:: HS_TUNE_FLAG :content-only: :no-link: ****************** Compile mode flags ****************** .. doxygengroup:: HS_MODE_FLAG :content-only: :no-link: vectorscan-5.4.11/doc/dev-reference/api_files.rst000066400000000000000000000006171452711272000217220ustar00rootroot00000000000000.. _api_files: #################### API Reference: Files #################### ********** File: hs.h ********** .. doxygenfile:: hs.h ***************** File: hs_common.h ***************** .. doxygenfile:: hs_common.h ****************** File: hs_compile.h ****************** .. doxygenfile:: hs_compile.h ****************** File: hs_runtime.h ****************** .. doxygenfile:: hs_runtime.h vectorscan-5.4.11/doc/dev-reference/chimera.rst000066400000000000000000000246021452711272000213770ustar00rootroot00000000000000.. _chimera: ####### Chimera ####### This section describes the Chimera library. ************ Introduction ************ Chimera is a software regular expression matching engine that is a hybrid of Hyperscan and PCRE. The design goals of Chimera are to fully support PCRE syntax while taking advantage of Hyperscan's high performance.
Chimera follows the design of Hyperscan, providing C APIs for compilation and scanning. The Chimera API itself is composed of two major components: =========== Compilation =========== These functions take a group of regular expressions, along with identifiers and option flags, and compile them into an immutable database that can be used by the Chimera scanning API. This compilation process performs considerable analysis and optimization work in order to build a database that will match the given expressions efficiently. See :ref:`chcompile` for more details. ======== Scanning ======== Once a Chimera database has been created, it can be used to scan data in memory. Chimera supports only block mode, in which a single contiguous block of memory is scanned. Matches are delivered to the application via a user-supplied callback function that is called synchronously for each match. For a given database, Chimera provides several guarantees: * No memory allocations occur at runtime, with the exception of scratch space allocation, which should be done ahead of time in performance-critical applications: - **Scratch space**: temporary memory used for internal data at scan time. Structures in scratch space do not persist beyond the end of a single scan call. * The size of the scratch space required for a given database is fixed and determined at database compile time. This means that the memory requirements of the application are known ahead of time, and the scratch space can be pre-allocated if required for performance reasons. * Any pattern that has successfully been compiled by the Chimera compiler can be scanned against any input. However, internal resource limits or other limitations of PCRE may still cause a scan call to return an error at runtime. .. note:: Chimera is designed to have the same matching behavior as PCRE, including greedy/ungreedy matching, capturing, etc. Like PCRE, Chimera reports both the **start offset** and the **end offset** of each match.
Unlike Hyperscan, which reports all matches, Chimera reports only non-overlapping matches. For example, the pattern :regexp:`/foofoo/` will match ``foofoofoofoo`` at offsets (0, 6) and (6, 12). .. note:: Because Chimera is a hybrid of Hyperscan and PCRE in order to support full PCRE syntax, it incurs extra performance overhead compared to a Hyperscan-only solution. Use Hyperscan for better performance unless you need full PCRE syntax support. See :ref:`chruntime` for more details. ************ Requirements ************ The PCRE library (http://pcre.org/) version 8.41 is required for Chimera. .. note:: Since Chimera needs to reference PCRE internal functions, please place the PCRE source directory under the Hyperscan root directory in order to build Chimera. Beyond this, the hardware and software requirements of Chimera are the same as those of Hyperscan. See :ref:`hardware` and :ref:`software` for more details. .. note:: Building Hyperscan automatically generates the Chimera library. Currently only a static library is supported for Chimera, so please select a static build type when configuring the CMake build options. .. _chcompile: ****************** Compiling Patterns ****************** =================== Building a Database =================== The Chimera compiler API accepts regular expressions and converts them into a compiled pattern database that can then be used to scan data. The API provides three functions that compile regular expressions into databases: #. :c:func:`ch_compile`: compiles a single expression into a pattern database. #. :c:func:`ch_compile_multi`: compiles an array of expressions into a pattern database. All of the supplied patterns will be scanned for concurrently at scan time, with user-supplied identifiers returned when they match. #. :c:func:`ch_compile_ext_multi`: compiles an array of expressions as above, but allows PCRE match limits to be specified for each expression.
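The difference between Hyperscan's all-matches reporting and Chimera's non-overlapping reporting, described at the start of this section, can be illustrated with a small self-contained C sketch. It searches for a plain literal rather than a regular expression, and the names ``match_cb``, ``scan_all``, ``scan_non_overlapping`` and ``struct recorder`` are hypothetical, chosen only for this illustration; they are not part of the Chimera or Hyperscan APIs, and the sketch does not describe how Chimera is implemented internally.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback type for this sketch; it only mirrors the idea of
 * reporting (from, to) offsets and is not ch_match_event_handler. */
typedef void (*match_cb)(size_t from, size_t to, void *ctx);

/* Hyperscan-style reporting: every occurrence, overlapping ones included. */
static size_t scan_all(const char *data, size_t data_len,
                       const char *lit, size_t lit_len,
                       match_cb cb, void *ctx) {
    size_t count = 0;
    assert(lit_len > 0);
    for (size_t i = 0; i + lit_len <= data_len; i++) {
        if (memcmp(data + i, lit, lit_len) == 0) {
            cb(i, i + lit_len, ctx);
            count++;
        }
    }
    return count;
}

/* PCRE/Chimera-style reporting: after a match, resume at its end offset,
 * so no two reported matches overlap. */
static size_t scan_non_overlapping(const char *data, size_t data_len,
                                   const char *lit, size_t lit_len,
                                   match_cb cb, void *ctx) {
    size_t count = 0;
    size_t i = 0;
    assert(lit_len > 0);
    while (i + lit_len <= data_len) {
        if (memcmp(data + i, lit, lit_len) == 0) {
            cb(i, i + lit_len, ctx);
            count++;
            i += lit_len; /* skip past the reported match */
        } else {
            i++;
        }
    }
    return count;
}

/* Example callback context: remembers up to 16 (from, to) pairs. */
struct recorder {
    size_t n;
    size_t from[16];
    size_t to[16];
};

static void record(size_t from, size_t to, void *ctx) {
    struct recorder *r = ctx;
    if (r->n < 16) {
        r->from[r->n] = from;
        r->to[r->n] = to;
    }
    r->n++;
}
```

Scanning ``foofoofoofoo`` for ``foofoo`` with ``scan_all`` reports (0, 6), (3, 9) and (6, 12), whereas ``scan_non_overlapping`` reports only (0, 6) and (6, 12), which are the offsets given for Chimera above.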
Compilation allows the Chimera library to analyze the given pattern(s) and pre-determine how to scan for these patterns in an optimized fashion using Hyperscan and PCRE. =============== Pattern Support =============== Chimera fully supports the pattern syntax used by the PCRE library ("libpcre"), described at . The version of PCRE used to validate Chimera's interpretation of this syntax is 8.41. ========= Semantics ========= Chimera supports exactly the same semantics as the PCRE library. In addition, like Hyperscan, it supports matching multiple patterns simultaneously, and the resulting matches are reported in order of end offset. .. _chruntime: ********************* Scanning for Patterns ********************* Chimera provides the scan function ``ch_scan``. ================ Handling Matches ================ ``ch_scan`` will call a user-supplied callback function when a match is found. This function has the following signature: .. doxygentypedef:: ch_match_event_handler :outline: :no-link: The *id* argument will be set to the identifier for the matching expression provided at compile time, the *from* argument will be set to the start offset of the match, and the *to* argument will be set to the end offset of the match. The *captured* array stores the offsets of the entire pattern match as well as those of any captured subexpressions, and *size* will be set to the number of valid entries in *captured*. The match callback function can continue or halt scanning depending on its return value. See :c:type:`ch_match_event_handler` for more information. ======================= Handling Runtime Errors ======================= ``ch_scan`` will call a user-supplied callback function when a runtime error occurs in libpcre. This function has the following signature: .. doxygentypedef:: ch_error_event_handler :outline: :no-link: The *id* argument will be set to the identifier for the matching expression provided at compile time.
The error callback function can either halt scanning or continue scanning for the next pattern. See :c:type:`ch_error_event_handler` for more information. ============= Scratch Space ============= While scanning data, Chimera needs a small amount of temporary memory to store on-the-fly internal data. This amount is unfortunately too large to fit on the stack, particularly for embedded applications, and allocating memory dynamically is too expensive, so a pre-allocated "scratch" space must be provided to the scanning functions. The function :c:func:`ch_alloc_scratch` allocates a large enough region of scratch space to support a given database. If the application uses multiple databases, only a single scratch region is necessary: in this case, calling :c:func:`ch_alloc_scratch` on each database (with the same ``scratch`` pointer) will ensure that the scratch space is large enough to support scanning against any of the given databases. While the Chimera library is re-entrant, the use of scratch spaces is not. For example, if by design it is deemed necessary to run recursive or nested scanning (say, from the match callback function), then an additional scratch space is required for that context. In the absence of recursive scanning, only one such space is required per thread and can (and indeed should) be allocated before data scanning begins. In a scenario where a set of expressions is compiled by a single "main" thread and data will be scanned by multiple "worker" threads, the convenience function :c:func:`ch_clone_scratch` allows multiple copies of an existing scratch space to be made for each thread (rather than forcing the caller to pass all the compiled databases through :c:func:`ch_alloc_scratch` multiple times). For example: ..
code-block:: c ch_error_t err; ch_scratch_t *scratch_prototype = NULL; err = ch_alloc_scratch(db, &scratch_prototype); if (err != CH_SUCCESS) { fprintf(stderr, "ch_alloc_scratch failed!\n"); exit(1); } ch_scratch_t *scratch_thread1 = NULL; ch_scratch_t *scratch_thread2 = NULL; err = ch_clone_scratch(scratch_prototype, &scratch_thread1); if (err != CH_SUCCESS) { fprintf(stderr, "ch_clone_scratch failed!\n"); exit(1); } err = ch_clone_scratch(scratch_prototype, &scratch_thread2); if (err != CH_SUCCESS) { fprintf(stderr, "ch_clone_scratch failed!\n"); exit(1); } ch_free_scratch(scratch_prototype); /* Now two threads can both scan against database db, each with its own scratch space. */ ================= Custom Allocators ================= By default, structures used by Chimera at runtime (scratch space, etc.) are allocated with the default system allocators, usually ``malloc()`` and ``free()``. The Chimera API provides a facility for changing this behaviour to support applications that use custom memory allocators. These functions are: - :c:func:`ch_set_database_allocator`, which sets the allocate and free functions used for compiled pattern databases. - :c:func:`ch_set_scratch_allocator`, which sets the allocate and free functions used for scratch space. - :c:func:`ch_set_misc_allocator`, which sets the allocate and free functions used for miscellaneous data, such as compile error structures and informational strings. The :c:func:`ch_set_allocator` function can be used to set all of the custom allocators to the same allocate/free pair. ************************ API Reference: Constants ************************ =========== Error Codes =========== .. doxygengroup:: CH_ERROR :content-only: :no-link: ============= Pattern flags ============= .. doxygengroup:: CH_PATTERN_FLAG :content-only: :no-link: ================== Compile mode flags ================== ..
doxygengroup:: CH_MODE_FLAG :content-only: :no-link: ******************** API Reference: Files ******************** ========== File: ch.h ========== .. doxygenfile:: ch.h ================= File: ch_common.h ================= .. doxygenfile:: ch_common.h ================== File: ch_compile.h ================== .. doxygenfile:: ch_compile.h ================== File: ch_runtime.h ================== .. doxygenfile:: ch_runtime.h vectorscan-5.4.11/doc/dev-reference/compilation.rst000066400000000000000000000720301452711272000223030ustar00rootroot00000000000000.. include:: .. _compilation: ################## Compiling Patterns ################## ******************* Building a Database ******************* The Hyperscan compiler API accepts regular expressions and converts them into a compiled pattern database that can then be used to scan data. The API provides three functions that compile regular expressions into databases: #. :c:func:`hs_compile`: compiles a single expression into a pattern database. #. :c:func:`hs_compile_multi`: compiles an array of expressions into a pattern database. All of the supplied patterns will be scanned for concurrently at scan time, with user-supplied identifiers returned when they match. #. :c:func:`hs_compile_ext_multi`: compiles an array of expressions as above, but allows :ref:`extparam` to be specified for each expression. Compilation allows the Hyperscan library to analyze the given pattern(s) and pre-determine how to scan for these patterns in an optimized fashion that would be far too expensive to compute at run-time. When compiling expressions, a decision needs to be made whether the resulting compiled patterns are to be used in a streaming, block or vectored mode: - **Streaming mode**: the target data to be scanned is a continuous stream, not all of which is available at once; blocks of data are scanned in sequence and matches may span multiple blocks in a stream. 
In streaming mode, each stream requires a block of memory to store its state between scan calls. - **Block mode**: the target data is a discrete, contiguous block which can be scanned in one call and does not require state to be retained. - **Vectored mode**: the target data consists of a list of non-contiguous blocks that are available all at once. As with block mode, no retention of state is required. To compile patterns to be used in streaming mode, the ``mode`` parameter of :c:func:`hs_compile` must be set to :c:member:`HS_MODE_STREAM`; similarly, block mode requires the use of :c:member:`HS_MODE_BLOCK` and vectored mode requires the use of :c:member:`HS_MODE_VECTORED`. A pattern database compiled for one mode (streaming, block or vectored) can only be used in that mode. The version of Hyperscan used to produce a compiled pattern database must match the version of Hyperscan used to scan with it. Hyperscan provides support for targeting a database at a particular CPU platform; see :ref:`instr_specialization` for details. ===================== Compile Pure Literals ===================== A pure literal is a special case of a regular expression. A character sequence is regarded as a pure literal if and only if each character is read and interpreted independently, with no syntactic association between adjacent characters. For example, consider the expression :regexp:`/bc?/`. Read as a regular expression, it means the character ``b`` followed by zero or one occurrence of the character ``c``. Read as a pure literal, it is a 3-byte character sequence consisting of the characters ``b``, ``c`` and ``?``. In the regular expression reading, the question mark ``?`` acts as a 0-or-1 quantifier, which binds to the character immediately before it.
Regular expression syntax has many other such metacharacters, including ``[``, ``]``, ``(``, ``)``, ``{``, ``}``, ``-``, ``*``, ``+``, ``\``, ``|``, ``/``, ``:``, ``^``, ``.``, ``$``. In a pure literal, all of these metacharacters lose their special meaning and are treated as ordinary ASCII characters. Hyperscan was initially designed to process regular expressions, so it includes a complex parser that performs comprehensive interpretation of regular expression syntax; identifying the metacharacters above is the first step in interpreting far more complex constructs. In practice, however, patterns are not always regular expressions; some are simply pure literals. Problems arise when such literals contain metacharacters: if they are fed directly into the traditional Hyperscan compile APIs, the metacharacters are interpreted with their regular expression meanings, which is unnecessary and produces unexpected results. To avoid this misinterpretation, users of the traditional APIs have to preprocess literal patterns by converting the metacharacters into another form: either by adding a backslash ``\`` before certain metacharacters, or by converting all the characters into a hexadecimal representation. In ``v5.2.0``, Hyperscan introduced two new compile APIs for pure literal patterns: #. :c:func:`hs_compile_lit`: compiles a single pure literal into a pattern database. #. :c:func:`hs_compile_lit_multi`: compiles an array of pure literals into a pattern database. All of the supplied patterns will be scanned for concurrently at scan time, with user-supplied identifiers returned when they match. These two APIs are designed for use cases where all patterns in the target rule set are pure literals. Users can pass the original literal content directly to these APIs without escaping any metacharacters, so no preprocessing is needed.
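Rule sets that mix pure literals with regular expressions still have to go through the traditional compile APIs, so their literals need the preprocessing described above. A minimal sketch of the hexadecimal approach follows, assuming that each ``\xHH`` escape matches the corresponding byte, as in libpcre syntax. The function name ``escape_literal_hex`` is hypothetical and not part of Hyperscan; it takes an explicit length so that literals containing NUL bytes are escaped too.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of Hyperscan): rewrite the first `len` bytes
 * of `lit` as a regular expression that matches them literally, by emitting
 * every byte as a "\xHH" escape. An explicit length is used instead of
 * strlen(), so embedded NUL bytes are escaped as well.
 *
 * Returns the number of characters written to `out` (excluding the
 * terminating NUL), or (size_t)-1 if `out_cap` is too small. */
static size_t escape_literal_hex(const char *lit, size_t len,
                                 char *out, size_t out_cap) {
    static const char hex[] = "0123456789abcdef";
    assert(lit != NULL || len == 0);

    /* Each byte expands to 4 characters, plus one for the terminator;
     * guard against size_t overflow for absurdly long inputs. */
    if (len > ((size_t)-1 - 1) / 4) {
        return (size_t)-1;
    }
    if (out == NULL || out_cap < len * 4 + 1) {
        return (size_t)-1;
    }

    size_t j = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)lit[i];
        out[j++] = '\\';
        out[j++] = 'x';
        out[j++] = hex[c >> 4];
        out[j++] = hex[c & 0x0f];
    }
    out[j] = '\0';
    return j;
}
```

For example, the three-byte literal ``bc?`` becomes ``\x62\x63\x3f``, which can then be compiled with :c:func:`hs_compile_multi` alongside ordinary regular expressions without ``?`` being treated as a quantifier.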
The new APIs take an additional parameter, the ``length`` of each literal pattern. Hyperscan uses this length to locate the end of each literal, rather than by searching for a terminating ``\0`` character. Supported flags: :c:member:`HS_FLAG_CASELESS`, :c:member:`HS_FLAG_SINGLEMATCH`, :c:member:`HS_FLAG_SOM_LEFTMOST`. .. note:: The literal compilation APIs do not support :ref:`extparam`. At runtime, the traditional runtime APIs can still be used to match pure literal patterns. .. note:: If the target rule set contains at least one regular expression, please use the traditional compile APIs :c:func:`hs_compile`, :c:func:`hs_compile_multi` and :c:func:`hs_compile_ext_multi`. The new literal APIs introduced here are designed for rule sets containing only pure literal expressions. *************** Pattern Support *************** Hyperscan supports the pattern syntax used by the PCRE library ("libpcre"), described at . However, not all constructs available in libpcre are supported. The use of unsupported constructs will result in compilation errors. The version of PCRE used to validate Hyperscan's interpretation of this syntax is 8.41 or above. ==================== Supported Constructs ==================== The following regex constructs are supported by Hyperscan: * Literal characters and strings, with all libpcre quoting and character escapes. * Character classes such as :regexp:`.` (dot), :regexp:`[abc]`, and :regexp:`[^abc]`, as well as the predefined character classes :regexp:`\\s`, :regexp:`\\d`, :regexp:`\\w`, :regexp:`\\v`, and :regexp:`\\h` and their negated counterparts (:regexp:`\\S`, :regexp:`\\D`, :regexp:`\\W`, :regexp:`\\V`, and :regexp:`\\H`). * The POSIX named character classes :regexp:`[[:xxx:]]` and negated named character classes :regexp:`[[:^xxx:]]`. * Unicode character properties, such as :regexp:`\\p{L}`, :regexp:`\\P{Sc}`, :regexp:`\\p{Greek}`.
* Quantifiers: * Quantifiers such as :regexp:`?`, :regexp:`*` and :regexp:`+` are supported when applied to arbitrary supported sub-expressions. * Bounded repeat quantifiers such as :regexp:`{n}`, :regexp:`{m,n}`, :regexp:`{n,}` are supported with limitations. * For arbitrary repeated sub-patterns: *n* and *m* should be either small or infinite, e.g. :regexp:`(a|b){4}`, :regexp:`(ab?c?d){4,10}` or :regexp:`(ab(cd)*){6,}`. * For single-character-width sub-patterns such as :regexp:`[^\\a]` or :regexp:`.` or :regexp:`x`, nearly all repeat counts are supported, except where repeats are extremely large (maximum bound greater than 32767). Stream states may be very large for large bounded repeats, e.g. :regexp:`a.{2000}b`. Note: such sub-patterns may be considerably cheaper if they appear at the beginning or end of a pattern, and especially if the :c:member:`HS_FLAG_SINGLEMATCH` flag is on for that pattern. * Lazy modifiers (:regexp:`?` appended to another quantifier, e.g. :regexp:`\\w+?`) are supported but ignored (as Hyperscan reports all matches). * Parenthesization, including the named and unnamed capturing and non-capturing forms. However, capturing is ignored. * Alternation with the :regexp:`|` symbol, as in :regexp:`foo|bar`. * The anchors :regexp:`^`, :regexp:`$`, :regexp:`\\A`, :regexp:`\\Z` and :regexp:`\\z`. * Option modifiers: These allow behaviour to be switched on (with :regexp:`(?