pax_global_header 0000666 0000000 0000000 00000000064 13014065137 0014512 g ustar 00root root 0000000 0000000 52 comment=8a0088ecfacd4b88ed9a606da415bff1211ee114 cbmc-cbmc-5.6/ 0000775 0000000 0000000 00000000000 13014065137 0013172 5 ustar 00root root 0000000 0000000 cbmc-cbmc-5.6/.gitattributes 0000664 0000000 0000000 00000000175 13014065137 0016070 0 ustar 00root root 0000000 0000000 *.cpp text *.c text *.h text *.y text *.tex text *.shtml text *.html text *.css text *.inc text test.desc text Makefile text cbmc-cbmc-5.6/.gitignore 0000664 0000000 0000000 00000003570 13014065137 0015167 0 ustar 00root root 0000000 0000000 # compilation files *.lo *.od *.d *.o *.obj *.a *.lib src/ansi-c/arm_builtin_headers.inc src/ansi-c/clang_builtin_headers.inc src/ansi-c/cprover_library.inc src/ansi-c/cw_builtin_headers.inc src/ansi-c/gcc_builtin_headers_alpha.inc src/ansi-c/gcc_builtin_headers_arm.inc src/ansi-c/gcc_builtin_headers_generic.inc src/ansi-c/gcc_builtin_headers_ia32-2.inc src/ansi-c/gcc_builtin_headers_ia32.inc src/ansi-c/gcc_builtin_headers_mips.inc src/ansi-c/gcc_builtin_headers_power.inc src/util/irep_ids.h src/util/irep_ids.inc # regression/test files *.out regression/ansi-c/tests.log regression/symex/tests.log regression/cbmc-java/tests.log regression/cbmc/tests.log src/big-int/test-bigint src/big-int/test-bigint.exe # files stored by editors *~ # libs downloaded by make [name]-download libzip/ zlib/ minisat*/ glucose-syrup/ # flex/bison generated files src/ansi-c/ansi_c_lex.yy.cpp src/ansi-c/ansi_c_y.output src/ansi-c/ansi_c_y.tab.cpp src/ansi-c/ansi_c_y.tab.h src/assembler/assembler_lex.yy.cpp src/jsil/jsil_lex.yy.cpp src/jsil/jsil_y.output src/jsil/jsil_y.tab.cpp src/jsil/jsil_y.tab.h src/json/json_lex.yy.cpp src/json/json_y.output src/json/json_y.tab.cpp src/json/json_y.tab.h src/xmllang/xml_lex.yy.cpp src/xmllang/xml_y.output src/xmllang/xml_y.tab.cpp src/xmllang/xml_y.tab.h # binaries src/cbmc/cbmc src/cbmc/cbmc.exe src/cegis/cegis src/cegis/cegis.exe 
src/goto-analyzer/goto-analyzer src/goto-analyzer/goto-analyzer.exe src/goto-cc/goto-cc src/goto-cc/goto-cc.exe src/goto-cc/goto-cl.exe src/goto-instrument/goto-instrument src/goto-instrument/goto-instrument.exe src/musketeer/musketeer src/musketeer/musketeer.exe src/symex/symex src/symex/symex.exe src/goto-diff/goto-diff src/goto-diff/goto-diff.exe # build tools src/ansi-c/file_converter src/ansi-c/file_converter.exe src/ansi-c/library/converter src/ansi-c/library/converter.exe src/util/irep_ids_convert src/util/irep_ids_convert.exe cbmc-cbmc-5.6/.travis.yml 0000664 0000000 0000000 00000001327 13014065137 0015306 0 ustar 00root root 0000000 0000000 language: cpp os: - linux - osx sudo: required addons: apt: packages: - libwww-perl compiler: - gcc - clang before_install: - if [ "$(expr substr $(uname -s) 1 5)" == "Linux" ] ; then sudo add-apt-repository -y ppa:ubuntu-toolchain-r/test && sudo apt-get -qq update && sudo apt-get -qq install g++-4.8 gcc-4.8 && sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-4.8 90 && sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-4.8 90 ; fi install: - chmod a+x regression/failed-tests-printer.pl - cd src && make minisat2-download script: - make CXXFLAGS="-Wall -O2 -g -Werror -Wno-deprecated-register -pedantic -Wno-sign-compare" -j2 && cd ../regression && make test cbmc-cbmc-5.6/CHANGELOG 0000664 0000000 0000000 00000004440 13014065137 0014406 0 ustar 00root root 0000000 0000000 4.7 === Added support for Solaris 11. Bugfixes in partial-order encoding. Added --float-overflow-check 4.6 === Improved floating-point encoding. Improved AIG->CNF encoder. 4.5 === Optimizations to reduce memory consumption. Bugfixes in partial-order encoding. 4.4 === Now checks concurrent programs, with partial-order encoding. Support for SMT-LIB standard floating-point theory. goto-instrument knows k-induction and underapproximating loop acceleration. 
4.3 === Floating-point arithmetic now takes the rounding mode into account, which can be changed dynamically. goto-gcc generates hybrid executables on Linux, containing both machine code and the CFG. Limited support for Spec#-style quantifiers added. Pointer-checks no longer use a heavy-weight alias analysis. Limited support for some x86 and ARM inline assembly constructs. 4.2 === goto-cc now passes all command line options to the gcc preprocessor. The MacOS binaries are now signed. The C/C++ front-end has been tested and fixed for the Visual Studio 2012 header files. The man-page has been elaborated. Support for the C99 complex type and gcc's vector type has been added. Various built-ins for x86 MMX and SSE instructions have been added. Support for various C11 features has been added. Support for various built-in primitives has been added, in particular for the __sync_* commands. New feature: --all-claims now reports the status of all claims; the verification continues even if a counterexample is found. This feature uses incremental SAT. The counterexample beautification (--beautify) now uses incremental SAT. Numerous improvements to SMT1 and SMT2 interfaces. Support for further SAT solvers (PRECOSAT, PICOSAT, LINGELING) 4.1 === The support for low-level accesses to dynamically allocated data structures and "integer addressed memory" (usually memory-mapped I/O) has been further improved. Numerous improvements to the SMT back-ends. Specifically, support through the SMT1 path for Boolector and Z3 has been improved; support for MathSAT has been added. In combination with the very latest version of MathSAT, CBMC now also supports an SMT2 flow (use --mathsat --smt2 to activate this). 4.0 === Better support for low-level accesses to dynamically allocated data structures. Numerous front-end improvements. 
cbmc-cbmc-5.6/CODING_STANDARD 0000664 0000000 0000000 00000001172 13014065137 0015341 0 ustar 00root root 0000000 0000000 Here are a few minimalistic coding rules for the cprover source tree: a) 2 spaces indent, no tabs b) no "using namespace std;" c) Avoid new/delete, use containers instead. d) Avoid unnecessary #includes, especially in header files e) No lines wider than 80 chars f) Put matching { } into the same column g) If a method is bigger than a page, break it into parts h) Avoid destructive updates if possible. The irept has constant time copy. Architecture-specific code: a) Avoid if possible. b) Use __LINUX__, __MACH__, and _WIN32 to distinguish the architectures. c) Don't include architecture-specific header files without #ifdef ... cbmc-cbmc-5.6/COMPILING 0000664 0000000 0000000 00000011650 13014065137 0014441 0 ustar 00root root 0000000 0000000 What architecture? ------------------ CPROVER now needs a C++11 compliant compiler and works in the following environments: - Linux - MacOS X - Solaris 11 - FreeBSD 10 or 11 - Cygwin (We recommend the i686-pc-mingw32-g++ cross compiler, version 4.7 or above.) - Microsoft's Visual Studio version 12 (2013), version 14 (2015), or version 15 (older versions won't work) The rest of this document is split up by platform: compilation on Linux, Solaris 11, FreeBSD, MacOS X and Windows. Please read the section appropriate for your machine. COMPILATION ON LINUX -------------------- We assume that you have a Debian/Ubuntu or Red Hat-like distribution. 0) You need a C/C++ compiler, Flex and Bison, and GNU make. GNU Make needs to be version 3.81 or higher. 
On Debian-like distributions, do apt-get install g++ gcc flex bison make git libz-dev libwww-perl patch libzip-dev On Red Hat/Fedora or derivatives, do yum install gcc gcc-c++ flex bison perl-libwww-perl patch 1) As a user, get the CBMC source via git clone https://github.com/diffblue/cbmc cbmc-git 2) Do cd cbmc-git/src make minisat2-download make libzip-download zlib-download make libzip-build make COMPILATION ON SOLARIS 11 ------------------------- 1) As root, get the necessary development tools: pkg install system/header developer/lexer/flex developer/parser/bison developer/versioning/git pkg install --accept developer/gcc-49 2) As a user, get the CBMC source via git clone https://github.com/diffblue/cbmc cbmc-git 3) Get MiniSat2 by entering cd cbmc-git wget http://ftp.debian.org/debian/pool/main/m/minisat2/minisat2_2.2.1.orig.tar.gz gtar xfz minisat2_2.2.1.orig.tar.gz mv minisat2-2.2.1 minisat-2.2.1 (cd minisat-2.2.1; patch -p1 < ../scripts/minisat-2.2.1-patch) 4) Type cd src; gmake That should do it. To run, you will need export LD_LIBRARY_PATH=/usr/gcc/4.9/lib Do not attempt to compile with gcc-45 that comes with Solaris 11. It will mis-optimize MiniSat2. COMPILATION ON FREEBSD 10/11 ---------------------------- 1) As root, get the necessary tools: pkg install bash gmake git www/p5-libwww patch flex bison 2) As a user, get the CBMC source via git clone https://github.com/diffblue/cbmc cbmc-git 3) Do cd cbmc-git/src 4) Do gmake minisat2-download gmake COMPILATION ON MACOS X ---------------------- Follow these instructions: 0) You need a C/C++ compiler, Flex and Bison, and GNU make. To this end, first install the XCode from the App-store and then type xcode-select --install in a terminal window. 
1) Then get the CBMC source via git clone https://github.com/diffblue/cbmc cbmc-git 2) Then type cd cbmc-git/src make minisat2-download make libzip-download zlib-download make libzip-build make COMPILATION ON WINDOWS ---------------------- There are two options: compilation using g++ from Cygwin, or using Visual Studio's compiler. As Cygwin has significant overhead during process creation, we advise you to use Visual Studio. Follow these instructions: 0) You need a C/C++ compiler, Flex and Bison, GNU tar, gzip2, GNU make, and patch. GNU Make needs to be version 3.81 or higher. If you don't already have the above, we recommend you install Cygwin. 1) You need a SAT solver (in source). We recommend MiniSat2. Using a browser, download from http://ftp.debian.org/debian/pool/main/m/minisat2/minisat2_2.2.1.orig.tar.gz and then unpack with tar xfz minisat2_2.2.1.orig.tar.gz mv minisat2-2.2.1 minisat-2.2.1 cd minisat-2.2.1 patch -p1 < ../scripts/minisat-2.2.1-patch The patch removes the dependency on zlib and prevents a problem with a header file that is often unavailable on Windows. 2) Adjust src/config.inc for the paths to item 1). 3A) To compile with Cygwin, install the mingw compilers, and adjust the second line of config.inc to say BUILD_ENV = MinGW 3B) To compile with Visual Studio, make sure you have at least Visual Studio version 12 (2013), and adjust the second line of config.inc to say BUILD_ENV = MSVC Open the Visual Studio Command prompt, and then run the make.exe from Cygwin from there. 4) Type cd src; make - that should do it. Note that "nmake" is not expected to work. Use "make". (Optional) A Visual Studio project file can be generated with the script "generate_vcxproj" that is in the subdirectory "scripts". The project file is helpful for GUI-based tasks, e.g., the class viewer, debugging, etc., and can be used for building with MSBuild. Note that you still need to run flex/bison using "make generated_files" before opening the project. 
WORKING WITH ECLIPSE -------------------- To work with Eclipse, do the following: 1) Select File -> New -> Makefile Project with Existing Code 2) Type "cprover" as "Project Name" 3) Select the "src" subdirectory as "Existing Code Location" 4) Select a toolchain appropriate for your platform 5) Click "Finish" 6) Select Project -> Build All cbmc-cbmc-5.6/LICENSE 0000664 0000000 0000000 00000003650 13014065137 0014203 0 ustar 00root root 0000000 0000000 (C) 2001-2016, Daniel Kroening, Edmund Clarke, Computer Science Department, University of Oxford Computer Science Department, Carnegie Mellon University All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. 3. All advertising materials mentioning features or use of this software must display the following acknowledgement: This product includes software developed by Daniel Kroening, Edmund Clarke, Computer Science Department, University of Oxford Computer Science Department, Carnegie Mellon University 4. Neither the name of the University nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. cbmc-cbmc-5.6/README.md 0000664 0000000 0000000 00000001535 13014065137 0014455 0 ustar 00root root 0000000 0000000 [![Build Status][build_img]][travis] [CProver Wiki](http://www.cprover.org/wiki) About ===== CBMC is a Bounded Model Checker for C and C++ programs. It supports C89, C99, most of C11 and most compiler extensions provided by gcc and Visual Studio. It also supports SystemC using Scoot. It allows verifying array bounds (buffer overflows), pointer safety, exceptions and user-specified assertions. Furthermore, it can check C and C++ for consistency with other languages, such as Verilog. The verification is performed by unwinding the loops in the program and passing the resulting equation to a decision procedure. For full information see [cprover.org](http://www.cprover.org/cbmc). License ======= 4-clause BSD license, see `LICENSE` file. 
[build_img]: https://travis-ci.org/diffblue/cbmc.svg?branch=master [travis]: https://travis-ci.org/diffblue/cbmc cbmc-cbmc-5.6/doc/ 0000775 0000000 0000000 00000000000 13014065137 0013737 5 ustar 00root root 0000000 0000000 cbmc-cbmc-5.6/doc/guide/ 0000775 0000000 0000000 00000000000 13014065137 0015034 5 ustar 00root root 0000000 0000000 cbmc-cbmc-5.6/doc/guide/CBMC-guide.tex 0000664 0000000 0000000 00000073617 13014065137 0017373 0 ustar 00root root 0000000 0000000 \documentclass{article} \newcommand{\dir}[1]{\texttt{#1}} \newcommand{\file}[1]{\texttt{#1}} \newcommand{\code}[1]{\texttt{#1}} \newcommand{\prog}[1]{\texttt{#1}} \title{Beginner's Guide to CPROVER} \author{Martin Brain\thanks{But most of the content is from Michael Tautschnig}} \begin{document} \maketitle \section{Background Information} First off, read the CPROVER manual. It describes how to get, build and use CBMC and SATABS. This document covers the internals of the system and how to get started on development. \subsection{Documentation} Apart from the (user-orientated) CPROVER manual and this document, most of the rest of the documentation is inline in the code as \texttt{doxygen} and some comments. A man page for CBMC, goto-cc and goto-instrument is contained in the \dir{doc/} directory and gives some options for these tools. All of these could be improved and patches are very welcome. In some cases the algorithms used are described in the relevant papers. \subsection{Architecture} CPROVER is structured in a similar fashion to a compiler. It has language specific front-ends which perform limited syntactic analysis and then convert to an intermediate format. The intermediate format can be output to files (this is what \texttt{goto-cc} does) and is (informally) referred to as ``goto binaries'' or ``goto programs''. The back-ends are tools that process this format, either directly from the front-end or from its saved output. 
These include a wide range of analysis and transformation tools (see Section \ref{section:other-apps}). \subsection{Coding Standards} CPROVER is written in a fairly minimalist subset of C++; templates and meta-programming are avoided except where necessary. The standard library is used but in many cases there are alternatives provided in \dir{util/} (see Section \ref{section:util}) which are preferred. Boost is not used. Patches should be formatted so that code is indented with two space characters, not tabs, and wrapped to 75 or 72 columns. Headers for doxygen should be given (and preferably filled!) and the author will be the person who first created the file. Identifiers should be lower case with underscores to separate words. Type names (classes, structures and typedefs) must\footnote{There are a couple of exceptions, including the graph classes} end with a \code{t}. Types that model types (i.e. C types in the program that is being interpreted) are named with \code{\_typet}. For example \code{ui\_message\_handlert} rather than \code{UI\_message\_handlert} or \code{UIMessageHandler} and \code{union\_typet}. \subsection{How to Contribute} Fixes, changes and enhancements to the CPROVER code base should be developed against the \texttt{trunk} version and submitted to Daniel as patches produced by \texttt{diff -Naur} or \texttt{svn diff}. Entire applications are best developed independently (\texttt{git svn} is a popular choice for tracking the main trunk but also having local development) until it is clear what their utility, future and maintenance are likely to be. \subsection{Other Useful Code} \label{section:other-apps} The CPROVER subversion archive contains a number of separate programs. Others are developed separately as patches or separate branches.% New applications are initially developed in their version %control system and may be merged into the main subversion system %depending on their utility, popularity and maintenance. 
Interfaces have been and are continuing to stabilise but older code may require work to compile and function correctly. In the main archive: \begin{description} \item[\prog{CBMC}]{A bounded model checking tool for C and C++. See Section \ref{section:CBMC}.} \item[\prog{goto-cc}]{A drop-in, flag compatible replacement for GCC and other compilers that produces goto-programs rather than executable binaries. See Section \ref{section:goto-cc}.} \item[\prog{goto-instrument}]{A collection of functions for instrumenting and modifying goto-programs. See Section \ref{section:goto-instrument}.} \end{description} Model checkers and similar tools: \begin{description} \item[\prog{SatABS}]{A CEGAR model checker using predicate abstraction. It is roughly 10,000 lines of code (on top of the CPROVER code base) and is developed in its own subversion archive. It uses an external model checker to find potentially feasible paths. Key limitations are related to code with pointers and there is scope for significant improvement.} \item[\prog{Scratch}]{Alistair Donaldson's k-induction based tool. The front-end is in the old project CVS and some of the functionality is in \prog{goto-instrument}.} \item[\prog{Wolverine}]{An implementation of Ken McMillan's IMPACT algorithm for sequential programs. In the old project CVS.} \item[\prog{C-Impact}]{An implementation of Ken McMillan's IMPACT algorithm for parallel programs. In the old project CVS.} \item[\prog{LoopFrog}]{A loop summarisation tool.} \item[\prog{???}]{Christoph's termination analyser.} \end{description} Test case generation: \begin{description} \item[\prog{cover}]{A basic test-input generation tool. In the old project CVS.} \item[\prog{FShell}]{A test-input generation tool that allows the user to specify the desired coverage using a custom language (which includes regular expressions over paths). It uses incremental SAT and is thus faster than the na\"ive ``add assertions one at a time and use the counter-examples'' approach. 
It is developed in its own subversion archive.} \end{description} Alternative front-ends and input translators: \begin{description} \item[\prog{Scoot}]{A System-C to C translator. Probably in the old project CVS.} \item[\prog{???}]{A Simulink to C translator. In the old project CVS.} \item[\prog{???}]{A Verilog front-end. In the old project CVS.} \item[\prog{???}]{A converter from Codewarrior project files to Makefiles. In the old project CVS.} \end{description} Other tools: \begin{description} \item[\prog{ai}]{Leo's hybrid abstract interpretation / CEGAR tool.} \item[\prog{DeltaCheck?}]{Ajitha's slicing tool, aimed at locating changes and differential verification. In the old project CVS.} \end{description} There are tools based on the CPROVER framework from other research groups which are not listed here. \section{Source Walkthrough} This section walks through the code bases in a rough order of interest / comprehensibility to the new developer. \subsection{\dir{doc}} At the moment this just contains the CBMC man page. \subsection{\dir{regression/}} The regression tests are currently being moved from CVS. The \dir{regression/} directory contains all of those that have been moved. They are grouped into directories for each of the tools. Each of these contains a directory per test case, containing a C or C++ file that triggers the bug and a \file{.desc} file that describes the tests, expected output and so on. There is a Perl script, \file{test.pl}, that is used to invoke the tests as: \begin{center} \code{../test.pl -c PATH\_TO\_CBMC} \end{center} The \code{--help} option gives instructions for use and the format of the description files. \subsection{\dir{src/}} The source code is divided into a number of sub-directories, each containing the code for a different part of the system. At the top level there are only a few files: \begin{description} \item[\file{config.inc}]{The user-editable configuration parameters for the build process. 
The main use of this file is setting the paths for the various external SAT solvers that are used. As such, anyone building from source will likely need to edit this.} \item[\file{Makefile}]{The main system Makefile. Parallel builds are supported and encouraged; please don't break them!} \item[\file{common}]{System specific magic required to get the system to build. This should only need to be edited if porting CBMC to a new platform / build environment.} \item[\file{doxygen.cfg}]{The config file for doxygen.} \end{description} \subsubsection{\dir{util/}} \label{section:util} \dir{util/} contains the low-level data structures and manipulation functions that are used throughout the CPROVER code-base. For almost any low-level task, the code required is probably in \dir{util/}. Key files include: \begin{description} \item[\file{irep.h}]{This contains the definition of \code{irept}, the basis of many of the data structures in the project. They should not be used directly; one of the derived classes should be used. For more information see Section \ref{section:irept}.} \item[\file{expr.h}]{The parent class for all of the expressions. Provides a number of generic functions; \code{exprt} can be used with these but when creating data, subclasses of \code{exprt} should be used.} \item[\file{std\_expr.h}]{Provides subclasses of \code{exprt} for common kinds of expression, for example \code{plus\_exprt}, \code{minus\_exprt}, \code{dereference\_exprt}. These are the intended interface for creating expressions.} \item[\file{std\_types.h}]{Provides subclasses of \code{typet} (a subclass of \code{irept}) to model C and C++ types. This is one of the preferred interfaces to \code{irept}. The front-ends handle type promotion and most coercion so the type system and the checking of goto-programs are simpler than in C.} \item[\file{dstring.h}]{The CPROVER string class. This enables sharing between strings which significantly reduces the amount of memory required and speeds comparison. 
\code{dstring} should not be used directly; \code{irep\_idt} should be used instead, which (dependent on build options) is an alias for \code{dstring}.} \item[\file{mp\_arith.h}]{The wrapper class for multi-precision arithmetic within CPROVER. Also see \file{arith\_tools.h}.} \item[\file{ieee\_float.h}]{The arbitrary precision float model used within CPROVER. Based on \code{mp\_integer}s.} \item[\file{context.h}]{A generic container for symbol table like constructs such as namespaces. Lookup gives type, location of declaration, name, `pretty name', whether it is static or not.} \item[\file{namespace.h}]{The preferred interface for the context class. The key function is \code{lookup} which converts a string (\code{irep\_idt}) to a symbol which gives the scope of declaration, type and so on. This works for functions as well as variables.} \end{description} \subsubsection{\dir{langapi/}} This contains the basic interfaces and support classes for programming language front ends. Developers only really need to look at this if they are adding support for a new language. Its main users are the two (in trunk) language front-ends; \dir{ansi-c/} and \dir{cpp/}. \subsubsection{\dir{ansi-c/}} Contains the front-end for ANSI C, plus a variety of common extensions. This parses the file, performs some basic sanity checks (this is one area in which the UI could be improved; patches most welcome) and then produces a goto-program (see below). The parser is a traditional Flex / Bison system. \file{internal\_addition.c} contains the implementation of various `magic' functions that allow control of the analysis from the source code level. These include assertions, assumptions, atomic blocks, memory fences and rounding modes. The \dir{library/} subdirectory contains versions of some of the C standard header files that make use of the CPROVER built-in functions. This allows CPROVER programs to be `aware' of the functionality and model it correctly. 
Examples include \file{stdio.c}, \file{string.c}, \file{setjmp.c} and various threading interfaces. \subsubsection{\dir{cpp/}} This directory contains the C++ front-end. It supports the subset of C++ commonly found in embedded and system applications. Consequently it doesn't have full support for templates and many of the more advanced and obscure C++ features. The subset of the language that can be handled is being extended over time so bug reports of programs that cannot be parsed are useful. The functionality is very similar to the ANSI C front end; parsing the code and converting to goto-programs. It makes use of code from \dir{langapi} and \dir{ansi-c}. \subsubsection{\dir{goto-programs/}} Goto programs are the intermediate representation of the CPROVER tool chain. They are language independent and similar to many of the compiler intermediate languages. Section \ref{section:goto-programs} describes the \code{goto\_programt} and \code{goto\_functionst} data structures in detail. However, it is useful to understand some of the basic concepts. Each function is a list of instructions, each of which has a type (one of 18 kinds of instruction), a code expression, a guard expression and potentially some targets for the next instruction. They are not natively in static single assignment (SSA) form. Transitions are nondeterministic (although in practice the guards on the transitions normally form a disjoint cover of all possibilities). Local variables have non-deterministic values if they are not initialised. Variables and data within the program are commonly one of three types (parameterised by width): \code{unsignedbv\_typet}, \code{signedbv\_typet} and \code{floatbv\_typet}, see \file{util/std\_types.h} for more information. Goto programs can be serialised in a binary (wrapped in ELF headers) format or in XML (see the various \code{\_serialization} files). 
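As a rough illustration of the structure described above, the following C++ sketch models a function body as a sequence of instructions, each with a kind, a code expression, a guard and optional targets. All of the names (\code{toy\_instructiont}, \code{toy\_programt}, \code{count\_backward\_gotos}, \code{make\_counting\_loop}) are hypothetical and expressions are kept as plain strings; this is not the real \code{goto\_programt} interface.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins; NOT the real goto_programt classes.
// Only a handful of instruction kinds are modelled here.
enum class kindt { ASSIGN, ASSUME, ASSERT, GOTO, END_FUNCTION };

struct toy_instructiont
{
  kindt kind;
  std::string code;                 // e.g. "i = i + 1" (plain text here)
  std::string guard;                // e.g. "i < 10"; "true" if unconditional
  std::vector<std::size_t> targets; // successor indices (used by GOTO only)
};

// A function body is a sequence of instructions.
typedef std::vector<toy_instructiont> toy_programt;

// Count GOTO targets that point at the same or an earlier position,
// i.e. the back-edges that form loops.
std::size_t count_backward_gotos(const toy_programt &program)
{
  std::size_t count=0;
  for(std::size_t i=0; i<program.size(); ++i)
  {
    if(program[i].kind!=kindt::GOTO)
      continue;
    for(std::size_t target : program[i].targets)
    {
      assert(target<program.size()); // targets must stay inside the function
      if(target<=i)
        ++count;
    }
  }
  return count;
}

// Builds the toy equivalent of:  i = 0; while(i < 10) i = i + 1;
toy_programt make_counting_loop()
{
  return {
    {kindt::ASSIGN, "i = 0", "true", {}},
    {kindt::GOTO, "", "!(i < 10)", {4}},      // leave the loop
    {kindt::ASSIGN, "i = i + 1", "true", {}},
    {kindt::GOTO, "", "true", {1}},           // jump back to the loop head
    {kindt::END_FUNCTION, "", "true", {}}};
}
```

Calling \code{count\_backward\_gotos(make\_counting\_loop())} finds the single jump back to the loop head; backward jumps of this kind are what loop unwinding has to bound.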
The \prog{cbmc} option \code{--show-goto-programs} is often a good starting point as it outputs goto-programs in a human readable form. However there are a few things to be aware of. Functions have an internal name (for example \code{c::f00}) and a `pretty name' (for example \code{f00}) and which is used depends on whether it is internal or being presented to the user. The \code{main} method is the `logical' main which is not necessarily the main method from the code. In the output \code{NONDET} is used to represent a nondeterministic assignment to a variable. Likewise \code{IF} is a beautified \code{GOTO} instruction where the guard expression is used as the condition. \code{RETURN} instructions may be dropped if they precede an \code{END\_FUNCTION} instruction. The comment lines are generated from the \code{locationt} field of the \code{instructiont} structure. \dir{goto-programs/} is one of the few places in the CPROVER codebase where templates are used. The intention is to allow the general architecture of program and functions to be used for other formalisms. At the moment most of the templates have a single instantiation; for example \code{goto\_functionst} and \code{goto\_function\_templatet} and \code{goto\_programt} and \code{goto\_program\_templatet}. \subsubsection{\dir{goto-symex/}} This directory contains a symbolic evaluation system for goto-programs. This takes a goto-program and translates it to an equation system by traversing the program, branching and merging and unwinding loops as needed. Each reverse goto has a separate counter (the actual counting is handled by \prog{cbmc}, see the \code{--unwind} and \code{--unwind-set} options). When a counter limit is reached, an assertion can be added to explicitly show when analysis is incomplete. The symbolic execution includes constant folding so loops that have a constant number of iterations will be handled completely (assuming the unwinding limit is sufficient). 
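The effect of unwinding can be pictured by writing out a small loop by hand. The sketch below is an illustration under stated assumptions, not output from \prog{cbmc}: it takes the loop \code{while(i < n) i = i + 1;}, copies the body twice, and records whether the condition that an unwinding assertion would check (the loop has exited) holds. The names \code{unwind\_resultt} and \code{unwound\_twice} are invented for this example.

```cpp
#include <cassert>

// Hypothetical result type: the final value of i, plus whether the loop
// really terminated within the two copies of the body.
struct unwind_resultt
{
  int i;
  bool complete;
};

// Hand-unwound version of:  while(i < n) i = i + 1;
unwind_resultt unwound_twice(int i, int n)
{
  bool complete=true;
  if(i<n)
  {
    i=i+1;   // first copy of the loop body
    if(i<n)
    {
      i=i+1; // second copy of the loop body
      // An unwinding assertion would require !(i < n) here;
      // this sketch records the outcome instead of failing.
      if(i<n)
        complete=false;
    }
  }
  // Sanity check: the flag agrees with the loop condition on exit.
  assert(complete==!(i<n));
  return {i, complete};
}
```

With \code{n} at most two more than the initial \code{i}, the two copies cover every iteration and \code{complete} stays true; for a larger \code{n} the flag becomes false, which corresponds to the case where the analysis is incomplete and a larger bound is needed.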
The output of the symbolic execution is a system of equations; an object containing a list of \code{symex\_target\_elements}, each of which is an equality between \prog{expr} expressions. See \file{symex\_target\_equation.h}. The output is in static, single assignment (SSA) form, which is \emph{not} the case for goto-programs. \subsubsection{\dir{pointer-analysis/}} To perform symbolic execution on programs with dereferencing of arbitrary pointers, some alias analysis is needed. \dir{pointer-analysis} contains the three levels of analysis; flow and context insensitive, context sensitive and flow and context sensitive. The code needed is subtle and sophisticated and thus there may be bugs. \subsubsection{\dir{solvers/}} The \dir{solvers/} directory contains interfaces to a number of different decision procedures, roughly one per directory. \begin{description} \item[prop/]{The basic and common functionality. The key file is \file{prop\_conv.h} which defines \code{prop\_convt}. This is the base class that is used to interface to the decision procedures. The key functions are \code{convert} which takes an \code{exprt} and converts it to the appropriate, solver specific, data structures and \code{dec\_solve} (inherited from \code{decision\_proceduret}) which invokes the actual decision procedures. Individual decision procedures (named \code{*\_dect}) objects can be created but \code{prop\_convt} is the preferred interface for code that uses them.} \item[flattening/]{A library that converts operations to bit-vectors, including calling the conversions in \dir{floatbv} as necessary. It is implemented as a simple conversion (with caching) and then a post-processing function that adds extra constraints. 
This is not used by the SMT or CVC back-ends.} %%%% \item[dplib/]{Provides the \code{dplib\_dect} object which used the decision procedure library from ``Decision Procedures : An Algorithmic Point of View''.} \item[cvc/]{Provides the \code{cvc\_dect} type which interfaces to the old (pre SMTLib) input format for the CVC family of solvers. This format is still supported but deprecated in favour of SMTLib 2.} \item[smt1/]{Provides the \code{smt1\_dect} type which converts the formulae to SMTLib version 1 and then invokes one of Boolector, CVC3, OpenSMT, Yices, MathSAT or Z3. Again, note that this format is deprecated.} \item[smt2/]{Provides the \code{smt2\_dect} type which functions in a similar way to \code{smt1\_dect}, calling Boolector, CVC3, MathSAT, Yices or Z3. Note that the interaction with the solver is batched and uses temporary files rather than using the interactive command supported by SMTLib 2. With the \code{--fpa} option, this output mode will not flatten the floating point arithmetic and instead output the proposed SMTLib floating point standard.} \item[qbf/]{Back-ends for a variety of QBF solvers. It appears to be no longer used or maintained.} \item[sat/]{Back-ends for a variety of SAT solvers and DIMACS output.} \end{description} \subsubsection{\dir{cbmc/}} \label{section:CBMC} This contains the first full application. CBMC is a bounded model checker that uses the front ends (\dir{ansi-c}, \dir{cpp}, goto-program or others) to create a goto-program, \dir{goto-symex} to unwind the loops the given number of times and to produce an equation system and finally \dir{solvers} to find a counter-example (technically, \dir{goto-symex} is then used to construct the counter-example trace). \subsubsection{\dir{goto-cc/}} \label{section:goto-cc} \dir{goto-cc} is a compiler replacement that just performs the first step of the process; converting C or C++ programs to goto-binaries. 
It is intended to be dropped in to an existing build procedure in place of the compiler, thus it emulates flags that would affect the semantics of the code produced. Which set of flags are emulated depends on the naming of the \dir{goto-cc/} binary. If it is called \prog{goto-cc} then it emulates GCC flags, \prog{goto-armcc} emulates the ARM compiler, \prog{goto-cl} emulates VCC and \prog{goto-cw} emulates the Code Warrior compiler. The output of this tool can then be used with \prog{cbmc} or \prog{goto-instrument}. \subsubsection{\dir{goto-instrument/}} \label{section:goto-instrument} The \dir{goto-instrument/} directory contains a number of tools, one per file, that are built into the \prog{goto-instrument} program. All of them take in a goto-program (produced by \prog{goto-cc}) and either modify it or perform some analysis. Examples include \file{nondet\_static.cpp} which initialises static variables to a non-deterministic value, \file{nondet\_volatile.cpp} which assigns a non-deterministic value to any volatile variable before it is read and \file{weak\_memory.h} which performs the necessary transformations to reason about weak memory models. The exception to the ``one file for each piece of functionality'' rule are the program instrumentation options (mostly those given as ``Safety checks'' in the \prog{goto-instrument} help text) which are included in the \prog{goto-program/} directory. An example of this is \file{goto-program/stack\_depth.h} and the general rule seems to be that transformations and instrumentation that \prog{cbmc} uses should be in \dir{goto-program/}, others should be in \dir{goto-instrument}. \prog{goto-instrument} is a very good template for new analysis tools. New developers are advised to copy the directory, remove all files apart from \file{main.*}, \file{parseoptions.*} and the \file{Makefile} and use these as the skeleton of their application. 
The \code{doit()} method in \file{parseoptions.cpp} is the preferred location for the top level control for the program. \subsubsection{\dir{linking/}} Probably the code to emulate a linker. This allows multiple `object files' (goto-programs) to be linked into one `executable' (another goto-program), thus allowing existing build systems to be used to build complete goto-program binaries. \subsubsection{\dir{big-int/}} CPROVER is distributed with its own multi-precision arithmetic library; mainly for historical and portability reasons. The library is externally developed and thus \dir{big-int} contains the source as it is distributed. This should not be used directly, see \file{util/mp\_arith.h} for the CPROVER interface. \subsubsection{\dir{xmllang/}} CPROVER has optional XML output for results and there is an XML format for goto-programs. It is used to interface to various IDEs. The \dir{xmllang/} directory contains the parser and helper functions for handling this format. \subsubsection{\dir{floatbv/}} This library contains the code that is used to convert floating point variables (\code{floatbv}) to bit vectors (\code{bv}). This is referred to as `bit-blasting' and is called in the \dir{solver} code during conversion to SAT or SMT. It also contains the abstraction code described in the FMCAD09 paper. \section{Data Structures} This section discusses some of the key data-structures used in the CPROVER codebase. \subsection{\code{irept}} \label{section:irept} There are a large number of kind of tree structured or tree-like data in CPROVER. \code{irept} provides a single, unified representation for all of these, allowing structure sharing and reference counting of data. As such \code{irept} is the basic unit of data in CPROVER. Each \code{irept} contains\footnote{Or references, if reference counted data sharing is enabled. 
It is enabled by default; see the \code{SHARING} macro.} a basic unit of data (of type \code{dt}) which contains four things: \begin{description} \item[\code{data}]{A string\footnote{When \code{USE\_DSTRING} is enabled (it is by default), this is actually a \code{dstring} and thus an integer which is a reference into a string table}, which is returned when the \code{id()} function is used.} \item[\code{named\_sub}]{A map from \code{irep\_namet} (a string) to an \code{irept}. This is used for named children, i.e. subexpressions, parameters, etc.} \item[\code{comments}]{Another map from \code{irep\_namet} to \code{irept} which is used for annotations and other `non-semantic' information} \item[\code{sub}]{A vector of \code{irept} which is used to store ordered but unnamed children.} \end{description} The \code{irept::pretty} function outputs the contents of an \code{irept} directly and can be used to understand and debug problems with \code{irept}s. On their own \code{irept}s do not ``mean'' anything; they are effectively generic tree nodes. Their interpretation depends on the result of the \code{id} function (the \code{data} field). \file{util/irep\_ids.txt} contains the complete list of \code{id} values. During the build process it is used to generate \file{util/irep\_ids.h} which gives constants for each id (named \code{ID\_*}). These can then be used to identify what kind of data an \code{irept} stores and thus what can be done with it. To simplify this process, there are a variety of classes that inherit from \code{irept}, roughly corresponding to the ids listed (e.g. \code{ID\_or} (the string \code{"or"}) corresponds to the class \code{or\_exprt}). These give semantically relevant accessor functions for the data; effectively different APIs for the same underlying data structure. None of these classes add fields (only methods) and so static casting can be used.
The inheritance graph of the subclasses of \code{irept} is a useful starting point for working out how to manipulate data. There are three main groups of classes (or APIs); those derived from \code{typet}, \code{codet} and \code{exprt} respectively. Although all of these inherit from \code{irept}, these are the most abstract level that code should handle data. If code is manipulating plain \code{irept}s then something is wrong with the architecture of the code. Many of the key descendent of \code{exprt} are declared in \file{std\_expr.h}. All expressions have a named subfield / annotation which gives the type of the expression (slightly simplified from C/C++ as \code{unsignedbv\_typet}, \code{signedbv\_typet}, \code{floatbv\_typet}, etc.). All type conversions are explicit with an expression with \code{id() == ID\_typecast} and an `interface class' named \code{typecast\_exprt}. One key descendent of \code{exprt} is \code{symbol\_exprt} which creates \code{irept} instances with the id of ``symbol''. These are used to represent variables; the name of which can be found using the \code{get\_identifier} accessor function. \code{codet} inherits from \code{exprt} and is defined in \file{std\_code.h}. They represent executable code; statements in C rather than expressions. In the front-end there are versions of these that hold whole code blocks, but in goto-programs these have been flattened so that each \code{irept} represents one sequence point (almost one line of code / one semi-colon). The most common descendents of \code{codet} are \code{code\_assignt} so a common pattern is to cast the \code{codet} to an assignment and then recurse on the expression on either side. \subsection{\code{goto-programs}} \label{section:goto-programs} The common starting point for working with goto-programs is the \code{read\_goto\_binary} function which populates an object of \code{goto\_functionst} type. 
This is defined in \file{goto\_functions.h} and is an instantiation of the template \code{goto\_functions\_templatet} which is contained in \file{goto\_functions\_template.h}. They are wrappers around a map from strings to \code{goto\_programt}s and iteration macros are provided. Note that \code{goto\_function\_templatet} (no \code{s}) is defined in the same header as \code{goto\_functions\_templatet} and gives the C type for the function and a Boolean which indicates whether the body is available (before linking this might not always be true). Also note the slightly counter-intuitive naming; \code{goto\_functionst} instances are the top level structure representing the program and contain \code{goto\_programt} instances which represent the individual functions. At the time of writing \code{goto\_functionst} is the only instantiation of the template \code{goto\_functions\_templatet} but others could be produced if different data structures / kinds of models were needed for functions. \code{goto\_programt} is also an instantiation of a template. In a similar fashion it is \code{goto\_program\_templatet} and allows the types of the guard and expression used in instructions to be parameterised. Again, this is currently the only use of the template. As such there are only really helper functions in \file{goto\_program.h} and thus \code{goto\_program\_template.h} is probably the key file that describes the representation of (C) functions in the goto-program format. It is reasonably stable and reasonably documented and thus is a good place to start looking at the code. An instance of \code{goto\_program\_templatet} is effectively a list of instructions (an inner template called \code{instructiont}). It is important to use the copy and insertion functions that are provided as iterators are used to link instructions to their predecessors and targets and careless manipulation of the list could break these.
Likewise there are helper macros for iterating over the instructions in an instance of \code{goto\_program\_templatet} and the use of these is good style and strongly encouraged. Individual instructions are instances of type \code{instructiont}. They represent one step in the function. Each has a type, an instance of \code{goto\_program\_instruction\_typet} which denotes what kind of instruction it is. They can be computational (such as \code{ASSIGN} or \code{FUNCTION\_CALL}), logical (such as \code{ASSUME} and \code{ASSERT}) or informational (such as \code{LOCATION} and \code{DEAD}). At the time of writing there are 18 possible values for \code{goto\_program\_instruction\_typet} / kinds of instruction. Instructions also have a guard field (the condition under which it is executed) and a code field (what the instruction does). These may be empty depending on the kind of instruction. In the default instantiations these are of type \code{exprt} and \code{codet} respectively and thus covered by the previous discussion of \code{irept} and its descendants. The next instructions (remembering that transitions can be non-deterministic) are given by the list \code{targets} (with the corresponding list of labels \code{labels}) and the corresponding set of previous instructions is given by \code{incoming\_edges}. Finally \code{instructiont}s have informational \code{function} and \code{location} fields that indicate where they are in the code. \end{document} cbmc-cbmc-5.6/doc/html-manual/ 0000775 0000000 0000000 00000000000 13014065137 0016156 5 ustar 00root root 0000000 0000000 cbmc-cbmc-5.6/doc/html-manual/api.shtml 0000664 0000000 0000000 00000017570 13014065137 0020012 0 ustar 00root root 0000000 0000000
The following sections summarize the functions available to programs that are passed to the CPROVER tools.
void __CPROVER_assume(_Bool assumption);
void __CPROVER_assert(_Bool assertion, const char *description);
void assert(_Bool assertion);
The function __CPROVER_assume adds an expression as a constraint to the program. If the expression evaluates to false, the execution aborts without failure. More detail on the use of assumptions is in the section on Assumptions and Assertions.
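As an illustration of how an assumption restricts the paths that a later assertion is checked on, here is a minimal sketch. The helper checked_div is hypothetical, and the stub for ordinary compilers relies on the assumption that CBMC predefines the macro __CPROVER__.

```c
#include <assert.h>

/* Assumption: CBMC predefines __CPROVER__. Outside CBMC the primitive is
   stubbed out so that the file still builds with an ordinary compiler. */
#ifndef __CPROVER__
static void __CPROVER_assume(_Bool assumption) { (void)assumption; }
#endif

/* Hypothetical helper: integer division guarded by an assumption. */
int checked_div(int num, int den)
{
  __CPROVER_assume(den != 0); /* executions with den == 0 are discarded */
  assert(den != 0);           /* therefore holds on every remaining path */
  return num / den;
}
```

Under these assumptions, the assertion should be reported as holding because the assumption has already removed the executions in which den is zero.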
_Bool __CPROVER_same_object(const void *, const void *);
unsigned __CPROVER_POINTER_OBJECT(const void *p);
signed __CPROVER_POINTER_OFFSET(const void *p);
_Bool __CPROVER_DYNAMIC_OBJECT(const void *p);
The function __CPROVER_same_object returns true if the two pointers given as arguments point to the same object. The function __CPROVER_POINTER_OFFSET returns the offset of the given pointer relative to the base address of the object. The function __CPROVER_DYNAMIC_OBJECT returns true if the pointer passed as argument points to a dynamically allocated object.
_Bool __CPROVER_is_zero_string(const void *);
__CPROVER_size_t __CPROVER_zero_string_length(const void *);
__CPROVER_size_t __CPROVER_buffer_size(const void *);
void __CPROVER_initialize(void);
The function __CPROVER_initialize computes the initial state of the program. It is called prior to calling the main procedure of the program.
void __CPROVER_input(const char *id, ...);
void __CPROVER_output(const char *id, ...);
The functions __CPROVER_input and __CPROVER_output are used to report an input or output value. Note that they do not generate input or output values. The first argument is a string constant to distinguish multiple inputs and outputs (inputs are typically generated using nondeterminism, as described here). The string constant is followed by an arbitrary number of values of arbitrary types.
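The following sketch shows one way the two reporting functions might be used. The function doubled and the identifiers "x" and "y" are made up for illustration, and the stubs assume that CBMC predefines __CPROVER__ and supplies the real declarations itself.

```c
/* Assumption: CBMC predefines __CPROVER__ and provides these functions
   itself. The stubs only let the file build with an ordinary compiler. */
#ifndef __CPROVER__
static void __CPROVER_input(const char *id, ...) { (void)id; }
static void __CPROVER_output(const char *id, ...) { (void)id; }
#endif

/* Hypothetical function that reports its argument and its result. */
int doubled(int x)
{
  __CPROVER_input("x", x);  /* reports x as an input; does not generate it */
  int y = 2 * x;
  __CPROVER_output("y", y); /* reports y as an output */
  return y;
}
```

The calls only label values; the value of x still has to come from the caller or from a nondeterministic choice.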
void __CPROVER_cover(_Bool condition);
_Bool __CPROVER_isnan(double f);
_Bool __CPROVER_isfinite(double f);
_Bool __CPROVER_isinf(double f);
_Bool __CPROVER_isnormal(double f);
_Bool __CPROVER_sign(double f);
The function __CPROVER_isnan returns true if the double-precision floating-point number passed as argument is a NaN.
The function __CPROVER_isfinite returns true if the double-precision floating-point number passed as argument is a finite number.
The function __CPROVER_isinf returns true if the double-precision floating-point number passed as argument is plus or minus infinity.
The function __CPROVER_isnormal returns true if the double-precision floating-point number passed as argument is a normal number.
The function __CPROVER_sign returns true if the double-precision floating-point number passed as argument is negative.
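A small sketch of how these predicates could be combined is given below. The function classify and its return codes are hypothetical; outside CBMC the predicates are mapped onto the corresponding macros from math.h, on the assumption that CBMC predefines __CPROVER__.

```c
#include <math.h>

/* Assumption: CBMC predefines __CPROVER__ and provides the predicates.
   With an ordinary compiler they are mapped onto <math.h> macros. */
#ifndef __CPROVER__
#define __CPROVER_isnan(f) (isnan(f) != 0)
#define __CPROVER_isinf(f) (isinf(f) != 0)
#define __CPROVER_sign(f) (signbit(f) != 0)
#endif

/* Hypothetical classifier:
   0 = NaN, 1 = plus infinity, -1 = minus infinity, 2 = anything else. */
int classify(double f)
{
  if(__CPROVER_isnan(f))
    return 0;
  if(__CPROVER_isinf(f))
    return __CPROVER_sign(f) ? -1 : 1;
  return 2;
}
```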
int __CPROVER_abs(int x);
long int __CPROVER_labs(long int x);
double __CPROVER_fabs(double x);
long double __CPROVER_fabsl(long double x);
float __CPROVER_fabsf(float x);
These functions return the absolute value of the given argument.
_Bool __CPROVER_array_equal(const void array1[], const void array2[]);
void __CPROVER_array_copy(const void dest[], const void src[]);
void __CPROVER_array_set(const void dest[], value);
The function __CPROVER_array_equal returns true if the values stored in the given arrays are equal. The function __CPROVER_array_copy copies the contents of the array src to the array dest. The function __CPROVER_array_set initializes the array dest with the given value.
Uninterpreted functions are documented here.
__CPROVER_bitvector [ expression ]
This type is only available in the C frontend. It is used to specify a bit vector with arbitrary but fixed size. The usual integer type modifiers signed and unsigned can be applied. The usual arithmetic promotions will be applied to operands of this type.
__CPROVER_floatbv [ expression ] [ expression ]
This type is only available in the C frontend. It is used to specify an IEEE-754 floating point number with arbitrary but fixed size. The first parameter is the total size (in bits) of the number, and the second is the size (in bits) of the mantissa, or significand (not including the hidden bit, thus for single precision this should be 23).
__CPROVER_fixedbv [ expression ] [ expression ]
This type is only available in the C frontend. It is used to specify a fixed-point bit vector with arbitrary but fixed size. The first parameter is the total size (in bits) of the type, and the second is the number of bits after the radix point.
The type of sizeof expressions.
extern int __CPROVER_rounding_mode;
This variable contains the IEEE floating-point arithmetic rounding mode.
This is a constant that models a large unsigned integer.
__CPROVER_integer is an unbounded, signed integer type. __CPROVER_rational is an unbounded, signed rational number type.
extern unsigned char __CPROVER_memory[];
This array models the contents of integer-addressed memory.
This type is the equivalent of unsigned __CPROVER_bitvector[N] in the C++ front-end.
This type is the equivalent of signed __CPROVER_bitvector[N] in the C++ front-end.
This type is the equivalent of __CPROVER_fixedbv[N,m] in the C++ front-end.
Asynchronous threads are created by preceding an instruction with a label with the prefix __CPROVER_ASYNC_.
cbmc-cbmc-5.6/doc/html-manual/architecture.shtml 0000664 0000000 0000000 00000005574 13014065137 0021724 0 ustar 00root root 0000000 0000000The behavior of a C/C++ program depends on a number of parameters that are specific to the architecture the program was compiled for. The three most important architectural parameters are:
sizeof(long int)
on various machines.sizeof(int *)
on various machines.
In general, the CPROVER tools attempt to adopt the settings of the
particular architecture the tool itself was compiled for. For example,
when running a 64 bit binary of CBMC on Linux, the program will be processed
assuming that sizeof(long int)==8.
As a consequence of these architectural parameters, you may observe different verification results for an identical program when running CBMC on different machines. In order to get consistent results, or when aiming at validating a program written for a different platform, the following command-line arguments can be passed to the CPROVER tools:
The word width can be set with --16, --32, --64.
The endianness can be set with --little-endian and --big-endian.
When using a goto binary, CBMC and the other tools read the configuration from the binary, i.e., the setting when running goto-cc is the one that matters; the option given to the model checker is ignored in this case.
In order to see the effect of the options --16, --32 and --64, pass
the following program to CBMC:
#include <stdio.h>
#include <assert.h>
int main() {
printf("sizeof(long int): %d\n", (int)sizeof(long int));
printf("sizeof(int *): %d\n", (int)sizeof(int *));
assert(0);
}
The counterexample trace contains the strings printed by the printf command.
The effects of endianness are more subtle. Try the following program with --big-endian and --little-endian:
#include <stdio.h>
#include <assert.h>
int main() {
int i=0x01020304;
char *p=(char *)&i;
printf("Bytes of i: %d, %d, %d, %d\n",
p[0], p[1], p[2], p[3]);
assert(0);
}
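The byte order can also be tested programmatically instead of being read off the trace. The sketch below uses two hypothetical helpers and assumes that unsigned int is at least 32 bits wide and that the byte order is either plain little-endian or plain big-endian.

```c
/* Returns 1 if the least significant byte is stored first
   (little-endian) and 0 otherwise. */
int stores_low_byte_first(void)
{
  unsigned int i = 0x01020304u;
  unsigned char *p = (unsigned char *)&i;
  return p[0] == 0x04;
}

/* Rebuilds a value from its bytes in memory, using the detected order. */
unsigned int reassemble(unsigned int value)
{
  unsigned char *p = (unsigned char *)&value;
  unsigned int result = 0;
  for(unsigned k = 0; k < sizeof value; k++)
  {
    unsigned shift =
      stores_low_byte_first() ? 8 * k : 8 * ((unsigned)sizeof value - 1 - k);
    result |= (unsigned int)p[k] << shift;
  }
  return result;
}
```

One would expect stores_low_byte_first to yield 1 under --little-endian and 0 under --big-endian, while reassemble should return its argument in both cases.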
The basic idea of CBMC is to model the computation of the programs up to a particular depth. Technically, this is achieved by a process that essentially amounts to unwinding loops. This concept is best illustrated with a generic example:
int main(int argc, char **argv) {
while(cond) {
BODY CODE
}
}
A BMC instance that will find bugs with up to five iterations of the loop would contain five copies of the loop body, and essentially corresponds to checking the following loop-free program:
int main(int argc, char **argv) {
if(cond) {
BODY CODE COPY 1
if(cond) {
BODY CODE COPY 2
if(cond) {
BODY CODE COPY 3
if(cond) {
BODY CODE COPY 4
if(cond) {
BODY CODE COPY 5
}
}
}
}
}
}
Note the use of the if statement to prevent the execution of
the loop body in the case that the loop ends before five iterations are executed.
The construction above is meant to produce a program that is trace equivalent
with the original program for those traces that contain up to five iterations
of the loop.
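The same construction can be tried on a concrete loop. The sketch below uses a bound of three rather than five to keep it short; both function names are made up for illustration.

```c
/* Original loop: adds up 0 + 1 + ... + (n-1). */
int sum_loop(int n)
{
  int s = 0, i = 0;
  while(i < n)
  {
    s += i;
    i++;
  }
  return s;
}

/* The same loop unwound three times, following the scheme above. */
int sum_unwound3(int n)
{
  int s = 0, i = 0;
  if(i < n)
  {
    s += i; i++;     /* body copy 1 */
    if(i < n)
    {
      s += i; i++;   /* body copy 2 */
      if(i < n)
      {
        s += i; i++; /* body copy 3 */
      }
    }
  }
  return s;
}
```

The two functions agree whenever the loop runs at most three times (n up to 3) and diverge for larger n, which is exactly the set of traces the unwound program covers.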
In many cases, CBMC is able to automatically determine an upper bound on the number of loop iterations. This may even work when the number of loop unwindings is not constant. Consider the following example:
_Bool f();
int main() {
for(int i=0; i<100; i++) {
if(f()) break;
}
assert(0);
}
The loop in the program above has an obvious upper bound on the number of
iterations, but note that the loop may abort prematurely depending on the
value that is returned by f(). CBMC is nevertheless able to
automatically unwind the loop to completion.
This automatic detection of the unwinding
bound may fail if the number of loop iterations is highly data-dependent.
Furthermore, the number of iterations that are executed by any given
loop may be too large or may simply be unbounded. For this case,
CBMC offers the command-line option --unwind B, where B denotes a number
that corresponds to the maximal number of loop unwindings CBMC performs on any loop.
Note that the number of unwindings is measured by counting the number of
backjumps. In the example above, note that the condition i<100 is in fact
evaluated 101 times before the loop terminates. Thus, the loop requires a
limit of 101, and not 100.
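The count of 101 can be reproduced with a small sketch that tallies how often the loop condition is evaluated. The function name is hypothetical.

```c
/* Counts how often the condition i<bound is evaluated by a loop of the
   form for(i=0; i<bound; i++). */
int count_condition_checks(int bound)
{
  int checks = 0;
  int i = 0;
  for(;;)
  {
    checks++; /* one evaluation of i<bound */
    if(!(i < bound))
      break;
    i++;
  }
  return checks;
}
```

For a bound of 100 the condition is true 100 times and false once, giving 101 evaluations.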
The setting given with --unwind is used globally,
that is, for all loops in the program. In order to set individual
limits for the loops, first use
--show-loops
to obtain a list of all loops in the program. Then identify the loops you need to set a separate bound for, and note their loop ID. Then use
--unwindset L:B,L:B,...
where L denotes a loop ID and B denotes the bound for that loop.
As an example, consider a program with two loops in the function main:
--unwindset c::main.0:10,c::main.1:20
This sets a bound of 10 for the first loop, and a bound of 20 for the second loop.
What if the number of unwindings specified is too small? In this case, bugs
that require paths that are deeper may be missed. In order to address this
problem, CBMC can optionally insert checks that the given unwinding bound is
actually sufficiently large. These checks are called unwinding
assertions, and are enabled with the option --unwinding-assertions.
Continuing the generic example above, this unwinding assertion for a bound
of five corresponds to checking the following loop-free program:
int main(int argc, char **argv) {
if(cond) {
BODY CODE COPY 1
if(cond) {
BODY CODE COPY 2
if(cond) {
BODY CODE COPY 3
if(cond) {
BODY CODE COPY 4
if(cond) {
BODY CODE COPY 5
assert(!cond);
}
}
}
}
}
}
The unwinding assertions can be verified just like any other generated assertion. If all of them are proven to hold, the given loop bounds are sufficient for the program. This establishes a high-level worst-case execution time (WCET).
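To see when such an assertion would fail, the sketch below unwinds a counting loop three times and returns the value that the unwinding assertion would check instead of asserting it. The function name and the bound of three are chosen for illustration only.

```c
/* Loop under consideration: for(i=0; i<n; i++) { }
   Returns 1 if three unwindings are sufficient for the given n, and 0 if
   the unwinding assertion would be violated. */
int three_unwindings_suffice(int n)
{
  int i = 0;
  if(i < n)
  {
    i++;     /* body copy 1 */
    if(i < n)
    {
      i++;   /* body copy 2 */
      if(i < n)
      {
        i++; /* body copy 3 */
        /* the unwinding assertion would be: assert(!(i < n)); */
        return !(i < n);
      }
    }
  }
  return 1;
}
```

For n up to 3 the check succeeds; for n equal to 4 the loop would need a fourth iteration, so the assertion would be reported as violated.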
In some cases, it is desirable to cut off very deep loops in favor of code that follows the loop. As an example, consider the following program:
int main() {
for(int i=0; i<10000; i++) {
BODY CODE
}
assert(0);
}
In the example above, small values of --unwind will
prevent the assertion from being reached. If the code in the loop
is considered irrelevant to the later assertion, use the option
--partial-loops
This option will allow paths that execute loops only partially, enabling a counterexample for the assertion above even for small unwinding bounds. The disadvantage of using this option is that the resulting path may be spurious, i.e., may not exist in the original program.
The loop-based unwinding bound is not always appropriate. In particular,
it is often difficult to control the size of the generated formula
when using the --unwind option. The option
--depth nr
specifies an unwinding bound in terms of the number of instructions that are executed on a given path, irrespective of the number of loop iterations. Note that CBMC uses the number of instructions in the control-flow graph as the criterion, not the number of instructions in the source code.
cbmc-cbmc-5.6/doc/html-manual/cbmc.shtml 0000664 0000000 0000000 00000031017 13014065137 0020135 0 ustar 00root root 0000000 0000000We assume you have already installed CBMC and the necessary support files on your system. If not so, please follow these instructions.
Like a compiler, CBMC takes the names of .c files as command line arguments. CBMC then translates the program and merges the function definitions from the various .c files, just like a linker. But instead of producing a binary for execution, CBMC performs symbolic simulation on the program.
As an example, consider the following simple program, named file1.c:
int puts(const char *s) { }
int main(int argc, char **argv) {
puts(argv[2]);
}
Of course, this program is faulty, as the argv array might have fewer
than three elements, and then the array access argv[2] is out of bounds.
Now, run CBMC as follows:
cbmc file1.c --show-properties --bounds-check --pointer-check
The two options --bounds-check and --pointer-check
instruct CBMC to look for errors related to pointers and array bounds.
CBMC will print the list of properties it checks. Note that it lists,
among others, a property labeled with "object bounds in argv" together with
the location of the faulty array access. As you can see, CBMC largely
determines the property it needs to check itself. This is realized by means
of a preliminary static analysis, which relies on computing a fixed point on
various abstract domains. More detail on automatically generated properties
is provided here.
Note that these automatically generated properties need not necessarily correspond to bugs – these are just potential flaws, as abstract interpretation might be imprecise. Whether these properties hold or correspond to actual bugs needs to be determined by further analysis.
CBMC performs this analysis using symbolic simulation, which corresponds to a translation of the program into a formula. The formula is then combined with the property. Let's look at the formula that is generated by CBMC's symbolic simulation:
cbmc file1.c --show-vcc --bounds-check --pointer-check
With this option, CBMC performs the symbolic simulation and prints the verification conditions on the screen. A verification condition needs to be proven to be valid by a decision procedure in order to assert that the corresponding property holds. Let's run the decision procedure:
cbmc file1.c --bounds-check --pointer-check
CBMC transforms the equation you have seen before into CNF and passes it to
a SAT solver (more background on this step is in the book on Decision Procedures). It
then determines which of the properties that it has generated for the
program hold and which do not. Using the SAT solver, CBMC detects that the
property for the object bounds of argv does not hold, and will
thus print a line as follows:
[main.pointer_dereference.6] dereference failure: object bounds in argv[(signed long int)2]: FAILURE
Let us have a closer look at this property and why it fails. To aid the understanding of the problem, CBMC can generate a counterexample trace for failed properties. To obtain this trace, run
cbmc file1.c --bounds-check --trace
CBMC then prints a counterexample trace, i.e., a program trace that begins
with main and ends in a state which violates the property. In
our example, the program trace ends in the faulty array access. It also
gives the values the input variables must have for the bug to occur. In
this example, argc must be one to trigger the out-of-bounds
array access. If you add a branch to the example that requires that
argc>=3, the bug is fixed and CBMC will report that the
proofs of all properties have been successful.
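A guarded version of the example might look as follows. This is only a sketch: the function name print_third is made up, and it uses the puts function from the standard library rather than the empty stub in file1.c.

```c
#include <stdio.h>

/* Hypothetical guarded variant: argv[2] is read only if it exists. */
int print_third(int argc, char **argv)
{
  if(argc >= 3)           /* the branch that rules out the faulty access */
    return puts(argv[2]);
  return -1;              /* too few arguments: nothing is printed */
}
```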
In the example above, we used a program that starts with a main
function. However, CBMC is aimed at embedded software, and these
kinds of programs usually have different entry points. Furthermore, CBMC
is also useful for verifying program modules. Consider the following example,
called file2.c:
int array[10];
int sum() {
unsigned i, sum;
sum=0;
for(i=0; i<10; i++)
sum+=array[i];
return sum;
}
In order to set the entry point to the sum function, run
cbmc file2.c --function sum --bounds-check
It is often necessary to build a suitable harness for the function in order to set up the environment appropriately.
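One possible shape of such a harness is sketched below. The names sum_harness and nondet_int and the value range 0 to 100 are assumptions made for this illustration. The sketch also assumes that CBMC predefines __CPROVER__ and treats a bodyless function whose name starts with nondet_ as returning an arbitrary value.

```c
#include <assert.h>

#ifdef __CPROVER__
int nondet_int(void); /* no body: assumed to yield an arbitrary value */
#else
/* Stand-ins so that the file also builds with an ordinary compiler. */
static int nondet_int(void) { return 7; }
static void __CPROVER_assume(_Bool assumption) { (void)assumption; }
#endif

int array[10];

int sum(void)
{
  int total = 0;
  for(unsigned i = 0; i < 10; i++)
    total += array[i];
  return total;
}

/* Hypothetical harness: sets up the environment, then calls sum(). */
int sum_harness(void)
{
  for(unsigned i = 0; i < 10; i++)
  {
    array[i] = nondet_int();
    __CPROVER_assume(array[i] >= 0 && array[i] <= 100);
  }
  int total = sum();
  assert(total >= 0 && total <= 1000); /* follows from the assumed range */
  return total;
}
```

With this layout the entry point would be selected with --function sum_harness instead of --function sum.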
When running the previous example, you will have noted that CBMC unwinds the
for loop in the program. As CBMC performs Bounded Model
Checking, all loops have to have a finite upper run-time bound in order to
guarantee that all bugs are found. CBMC can optionally check that enough
unwinding is performed. As an example, consider the program binsearch.c:
int binsearch(int x) {
int a[16];
signed low=0, high=16;
while(low<high) {
signed middle=low+((high-low)>>1);
if(a[middle]<x)
high=middle;
else if(a[middle]>x)
low=middle+1;
else // a[middle]==x
return middle;
}
return -1;
}
If you run CBMC on this function, you will notice that the unwinding does not stop on its own. The built-in simplifier is not able to determine a run time bound for this loop. The unwinding bound has to be given as a command line argument:
cbmc binsearch.c --function binsearch --unwind 6 --bounds-check --unwinding-assertions
CBMC verifies that the array accesses are within the bounds; note
that this actually depends on the result of the right shift. In addition,
as CBMC is given the option
--unwinding-assertions
--property
.
CBMC can also be used for programs with unbounded loops. In this case, CBMC is used for bug hunting only; CBMC does not attempt to find all bugs. The following program (lock-example.c) is an example of a program with a user-specified property:
_Bool nondet_bool();
_Bool LOCK = 0;
_Bool lock() {
if(nondet_bool()) {
assert(!LOCK);
LOCK=1;
return 1; }
return 0;
}
void unlock() {
assert(LOCK);
LOCK=0;
}
int main() {
unsigned got_lock = 0;
int times;
while(times > 0) {
if(lock()) {
got_lock++;
/* critical section */
}
if(got_lock!=0)
unlock();
got_lock--;
times--;
} }
The while loop in the main function has no
(useful) run-time bound. Thus, a bound has to be set on the amount of
unwinding that CBMC performs. There are two ways to do so:
The --unwind command-line parameter can be used to limit
the number of times loops are unwound.
The --depth command-line parameter can be used to limit
the number of program steps to be processed.
Given the option --unwinding-assertions, CBMC checks whether
the argument to --unwind is large enough to cover all program
paths. If the argument is too small, CBMC will detect that not enough
unwinding is done and report that an unwinding assertion has failed.
Reconsider the example. For a loop unwinding bound of one, no bug is found.
But already for a bound of two, CBMC detects a trace that violates an
assertion. Without unwinding assertions, or when using the --depth
command line switch, CBMC does not prove the program correct, but it can
still help to find program bugs. The various command line options that CBMC
offers for loop unwinding are described in the section on
understanding loop unwinding.
Most C programs make use of functions provided by a library; instances are
functions from the standard ANSI-C library such as malloc
or
printf
. The verification of programs that use such functions
has two requirements:
Most C compilers come with header files for the ANSI-C library functions. We briefly discuss how to obtain/install these library files.
Linux systems that are able to compile software are usually equipped with the appropriate header files. Consult the documentation of your distribution on how to install the compiler and the header files. First try to compile some significant program before attempting to verify it.
On Microsoft Windows, CBMC is pre-configured to use the compiler that is part of Microsoft's Visual Studio. Microsoft's Visual Studio Community is fully featured and available for download for free from the Microsoft webpage. Visual Studio installs the usual set of header files together with the compiler. However, the Visual Studio compiler requires a large set of environment variables to function correctly. It is therefore required to run CBMC from the Visual Studio Command Prompt, which can be found in the menu Visual Studio Tools.
Note that in both cases, only header files are available. CBMC only
comes with a small set of definitions, which includes functions such as
malloc
. Detailed information about the built-in definitions is
here.
This section describes the command line interface of CBMC. Like a C
compiler, CBMC takes the names of the .c source files as arguments.
Additional options allow you to customize the behavior of CBMC. Use
cbmc --help
to get a full list of the available options.
Structured output can be obtained from CBMC using the option --xml-ui.
Any output from CBMC (e.g., counterexamples) will then use an XML
representation.
We also have a list of interesting applications of CBMC.