Tsmart 2.0 User Manual

Introduction

What's Tsmart

Tsmart is a static analyzer for C source code. Given a C program, Tsmart outputs a report which contains a list of potential defects and how they could be triggered in program execution. Currently the latest version of Tsmart is 2.0 released on Dec 16, 2017.

Static analysis offers static compile-time techniques for predicting safe and computable approximations to the set of values or behaviors arising dynamically at runtime when executing a program. The main applications of static analysis include allowing compilers to generate more optimized code and reducing the likelihood of malicious or unintended behavior of programs.

It is almost impossible to write a bug-free program. Although developers employ tests to validate the program functionality, there could still be a substantial number of potential defects which could only be manifested by corner inputs. Tsmart can be utilized to improve the quality of source code in the terms of reliability and security by finding bugs that reside deeply inside programs

The overall schematic overview of Tsmart is shown as follows.

Tsmart_Overall

The Workflow of Tsmart

To use Tsmart for static analysis, users should specify the program files to be analyzed, the types of defects to be checked, and the output location of analysis results. Tsmart analyzes the input source files fully automatically and then generates a report on potential bugs with the information including bug locations and execution paths for detected bugs.

Tsmart has 3 main working modes:

  1. Build and preprocess (i.e. Build-capture) a C project and then perform static analysis;
  2. Perform static analysis on a build-captured C project;
  3. Perform static analysis for a collection of source files.

Terms

# Term Definition
1 Simple Check Tsmart's working mode on which a collection of source files is checked against the specified types of defects.
2 Build-capture Build-capture replaces compilation commands in Makefile with the corresponding preprocessing commands and produces one or more analysis tasks, each of which consists of pre-processed source files and corresponds to a target in Makefile.
3 Analysis Task An analysis task is a module that consists of one or more preprocessed source files and corresponds to a target in Makefile.
4 Compound Check Tsmart's working mode on which a C project is checked against the specified types of defects.
5 Phase A phase is an indivisible step in static analysis. Common phases include program parsing, semantics analysis and results processing.
6 CWE The Common Weakness Enumeration (CWE) is a category system for software weaknesses and vulnerabilities. It is sustained by a community project with the goals of understanding flaws in software and creating automated tools that can be used to identify, fix, and prevent those flaws. The project is sponsored by the National Cybersecurity FFRDC, which is owned by The MITRE Corporation, with support from US-CERT and the National Cyber Security Division of the U.S. Department of Homeland Security. The official homepage of CWE is https://cwe.mitre.org
7 Data-flow Analysis Data-flow analysis is a technique for gathering information about the possible set of values calculated at various points in a computer program.
8 Abstract Interpretation Abstract interpretation is a theory of sound approximation of the semantics of computer programs, based on monotonic functions over ordered sets, especially lattices. It can be viewed as a partial execution of a computer program which gains information about its semantics.
9 Counterexample In term of program analysis, a counterexample is a concrete execution that triggers a certain defect.

The Capability of Tsmart

Effectiveness
Tsmart achieves 0.6% false-positive rate and 0.6% false-negative rate on 25 CWE categories in Juliet Test Suite 1.2.
Scalability
Tsmart finishes a completed analysis on Wireshark 1.2.2 (~2.97MLOC) within 1.4 hours. The underlying hardware platform is a server with Intel(R) Xeon(R) E5-2603v3 CPU and 64GB memory.
Supported Defects
Tsmart currently supports 26 CWEs which generally cover the most fundamental and important program defects such as integer overflow, buffer overflow, null pointer dereference, resource leak and unreachable code. Details are listed as follows. Note that Tsmart 2.0 is based on CWE version 3.1.
# Internal Name CWE CWE Name Description
1 INTEGER_OVERFLOW 190 Integer Overflow or Wraparound The software performs a calculation that can produce an integer overflow or wraparound, when the logic assumes that the resulting value will always be larger than the original value. This can introduce other weaknesses when the calculation is used for resource management or execution control.
2 INTEGER_UNDERFLOW 191 Integer Underflow (Wrap or Wraparound) The product subtracts one value from another, such that the result is less than the minimum allowable integer value, which produces a value that is not equal to the correct result.
3 UNEXPECTED_SIGN_EXT 194 Unexpected Sign Extension The software performs an operation on a number that causes it to be sign extended when it is transformed into a larger data type. When the original number is negative, this can produce unexpected values that lead to resultant weaknesses.
4 SIGNED_TO_UNSIGNED 195 Signed to Unsigned Conversion Error The software uses a signed primitive and performs a cast to an unsigned primitive, which can produce an unexpected value if the value of the signed primitive can not be represented using an unsigned primitive.
5 UNSIGNED_TO_SIGNED 196 Unsigned to Signed Conversion Error The software uses an unsigned primitive and performs a cast to a signed primitive, which can produce an unexpected value if the value of the unsigned primitive can not be represented using a signed primitive.
6 LOSSY_TRUNCATION 197 Numeric Truncation Error Truncation errors occur when a primitive is cast to a primitive of a smaller size and data is lost in the conversion.
7 DIV_ZERO 369 Divide By Zero The product divides a value by zero.
8 DEAD_CODE 561 Dead Code The software contains dead code, which can never be executed.
9 ALWAYS_TRUE 571 Expression is Always True The software contains an expression that will always evaluate to true.
10 ALWAYS_FALSE 570 Expression is Always False The software contains an expression that will always evaluate to false.
11 UNUSED_VARIABLE 563 Assignment to Variable without Use The variable's value is assigned but never used, making it a dead store.
12 INVALID_FREE 590 Free of Memory not on the Heap The application calls free() on a pointer to memory that was not allocated using associated heap allocation functions such as malloc(), calloc(), or realloc().
13 415 Double Free The product calls free() twice on the same memory address, potentially leading to modification of unexpected memory locations.
14 761 Free of Pointer not at Start of Buffer The application calls free() on a pointer to a memory resource that was allocated on the heap, but the pointer is not at the start of the buffer.
15 690 Unchecked Return Value to NULL Pointer Dereference The product does not check for an error after calling a function that can return with a NULL pointer if the function fails, which leads to a resultant NULL pointer dereference.
16 404 Improper Resource Shutdown or Release The program does not release or incorrectly releases a resource before it is made available for re-use.
17 INVALID_READ 476 NULL Pointer Dereference A NULL pointer dereference occurs when the application dereferences a pointer that it expects to be valid, but is NULL, typically causing a crash or exit.
18 126 Buffer Over-read The software reads from a buffer using buffer access mechanisms such as indexes or pointers that reference memory locations after the targeted buffer.
19 127 Buffer Under-read The software reads from a buffer using buffer access mechanisms such as indexes or pointers that reference memory locations prior to the targeted buffer.
20 416 Use After Free Referencing memory after it has been freed can cause a program to crash, use unexpected values, or execute code.
21 INVALID_WRITE 121 Stack-based Buffer Overflow A stack-based buffer overflow condition is a condition where the buffer being overwritten is allocated on the stack (i.e., is a local variable or, rarely, a parameter to a function).
22 122 Heap-based Buffer Overflow A heap overflow condition is a buffer overflow, where the buffer that can be overwritten is allocated in the heap portion of memory, generally meaning that the buffer was allocated using a routine such as malloc().
23 124 Buffer Underwrite ('Buffer Underflow') The software writes to a buffer using an index or pointer that references a memory location prior to the beginning of the buffer.
24 MEMORY_LEAK 401 Improper Release of Memory Before Removing Last Reference ('Memory Leak') The software does not sufficiently track and release allocated memory after it has been used, which slowly consumes remaining memory.
25 775 Missing Release of File Descriptor or Handle after Effective Lifetime The software does not release a file descriptor or handle after its effective lifetime has ended, i.e., after the file descriptor/handle is no longer needed.
26 STACK_ADDRESS_RETURN 562 Return of Stack Variable Address A function returns the address of a stack variable, which will cause unintended program behavior, typically in the form of a crash.

CWE Compatibility

Tsmart is CWE-compatible, and the details are demonstrated in the following table.

Requirement Description Implementation
CWE Searchable users may search security elements using CWE identifiers Bug report can be filtered by CWE identifiers (ref. Review of Analysis Results)
CWE Output security elements presented to users includes, or allows users to obtain, associated CWE identifiers Each defect entry in bug report has a CWE field which records the possibly corresponding CWE identifiers (ref. Review of Analysis Results)
Mapping Accuracy security elements accurately link to the appropriate CWE identifiers The Capability of Tsmart gives the relation from defect types in Tsmart to CWE identifiers
CWE Documentation capability's documentation describes CWE, CWE compatibility, and how CWE-related functionality in the capability is used Terms introduces the concept of CWE. This section introduces CWE-compatibility and how CWE-related functionality is used in Tsmart.
CWE Coverage for CWE-Compatibility and CWE-Effectiveness, the capability's documentation explicitly lists the CWE-IDs that the capability claims coverage and effectiveness against locating in software The Capability of Tsmart gives the supported CWE identifiers by Tsmart.

For more information on CWE-compatibility, please visit https://cwe.mitre.org/compatible/index.html

Support

A major release of Tsmart is scheduled in December every year. The improvements in a major release mainly include: (1) resolving bugs in the previous version; (2) adding supports for some defect types uncovered in the previous version; (3) updating CWE capability mapping in terms of tool and documentation according to the update of CWE.

Users can mail to tsmart.project@gmail.com for an evaluation copy or a technical support.

Preparation

System Requirements

Minimum Recommended
Operating System 32-bit/64-bit Linux 64-bit Ubuntu 16.04LTS or newer
CPU 3GHz Intel (R) Core2 Duo Intel Core i5(2.66GHz) or better
Memory 2GB for 32-bit system, 4GB for 64-bit system more than 8GB
Disk Space more than 2GB more than 2GB

Runtime Dependencies

Software Version Availability
Java 8 or higher http://openjdk.java.net/install/
Python 2 2.7 or higher https://www.python.org/downloads/
Requests 2.8.14 or higher install via pip
Python 3 3.5 or higher https://www.python.org/downloads/
Apache Ant 1.9 or higher https://ant.apache.org/bindownload.cgi
Z3 4.5.0 or higher https://github.com/Z3Prover/z3

Availability

Please mail to tsmart.project@gmail.com for a demo version of Tsmart.

Components

The components of Tsmart are listed as follows.

Component Directory File or Folder Functionality
TsmartBD tsmart-v2/ lib/ Dependencies for Tsmart
TsmartBD tsmart-v2/ config/ Preset configurations for Tsmart
TsmartBD tsmart-v2/ TsmartBuild.jar Build-capture executable
TsmartBD tsmart-v2/ TsmartAnalyze.jar The main entry of Tsmart
BugVisualizer bug_visualizer/ server.py The server for management of analysis results
BugVisualizer bug_visualizer/ upload_sh.py Upload the analysis results and relevant source code to the server
BugVisualizer bug_visualizer/ bug_history/ The history of analysis results
BugVisualizer bug_visualizer/ visualizer.html Webpage template for BugVisualizer

Usage

Installation

Extract the compression archive of Tsmart to somewhere. The target path should only contain A-Z, a-z, 0-9 or _ to avoid unpredictable issues.

Command Line Overview

The main entry of Tsmart is TsmartAnalyze.jar. Users can specify the working mode, the input source code and the output location of analysis results via command line arguments. The supported arguments are listed as follows.

Argument Usage Explanation
-build -build=[BUILD_PATH] make Perform build-capture for the project under [BUILD_PATH] with make
-captured -captured=[CAPT_PATH] Specify the build-captured project under [CAPT_PATH] for static analysis
-manual -manual [SRC_PATH] Specify a collection of source files ([SRC_PATH] is the list of source paths) for static analysis
-cwe -cwe=[CWE_LIST] Specify the target defects via CWE identifiers [CWE_LIST] (Note: SHOULD NOT use -config when -cwe is used)
-task -task=[TASK_NO_LIST] Specify the tasks to be analyzed by task numbers in [TASK_NO_LIST]
-taskName -taskName=[TASK_NAME_LIST] Specify the tasks to be analyzed by task names in [TASK_NAME_LIST]
-output -output=[OUTPUT_PATH] Specify [OUTPUT_PATH] as the location of output analysis results
-config -config=[CONFIG_PATH] Specify [CONFIG_PATH] as the configuration (Note: SHOULD NOT use -config when -cwe is used)
-list -list Enumerate available analysis tasks for a compound check
-version -version View the version of Tsmart

Usage Details

Simple Check

Perform static analysis on a collection of source files. The input source files could contain header files which are processed by preprocessor in Tsmart. The command for simple check is as follows.

> java -jar TsmartAnalyze.jar -manual [SRC_PATH] -cwe=[CWE_LIST] (-output=[OUTPUT_PATH])

-manual is required while -output is optional. If -output is not specified, the results are output to output\ under the root location of TsmartBD module. Furthermore, users can specify multiple CWE identifiers split by commas in the -cwe argument. For example, -cwe=190,369 makes Tsmart to check the defects of CWE 190 (integer overflow) and CWE 369 (division-by-zero). In fact, -cwe argument specifies preset configurations associated with CWE identifiers. If it is necessary to tweak analysis algorithm in a more fine-grained manner, users can specify a configuration file via -config argument.

Compound Check

Perform static analysis on a C project based on Makefile compilation system. In some cases Makefile needs to be generated first, typically by invoking the following command in the root directory of the project.

> ./configure

If the input C project has not been pre-processed, build-capture is necessary in prior to analysis. The output of build-capture is stored under .process_makefile/ in the root directory of the input project. The command for compound check with build-capture is as follows.

> java -jar TsmartAnalyze.jar -build=[BUILD_PATH] make -cwe=[CWE_LIST] -task=[TASK_NO_LIST] (-output=[OUTPUT_PATH])

The required argument -build=[BUILD_PATH] make denotes that using make to handle the Makefile in the project under [BUILD_PATH]. The -task argument is to specify the analysis task(s) to be analyzed and is optional. If the -task argument is absent, all the analysis tasks in the build-captured project are to be analyzed. Furthermore, the -task argument can be replaced with the -taskName argument which specifies the names of analysis tasks to be analyzed.

Note that, performing build-capture on a build-captured project would arise runtime errors. Let path/to/project be the path of the input project, then the following commands restore a build-captured project to its original status.

> cd path/to/project > make clean > rm -rf .process_makefile/

It is possible to build-capture a project without performing static analysis by invoking TsmartBuild.jar. Let path/to/project be the path of the input project, build-capture can be performed by invoking the following commands in order:

> cd path/to/project > ./configure > java -jar TsmartBuild.jar -shell=/bin/bash make

Next, invoke the following command to perform static analysis on a build-captured project:

> java -jar TsmartAnalyze.jar -captured=[CAPT_PATH] -cwe=[CWE_LIST] -task=[TASK_NO_LIST] (-output=[OUTPUT_PATH])

where -captured is the path of the build-captured project and should be in the form of path/to/project/.process_makefile/.

Review of Analysis Results

To review analysis results, the server for management of analysis results should be started first. This is because development members can examine the potential bugs individually by sharing analysis results. On the server side, enter the bug_visualizer directory and execute the following command:

python server.py ([port_number])

where port_number denotes the port number. If port_number is absent, port 8080 is employed by default.

Let [server_ip] be the IP of the server side. Then execute the following command on the machine on which the static analysis is performed to upload analysis results along with the source files to the server.

python3 upload_sh.py --name=[task_name] --build=[build_path] --captured=[capture_path] --manual=[src_list] --xml=[xml_file] --server=[server_ip] --port=[port_number]

The requirement argument --name specifies a name for current analysis. Only one (and at least one) argument in --build, -captured and --manual is used and should be consistent with the corresponding argument in invoking TsmartAnalyze.jar. For example, if we use --captured [CAPT_PATH] in invoking TsmartAnalyze.jar, then --captured [CAPT_PATH] should be used in invoking upload_sh.py here. The argument --xml specifies the location of analysis result .xml file. If -output is unspecified in invoking TsmartAnalyze.jar, the location of analysis results is at tsmart-v2/output/result.xml; otherwise result.xml can be retrieved under [OUTPUT_PATH]. --server and --port specify the IP of server side and its port, respectively.

If upload succeeds, upload_sh.py prints 上传成功 or finished.

To view the analysis results, please visit http://[server_ip]:[port_number]/visualizer.html on a web browser. Note that the webpage of bug visualizer is tested on the latest version of Firefox and Chrome. Compatibility on other web browsers is not guaranteed. The webpage of bug visualizer is as follows. Bug Visualizer

  1. Select the name of analysis the results of which need to be viewed
  2. A file explorer
  3. A code viewer
  4. A search box. Users can enter a CWE identifier or a internal defect name to filter the detected potential bugs.
  5. A clickable CWE identifier. Once it is clicked, the panel under the bug list (No. 7) shows the details of the certain CWE identifier along with a link to its official definition webpage at https://cwe.mitre.org
  6. A bug list. A bug entry contains information on associated CWE identifiers, internal name of the defect, severity of the defect and the confidence of certain bug report. Once a bug entry is clicked, counter trace panel (No. 8) shows the execution trace that trigger the certain bug.
  7. A CWE detail panel which shows the description of selected CWE identifier along with a link to the official definition webpage at https://cwe.mitre.org
  8. A counter trace panel. Once a concrete execution step is clicked, code viewer shows the contents of the relevant source file and goes to the code line corresponding to the selected execution step.

Guide for Configuration

To adjust the analysis algorithm employed by Tsmart in a fine-grained manner, the underlying configuration files should be tweaked. Overall, Tsmart employs a 3-level configuration system.

  1. Top Configuration specifies the referred phase configuration file, the scheduling type of phases, the type of error report, the function adapter class, library function aliases and stop functions.
  2. Phase Configuration specifies (1) which phases are to be used, (2) referred algorithm configuration file for each phase, (3) the dependencies over the employed phases.
  3. Algorithm Configuration specifies options for concrete static analysis algorithm.

Top Configuration

The following snippet shows an example of top configuration file:

phase.manager.config = top.range.config
phase.manager.executionType = SEQUENTIAL

statistics.kind = HIERARCHICAL
error.export.mode = XML

function.adapters = RangeFunctionAdapter
function.RangeFunctionAdapter = map/alias.map
function.stopFunctions = exit, abort, error, error_at_line

phase.manager.config specifies the referred phase configuration file. phase.manager.executionType specifies the scheduling type for phases, and SEQUENTIAL means that phases are executed sequentially. In general, statistics.kind is set as HIERARCHICAL in order to support multi-entry analysis. error.export.mode should be set as XML in order to output .xml files correctly for displaying analysis results in bug visualizer. function.adapter specifies the library function support module while multiple modules are split by commas. function.RangeFunctionAdapter specifies the function name aliases (e.g. assume that library functions f1 and f2 have the identical operational semantics, then we only need to set f2 as an alias of f1 if the semantics of f1 is implemented in an employed library function support module) for the library function support module named RangeFunctionAdapter. function.stopFunctions specifies a list of stop functions in use. A stop function refers to a function that never returns.

Phase Configuration

The following snippet shows an example of a phase configuration file:

.SET
DIR = org.sosy_lab.cpachecker.core.phase;
.TES

.DEF cfa
.TYPE = $(DIR).CFACreatePhase
parser.dialect = GNUC
analysis.functionPointerCalls = false
analysis.aggressivelyRemoveGlobals = true
analysis.summaryEdges = true
analysis.machineModel = LINUX64
.FED

.DEF summary
.TYPE = $(DIR).SummaryComputationPhase
base = config/product/
summaries = access.properties, range.properties
summary.usedExternalSummary = ACCESS_SUMMARY, RANGE_SUMMARY
summary.usedInternalSummary = ACCESS_LOOP_INTERNAL, RANGE_LOOP_INTERNAL
.FED

.DEF basic
.TYPE = $(DIR).BasicAnalysisPhase
phase.analysis.type = MULTI_ENTRY
phase.me.analysis = rangeAnalysis.properties
phase.me.stopAfterError = false
checker.weaknessForCheck = INTEGER_OVERFLOW, INTEGER_CONVERSION, DIVIDED_BY_ZERO
.FED

.RUN
summary : cfa;
basic : summary;

A phase configuration has three main sections. .SET section specifies aliases in use. Users can create more aliases analogously. The syntax for the usage of aliases is similar with that in Makefile. Each .DEF section corresponds to an analysis phase. .TYPE field specifies a name for the corresponding analysis phase and is required. The configurations inside a .DEF section are valid only for the enclosing phase. .RUN section describes the dependencies over the employed analysis phases. For example, summary: cfa denotes that the phase named summary depends on the results by the phase named cfa. The example phase configuration contains three phases: cfa (control-flow construction), summary (summary computation) and basic (concrete static analysis).

The control-flow construction phase parses the input program. parser.dialect specifies the supported C extension. analysis.functionPointerCalls specifies whether function pointers are parsed and modelled in the control-flow graph at compile-time. analysis.aggressivelyRemoveGlobals specifies whether function definitions and global declarations unreachable from the current analysis entries are pruned in constructing control-flow graph. analysis.machineModel specifies the employed data model. LINUX64 corresponds to LP64 where the byte lengths of int, long and pointer are 4, 8 and 8, respectively.

In the summary computation phase, base is the root directory of configuration files for summary algorithms. summaries specifies the relative paths of summary configurations to be used. summary.usedExternalSummary and summary.usedInternalSummary specify the name of summaries to be used.

In the concrete static analysis phase, phase.analysis.type specifies the type of analysis. In the most cases, the type of analysis is MULTI_ENTRY which denotes the analysis algorithm that starts from specified entry function(s) rather than the execution entry main(). phase.me.analysis specifies the configuration for the underlying concrete analysis algorithms. phase.me.stopAfterError specifies whether the analysis terminates once a potential defect is found. It is usually set to true for program verification tasks. checker.weaknessForCheck specifies the target defects (in their internal names) against which the input source code is checked. The supported defects can be referred to in The Capability of Tsmart. It is noteworthy that whether Tsmart checks the input code against the given target defect depends on the employed concrete analysis algorithm. For example, if rangeAnalysis.properties does not contain an analysis algorithm relevant to memory safety, Tsmart cannot find memory leak defects even if MEMORY_LEAK is specified in checker.weaknessForCheck.

Analysis Configuration

The following snippet shows the contents of rangeAnalysis.properties:

analysis.algorithm.bounded = true
analysis.reachedSet.hierarchical = true
analysis.useConfidence = true

analysis.me.static.strategy = CHEAP_COVER

cpa = cpa.arg.ARGCPA
ARGCPA.cpa = cpa.composite.CompositeCPA
CompositeCPA.cpas = cpa.location.LocationCPA, cpa.boundary.BoundaryCPA, cpa.range.RangeCPA, cpa.pointer2.PointerCPA

cpa.boundary.callDepth = 2
cpa.boundary.loopIteration = 1
cpa.boundary.loopDepth = 3

analysis.traversal.controlled.maxWaitingSize = 16
analysis.traversal.useDominationOrder = true
trace.upperBound = 1000

analysis.traversal.order = BFS

cfa.useMultiEdges = false

solver.solver = Z3

The first two lines is necessary for multi-entry analysis. analysis.useConfidence is an experimental feature that prioritizes the detected bugs by their confidence scores. analysis.me.static.strategy specifies how the analysis entries are chosen, where CHEAP_COVER denotes a heuristic approach that derives analysis entries based on the structure of the input source code. Other available strategies for choosing analysis entries include MAIN (use main() as the only analysis entry), FUNCTION_HEAD (function-wise analysis). The following 3 lines specify the employed concrete analysis algorithms. RangeCPA is an analysis based on range abstract domain. BoundaryCPA is an analysis that captures information of call stack and loop stack in program execution. PointerCPA derives points-to information. The next 3 lines are the configurations relevant to boundary analysis where callDepth, loopIteration and loopDepth denote the maximum tracked depth of call stack, maximum tracked iterations of a loop and maximum tracked nesting loops. When the threshold is reached, the summary of relevant function or loop is applied. The remaining options serve as the fine-grained tweaks on precision-efficiency trade-off. For example, trace.upperBound specifies the upper bound of the branches to be tracked under a branching point in control-flow graph. At the last line, solver.solver specifies the underlying constraint solver.

Details on more options used in configuration files can be found at doc/ConfigurationOptions.txt under the root directory of TsmartBD.

FAQ

  1. Why some errors are encountered in viewing the analysis results, even when the analysis runs normally.
  1. I cannot find analysis results.
  1. TsmartAnalyze.jar prints the error message Unrecognized CWE Number