Tsmart is a static analyzer for C source code. Given a C program, Tsmart outputs a report which contains a list of potential defects and how they could be triggered in program execution. Currently the latest version of Tsmart is 2.0 released on Dec 16, 2017.
Static analysis offers static compile-time techniques for predicting safe and computable approximations to the set of values or behaviors arising dynamically at runtime when executing a program. The main applications of static analysis include allowing compilers to generate more optimized code and reducing the likelihood of malicious or unintended behavior of programs.
It is almost impossible to write a bug-free program. Although developers employ tests to validate the program functionality, there could still be a substantial number of potential defects which could only be manifested by corner inputs. Tsmart can be utilized to improve the quality of source code in the terms of reliability and security by finding bugs that reside deeply inside programs
The overall schematic overview of Tsmart is shown as follows.
To use Tsmart for static analysis, users should specify the program files to be analyzed, the types of defects to be checked, and the output location of analysis results. Tsmart analyzes the input source files fully automatically and then generates a report on potential bugs with the information including bug locations and execution paths for detected bugs.
Tsmart has 3 main working modes:
|1||Simple Check||Tsmart's working mode on which a collection of source files is checked against the specified types of defects.|
|2||Build-capture||Build-capture replaces compilation commands in
|3||Analysis Task||An analysis task is a module that consists of one or more preprocessed source files and corresponds to a target in
|4||Compound Check||Tsmart's working mode on which a C project is checked against the specified types of defects.|
|5||Phase||A phase is an indivisible step in static analysis. Common phases include program parsing, semantics analysis and results processing.|
|6||CWE||The Common Weakness Enumeration (CWE) is a category system for software weaknesses and vulnerabilities. It is sustained by a community project with the goals of understanding flaws in software and creating automated tools that can be used to identify, fix, and prevent those flaws. The project is sponsored by the National Cybersecurity FFRDC, which is owned by The MITRE Corporation, with support from US-CERT and the National Cyber Security Division of the U.S. Department of Homeland Security. The official homepage of CWE is https://cwe.mitre.org|
|7||Data-flow Analysis||Data-flow analysis is a technique for gathering information about the possible set of values calculated at various points in a computer program.|
|8||Abstract Interpretation||Abstract interpretation is a theory of sound approximation of the semantics of computer programs, based on monotonic functions over ordered sets, especially lattices. It can be viewed as a partial execution of a computer program which gains information about its semantics.|
|9||Counterexample||In term of program analysis, a counterexample is a concrete execution that triggers a certain defect.|
|#||Internal Name||CWE||CWE Name||Description|
||190||Integer Overflow or Wraparound||The software performs a calculation that can produce an integer overflow or wraparound, when the logic assumes that the resulting value will always be larger than the original value. This can introduce other weaknesses when the calculation is used for resource management or execution control.|
||191||Integer Underflow (Wrap or Wraparound)||The product subtracts one value from another, such that the result is less than the minimum allowable integer value, which produces a value that is not equal to the correct result.|
||194||Unexpected Sign Extension||The software performs an operation on a number that causes it to be sign extended when it is transformed into a larger data type. When the original number is negative, this can produce unexpected values that lead to resultant weaknesses.|
||195||Signed to Unsigned Conversion Error||The software uses a signed primitive and performs a cast to an unsigned primitive, which can produce an unexpected value if the value of the signed primitive can not be represented using an unsigned primitive.|
||196||Unsigned to Signed Conversion Error||The software uses an unsigned primitive and performs a cast to a signed primitive, which can produce an unexpected value if the value of the unsigned primitive can not be represented using a signed primitive.|
||197||Numeric Truncation Error||Truncation errors occur when a primitive is cast to a primitive of a smaller size and data is lost in the conversion.|
||369||Divide By Zero||The product divides a value by zero.|
||561||Dead Code||The software contains dead code, which can never be executed.|
||571||Expression is Always True||The software contains an expression that will always evaluate to true.|
||570||Expression is Always False||The software contains an expression that will always evaluate to false.|
||563||Assignment to Variable without Use||The variable's value is assigned but never used, making it a dead store.|
||590||Free of Memory not on the Heap||The application calls
|13||415||Double Free||The product calls
|14||761||Free of Pointer not at Start of Buffer||The application calls
|15||690||Unchecked Return Value to NULL Pointer Dereference||The product does not check for an error after calling a function that can return with a NULL pointer if the function fails, which leads to a resultant NULL pointer dereference.|
|16||404||Improper Resource Shutdown or Release||The program does not release or incorrectly releases a resource before it is made available for re-use.|
||476||NULL Pointer Dereference||A NULL pointer dereference occurs when the application dereferences a pointer that it expects to be valid, but is NULL, typically causing a crash or exit.|
|18||126||Buffer Over-read||The software reads from a buffer using buffer access mechanisms such as indexes or pointers that reference memory locations after the targeted buffer.|
|19||127||Buffer Under-read||The software reads from a buffer using buffer access mechanisms such as indexes or pointers that reference memory locations prior to the targeted buffer.|
|20||416||Use After Free||Referencing memory after it has been freed can cause a program to crash, use unexpected values, or execute code.|
||121||Stack-based Buffer Overflow||A stack-based buffer overflow condition is a condition where the buffer being overwritten is allocated on the stack (i.e., is a local variable or, rarely, a parameter to a function).|
|22||122||Heap-based Buffer Overflow||A heap overflow condition is a buffer overflow, where the buffer that can be overwritten is allocated in the heap portion of memory, generally meaning that the buffer was allocated using a routine such as
|23||124||Buffer Underwrite ('Buffer Underflow')||The software writes to a buffer using an index or pointer that references a memory location prior to the beginning of the buffer.|
||401||Improper Release of Memory Before Removing Last Reference ('Memory Leak')||The software does not sufficiently track and release allocated memory after it has been used, which slowly consumes remaining memory.|
|25||775||Missing Release of File Descriptor or Handle after Effective Lifetime||The software does not release a file descriptor or handle after its effective lifetime has ended, i.e., after the file descriptor/handle is no longer needed.|
||562||Return of Stack Variable Address||A function returns the address of a stack variable, which will cause unintended program behavior, typically in the form of a crash.|
Tsmart is CWE-compatible, and the details are demonstrated in the following table.
|CWE Searchable||users may search security elements using CWE identifiers||Bug report can be filtered by CWE identifiers (ref. Review of Analysis Results)|
|CWE Output||security elements presented to users includes, or allows users to obtain, associated CWE identifiers||Each defect entry in bug report has a
|Mapping Accuracy||security elements accurately link to the appropriate CWE identifiers||The Capability of Tsmart gives the relation from defect types in Tsmart to CWE identifiers|
|CWE Documentation||capability's documentation describes CWE, CWE compatibility, and how CWE-related functionality in the capability is used||Terms introduces the concept of CWE. This section introduces CWE-compatibility and how CWE-related functionality is used in Tsmart.|
|CWE Coverage||for CWE-Compatibility and CWE-Effectiveness, the capability's documentation explicitly lists the CWE-IDs that the capability claims coverage and effectiveness against locating in software||The Capability of Tsmart gives the supported CWE identifiers by Tsmart.|
For more information on CWE-compatibility, please visit https://cwe.mitre.org/compatible/index.html
A major release of Tsmart is scheduled in December every year. The improvements in a major release mainly include: (1) resolving bugs in the previous version; (2) adding supports for some defect types uncovered in the previous version; (3) updating CWE capability mapping in terms of tool and documentation according to the update of CWE.
Users can mail to email@example.com for an evaluation copy or a technical support.
|Operating System||32-bit/64-bit Linux||64-bit Ubuntu 16.04LTS or newer|
|CPU||3GHz Intel (R) Core2 Duo||Intel Core i5(2.66GHz) or better|
|Memory||2GB for 32-bit system, 4GB for 64-bit system||more than 8GB|
|Disk Space||more than 2GB||more than 2GB|
|Java||8 or higher||http://openjdk.java.net/install/|
|Python 2||2.7 or higher||https://www.python.org/downloads/|
|Requests||2.8.14 or higher||install via pip|
|Python 3||3.5 or higher||https://www.python.org/downloads/|
|Apache Ant||1.9 or higher||https://ant.apache.org/bindownload.cgi|
|Z3||4.5.0 or higher||https://github.com/Z3Prover/z3|
Please mail to firstname.lastname@example.org for a demo version of Tsmart.
The components of Tsmart are listed as follows.
|Component||Directory||File or Folder||Functionality|
||Dependencies for Tsmart|
||Preset configurations for Tsmart|
||The main entry of Tsmart|
||The server for management of analysis results|
||Upload the analysis results and relevant source code to the server|
||The history of analysis results|
||Webpage template for BugVisualizer|
Extract the compression archive of Tsmart to somewhere. The target path should only contain A-Z, a-z, 0-9 or _ to avoid unpredictable issues.
The main entry of Tsmart is
TsmartAnalyze.jar. Users can specify the working mode, the input source code and the output location of analysis results via command line arguments. The supported arguments are listed as follows.
||Perform build-capture for the project under
||Specify the build-captured project under
||Specify a collection of source files (
||Specify the target defects via CWE identifiers
||Specify the tasks to be analyzed by task numbers in
||Specify the tasks to be analyzed by task names in
||Enumerate available analysis tasks for a compound check|
||View the version of Tsmart|
Perform static analysis on a collection of source files. The input source files could contain header files which are processed by preprocessor in Tsmart. The command for simple check is as follows.
> java -jar TsmartAnalyze.jar -manual [SRC_PATH] -cwe=[CWE_LIST] (-output=[OUTPUT_PATH])
-manual is required while
-output is optional. If
-output is not specified, the results are output to
output\ under the root location of TsmartBD module. Furthermore, users can specify multiple CWE identifiers split by commas in the
-cwe argument. For example,
-cwe=190,369 makes Tsmart to check the defects of CWE 190 (integer overflow) and CWE 369 (division-by-zero). In fact,
-cwe argument specifies preset configurations associated with CWE identifiers. If it is necessary to tweak analysis algorithm in a more fine-grained manner, users can specify a configuration file via
Perform static analysis on a C project based on
Makefile compilation system. In some cases
Makefile needs to be generated first, typically by invoking the following command in the root directory of the project.
If the input C project has not been pre-processed, build-capture is necessary in prior to analysis. The output of build-capture is stored under
.process_makefile/ in the root directory of the input project. The command for compound check with build-capture is as follows.
> java -jar TsmartAnalyze.jar -build=[BUILD_PATH] make -cwe=[CWE_LIST] -task=[TASK_NO_LIST] (-output=[OUTPUT_PATH])
The required argument
-build=[BUILD_PATH] make denotes that using
make to handle the
Makefile in the project under
-task argument is to specify the analysis task(s) to be analyzed and is optional. If the
-task argument is absent, all the analysis tasks in the build-captured project are to be analyzed. Furthermore, the
-task argument can be replaced with the
-taskName argument which specifies the names of analysis tasks to be analyzed.
Note that, performing build-capture on a build-captured project would arise runtime errors. Let
path/to/project be the path of the input project, then the following commands restore a build-captured project to its original status.
> cd path/to/project > make clean > rm -rf .process_makefile/
It is possible to build-capture a project without performing static analysis by invoking
path/to/project be the path of the input project, build-capture can be performed by invoking the following commands in order:
> cd path/to/project > ./configure > java -jar TsmartBuild.jar -shell=/bin/bash make
Next, invoke the following command to perform static analysis on a build-captured project:
> java -jar TsmartAnalyze.jar -captured=[CAPT_PATH] -cwe=[CWE_LIST] -task=[TASK_NO_LIST] (-output=[OUTPUT_PATH])
-captured is the path of the build-captured project and should be in the form of
To review analysis results, the server for management of analysis results should be started first. This is because development members can examine the potential bugs individually by sharing analysis results. On the server side, enter the
bug_visualizer directory and execute the following command:
python server.py ([port_number])
port_number denotes the port number. If
port_number is absent, port
8080 is employed by default.
[server_ip] be the IP of the server side. Then execute the following command on the machine on which the static analysis is performed to upload analysis results along with the source files to the server.
python3 upload_sh.py --name=[task_name] --build=[build_path] --captured=[capture_path] --manual=[src_list] --xml=[xml_file] --server=[server_ip] --port=[port_number]
The requirement argument
--name specifies a name for current analysis. Only one (and at least one) argument in
--manual is used and should be consistent with the corresponding argument in invoking
TsmartAnalyze.jar. For example, if we use
--captured [CAPT_PATH] in invoking
--captured [CAPT_PATH] should be used in invoking
upload_sh.py here. The argument
--xml specifies the location of analysis result
.xml file. If
-output is unspecified in invoking
TsmartAnalyze.jar, the location of analysis results is at
result.xml can be retrieved under
--port specify the IP of server side and its port, respectively.
If upload succeeds,
To view the analysis results, please visit
http://[server_ip]:[port_number]/visualizer.html on a web browser. Note that the webpage of bug visualizer is tested on the latest version of Firefox and Chrome. Compatibility on other web browsers is not guaranteed. The webpage of bug visualizer is as follows.
To adjust the analysis algorithm employed by Tsmart in a fine-grained manner, the underlying configuration files should be tweaked. Overall, Tsmart employs a 3-level configuration system.
The following snippet shows an example of top configuration file:
phase.manager.config = top.range.config phase.manager.executionType = SEQUENTIAL statistics.kind = HIERARCHICAL error.export.mode = XML function.adapters = RangeFunctionAdapter function.RangeFunctionAdapter = map/alias.map function.stopFunctions = exit, abort, error, error_at_line
phase.manager.config specifies the referred phase configuration file.
phase.manager.executionType specifies the scheduling type for phases, and
SEQUENTIAL means that phases are executed sequentially. In general,
statistics.kind is set as
HIERARCHICAL in order to support multi-entry analysis.
error.export.mode should be set as
XML in order to output
.xml files correctly for displaying analysis results in bug visualizer.
function.adapter specifies the library function support module while multiple modules are split by commas.
function.RangeFunctionAdapter specifies the function name aliases (e.g. assume that library functions
f2 have the identical operational semantics, then we only need to set
f2 as an alias of
f1 if the semantics of
f1 is implemented in an employed library function support module) for the library function support module named
function.stopFunctions specifies a list of stop functions in use. A stop function refers to a function that never returns.
The following snippet shows an example of a phase configuration file:
.SET DIR = org.sosy_lab.cpachecker.core.phase; .TES .DEF cfa .TYPE = $(DIR).CFACreatePhase parser.dialect = GNUC analysis.functionPointerCalls = false analysis.aggressivelyRemoveGlobals = true analysis.summaryEdges = true analysis.machineModel = LINUX64 .FED .DEF summary .TYPE = $(DIR).SummaryComputationPhase base = config/product/ summaries = access.properties, range.properties summary.usedExternalSummary = ACCESS_SUMMARY, RANGE_SUMMARY summary.usedInternalSummary = ACCESS_LOOP_INTERNAL, RANGE_LOOP_INTERNAL .FED .DEF basic .TYPE = $(DIR).BasicAnalysisPhase phase.analysis.type = MULTI_ENTRY phase.me.analysis = rangeAnalysis.properties phase.me.stopAfterError = false checker.weaknessForCheck = INTEGER_OVERFLOW, INTEGER_CONVERSION, DIVIDED_BY_ZERO .FED .RUN summary : cfa; basic : summary;
A phase configuration has three main sections.
.SET section specifies aliases in use. Users can create more aliases analogously. The syntax for the usage of aliases is similar with that in
.DEF section corresponds to an analysis phase.
.TYPE field specifies a name for the corresponding analysis phase and is required. The configurations inside a
.DEF section are valid only for the enclosing phase.
.RUN section describes the dependencies over the employed analysis phases. For example,
summary: cfa denotes that the phase named
summary depends on the results by the phase named
cfa. The example phase configuration contains three phases:
cfa (control-flow construction),
summary (summary computation) and
basic (concrete static analysis).
The control-flow construction phase parses the input program.
parser.dialect specifies the supported C extension.
analysis.functionPointerCalls specifies whether function pointers are parsed and modelled in the control-flow graph at compile-time.
analysis.aggressivelyRemoveGlobals specifies whether function definitions and global declarations unreachable from the current analysis entries are pruned in constructing control-flow graph.
analysis.machineModel specifies the employed data model.
LINUX64 corresponds to LP64 where the byte lengths of
long and pointer are 4, 8 and 8, respectively.
In the summary computation phase,
base is the root directory of configuration files for summary algorithms.
summaries specifies the relative paths of summary configurations to be used.
summary.usedInternalSummary specify the name of summaries to be used.
In the concrete static analysis phase,
phase.analysis.type specifies the type of analysis. In the most cases, the type of analysis is
MULTI_ENTRY which denotes the analysis algorithm that starts from specified entry function(s) rather than the execution entry
phase.me.analysis specifies the configuration for the underlying concrete analysis algorithms.
phase.me.stopAfterError specifies whether the analysis terminates once a potential defect is found. It is usually set to
true for program verification tasks.
checker.weaknessForCheck specifies the target defects (in their internal names) against which the input source code is checked. The supported defects can be referred to in The Capability of Tsmart. It is noteworthy that whether Tsmart checks the input code against the given target defect depends on the employed concrete analysis algorithm. For example, if
rangeAnalysis.properties does not contain an analysis algorithm relevant to memory safety, Tsmart cannot find memory leak defects even if
MEMORY_LEAK is specified in
The following snippet shows the contents of
analysis.algorithm.bounded = true analysis.reachedSet.hierarchical = true analysis.useConfidence = true analysis.me.static.strategy = CHEAP_COVER cpa = cpa.arg.ARGCPA ARGCPA.cpa = cpa.composite.CompositeCPA CompositeCPA.cpas = cpa.location.LocationCPA, cpa.boundary.BoundaryCPA, cpa.range.RangeCPA, cpa.pointer2.PointerCPA cpa.boundary.callDepth = 2 cpa.boundary.loopIteration = 1 cpa.boundary.loopDepth = 3 analysis.traversal.controlled.maxWaitingSize = 16 analysis.traversal.useDominationOrder = true trace.upperBound = 1000 analysis.traversal.order = BFS cfa.useMultiEdges = false solver.solver = Z3
The first two lines is necessary for multi-entry analysis.
analysis.useConfidence is an experimental feature that prioritizes the detected bugs by their confidence scores.
analysis.me.static.strategy specifies how the analysis entries are chosen, where
CHEAP_COVER denotes a heuristic approach that derives analysis entries based on the structure of the input source code. Other available strategies for choosing analysis entries include
main() as the only analysis entry),
FUNCTION_HEAD (function-wise analysis). The following 3 lines specify the employed concrete analysis algorithms.
RangeCPA is an analysis based on range abstract domain.
BoundaryCPA is an analysis that captures information of call stack and loop stack in program execution.
PointerCPA derives points-to information. The next 3 lines are the configurations relevant to boundary analysis where
loopDepth denote the maximum tracked depth of call stack, maximum tracked iterations of a loop and maximum tracked nesting loops. When the threshold is reached, the summary of relevant function or loop is applied. The remaining options serve as the fine-grained tweaks on precision-efficiency trade-off. For example,
trace.upperBound specifies the upper bound of the branches to be tracked under a branching point in control-flow graph. At the last line,
solver.solver specifies the underlying constraint solver.
Details on more options used in configuration files can be found at
doc/ConfigurationOptions.txt under the root directory of TsmartBD.
[OUTPUT_PATH]specified in invoking
-outputargument is absent, the default location of analysis results is at
output/under the root directory of TsmartBD.
TsmartAnalyze.jarprints the error message
Unrecognized CWE Number