C Checker Reference Manual

The TenDRA Documentation Team

$TenDRA: book.xml 2447 2006-03-23 21:15:51Z verm $

Extensions to this document from the original TenDRA-4.1.2-doc.tar.gz source distribution are covered by the BSDL, while all prior modifications remain under the Crown Copyright.

Berkeley Systems Design License

Redistribution and use in source (SGML DocBook) and 'compiled' forms (SGML, HTML, PDF, PostScript, RTF and so forth) with or without modification, are permitted provided that the following conditions are met:

  1. Redistributions of source code (SGML DocBook) must retain the above copyright notice, this list of conditions and the following disclaimer as the first lines of this file unmodified.

  2. Redistributions in compiled form (transformed to other DTDs, converted to PDF, PostScript, RTF and other formats) must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

Important

THIS DOCUMENTATION IS PROVIDED BY THE TENDRA DOCUMENTATION TEAM "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE TENDRA DOCUMENTATION TEAM BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS DOCUMENTATION, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Crown Copyright (c) 1997, 1998

This TenDRA(r) Computer Program is subject to Copyright owned by the United Kingdom Secretary of State for Defence acting through the Defence Evaluation and Research Agency (DERA). It is made available to Recipients with a royalty-free licence for its use, reproduction, transfer to other parties and amendment for any purpose not excluding product development provided that any such use et cetera shall be deemed to be acceptance of the following conditions:

  1. Its Recipients shall ensure that this Notice is reproduced upon any copies or amended versions of it;

  2. Any amended version of it shall be clearly marked to show both the nature of and the organisation responsible for the relevant amendment or amendments;

  3. Its onward transfer from a recipient to another party shall be deemed to be that party's acceptance of these conditions;

  4. DERA gives no warranty or assurance as to its quality or suitability for any purpose and DERA accepts no liability whatsoever in relation to any use to which it may be put.

This document was generated on 2006-09-07 15:15:40

Abstract

Please email us at if you see any errors


Table of Contents

Bibliography
Introduction
1. Configuring the Checker
1.1. Built-in checking profiles
1.2. Minimum integer ranges
1.3. API selection
1.4. Individual command line checking options
1.5. Construct a customised checking environment
1.6. Scoping checking profiles
2. Type checking
2.1. Type conversions
2.1.1. Integer to integer conversions
2.1.2. Pointer to integer and integer to pointer conversions
2.1.3. Pointer to pointer conversions
2.1.4. 64-bit portability issues
2.2. Function type checking
2.2.1. Type checking non-prototyped functions
2.2.2. Checking printf strings
2.2.3. Function return checking
2.3. Overriding type checking
2.3.1. Implicit Function Declarations
2.3.2. Function Parameters
2.3.3. Incompatible promoted function arguments
2.3.4. Incompatible type qualifiers
3. Integral Types
3.1. Integer promotion rules
3.2. Arithmetic operations on integer types
3.3. Interaction with the integer conversion checks
3.4. Target dependent integral types
3.4.1. Integer literals
3.4.2. Abstract API types
3.5. Integer overflow checks
3.6. Integer operator checks
3.7. Support for 64 bit integer types (long long)
4. Data Flow and Variable Analysis
4.1. Unreachable code analysis
4.2. Case fall through
4.3. Unusual flow in conditional statements
4.3.1. Empty if statements
4.3.2. Use of assignments as control expressions
4.3.3. Constant control expressions
4.4. Operator precedence
4.5. Variable analysis
4.5.1. Order of evaluation
4.5.2. Modification between sequence points
4.5.3. Operand of sizeof operator
4.5.4. Unused variables
4.5.5. Values set and not used
4.5.6. Variable which has not been set is used
4.6. Overriding the variable analysis
4.6.1. Discarding variables
4.6.2. Setting variables
4.6.3. Exhaustive switch statements
4.6.4. Non-returning functions
4.7. Discard Analysis
4.7.1. Discarded function returns
4.7.2. Discarded computed values
4.7.3. Unused static variables and procedures
4.8. Overriding the discard analysis
4.8.1. Discarding function returns and computed values
4.8.2. Preserving unused statics
5. Preprocessing checks
5.1. Preprocessor directives
5.2. Indented Preprocessing Directives
5.3. Multiple macro definitions
5.4. Macro arguments
5.5. Unmatched quotes
5.6. Include depth
5.7. Text after #endif
5.8. Text after #
5.9. New line at end of file
6. Dialect Features
6.1. Resolving linkage problems
6.2. Identifier linkage
6.3. Implicit integer types
6.4. Bitfield types
6.5. Extra type definitions
6.6. Static block level functions
6.7. Incomplete array element types
6.8. Forward enumeration declarations
6.9. Untagged compound types
6.10. External volatility
6.11. Identifier name length
6.12. Ellipsis in function calls
6.13. Conditional lvalues
6.14. Unifying the tag name space
6.15. Initialisation of compound types
6.16. Variable initialisation
6.17. Escape sequences
6.18. $ in identifier names
6.19. Writeable string literals
6.20. Concatenation of character string literals and wide character string literals
6.21. Nested comments
6.22. Empty source files
6.23. Extra commas
6.24. Extra semicolons
6.25. Compatibility with C++ to TDF producer
7. Common Errors
7.1. Enumerations controlling switch statements
7.2. Incomplete structures and unions
7.3. Variable shadowing
7.4. Floating point equality
8. Symbol Table Dump
8.1. Unused headers
8.2. Error processing
8.3. API usage analysis
8.4. Intermodular checks
9. Conditional Compilation
1. Compilation Modes
1.1. Base modes
1.2. nepc and not_ansi modes
2. Command Line Options for Portability Checking
3. Integral Type Specification
3.1. Specifying integer literal types
3.2. The Portability Table
4. Summary of the pragma statements
5. Dump format specification
5.1. Basics
5.2. Dump commands
5.3. API information
5.4. Base definitions
5.5. Error commands
5.6. File commands
5.7. Identifier commands
5.8. Scope commands
5.9. String command
5.10. Templates
5.11. Token sort information
5.12. Type information
6. The Token Syntax
6.1. Introduction
6.2. Program construction using TDF
6.3. The token syntax
6.4. Token identification
6.5. Expression tokens
6.6. Statement tokens
6.7. Type tokens
6.7.1. General type tokens
6.7.2. Integral type tokens
6.7.3. Arithmetic type tokens
6.7.4. Compound type tokens
6.7.5. Type token compatibility, definitions etc.
6.8. Selector tokens
6.9. Procedure tokens
6.9.1. General procedure tokens
6.9.2. Simple procedure tokens
6.9.3. Function procedure tokens
6.9.4. Defining procedure tokens
6.10. Tokens and APIs
7. API checking
7.1. Introduction
7.2. Specifying APIs to tcc
7.3. API Checking Examples
7.4. Redeclaring Objects in APIs
7.5. Defining Objects in APIs
7.6. Stepping Outside an API
7.7. Using the System Headers
7.8. Abstract API headers and API usage analysis
8. Specifying conversions using the token syntax
8.1. Introduction
8.2. User-defined conversions
8.3. Specifying integer promotions
8.3.1. Literal promotions
8.3.2. Computed promotions
10. Revision History

List of Tables

4.1. ISO C Rules for Operator Precedence
1.1. Base Modes
2.1. Base Modes
3.1. ISO/ANSI Minimum Requirements Portability Table (default)
3.2. 32 bit Portability table (specified by -Y32bit option)
4.1. pragma_syntax
4.2. tendra_pragma
4.3. analysis_spec
4.4. conversion_spec
4.5. discard_spec
4.6. function_pars
4.7. keyword_spec
4.8. type_spec
4.9. check_pragma
4.10. variable_pragma
4.11. separator
4.12. identifier_list
4.13. dialect_pragma
4.14. set_sign
4.15. pp_directive
4.16. pp_spec
4.17. linkage_spec
4.18. state
4.19. permit
4.20. dallow
4.21. token_pragma
4.22. token_operation
4.23. integer_pragma
4.24. lit_class_type_list
4.25. int_type_spec

List of Examples

2.1. 64-bit portability issues
2.2. An obscure type mismatch
2.3. Weak prototype checks in defined programs

Bibliography

Tdfc: The C to TDF Producer Issue 1.0 (DRA/CIS3/OSSG/TR/95/102/1.0).

The C to TDF Producer Issue 2.1.0. Copyright © June 1993.

TCheck, The TenDRA Program Checker (DRA/CIS/CSE2/TR/94/44/1.2). Copyright © November 1994.

Tcc Users Guide (DRA/CIS/CSE2/TR/94/48/1.2). Copyright © June 1994.

Implementation of ISO/IEC 9899. Copyright © 1990.

tspec - An API Specification Tool (DRA/CIS/CSE2/94/48/2.1). Copyright © September 1994.

Introduction

Background

The C program static checker was originally developed as a programming tool to aid the construction of portable programs using the Application Programming Interface (API) model of software portability; the principle underlying this approach being:

If a program is written to conform to an abstract API specification, then that program will be portable to any machine which implements the API specification correctly.

The tool was designed to address the problem of the lack of separation between an API specification and an API implementation and as such was considered as a compiler for an abstract machine.

This approach gave the tool an unusually powerful basis for static checking of C programs, and a large amount of development work has resulted in the production of the TenDRA C static checker (tchk). The terms TenDRA C checker and tchk are used interchangeably in this document.

The C static checker

The C static checker is a powerful and flexible tool which can perform a number of static checks on C programs, including:

  • strict interface checking. In particular, the checker can analyse programs against abstract APIs to check their conformance to the specification. Abstract versions of most standard APIs are provided with the tool; alternatively users can define their own abstract APIs using the syntax described in Annex G

  • checking of integer sizes, overflows and implicit integer conversions including potential 64-bit problems, against a 16 bit or 32 bit architecture profile

  • strict ISO C standard checking, plus configurable support for many non-ISO dialect features

  • extensive type checking, including prototype-style checking for traditionally defined functions, conversion checking, type checking on printf and scanf style argument strings and type checking between translation units

  • variable analysis, including detection of unused variables, use of uninitialised variables, dependencies on order of evaluation in expressions and detection of unused function returns, computed values and static variables

  • detection of unused header files

  • configurable tests for detecting many other common programming errors

  • complete standard API usage analysis

  • several built-in checking environments plus support for user-defined checking profiles.

About this document

This document is designed as a reference manual detailing the features of the C static checker. It contains eleven chapters (including this introduction) and nine annexes.

  • Chapter 2: Configuring the Checker describes the built-in checking modes and the design of customised environments

  • Chapters 3-8: Type Checking, Integral Types, Data Flow and Variable Analysis, Preprocessing Checks, ISO C and Other Dialects and Common Errors respectively

  • Chapter 9: The Symbol Table Dump deals with the detection of unused header files, type checking across translation units and complete standard API usage analysis

  • Chapter 10: Conditional Compilation describes the checker's approach to conditional compilation

  • Chapter 11: References lists the references used in the production of this document

  • Annex A: Checking Modes gives a description of the built-in environments

  • Annex B: Command Line Options lists the command line checking options

  • Annex C: Specifying Integral Types describes the built-in integer modes and the methods for customising them

  • Annex D: Pragma Syntax Specification

  • Annex E: Symbol Table Dump Specification

  • Annex F: Token Syntax describes the methods and syntax used to produce abstract APIs

  • Annex G: Abstract API Manipulation gives details of the ways in which TenDRA abstract APIs may be extended, combined or overridden by local declarations

  • Annex H: Specifying Conversions with Tokens

Chapter 1. Configuring the Checker

There are several methods available for configuring the checker, most of which are selected by using the relevant command line option. More detailed customisation may require special #pragma statements to be incorporated into the source code to be analysed (this commonly takes the form of a startup file). The configuration options generally act independently of one another and, unless explicitly forbidden in the descriptions below, they may be combined in any way.

1.1. Built-in checking profiles

Six standard checking profiles are provided with the tool and are held as a set of startup files which are automatically included in each C source file. A brief description of each profile is given below; for complete descriptions see Annex A.

  • Xs ( strict checks ) denotes strict ISO standard C with most extra checks enabled as warnings

  • Xp ( partial checks ) denotes strict ISO standard C with some extra checks enabled

  • Xc ( conformance ) denotes strict ISO standard C with no extra checks enabled

  • Xw ( warning mode ) represents a `warning' oriented compilation mode. Many non-ISO standard C features are permitted with a warning. Extra checks are performed to produce warnings

  • Xa ( `standard-ish' C ) denotes ISO standard C with syntactic relaxation and no extra checks

  • Xt ( traditional C ) denotes traditional ( Kernighan and Ritchie ) C with no extra checks

Note

The modes Xc, Xa, and Xt are meant to roughly correspond to the modes found on some System V compilers.

The default checking environment is Xc. Other environments are specified by passing the name of the environment to the checker as a command line flag, e.g. the -Xs flag specifies that the Xs environment is to be used. These environments are the base checking modes and may not be combined: if more than one base mode is specified, only the final base mode is actually used - the earlier ones are ignored.

There are also two "add-on" checking profiles, called nepc (no extra portability checks) and not_ansi, which may be used to complement any base mode. The "add-on" modes may alter the status of checks set in the base mode. The nepc mode switches off many of the checks relating to portability issues and may be specified by passing the -nepc command line option to tchk. The not_ansi mode supports a raft of non-ISO features and is specified using the -not_ansi command line flag.

1.2. Minimum integer ranges

By default the checker assumes that all integer ranges conform to the minimum ranges prescribed by the ISO C standard, i.e. char contains at least 8 bits, short and int contain at least 16 bits and long contains at least 32 bits. If the -Y32bit flag is passed to the checker it assumes that integers conform to the minimum ranges commonly found on most 32 bit machines, i.e. int contains at least 32 bits and int is strictly larger than short so that the integral promotion of unsigned short is int under the ISO C standard integer promotion rules.
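The practical effect of these assumptions can be sketched in a few lines of C. The function names below are illustrative only and are not part of the checker; the code uses nothing but the standard <limits.h> macros to ask whether the compiling machine meets the ISO minimum ranges, and whether unsigned short promotes to int, which is the situation the -Y32bit assumptions describe.

```c
#include <limits.h>

/* Illustrative helper (not part of tchk): 1 if this machine meets the
   ISO C minimum ranges that the checker assumes by default. */
int meets_iso_minimum_ranges(void)
{
    return CHAR_BIT >= 8 && SHRT_MAX >= 32767 &&
           INT_MAX >= 32767 && LONG_MAX >= 2147483647L;
}

/* 1 if every unsigned short value fits in int, so that unsigned short
   promotes to int rather than to unsigned int. */
int ushort_promotes_to_int(void)
{
    return USHRT_MAX <= INT_MAX;
}

/* The promotion changes results: us - 2 is -1 when us promotes to int,
   but a large positive value when it promotes to unsigned int. */
int promoted_difference_is_negative(void)
{
    unsigned short us = 1;
    return (us - 2) < 0;
}
```

Code whose behaviour depends on the last function is exactly the kind that can pass on a 32 bit machine and fail on a machine that only meets the ISO minimum ranges.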

1.3. API selection

By default, programs are checked against the standard ISO C API as specified in the ISO C standard Chapter 7. Other APIs are specified by passing the -Yapi-name flag to tchk, where api-name is one of the API designators listed below. APIs fall into two categories: base APIs and extension APIs. If more than one base API is specified to tchk, only the last one is used for checking; the others are ignored. Extension APIs, however, may be used in addition to any suitable base API.

The base APIs available are:

  • ansi ANSI X3.159;

  • iso ISO MSE 9899:1990(Amendment 1:1993(E));

  • posix POSIX 1003.1;

  • posix2 POSIX 1003.2;

  • xpg3 X/Open Portability Guide 3;

  • xpg4 X/Open Portability Guide 4;

  • cose COSE 1170;

  • svid3 System V Interface Definition 3rd Edition;

  • aes AES Revision A;

  • system System headers as main API.

and the extension APIs are:

  • bsd_extn BSD-like extension for use with POSIX etc.;

  • x5_lib X11 (Release 5) Library;

  • x5_t X11 (Release 5) Intrinsics Toolkit;

  • x5_mu X11 (Release 5) Miscellaneous Utilities;

  • x5_aw X11 (Release 5) Athena Widgets;

  • x5_mit X11 (Release 5) MIT Implementation;

  • x5_proto X11 (Release 5) Protocol Extension;

  • x5_ext X11 (Release 5) Extensions;

  • x5_private X11 (Release 5) private headers (otherwise protected);

  • motif Motif 1.1;

  • system+ System headers as last resort API. Search the system headers only for those objects for which no declaration or definition can be found within the base API.

1.4. Individual command line checking options

Some of the checks available can be controlled using a command line option of the form -Xopt,opt,..., where the various opt options give a comma-separated list of commands. These commands have the form test=status, where test is the name of the check, and status is either check (apply check and give an error if it fails), warn (apply check and give a warning if it fails) or dont (do not apply check). The names of checks can be found with their descriptions in Chapter 3, Integral Types to Chapter 8, Symbol Table Dump; for example, the check for implicit function declarations described in Section 2.3.1, “Implicit Function Declarations” may be switched on using -X:implicit_func=check.

1.5. Construct a customised checking environment

The individual checks performed by the C static checker are generally controlled by #pragma directives. The reason for this is that the ISO standard places no restrictions on the syntax following a #pragma preprocessing directive, and most compilers/checkers can be configured to ignore any unknown #pragma directives they encounter.

Most of these directives begin:

   #pragma TenDRA ...

and are always checked for syntactical correctness. The individual directives, together with the checks they control are described in Chapters 3 - 8. Section 2.2 describes the method of constructing a new checking profile from these individual checks.

1.6. Scoping checking profiles

Almost all the available checks are scoped (exceptions will be mentioned in the description of the check). A new checking scope may be started by inserting the pragma:

   #pragma TenDRA begin

at the outermost level. The scope runs until the matching:

   #pragma TenDRA end

directive, or to the end of the translation unit (the ISO C standard definition of a translation unit as being a source file, together with any headers or source files included using the #include preprocessing directive, less any source lines skipped by any of the conditional inclusion preprocessing directives, is used throughout this document).

Checking scopes may be nested in the obvious way.

Each new checking scope inherits its initial set of checks from the checking scope which immediately contains it (this includes the implicit main checking scope consisting of the entire source file). Any checks switched on or off within the scope apply only to that scope and any scope it contains. The set of checks applied reverts to its previous state at the end of a scope. Thus, for example:

   #pragma TenDRA variable analysis on
   /* Variable analysis is on here */
   #pragma TenDRA begin
   #pragma TenDRA variable analysis off
   /* Variable analysis is off here */
   #pragma TenDRA end
   /* Variable analysis is on again here */

Once a check has been set any attempt to change its status within the same scope is flagged as an error. If checks need to be switched on and off in the same source file, they must be properly scoped. The built-in compilation modes have the entire source file as their scope.
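As a slightly fuller sketch of nested scopes, the fragment below wraps three small functions in two scopes, using only the begin, end and variable analysis directives described in this section. The function names are invented for illustration, and the comment about what the inner scope suppresses is an assumption about which diagnostics variable analysis covers (see Chapter 4 for the actual list).

```c
#pragma TenDRA begin
#pragma TenDRA variable analysis on

/* Checked with variable analysis on. */
int checked_sum(int a, int b)
{
    int total = a + b;
    return total;
}

#pragma TenDRA begin
#pragma TenDRA variable analysis off

/* Variable analysis is off in this nested scope, so the variable that
   is never used would presumably not be reported here. */
int legacy_double(int a)
{
    int scratch;
    return a * 2;
}

#pragma TenDRA end

/* Back in the outer scope: variable analysis is on again. */
int checked_negate(int a)
{
    return -a;
}

#pragma TenDRA end
```

Each status is set exactly once per scope, so the fragment does not trip the rule that a check may not be changed twice within the same scope.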

The method of applying different checking profiles to different parts of a program clearly needs to take into account those properties of C which can circumvent such scoping. Consider for example:

   #pragma TenDRA begin
   #pragma TenDRA unknown escape allow
   #define STRING "hello\!"
   #pragma TenDRA end
   char * f () {
       return ( STRING ) ;
   }

The macro STRING is defined in an area where unknown escape sequences, such as \!, are allowed, but it is expanded in an area where they are not allowed (this is the default setting). The conventional approach to macro expansion would lead to the unknown escape sequence being flagged as an error, even though the user probably intended to avoid this. The checker therefore expands all macros using the checking profile in which they were defined, rather than the current checking scope.

The directives describing the user's desired checking profile could be included directly in the program itself, ideally in some configuration file which is #include'd in all source files. It is however perhaps more appropriate to store the directives as a startup file, file say, which is passed to the checker using the -ffile command line option. It should be noted that user-defined compilation modes are defined on top of a built-in mode base (normally Xc, the default mode). It is therefore important to scope the new checking profile as described above.

Names may be associated with checking scopes by using an alternative form of the begin directive:

   #pragma TenDRA begin name environment identifier

where identifier is any valid C identifier. Thereafter a statement of the form:

   #pragma TenDRA use environment identifier

changes the current checking environment to the environment associated with identifier.
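A minimal sketch of how a named environment might be defined and later selected is given below. The environment name strict_conv and the function names are hypothetical, and the sketch assumes that the named form of begin is closed by the usual end directive and that use environment is placed inside its own scope; the conversion analysis pragma it uses is described in Section 2.1.1.

```c
/* Define a named environment; "strict_conv" is a hypothetical name. */
#pragma TenDRA begin name environment strict_conv
#pragma TenDRA conversion analysis ( int-int implicit ) on
#pragma TenDRA end

/* Ordinary code, checked under the surrounding profile. */
long total_length(long a, long b)
{
    return a + b;
}

/* Switch to the named environment inside a fresh scope, so that the
   change is undone at the matching end (an assumption of this sketch). */
#pragma TenDRA begin
#pragma TenDRA use environment strict_conv

int clamp_to_int(long value)
{
    if (value > 32767L) return 32767;
    if (value < -32767L) return -32767;
    return (int) value;   /* explicit cast: the narrowing is deliberate */
}

#pragma TenDRA end
```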

Sometimes it may be desirable to use different checking profiles for different parts of a translation unit, e.g. applying less strict checks to any system headers which may be included. The checker can be configured to apply a named checking scope, env_name, to any files included from a directory which has been named dir_name, using:

   #pragma TenDRA directory dir_name use environment env_name

The directory name must be passed to the checker using the -Ndir_name:dir command line option. This is equivalent to the usual -Idir option for specifying include paths, except that it also attaches the name dir_name to the directory.
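Put together, a startup file for relaxing the checks on system headers might look like the fragment below. This is only a sketch: the environment name lenient and the directory name sys_hdrs are hypothetical, and switching conversion analysis off is just one example of a less strict profile.

```c
/* Startup-file fragment; "lenient" and "sys_hdrs" are hypothetical names. */
#pragma TenDRA begin name environment lenient
#pragma TenDRA conversion analysis off
#pragma TenDRA end

/* Files included from the directory named sys_hdrs use "lenient". */
#pragma TenDRA directory sys_hdrs use environment lenient
```

With this fragment, the directory would be named on the command line using the -Ndir_name:dir form, for example -Nsys_hdrs:/usr/include (the path is an assumption of the example).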

Chapter 2. Type checking

Type checking is relevant to two main areas of C. It ensures that all declarations referring to the same object are consistent (clearly a pre-requisite for a well-defined program). It is also the key to determining when an undefined or unexpected value has been produced due to the type conversions which arise from certain operations in C. Conversions may be explicit (conversion is specified by a cast) or implicit. Generally explicit conversions may be regarded more leniently since the programmer was obviously aware of the conversion, whereas the implications of an implicit conversion may not have been considered.

2.1. Type conversions

The only types which may be interconverted legally are integral types, floating point types and pointer types. Even if these rules are observed, the results of some conversions can be surprising and may vary on different machines. The checker can detect three categories of conversion: integer to integer conversions, pointer to integer and integer to pointer conversions, and pointer to pointer conversions.

In the default mode, the checker allows all integer to integer conversions, explicit integer to pointer and pointer to integer conversions and the explicit pointer to pointer conversions defined by the ISO C standard (all conversions between pointers to function types and other pointers are undefined according to the ISO C standard).

Checks to detect these conversions are controlled by the pragma:

#pragma TenDRA conversion analysis status

Unless explicitly stated to the contrary, throughout the rest of the document where status appears in a pragma statement it represents one of on (enable the check and produce errors), warning (enable the check but produce only warnings), or off (disable the check). Here the check is the detection of a conversion. The checks may also be controlled using the command line option -X:test=state, where test is one of convert_all, convert_int, convert_int_explicit, convert_int_implicit, convert_int_ptr and convert_ptr and state is check, warn or dont.

Due to the serious nature of implicit pointer to integer, implicit pointer to pointer conversions and undefined explicit pointer to pointer conversions, such conversions are flagged as errors by default. These conversion checks are not controlled by the global conversion analysis pragma above, but must be controlled by the relevant individual pragmas given in Section 2.1.2, “Pointer to integer and integer to pointer conversions” and Section 2.1.3, “Pointer to pointer conversions”.

2.1.1. Integer to integer conversions

All integer to integer conversions are allowed in C, however some can result in a loss of accuracy and so may be usefully detected. For example, conversions from int to long never result in a loss of accuracy, but conversions from long to int may. The detection of these shortening conversions is controlled by:

#pragma TenDRA conversion analysis ( int-int ) status

Checks on explicit conversions and implicit conversions may be controlled independently using:

#pragma TenDRA conversion analysis ( int-int explicit ) status

and

#pragma TenDRA conversion analysis ( int-int implicit ) status
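The conversions that these pragmas distinguish can be sketched as follows. The function names are invented for the example, and the comments describe what the text above suggests the checker would report rather than verified tchk output.

```c
#pragma TenDRA begin
#pragma TenDRA conversion analysis ( int-int implicit ) warning

/* int to long: never loses accuracy. */
long widen(int i)
{
    return i;
}

/* Implicit long to int: a shortening conversion, which the implicit
   int-int check is described as detecting. */
int narrow_implicit(long l)
{
    return l;
}

/* Explicit long to int: the cast shows the programmer was aware of the
   conversion; it is governed by the separate explicit check. */
int narrow_explicit(long l)
{
    return (int) l;
}

#pragma TenDRA end
```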

Objects of enumerated type are specified by the ISO C standard to be compatible with an implementation-defined integer type. However, assigning a value of an integral type, other than an appropriate enumeration constant, to an object of enumeration type is not really in keeping with the spirit of enumerations. The check to detect the implicit integer to enum type conversions which arise from such assignments is controlled using:

#pragma TenDRA conversion analysis ( int-enum implicit ) status

Note that only implicit conversions are flagged; if the conversion is made explicit, by using a cast, no errors are raised.

As usual status must be replaced by on, warning or off in all the pragmas listed above.
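A short sketch of the integer to enum cases follows. The enumeration and function names are illustrative; the comments restate the behaviour described above and are not captured checker output.

```c
enum colour { RED, GREEN, BLUE };

#pragma TenDRA begin
#pragma TenDRA conversion analysis ( int-enum implicit ) warning

/* An enumeration constant of the right type: in keeping with the
   spirit of enumerations. */
enum colour from_constant(void)
{
    enum colour c = GREEN;
    return c;
}

/* Implicit int to enum conversion: the case the check is described
   as detecting. */
enum colour from_int_implicit(int n)
{
    enum colour c = n;
    return c;
}

/* The same conversion made explicit with a cast: not flagged. */
enum colour from_int_explicit(int n)
{
    enum colour c = (enum colour) n;
    return c;
}

#pragma TenDRA end
```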

The interaction of the integer conversion checks with the integer promotion and arithmetic rules is an extremely complex issue which is further discussed in Chapter 4.

2.1.2. Pointer to integer and integer to pointer conversions

Integer to pointer and pointer to integer conversions are generally unportable and should always be specified by means of an explicit cast. The exception is that the integer zero and null pointers are deemed to be inter-convertible. As in the integer to integer conversion case, explicit and implicit pointer to integer and integer to pointer conversions may be controlled separately using:

#pragma TenDRA conversion analysis ( int-pointer explicit ) status

and

#pragma TenDRA conversion analysis ( int-pointer implicit ) status

or both checks may be controlled together by:

#pragma TenDRA conversion analysis ( int-pointer ) status

where status may be on, warning or off as before and pointer-int may be substituted for int-pointer.
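The two situations described in this section can be sketched as below. The function names are hypothetical, and the choice of unsigned long as the integer type is an assumption of the example: it is wide enough to hold a pointer on many machines but not on all, which is precisely why such conversions are unportable.

```c
#include <stddef.h>

#pragma TenDRA begin
#pragma TenDRA conversion analysis ( int-pointer explicit ) warning

/* The integer zero converts to a null pointer; this is the stated
   exception and is not treated as an unportable conversion. */
char *null_from_zero(void)
{
    char *p = 0;
    return p;
}

/* Explicit pointer to integer conversion, specified by a cast as the
   text recommends. The result is implementation-defined. */
unsigned long address_as_integer(const char *p)
{
    return (unsigned long) p;
}

#pragma TenDRA end
```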

2.1.3. Pointer to pointer conversions

According to the ISO C standard, section 6.3.4, the only legal pointer to pointer conversions are explicit conversions between:

  • a pointer to an object or incomplete type and a pointer to a different object or incomplete type. The resulting pointer may not be valid if it is improperly aligned for the type pointed to;

  • a pointer to a function of one type and a pointer to a function of another type. If a converted pointer, used to call a function, has a type that is incompatible with the type of the called function, the behaviour is undefined.

Except for conversions to and from the generic pointer which are discussed below, all other conversions, including implicit pointer to pointer conversions, are extremely unportable.

All pointer to pointer conversions may be flagged as errors using:

#pragma TenDRA conversion analysis ( pointer-pointer ) status

Explicit and implicit pointer to pointer conversions may be controlled separately using:

#pragma TenDRA conversion analysis ( pointer-pointer explicit ) status

and

#pragma TenDRA conversion analysis ( pointer-pointer implicit ) status

where, as before, status may be on, warning or off.
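The two kinds of legal explicit conversion listed at the start of this section can be sketched as follows. All names are invented for the example, and the sketch deliberately stays within defined behaviour: the object pointer is converted to unsigned char *, which has no alignment requirement, and the function pointer is converted back to its original type before the call.

```c
#pragma TenDRA begin
#pragma TenDRA conversion analysis ( pointer-pointer explicit ) warning

typedef int (*int_fn)(void);
typedef void (*generic_fn)(void);

static int answer(void)
{
    return 42;
}

/* Object pointer to a different object pointer type: an int is viewed
   as a sequence of bytes. */
unsigned char first_byte_of(int value)
{
    const unsigned char *bytes = (const unsigned char *) &value;
    return bytes[0];
}

/* Function pointer to a different function pointer type and back. The
   call is made through the original type, so it is well defined. */
int call_after_round_trip(void)
{
    generic_fn g = (generic_fn) answer;
    int_fn f = (int_fn) g;
    return f();
}

#pragma TenDRA end
```

Calling through g directly, without converting back to int_fn, would be the undefined case described above.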

Conversion between a pointer to a function type and a pointer to a non-function type is undefined by the ISO C standard and should generally be avoided. The checker can however be configured to treat function pointers as object pointers for conversion using:

#pragma TenDRA function pointer as pointer permit

Unless explicitly stated to the contrary, throughout the rest of the document where permit appears in a pragma statement it represents one of allow (allow the construct and do not produce errors), warning (allow the construct but produce warnings when it is detected), or disallow (produce errors if the construct is detected). Here the construct in question is a conversion between a function pointer and any other pointer.

The generic pointer, void *, is a special case. All conversions of pointers to object or incomplete types to or from a generic pointer are allowed. Some older dialects of C used char * as a generic pointer. This dialect feature may be allowed, allowed with a warning, or disallowed using the pragma:

#pragma TenDRA compatible type : char * == void * permit

where permit is allow, warning or disallow as before.
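The role of the generic pointer can be sketched as below. The function names are illustrative. The second function shows char * used in the older generic role, but with explicit casts so that the code remains valid ISO C regardless of how the pragma above is set.

```c
/* Any object pointer converts to void * and back without a cast. */
int round_trip_through_void(int value)
{
    int *ip = &value;
    void *generic = ip;      /* int * to void *, implicit */
    int *back = generic;     /* void * to int *, implicit */
    return *back;
}

/* Older code used char * in the same role; here the casts make the
   conversions explicit so that ISO C compilers accept them. */
int round_trip_through_char(int value)
{
    char *generic = (char *) &value;
    int *back = (int *) generic;
    return *back;
}
```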

2.1.4. 64-bit portability issues

Example 2.1. 64-bit portability issues

64-bit machines form the "next frontier" of program portability. Most of the problems involved in 64-bit portability are type conversion problems. The assumptions that were safe on a 32-bit machine are not necessarily true on a 64-bit machine - int may not be the same size as long, pointers may not be the same size as int, and so on. This example illustrates the way in which the checker's conversion analysis tests can detect potential 64-bit portability problems.

Consider the following code:

#include <stdio.h>
void print ( string, offset, scale )
char *string;
unsigned int offset;
int scale;
{
  string += ( scale * offset );
  ( void ) puts ( string );
  return;
}

int main ()
{
  char *s = "hello there";
  print ( s + 4, 2U, -2 );
  return ( 0 );
}

This appears to be fairly simple - the offset of 2U scaled by -2 cancels out the offset in s + 4, so the program just prints "hello there". Indeed, this is what happens on most machines. When ported to a particular 64-bit machine, however, it core dumps. The fairly subtle reason is that the composite offset, scale * offset, is actually calculated as an unsigned int by the ISO C arithmetic conversion rules. So the answer is not -4. Strictly speaking it is undefined, but on virtually all machines it will be UINT_MAX - 3. The fact that adding this offset to string is equivalent to adding -4 is only true on machines on which pointers have the same size as unsigned int. If a pointer contains 64 bits and an unsigned int contains 32 bits, the result is 2^32 bytes out.

So the error occurs because of the failure to spot that the offset being added to string is unsigned. All mixed integer type arithmetic involves some argument conversion. In the case above, scale is converted to an unsigned int and that is multiplied by offset to give an unsigned int result. If the implicit int->int conversion checks (Section 2.1.1, “Integer to integer conversions”) are enabled, this conversion is detected and the problem may be avoided.
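The arithmetic described above can be reproduced in isolation, together with one possible repair. This is only a sketch: the function names unsigned_product and advance are hypothetical, and the repair assumes that the product of scale and offset fits in a ptrdiff_t.

```c
#include <stddef.h>

/* The composite offset as the original print computes it: scale is
   converted to unsigned int before the multiplication. */
unsigned int unsigned_product ( int scale, unsigned int offset )
{
  return ( scale * offset );
}

/* One possible repair: do the arithmetic in a signed type as wide as
   a pointer difference, so that a negative product stays negative
   when it is added to the pointer. */
const char *advance ( const char *string, unsigned int offset, int scale )
{
  ptrdiff_t delta = ( ptrdiff_t ) scale * ( ptrdiff_t ) offset;
  return ( string + delta );
}
```

With the values from the example, unsigned_product ( -2, 2U ) gives the same value as ( unsigned int ) -4, whereas advance ( s + 4, 2U, -2 ) gives back s whatever the relative sizes of pointers and unsigned int.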


2.2. Function type checking

The importance of function type checking in C lies in the conversions which can result from type mismatches between the arguments in a function call and the parameter types assumed by its definition or between the specified type of the function return and the values returned within the function definition. Until the introduction of function prototypes into ISO standard C, there was little scope for detecting the correct typing of functions. Traditional C allows for absolutely no type checking of function arguments, so that totally bizarre functions, such as:

int f ( n ) int n ; {
  return ( f ( "hello", "there" ) ) ;
}

are allowed, although their effect is undefined. However, the move to fully prototyped programs has been relatively slow. This is partially due to an understandable reluctance to change existing, working programs, but the desire to maintain compatibility with existing C compilers, some of which still do not support prototypes, is also a powerful factor. Prototypes are allowed in the checker's default mode but tchk can be configured to allow, allow with a warning or disallow prototypes, using:

#pragma TenDRA prototype permit

where permit is allow, disallow or warning.

Even if prototypes are not supported the checker has a facility, described below, for detecting incorrectly typed functions.

2.2.1. Type checking non-prototyped functions

The checker offers a method for applying prototype-like checks to traditionally defined functions, by introducing the concept of "weak" prototypes. A weak prototype contains function parameter type information, but has none of the automatic argument conversions associated with a normal prototype. Instead weak prototypes imply the usual argument promotion passing rules for non-prototyped functions. The type information required for a weak prototype can be obtained in three ways:

  1. A weak prototype may be declared using the syntax:

    int f WEAK ( char, char * ) ;
    

    where WEAK represents any keyword which has been introduced using:

    #pragma TenDRA keyword WEAK for weak
    

    An alternative definition of the keyword must be provided for other compilers. For example, the following definition would make system compilers interpret weak prototypes as normal (strong) prototypes:

    #ifdef __TenDRA__
    #pragma TenDRA keyword WEAK for weak
    #else
    #define WEAK
    #endif
    

    The difference between conventional prototypes and weak prototypes can be illustrated by considering the normal prototype for f:

    int f (char,char *);
    

    When the prototype is present, the first argument to f would be passed as a char. Using the weak prototype, however, results in the first argument being passed as the integral promotion of char, that is to say, as an int.

    There is one limitation on the declaration of weak prototypes - declarations of the form:

    int f WEAK() ;
    

    are not allowed. If a function has no arguments, this should be stated explicitly as:

    int f WEAK( void ) ;
    

    whereas if the argument list is not specified, weak prototypes should be avoided and a traditional declaration used instead:

    extern int f ();
    

    The checker may be configured to allow, allow with a warning or disallow weak prototype declarations using:

    #pragma TenDRA prototype ( weak ) permit
    

    where permit is replaced by allow, warning or disallow as appropriate. Weak prototypes are not permitted in the default mode.

  2. Information can be deduced from a function definition. For example, the function definition:

    int f(c,s) char c; char *s;{...}
    

    is said to have weak prototype:

    int f WEAK (char,char *);
    

    The checker automatically constructs a weak prototype for each traditional function definition it encounters and if the weak prototype analysis mode is enabled (see below) all subsequent calls of the function are checked against this weak prototype.

    For example, in the bizarre function in Section 2.2, “Function type checking”, the weak prototype:

    int f WEAK ( int );
    

    is constructed for f. The subsequent call to f:

    f ( "hello", "there" );
    

    is then rejected by comparison with this weak prototype - not only is f called with the wrong number of arguments, but the first argument has a type incompatible with (the integral promotion of) int.

  3. Information may be deduced from the calls of a function. For example, in:

    extern void f ();
    void g ()
    {
      f ( 3 );
      f ( "hello" );
    }
    

    we can infer from the first call of f that f takes one integral argument. We cannot deduce the type of this argument, only that it is an integral type whose promotion is int (since this is how the argument is passed). We can therefore infer a partial weak prototype for f:

    void f WEAK ( t );
    

    for some integral type t which promotes to int. Similarly, from the second call of f we can infer the weak prototype:

    void f WEAK ( char * );
    

    (the argument passing rules are much simpler in this case). Clearly the two inferred prototypes are incompatible, so an error is raised. Note that prototypes inferred from function calls alone cannot ensure that the uses of the function within a source file are correct, merely that they are consistent. The presence of an explicit function declaration or definition is required for a definitive "right" prototype.

    Null pointers cause particular problems with weak prototypes inferred from function calls. For example, in:

    #include <stdio.h>
    extern void f ();
    void g () {
      f ( "hello" );
      f( NULL );
    }
    

    the argument in the first call of f is char* whereas in the second it is int (because NULL is defined to be 0). Whereas NULL can be converted to char*, it is not necessarily passed to procedures in the same way (for example, it may be that pointers have 64 bits and ints have 32 bits). It is almost always necessary to cast NULL to the appropriate pointer type in weak procedure calls.
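The need to cast NULL can be sketched with a function taking a variable number of arguments, used here as a stand-in because arguments matching an ellipsis are passed by the same promotion rules as arguments to a non-prototyped function. The names count_strings and count_two are hypothetical.

```c
#include <stdarg.h>
#include <stddef.h>

/* Counts the strings in an argument list terminated by a null
   char pointer. */
int count_strings ( const char *first, ... )
{
  va_list args;
  int count = 0;
  const char *s = first;
  va_start ( args, first );
  while ( s != NULL ) {
    count++;
    s = va_arg ( args, char * );
  }
  va_end ( args );
  return ( count );
}

/* The terminator is cast so that it is passed as a pointer, not as
   the int 0 that a bare NULL may expand to. */
int count_two ( void )
{
  return ( count_strings ( "hello", "there", ( char * ) NULL ) );
}
```

Without the cast, a 32-bit int zero could be read back as a 64-bit pointer on the kind of machine described above.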

Functions for which explicitly declared weak prototypes are provided are always type-checked by the checker. Weak prototypes deduced from function definitions or calls are used for type checking if the weak prototype analysis mode is enabled using:

#pragma TenDRA weak prototype analysis status

where status is one of on, warning and off as usual. Weak prototype analysis is not performed in the default mode.

There is also an equivalent command line option of the form -X:weak_proto=state, where state can be check, warn or dont.

This section ends with two examples which demonstrate some of the less obvious consequences of weak prototype analysis.

Example 2.2. An obscure type mismatch

As stated above, the promotion and conversion rules for weak prototypes are precisely those for traditionally declared and defined functions. Consider the program:

#include <stdio.h>
void f ( n )long n;{
  printf ( "%ld\n", n );
}
void g (){
  f ( 3 );
}

The literal constant 3 is an int and hence is passed as such to f. f is however expecting a long, which can lead to problems on some machines. Introducing a strong prototype declaration of f for those compilers which understand them:

#ifdef __STDC__
  void f ( long );
#endif

will produce correct code - the arguments to a function declared with a prototype are converted to the appropriate types, so that the literal is actually passed as 3L. This solves the problem for compilers which understand prototypes, but does not actually detect the underlying error. Weak prototypes, because they use the traditional argument passing rules, do detect the error. The constructed weak prototype:

void f WEAK ( long );

conveys the type information that f is expecting a long, but accepts the function arguments as passed rather than converting them. Hence, the error of passing an int argument to a function expecting a long is detected.

Many programs, seeking to have prototype checks while preserving compilability with non-prototype compilers, adopt a compromise approach of traditional definitions plus prototype declarations for those compilers which understand them, as in the example above. While this ensures correct argument passing in the prototype case, as the example shows it may obscure errors in the non-prototype case.


Example 2.3. Weak prototype checks in defined programs

In most cases a program which fails to compile with the weak prototype analysis enabled is undefined. ISO standard C does however contain an anomalous rule on equivalence of representation. For example, in:

extern void f ();
void g () {
  f ( 3 );
  f ( 4U );
}

the TenDRA checker detects an error - in one instance f is being passed an int, whereas in the other it is being passed an unsigned int. However, the ISO C standard states that, for values which fit into both types, the representation of a number as an int is equal to that as an unsigned int, and that values with the same representation are interchangeable in procedure arguments. Thus the program is defined. The justification for raising an error or warning for this program is that the prototype analysis is based on types, not some weaker notion of "equivalence of representation". The program may be defined, but it is not type correct.

Another case in which a program is defined, but not correct, is where an unnecessary extra argument is passed to a function. For example, in:

#include <stdio.h>
void f ( a ) int a; {
  printf ( "%d\n", a );
}
void g () {
  f ( 3, 4 );
}

the call of f is defined, but is almost certainly a mistake.


2.2.2. Checking printf strings

Normally functions which take a variable number of arguments offer only limited scope for type checking. For example, given the prototype:

int execl ( const char *, const char *, ... );

the first two arguments may be checked, but we have no hold on any subsequent arguments (in fact in this example they should all be const char *, but C does not allow this information to be expressed). Two classes of functions of this form, namely the printf and scanf families, are so common that they warrant special treatment. If one of these functions is called with a constant format string, then it is possible to use this string to deduce the types of the extra arguments that it is expecting. For example, in:

printf ( "%ld", 4 );

the format string indicates that printf is expecting a single additional argument of type long. We can therefore deduce a quasi-prototype which this particular call to printf should conform to, namely:

int printf ( const char *,long );

In fact this is a mixture of a strong prototype and a weak prototype. The first argument comes from the actual prototype of printf, and hence is strong. All subsequent arguments correspond to the ellipsis part of the printf prototype, and are passed by the normal promotion rules. Hence the long component of the inferred prototype is weak (see Section 2.2.1, “Type checking non-prototyped functions”). This means that the error in the call to printf - the integer literal is passed as an int when a long is expected - is detected.
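A call that conforms to the inferred quasi-prototype passes a value of type long for the %ld directive. The sketch below uses snprintf rather than printf so that the result can be inspected; the function name format_as_long is hypothetical.

```c
#include <stdio.h>

/* Formats an int through a "%ld" directive. The cast makes the
   argument a long, matching the weak part of the quasi-prototype. */
int format_as_long ( char *buffer, size_t size, int value )
{
  return ( snprintf ( buffer, size, "%ld", ( long ) value ) );
}
```

Writing the literal as 4L in the original call would have the same effect as the cast.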

In order for this check to take place, the function declaration needs to tell the checker that the function is like printf. This is done by introducing a special type, PSTRING say, to stand for a printf string, using:

#pragma TenDRA type PSTRING for ... printf

For most purposes this is equivalent to:

typedef const char *PSTRING;

except that when a function declaration:

int f ( PSTRING, ... );

is encountered the checker knows to deduce the types of the arguments corresponding to the ... from the PSTRING argument (the precise rules it applies are those set out in the XPG4 definition of fprintf). If this mechanism is used to apply printf style checks to user defined functions, an alternative definition of PSTRING for conventional compilers must be provided. For example:

#ifdef __TenDRA__
#pragma TenDRA type PSTRING for ... printf
#else
typedef const char *PSTRING;
#endif

There are similar rules with scanf in place of printf.
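As an illustration of the printf case, a user-defined function might be declared with PSTRING as follows. This is a sketch under assumptions: format_message is a hypothetical name, and vsnprintf is assumed to be available in the API being checked against.

```c
#include <stdarg.h>
#include <stdio.h>

#ifdef __TenDRA__
#pragma TenDRA type PSTRING for ... printf
#else
typedef const char *PSTRING;
#endif

/* A hypothetical printf-like wrapper: writes a formatted message
   into buffer and returns the value given by vsnprintf. */
int format_message ( char *buffer, size_t size, PSTRING format, ... )
{
  va_list args;
  int written;
  va_start ( args, format );
  written = vsnprintf ( buffer, size, format, args );
  va_end ( args );
  return ( written );
}
```

A call such as format_message ( buf, sizeof ( buf ), "%ld", 4 ) would then be open to the same check as the printf call discussed earlier.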

The TenDRA descriptions of the standard APIs use this mechanism to describe those functions, namely printf, fprintf and sprintf, and scanf, fscanf and sscanf which are of these forms. This means that the checks are switched on for these functions by default. However, these descriptions are under the control of a macro, __NO_PRINTF_CHECKS, which, if defined before stdio.h is included, effectively switches the checks off. This macro is defined in the start-up files for certain checking modes, so that the checks are disabled in these modes (see chapter 2). The checks can be enabled in these cases by #undef'ing the macro before including stdio.h. There are equivalent command-line options to tchk of the form -X:printf=state, where state can be check or dont, which respectively undefine and define this macro.

2.2.3. Function return checking

Function returns normally present no difficulties. The return value is converted, as if by assignment, to the function return type, so that the problem is essentially one of type conversion (see Section 2.1, “Type conversions”). There is however one anomalous case. A plain return statement, without a return value, is allowed in functions returning a non-void type, the value returned being undefined. For example, in:

int f ( int c )
{
  if ( c ) return ( 1 );
  return;
}

the value returned when c is zero is undefined. The test for detecting such void returns is controlled by:

#pragma TenDRA incompatible void return permit

where permit may be allow, warning or disallow as usual.

There are also equivalent command line options to tchk of the form -X:void_ret=state, where state can be check, warn or dont. Incompatible void returns are allowed in the default mode and of course, plain return statements in functions returning void are always legal.

This check also detects functions which do not contain a return statement, but fall out of the bottom of the function as in:

int f ( int c )
{
  if ( c ) return ( 1 );
}

Occasionally it may be the case that such a function is legal, because the end of the function is not reached. Unreachable code is discussed in Section 4.1, “Unreachable code analysis”.

2.3. Overriding type checking

There are several commonly used features of C, some of which are even allowed by the ISO C standard, which can circumvent or hinder the type-checking of a program. The checker may be configured either to enforce the absence of these features or to support them with or without a warning, as described below.

2.3.1. Implicit Function Declarations

The ISO C standard states that any undeclared function is implicitly assumed to return int. For example, in ISO C:

int f ( int c ) {
  return ( g( c )+1 );
}

the undeclared function g is inferred to have a declaration:

extern int g ();

This can potentially lead to program errors. The definition of f would be valid if g actually returned double, but incorrect code would be produced. Again, an explicit declaration might give us more information about the function argument types, allowing more checks to be applied.

Therefore the best chance of detecting bugs in a program and ensuring its portability comes from having each function declared before it is used. This means detecting implicit declarations and replacing them by explicit declarations. By default implicit function declarations are allowed; however, the pragma:

#pragma TenDRA implicit function declaration status

may be used to determine how tchk handles implicit function declarations. Status is replaced by on to allow implicit declarations, warning to allow implicit declarations but to produce a warning when they occur, or off to prevent implicit declarations and raise an error where they would normally be used.

(There are also equivalent command-line options to tcc of the form -X:implicit_func=state, where state can be check, warn or dont.)

This test assumes an added significance in API checking. If a programmer wishes to check that a certain program uses nothing outside the POSIX API, then implicitly declared functions are a potential danger area. A function from outside POSIX could be used without being detected because it has been implicitly declared. Therefore, the detection of implicitly declared functions is vital to rigorous API checking.
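The repair for the earlier example is to declare g explicitly before its use. The sketch below assumes, purely for illustration, that g returns double and is defined as shown.

```c
/* Explicit declaration: without it, g would be implicitly declared
   as returning int and its double result would be misread. */
extern double g ( int );

int f ( int c )
{
  return ( ( int ) ( g ( c ) + 1 ) );
}

/* A definition of g, so that the sketch is complete. */
double g ( int c )
{
  return ( c * 0.5 );
}
```

With the declaration in place the addition is carried out in double, and the conversion back to int is explicit.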

2.3.2. Function Parameters

Many systems pass function arguments of differing types in the same way and programs are sometimes written to take advantage of this feature. The checker has a number of options to resolve type mismatches which may arise in this way and would otherwise be flagged as errors:

  1. Type-type compatibility.  When comparing function prototypes for compatibility, the function parameter types must be compared. If the parameter types would otherwise be incompatible, they are treated as compatible if they have previously been introduced with a type-type parameter compatibility pragma, i.e.

    #pragma TenDRA argument type-name as type-name
    

    where type-name is the name of any type. This pragma is transitive and the second type in the pragma is taken to be the final type of the parameter.

  2. Type-ellipsis compatibility.  Two function prototypes with different numbers of arguments are compatible if:

    • both prototypes have an ellipsis

    • each parameter type common to both prototypes is compatible

    • each extra parameter type in the prototype with more parameters is either specified in a type-ellipsis compatibility pragma or is type-type compatible (see above) with a type that is specified in a type-ellipsis compatibility pragma.

    Type-ellipsis compatibility is introduced using the pragma:

    #pragma TenDRA argument type-name as ...
    

    where again type-name is the name of any type.

  3. Ellipsis compatibility.  If, when comparing two function prototypes for compatibility, one has an ellipsis and the other does not, but otherwise the two types would be compatible, then if an `extra' ellipsis is allowed, the types are treated as compatible. The pragma controlling ellipsis compatibility is:

    #pragma TenDRA extra ... permit
    

    where permit may be allow, disallow or warning as usual.

2.3.3. Incompatible promoted function arguments

Mixing the use of prototypes with old-fashioned function definitions can result in incorrect code. For example, in the program below the function argument promotion rules are applied to the definition of f, making it incompatible with the earlier prototype (a is converted to the integer promotion of char, i.e. int).

int f(char);
int f(a)char a;{
...
}

An incompatible type error is raised in the default checking mode. The check for incompatible types which arise from mixtures of prototyped and non-prototyped function declarations and definitions is controlled using:

#pragma TenDRA incompatible promoted function argument permit

Permit may be replaced by allow, warning or disallow as normal. The parameter type in the resulting function type is the promoted parameter type.
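One way to avoid the mismatch altogether is to give the definition in prototype form, so that no promotion is applied to the parameter. A minimal sketch, with an illustrative function body:

```c
/* Declaration and definition both use prototype syntax, so the
   parameter has type char in each and the two are compatible. */
int f ( char );

int f ( char a )
{
  return ( a + 1 );
}
```

Alternatively the old-fashioned definition can be kept and the prototype written with the promoted type, int f ( int ), which is the function type the definition actually has.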

2.3.4. Incompatible type qualifiers

The declarations

const int a;
int a;

are not compatible according to the ISO C standard because the qualifier, const, is present in one declaration but not in the other. Similar rules hold for volatile qualified types. By default, tchk produces an error when declarations of the same object contain different type qualifiers. The check is controlled using:

#pragma TenDRA incompatible type qualifier permit

where the options for permit are allow, disallow or warning.

Chapter 3. Integral Types

The checks described in the previous chapter involved the detection of conversions which could result in undefined values. Certain conversions involving integral types, however, are defined in the ISO C standard and so might be considered safe and unlikely to cause problems. This unfortunately is not the case: some of these conversions may still result in a change in value; the actual size of each integral type is implementation-dependent; and the "old-style" integer conversion rules which predate the ISO standard are still in common use. The checker provides support for both ISO and traditional integer promotion rules. The set of rules used may be specified independently of the two integral range scenarios, 16 bit (default) and 32 bit, described in Section 1.2, “Minimum integer ranges”.

The means of specifying alternative sets of promotion rules, their interaction with the conversion checks described in Section 2.1, “Type conversions”, and the additional checks which may be performed on integers and integer operations are described in the remainder of this chapter.

3.1. Integer promotion rules

The ISO C standard rules may be summarised as follows: long integral types promote to themselves; other integral types promote to whichever of int or unsigned int they fit into. In full the promotions are:

  • char -> int

  • signed char -> int

  • unsigned char -> int

  • short -> int

  • unsigned short -> int or unsigned int

  • int -> int

  • unsigned int -> unsigned int

  • long -> long

  • unsigned long -> unsigned long

Note

Even with these simple built-in types, there is a degree of uncertainty, namely concerning the promotion of unsigned short. On most machines, int is strictly larger than short so the promotion of unsigned short is int. However, it is possible for short and int to have the same size, in which case the promotion is unsigned int. When using the ISO C promotion rules, the checker usually avoids making assumptions about the implementation by treating the promotion of unsigned short as an abstract integral type. If, however, the -Y32bit option is specified, int is assumed to be strictly larger than short, and unsigned short promotes to int.

The traditional C integer promotion rules are often referred to as the signed promotion rules. Under these rules, long integral types promote to themselves, as in ISO C, but the other integral types promote to unsigned int if they are qualified by unsigned, and int otherwise. Thus the signed promotion rules may be represented as follows:

  • char -> int

  • signed char -> int

  • unsigned char -> unsigned int

  • short -> int

  • unsigned short -> unsigned int

  • int -> int

  • unsigned int -> unsigned int

  • long -> long

  • unsigned long -> unsigned long

The traditional promotion rules are applied in the Xt built-in environment only. All of the other built-in environments specify the ISO C promotion rules. Users may also specify their own rules for integer promotions and minimum integer ranges; the methods for doing this are described in Annex H.
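The practical difference between the two sets of rules shows up when an unsigned char or unsigned short takes part in arithmetic. The following sketch (the function name is hypothetical) assumes an ISO C compiler on which int is strictly larger than char:

```c
/* Under the ISO C rules unsigned char promotes to int, so the
   subtraction is signed and the result for 0 is -1. */
int is_negative_after_decrement ( unsigned char value )
{
  return ( ( value - 1 ) < 0 );
}
```

Under the traditional rules value would promote to unsigned int, the subtraction would be unsigned, and the test could never succeed.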

3.2. Arithmetic operations on integer types

The ISO C standard rules for calculating the type of an arithmetic operation involving two integer types are as follows - work out the integer promotions of the types of the two operands, then:

  • If either promoted type is unsigned long, the result type is unsigned long;

  • Otherwise, if one promoted type is long and the other is unsigned int, then if a long int can represent all values of an unsigned int, the result type is long; otherwise the result type is unsigned long;

  • Otherwise, if either promoted type is long, the result type is long;

  • Otherwise, if either promoted type is unsigned int, the result type is unsigned int;

  • Otherwise the result type is int.

Both promoted values are converted to the result type, and the operation is then applied.
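A comparison between an int and an unsigned int shows the effect of these rules. In the sketch below (both function names are hypothetical) the result type of the first comparison is unsigned int, so a negative left operand is converted to a large positive value before the test:

```c
/* The int operand is converted to unsigned int, so -1 becomes
   UINT_MAX and the comparison with 1U fails. */
int less_than_mixed ( int a, unsigned int b )
{
  return ( a < b );
}

/* Handles the sign explicitly before comparing as unsigned. */
int less_than_checked ( int a, unsigned int b )
{
  return ( a < 0 || ( unsigned int ) a < b );
}
```

For the arguments -1 and 1U the first function gives 0 and the second gives 1.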

3.3. Interaction with the integer conversion checks

A simple-minded implementation of the integer conversion checks described in Section 2.1, “Type conversions” would interact badly with these rules. Consider, for example, adding two values of type char:

char f ( char a, char b )
{
  char c = a + b ;
  return ( c ) ;
}

The various stages in the calculation of c are as follows - a and b are converted to their promotion type, int, added together to give an int result, which is converted to a char and assigned to c. The conversions of a and b from char to int are always safe, and so present no difficulties to the integer conversion checks. The conversion of the result from int to char, however, is precisely the type of value destroying conversion which these checks are designed to detect.

Obviously, an integer conversion check which flagged all char arithmetic would never be used, thereby losing the potential to detect many subtle portability errors. For this reason, the integer conversion checks are more sophisticated. In all typed languages, the type is used for two purposes - for static type checking and for expressing information about the actual representation of data on the target machine. Essentially it is a confusion between these two roles which leads to the problems above. The C promotion and arithmetic rules are concerned with how data is represented and manipulated, rather than the underlying abstract types of this data. When a and b are promoted to int prior to being added together, this is only a change in representation; at the conceptual level they are still char's. Again, when they are added, the result may be represented as an int, but conceptually it is a char. Thus the assignment to c, an actual char, is just a change in representation, not a change in conceptual type.

So each expression may be regarded as having two types - a conceptual type which stands for what the expression means, and a representational type which stands for how the expression is to be represented as data on the target machine. In the vast majority of expressions, these types coincide; however, the integral promotion and arithmetic conversions are changes of representational, not conceptual, types. The integer conversion checks are concerned with detecting changes of conceptual type, since it is these which are most likely to be due to actual programming errors.

It is possible to define integral types within the TenDRA extensions to C in which the split between concept and representation is made explicit. The pragma:

#pragma TenDRA keyword TYPE for type representation

may be used to introduce a keyword TYPE for this purpose (as with all such pragmas, the precise keyword to be used is left to the user). Once this has been done, TYPE ( r, t ) may be used to represent a type which is conceptually of type t but is represented as data like type r. Both t and r must be integral types. For example:

TYPE ( int, char ) a ;

declares a variable a which is represented as an int, but is conceptually a char.

In order to maintain compatibility with other compilers, it is necessary to give TYPE a sensible alternative definition. For all but conversion checking purposes, TYPE ( r, t ) is identical to r, so a suitable definition is:

#ifdef __TenDRA__
#pragma TenDRA keyword TYPE for type representation
#else
#define TYPE( r, t ) r
#endif
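A possible use of the keyword, based on the char addition example earlier in this section, is sketched below. The function name add_chars is hypothetical, and the exact diagnostics the checker produces for this code have not been verified here.

```c
#ifdef __TenDRA__
#pragma TenDRA keyword TYPE for type representation
#else
#define TYPE( r, t ) r
#endif

char add_chars ( char a, char b )
{
  /* Represented as an int, but conceptually still a char. */
  TYPE ( int, char ) sum = a + b ;
  return ( ( char ) sum ) ;
}
```

For other compilers the macro reduces the declaration of sum to a plain int.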

3.4. Target dependent integral types

Since the checker uses only information about the minimum guaranteed ranges of integral types, integer values for which the actual type of the values is unknown may arise. Integer values of undetermined type generally arise in one of two ways: through the use of integer literals and from API types which are not completely specified.

3.4.1. Integer literals

The ISO C rules on the type of integer literals are set out as follows. For each class of integer literals a list of types is given. The type of an integer literal is then the first type in the appropriate list which is large enough to contain the value of the integer literal. The class of the integer literal depends on whether it is decimal, hexadecimal or octal, and whether it is qualified by U (or u) or L (or l) or both. The rules may be summarised as follows:

  • decimal -> int or long or unsigned long

  • hex or octal -> int or unsigned int or long or unsigned long

  • any + U -> unsigned int or unsigned long

  • any + L -> long or unsigned long

  • any + UL -> unsigned long

These rules are applied in all the built-in checking modes except Xt. Traditional C does not have the U and L qualifiers, so if the Xt mode is used, these qualifiers are ignored and all integer literals are treated as int, long or unsigned long, depending on the size of the number.

If a number fits into the minimal range for the first type of the appropriate list, then it is of that type; otherwise its type is undetermined and is said to be target dependent. The checker treats target dependent types as abstract integral types which may lead to integer conversion problems. For example, in:

int f ( int n ) {
  return ( n & 0xff00 ) ;
}

the type of 0xff00 is target dependent, since it does not fit into the minimal range for int specified by the ISO C standard (this is detected by the integer overflow analysis described in Section 3.5, “Integer overflow checks”). The arithmetic conversions resulting from the & operation are detected by the checker's conversion analysis. Note that if the -Y32bit option is specified to tchk, an int is assumed to contain at least 32 bits. In this case, 0xff00 fits into the type int, and so this is the type of the integer literal. No invalid integer conversion is then detected.
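One way to give the mask a determined type without relying on -Y32bit is to qualify the literal with U and keep the whole operation unsigned. This is a sketch only; the function name second_byte is hypothetical.

```c
/* 0xff00U fits in the minimum range of unsigned int (0 to 65535),
   so by the list for U-qualified literals its type is unsigned int. */
unsigned int second_byte ( unsigned int n )
{
  return ( n & 0xff00U ) ;
}
```

Both operands of & now have type unsigned int, so no arithmetic conversion is needed.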

3.4.2. Abstract API types

Target dependent integral types also occur in API specifications and may be encountered when checking against one of the implementation-independent APIs provided with the checker. The commonest example of this is size_t, which is stated by the ISO C standard to be a target dependent unsigned integral type, and which arises naturally within the language as the type of a sizeof expression.

The checker has its own internal version of size_t, wchar_t and ptrdiff_t for evaluating static compile-time expressions. These internal types are compatible with the ISO C specification of size_t, wchar_t and ptrdiff_t, and thus are compatible with any conforming definitions of these types found in included files. However, when checking the following program against the system headers, a warning is produced on some machines concerning the implicit conversion of an unsigned int to type size_t:

#include <stdlib.h>
int main() {
  size_t size;
  size = sizeof(int);
}

The system header on the machine in question actually defines size_t to be a signed int (this of course contravenes the ISO C standard) but the compile time function sizeof returns the checker's internal version of size_t which is an abstract unsigned integral type. By using the pragma:

#pragma TenDRA set size_t:signed int

the checker can be instructed to use a different internal definition of size_t when evaluating the sizeof function and the error does not arise. Equivalent options are also available for the ptrdiff_t and wchar_t types.

3.5. Integer overflow checks

Given the complexity of the rules governing the types of integers and results of integer operations, as well as the variation of integral ranges with machine architecture, it is hardly surprising that unexpected results of integer operations are at the root of many programming problems. These problems can often be hard to track down and may suddenly appear in an application which was previously considered "safe", when it is moved to a new system. Since the checker supports the concept of a guaranteed minimum size of an integer it is able to detect many potential problems involving integer constants. The pragma:

#pragma TenDRA integer overflow analysis status

where status is on, warning or off, controls a set of checks on arithmetic expressions involving integer constants. These checks cover overflow, use of constants exceeding the minimum guaranteed size for their type and division by zero. They are not enabled in the default mode.
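As an illustration of the kind of expression these checks are aimed at, the product of the two constants below is 40000, which exceeds 32767, the largest value an int is guaranteed to hold. The sketch shows one way of keeping the calculation within guaranteed ranges; the names ROWS, COLUMNS and cell_count are hypothetical.

```c
#define ROWS 200
#define COLUMNS 200

/* ROWS * COLUMNS evaluated as int could overflow a 16 bit int; the
   cast makes the multiplication take place in long. */
long cell_count ( void )
{
  return ( ( long ) ROWS * COLUMNS );
}
```

Writing the constants as 200L would have the same effect.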

There are two special cases of integer overflow for which checking is controlled separately:

Bitfield sizes. Obviously, the size of a bitfield must be smaller than or equal to the minimum size of its integral type. A bitfield which is too large is flagged as an error in the default mode. The check on bitfield sizes is controlled by:

#pragma TenDRA bitfield overflow permit

where permit is one of allow, disallow or warning.

Octal and hexadecimal escape sequences. According to the ISO C standard, the value of an octal or hexadecimal escape sequence shall be in the range of representable values for the type unsigned char for an integer character constant, or the unsigned type corresponding to wchar_t for a wide character constant. The check on escape sequence sizes is controlled by:

#pragma TenDRA character escape overflow permit

where the options for permit are allow, warning and disallow. The check is switched on by default.
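The two special cases can be illustrated with a small sketch. It assumes the ISO C minimums (int at least 16 bits wide, unsigned char at least 8 bits wide); the struct and function names are invented for the example.

```c
/* A 16 bit bitfield fits the minimum guaranteed width of unsigned int.
   A declaration such as "unsigned int wide : 24;" would be accepted by
   compilers with a 32 bit int but exceeds that guaranteed minimum. */
struct flags {
    unsigned int narrow : 16;
};

/* Stores v in the bitfield; values are reduced modulo 2^16. */
unsigned int store_narrow(unsigned int v)
{
    struct flags f;
    f.narrow = v;
    return f.narrow;
}

/* '\377' (255) is the largest octal escape that is guaranteed to be
   representable in unsigned char. An escape such as '\x100' is out of
   range when unsigned char has 8 bits. */
int max_octal_escape(void)
{
    return (unsigned char)'\377';
}
```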

3.6. Integer operator checks

The results of some integer operations are undefined by the ISO C standard for certain argument types. Others are implementation-defined or simply likely to produce unexpected results. In the default mode such operations are processed silently; however, a set of checks on operations involving integer constants may be controlled using:

#pragma TenDRA integer operator analysis status

where status is replaced by on, warning or off. This pragma enables checks on:

  • shift operations where an expression is shifted by a negative number or by an amount greater than or equal to the width in bits of the expression being shifted;

  • right shift operation with a negative value of signed integral type as the first argument;

  • division operation with a negative operand;

  • test for an unsigned value strictly greater than or less than 0 (these are always true or false respectively);

  • conversion of a negative constant value to an unsigned type;

  • application of unary - operator to an unsigned value.
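Several of the operations listed above are well defined in ISO C but rarely do what was intended. The sketch below shows what they evaluate to on a conforming compiler. The function names are invented for illustration, and the functions take parameters for the sake of testing, whereas the checks described here apply to constant operands.

```c
#include <limits.h>

/* Unary minus on an unsigned value wraps modulo UINT_MAX + 1. */
unsigned int negate_unsigned(unsigned int u)
{
    return -u;
}

/* An unsigned value is never less than 0, so this always returns 0. */
int is_below_zero(unsigned int u)
{
    return u < 0;
}

/* Converting the negative constant -1 to unsigned yields UINT_MAX. */
unsigned int minus_one_as_unsigned(void)
{
    return (unsigned int)-1;
}

/* A shift is only defined for amounts in the range 0 .. width-1.
   Returning 0 for other amounts is a choice made for this sketch. */
unsigned int guarded_shift_left(unsigned int value, int amount)
{
    int width = (int)(sizeof(unsigned int) * CHAR_BIT);
    if (amount < 0 || amount >= width) {
        return 0u;
    }
    return value << amount;
}
```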

3.7. Support for 64 bit integer types (long long)

Although the use of long long to specify a 64 bit integer type is not supported by the ISO C standard, it is becoming increasingly popular in programming use. By default, tchk does not support the use of long long, but the checker can be configured to support the long long type to different degrees using the following pragmas:

#pragma TenDRA longlong type permit

where permit is one of allow (long long types are accepted), disallow (errors are produced when long long types are detected) or warning (long long types are accepted but a warning is raised).

#pragma TenDRA set longlong type : type_name

where type_name is long or long long.

The first pragma determines the behaviour of the checker if the type long long is encountered as a type specifier. In the disallow case, an error is raised and the type specifier is mapped to long. Otherwise the type is stored as long long, although in the warning mode a message alerting the user to the use of long long is raised. The second pragma determines the semantics of long long. If the type specified is long long, then long long is treated as a separate integer type and, if code generation is enabled, long long types appear in the output. Otherwise the type is mapped to long and all objects declared long long are output as if they had been declared long (a warning is produced when this occurs). In either case, long long is treated as a distinct integer type for the purpose of integer conversion checking.

Extensions to the integer promotion and arithmetic conversion rules are required for the long long type. These have been implemented as follows:

  • the types of integer arithmetic operations where neither argument has long long type are unaffected;

  • long long and unsigned long long both promote to themselves;

  • the result type of arithmetic operations with one or more arguments of type unsigned long long is unsigned long long;

  • otherwise if either argument has type signed long long the overall type is long long if both arguments can be represented in this form, otherwise the type is unsigned long long.

There are now three cases where the type of an integer arithmetic operation is not completely determined from the type of its arguments, i.e.

  1. signed long long + unsigned long = signed long long or unsigned long long;

  2. signed long long + unsigned int = signed long long or unsigned long long;

  3. signed int + unsigned short = signed int or unsigned int ( as before ).

In these cases, the type of the operation is represented using an abstract integral type as described in Section 3.1, “Integer promotion rules”.
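The first of these cases can be observed on a compiler that supports long long (for example, one implementing C99, whose conversion rules agree with the rule given above). The sketch below is an illustration only: the function names are invented, and it probes the compiler's result type rather than the checker's abstract type.

```c
#include <limits.h>

/* Returns 1 if (signed long long + unsigned long) has a signed result
   type on this machine, 0 if the result type is unsigned.
   -1 stays negative in a signed result but becomes a large positive
   value in an unsigned one. */
int mixed_sum_is_signed(void)
{
    return (-1LL + 0UL) < 0;
}

/* Returns 1 if long long can represent every unsigned long value. */
int long_long_holds_unsigned_long(void)
{
    return ULONG_MAX <= LLONG_MAX;
}
```

On a machine where long is 32 bits wide the two functions would both return 1; where long and long long are both 64 bits wide they would both return 0. This dependence on the machine is why the type cannot be fixed in advance.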

Chapter 4. Data Flow and Variable Analysis

The checker has a number of features which can be used to help track down potential programming errors relating to the use of variables within a source file and the flow of control through the program. Examples of this are detecting sections of unused code, flagging expressions that depend upon the order of evaluation where the order is not defined, checking for unused static variables, etc.

4.1. Unreachable code analysis

Consider the following function definition:

int f ( int n )
{
  if ( n ) {
    return ( 1 );
  } else {
    return ( 0 );
  }
  return ( 2 );
}

The final return statement is redundant since it can never be reached. The test for unreachable code is controlled by:

#pragma TenDRA unreachable code permit

where permit is replaced by disallow to give an error if unreached code is detected, warning to give a warning, or allow to disable the test (this is the default).

There are also equivalent command-line options to tchk of the form -X:unreached=state, where state can be check, warn or dont.

Annotations to the code in the form of user-defined keywords may be used to indicate that a certain statement is genuinely reached or unreached. These keywords are introduced using:

#pragma TenDRA keyword REACHED for set reachable
#pragma TenDRA keyword UNREACHED for set unreachable

The statement REACHED then indicates that this portion of the program is actually reachable, whereas UNREACHED indicates that it is unreachable. For example, one way of fixing the program above might be to say that the final return is reachable (this is a blatant lie, but never mind). This would be done as follows:

int f ( int n ) {
  if ( n ) {
    return ( 1 );
  } else {
    return ( 0 );
  }
  REACHED
  return ( 2 );
}

An example of the use of UNREACHED might be in the function below which falls out of the bottom without a return statement. We might know that, because it is never called with c equal to zero, the end of the function is never reached. This could be indicated as follows:

int f ( int c ){
  if ( c ) return ( 1 );
  UNREACHED
}

As always, if new keywords are introduced into a program then definitions need to be provided for conventional compilers. In this case, this can be done as follows:

#ifdef __TenDRA__
#pragma TenDRA keyword REACHED for set reachable
#pragma TenDRA keyword UNREACHED for set unreachable
#else
#define REACHED
#define UNREACHED
#endif

4.2. Case fall through

Another flow analysis check concerns fall through in case statements. For example, in:

void f ( int n )
{
  switch ( n ) {
    case 1 : puts ( "one" );
    case 2 : puts ( "two" );
  }
}

the control falls through from the first case to the second. This may be due to an error in the program (a missing break statement), or be deliberate. Even in the latter case, the code is not particularly maintainable as it stands - there is always the risk when adding a new case that it will interrupt this carefully contrived flow. Thus it is customary to comment all case fall throughs to serve as a warning.

In the default mode, the TenDRA C checker ignores all such fall throughs. A check to detect fall through in case statements is controlled by:

#pragma TenDRA fall into case permit

where permit is allow (no errors), warning (warn about case fall through) or disallow (raise errors for case fall through).

There are also equivalent command-line options to tcc of the form -X:fall_thru=state, where state can be check, warn or dont.

Deliberate case fall throughs can be indicated by means of a keyword, which has been introduced using:

#pragma TenDRA keyword FALL_THROUGH for fall into case

Then, if the example above were deliberate, this could be indicated by:

void f ( int n ){
  switch ( n ) {
    case 1 : puts ( "one" );
    FALL_THROUGH
    case 2 : puts ( "two" );
  }
}

Note

FALL_THROUGH is inserted between the two cases, rather than at the end of the list of statements following the first case.

If a keyword is introduced in this way, then an alternative definition needs to be introduced for conventional compilers. This might be done as follows:

#ifdef __TenDRA__
#pragma TenDRA keyword FALL_THROUGH for fall into case
#else
#define FALL_THROUGH
#endif
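Putting the pieces together, a complete translation unit might look like the sketch below. The function labels_run is invented for this illustration; it counts how many case bodies execute, so that the fall through can be observed with a conventional compiler, where FALL_THROUGH expands to nothing.

```c
#ifdef __TenDRA__
#pragma TenDRA keyword FALL_THROUGH for fall into case
#else
#define FALL_THROUGH
#endif

/* Returns the number of case bodies executed for n. */
int labels_run(int n)
{
    int count = 0;
    switch (n) {
    case 1:
        count++;
    FALL_THROUGH
    case 2:
        count++;
    }
    return count;
}
```

Calling labels_run(1) executes both case bodies, labels_run(2) only the second, and any other argument executes neither.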

4.3. Unusual flow in conditional statements

The following three checks are designed to detect possible errors in conditional statements.

4.3.1. Empty if statements

Consider the following C statements:

if( var1 == 1 ) ;
var2 = 0 ;

The conditional statement serves no purpose here and the second statement will always be executed regardless of the value of var1. This is almost certainly not what the programmer intended to write. A test for if statements with no body is controlled by:

#pragma TenDRA extra ; after conditional permit

with the usual allow (this is the default setting), warning and disallow options for permit.
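The effect of the stray semicolon can be seen by comparing two versions of the same code. This is a small sketch; the function names are invented for illustration.

```c
/* The semicolon ends the if statement, so var2 is always cleared. */
int with_stray_semicolon(int var1, int var2)
{
    if (var1 == 1) ;
    var2 = 0;
    return var2;
}

/* Probable intention: clear var2 only when var1 is 1. */
int without_stray_semicolon(int var1, int var2)
{
    if (var1 == 1)
        var2 = 0;
    return var2;
}
```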

4.3.2. Use of assignments as control expressions

Using the C assignment operator, `=', when the equality operator `==' was intended is an extremely common problem. The pragma:

#pragma TenDRA assignment as bool permit

is used to control the treatment of assignments used as the controlling expression of a conditional statement or a loop, e.g.

if( var = 1 ) { ...

The options for permit are allow, warning and disallow. The default setting allows assignments to be used as control expressions without raising an error.
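As a brief illustration of why this mistake matters, the sketch below contrasts the two forms. The function names are invented; the first function returns 1 for every argument because the assigned value 1 is non-zero.

```c
/* Assignment: var becomes 1, and the condition is always true. */
int test_with_assignment(int var)
{
    if (var = 1) {
        return 1;
    }
    return 0;
}

/* Comparison: true only when var is already 1. */
int test_with_comparison(int var)
{
    if (var == 1) {
        return 1;
    }
    return 0;
}
```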

4.3.3. Constant control expressions

Statements with constant control expressions are not really conditional at all since the value of the control statement can be evaluated statically. Although this feature is sometimes used in loops, relying on a break, goto or return statement to end the loop, it may be useful to detect all constant control expressions to check that they are deliberate. The check for statically constant control expressions is controlled using:

#pragma TenDRA const conditional permit

where permit may be replaced by disallow to give an error when constant control expressions are encountered, warning to replace the error by a warning, or allow to switch the check off (this is the default).

4.4. Operator precedence

Section 6.3 of the ISO C standard provides a set of rules governing the order in which operators within expressions should be applied. These rules are said to specify the operator precedence and are summarised in Table 4.1. Operators on the same line have the same precedence and the rows are in order of decreasing precedence. Note that the unary +, -, * and & operators have higher precedence than the binary forms and thus appear higher in the table.

The precedence of operators is not always intuitive and often leads to unexpected results when expressions are evaluated. A particularly common example is to write:

if ( var & TEST == 1) { ...
}
else { ...

assuming that the control expression will be evaluated as:

( ( var & TEST ) == 1 )

However, the == operator has a higher precedence than the bitwise & operator and the control expression is evaluated as:

( var & ( TEST == 1 ) )

which in general will give a different result.
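The difference can be checked with a concrete value. The sketch below assumes, purely for illustration, that TEST is the two-bit mask 0x3; the function names are likewise invented.

```c
#define TEST 0x3  /* hypothetical mask chosen for this example */

/* Parsed as var & (TEST == 1); with TEST == 3 this is var & 0. */
int mask_as_written(int var)
{
    return (var & TEST == 1) != 0;
}

/* Explicit parentheses: are the low two bits of var equal to 1? */
int mask_as_intended(int var)
{
    return (var & TEST) == 1;
}
```

For var equal to 5 the intended test succeeds, since 5 & 3 is 1, while the expression as written yields 0.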

Table 4.1. ISO C Rules for Operator Precedence

Operators                                            Precedence
function call () [] -> . ++(postfix) --(postfix)     highest
! ~ ++ -- + - * & (type) sizeof
* / %
+(binary) -(binary)
<< >>
< <= > >=
== !=
&
^
|
&&
||
?:
= += -= *= /= %= &= ^= |= <<= >>=
,                                                    lowest

The TenDRA C checker can be configured to flag expressions containing certain operators whose precedence is commonly confused, namely:

  • && versus ||

  • << and >> versus + and -

  • & versus == != < > <= >= + and -

  • ^ versus & == != < > <= >= + and -

  • | versus ^ & == != < > <= >= + and -

The check is switched off by default and is controlled using:

#pragma TenDRA operator precedence status

where status is on, warning or off.

4.5. Variable analysis

The variable analysis checks are controlled by:

#pragma TenDRA variable analysis status

where status is on, warning or off as usual. The checks are switched off in the default mode.

There are also equivalent command line options to tchk of the form -X:variable=state, where state can be check, warn or dont.

The variable analysis is concerned with the evaluation of expressions and the use of local variables, including function arguments. Occasionally it may not be possible to statically perform a full analysis on an expression or variable and in these cases the messages produced indicate that there may be a problem. If a full analysis is possible a definite error or warning is produced. The individual checks are listed in sections 5.6.1 to 5.6.6 and section 5.7 describes the source annotations which can be used to fine-tune the variable analysis.

4.5.1. Order of evaluation

The ISO C standard specifies certain points in the expression syntax at which all prior expressions encountered are guaranteed to have been evaluated. These positions are called sequence points and occur:

  • after the arguments and function expression of a function call have been evaluated but before the call itself;

  • after the first operand of a logical && or || operator;

  • after the first operand of the conditional operator, ?:;

  • after the first operand of the comma operator;

  • at the end of any full expression (a full expression may take one of the following forms: an initialiser; the expression in an expression statement; the controlling expression in an if, while, do or switch statement; each of the three optional expressions of a for statement; or the optional expression of a return statement).

Between two sequence points however, the order in which the operands of an operator are evaluated, and the order in which side effects take place is unspecified - any order which conforms to the operator precedence rules above is permitted. For example:

var = i + arr[ i++ ] ;

may evaluate to different values on different machines, depending on which argument of the + operator is evaluated first. The checker can detect expressions which depend on the order of evaluation of sub-expressions between sequence points and these are flagged as errors or warnings when the variable analysis is enabled.
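One way of removing the dependence is to split the expression so that the increment happens in a separate statement. The sketch below assumes the intended meaning was to use the old value of i on both sides of the + operator; that is an assumption about the programmer's intent, and the function name is invented for illustration.

```c
/* Adds the current index to the element at that index, then advances
   the index. Each step is its own full expression, so the result does
   not depend on the order of evaluation. */
int add_index_then_advance(const int *arr, int *i)
{
    int index = *i;
    int var = index + arr[index];
    *i = index + 1;
    return var;
}
```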

4.5.2. Modification between sequence points

The ISO C standard states that if an object is modified more than once, or is modified and accessed other than to determine the new value, between two sequence points, then the behaviour is undefined. Thus the result of:

var = arr[i++] + i++ ;

is undefined, since the value of i is being incremented twice between sequence point