calculus User's Guide

The TenDRA Documentation Team

$TenDRA: book.xml 2447 2006-03-23 21:15:51Z verm $

Extensions to this document from the original TenDRA-4.1.2-doc.tar.gz source distribution are covered by the BSDL, while all prior modifications remain under the Crown Copyright.

Berkeley Systems Design License

Redistribution and use in source (SGML DocBook) and 'compiled' forms (SGML, HTML, PDF, PostScript, RTF and so forth) with or without modification, are permitted provided that the following conditions are met:

  1. Redistributions of source code (SGML DocBook) must retain the above copyright notice, this list of conditions and the following disclaimer as the first lines of this file unmodified.

  2. Redistributions in compiled form (transformed to other DTDs, converted to PDF, PostScript, RTF and other formats) must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

Important

THIS DOCUMENTATION IS PROVIDED BY THE TENDRA DOCUMENTATION TEAM "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE TENDRA DOCUMENTATION TEAM BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS DOCUMENTATION, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Crown Copyright (c) 1997, 1998

This TenDRA(r) Computer Program is subject to Copyright owned by the United Kingdom Secretary of State for Defence acting through the Defence Evaluation and Research Agency (DERA). It is made available to Recipients with a royalty-free licence for its use, reproduction, transfer to other parties and amendment for any purpose not excluding product development provided that any such use et cetera shall be deemed to be acceptance of the following conditions:

  1. Its Recipients shall ensure that this Notice is reproduced upon any copies or amended versions of it;

  2. Any amended version of it shall be clearly marked to show both the nature of and the organisation responsible for the relevant amendment or amendments;

  3. Its onward transfer from a recipient to another party shall be deemed to be that party's acceptance of these conditions;

  4. DERA gives no warranty or assurance as to its quality or suitability for any purpose and DERA accepts no liability whatsoever in relation to any use to which it may be put.

This document was generated on 2006-09-07 02:34:20

Abstract

Please email us at if you see any errors


Table of Contents

Introduction
1. Input syntax
1.1. Primitives
1.2. Identities
1.3. Enumerations
1.4. Structures
1.5. Unions
1.6. Type constructors
1.7. Relations between algebras
2. Output type system specification
2.1. Version information
2.2. Basic types
2.3. Operations on sizes
2.4. Operations on pointers
2.4.1. Simple pointer constructs
2.4.2. Unique pointers
2.4.3. Pointer construction and destruction
2.4.4. Assignment and dereference constructs
2.5. Operations on lists
2.5.1. Simple list constructs
2.5.2. Unique lists
2.5.3. List construction and destruction
2.6. Operations on stacks
2.7. Operations on vectors
2.8. Operations on vector pointers
2.9. Operations on primitives
2.10. Operations on identities
2.11. Operations on enumerations
2.12. Operations on structures
2.13. Operations on unions
2.13.1. Union construction operations
2.13.2. Union field sets
2.13.3. Inheritance in unions
2.13.4. Union maps
3. Implementation details
3.1. Implementation of types
3.2. Support routines
3.3. Run-time checking
4. Disk reading and writing
4.1. Disk writing routines
4.2. Disk reading routines
4.3. Object printing routines
4.4. Aliasing
4.5. Application to calculus
5. Template files
6. Revision History

List of Figures

3.1. Packing of Types

Introduction

This document describes a tool, calculus, which allows complex type systems to be described in a simple algebraic format, and transforms this into a system of C types which implements this algebra.

calculus was initially written as a design tool for use with the TenDRA C and C++ producers. The producers' internal datatypes reflect the highly complex interdependencies between the C and C++ language concepts (types, expressions and so on) which they were designed to represent. A tool for managing this complexity and allowing changes to the internal datatypes to be made with confidence was therefore seen to be an absolute necessity. The tool can also be applied in similar situations where a complex type system needs to be described.

The tool also provides for a separation between the specification of the type system and its implementation in terms of actual C types. This separation has a number of advantages. The advantages of maintaining a level of abstraction in the specification are obvious. It is possible to apply extremely rigorous type checking to any program written using the tool, thereby allowing many potential errors to be detected at compile time. It is also possible to restrict access to the types to certain well-specified constructs, thereby increasing the maintainability of code written using the tool.

This separation also has beneficial effects on the implementation of the type system. By leaving all the type checking aspects at the specification level, it is possible to impose an extremely homogeneous underlying implementation. This means, for example, that a single memory management system can be applied to all the types within the system. It also opens the possibility of writing operations which apply to all types within the system in a straightforward manner. The Chapter 4, Disk reading and writing disk reading and writing routines described below are an example of this.

Using calculus

The general form for invoking calculus is as

calculus [ options ] input .... [ output ]
     

where input is a file containing the input type system description and output is the directory where the files implementing this system are to be output. This directory must already exist. If no output argument is given then the current working directory is assumed. Note that several input files can be given. Unless Section 1.7, “Relations between algebras” otherwise specified it is the last file which is used to generate the output.

The form in which the Chapter 1, Input syntax input type systems are expressed is described below. The form of the output depends on options. By default, the C implementation of the type system is output. If the -t option is passed to calculus then l#pragma token statements describing the type system specification are output. This Chapter 2, Output type system specification specification is given below, with some of the more important Chapter 3, Implementation details implementation details being described in the following section.

Note that it is necessary to check any program written using calculus against the #pragma token version of the specification in order to get the full benefits of the rigorous type checking provided by the tool. Some sections of the program (if only the Section 3.2, “Support routines” basic functions) may be dependent upon the implementation, and so not be suitable for this form of checking.

Example program

The program calculus itself was written using a type system specified using the calculus tool. It is designed to provide an example of its own application, with some features not strictly necessary for the functionality of the program being added for illustrative purposes.

Chapter 1. Input syntax

algebra :
  ALGEBRA identifier versionopt : item-listopt

version :
  ( integer . integer )

item-list :
  item
  item-list item

item :
  primitive
  identity
  enumeration
  structure
  union
   

The initial identifier gives the overall name of the algebra. A version number may also be associated with the algebra (if this is omitted the version is assumed to be 1.0). The main body of the algebra definition consists of a list of items describing the primitives, the identities, the enumerations, the structures and the unions comprising the algebra.

Here identifier has the same meaning as in C. The only other significant lexical units are integer, which consists of a sequence of decimal digits, and string, which consists of any number of characters enclosed in double quotes. There are no escape sequences in strings. C style comments may be used anywhere in the input. White space is not significant.

1.1. Primitives

Primitives form the basic components from which the other types in

primitive :
  object-identifier = quoted-type ;
     

where the primitive identifier is given by:

object-identifier :
  #opt :opt identifier
  #opt :opt identifier ( identifier )
     

and the primitive definition is a string which gives the C type

quoted-type :
string
     

Note

Each primitive (and also each identity, each enumeration, each structure and each union) has two names associated with it. The second name is optional; if it is not given then it is assumed to be the same as the first name. The first name is that which will be given to the corresponding type in the output file. The second is a short form of this name which will be used in forming constructor names etc. in the output.

The optional hash and colon which may be used to qualify an object identifier are provided for backwards compatibility only and are not used in the output routines.

1.2. Identities

Identities are used to associate a name with a particular type in the algebra. In this they correspond to typedefs in C. They are described as follows:

identity :
  object-identifier = type ;
     

where the definition type, type, is as

1.3. Enumerations

Enumerations are used to define types which can only take values

enumeration :
  enum !opt object-identifier = { enumerator-list } ;
  enum !opt object-identifier = base-enumeration + {
enumerator-list } ;

     

where:

base-enumeration :
  identifier
     

is the name of a previously defined enumeration type. The latter form is used to express extension enumeration types. An enumeration type may be qualified by an exclamation mark to indicate that no lists of this type will be constructed.

The enumeration constants themselves are defined as follows:

enumerator :
  identifier
  identifier = enumerator-value

enumerator-list :
  enumerator
  enumerator-list , enumerator
     

Each enumerator is assigned a value in an ascending sequence, starting at zero. The next value to be assigned can be set using an enumerator-value. This is an expression formed from integers, identifiers representing previous enumerators from the same enumeration, and the question mark character which stands for the previous enumeration value. The normal C arithmetic operations can be applied to build up more complex enumerator-values. All enumerator evaluation is done in the unsigned long type of the host machine. Values containing more than 32 bits are not portable.

Enumerations thus correspond to enumeration types in C, except that

1.4. Structures

Structures are used to build up composite types from other types in the algebra. They correspond to structures in C. They are described as follows:

structure :
  struct object-identifier = component-group ;
  struct object-identifier = base-structure + component-group ;
     

where:

base-structure :
identifier
     

is the name of a previously defined structure type. The latter form is used to express (single) inheritance of structures. All components of the base structure also become components of the derived structure.

The structure components themselves are defined as follows:

component-group :
  { component-listopt }

component-list :
  component-declarations ;
  component-list component-declarations ;

component-declarations :
  type component-declarators

component-declarators :
  component-declarator
  component-declarators , component-declarator

component-declarator :
  identifier component-initialiseropt

component-initialiser :
  = string
     

The optional Section 2.12, “Operations on structures” component initialiser strings are

Structures are the only algebra construct which prevent the input from being a general graph. Unions may be defined in terms of themselves, but (as in C) pointers must be used to define structures in terms of themselves.

1.5. Unions

Unions are used to build up types which can hold a variety of information. They differ from C unions in that they are discriminated. They are described as follows:

union :
  union object-identifier = component-group + field-group map-group opt ;
  union object-identifier = base-union + field-group map-group opt ;
     

where:

base-union :
identifier
     

is the name of a previously defined union type. The latter form is used to express (single) inheritance of unions. All components, fields and maps of the base union also become components of the derived union. Note that only new fields and maps can be added in the derived union.

The component-group gives a set of components which are common to all the different union cases. The cases themselves are described as follows:

field-group :
{ field-list }

field :
  #opt #opt field-identifier-list -> component-group
  #opt #opt field-identifier-list -> base-field + component-group

base-field :
  identifier

field-list :
  field
  field-list , field

field-identifier :
  identifier

field-identifier-list :
  field-identifier
  field-identifier-list , field-identifier
     

The optional one or two hashes which may be used to qualify a list of field identifiers are used to indicate Section 4.4, “Aliasing” aliasing in the disk reading and writing routines. The base-field case is a notational convenience which allows one field in a union to inherit all the components of another field.

Note

A number of field identifiers may be associated with the same set of field components. Any such list containing more than one identifier forms a field identifier set, named after the first field identifier.

In addition a number of maps may be associated with a union. These maps correspond to functions which take the union, plus a number of other map parameter types, and return the map return type. They are described as follows:

map-group :
  : [ map-listopt ]

map :
  extended-type #opt identifier ( parameter-listopt )

map-list :
  map
  map-list map
     

where:

parameter-list :
  parameter-declarations
  parameter-list ; parameter-declarations

parameter-declarations :
  extended-type parameter-declarators

parameter-declarators :
  identifier
  parameter-declarators , identifier
     

Note that the map parameter and return types are given by:

extended-type :
  type
  quoted-type
     

In addition to the Section 1.6, “Type constructors” types derived from the algebra

A map may be qualified by means of a hash. This means that the associated function also takes a Section 2.13.4, “Union maps” destructor function as a parameter.

1.6. Type constructors

The types derived from the algebra may be described as follows:

type :
  identifier
  PTR type
  LIST type
  STACK type
  VEC type
  VEC_PTR type
     

The simple types correspond to primitive, identity, enumeration, structure or union names. It is possible for a type to be used before it is defined, but it must be defined at some point.

The derived type constructors correspond to pointers, lists, stacks, vectors and pointers into vectors. They may be used to build up further types from the basic algebra types.

1.7. Relations between algebras

As the section called “Using calculus” mentioned above, more than one input algebra may be specified to calculus. Each is processed separately, and output is generated for only one. By default this is the last algebra processed, however a specific algebra can be specified using the command-line option -Aname, where name is the name of the algebra to be used for output.

Types may be imported from one algebra to another by means of

import :
  IMPORT identifier ;
  IMPORT identifier :: identifier ;
     

which fit into the main syntax as an item. The first form imports all the types from the algebra given by identifier into the current algebra. The second imports a single type, given by the second identifier from the algebra given by the first identifier.

Note that importing a type in this way also imports all the types used in its construction. This includes such things as structure components and union fields and maps. Thus an algebra consisting just of import commands can be used to express subalgebras in a simple fashion.

Chapter 2. Output type system specification

In this section we document the basic output of calculus. Two forms of the output can be generated - a description of the output specification in terms of the TenDRA #pragma token constructs, and the actual C code which implements these tokens.

In this section the description given will be at the level of the output specification. The more important details of the Chapter 3, Implementation details C implementation are given in the following section.

The output is split among several header files in the specified output directory. The main output is printed into name.h, where name is the overall algebra name. Unless otherwise stated, all the objects specified below are to be found in name.h. However for each union, union, in the algebra certain information associated with the union is printed into union_ops.h. If the union has any maps associated with it then further output is printed to union_map.h and union_hdr.h.

2.1. Version information

Certain basic information about the input algebra is included in name.h. name_NAME is a string literal giving the overall algebra name. name_VERSION is a string literal giving the algebra version number. name_SPECIFICATION and name_IMPLEMENTATION are flags which take the values 0 or 1 depending on whether the specification of the type system in terms of #pragma token statements or the C implementation is included.

2.2. Basic types

Six abstract type operators, each taking a type as argument and

TYPE PTR ( TYPE t );
TYPE LIST ( TYPE t ) ;
TYPE STACK ( TYPE t ) ;
TYPE VEC ( TYPE t ) ;
TYPE VEC_PTR ( TYPE t ) ;
TYPE SIZE ( TYPE t ) ;
     

These represent a pointer to an object of type t, a list of objects of type t, a stack of objects of type t, a vector of objects of type t, a pointer into a vector of objects of type t, and an allocation block size for type t respectively. (It is possible to suppress all constructs involving VEC or VEC_PTR by passing the -x command-line option to calculus. Similarly STACK constructs may be suppressed using -z.)

These constructors can be applied to any type, although in practice they are only applied to the types specified in the algebra and those derived from them. For example, we may form the type:

LIST ( PTR ( int ) )
     

representing a list of pointers to int.

INT_TYPE name_dim ;
     

is specified, where name is the overall

A function pointer type:

typedef void ( *DESTROYER ) () ;
     

is specified, which represents a destructor function. Two

void destroy_name () ;
void dummy_destroy_name () ;
     

where name is as above. destroy_name is the default destructor, whereas dummy_destroy_name is a dummy destructor which has no effect. The details of the arguments passed to the destructors and so on are implementation dependent.

2.3. Operations on sizes

The SIZE type constructor is used to represent a multiple of the size of a type. It is used, for example, in the Section 2.4, “Operations on pointers” pointer stepping construct STEP_ptr to specify the number of units the pointer is to be increased by. Having a separate abstract type for the size of each type greatly increases the scope for type checking of memory allocation, and other, functions.

For each basic type in the algebra (a primitive, a structure or a

SIZE ( t ) SIZE_type ;
     

where t denotes the type itself, and

For the five other type constructors Section 2.2, “Basic types” described

SIZE ( PTR ( t ) ) SIZE_ptr ( TYPE t ) ;
SIZE ( LIST ( t ) ) SIZE_list ( TYPE t ) ;
SIZE ( STACK ( t ) ) SIZE_stack ( TYPE t ) ;
SIZE ( VEC ( t ) ) SIZE_vec ( TYPE t ) ;
SIZE ( VEC_PTR ( t ) ) SIZE_vec_ptr ( TYPE t ) ;
     

for any type t.

These constructs allow the size of any type derived from the algebra to be built up. There is also a construct which corresponds to multiplying the size of a type by a constant. This takes the form:

SIZE ( t ) SCALE ( SIZE ( t ), INT_TYPE itype ) ;
     

for any type t and any integral type

2.4. Operations on pointers

The PTR type constructor is used to represent a pointer to an object of type t. It should be emphasised that this construct is not in general implemented by means of the C pointer construct.

2.4.1. Simple pointer constructs

There are several simple operations on pointers, given as

PTR ( t ) NULL_ptr ( TYPE t ) ;
int IS_NULL_ptr ( PTR ( t ) ) ;
int EQ_ptr ( PTR ( t ), PTR ( t ) ) ;
PTR ( t ) STEP_ptr ( PTR ( t ), SIZE ( t ) ) ;
       

The construct NULL_ptr is used to form the null pointer to t for a type t. This is a constant expression. IS_NULL_ptr can be used to test whether a given pointer expression is a null pointer. Similarly EQ_ptr checks whether two pointers are equal (note that we can only compare pointers to the same type). Finally STEP_ptr can be used to add a scalar value to a pointer.

2.4.2. Unique pointers

There are also constructs for generating and destroying unique

PTR ( t ) UNIQ_ptr ( TYPE t ) ;
void DESTROY_UNIQ_ptr ( PTR ( t ) ) ;
       

A unique pointer is guaranteed to be different from all other undestroyed pointers. Dereferencing a unique pointer is undefined.

2.4.3. Pointer construction and destruction

The constructs:

PTR ( t ) MAKE_ptr ( SIZE ( t ) ) ;
void DESTROY_ptr ( PTR ( t ), SIZE ( t ) ) ;
       

are used to respectively create and destroy a pointer to a given

2.4.4. Assignment and dereference constructs

The constructs for assigning and dereferencing pointers have one of two forms depending on the complexity of the type pointed to. For simple types, including primitive, enumeration and union types, they have the form:

void COPY_type ( PTR ( t ), t ) ;
t DEREF_type ( PTR ( t ) ) ;
       

where t is the type involved and

For more complex types, including structures, the assignment or dereference cannot be done in a single expression, so the constructs are specified to be statements as follows:

STATEMENT COPY_type ( PTR ( t ), t ) ;
STATEMENT DEREF_type ( PTR ( t ), lvalue t ) ;
       

Here the lvalue type qualifier is used to indicate that the statement argument is an assignable lvalue. In this case it should give the location of an object of type t into which the pointer will be dereferenced.

The appropriate assignment and dereference constructs are generated for each of the basic algebra types (primitives, enumerations, structures and unions). In addition there are generic assignment and dereference constructs for pointer types, list types, stack types, vector types and vector pointer types. The first three are of the first form above, whereas the second two have the second form, as follows:

void COPY_ptr ( PTR ( PTR ( t ) ), PTR ( t ) ) ;
PTR ( t ) DEREF_ptr ( PTR ( PTR ( t ) ) ) ;
void COPY_list ( PTR ( LIST ( t ) ), LIST ( t ) ) ;
LIST ( t ) DEREF_list ( PTR ( LIST ( t ) ) ) ;
void COPY_stack ( PTR ( STACK ( t ) ), STACK ( t ) ) ;
STACK ( t ) DEREF_stack ( PTR ( STACK ( t ) ) ) ;
STATEMENT COPY_vec ( PTR ( VEC ( t ) ), VEC ( t ) ) ;
STATEMENT DEREF_vec ( PTR ( VEC ( t ) ), lvalue VEC ( t ) ) ;
STATEMENT COPY_vec_ptr ( PTR ( VEC_PTR ( t ) ), VEC_PTR ( t ) ) ;
STATEMENT DEREF_vec_ptr ( PTR ( VEC_PTR ( t ) ), lvalue VEC_PTR ( t ) ) ;
       

2.5. Operations on lists

The LIST type constructor is used to represent a

2.5.1. Simple list constructs

There are several simple list constructs:

LIST ( t ) NULL_list ( TYPE t ) ;
int IS_NULL_list ( LIST ( t ) ) ;
int EQ_list ( LIST ( t ), LIST ( t ) ) ;
unsigned LENGTH_list ( LIST ( t ) ) ;
PTR ( t ) HEAD_list ( LIST ( t ) ) ;
LIST ( t ) TAIL_list ( LIST ( t ) ) ;
PTR ( LIST ( t ) ) PTR_TAIL_list ( LIST ( t ) ) ;
LIST ( t ) END_list ( LIST ( t ) ) ;
LIST ( t ) REVERSE_list ( LIST ( t ) ) ;
LIST ( t ) APPEND_list ( LIST ( t ), LIST ( t ) ) ;
       

Empty lists may be constructed or tested for. NULL_list is a constant expression. Two lists may be checked for equality (note that this is equality of lists, rather than element-wise equality). The number of elements in a list can be found. The head or tail (or car and cdr) of a list may be formed. The end element of a list may be selected. The order of the elements in a list can be reversed. One list can be appended to another.

2.5.2. Unique lists

There are also constructs for generating and destroying unique

LIST ( t ) UNIQ_list ( TYPE t ) ;
void DESTROY_UNIQ_list ( LIST ( t ) ) ;
       

A unique lists is guaranteed to be different from all other undestroyed lists. Taking the head or tail of a unique list is undefined.

2.5.3. List construction and destruction

For each type t there are operations for constructing and deconstructing lists. For the basic types comprising the algebra these take the form:

STATEMENT CONS_type ( t, LIST ( t ), lvalue LIST ( t ) ) ;
STATEMENT UN_CONS_type ( lvalue t, lvalue LIST ( t ), LIST ( t ) ) ;
STATEMENT DESTROY_CONS_type ( DESTROYER, lvalue t, lvalue LIST ( t ), LIST ( t ) ) ;
       

where type is the short type name.

The operation CONS_type is used to build a list whose head is the first argument and whose tail is the second argument. This is assigned to the third argument. UN_CONS_type reverses this process, decomposing a list into its head and its tail. DESTROY_CONS_type is similar, but it also applies the given destructor function to the decomposed list component.

There are also generic list construction and deconstruction operations which apply to lists of pointers (with a ptr suffix), lists of lists (with a list suffix), lists of stacks (with a stack suffix), lists of vectors (with a vec suffix) and lists of pointers to vectors (with a vec_ptr suffix). For example, for lists of pointers these have the form:

STATEMENT CONS_ptr ( PTR ( t ), LIST ( PTR ( t ) ), lvalue LIST ( PTR ( t ) ) ) ;
STATEMENT UN_CONS_ptr ( lvalue PTR ( t ), lvalue LIST ( PTR ( t ) ), LIST ( PTR ( t ) ) ) ;
STATEMENT DESTROY_CONS_ptr ( DESTROYER, lvalue PTR ( t ), lvalue LIST ( PTR ( t ) ), LIST ( PTR ( t ) ) ) ;
       

There is also a generic list destruction construct:

STATEMENT DESTROY_list ( LIST ( t ), SIZE ( t ) ) ;
       

which may be used to destroy all the elements in a list.

2.6. Operations on stacks

The STACK type constructor is used to represent stacks of objects of type t. Empty stacks can be created and tested for:

STACK ( t ) NULL_stack ( TYPE t ) ;
int IS_NULL_stack ( STACK ( t ) ) ;
     

For each type t there are operations for pushing objects onto a stack and for popping elements off. For the basic types comprising the algebra these take the form:

STATEMENT PUSH_type ( t, lvalue STACK ( t ) ) ;
STATEMENT POP_type ( lvalue t, lvalue STACK ( t ) ) ;
     

where type is the short type name. There are also generic constructs, such as PUSH_ptr for pushing any pointer type, similar to the Section 2.5, “Operations on lists” generic list constructors above.

Stacks are in fact just modified lists with pushing corresponding adding an element to the start of a list, and popping to removing the first element. There are constructs:

LIST ( t ) LIST_stack ( STACK ( t ) ) ;
STACK ( t ) STACK_list ( LIST ( t ) ) ;
     

for explicitly converting between lists and stacks.

2.7. Operations on vectors

The VEC type constructor is used to represent vectors of objects of type t. There are a number of simple operations on vectors:

VEC ( t ) NULL_vec ( TYPE t ) ;
name_dim DIM_vec ( VEC ( t ) ) ;
name_dim DIM_ptr_vec ( PTR ( VEC ( t ) ) ) ;
PTR ( t ) PTR_ptr_vec ( PTR ( VEC ( t ) ) ) ;
     

An empty vector can be constructed (note that, unlike null pointers and null lists, this is not a constant expression). The number of elements in a vector (or in a vector given by a pointer) can be determined (note how the type name_dim is used to represent vector sizes). A pointer to a vector can be transformed into a pointer to the elements of the vector.

In general a vector may be created or destroyed using the

STATEMENT MAKE_vec ( SIZE ( t ), name_dim, lvalue VEC ( t ) ) ;
STATEMENT DESTROY_vec ( VEC ( t ), SIZE ( t ) ) ;
     

Finally a vector can be trimmed using:

STATEMENT TRIM_vec ( VEC ( t ), SIZE ( t ), int, int, lvalue VEC (
t ) ) ;
     

the two integral arguments giving the lower and upper bounds for

2.8. Operations on vector pointers

The VEC_PTR type constructor is used to represent a pointer to an element of a vector of objects of type t. Apart from the basic constructors already mentioned, there are only two operations on vector pointers:

VEC_PTR ( t ) VEC_PTR_vec ( VEC ( t ) ) ;
PTR ( t ) PTR_vec_ptr ( VEC_PTR ( t ) ) ;
     

The first transforms a vector into a vector pointer (pointing to its first argument). The second transforms a vector pointer into a normal pointer.

2.9. Operations on primitives

Each primitive specified within the algebra maps onto a typedef defining the type in terms of its given definition. The only operations on primitives are those listed above - the size constructs, the pointer assignment and dereference operations, the list construction and deconstruction operations and the stack push and pop operations.

2.10. Operations on identities

Each identity specified within the algebra maps onto a typedef in the output file. There are no operations on identities.

2.11. Operations on enumerations

Each enumeration specified within the algebra maps onto an unsigned integral type in the output file. The Section 2.9, “Operations on primitives” basic operations listed above are always generated unless it has been indicated that Section 1.3, “Enumerations” no lists of this type will be formed, when CONS_type and the other list and stack operators are suppressed. In addition each enumerator which is a member of this enumeration maps onto a constant expression:

t enum_member ;
     

where t is the enumeration type, enum is the short type name, and member is the enumerator name. It is guaranteed that the first enumerator will have value 0, the second 1, and so on, unless the value of the enumerator is explicitly given (as in C). There is also a constant expression:

unsigned long ORDER_enum ;
     

giving one more than the maximum enumerator in the enumeration.

2.12. Operations on structures

Each structure specified within the algebra maps onto a typedef defining the type to be the C structure with the given components. In addition to the Section 2.9, “Operations on primitives” basic operations listed above there are also field selectors defined.

Suppose that t is a structure type with short name struct, and that comp is a component of t of type ctype. Then there is an operation:

PTR ( ctype ) struct_comp ( PTR ( t ) ) ;
     

which transforms a pointer to the structure into a pointer to the

STATEMENT MAKE_struct ( ctype, ...., PTR ( t ) ) ;
     

where ctype ranges over all the component types which do not have a component initialiser string in the structure definition. This is used to assign values to all the components of a structure. If a component has an initialiser string then this is used as an expression giving the initial value, otherwise the given operation argument is used. The initialiser strings are evaluated in the context of the MAKE_struct statement. The parameters to the operation may be referred to by the corresponding component name followed by _. The partially initialised object may be referred to by the special character sequence %0. Any remainder operations should be written as %% rather than %.

Inheritance in structures is handled as follows: if t is a derived structure with base structure btype then there is an operation:

PTR ( btype ) CONVERT_struct_base ( PTR ( t ) ) ;
     

where struct and base are the short names of t and btype respectively.

2.13. Operations on unions

Each union specified within the algebra maps onto an opaque type in the output file. In addition to the Section 2.9, “Operations on primitives” basic operations listed above there are a number of other constructs output into name.h, namely:

unsigned int ORDER_union ;
t NULL_union ;
int IS_NULL_union ( t ) ;
int EQ_union ( t, t ) ;
     

where t denotes the union type, and union is the short type name. ORDER_union is a constant value giving the number of union fields. NULL_union is a distinguished constant value of type t. Values of type t may be compared against this distinguished value, or against each other.

2.13.1. Union construction operations

Most of the output for the union type t is output into the file union_ops.h. This contains a construct:

unsigned int TAG_union ( t ) ;
       

for extracting the discriminant tag from a union.

For each shared component, comp, of t of type ctype, say, there is an operator:

PTR ( ctype ) union_comp ( t ) ;
       

which extracts this component from the union.

For each field, field, of the union there are constructs:

unsigned int union_field_tag ;
int IS_union_field ( t ) ;
       

giving the (constant) discriminant tag associated with this field, and a test for whether an object of type t has this discriminant.

In addition, for each unshared component, comp, of field of type ctype, say, there is an operator:

PTR ( ctype ) union_field_comp ( t ) ;
       

which extracts this component from the union.

There are also operations for constructing and deconstructing objects of type t for field field given as follows:

STATEMENT MAKE_union_field ( ctype, ...., lvalue t ) ;
STATEMENT DECONS_union_field ( lvalue ctype, ...., t ) ;
STATEMENT DESTROY_union_field ( DESTROYER, lvalue ctype, ...., t ) ;
       

The operation MAKE_union_field constructs a value of type t from the various components and assigns it into the last argument. The ctype arguments range over all the components of field, both the shared components and the unshared components, which do not have a component initialiser string in the definition. The union component are initialised either using the Section 2.12, “Operations on structures” initialiser string or the given operation argument.

DECONS_union_field performs the opposite operation, deconstructing an object of type t into its components for this field. DESTROY_union_field is similar, except that it also applies the given destructor function to the original value. For these two operations ctype ranges over all the components, including those with initialiser strings.

2.13.2. Union field sets

Recall that Section 1.5, “Unions” a number of union field identifiers may be associated with the same set of components. In this case these fields form a union field set named after the first element of the set. There are operations:

int IS_union_field_etc ( t ) ;
PTR ( ctype ) union_field_etc_comp ( t ) ;
STATEMENT MAKE_union_field_etc ( unsigned int, ctype, ...., lvalue
t ) ;
STATEMENT MODIFY_union_field_etc ( unsigned int, t ) ;
STATEMENT DECONS_union_field_etc ( lvalue ctype, ...., t ) ;
STATEMENT DESTROY_union_field_etc ( DESTROYER, lvalue ctype, ....,
t ) ;
       

defined on these sets. These are exactly analogous to the single field operations except that they work for any field in the set. Note that MAKE_union_field _etc takes an extra argument, giving the tag associated with the particular element of the set being constructed. Also the construct MODIFY_union_field _etc is introduced to allow the tag of an existing object to be modified to another value within the same set.

The valid range of union tags for this set can be given by the

unsigned int union_field_tag ;
unsigned int union_field_etc_tag ;
       

A union is a member of this set if its tag is greater than or equal to union_field_tag and strictly less than union_field _etc_tag.

2.13.3. Inheritance in unions

Inheritance in unions is handled as follows: if t is a derived union with base union btype then there is an operation:

btype CONVERT_union_base ( t ) ;
       

where union and base are the short names of t and btype respectively.

2.13.4. Union maps

For each map, map, on the union t, we have declared in union_ops.h an operator:

map_ret map_union ( t, map_par, .... ) ;
       

where map_ret is the map return type and map_par ranges over the map parameter types. This is except for maps with destructors (i.e. those qualified by a Section 1.5, “Unions” hash symbol) which have the form:

map_ret map_union ( t, DESTROYER, map_par, .... ) ;
       

These maps are implemented by having one function per field for each map. The output file union _map.h contains tables of these functions. These have the form:

map_ret ( *map_union_table [ ORDER_union ] ) ( t,
map_par, .... ) = {
....
map_union_field,
....
} ;
       

where there is one entry per union field.

In order to aid the construction of these functions a set of function headers is provided in union _hdr.h. These take the form:

#define HDR_map_union_field\
  map_ret map_union_field ( name_union, destroyer, par, .... )\
  t name_union ;\
  DESTROYER destroyer ; /* if required */\
  map_par par ;\
  ....\
  {\
ctype comp ;\
....\
DECONS_union_field ( comp, ...., name_union ) ;
       

There is also an alternative function header, HDR_map_d_ union_field, which calls DESTROY_union_field rather than DECONS_union_field. The destructor function used is destroyer, if this parameter is present, or the default destructor, destroy_name, otherwise.

Chapter 3. Implementation details

3.1. Implementation of types

The C implementation of the type system Chapter 2, Output type system specification specified above is based on a single type, name, with the same name as the input algebra. This is defined as follows:

typedef union name_tag {
  unsigned int ag_tag ;
  union name_tag *ag_ptr ;
  unsigned ag_enum ;
  unsigned long ag_long_enum ;
  name_dim ag_dim ;
  t ag_prim_type ;
  ....
} name ;
     

where t runs over all the primitive types. All of the types in the algebra can be packed into a block of cells of type name. The following paragraphs describe the implementation of each type, together with how they are packed as blocks of cells. This is illustrated by the following diagram:

Figure 3.1. Packing of Types

Packing of Types

Primitive types are implemented by a typedef defining the type in terms of its Section 2.9, “Operations on primitives” given definition. A primitive type can be packed into a single cell using the appropriate ag_prim_type field of the union.

Identity types are implemented by a typedef defining the type as being equal to another Section 2.10, “Operations on identities” type from the algebra.

Enumeration types are all implemented as unsigned int if all its values fit into 16 bits, or unsigned long otherwise. An enumeration type can be packed into a single cell using the ag_enum or ag_long_enum field of the union.

Structure types are implemented by a typedef defining the type to be the C structure with the Section 2.12, “Operations on structures” given components. A structure type may be packed into a block of cells by packing each of the components in turn.

Union types are all implemented by a pointer to name. This pointer references a block of cells. The null pointer represents NULL_union, otherwise the first cell contains the union discriminant tag (in the ag_tag field), with the subsequent cells containing the packed field components (shared components first, then unshared components). If the union has only one field then the discriminant can be omitted. The union itself can be packed into a single cell using the ag_ptr field of the union.

Pointer types are all implemented by a pointer to name. Null pointers represent NULL_ptr, otherwise the pointer references a single cell. This cell contains a pointer to the packed version of the object being pointed to in its ag_ptr field. A pointer type itself can be packed into a single cell using the ag_ptr field of the union.

List types are all implemented by a pointer to name. Null pointers represent NULL_list, otherwise the pointer references a block of two cells. The first cell gives the tail of the list in its ag_ptr field; the second cell contains a pointer to the packed version of the head of the list in its ag_ptr field. A list type itself can be packed into a single cell using the ag_ptr field of the union.

Stack types are identical to list types, with the head of the list

Vector pointer and vector types are all implemented by structures

typedef unsigned int name_dim ;

typedef struct {
  name *vec ;
  name *ptr ;
} name_VEC_PTR ;

typedef struct {
  name_dim dim ;
  name_VEC_PTR elems ;
} name_VEC ;
     

The vec field of a vector pointer contains a pointer to a block of cells containing the packed versions of the elements of the vector. The ptr field is a pointer to the current element of this block. A vector type also has a field giving the vector size. A vector pointer type can be packed into a block of two cells, using the ag_ptr field of each to hold its two components. A vector type can similarly be packed into a block of three cells, the first holding the vector size in its ag_dim field.

All size types are implemented as int.

3.2. Support routines

The implementation requires the user to provide certain support functions. Most fundamental are the routines for allocating and deallocating the blocks of cells which are used to store the types. These are specified as follows:

name *gen_name ( unsigned int ) ;
void destroy_name ( name *, unsigned int ) ;
void dummy_destroy_name ( name *, unsigned int ) ;
     

where gen_name allocates a block of cells of the given size, destroy_name deallocates the given block, and dummy_destroy_name has Section 2.12, “Operations on structures” no effect. The precise details of how the memory management is to be handled are left to the user.

Certain generic list functions must also be provided, namely:

void destroy_name_list ( name *, unsigned int ) ;
name *reverse_name_list ( name * ) ;
name *append_name_list ( name *, name * ) ;
name *end_name_list ( name * ) ;
     

which implement the constructs DESTROY_list, REVERSE_list, APPEND_list and END_list respectively.

Finally a dummy empty vector:

name_VEC empty_name_vec ;
     

needs to be defined.

Examples of these support routines can be found in the the section called “Example program” calculus program itself.

3.3. Run-time checking

The type checking facilities supported by calculus allow for compile-time detection of many potential programming errors, however there are many problems, such as dereferencing null pointers, deconstructing empty lists and union tag errors which can only be detected at run-time. For this reason calculus can be made to add extra assertions to check for such errors into its output. This is done by specifying the -a command-line option.

These assertions are implemented by means of macros. If the macro ASSERTS is defined then these macros expand to give the run-time checks. If not the output is exactly equivalent to that which would have been output if the -a option had not been given.

The assertions require certain support functions which are output into a separate file, assert_def.h. These functions need to be compiled into the program when ASSERTS is defined. Note that the functions are implementation specific.

Chapter 4. Disk reading and writing

One of the facilities which the Section 3.1, “Implementation of types” homogeneous implementation of the type system described above allows for is the addition of persistence. Persistence in this context means allowing objects to be written to, and read from, disk. Also discussed in this section is the related topic of the object printing routines, which allow a human readable representation of objects of the type system to be output for debugging or other purposes.

The disk reading and writing routines are output into the files read_def.h and write_def.h respectively if the -d command-line option is passed to calculus. The object printing routines are output if the -p option is given, with additional code designed for use with run-time debuggers being added if the -a option is also given.

All of these routines use extra constructs output in the main output files (name.h and union_ops.h), but not normally accessible. The macro name_IO_ROUTINES should be defined in order to make these available for the disk reading and writing routines to use.

4.1. Disk writing routines

The disk writing routines output in write_def.h

static void WRITE_type ( t ) ;
     

for each type t mentioned within the algebra

Note that such routines are output not only for the basic types, such as unions and structures, but also for any composite types, such as pointers and lists, which are used in their definition. The type component of the name WRITE_type is derived from basic types t by using the short type name. For composite types it is defined recursively as follows:

LIST ( t ) -> list_type
PTR ( t ) -> ptr_type
STACK ( t ) -> stack_type
VEC ( t ) -> vec_type
VEC_PTR ( t ) -> vptr_type
     

Such functions are defined for identity types and composite types involving identity types by means of macros which define them in terms of the identity definitions. WRITE_type functions for the primitive types should be provided by the user to form a foundation on which all the other functions may be built.

The user may wish to generate WRITE_type (or other disk reading and writing) functions for types other than those mentioned in the algebra definition. This can be done by means of a command-line option of the form -Einput where input is a file containing a list of the extra types required. In the notation Chapter 1, Input syntax used above the syntax for input is given by:

extra :
  type-listopt

type-list :
  type ;
  type-list type ;
     

The WRITE_type functions are defined recursively in an obvious fashion. The user needs to provide the writing routines for the primitives already mentioned, plus support routines (or macros):

void WRITE_BITS ( int, unsigned int ) ;
void WRITE_DIM ( name_dim ) ;
void WRITE_ALIAS ( unsigned int ) ;
     

for writing a number of bits to disk, writing a vector dimension

Any of the WRITE_type functions may be overridden by the user by defining a macro WRITE_type with the desired effect. Note that the WRITE_type function for an identity can be overridden independently of the function for the identity definition. This provides a method for introducing types which are representationally the same, but which are treated differently by the disk reading and writing routines.

4.2. Disk reading routines

The disk reading routines output in read_def.h are exactly analogous to the disk writing routines. For each type t (except primitives) there is a function or macro:

static t READ_type ( void ) ;
     

which reads an object of that type from disk. The user must provide the READ_type functions for the primitive types, plus support routines:

unsigned int READ_BITS ( int ) ;
name_dim READ_DIM ( void ) ;
unsigned int READ_ALIAS ( void ) ;
     

for reading a number of bits from disk, reading a vector dimension and reading an Section 4.4, “Aliasing” object alias. The READ_type functions may be overridden by means of macros as before.

4.3. Object printing routines

The object printing routines output in

static void PRINT_type ( FILE *, t, char *, int ) ;
     

for each type t, which prints an object of type t to the given file, using the given object name and indentation value. The user needs to provide basic output routines:

void OUTPUT_type ( FILE *, t ) ;
     

for each primitive type. The PRINT_type functions may be overridden by means of macros as before.

The printing routines are under the control of three variables

static int print_indent_step = 4 ;
static int print_ptr_depth = 1 ;
static int print_list_expand = 0 ;
     

These determine the indentation to be used in the output, to what depth pointers are to be dereferenced when printing, and whether lists and stacks are to be fully expanded into a sequence of elements or just split into a head and a tail.

One application of these object printing routines is to aid debugging programs written using the calculus tool. The form of the type system implementation means that it is not easy to extract information using run-time debuggers without a detailed knowledge of the structure of this implementation. As a more convenient alternative, if both the -p and -a command-line options are given then calculus will generate functions:

void DEBUG_type ( t ) ;
     

defined in terms of PRINT_type, for printing an object of the given type to the standard output. Many debuggers have problems passing structure arguments, so for structure, vector and vector pointer types DEBUG_type takes the form:

void DEBUG_type ( t * ) ;
     

These debugging routines are only defined conditionally, if the

4.4. Aliasing

An important feature of the disk reading and writing routines, namely aliasing, has been mentioned but not yet described. The problem to be faced is that many of the objects built up using type systems defined using calculus will be cyclic - they will include references to themselves in their own definitions. Aliasing is a mechanism for breaking such cycles by ensuring that only one copy of an object is ever written to disk, or that only one copy is created when reading from disk. This is done by associating a unique number as an alias for the object.

For example, when writing to disk, the first time the object is written the alias definition is set up. Consequently the alias number is written instead of the object it represents. Similarly when reading from disk, an alias may be associated with an object when it is read. When this alias is encountered subsequently it will always refer to this same object.

The objects on which aliasing can be specified are the Section 1.5, “Unions” union fields. A union field may be qualified by one or two hash symbols to signify that objects of that type should be aliased.

The two hash case is used to indicate that the user wishes to gain access to the objects during the aliasing mechanism. In the disk writing case, the object to be written, x say, is split into its components using the appropriate DECONS_union_field construct. Then the user-defined routine, or macro:

ALIAS_union_field ( comp, ...., x ) ;
     

(where comp ranges over all the union components) is called prior to writing the object components to disk.

Similarly in the disk reading case, the object being read, x, is initialised by calling the user-defined routine:

UNALIAS_union_field ( x ) ;
     

prior to reading the object components from disk. Each object component is then read into a local variable, comp. Finally the user-defined routine:

UNIFY_union_field ( comp, ...., x ) ;
     

(where comp ranges over all the union components) is called to assign these values to x before returning.

In the single hash case the object is not processed in this way. It is just written straight to disk, or has its components immediately assigned to it when reading from disk.

Note that aliasing is used, not just in the disk reading and writing routines, but also in the object printing functions. After calling any such function the user should call the routine:

void clear_name_alias ( void ) ;
     

to clear all aliases.

Aliases are implemented by adding an extra field to the objects to be aliased, which contains the alias number, if this has been assigned, or zero, otherwise. A list of all these extra fields is maintained. In addition to the routine clear_name_alias mentioned above, the user should provide support functions and variables:

unsigned int crt_name_alias ;
void set_name_alias ( name *, unsigned int ) ;
name *find_name_alias ( unsigned int ) ;
     

giving the next alias number to be assigned, and routines for adding an alias to the list of all aliases, and looking up an alias in this list. Example implementations of these routines are given in the the section called “Example program” calculus program itself.

4.5. Application to calculus

As mentioned above, the calculus program itself is an example of its own application. It therefore contains routines for reading and writing a representation of an algebra to and from disk, and for pretty-printing the contents of an algebra. These may be accessed using the the section called “Using calculus” command-line options mentioned above.

If the -w command-line option is specified to calculus then it reads its input file, input, as normal, but writes a disk representation of the input algebra to output, which in this instance is an output file, rather than an output directory. An output file produced in this way can then be specified as an input file to calculus if the -r option is given. Finally the input algebra may be pretty-printed to an output file (or the standard output if no output argument is given) by specifying the -o option.

Chapter 5. Template files

It is possible to use calculus to generate an output file from a template input file, template, using the syntax:

calculus [ options ] input .... -Ttemplate output
   

The template file consists of a list of either template directives or source lines containing escape sequences which are expanded by calculus. Template directive lines are distinguished by having @ as their first character. Escape sequences consist of % following by one or more characters.

There are two template directives; loops take the form:

@loop control
....
@end
   

and conditionals take the form:

@if condition
....
@else
....
@endif
   

or:

@if condition
....
@endif
   

where .... stands for any sequence of template

The control statements in a loop can be primitive, identity, enum, struct or union to loop over all the primitive, identity, enumeration structure or union types within the input algebra. Within an enum loop it is possible to use enum.const to loop over all the enumeration constants of the current enumeration type. Within a struct loop it is possible to use struct.comp to loop over all the components of the current structure. Within a union loop it is possible to use union.comp to loop over all the shared components of the current union, union.field to loop over all the fields of the current union, and union.map to loop over all the maps of the current union. Within a union.field loop it is possible to use union.field.comp to loop over all the components of the current union field. Within a union.map loop it is possible to use union.map.arg to loop over all the arguments of the current union map.

The valid condition statements in a conditional are true and false, plus comp.complex, which is true if the current structure or union field component has a complex type (i.e. those for which COPY_type and DEREF_type require two arguments), and comp.default, which is true if the current structure or union field component has a default initialiser value.

A number of escape sequences can be used anywhere. %ZX and %ZV give the name and version number of the version of calculus used. %Z and %V give the name and version number of the input algebra. %% and %@ give % and @ respectively, and % as the last character in a line suppresses the following newline character.

Within a primitive loop, %PN gives the primitive name, %PM gives the short primitive name and %PD gives the primitive definition.

Within an identity loop, %IN gives the identity name, %IM gives the short identity name and %IT gives the identity definition.

Within an enum loop, %EN gives the enumeration name, %EM gives the short enumeration name and %EO gives the enumeration order, ORDER_enum. Within an enum.const loop, %ES gives the enumeration constant name and %EV gives its value.

Within a struct loop, %SN gives the structure name and %SM gives the short structure name.

Within a union loop, %UN gives the union name, %UM gives the short union name and %UO gives the union order, ORDER_union. Within a union.field loop, %FN gives the field name. Within a struct.comp, union.comp or union.field.comp loop, %CN gives the component name, %CT gives the component type, %CU gives the short form of the component type and %CV gives the default component initialiser value (if comp.default is true). Within a union.map loop, %MN gives the map name and %MR gives the map return type. Within a union.map.arg loop, %AN gives the argument name and %AT gives the argument type.

As an example, the following template file gives a simple algebra

ALGEBRA %X (%V):

/* PRIMITIVE TYPES */
@loop primitive
%PN (%PM) = "%PD" ;
@end

/* IDENTITY TYPES */
@loop identity
%IN (%IM) = %IT ;
@end

/* ENUMERATION TYPES */
@loop enum

enum %EN (%EM) = {
@loop enum.const
  %ES = %EV,
@end
} ;
@end

/* STRUCTURE TYPES */
@loop struct

struct %SN (%SM) = {
@loop struct.comp
  %CT %CN ;
@end
} ;
@end

/* UNION TYPES */
@loop union

union %UN (%UM) = {
@loop union.comp
%CT %CN ;
@end
} + {
@loop union.field
%FN -> {
@loop union.field.comp
  %CT %CN ;
@end
} ;
@end
  } ;
@end
   

Chapter 6. Revision History

This chapter describes revisions to this document.

Only major changes are listed in the revision history. Please see http://www.ten15.org/log/trunk/doc/en/calculus/book.xml?action=follow_copy&rev=&stop_rev=1&mode=follow_copy&verbose=on for a complete list of changes.

Note

CVS revision numbers are located behind the date in the format rXX

Revision History
Revision 1.1 2004/03/08 r1618 verm
Cross references were fixed.
Revision 1.0 2002/09/23 r35 verm
Converted to SGML from the TenDRA 4.1.2 Documentation.