TDF Guide

The TenDRA Documentation Team

$TenDRA: book.xml 2452 2006-03-24 04:04:50Z verm $

Extensions to this document from the original TenDRA-4.1.2-doc.tar.gz source distribution are covered by the BSDL, while all prior modifications remain under the Crown Copyright.

Berkeley Systems Design License

Redistribution and use in source (SGML DocBook) and 'compiled' forms (SGML, HTML, PDF, PostScript, RTF and so forth) with or without modification, are permitted provided that the following conditions are met:

  1. Redistributions of source code (SGML DocBook) must retain the above copyright notice, this list of conditions and the following disclaimer as the first lines of this file unmodified.

  2. Redistributions in compiled form (transformed to other DTDs, converted to PDF, PostScript, RTF and other formats) must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

Important

THIS DOCUMENTATION IS PROVIDED BY THE TENDRA DOCUMENTATION TEAM "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE TENDRA DOCUMENTATION TEAM BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS DOCUMENTATION, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Crown Copyright (c) 1997, 1998

This TenDRA(r) Computer Program is subject to Copyright owned by the United Kingdom Secretary of State for Defence acting through the Defence Evaluation and Research Agency (DERA). It is made available to Recipients with a royalty-free licence for its use, reproduction, transfer to other parties and amendment for any purpose not excluding product development provided that any such use et cetera shall be deemed to be acceptance of the following conditions:

  1. Its Recipients shall ensure that this Notice is reproduced upon any copies or amended versions of it;

  2. Any amended version of it shall be clearly marked to show both the nature of and the organisation responsible for the relevant amendment or amendments;

  3. Its onward transfer from a recipient to another party shall be deemed to be that party's acceptance of these conditions;

  4. DERA gives no warranty or assurance as to its quality or suitability for any purpose and DERA accepts no liability whatsoever in relation to any use to which it may be put.

MIPS and R4000 are registered trademarks of MIPS Technologies, Inc. in the United States and other countries. http://www.mips.com/

This document was generated on 2006-09-07 16:04:10

Abstract

Please email us at if you see any errors or omissions.


Table of Contents

Introduction
1. SORTs and TOKENs
1.1. Token applications and first-class SORTs
1.2. Token definitions and declarations
1.3. A simple use of a TOKEN
1.4. Second class SORTs
2. CAPSULEs and UNITs
2.1. make_capsule and name-spaces
2.1.1. External linkages
2.1.2. UNITs
2.1.3. make_unit
2.1.4. LINK
2.2. Definitions and declarations
2.2.1. Scopes and linking
2.2.2. Declaration and definition signatures
2.2.3. STRING
3. SHAPEs, ALIGNMENTs and OFFSETs
3.1. Shapes
3.1.1. TOP, BOTTOM, LUB
3.1.2. INTEGER
3.1.3. FLOATING and complex
3.1.4. BITFIELD
3.1.5. PROC
3.1.6. Non-primitive SHAPEs
3.2. Alignments
3.2.1. ALIGNMENT constructors
3.2.2. Special alignments
3.2.3. AL_TAG, make_al_tagdef
3.3. Pointer and offset SHAPEs
3.3.1. OFFSET
3.4. Compound SHAPEs
3.4.1. Offset arithmetic with compound shapes
3.4.2. offset_mult
3.4.3. OFFSET ordering and representation
3.5. BITFIELD alignments
4. Procedures and Locals
4.1. make_proc and apply_proc
4.1.1. vartag, varparam
4.2. make_general_proc and apply_general_proc
4.2.1. tail_call
4.2.2. PROCPROPS
4.3. Defining and using locals
4.3.1. identify, variable
4.3.2. ACCESS
4.3.3. current_env, env_offset
4.3.4. local_alloc, local_free_all, last_local
4.4. Heap storage
5. Control Flow within procedures
5.1. Unconditional flow
5.1.1. sequence
5.2. Conditional flow
5.2.1. labelled, make_label
5.2.2. goto, make_local_lv, goto_local_lv, long_jump, return_to_label
5.2.3. integer_test, NTEST
5.2.4. case
5.2.5. conditional, repeat
5.3. Exceptional flow
6. Values, variables and assignments.
6.1. contents
6.2. assign
6.3. TRANSFER_MODE operations
6.4. Assigning and extracting bitfields
7. Operations
7.1. VARIETY and overflow
7.1.1. ERROR_TREATMENT
7.2. Division and remainder
7.3. change_variety
7.4. and, or, not, xor
7.5. Floating-point operations, ROUNDING_MODE
7.6. change_bitfield_to_int, change_int_to_bitfield
7.7. make_compound, make_nof, n_copies
8. Constants
8.1. _cond constructors
8.2. Primitive constant constructors
9. Tokens and APIs
9.1. Application programming interfaces
9.2. Linking to APIs
9.2.1. Target independent headers, unique_extern
9.3. Language programming interfaces
10. TDF transformations
10.1. Transformations as definitions
10.1.1. Examples of transformations
10.1.2. Programs with undefined values
11. TDF expansions of offsets
11.1. Bitfield offsets
12. Models of the TDF algebra
12.1. Model for a 32-bit standard architecture
12.1.1. Alignment model
12.1.2. Offset and pointer model
12.1.3. Size model
12.2. Model for machines like the iAPX-432
12.2.1. Alignment model
12.2.2. Offset and pointer model
12.2.3. Size model
12.2.4. Offset arithmetic
13. Revision History

List of Figures

2.1. make_capsule Signature
2.2. make_quit Diagram
9.1. Compile Phase

List of Tables

1.1. First class SORTs
1.2. Sorts without SORTNAMEs

List of Examples

10.1. Examples of transformations

Introduction

This memo is intended to be a fairly detailed commentary on the specification of TDF, a kind of Talmud to the Torah. If it conflicts with the specification document, it is wrong. The aim is elucidate the various constructions of TDF, giving examples of usages both from the point of view of a producer of TDF and how it is used to construct programs on particular platforms using various installers or translators. In addition, some attempt is made to give the reasons why the particular constructions have been chosen. Most of the commentary is a distillation of questions and answers raised by people trying to learn TDF from the specification document.

Throughout this document, references like (S5.1) are headings in the TDF Specification. I use the term "compiling" or "producing" to mean the production of TDF from some source language and "translating" to mean making a program for some specific platform from TDF.

I use the first person where I am expressing my own opinions or preferences; these should not be taken as official opinions of DRA or the TenDRA team.

Chapter 1. SORTs and TOKENs

In the syntax of language like C or Pascal, we find various syntactic units like <Expression>, <Identifier> etc. A SORT bears the same relation to TDF as these syntactic units bear to the language; roughly speaking, the syntactic unit <Expression> corresponds to the SORT EXP and <Identifier> to TAG . However, instead of using BNF to compose syntactic units from others, TDF uses explicit constructors to compose its SORTs; each constructor uses other pieces of TDF of specified SORTs to make a piece of its result SORT. For example, the constructor plus uses an ERROR_TREATMENT and two EXPs to make another EXP.

At the moment, there are 58 different SORTS, from ACCESS to VARIETY given in Table 1.1, “First class SORTs” and Table 1.2, “Sorts without SORTNAMEs”. Some of these have familiar analogues in standard language construction as with EXP and TAG above. Others will be less familiar since TDF must concern itself with issues not normally addressed in language definitions. For example, the process of linking together TDF programs is at the root of the architecture neutrality of TDF and so must form an integral part of its definition. On the other hand, TDF is not meant to be a language readily accessible to the human reader or writer; computers handle it much more easily. Thus a great many choices have been made in the definition which would be intolerable in a standard language definition for the human programmer but which, paradoxically enough, make it much simpler for a computer to produce and analyse TDF.

The SORTs and constructors in effect form a multi-sorted algebra. There were two principal reasons for choosing this algebraic form of definition. First, it is easy to extend - a new operation on existing constructs simply requires a new constructor. Secondly, the algebraic form is highly amenable to the automatic construction of programs. Large parts of both TDF producers and TDF translators have been created by automatic transformation of the text of the specification document itself, by extracting the algebraic signature and constructing C program which can read or produce TDF. To this extent, one can regard the specification document as a formal description of the free algebra of TDF SORTs and constructors. Of course, most of the interesting parts of the definition of TDF lies in the equivalences of parts of TDF, so this formality only covers the easy bit.

Another distinction between the TDF definition and language syntactic description is that TDF is to some extent conscious of its own SORTs so that it can specify a new construction of a given SORT. The analogy in normal languages would be that one could define a new construction with new syntax and say this is an example of an <Expression>, for example; I don't know of any standard language which permits this, although those of you with a historical bent might remember Algol-N which made a valiant attempt at it. Of course, the algebraic method of description makes it much easier to specify, rather than having to give syntax to provide the syntax for the new construction in a language.

1.1. Token applications and first-class SORTs

A new construction is introduced by the SORT TOKEN; the constructors involving TOKENs allow one to give an expansion for the TOKEN in terms of other pieces of TDF, possibly including parameters. We can encapsulate a (possibly parameterised) fragment of TDF of a suitable SORT by giving it a TOKEN as identification. Not all of the SORTs are available for this kind of encapsulation - only those which have a SORTNAME constructor (from access to variety). These are the "first-class" SORTs given in Table 1.1, “First class SORTs”. Each of these have an appropriate _apply_token constructor (e.g. exp_apply_token) give the expansion. Most of these also have _cond constructors (e.g.see exp_cond in Section 8.1, “_cond constructors”) which allows translate time conditional expansion of the SORT.

Table 1.1. First class SORTs

SORT USAGE
ACCESS Properties of TAGs
AL_TAG Name for alignment
ALIGNMENT Abstraction of data alignment
BITFIELD_VARIETY Give no of bits in bit-field with sign
BOOL true or false
ERROR_TREATMENT How to handle errors in operations
EXP Piece of TDF program manipulating values
FLOATING_VARIETY Kind of floating point number
LABEL Mark on EXP to jump to
NAT Non-negative static number of unbounded size
NTEST Test in comparisons
PROCPROPS Porperties of calls and definitions of procedures
ROUNDING_MODE How to round floating point operations
SHAPE Abstraction of size and representation of values
SIGNED_NAT Static number of unbounded size
STRING Static string of n-bit integers
TAG Name for value in run-time program
TRANSFERMODE Controls special contents and assignments operations
TOKEN Install-time function
VARIETY Kind of integer used in run-time program

Every TOKEN has a result SORT, i.e. the SORT of its resulting expansion and before it can be expanded, one must have its parameter SORTs. Thus, you can regard a TOKEN as having a type defined by its result and parameter SORTs and the _apply_token as the operator which expands the encapsulation and substitutes the parameters.

However, if we look at the signature of exp_apply_token:

token_value: TOKEN
token_args:  BITSTREAM param_sorts(token_value)
             → EXP x

we are confronted by the mysterious BITSTREAM where one might expect to find the actual parameters of the TOKEN.

To explain BITSTREAMs requires a diversion into the bit-encoding of TDF. Constructors for a particular SORT are represented in a number of bits depending on the number of constructors for that SORT; the context will determine the SORT required, so no more bits are required. Thus since there is only one constructor for UNITs, no bits are required to represent make_unit; there are about 120 different constructors for EXPs so 7 bits are required to cover all the EXPs. The parameters of each constructor have known SORTs and so their representations are just concatenated after the representation of the constructor. [1] While this is a very compact representation, it suffers from the defect that one must decode it even just to skip over it. This is very irksome is some applications, notably the TDF linker which is not interested detailed expansions. Similarly, in translators there are places where one wishes to skip over a token application without knowledge of the SORTs of its parameters. Thus a BITSTREAM is just an encoding of some TDF, preceded by the number of bits it occupies. Applications can then skip over BITSTREAMs trivially. Similar considerations apply to BYTESTREAMs used elsewhere; here the encoding is preceded by the number of bytes in the encoding and is aligned to a byte boundary to allow fast copying.

1.2. Token definitions and declarations

Thus the token_args parameter of exp_apply_token is just the BITSTREAM formed from the actual parameters in the sequence described by the definition of the token_value parameter. This will be given in a TOKEN_DEFN somewhere with constructor token_definition:

result_sort: SORTNAME
tok_params:  LIST(TOKFORMALS)
body:        result_sort
             → TOKEN_DEFN

The result_sort is the SORT of the construction of body; e.g. if result_sort is formed from exp then body would be constructed using the EXP constructors and one would use exp_apply_token to give the expansion. The list tok_params gives the formal parameters of the definition in terms of TOKFORMALS constructed using make_tok_formals:

sn: SORTNAME
tk: TDFINT
    → TOKFORMALS

The TDFINT tk will be the integer representation of the formal parameter expressed as a TOKEN whose result sort is sn (see more about name representation in Section 2.1, “make_capsule and name-spaces” .). To use the parameter in the body of the TOKEN_DEFN, one simply uses the _apply_token appropriate to sn.Note that sn may be a TOKEN but the result_sort may not.

Hence the BITSTREAM param_sorts(token_value) in the actual parameter of exp_apply_token above is simply formed by the catenation of constructions of the SORTs given by the SORTNAMEs in the tok_params of the TOKEN being expanded.

Usually one gives a name to a TOKEN_DEFN using to form a TOKDEF using make_tokdef :

tok:       TDFINT
signature: OPTION(STRING)
def:       BITSTREAM TOKEN_DEFN
           → TOKDEF

Here, tok gives the name that will be used to identify the TOKEN whose expansion is given by def. Any use of this TOKEN (e.g. in exp_apply_token) will be given by make_token(tok) . Once again, a BITSTREAM is used to encapsulate the TOKEN_DEFN.

The significance of the signature parameter is discussed in Section 2.2.2, “Declaration and definition signatures .

Often, one wishes a token without giving its definition - the definition could, for example, be platform-dependent. A TOKDEC introduces such a token using make_tokdec:

tok:       TDFINT
signature: OPTION(STRING)
s:         SORTNAME
           → TOKDEC

Here the SORTNAME, s, is given by token:

result: SORTNAME
params: LIST(SORTNAME)
        → SORTNAME

which gives the result and parameter SORTs of tok.

One can also use a TOKEN_DEFN in an anonymous fashion by giving it as an actual parameter of a TOKEN which itself demands a TOKEN parameter. To do this one simply uses use_tokdef :

tdef: BITSTREAM TOKEN_DEFN
      → TOKEN

1.3. A simple use of a TOKEN

The crucial use of TOKENs in TDF is to provide abstractions of APIs (see Chapter 9, Tokens and APIs) but they are also used as shorthand for commonly occurring constructions. For example, given the TDF constructor plus, mentioned above, we could define a plus with only two EXP parameters more suitable to C by using the wrap constructor as the ERROR_TREATMENT:

make_tokdef(C_plus, empty,
            token_definition(exp(), (make_tokformals(exp(), l), make_tokformals(exp(), r)),
                             plus(wrap(), exp_apply_token(l, ()), exp_apply_token(r,()))))

1.4. Second class SORTs

Second class SORTs given in Table 1.2, “Sorts without SORTNAMEs” cannot be TOKENised. These are the "syntactic units" of TDF which the user cannot extend; he can only produce them using the constructors defined in core-TDF.

Some of these constructors are implicit. For example, there are no explicit constructors for LIST or SLIST which are both used to form lists of SORTs; their construction is simply part of the encoding of TDF. However, it is forseen that LIST constructors would be highly desireable and there will probably extensions to TDF to promote LIST from a second-class SORT to a first-class one. This will not apply to SLIST or to the other SORTs which have implicit constructions. These include BITSTREAM, BYTESTREAM, TDFINT, TDFIDENT and TDFSTRING.

Table 1.2. Sorts without SORTNAMEs

SORT USAGE
AL_TAGDEF Alignment name definition
AL_TAGDEF_PROPS Body of UNIT containing AL_TAGDEFs
BITSTREAM Encapsulation of a bit encoding
BYTE STREAM Encapsulation of a byte encoding
CALLEES Actual callee parameters
CAPSULE Independent piece of TDF program
CAPSULE_LINK No and kind of linkable entities in CAPSULE
CASELIM Bounds in case constructor
ERROR_CODE Encoding for exceptions
EXTERNAL External name used to connect CAPSULE name
EXTERN_LINK List of LINKEXTERNs in CAPSULE
GROUP List of UNITs with same identification
LINK Connects names in CAPSULE
LINK EXTERN Used to connect CAPSULE names to outside world
LINKS List of LINKs
LIST (AUX) List of AUX SORTs; may have SORTNAME later
OTAGEXP Describes a formal parameter
PROPS Program info in a UNIT
SLIST (AUX) List of AUX SORTs; will not have SORTNAME later
SORTNAME SORT which can be parameter of TOKEN
TAGACC Used in constructing proc formals
TAGDEC Declaration of TAG at UNIT level
TAGDEC_PROPS Body of UNIT containing TAGDECs
TAGDEF Definition of TAG at UNIT level
TAGDEF_PROPS Body of UNIT containing TAGDEFs
TAGSHACC A formal parameter
TDFBOOL TDF encoding for a boolean
TDFIDENT TDF encoding of a byte string
TDFINT TDF encoding of an integer
TDFSTRING TDF encoding of n-bit byte string
TOKDEC Declaration of a TOKEN
TOKDEC_PROPS Body of UNIT containing TOKDECs
TOKDEF Definition of a TOKEN
TOKDEF_PROPS Body of UNIT containing TOKDEFs
TOKEN_DEFN Defines TOKEN expansion
TOKFORMALS Sort and name for parameters in TOKEN_DEFN
UNIQUE World-wide name
UNIT Component of CAPSULE with LINKs to other UNITs
VERSION Version no of TDF
VERSION_PROPS Body of UNIT containing version information



[1] There are facilities to allow extensions to the number of constructors, so it is not quite as simple as this

Chapter 2. CAPSULEs and UNITs

A CAPSULE is typically the result of a single compilation - one could regard it as being the TDF analogue of a Unix .o file. Just as with .o files, a set of CAPSULEs can be linked together to form another. Similarly, a CAPSULE may be translated to make program for some platform, provided certain conditions are met. One of these conditions is obviously that a translator exists for the platform, but there are others. They basically state that any names that are undefined in the CAPSULE can be supplied by the system in which it is to be run. For example, the translator could produce assembly code with external identifiers which will be supplied by some system library.

2.1. make_capsule and name-spaces

The only constructor for a CAPSULE is make_capsule. Its basic function is to compose together UNITs which contain the declarations and definitions of the program. The signature of make_capsule looks rather daunting and is probable best represented graphically.

Figure 2.1. make_capsule Signature

make_capsule Signature

The diagram gives an example of a CAPSULE using the same components as in the following text.

Each CAPSULE has its own name-space, distinct from all other CAPSULEs' name-spaces and also from the name-spaces of its component UNITs (see Section 2.1.2, “UNITs”). There are several different kinds of names in TDF and each name-space is further subdivided into one for each kind of name. The number of different kinds of names is potentially unlimited but only three are used in core-TDF, namely "tag", "token" and "al_tag". Those names in a "tag" name-space generally correspond to identifiers in normal programs and I shall use these as the paradigm for the properties of them all.

The actual representations of a "tag" name in a given name-space is an integer, described as SORT TDFINT. These integers are drawn from a contiguous set starting from 0 up to some limit given by the constructor which introduces the name-space. For CAPSULE name-spaces, this is given by the capsule_linking parameter of make_capsule:

capsule_linking: SLIST(CAPSULE_LINK)

In the most general case in core-TDF, there would be three entries in the list introducing limits using make_capsule_link for each of the "tag", "token" and "al_tag" name-spaces for the CAPSULE. Thus if:

capsule_linking = (make_capsule_link("tag", 5),
                   make_capsule_link("token", 6),
                   make_capsule_link("al_tag", 7))

there are 5 CAPSULE "tag" names used within the CAPSULE, namely 0, 1, 2, 3 and 4; similarly there are 6 "token" names and 7 "al_tag" names.

2.1.1. External linkages

The context of usage will always determine when and how an integer is to be interpreted as a name in a particular name-space. For example, a TAG in a UNIT is constructed by make_tag applied to a TDFINT which will be interpreted as a name from that UNIT's "tag" name-space. An integer representing a name in the CAPSULE name-space would be found in a LINKEXTERN of the external_linkage parameter of make_capsule.

external_linkage: SLIST(EXTERN_LINK)

Each EXTERN_LINK is itself formed from an SLIST of LINKEXTERNs given by make_extern_link . The order of the EXTERN_LINKs determines which name-space one is dealing with; they are in the same order as given by the capsule_linkage parameter. Thus, with the capsule_linkage given above, the first EXTERN_LINK would deal with the "tag" name-space; Each of its component LINKEXTERNs constructed by make_linkextern would be identifying a tag number with some name external to the CAPSULE; for example one might be:

make_linkextern (4, string_extern("printf"))

This would mean: identify the CAPSULE's "tag" 4 with an name called "printf", external to the module. The name "printf" would be used to linkage external to the CAPSULE; any name required outside the CAPSULE would have to be linked like this.

2.1.2. UNITs

This name "printf", of course, does not necessarily mean the C procedure in the system library. This depends both on the system context in which the CAPSULE is translated and also the meaning of the CAPSULE "tag" name 4 given by the component UNITs of the CAPSULE in the groups parameter of make_capsule:

groups: SLIST(GROUP)

Each GROUP in the groups SLIST will be formed by sets of UNITs of the same kind. Once again, there are a potentially unlimited number of kinds of UNITs but core-TDF only uses those named "tld","al_tagdefs", "tagdecs", "tagdefs", "tokdecs" and "tokdefs" [2] These names will appear (in the same order as in groups) in the prop_names parameter of make_capsule, one for each kind of UNIT appearing in the CAPSULE:

prop_names: SLIST(TDFIDENT)

Thus if:

prop_names = ("tagdecs", "tagdefs")

then, the first element of groups would contain only "tagdecs" UNITs and and the second would contain only "tagdefs" UNITs. A "tagdecs" UNIT contains things rather like a set of global identifier declarations in C, while a "tagdefs" UNIT is like a set of global definitions of identifiers.

2.1.3. make_unit

Now we come to the construction of UNITs using make_unit, as in the diagram below

Figure 2.2. make_quit Diagram

make_quit Diagram

First we give the limits of the various name-spaces local to the UNIT in the local_vars parameter:

local_vars: SLIST(TDFINT)

Just in the same way as with external_linkage, the numbers in local_vars correspond (in the same order) to the spaces indicated in capsule_linking in Section 2.1, “make_capsule and name-spaces” . With our example,the first element of local_vars gives the number of "tag" names local to the UNIT, the second gives the number of "token" names local to the UNIT etc. These will include all the names used in the body of the UNIT. Each declaration of a TAG, for example, will use a new number from the "tag" name-space; there is no hiding or reuse of names within a UNIT.

2.1.4. LINK

Connections between the CAPSULE name-spaces and the UNIT name-spaces are made by LINKs in the lks parameter of make_unit:

lks: SLIST(LINKS)

Once again, lks is effectively indexed by the kind of name-space a. Each LINKS is an SLIST of LINKs each of which which establish an identity between names in the CAPSULE name-space and names in the UNIT name-space. Thus if the first element of lks contains:

make_link(42, 4)

then, the UNIT "tag" 42 is identical to the CAPSULE "tag" 4.

Note that names from the CAPSULE name-space only arise in two places, LINKs and LINK_EXTERNs. Every other use of names are derived from some UNIT name-space.

2.2. Definitions and declarations

The encoding in the properties:BYTSTREAM parameter of a UNIT is a PROPS , for which there are five constructors corresponding to the kinds of UNITs in core-TDF, make_al_tagdefs, make_tagdecs, make_tagdefs, make_tokdefs and make_tokdecs. Each of these will declare or define names in the appropriate UNIT name-space which can be used by make_link in the UNIT's lks parameter as well as elsewhere in the properties parameter. The distinction between "declarations" and "definitions" is rather similar to C usage; a declaration provides the "type" of a name, while a definition gives its meaning. For tags, the "type" is the SORT SHAPE (see below). For tokens, the "type" is a SORTNAME constructed from the SORTNAMEs of the parameters and result of the TOKEN using token:

params: LIST(SORTNAME)
result: SORTNAME
        → SORTNAME

Taking make_tagdefs as a paradigm for PROPS, we have:

no_labels: TDFINT
tds:       SLIST(TAGDEF)
           → TAGDEF_PROPS

The no_labels parameter introduces the size of yet another name-space local to the PROPS, this time for the LABELs used in the TAGDEFs. Each TAGDEF in tds will define a "tag" name in the UNIT's name-space. The order of these TAGDEFs is immaterial since the initialisations of the tags are values which can be solved at translate time, load time or as unordered dynamic initialisations.

There are three constructors for TAGDEFs, each with slightly different properties. The simplest is make_id_tagdef:

t:         TDFINT
signature: OPTION(STRING)
e:         EXP x
           → TAGDEF

Here, t is the tag name and the evaluation of e will be the value of SHAPE x of an obtain_tag(t) in an EXP. Note that t is not a variable; the value of obtain_tag(t) will be invariant. The signature parameter gives a STRING (see Section 2.2.3, “STRING”) which may be used as an name for the tag, external to TDF and also as a check introduced by the producer that a tagdef and its corresponding tagdec have the same notion of the language-specific type of the tag.

The two other constructors for TAGDEF, make_var_tagdef and common_tagdef both define variable tags and have the same signature:

t:          TDFINT
opt_access: OPTION(ACCESS)
signature:  OPTION(STRING)
e:          EXP x
            → TAGDEF

Once again t is tag name but now e is initialisation of the variable t. A use of obtain_tag(t) will give a pointer to the variable (of SHAPE POINTER x), rather than its contents. [3] There can only be one make_var_tagdef of a given tag in a program, but there may be more than one common_tagdef, possibly with different initialisations; however these initialisations must overlap consistently just as in common blocks in FORTRAN.

The ACCESS parameter gives various properties required for the tag being defined and is discussed in Section 4.3.2, “ACCESS” .

The initialisation EXPs of TAGDEFs will be evaluated before the "main" program is started. An initialiation EXP must either be a constant (in the sense of Chapter 8, Constants) or reduce to (either directly or by token or _cond expansions) to an initial_value:

init: EXP s
      → EXP s

The translator will arrange that init will be evaluated once only before any procedure application, other than those themselves involved in initial_values, but after any constant initialisations. The order of evaluation of different initial_values is arbitrary.

2.2.1. Scopes and linking

Only names introduced by AL_TAGDEFS, TAGDEFS, TAGDECs, TOKDECs and TOKDEFs can be used in other UNITs (and then, only via the lks parameters of the UNITs involved). You can regard them as being similar to C global declarations. Token definitions include their declarations implicitly; however this is not true of tags. This means that any CAPSULE which uses or defines a tag across UNITs must include a TAGDEC for that tag in its "tagdecs" UNITs. A TAGDEC is constructed using either make_id_tagdec , make_var_tagdec or common_tagdec, all with the same form:

t_intro:   TDFINT
acc:       OPTION(ACCESS)
signature: OPTION(STRING)
x:         SHAPE
           → TAGDEC

Here the tagname is given by t_intro; the SHAPE x will defined the space and alignment required for the tag (this is analogous to the type in a C declaration). The acc field will define certain properties of the tag not implicit in its SHAPE; I shall return to the kinds of properties envisaged in discussing local declarations in Section 4.3, “Defining and using locals” .

Most program will appear in the "tagdefs" UNITs - they will include the definitions of the procedures of the program which in turn will include local definitions of tags for the locals of the procedures.

The standard TDF linker allows one to link CAPSULEs together using the name identifications given in the LINKEXTERNs, perhaps hiding some of them in the final CAPSULE. It does this just by generating a new CAPSULE name-space, grouping together component UNITs of the same kind and replacing their lks parameters with values derived from the new CAPSULE name-space without changing the UNITs' name-spaces or their props parameters. The operation of grouping together UNITs is effectively assumed to be associative, commutative and idempotent e.g. if the same tag is declared in two capsules it is assumed to be the same thing . It also means that there is no implied order of evaluation of UNITs or of their component TAGDEFs

Different languages have different conventions for deciding how programs are actually run. For example, C requires the presence of a suitably defined "main" procedure; this is usually enforced by requiring the system ld utility to bind the name "main" along with the definitions of any library values required. Otherwise, the C conventions are met by standard TDF linking. Other languages have more stringent requirements. For example, C++ requires dynamic initialisation of globals, using initial_value. As the only runnable code in TDF is in procedures, C++ would probably require an additional linking phase to construct a "main" procedure which calls the initialisation procedures of each CAPSULE involved if the system linker did not provide suitable C++ linking.

2.2.2. Declaration and definition signatures

The signature arguments of TAGDEFs and TAGDECs are designed to allow a measure of cross-UNIT checking when linking independently compiled CAPSULEs. Suppose that we have a tag, t, used in one CAPSULE and defined in another; the first CAPSULE would have to have a TAGDEC for t whose TAGDEF is in the second. The signature STRING of both could be arranged to represent the language-specific type of t as understood at compilation-time. Clearly, when the CAPSULEs are linked the types must be identical and hence their STRING representation must be the same - a translator will reject any attempt to link definitions and declarations of the same object with different signatures.

Similar considerations apply to TOKDEFs and TOKDECs; the "type" of a TOKEN may not have any familiar analogue in most HLLs, but the principle remains the same.

2.2.3. STRING

The SORT STRING is used in various constructs other than declarations and definitions. It is a first-class SORT with string_apply_token and string_cond. A primitive STRING is constructed from a TDFSTRING(k,n) which is an encoding of n integers,each of k bits, using make_string:

arg: TDFSTRING(k, n)
     → STRING(k, n)

STRINGs may be concatenated using concat_string:

arg1: STRING(k, n)
arg2: STRING(k,m)
      → STRING(k, n+m)

Being able to compose strings, including token applications etc, means that late-binding is possible in signature checking in definitions and declarations. This late-binding means that the representation of platform-dependent HLL types need only be fully expanded at install-time and hence the types could be expressed in their representational form on the specific platform.



[2] The "tld" UNITs gives usage information for namess to aid the linker, tld, to discover which namess have definitions and some usage information. The C producer also optionally constructs "diagnostics" UNITs (to give run-time diagnostic information).

[3] There is a similar distinction between tags introduced to be locals of a procedure using identify and variable (see Section 4.3.1, “identify, variable”)

Chapter 3. SHAPEs, ALIGNMENTs and OFFSETs

In most languages there is some notion of the type of a value. This is often an uncomfortable mix of a definition of a representation for the value and a means of choosing which operators are applicable to the value. The TDF analogue of the type of value is its SHAPE (S3.20). A SHAPE is only concerned with the representation of a value, being an abstraction of its size and alignment properties. Clearly an architecture-independent representation of a program cannot say, for example, that a pointer is 32 bits long; the size of pointers has to be abstracted so that translations to particular architectures can choose the size that is apposite for the platform.

3.1. Shapes

There are ten different basic constructors for the SORT SHAPE from bitfield to top as shown in table 3 (FIXME: where is table 3?). SHAPEs arising from those constructors are used as qualifiers (just using an upper case version of the constructor name) to various SORTs in the definition; for example, EXP TOP is an expression with top SHAPE. This is just used for definitional purposes only; there is no SORT SHAPENAME as one has SORTNAME.

In the TDF specification of EXPs, you will observe that all EXPs in constructor signatures are all qualified by the SHAPE name; for example, a parameter might be EXP INTEGER(v). This merely means that for the construct to be meaningful the parameter must be derived from a constructor defined to be an EXP INTEGER(v). You might be forgiven for assuming that TDF is hence strongly-typed by its SHAPEs. This is not true; the producer must get it right. There are some checks in translators, but these are not exhaustive and are more for the benefit of translator writers than for the user. A tool for testing the SHAPE correctness of a TDF program would be useful but has yet to be written.

3.1.1. TOP, BOTTOM, LUB

Two of the SHAPE constructions are rather specialised; these are TOP and BOTTOM. The result of any expression with a TOP shape will always be discarded; examples are those produced by assign and integer_test . A BOTTOM SHAPE is produced by an expression which will leave the current flow of control e.g. goto . The significance of these SHAPEs only really impinges on the computation of the shapes of constructs which have alternative expressions as results. For example, the result of conditional is the result of one of its component expressions. In this case, the SHAPE of the result is described as the LUB of the SHAPEs of the components. This simply means that if one of the component SHAPEs is TOP then the resulting SHAPE is TOP; if one is BOTTOM then the resulting SHAPE is the SHAPE of the other; otherwise both component SHAPEs must be equal and is the resulting SHAPE. Since this operation is associative, commutative and idempotent, we can speak quite unambiguously of the LUB of several SHAPEs.

3.1.2. INTEGER

Integer values in TDF have shape INTEGER(v) where v is of SORT VARIETY. The constructor for this SHAPE is integer with a VARIETY parameter. The basic constructor for VARIETY is var_limits which has a pair of signed natural numbers as parameters giving the limits of possible values that the integer can attain. The SHAPE required for a 32 bit signed integer would be:

integer(var_limits(-231, 231-1))

while an unsigned char is:

integer(var_limits(0, 255))

A translator should represent each integer variety by an object big enough (or bigger) to contain all the possible values with limits of the VARIETY. That being said, I must confess that most current translators do not handle integers of more than the maximum given naturally by the target architecture, but this will be rectified in due course.

The other way of constructing a VARIETY is to specify the number of bits required for its 2s-complemennt representation using var_width:

signed_width: BOOL
width:        NAT
              → VARIETY

3.1.3. FLOATING and complex

Similarly, floating point and complex numbers have shape FLOATING qualified by a FLOATING_VARIETY.

A FLOATING_VARIETY for a real number is constructed using fvar_parms:

base:             NAT
mantissa_digits:  NAT
minimum_exponent: NAT
maximum_exponent: NAT
                  → FLOATING_VARIETY

A FLOATING_VARIETY specifies the base, number of mantissa digits, and maximum and minimum exponent. Once again, it is intended that the translator will choose a representation which will contain all possible values, but in practice only those which are included in IEEE float, double and extended are actually implemented.

Complex numbers have a floating variety constructed by complex_parms which has the the same signature as fvar_parms. The representation of these numbers is likely to be a pair of real numbers each defined as if by fvar_parms with the same arguments. The real and imaginary parts of of a complex number can be extracted using real_part and imaginary_part; these could have been injected ito the complex number using make_complex or any of the complex operations. Many translators will simply transform complex numbers into COMPOUNDs consisting of two floating point numbers, transforming the complex operations into floating point operations on the fields.

3.1.4. BITFIELD

A number of contiguous bits have shape BITFIELD, qualified by a BITFIELD_VARIETY (S3.4) which gives the number of bits involved and whether these bits are to be treated as signed or unsigned integers. Current translators put a maximum of 32 or 64 on the number of bits.

3.1.5. PROC

The representational SHAPEs of procedure values is given by PROC with constructor proc . I shall return to this in the description of the operations which use it.

3.1.6. Non-primitive SHAPEs

The construction of the other four SHAPEs involves either existing SHAPEs or the alignments of existing SHAPEs. These are constructed by compound, nof , offset and pointer. Before describing these, we require a digression into what is meant by alignments and offsets.

3.2. Alignments

In most processor architectures there are limitations on how one can address particular kinds of objects in convenient ways. These limitations are usually defined as part of the ABI for the processor. For example, in the MIPS® processor the fastest way to access a 32-bit integer is to ensure that the address of the integer is aligned on a 4-byte boundary in the address space; obviously one can extract a mis-aligned integer but not in one machine instruction. Similarly, 16-bit integers should be aligned on a 2-byte boundary. In principle, each primitive object could have similar restrictions for efficient access and these restrictions could vary from platform to platform. Hence, the notion of alignment has to be abstracted to form part of the architecture independent TDF - we cannot assume that any particular alignment regime will hold universally.

The abstraction of alignments clearly has to cover compound objects as well as primitive ones like integers. For example, if a field of structure in C is to be accessed efficiently, then the alignment of the field will influence the alignment of the structure as whole; the structure itself could be a component of a larger object whose alignment must then depend on the alignment of the structure and so on. In general, we find that a compound alignment is given by the maximum alignment of its components, regardless of the form of the compound object e.g. whether it is a structure, union, array or whatever.

This gives an immediate handle on the abstraction of the alignment of a compound object - it is just the set of abstractions of the alignments of its components. Since "maximum" is associative, commutative and idempotent, the component sets can be combined using normal set-union rules. In other words, a compound alignment is abstracted as the set of alignments of the primitive objects which make up the compound object. Thus the alignment abstraction of a C structure with only float fields is the singleton set containing the alignment of a float while that of a C union of an int and this structure is a pair of the alignments of an int and a float.

3.2.1. ALIGNMENT constructors

The TDF abstraction of an alignment has SORT ALIGNMENT. The constructor, unite_alignments, gives the set-union of its ALIGNMENT parameters; this would correspond to taking a maximum of two real alignments in the translator.

The constructor , alignment, gives the ALIGNMENT of a given SHAPE according to the rules given in the definition. These rules effectively define the primitive ALIGNMENTs as in the ALIGNMENT column of table 3. Those for PROC, all OFFSETs and all POINTERs are constants regardless of any SHAPE qualifiers. Each of the INTERGER VARIETYs, each of the FLOATING VARIETYs and each of the BITFIELD VARIETYs have their own ALIGNMENTs. These ALIGNMENTs will be bound to values apposite to the particular platform at translate-time. The ALIGNMENT of TOP is conventionally taken to be the empty set of ALIGNMENTs (corresponding to the minimum alignment on the platform).

The alignment of a procedure parameter clearly has to include the alignment of its SHAPE; however, most ABIs will mandate a greater alignment for some SHAPEs e.g. the alignment of a byte parameter is usually defined to be on a 32-bit rather than an 8-bit boundary. The constructor, parameter_alignment, gives the ALIGNMENT of a parameter of given SHAPE.

3.2.2. Special alignments

There are several other special ALIGNMENTs.

The alignment of a code address is {code} given by code_alignment; this will be the alignment of a pointer given by make_local_lv giving the value of a label.

The other special ALIGNMENTs are considered to include all of the others, but remain distinct. They are all concerned with offsets and pointers relevant to procedure frames, procedure parameters and local allocations and are collectively known as frame alignments. These frame alignments differ from the normal alignments in that their mapping to a given architecture is rather more than just saying that it describes some n-bit boundary. For example, alloca_alignment describes the alignment of dynamic space produced by local_alloc (roughly the C alloca). Now, an ABI could specify that the alloca space is a stack disjoint from the normal procedure stack; thus manipulations of space at alloca_alignment may involve different code to space generated in other ways.

Similar considerations apply to the other special alignments, callees_alignment(b), callers_alignment(b) and locals_alignment. The first two give the alignments of the bases of the two different parameter spaces in procedures (q.v.) and locals_alignment gives the alignment of the base of locally declared tags within a procedure. The exact interpretation of these depends on how the frame stack is defined in the target ABI, e.g. does the stack grow downwards or upwards?

The final special alignment is var_param_alignment. This describes the alignment of a special kind of parameter to a procedure which can be of arbitrary length (see Section 4.1.1, “vartag, varparam .

3.2.3. AL_TAG, make_al_tagdef

Alignments can also be named as AL_TAGs using make_al_tagdef. There is no corresponding make_al_tagdec since AL_TAGs are implicitly declared by their constructor, make_al_tag. The main reason for having names for alignments is to allow one to resolve the ALIGNMENTs of recursive data structures. If, for example, we have mutually recursive structures, their ALIGNMENTs are best named and given as a set of equations formed by AL_TAGDEFs. A translator can then solve these equations trivially by substitution; this is easy because the only significant operation is set-union.

3.3. Pointer and offset SHAPEs

A pointer value must have a form which reflects the alignment of the object that it points to; for example, in the MIPS® processor, the bottom two bits of a pointer to an integer must be zero. The TDF SHAPE for a pointer is POINTER qualified by the ALIGNMENT of the object pointed to. The constructor pointer uses this alignment to make a POINTER SHAPE.

3.3.1. OFFSET

Expressions which give sizes or offsets in TDF have an OFFSET SHAPE. These are always described as the difference between two pointers. Since the alignments of the objects pointed to could be different, an OFFSET is qualified by these two ALIGNMENTs. Thus an EXP OFFSET(X,Y) is the difference between an EXP POINTER(X) and an EXP POINTER(Y). In order for the alignment rules to apply, the set X of alignments must include Y. The constructor offset uses two such alignments to make an OFFSET SHAPE. However, many instances of offsets will be produced implicitly by the offset arithmetic, e.g., offset_pad:

a:    ALIGNMENT
arg1: EXP OFFSET(z, t)
      → EXP OFFSET(za, a)

This gives the next OFFSET greater or equal to arg1 at which an object of ALIGNMENT a can be placed. It should be noted that the calculation of shapes and alignments are all translate-time activities; only EXPs should produce runnable code. This code, of course, may depend on the shapes and alignments involved; for example, offset_pad might round up arg1 to be a multiple of four bytes if a was an integer ALIGNMENT and z was a character ALIGNMENT. Translators also do extensive constant analysis, so if arg1 was a constant offset, then the round-off would be done at translate-time to produce another constant.

3.4. Compound SHAPEs

The alignments of compound SHAPEs (i.e. those arising from the constructors compound and nof) are derived from the constructions which produced the SHAPE. To take the easy one first, the constructor nof has signature:

n: NAT
s: SHAPE
   → SHAPE

This SHAPE describes an array of n values all of SHAPE s; note that n is a natural number and hence is a constant known to the producer. Throughout the definition this is referred to as the SHAPE NOF(n, s). The ALIGNMENT of such a value is alignment(s); i.e. the alignment of an array is just the alignment of its elements.

The other compound SHAPEs are produced using compound:

sz: EXP OFFSET(x, y)
    → SHAPE

The sz parameter gives the minimum size which can accommodate the SHAPE.

3.4.1. Offset arithmetic with compound shapes

The constructors offset_add , offset_zero and shape_offset are used together with offset_pad to implement (inter alia) selection from structures represented by COMPOUND SHAPEs. Starting from the zero OFFSET given by offset_zero, one can construct an EXP which is the offset of a field by padding and adding offsets until the required field is reached. The value of the field required could then be extracted using component or add_to_ptr. Most producers would define a TOKEN for the EXP OFFSET of each field of a structure or union used in the program simply to reduce the size of the TDF

The SHAPE of a C structure consisting of an char followed by an int would require x to be the set consisting of two INTEGER VARIETYs, one for int and one for char, and sz would probably have been constructed like:

sz = offset_add(offset_pad(int_al, shape_offset(char)), shape_offset(int))

The various rules for the ALIGNMENT qualifiers of the OFFSETs give the required SHAPE; these rules also ensure that offset arithmetic can be implemented simply using integer arithmetic for standard architectures (see Section 12.1, “Model for a 32-bit standard architecture”). Note that the OFFSET computed here is the minimum size for the SHAPE. This would not in general be the same as the difference between successive elements of an array of these structures which would have SHAPE OFFSET(x, x) as produced by offset_pad(x, sz). For examples of the use of OFFSETs to access and create structures, see Chapter 11, TDF expansions of offsets .

3.4.2. offset_mult

In C, all structures have size known at translate-time. This means that OFFSETs for all field selections of structures and unions are translate-time constants; there is never any need to produce code to compute these sizes and offsets. Other languages (notably Ada) do have variable size structures and so sizes and offsets within these structures may have to be computed dynamically. Indexing in C will require the computation of dynamic OFFSETs; this would usually be done by using offset_mult to multiply an offset expression representing the stride by an integer expression giving the index:

arg1: EXP OFFSET(x, x)
arg2: EXP INTEGER(v)
      → EXP OFFSET(x, x)

and using add_to_ptr with a pointer expression giving the base of the array with the resulting OFFSET.

3.4.3. OFFSET ordering and representation

There is an ordering defined on OFFSETs with the same alignment qualifiers, as given by offset_test and offset_max having properties like:

shape_offset(S) ≥ offset_zero(alignment(S))
A ≥ B    iff offset_max(A,B) = A
offset_add(A, B) ≥ A     where B ≥ offset_zero(some compatible alignment)

In most machines, OFFSETs would be represented as single integer values with the OFFSET ordering corresponding to simple integer ordering. The offset_add constructor just translates to simple addition with offset_zero as 0 with similar correspondences for the other offset constructors. You might well ask why TDF does not simply use integers for offsets, instead of introducing the rather complex OFFSET SHAPE. The reasons are two-fold. First, following the OFFSET arithmetic rules concerned with the ALIGNMENT qualifiers will ensure that one never extracts a value from a pointer with the wrong alignment by, for example, applying contents to an add_to_pointer. This frees TDF from having to define the effect of strange operations like forming a float by taking the contents of a pointer to a character which may be mis-aligned with respect to floats - a heavy operation on most processors. The second reason is quite simple; there are machines which cannot represent OFFSETs by a single integer value.

The iAPX-432 is a fairly extreme example of such a machine; it is a "capability" machine which must segregate pointer values and non-pointer values into different spaces. On this machine a value of SHAPE POINTER({pointer, int}) (e.g. a pointer to a structure containing both integers and pointers) could have two components; one referring to the pointers and another to the integers. In general, offsets from this pointer would also have two components, one to pick out any pointer values and the other the integer values. This would obviously be the case if the original POINTER referred to an array of structures containing both pointers and integers; an offset to an element of the array would have SHAPE OFFSET({pointer, int},{pointer , int}); both elements of the offset would have to be used as displacements to the corresponding elements of the pointer to extract the structure element. The OFFSET ordering is now given by the comparison of both displacements. Using this method, one finds that pointers in store to non-pointer alignments are two words in different blocks and pointers to pointer-alignments are four words, two in one block and two in another. This sounds a very unwieldy machine compared to normal machines with linear addressing. However, who knows what similar strange machines will appear in future; the basic conflicts between security, integrity and flexibility that the iAPX-432 sought to resolve are still with us. For more on the modelling of pointers and offsets see Chapter 12, Models of the TDF algebra .

3.5. BITFIELD alignments

Even in standard machines, one finds that the size of a pointer may depend on the alignment of the data pointed at. Most machines do not allow one to construct pointers to bits with the same facility as other alignments. This usually means that pointers in memory to BITFIELD VARIETYs must be implemented as two words with an address and bit displacement. One might imagine that a translator could implement BITFIELD alignments so that they are the same as the smallest natural alignment of the machine and avoid the bit displacement, but this is not the intention of the definition. On any machine for which it is meaningful, the alignment of a BITFIELD must be one bit; in other words successive BITFIELDs are butted together with no padding bits. [4] Within the limits of what one can extract from BITFIELDs, namely INTEGER VARIETYs, this is how one should implement non-standard alignments, perhaps in constructing data, such as protocols, for exchange between machines. One could implement some Ada representational statements in this way; certainly the most commonly used ones.

TDF Version 3.0 does not allow one to construct a pointer of SHAPE POINTER(b) where b consists entirely of bitfield alignments; this relieves the translators of the burden of doing general bit-addressing. Of course, this simply shifts the burden to the producer. If the high level language requires to construct a pointer to an arbitrary bit position, then the producer is required to represent such a pointer as a pair consisting of pointer to some alignment including the required bitfield and an offset from this alignment to the bitfield. For example, Ada may require the construction of a pointer to a boolean for use as the parameter to a procedure; the SHAPE of the rep resentation of this Ada pointer could be a COMPOUND formed from a POINTER({x,b}) and an OFFSET({x, b}, b) where b is the alignment given by a 1 bit alignment. To access the boolean, the producer would use the elements of this pair as arguements to bitfield_assign and bitfield_contents (Section 6.4, “Assigning and extracting bitfields”).



[4] Note that is not generally true for C bitfields; most C ABIs have (different) rules for putting in padding bits depending on the size of the bitfield and its relation with the natural alignments. This is a fruitful source of errors in data exchange between different C ABIs For more on similar limitations of bitfields in TDF (see Section 6.4, “Assigning and extracting bitfields”).

Chapter 4. Procedures and Locals

All procedures in TDF are essentially global; the only values which are accessible from the body of a procedure are those which are derived from global TAGs (introduced by TAGDEFs or TAGDECs), local TAGs defined within the procedure and parameter TAGs of the procedure.

All executable code in TDF will arise from an EXP PROC made by either make_proc or make_general_proc. They differ in their treatment of how space for the actual parameters of a call is managed; in particular, is it the caller or the callee which deallocates the parameter space?

With make_proc, this management is conceptually done by the caller at an apply_proc; i.e. the normal C situation. This suffers from the limitation that tail-calls of procedures are then only possible in restricted circumstances (e.g. the space for the parameters of the tail-call must be capable of being included in caller's parameters) and could only be implemented as an optimisation within a translator. A producer could not predict these circumstances in a machine independent manner, whether or not it knew that a tail-call was valid.

An alternative would be to make the management of parameter space the responsibility of the called procedure. Rather than do this, make_general_proc (and apply_general_proc) splits the parameters into two sets, one whose allocation is the responsibility of the caller and the other whose allocation is dealt with by the callee. This allows an explicit tail_call to be made to a procedure with new callee parameters; the caller parameters for the tail_call will be the same as (or some initial subset of) the caller parameters of the procedure containing the tail_call .

A further refinement of make_general_proc is to allow access to the caller parameter space in a postlude at the call of the procedure using an apply_general_proc. This allows simple implementations of Ada out_parameters, or more generally, multiple results of procedures.

4.1. make_proc and apply_proc

The make_proc constructor has signature:

result_shape: SHAPE
params_intro: LIST(TAGSHACC)
var_intro:    OPTION(TAGACC)
body:         EXP BOTTOM
              → EXP PROC

The params_intro and var_intro parameters introduce the formal parameters of the procedure which may be used in body. The procedure result will have SHAPE result_shape and will be usually given by some return construction within body. The basic model is that space will be provided to copy actual parameters (into space supplied by some apply_proc) by value into these formals and the body will treat this space effectively as local variables.

Each straightforward formal parameter is introduced by an auxiliary SORT TAGSHACC using make_tagshacc:

sha:        SHAPE
opt_access: OPTION(LIST(ACCESS))
tg_intro:   TAG POINTER(alignment(sha))
            → TAGSHACC

Within body, the formal will be accessed using tg_intro; it is always considered to be a pointer to the space of SHAPE sha allocated by apply_proc, hence the pointer SHAPE.

For example, if we had a simple procedure with one integer parameter, var_intro would be empty and params_intro might be:

params_intro = make_tagshacc(integer(v), empty, make_tag(13))

Then, TAG 13 from the enclosing UNIT's name-space is identified with the formal parameter with SHAPE POINTER(INTEGER(v)). Any use of obtain_tag(make_tag(13)) in body will deliver a pointer to the integer parameter. I shall return to the meaning of opt_access and the ramifications of the scope and extent of TAGs involved in conjunction with local declarations in Section 4.3.1, “identify, variable” .

Procedures, whether defined by make_proc or make_general_proc, will usually terminate and deliver its result with a return:

arg1: EXP x
      → EXP BOTTOM

Here x must be identical to the result_shape of the call of the procedure There may be several returns in body; and the SHAPE x in each will be the same. Some languages allow different types to be returned depending on the particular call. The producer must resolve this issue. For example, C allows one to deliver void if the resulting value is not used. In TDF a dummy value must be provided at the return; for example make_value(result_shape)

Note that the body has SHAPE bottom since all possible terminations to a procedure have SHAPE BOTTOM..

Procedures defined by make_proc are called using apply_proc:

result_shape: SHAPE
arg1:         EXP PROC
arg2:         LIST(EXP)
varparam:     OPTION(EXP)
              → EXP result_shape

Here arg1 is the procedure to be called and arg2 gives the actual parameters. There must be at least as many actual parameters as given (with the same SHAPE) in the params_intro of the corresponding make_proc for arg1. [5] The values of arg2 will be copied into space managed by caller.

The SHAPE of the result of the call is given by result_shape which must be identical to the result_shape of the make_proc.

4.1.1. vartag, varparam

U se of the var_intro OPTION in make_proc and the corresponding varparam in apply_proc allows one to have a parameter of any SHAPE, possibly differing from call to call where the actual SHAPE can be deduced in some way by the body of the make_proc . One supplies an extra actual parameter, varparam, which usually would be a structure grouping some set of values. The body of the procedure can then access these values using the pointer given by the TAG var_intro, using add_to_ptr with some computed offsets to pick out the individual fields.

This is a slightly different method of giving a variable number of parameters to a procedure, rather than simply giving more actuals than formals. The principle difference is in the alignment of the components of varparam; these will be laid out according to the default padding defined by the component shapes. In most ABIs, this padding is usually different to the way parameters are laid out; for example, character parameters are generally padded out to a full word. Thus a sequence of parameters of given shape has a different layout in store to the same sequence of shapes in a structure. If one wished to pass an arbitrary structure to a procedure, one would use the varparam option rather passing the fields individually as extra actual parameters.

4.2. make_general_proc and apply_general_proc

A make_general_proc has signature:

result_shape: SHAPE
prcprops:     OPTION(PROCPROPS)
caller_intro: LIST(TAGSHACC)
callee_intro: LIST(TAGSHACC)
body:         EXP BOTTOM
              → EXP PROC

Here the formal parameters are split into two sets, caller_intro and callee_intro, each given by a list of TAGSHACCs just as in make_proc. The distinction between the two sets is that the make_general_proc is responsible for de_allocating any space required for the callee parameter set; this really only becomes obvious at uses of tail_call within body.

The result_shape and body have the same general properties as in make_proc. In addition prcprops gives other information both about body and the way that that the procedure is called. PROCPROPS are a set drawn from check_stack, inline, no_long_jump_dest, untidy, var_callees and var_callers. The set is composed using add_procprops. The PROCPROPS no_long_jump_dest is a property of body only; it indicates that none of the labels within body will be the target of a long_jump construct. The other properties should also be given consistently at all calls of the procedure; theu are discussed in Section 4.2.2, “PROCPROPS” .

A procedure, p, constructed by make_general_proc is called using apply_general_proc:

result_shape:  SHAPE
prcprops:      OPTION(PROCPROPS)
p:             EXP PROC
caller_params: LIST(OTAGEXP)
callee_params: CALLEES
postlude:      EXP TOP
               → EXP result_shape

The actual caller parameters are given by caller_params as a list of OTAGEXPs constructed using make_otagexp:

tgopt: OPTION(TAG x)
e:     EXP x
       → OTAGEXP

Here, e is the value of the parameter and tgopt, if present, is a TAG which will bound to the final value of the parameter (after body is evaluated) in the postlude expression of the apply_general_proc. [6] Clearly, this allows one to use a caller parameter as an extra result of the procedure; for example, as in Ada out-parameters.

The actual callee_params may be constructed in three different ways. The usual method is to use make_callee_list, giving a list of actual EXP parameters, corresponding to the caller_intro list in the obvious way.The constructor, same_callees allows one to use the callees of the current procedure as the callees of the call; this, of course, assumes that the formals of the current procedure are compatible with the formals required for the call The final method allows one to construct a dynamically sized set of CALLEES; make_dynamic_callees takes a pointer and a size (expressed as an OFFSET) to make the CALLEES; this will be used in conjunction with a var_callees PROCPROPS (see Section 4.2.2, “PROCPROPS”).

Some procedures can be expressed using either make_proc or make_general_proc. For example:

make_proc(S, L, empty, B) = make_general_proc(S, var_callers, L, empty, B)

4.2.1. tail_call

Often the result of a procedure, f, is simply given by the call of another (or the same) procedure, g. In appropriate circumstances, the same stack space can be used for the call of g as the call of f. This can be particularly important where heavily recursive routines are involved; some languages even use tail recursion as the preferred method of looping.

One condition for such a tail call to be applicable is knowing that g does not require any pointers to locals of f; this is often implicit in the language involved. Equally important is that the action on the return from f is indistiguishable from the return from g. For example, if it were the callers responsibility to pop the the space for the parameters on return from a call, then the tail call of g would only work if g had the same parameter space as f.

This is the justification for splitting the parameter set of a general proc; it is (at least conceptually) the caller's responsibility for popping the caller-parameters only - the callee-parameters are dealt with by the procedure itself. Hence we can define tail_call which uses the same caller-parameters, but a different set of callee-parameters:

prcprops:      OPTION(PROCPROPS)
p:             EXP PROC
callee_params: CALLEES
               → EXP BOTTOM

The procedure p will be called with the same caller parameters as the current procedure and the new callee_params and return to the call site of the current procedure. Semantically, if S is the return SHAPE of the current procedure, and L is its caller-parameters:

tail_call(P, p, C) = return(apply_general_proc(S, P, p, L, C, make_top()))

However an implementation is expected to conserve stack by using the same space for the call of p as the current procedure.

4.2.2. PROCPROPS

The presence of var_callees (or var_callers) means that the procedure can be called with more actual callee (or caller) parameters than are indicated in callee_intro (or caller_intro). These extra parameters would be accessed within body using offset calculations with respect to the named parameters. The offsets should be calculated using parameter_alignment to give the packing of the parameter packs.

The presence of untidy means that body may be terminated by an untidy_return. This returns the result of the procedure as in return, but the lifetime of the local space of the procedure is extended (in practice this is performed by not returning the stack to its original value at the call). A procedure containing an untidy_return is a generalisation of a local_alloc(see Section 4.3.4, “local_alloc, local_free_all, last_local”). For example the procedure could do some complicated local allocation (a triangular array, say) and untidily return a pointer to it so that the space is still valid in the calling procedure. The space will remain valid for the lifetime of the calling procedure unless some local_free is called within it, just as if the space had been generated by a local_alloc in the calling procedure.

The presence of inline is just a hint to the translator that the procedure body is a good candidate for inlining at the call.

The presence of check_stack means that the static stack requirements of the procedure will be checked on entry to see that they do not exceed the limits imposed by set_stack_limit; if they are exceeded a TDF exception with ERROR_CODE stack_overflow (see Section 5.3, “Exceptional flow”) will be raised.

4.3. Defining and using locals

4.3.1. identify, variable

Local definitions within the body of a procedure are given by two EXP constructors which permit one to give names to values over a scope given by the definition. Note that this is somewhat different to declarations in standard languages where the declaration is usually embedded in a larger construct which defines the scope of the name; here the scope is explicit in the definition. The reason for this will become more obvious in the discussion of TDF transformations. The simpler constructor is identify:

opt_access: OPTION(ACCESS)
name_intro: TAG x
definition: EXP x
body:       EXP y
            → EXP y

The definition is evaluated and its result is identified with the TAG given by name_intro within its scope body. Hence the use of any obtain_tag(name_intro) within body is equivalent to using this result. Anywhere else, obtain_tag(name_intro) is meaningless, including in other procedures.

The other kind of local definition is variable:

opt_access: OPTION(ACCESS)
name_intro: TAG x
init:       EXP x
body:       EXP y
            → EXP y

Here the init EXP is evaluated and its result serves as an initialisation of space of SHAPE x local to the procedure. The TAG name_intro is then identified with a pointer to that SPACE within body. A use of obtain_tag(name_intro) within body is equivalent to using this pointer and is meaningless outside body or in other procedures. Many variable declarations in programs are uninitialised; in this case, the init argument could be provided by make_value which will produce some value with SHAPE given by its parameter.

4.3.2. ACCESS

The ACCESS SORT given in tag declarations is a way of describing a list of properties to be associated with the tag. They are basically divided into two classes, one which describes global properties of the tag with respect to the model for locals and the other which gives "hints" on how the value will be used. Any of these can be combined using add_access.

4.3.2.1. Locals model

At the moment there are just three possibilities in the first class of ACCESS constructors. They are standard_access (the default) , visible, out_par and long_jump_access.

The basic model used for the locals and parameters of a procedure is a frame within a stack of nested procedure calls. One could implement a procedure by allocating space according to SHAPEs of all of the parameter and local TAGs so that the corresponding values are at fixed offsets either from the start of the frame or some pointer within it.

Indeed, if the ACCESS opt_access parameter in a TAG definition is produced by visible, then a translator is almost bound to do just that for that TAG. This is because it allows for the possibility of the value to be accessed in some way other than by using obtain_tag which is the standard way of recovering the value bound to the TAG. The principal way that this could happen within TDF is by the combined use of env_offset to give the offset and current_env to give a pointer to the current frame (see Section 4.3.3, “current_env, env_offset”).

The out_par ACCESS is only applicable to caller parameters of procedures; it indicates that the value of the TAG concerned will accessed by the postlude part of an apply_general_proc. Hence, the value of the parameter must be accessible after the call; usually this will be on the stack in the callers frame.

The long_jump_access flag is used to indicate that the tag must be available after a long_jump. In practice, if either visible or long_jump_access is set, most translators would allocate the space for the declaration on the main-store stack rather than in an available register. If it is not set, then a translator is free to use its own criteria for whether space which can fit into a register is allocated on the stack or in a register, provided there is no observable difference (other than time or program size) between the two possibilities.

Some of these criteria are rather obvious; for example, if a pointer to local variable is passed outside the procedure in an opaque manner, then it is highly unlikely that one can allocate the variable in a register. Some might be less obvious. If the only uses of a TAG t was in obtain_tag(t)s which are operands of contents or the left-hand operands of assigns , most ABIs would allow the tag to be placed in a register. We do not necessarily have to generate a pointer value if it can be subsumed by the operations available.

4.3.2.2. Access "hints"

A variable tag with ACCESS constant is a write-once value; once it is initialised the variable will always contain the initialisation. In other words the tag is a pointer to a constant value; translators can use this information to apply various optimisations.

A POINTER tag with ACCESS no_other_read or no_other_write is asserting that there are no "aliassed" accesses to the contents of the pointer. For example, when applied to a parameter of a procedure, it is saying that the original pointer of the tag is distinct from any other tags used (reading/writing) in the lifetime of the tag. These other tags could either be further parameters of the procedure or globals. Clearly, this is useful for describing the limitations imposed by Fortran parameters, for example.

4.3.3. current_env, env_offset

The constructor current_env gives a pointer to the current procedure frame of SHAPE POINTER(fa) where fa is depends on how the procedure was defined and will be some set of the special frame ALIGNMENTs. This set will always include locals_alignment - the alignment of any locals defined within the procedure. If the procedure has any caller- parameters, the set will also include callers_alignment(b) where b indicates whether there can be a variable number of them; similarly for callee-parameters.

Offsets from the current_env of a procedure to a tag declared in the procedure are constructed by env_offset:

fa: ALIGNMENT
y:  ALIGNMENT
t:  TAG x
    → EXP OFFSET(fa,y)

The frame ALIGNMENT fa will be the appropriate one for the TAG t; i.e. if t is a local then the fa will be locals_alignment; if t is a caller parameter, fa will be callers_alignment(b); if t is a callee_parameter, fa will be callees_alignment(b). The alignment y will be the alignment of the initialisation of t.

The offset arithmetic operations allow one to access the values of tags non-locally using values derived from current_env and env_offset. They are effectively defined by the following identities:

If TAG t is derived from a variable definition:

add_to_ptr(current_env(), env_offset(locals_alignment, A, t)) = obtain_tag(t)

if TAG t is derived from an identify definition:

contents(S, add_to_ptr(current_env(), env_offset(locals_alignment, A, t))) = obtain_tag(t)

if TAG t is derived from a caller parameter:

add_to_ptr(current_env(), env_offset(callers_alignment(b), A, t)) = obtain_tag(t)

if TAG t is derived from a callee parameter:

add_to_ptr(current_env(), env_offset(callees_alignment(b), A, t)) = obtain_tag(t)

These identities are valid throughout the extent of t, including in inner procedure calls. In other words, one can dynamically create a pointer to the value by composing current_env and env_offset.

The importance of this is that env_offset(t) is a constant OFFSET and can be used anywhere within the enclosing UNIT, in other procedures or as part of constant TAGDEF; remember that the TDFINT underlying t is unique within the UNIT. The result of a current_env could be passed to another procedure (as a parameter, say) and this new procedure could then access a local of the original by using its env_offset. This would be the method one would use to access non-local, non-global identifiers in a language which allowed one to define procedures within procedures such as Pascal or Algol. Of course, given the stack-based model, the value given by current_env becomes meaningless once the procedure in which it is invoked is exited.

4.3.4. local_alloc, local_free_all, last_local

The size of stack frame produced by variable and identify definitions is a translate-time constant since the frame is composed of values whose SHAPEs are known. TDF also allows one to produce dynamically sized local objects which are conceptually part of the frame. These are produced by local_alloc:

arg1: EXP OFFSET(x, y)
      → EXP POINTER(alloca_alignment)

The operand arg1 gives the size of the new object required and the result is a pointer to the space for this object "on top of the stack" as part of the frame. The quotation marks indicate that a translator writer might prefer to maintain a dynamic stack as well as static one. There are some disadvantages in putting everything into one stack which may well out-weigh the trouble of maintaining another stack which is relatively infrequently used. If a frame has a known size, then all addressing of locals can be done using a stack-front register; if it is dynamically sized, then another frame-pointer register must be used - some ABIs make this easy but not all. The majority of procedures contain no local_allocs, so their addressing of locals can always be done relative to a stack-front; only the others have to use another register for a frame pointer.

The alignment of pointer result is alloca_alignment which must include all SHAPE alignments.

There are two constructors for releasing space generated by local_alloc. To release all such space generated in the current procedure one does local_free_all(); this reduces the size of the current frame to its static size.

The other constructor is local_free whch is effectively a "pop" to local_alloc's "push":

a: EXP OFFSET(x, y)
p: EXP POINTER(alloca_alignment)
   → EXP TOP

Here p must evaluate to a pointer generated either by local_alloc or last_local . The effect is to free all of the space locally allocated after p. The usual implementation (with a downward growing stack) of this is that p becomes the "top of stack" pointer

The use of a procedure with an untidy_return is just a generalisation of the idea of local_alloc and the space made available by its use can be freed in the same way as normal local allocations. Of course, given that it could be the result of the procedure it can be structured in an arbitrarily complicated way.

4.4. Heap storage

At the moment, there are no explicit constructors of creating dynamic off-stack storage in TDF. Any off-stack storage requirements must be met by the API in which the system is embedded, using the standard procedural interface. For example, the ANSI C API allows the creation of heap space using standard library procedures like malloc.



[5] The vararg construction in C are implemented by giving more actuals than formals; the extra parameters are accessed by offset arithmetic with a pointer to a formal, using parameter_alignment to pad the offsets.

[6] If a formal parameter is to be used in this way, it should be marked as having out_par ACCESS in its corresponding TAGSHACC in callers_intro.

Chapter 5. Control Flow within procedures

5.1. Unconditional flow

5.1.1. sequence

To perform a sequential set of operations in TDF, one uses the constructor sequence:

statements: LIST(EXP)
result:     EXP x
            → EXP x

Each of the statements are evaluated in order, throwing away their results. Then, result is evaluated and its result is the result of the sequence.

A translator is free to rearrange the order of evaluation if there is no observable difference other than in time or space. This applies anywhere I say "something is evaluated and then ...". We find this kind of statement in definitions of local variables in Section 4.3, “Defining and using locals” , and in the controlling parts of the conditional constructions below.

For a more precise discussion of allowable reorderings see (S7.14) .

5.2. Conditional flow

5.2.1. labelled, make_label

All simple changes of flow of control within a TDF procedure are done by jumps or branches to LABELs, mirroring what actually happens in most computers. There are three constructors which introduce LABELs; the most general is labelled which allows arbitrary jumping between its component EXPs:

placelabs_intro: LIST(LABEL)
starter:         EXP x
places:          LIST(EXP)
                 → EXP w

Each of the EXPs in places is labelled by the corresponding LABEL in placelabs_intro; these LABELs are constructed by make_label applied to a TDFINT uniquely drawn from the LABEL name-space introduced by the enclosing PROPS. The evaluation starts by evaluating starter; if this runs to completion the result of the labelled is the result of starter. If there is some jump to a LABEL in placelabs_intro then control passes to the corresponding EXP in places and so on. If any of these EXPS runs to completion then its result is the result of the labelled; hence the SHAPE of the result, w, is the LUB of the SHAPEs of the component EXPs.