CLOC (V00309)




                                   PREFACE
                                   ~~~~~~~

     This guide describes a computer program for analysing natural language
text.  The development work has been done by Birmingham University Computer
Centre in collaboration with the Department of English Language and
Literature.  The program is named CLOC and takes the form of a package of
facilities designed for ease of use by people with little or no computer
experience.  The package will be extended as new natural language techniques
are developed.  It currently provides sorted vocabulary lists, word indexes,
concordances, and the automatic discovery of collocations.  The documentation
in the following sections refers to mark 2M of the CLOC package, released in
July 1986.  The name CLOC is an acronym taken from the term "ColLOCation".

                                                                        Page 2


                               ACKNOWLEDGEMENTS
                               ~~~~~~~~~~~~~~~~

     The package and its documentation were written at Birmingham by Mr.  A.
Reed.  The author would like to offer his thanks to friends and colleagues
who, by criticism and advice, have aided the writing of this guide and the
production of the package: to Professor J. McH. Sinclair of the Department
of English for requesting the program and suggesting several of the features;
to Dr.  J.  L.  Schonfelder for his enthusiasm and advice; and to Professors
Greaves, Benson (York, Toronto) and Brainerd (Toronto), whose interest and
support were most welcome.

                    BIRMINGHAM UNIVERSITY COMPUTER CENTRE
                               CLOC USER GUIDE

                                   CONTENTS
                                   ~~~~~~~~
 1.   INTRODUCTION
 2.   PREPARATION OF TEXT
      2.1  The Character Set
      2.2  Capital and Small Letters
      2.3  Diacritical Marks
      2.4  Foreign Languages
      2.5  Text References
         2.5.1  Reading
         2.5.2  Printing
         2.5.3  Example
 3.   STRUCTURE OF A JOB
 4.   USING THE CLOC PACKAGE
      4.1  The command language
         4.1.1  The Control Statement Conventions
         4.1.2  Rules for Control Statements
         4.1.3  The -INSERT feature
         4.1.4  The -SEND feature
         4.1.5  The -NOSEND feature
      4.2  INPUT DETAILS
      4.3  Word definition commands
           ITEMIZE USING
           *LETTERS
           *PADDING
           *DEFERRED
           *SEPARATORS
           *READ AS SPACE
           *IGNORE
      4.4  Saving Text Files
           SAVE TEXT
           GET TEXT
      4.5  OUTPUT DETAILS
      4.6  Word selection commands
           EVERY WORD
           SELECT WORDS
           EXCLUDING
           INCLUDING
           *LIST OF WORDS
           *FREQUENCY
           *PATTERN
      4.7  Task selection commands
           WORDLIST
           INDEX
           CONCORDANCE
           CO-OCCURRENCE
           *PHRASE
           *SERIES
           *PATTERN
           COLLOCATIONS
           *SPAN
           *FREQUENCY
           EVERY COLLOCATE
           SELECTCOLLOCATE
           REJECTING
           ACCEPTING
           NOTE
           WRITETEXT
           NEWLINE
           NEWPAGE
           MESSAGE
           FINISH
 5.   EXAMPLES
 APPENDIX I    Messages Produced by the CLOC package
 APPENDIX II   References
 APPENDIX III  Glossary
 APPENDIX IV   CLOC Global Syntax Rules

                               CLOC USER GUIDE
                               ~~~~ ~~~~ ~~~~~
 1. INTRODUCTION
     CLOC is a package which will enable a novice computer user to analyse
natural language text by computer.  This guide explains what the package can
do, and shows the reader how to instruct it to carry out various tasks.
     The package can examine the vocabulary used by an author and be told to
print it in several ways.  The vocabulary, or a selected portion of it, can be
printed in order, as in a dictionary, or according to how frequently a word is
used.
     The primary purpose of the CLOC package is to produce collocations of
selected words.  Collocations are patterns of words which occur together
regularly within a text.  The package is capable of discovering these patterns
and will print the context of each occurrence in a style chosen by the user.
     CLOC will also produce a concordance of words selected from the text, to
show how an author uses the selected words.  The amount of context that is
printed can be freely chosen by the user, and the style in which the
concordance is printed can be selected in several simple and convenient ways.
     The user can guide and control the actions of the package by employing a
few simple commands, which are supplied to the package by way of statements in
a command language.  The following sections explain how this language can be
used to carry out the required tasks.

 2.   PREPARATION OF TEXT

 2.1  THE CHARACTER SET
     The text to be analysed must be converted from the printed page or spoken
word into a form that a computer can read.  A printed page could contain many
differing alphabets, use several type styles, and allow a large number of
special symbols.  As CLOC can only deal with (say) 95 different characters, a
consistent scheme must be devised to convert every letter, punctuation mark,
diacritic and significant change of type style into one or more characters in
this small and restricted alphabet.
     This can be achieved by dividing the set of possible characters into
several mutually exclusive categories.  Use one category of characters to
compose words, use another to separate one word from the next, and so on.  For
English, the first category could include the alphabet A to Z, while the
second could include ?  , ;  "full stop", and "space".  For example, if you
are unlucky enough to prepare your text on punched cards you could represent
the phrase "Once upon a time" as follows:-

                  =ONCE UPON A TIME

     Notice that before every capital letter you should place some symbol (say
=) to indicate that a capital letter comes next.  This can also be used when
a capital letter occurs inside a word, as in "MacDonald", which could be
represented as:

                  =MAC=DONALD
The CLOC package can be told that = is a letter, so that it reads
"=MAC=DONALD" as one word rather than the two words "MAC" and "DONALD".
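
     The effect of choosing the letter category can be illustrated with a
short sketch.  It is not CLOC's own code: the function name itemize is
invented here, and it simply treats every character outside the chosen
alphabet as a word separator.

```python
# Hypothetical illustration of the character categories described above;
# `itemize` is an invented name, not part of the CLOC package.
def itemize(text: str, letters: str) -> list[str]:
    """Return the words of `text`: maximal runs of characters from `letters`.
    Every other character (space, comma, full stop, ...) separates words."""
    alphabet = set(letters)
    words: list[str] = []
    current: list[str] = []
    for ch in text:
        if ch in alphabet:
            current.append(ch)                 # still inside a word
        elif current:
            words.append("".join(current))     # a separator ends the word
            current = []
    if current:
        words.append("".join(current))         # word at the very end of the text
    return words


CARD_LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

print(itemize("=MAC=DONALD", CARD_LETTERS))        # ['MAC', 'DONALD']
print(itemize("=MAC=DONALD", CARD_LETTERS + "="))  # ['=MAC=DONALD']
```

Adding = to the alphabet is the only change needed to keep the capitalised
name together as one word.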

In this document it will be assumed that text is prepared using the 95
printable characters of the ASCII alphabet.  This includes the upper and lower
case letters together with a large variety of other symbols.  All the CLOC
examples use upper case letters for clarity only; in practice, CLOC
commands can be given in a mixture of upper and lower case.  The CLOC package
allows you to choose which characters are letters and which are not.  Normally
you would choose  abcdefghijklmnopqrstuvwxyz  as your alphabet.  Additional
special letters called "padding" and "deferred" can be defined to cope with
accents, apostrophes, breathing marks, etc.

 2.2  CAPITAL AND SMALL LETTERS
     The CLOC package is designed to work with text containing a mixture of
upper and lower case letters.  Thus if you ask for a concordance of (say)
"the" you will get all the places where "the" occurs even when "The" is given
in the text.  Usually the case of the letters is not relevant, but if required
you can tell CLOC that the case of letters is significant, in which case "the"
and "The" will be counted as different words.  (See the ITEMIZE USING command
in section 4.3 for details.)

 2.3  DIACRITICAL MARKS
     Marks placed above or below letters, such as accents, can be represented
by two characters, one for the letter and one for the particular mark.  For
example, one could represent the following:

       cliché    as    clich1e    and on punched cards as    CLICH1E

where the symbol 1 is used to represent the acute accent.  This can be
combined with capitalisation as follows:

       Écosse    as   1Ecosse     and on punched cards as    1=ECOSSE

Note that each diacritical mark must be declared as either a padding or a
deferred letter (see section 4.3).
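
     A coding scheme of this kind can be applied mechanically when text is
prepared.  The sketch below is a hypothetical helper, not part of CLOC: the
function encode and the table ACCENT_CODES are invented, and only the acute
accent (coded as 1 above) is listed.  It places the accent code before the
letter and, for punched cards, the = capital flag before the letter itself.

```python
import unicodedata

# Hypothetical coding table: combining accent -> character written before
# the letter.  Only the acute accent (coded 1 above) is listed.
ACCENT_CODES = {"\u0301": "1"}   # U+0301 COMBINING ACUTE ACCENT


def encode(word: str, cards: bool = False) -> str:
    """Code accented letters as <accent code><letter>.  With cards=True the
    result is upper case and each capital is flagged by '=' (see 2.1)."""
    out = []
    for ch in unicodedata.normalize("NFC", word):
        base, *marks = unicodedata.normalize("NFD", ch)   # split letter and accents
        codes = "".join(ACCENT_CODES[m] for m in marks)   # KeyError if an accent has no code
        if cards:
            out.append(codes + ("=" if base.isupper() else "") + base.upper())
        else:
            out.append(codes + base)
    return "".join(out)


print(encode("cliché"))              # clich1e
print(encode("cliché", cards=True))  # CLICH1E
print(encode("Écosse"))              # 1Ecosse
print(encode("Écosse", cards=True))  # 1=ECOSSE
```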


 2.4  FOREIGN LANGUAGES
     When the text is not written in the Roman alphabet, its letters must be
converted to characters in the computer's alphabet.  Using Greek as an
example, with a systematic conversion changing α to A, β to B, γ to G, etc.,
we could write:-

                 πολεμικος   as  POLEMIKOS

where each Greek letter is replaced by a Roman letter.  When the natural
language being used contains words from another language, they should be
carefully distinguished.  This can easily be achieved by prefixing each rarely
occurring foreign word with a special character.  Thus, when using English
containing French words, these could be distinguished by a  $  symbol.  For
example:-

     "In French, lard means bacon" could be coded as:

       In French, $lard means bacon

   or on punched cards as
       =IN =FRENCH, $LARD MEANS BACON
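
     Both conversions are simple table look-ups.  The following sketch is
illustrative only: the names GREEK_TO_ROMAN, transliterate and mark_foreign
are invented, and the table lists just the Greek letters needed for the
example above.

```python
# Hypothetical transliteration table: only the Greek letters needed for the
# example are listed; a full scheme would cover the whole alphabet.
GREEK_TO_ROMAN = {
    "α": "A", "β": "B", "γ": "G", "ε": "E", "ι": "I", "κ": "K",
    "λ": "L", "μ": "M", "ο": "O", "π": "P", "σ": "S", "ς": "S",
}


def transliterate(text: str) -> str:
    """Replace each Greek letter by its Roman code; other characters are kept."""
    return "".join(GREEK_TO_ROMAN.get(ch, ch) for ch in text)


def mark_foreign(text: str, foreign: set[str], flag: str = "$") -> str:
    """Prefix every space-separated token listed in `foreign` with `flag`.
    Tokens are matched exactly, so punctuation attached to a word blocks a match."""
    return " ".join(flag + w if w in foreign else w for w in text.split())


print(transliterate("πολεμικος"))                             # POLEMIKOS
print(mark_foreign("In French, lard means bacon", {"lard"}))  # In French, $lard means bacon
```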

2.5  TEXT REFERENCES

    2.5.1  Reading
     This feature allows you to include in your text data the name of the
author, the section, chapter, and line number, etc.  in a manner similar to
the well known COCOA package.  The text references feature is only invoked
when you supply an INPUT DETAILS command with the REFS option in the
specification field.  Suppose you had supplied REFS<>; this would tell the
package that your text references were enclosed by the two characters <
and >.  Your text data could now include, say, <A DICKENS><P 1><L 1> which to
you would signify author Dickens, Page 1, Line 1.  Subsequent lines would
contain the text data for this section.  For the next page it is sufficient
for you to punch <P 2><L 1> as the author has not changed.  As the package
reads each line of text the line number is increased by 1, so the line number
should be reset to 1 when you start a new page.  Note that <L 1> refers to the
next line, hence no text data should occur after it on the same line.  Apart
~~~~ ~~~~
from this restriction, text references may be placed anywhere in your text
data.  It is, however, recommended that text references be placed on lines
separate from the text data itself.  Blank lines and lines containing text
references only are ignored.


The general form of a text reference is:

                aletter gap referenceb
                 ~~~~~~ ~~~ ~~~~~~~~~

           where letter is A or B or C ...... Z
                 ~~~~~~

               reference is any sequence of characters
               ~~~~~~~~~

               gap denotes one or more spaces
               ~~~

           and  ab  are the two characters specified on the REFSab part of the
                INPUT DETAILS command.  (See section 4.2.)


    Note
    ~~~~
          a)   When L is used as a letter, the reference must be a number
                                   ~~~~~~
               which will normally be 1.

          b)   An incorrect text reference will be ignored - you can therefore
               include a title of the form: a titleb which the package will
                                              ~~~~~
               ignore.
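
     The reading rules above can be summarised in a short sketch.  It is not
CLOC's own code: it assumes REFS<> and upper-case reference letters, and the
names read_text, BRACKETED and VALID are invented.  References keep their
values until reset, <L 1> numbers the next line, reference-only and blank
lines are skipped, and an incorrect reference (such as a title) is ignored.

```python
import re

# Hypothetical sketch, assuming INPUT DETAILS REFS<> (references enclosed by
# '<' and '>') and upper-case reference letters.  Not CLOC's own code.
BRACKETED = re.compile(r"<([^>]*)>")
VALID = re.compile(r"([A-Z]) +(.+)")        # letter, gap, reference


def read_text(lines):
    """Yield (references, text) for every line of text data.  A reference
    keeps its value until it is reset; L counts the lines of text."""
    refs = {}
    for raw in lines:
        for body in BRACKETED.findall(raw):
            match = VALID.fullmatch(body)
            if not match:
                continue                    # a title or incorrect reference: ignored
            letter, value = match.group(1), match.group(2).strip()
            if letter == "L":
                if not value.isdigit():
                    continue                # an L reference must be a number
                refs["L"] = int(value) - 1  # <L 1> numbers the NEXT line as 1
            else:
                refs[letter] = value
        text = BRACKETED.sub("", raw).strip()
        if not text:
            continue                        # blank or reference-only line
        refs["L"] = refs.get("L", 0) + 1    # every line of text advances L
        yield dict(refs), text


poem = [
    "<   The  Small  Celandine  >",
    "<A WORDSWORTH><V 1><L 1>",
    "There is a Flower, the Lesser Celandine",
    "That Shrinks, like many more, from cold and rain;",
    "<V 2><L 1>",
    "When hailstones have been falling, swarm on swarm,",
]
for refs, text in read_text(poem):
    print(refs["A"], refs["V"], refs["L"], text)
# WORDSWORTH 1 1 There is a Flower, the Lesser Celandine
# WORDSWORTH 1 2 That Shrinks, like many more, from cold and rain;
# WORDSWORTH 2 1 When hailstones have been falling, swarm on swarm,
```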

2.5.2 Printing
     Each line of a printed concordance can be prefixed with a detailed text
reference.  The CONCORDANCE, COLLOCATIONS, CO-OCCURRENCE, INDEX and WRITETEXT
commands can contain the keyword REFS followed by letter number pairs, e.g.
                                                  ~~~~~~ ~~~~~~
REFS A4P2L3.  This example will cause the first 4 characters of the current
"A" reference, the first 2 of the "P" reference, and a 3 figure line number to
    ~~~~~~~~~                         ~~~~~~~~~
be placed before every printed citation.  Each entry will be separated from
the next by one space.  The general rule is that for every letter number pair
                                                           ~~~~~~ ~~~~~~
occurring after the REFS keyword the first number characters of the current
                                           ~~~~~~
value of the letter references are printed.
             ~~~~~~

     A minor variation occurs when a line number of, say, L2 is asked for, and
a citation occurs at line 100 or greater.  On the first occasion that this
happens the package increases the request by 1, and subsequent references are
treated as if L3 had been requested.
     Note that initially every reference letter is considered to contain
spaces:  thus a request of, say, X4 for a letter that has not yet been set
will cause 4 spaces (+1 space) to be printed.
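
     The printing rule can be sketched as follows.  This is an illustration
under stated assumptions, not CLOC's own code: the class name RefPrinter is
invented, and the guide does not say how short values are justified, so the
sketch pads ordinary references on the right and right-justifies line
numbers.

```python
import re

# Hypothetical sketch of a REFS print specification such as REFS A4P2L3.
# Assumptions (not stated in this guide): other references are left-justified
# and padded with spaces, line numbers are right-justified.
class RefPrinter:
    def __init__(self, spec: str):
        # "A4P2L3" -> [["A", 4], ["P", 2], ["L", 3]]
        self.fields = [[letter, int(n)] for letter, n in re.findall(r"([A-Z])(\d+)", spec)]

    def prefix(self, refs: dict) -> str:
        parts = []
        for field in self.fields:
            letter, width = field
            if letter == "L":
                number = str(refs.get("L", ""))
                if len(number) > width:     # e.g. line 100 under L2
                    width += 1
                    field[1] = width        # later citations use the wider field
                parts.append(number.rjust(width))
            else:
                value = str(refs.get(letter, ""))  # an unset letter holds spaces
                parts.append(value[:width].ljust(width))
        return "".join(part + " " for part in parts)  # one space after each entry


printer = RefPrinter("A4P2L3")
print(repr(printer.prefix({"A": "DICKENS", "P": "1", "L": 7})))  # 'DICK 1    7 '
```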

 2.5.3     Example

           <   The  Small  Celandine  >
           <A WORDSWORTH><V 1><L 1>
           There is a Flower, the Lesser Celandine
           That Shrinks, like many more, from cold and rain;
           And, the first moment that the sun may shine,
           Bright as the sun itself, 'tis out again!
           <V 2><L 1>
           When hailstones have been falling, swarm on swarm,
           Or blasts the green field and the trees distress'd,
           Oft have I seen it muffled up from harm,
           In close self-shelter, like a Thing at rest.

3.   STRUCTURE OF A JOB
     The user must supply instructions to the package to tell it how to read
and analyse the text.  These instructions are prepared according to rules
described in Section 4, and must be supplied to the package in the following
order:

     (Optional) INPUT DETAILS command
     1.   Word definition commands

     2.   Word selection commands

     3.   Task selection commands, e.g. word listing, index, concordance,
          co-occurrence, collocation and writetext commands

     FINISH command

     The precise order in which the individual commands should be presented is
described in Appendix IV.  The layout and effect of each command are
described in Section 4.

     The input details command can be used to inform the package of any
special characteristics in the way the text has been prepared.

     Word definition commands inform the package how to interpret the text
data which is about to be read.  They define the constitution of a word in
                                                                   ~~~~
terms of an alphabet of characters.  The CLOC package assumes that a file of
                        ~~~~~~~~~~
text is composed of a series of distinguishable words, clearly separated from
each other by spaces, full stops, etc.

     Word selection commands are used to instruct the package to choose a
suitable collection of words, termed "nodes", which will be used by the task
commands during the analysis of the text.

     The word listing command causes the package to list the selected words
sorted into a chosen order.

     The index command will produce for each word a list of references to its
position in the original text.

     The concordance command instructs the package to print a concordance of
the selected words, in a style chosen by the user.

     The co-occurrence command will produce a concordance of given phrases or
series of words.

     The collocations command instructs the package to search the context of
the selected words, and to print the context of those which possess
frequently occurring neighbours.

     The writetext command will cause the original text to be printed out, but
each line will start with a text reference.

     The following example program illustrates how CLOC commands could appear
in a typical task.
                              ITEMIZE USING       CLOC
 word definition              *LETTERS            abcdefghijklmnopqrstuvwxyz
                              SELECT WORDS
 word selection               *FREQUENCY          (100 TO 500)
                              EXCLUDING
                              *LIST OF WORDS      i the but
 Concordance command          CONCORDANCE         KWIC, CITE 6 BY 6
 FINISH command               FINISH

     This example causes CLOC to do the following things:

(a) Read text composed of a series of words in the English alphabet

(b) Select a collection of words each of which has a frequency of occurrence
lying in the range 100 to 500 inclusive, with the exception of the words 'i'
'the' and 'but'

(c) Produce a Key Word In Context concordance of the chosen words, where each
word is surrounded by 12 words of context.
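
     Steps (b) and (c) can be imitated in a few lines.  This is only a sketch:
select_words and kwic are invented names, the tiny sample text needs smaller
frequency limits than (100 TO 500), and reading CITE 6 BY 6 as a number of
words before and after the node is an assumption of the sketch.  Words are
compared in lower case, as the CLOC itemizing strategy does.

```python
from collections import Counter

# Hypothetical sketch of the example job's selection and KWIC steps.
def select_words(words, low, high, excluded):
    """Words whose frequency lies in low..high inclusive, minus `excluded`."""
    counts = Counter(w.lower() for w in words)
    return {w for w, n in counts.items() if low <= n <= high and w not in excluded}


def kwic(words, nodes, before, after):
    """Yield (left context, node, right context) for each occurrence of a node."""
    for i, w in enumerate(words):
        if w.lower() in nodes:
            left = " ".join(words[max(0, i - before):i])
            right = " ".join(words[i + 1:i + 1 + after])
            yield left, w, right


text = "the cat sat on the mat and the cat ran".split()
nodes = select_words(text, 2, 5, {"i", "the", "but"})     # {'cat'}
for left, node, right in kwic(text, nodes, before=2, after=2):
    print(f"{left:>12}  {node.upper()}  {right}")
```

Here "the" occurs three times and so lies in the range, but it is removed by
the exclusion list, exactly as in step (b).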

4. USING THE CLOC PACKAGE
     The package needs to be told what actions to perform and in which order
they should be done.  It is the function of the 'command language' to supply
this information in a clear and unambiguous way.  Each of the following
sections contains detailed information on how to instruct the package to carry
out certain actions.  The sections are arranged in the order in which the
corresponding commands should be presented to the package.

 4.1 THE COMMAND LANGUAGE
     The package is controlled by a series of 'control statements' each of
which contains a 'command'.  These commands are obeyed by CLOC in the order in
which they are given.  Certain commands are optional and can be included when
the user requires more facilities than those provided by default.

    4.1.1   The control statement conventions
     CLOC commands must be prepared in a standard format.  A control statement
is notionally divided into two distinct 'fields', each of which occupies a
certain number of columns on a line.  The fields for CLOC control statements
are as follows:-

     (a) The control field occupying columns 1 to 15 inclusive.  Its function
             ~~~~~~~ ~~~~~
is to inform the package of an action to be performed, or to present the
package with further information.

     (b) The specification field occupying columns 16 to 80 inclusive.  This
             ~~~~~~~~~~~~~ ~~~~~
portion of the card supplies extra information to the instruction in the
control field.

 An example of a command is

           WORDLIST                   ALPHA
           control field              specification field

     This command instructs the package to sort the words into alphabetic
order, and to print them.

     The command WORDLIST must be placed in columns 1 to 15 of the line.  The
specification field, columns 16 to 80, of this command contains the sorting
criterion ALPHA, meaning 'alphabetic order'.

 4.1.2  Rules for Control Statements

     (a) Each control field must be written within columns 1 to 15.

     (b) The commands must be spelt correctly.  (Thus CONCORDENCE is not a
valid command!)

     (c) When columns 1 to 15 contain spaces only, the specification field of
the line is treated as a continuation of the specification field of the
previous command.

     (d) Keywords in the specification field must not contain spaces, nor be
split when continuing them onto the next line.

     (e) Some commands contain a star symbol (*) in column 1.  This indicates
that the command is subsidiary to the last unstarred command.  The starred
commands supply extra information to that supplied by the unstarred command
upon which they are dependent.  The star symbol is there for your convenience
to remind you of the function of these commands; the symbol itself is
optional.

     (f) The right parenthesis ) can be used to terminate a control field.
The specification field is then deemed to start immediately following it.
Hence, one can write WORDLIST)ALPHA.  This feature is intended for use when
CLOC control statements are typed at a terminal rather than punched on cards.
Note that you can use a ")" in column 1 to stand for the 15 blank columns of
a continuation line, as described above in part (c).

     (g) All CLOC control statements and keywords can be written in upper or
lower case (or a mixture of the two).  The examples in this guide are written
in upper case for clarity only.

     (h) The pseudo CLOC commands  -INSERT  -SEND  -NOSEND  do not take any
continuation lines.  They can therefore be placed anywhere in a sequence of
CLOC commands.
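
     Rules (c), (e), (f) and (g) can be pictured with a small parser.  The
sketch below is hypothetical (split_statement and read_commands are invented
names) and ignores rule (d) and the pseudo commands.

```python
# Hypothetical sketch of the layout rules above; not CLOC's own code.
def split_statement(line: str) -> tuple[str, str]:
    """Split one line into (control field, specification field)."""
    line = line[:80]
    close = line.find(")")
    if 0 <= close < 15:                    # rule (f): ')' ends the control field
        control, spec = line[:close], line[close + 1:]
    else:                                  # columns 1-15 and 16-80
        control, spec = line[:15], line[15:]
    control = control.strip().upper()      # rule (g): case is not significant
    if control.startswith("*"):
        control = control[1:].strip()      # rule (e): the star is optional
    return control, spec.strip()


def read_commands(lines):
    """Group lines into (command, specification) pairs."""
    commands = []
    for line in lines:
        control, spec = split_statement(line)
        if control:
            commands.append((control, spec))
        elif commands:                     # rule (c): blank control field continues
            name, previous = commands[-1]
            commands[-1] = (name, f"{previous} {spec}".strip())
    return commands


job = [
    "SELECT WORDS",
    "*FREQUENCY".ljust(15) + "(100 TO 500)",
    "EXCLUDING",
    "*LIST OF WORDS".ljust(15) + "i the but",
    " " * 15 + "and or",
    ")a an",                               # ')' in column 1: another continuation
    "wordlist)alpha",
]
print(read_commands(job))
```

The ")" test is limited to the first 15 columns so that a parenthesis in the
specification field, as in (100 TO 500), is not mistaken for a terminator.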

 4.1.3 The -INSERT feature.
     This allows you to include a file of prewritten CLOC commands.  You
could, for example, have a set of files each containing a different sequence
of word definition commands.  Alternatively you could have files containing
specific lists of words which you could -INSERT when required.

Example of use
~~~~~~~ ~~ ~~~
     a)   -INSERT          WORDDEF1
          SELECT WORDS

     b)   EXCLUDING
          -INSERT          VERBS

  General form
  ~~~~~~~ ~~~~
                -INSERT            filename
                                   ~~~~~~~~

     The contents of filename are used as if they appeared in
                     ~~~~~~~~
  place of the -INSERT command.

 Points to note
 ~~~~~~ ~~ ~~~~
     1. There are no continuation lines for this (pseudo) CLOC command.
        This allows filename to contain lists of words on continuation
                    ~~~~~~~~
        lines only. For example, if a file  A  contains:
            _______________is are was were be
            _______________up down in out

        we could write
            *LIST OF WORDS
            -INSERT        A
            _______________EXTRA-WORDS-HERE

        which would be interpreted as if you had written
            *LIST OF WORDS
            _______________is are was were be
            _______________up down in out
            _______________EXTRA-WORDS-HERE

       2. The syntax of filename depends on the computer that CLOC
                        ~~~~~~~~
          is implemented on.

       3. Lines taken from an -INSERT file will be copied onto the
          CLOC information and diagnostic file. An indication will be
          given as to where the lines came from.

       4. The contents of an -INSERT file may include other -INSERT commands.
          The depth of nesting is implementation dependent.
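
     The substitution, including nesting, can be sketched recursively.  In
this hypothetical example a dictionary stands in for the file system, expand
is an invented name, and MAX_DEPTH is an assumed limit, since the real depth
is implementation dependent.

```python
# Hypothetical sketch: a dict stands in for the file system, and MAX_DEPTH is
# an assumed nesting limit (the real limit is implementation dependent).
MAX_DEPTH = 8


def expand(lines, files, depth=0):
    """Replace each '-INSERT filename' line by the file's (expanded) lines."""
    out = []
    for line in lines:
        words = line.split()
        if words and words[0].upper() == "-INSERT":
            if depth >= MAX_DEPTH:
                raise RecursionError(f"-INSERT nested more than {MAX_DEPTH} deep")
            out.extend(expand(files[words[1]], files, depth + 1))
        else:
            out.append(line)
    return out


blank15 = " " * 15                          # an empty control field
files = {"A": [blank15 + "is are was were be", blank15 + "up down in out"]}
job = ["*LIST OF WORDS", "-INSERT        A", blank15 + "EXTRA-WORDS-HERE"]
for line in expand(job, files):
    print(line)
```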

 4.1.4 The -SEND feature
     This (pseudo) CLOC command will cause subsequent CLOC commands to be sent
to the CLOC diagnostic and information file.

Example and General form
~~~~~~~ ~~~ ~~~~~~~ ~~~~
        -SEND

 4.1.5  The -NOSEND feature
      This (pseudo) CLOC command will stop the normal process of sending
 CLOC commands to the CLOC diagnostic and information file.

 Example and General form
 ~~~~~~~ ~~~ ~~~~~~~ ~~~~
        -NOSEND

 4.2  The INPUT DETAILS command (optional)
     This command allows you to specify the maximum width of lines of text
data;  to indicate the presence of text references;  to determine which parts
to skip;  to define an explicit newline symbol;  to include ignorable
comments;  and to specify rules about line continuation.
When this command is absent INPUT DETAILS  WIDTH80 is assumed.
 Examples:
 a)   INPUT DETAILS       WIDTH72
 b)   INPUT DETAILS       NEWLINE/,REFS<>
 c)   INPUT DETAILS       WIDTH128,NEWLINE/,CONTINUE+
 d)   INPUT DETAILS       COMMENT(),NEWLINE/,CONTINUE+,REFS<>
 General Form:
      INPUT DETAILS       WIDTHnumber,SKIPab,COMMENTab,NEWLINEa,
                          CONTINUEa,RUNOVER,REFSab
 Default value:
      INPUT DETAILS       WIDTH80
Parameters
~~~~~~~~~~
WIDTHnumber         default value WIDTH80
    At most number characters will be read for each line of text data.
            ~~~~~~
Trailing spaces will be removed by the package.  All characters which occur
after column number will be ignored.
             ~~~~~~

SKIPab
    When present this instruction causes the package to ignore all characters
between  a  and  b  inclusive.  This option withdraws characters  a  and  b 
from the available character set.

COMMENTab
     Words which occur between the pair of characters  aa  and the pair of
                                   ~~~~ ~~                         ~~~~ ~~
characters  bb  will not appear in the word count tables, but they will appear
in the context when a citation is printed.  This option withdraws the
characters  a  and  b  from the available character set.

NEWLINEa
     When present the character  a  represents a logical newline.  This option
allows more than one "line of text" to be placed on the same line.  Note that
 a  will also be inserted automatically at the end of each line.  This option
withdraws character  a  from the available character set.

CONTINUEa
     When CONTINUEa is present and  a  is found in the text, all characters
remaining on the line are ignored.  The next line is considered to replace the
ignored part, and to be on the same line.

RUNOVER
     When RUNOVER is present, and the text reading position is at the
end-of-line (i.e.  at the WIDTH number position), the end-of-line will not
                                                                       ~~~
terminate a word.  Hence the full width of line can be used to store text, and
~~~~~~~~~
words can run over onto the next line.

REFSab
     When present the package extracts text references of the form aletter
                                                                    ~~~~~~
referenceb from the text file.  This option withdraws characters  a  and  b 
~~~~~~~~~
from the available character set.
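
     Three of these options can be pictured as a preprocessing pass over the
lines of text data.  The sketch is hypothetical: prepare and its parameter
names are invented, and SKIP, COMMENT, RUNOVER and REFS are left out.  It
truncates each line at the WIDTH column, joins a line containing the CONTINUE
character to the next one, and then splits on the NEWLINE character.

```python
# Hypothetical sketch of three INPUT DETAILS options; SKIP, COMMENT, RUNOVER
# and REFS are omitted.  The parameter names are invented for illustration.
def prepare(lines, width=80, newline=None, cont=None):
    """Turn physical lines into logical lines of text.
    width   -- WIDTHnumber: columns after `width` are ignored
    newline -- NEWLINEa: this character ends a logical line
    cont    -- CONTINUEa: the rest of the line is ignored and the next
               line carries on in its place"""
    logical, pending = [], ""
    for raw in lines:
        line = raw[:width].rstrip(" ")              # trailing spaces removed
        if cont is not None and cont in line:
            pending += line[:line.index(cont)]      # keep text before the marker
            continue
        line, pending = pending + line, ""
        logical.extend(line.split(newline) if newline else [line])
    if pending:                                     # file ended on a continued line
        logical.extend(pending.split(newline) if newline else [pending])
    return logical


# Roughly INPUT DETAILS WIDTH128,NEWLINE/,CONTINUE+
cards = ["ONE LINE/ANOTHER LINE", "A LONG LI+ ignored", "NE OF TEXT   "]
print(prepare(cards, width=128, newline="/", cont="+"))
# ['ONE LINE', 'ANOTHER LINE', 'A LONG LINE OF TEXT']
```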

    4.3 WORD DEFINITION COMMANDS
     The CLOC package has been designed to read text punched according to many
differing conventions.  No matter how a text has been coded, the package
interprets it as an arbitrary series of words.  The composition of words is
                                        ~~~~~
left up to you, but CLOC needs to know what rules are used for constructing
words.  These rules embody a strategy for extracting words from the characters
in the text data.  The process of combining characters in this way is called
itemization and one must first select which itemizing strategy the package is
~~~~~~~~~~~
to use;  the rules of the strategy are supplied by subsidiary (starred)
commands.

ITEMIZE USING.  (The -ISE ending can be used if desired.)
     This command is used to select a strategy for itemizing the text.

Example
~~~~~~~
                 ITEMIZE USING       CLOC

 General form
 ~~~~~~~ ~~~~
                 ITEMIZE USING       strategy name
                                     ~~~~~~~~ ~~~~

 Two possibilities are available at present. They are
     a)  CLOC
     b)  CLOC UNCHANGED

 Strategy a) ensures that words which differ only by the case of their
 letters, and/or contain *PADDING letters (q.v.) are counted as the same
 word. When strategy b) is chosen, words will always be distinguished by
 the case of their letters and the presence of *PADDING letters.
 For example, consider the sentence:-

     The MacDonald Hotel is different from the Mac'donald Motel.

 Assuming that the apostrophe ' has been designated a padding letter, then
 when the CLOC itemising strategy is in use the word "the" is deemed
 to occur twice, as does the word "macdonald". When a CONCORDANCE
 or COLLOCATIONS task (etc.) is run, it too will treat the various forms as
 if they were the same word. The citations will, of course, reproduce the
 original text. The effect of the CLOC itemising strategy is that :-

     The             is mapped to        "the"
     the             is mapped to        "the"
     MacDonald       is mapped to        "macdonald"
     Mac'donald      is mapped to        "macdonald"
     Hotel           is mapped to        "hotel"
     Motel           is mapped to        "motel"
     is              is mapped to        "is"
     different       is mapped to        "different"
     from            is mapped to        "from"

 When CLOC UNCHANGED is used all the above words are considered distinct.

 Other itemization strategies may be introduced in future versions of
 the package.

 Default
 ~~~~~~~
 When strategy name is absent CLOC is assumed.
      ~~~~~~~~ ~~~~
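
     The mapping table above can be reproduced with a one-line rule per
strategy.  This is a sketch under stated assumptions: vocabulary_key is an
invented name, and the padding argument plays the part of a *PADDING command
declaring the apostrophe.

```python
from collections import Counter

# Hypothetical sketch of the two strategies; `padding` plays the part of a
# *PADDING command that declares the apostrophe.
def vocabulary_key(word: str, strategy: str = "CLOC", padding: str = "'") -> str:
    """Return the form under which `word` is entered in the vocabulary table."""
    if strategy == "CLOC UNCHANGED":
        return word                                       # case and padding kept
    return "".join(ch for ch in word if ch not in padding).lower()


# The example sentence, already split into words (the full stop dropped).
words = "The MacDonald Hotel is different from the Mac'donald Motel".split()
print(Counter(vocabulary_key(w) for w in words))
print(len(set(vocabulary_key(w, "CLOC UNCHANGED") for w in words)))   # 9
```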

     The ITEMIZE USING  CLOC command has a number of subsidiary commands.
These commands tell the package how to interpret the characters it finds on
the lines containing the text data.

*LETTERS
     This command is mandatory and must be the first command which follows the
ITEMIZE USING  CLOC command.  This informs the package of the alphabet of
characters out of which words are composed.  A word is defined to be one or
                                               ~~~~
more consecutive letters.  Every character which could form part of a word
must be specified here.  This includes characters used for accents,
apostrophes, hyphenation, changes of type style etc.

Example
~~~~~~~
                *LETTERS               abcdefghijklmnopqrstuvwxyz
General form
~~~~~~~ ~~~~
                *LETTERS               letter characters
                                       ~~~~~~ ~~~~~~~~~~

     The order in which the letter characters appear in the command is
         ~~~~~              ~~~~~~ ~~~~~~~~~~
significant.  This order determines the way in which words will be
alphabetically sorted.  In the above example, those words beginning with 'a'
will precede those starting with 'b', and so on.  Thus 'alan' will sort before
'fred' which itself precedes 'freda'.  Note that this command automatically
caters for upper and lower case letters.
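
     The effect of the letter order can be shown with a sort key built from
the *LETTERS string.  The sketch is hypothetical (make_sort_key is an invented
name); it gives upper and lower case the same position, as described above.

```python
# Hypothetical sketch: a sort key built from the *LETTERS string.
def make_sort_key(letters: str):
    rank = {ch: i for i, ch in enumerate(letters)}

    def key(word: str) -> tuple[int, ...]:
        # Tuples compare element by element, and a prefix sorts first,
        # so 'fred' precedes 'freda'.
        return tuple(rank[ch] for ch in word.lower())

    return key


english = make_sort_key("abcdefghijklmnopqrstuvwxyz")
print(sorted(["freda", "Fred", "alan"], key=english))   # ['alan', 'Fred', 'freda']

backwards = make_sort_key("zyxwvutsrqponmlkjihgfedcba")
print(sorted(["alan", "fred"], key=backwards))          # ['fred', 'alan']
```

Reversing the alphabet reverses the order, which is why the order of the
letter characters on *LETTERS is significant.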

*PADDING
     This command is optional and when present informs the package of those
letter characters which are to be ignored when words are placed in the
~~~~~~ ~~~~~~~~~~
vocabulary table.  Usually this command will contain those letter characters
                                                           ~~~~~~ ~~~~~~~~~~
used for apostrophes or hyphens, but any characters specified on the above
*LETTERS command could also be used.
When the CLOC itemising strategy is chosen the *PADDING letters cannot appear
in the vocabulary print-outs, because they are absent from the vocabulary
table.
Note that if you were to choose an itemising strategy (e.g.  CLOC UNCHANGED)
which allowed padding letters to appear in the vocabulary table, they would be
ignored when sorting took place.

Example
~~~~~~~
       (a)          *PADDING             ' -
       Words containing the apostrophe and/or the hyphen will have them
       removed before the word is stored in the vocabulary table.

General form
~~~~~~~ ~~~~
             *PADDING             letter characters
                                  ~~~~~~ ~~~~~~~~~~
     where every character declared must be a letter character declared on the
                                              ~~~~~~ ~~~~~~~~~
*LETTERS command.  This is a deliberate design decision to emphasise that CLOC
defines a word to be a sequence of letters.
                                   ~~~~~~

*DEFERRED
     This command is optional and when present informs the package of those
letter characters which are to be ignored when words are sorted
~~~~~~ ~~~~~~~~~~
alphabetically.  Usually this command will contain those letter characters
                                                         ~~~~~~ ~~~~~~~~~~
used for accents and changes of type style, but any characters specified on
the above *LETTERS command could also be used.  Words which contain *DEFERRED
letters will be counted separately.

Examples
~~~~~~~~
        (a)          *DEFERRED            -
        Hyphenated words will be separately indexed.

        (b)          *DEFERRED            aeiou
        Words will be sorted alphabetically ignoring vowels.

General form
~~~~~~~ ~~~~
             *DEFERRED            letter characters
                                  ~~~~~~ ~~~~~~~~~~
     where every character declared must be a letter character declared on the
                                              ~~~~~~ ~~~~~~~~~
*LETTERS command.  This is a deliberate design decision to emphasise that CLOC
defines a word to be a sequence of letters, and that the deferred feature only
                                   ~~~~~~
affects the sorting order.

     This command ensures that words which differ only in (say) diacritical
marks are adjacent in an alphabetically ordered dictionary.  Words will always
be distinguished by their *DEFERRED letters; each will have a separate entry
in the vocabulary table.  Note that whenever two words differ only in deferred
                                                              ~~~~
letters, their sorting order is determined by the order of the deferred
letters on the *LETTERS command.
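The two-level ordering described above can be modelled as a sort key.  The
sketch below is in Python and rests on assumptions of ours: a word is compared
first with its deferred letters removed, and only on a tie is the full word
compared, with every letter ranked by its position on the *LETTERS command.
The names collation_key, LETTERS and DEFERRED are hypothetical.

```python
def collation_key(word, letters, deferred):
    """Sort key modelling *DEFERRED (assumed behaviour, not CLOC's code).

    letters  -- the *LETTERS string, which fixes the collating order
    deferred -- letters ignored at the first level of comparison
    """
    rank = {ch: i for i, ch in enumerate(letters)}
    # First level: compare the words with their deferred letters left out.
    primary = [rank[ch] for ch in word if ch not in deferred]
    # Second level: on a tie, compare the full words, so the position of
    # the deferred letters on *LETTERS decides the order.
    secondary = [rank[ch] for ch in word]
    return (primary, secondary)


LETTERS = "abcdefghijklmnopqrstuvwxyz-"   # hypothetical *LETTERS line
DEFERRED = "-"                            # hypothetical *DEFERRED line

words = ["housework", "house-wife", "house", "housewife"]
print(sorted(words, key=lambda w: collation_key(w, LETTERS, DEFERRED)))
# ['house', 'housewife', 'house-wife', 'housework']
```

Note how 'housewife' and 'house-wife' end up adjacent, which is what the
*DEFERRED command is for.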

*SEPARATORS
     This command is optional and when present informs the package of those
characters which separate one word from the next.  When this command is absent
every character that is not declared by the *LETTERS command is automatically
assumed to be a separator.  The symbols one would use to separate one word
from the next might be the fullstop, comma, semicolon, etc.  The CLOC package
always takes a 'space' to be a separator.

Example
~~~~~~~
                 *SEPARATORS             ?  !  ;  .
General form
~~~~~~~ ~~~~
                 *SEPARATORS             separator characters
                                         ~~~~~~~~~ ~~~~~~~~~~

     The order in which separator characters appear on this command is of no
                        ~~~~~~~~~ ~~~~~~~~~~
significance.  Note that a character cannot be declared both as a
letter character and as a separator character at one and the same time.  Those
~~~~~~ ~~~~~~~~~          ~~~~~~~~~ ~~~~~~~~~
characters which are neither letter characters nor separator characters will
                             ~~~~~~ ~~~~~~~~~~     ~~~~~~~~~ ~~~~~~~~~~
be assumed to signify 'spaces' and will be interpreted as if they were
declared on the following control statement.

*READ AS SPACE
     This command is optional and when present informs the package of those
characters which signify a space.  These characters although present in the
text data will be assumed to stand for the space character and will be printed
as such when concordances and collocations are produced.

Example
~~~~~~~
                   *READ AS SPACE               %
General form
~~~~~~~ ~~~~
                   *READ AS SPACE               space characters
                                                ~~~~~ ~~~~~~~~~~

     The order in which the space characters appear on this command is of no
                            ~~~~~ ~~~~~~~~~~
significance.  This command can be used to remove punctuation marks from a
text or to cause one word to be read as several.  For example, if the text
contained N'EST%PAS, the package would read it as two words N'EST and PAS, and
would print it as N'EST PAS.  If the % sign were declared as a (padding)
letter character instead of on the *READ AS SPACE command, N'EST%PAS would be
~~~~~~ ~~~~~~~~~
read as a single word, and printed as N'EST%PAS.
                 ~~~~

*IGNORE
     This command is optional and when present informs the package of those
characters which are to be totally ignored when the text is read.
                           ~~~~~~~ ~~~~~~~

Example
~~~~~~~
          *IGNORE           @ /
General form
~~~~~~~ ~~~~
          *IGNORE           ignore characters
                            ~~~~~~ ~~~~~~~~~~

     The order in which the ignore characters appear on this command is of no
                            ~~~~~~ ~~~~~~~~~~
significance.  This command can be used to ignore characters which were placed
in the text for special purposes.  As an example one could cause 'house-wife'
to be read as if it were 'housewife' by declaring "-" as an ignore character.
                                                            ~~~~~~ ~~~~~~~~~
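The character classes declared by *LETTERS, *SEPARATORS, *READ AS SPACE and
*IGNORE can be pictured as a single pass over the text.  The Python sketch
below is a hypothetical model, not the package's code: ignored characters are
dropped, read-as-space characters (and, when *SEPARATORS is present, any
undeclared characters) become spaces, and a word is any maximal run of letter
characters.  It reproduces the N'EST%PAS example, with the apostrophe assumed
to be declared on *LETTERS.

```python
from itertools import groupby


def itemize(text, letters, separators=None, read_as_space="", ignore=""):
    """Classify each character and split the text into words (a sketch).

    Returns (printed, words): the text as it would appear in citations,
    and the list of words found in it.
    """
    printed = []
    for ch in text:
        if ch in ignore:
            continue                    # *IGNORE: as if the character were absent
        if ch in letters:
            printed.append(ch)          # *LETTERS: part of a word
        elif ch in read_as_space:
            printed.append(" ")         # *READ AS SPACE: printed as a space
        elif separators is None or ch in separators or ch == " ":
            printed.append(ch)          # a separator, printed unchanged
        else:
            printed.append(" ")         # neither letter nor separator: a space
    printed = "".join(printed)
    # A word is a maximal run of letter characters.
    words = ["".join(run) for is_letter, run in
             groupby(printed, key=lambda c: c in letters) if is_letter]
    return printed, words


print(itemize("N'EST%PAS.", "ABCDEFGHIJKLMNOPQRSTUVWXYZ'",
              separators=".", read_as_space="%"))
# ("N'EST PAS.", ["N'EST", 'PAS'])
```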

 4.4 SAVING TEXT FILES
 THE ITEMIZATION PROCESS
     The package treats the text as a series of words separated from each
                                                ~~~~~
other by separators.  Thus:
         ~~~~~~~~~~

          text:  separator word separator ...  word separator
                 ~~~~~~~~~ ~~~~ ~~~~~~~~~ ~~~  ~~~~ ~~~~~~~~~

     The composition of words and separators, and the method of extracting
                        ~~~~~     ~~~~~~~~~~
them from a text, are chosen by the ITEMIZE USING command and its subsidiary
commands.  Every time a CLOC job is run the text will be read word by word,
                                                              ~~~~    ~~~~
and carefully saved in a special form which permits rapid production of
concordances and collocations.  Whenever the same text is to be examined
several times it is clearly desirable to use this special form and save
computer time by not reading the same text over and over again.  The following
commands achieve this aim.

The SAVE TEXT and GET TEXT commands
     These commands cause text to be stored in, and returned from, the
computer's filing system.  Their function is to bypass the text reading stage
and so allow computer time to be saved when many analyses are performed on one
file of text.  Further information on the filing system can be obtained from
your local computer centre.

     The SAVE TEXT command causes itemised text to be placed in a permanent
file, named filename.
            ~~~~~~~~

     The GET TEXT command causes itemized text to be retrieved from a
permanent file, named filename, previously created by the SAVE TEXT command.
                      ~~~~~~~~
General form
~~~~~~~ ~~~~
        SAVE TEXT             filename
                              ~~~~~~~~
        GET TEXT              filename
                              ~~~~~~~~

Examples
~~~~~~~~

     On one run of the package the following commands are sufficient to read
the text and store it in the special form.
     ITEMIZE USING              CLOC
     *LETTERS                   abcdefghijklmnopqrstuvwxyz
     SAVE TEXT                  MYFILE
     FINISH

     Once the file of text has been saved, the following jobs could be run in
which the GET TEXT command replaces the itemizing instructions in the previous
example.
     GET TEXT                    MYFILE
     EVERY WORD
     WORDLIST                    ALPHA
     FINISH

 and on some other run one could write:

     GET TEXT                    MYFILE
     SELECT WORDS
     *PATTERN                    *.ing
     CONCORDANCE                 KWIC, CITE 4 BY 4
     FINISH
 4.5 THE OUTPUT DETAILS COMMAND (optional)
     This command allows you to choose the maximum line width for wordlists
and citations to suit your particular lineprinter or terminal device.
When this command is absent, OUTPUT DETAILS WIDTH 120 is assumed.

Example
~~~~~~~
            OUTPUT DETAILS          WIDTH80
General form
~~~~~~~ ~~~~
            OUTPUT DETAILS          WIDTHnumber
                                         ~~~~~~
Parameters
~~~~~~~~~~
WIDTHnumber
     ~~~~~~
No more than number character positions will be reserved on the output device.
             ~~~~~~
All word lists will be packed into a line of this width.
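A rough model of this packing, in Python, is sketched below.  It assumes
fixed-width columns sized by the longest "frequency word" entry, which fits
the statement under WORDLIST that the number of words per line depends on the
maximum word length and the output line width; the exact column layout CLOC
uses is not described here, and pack_wordlist is a hypothetical name.

```python
def pack_wordlist(entries, width=120):
    """Pack (frequency, word) entries across lines of at most `width` characters.

    Assumed layout: every column is as wide as the longest entry plus two
    spaces.  A single entry wider than `width` still gets a line of its own.
    """
    if not entries:
        return []
    cells = [f"{freq} {word}" for freq, word in entries]
    column = max(len(c) for c in cells) + 2
    per_line = max(1, width // column)
    lines = []
    for start in range(0, len(cells), per_line):
        row = cells[start:start + per_line]
        lines.append("".join(c.ljust(column) for c in row).rstrip())
    return lines


for line in pack_wordlist([(12, "the"), (7, "and"), (3, "of"), (1, "cloc")], width=20):
    print(line)
```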

 4.6  WORD SELECTION COMMANDS
     This section describes how one can select, from the vocabulary of the
text, a collection of words for analysis.  This collection is used by
subsequent commands when performing sorting, and producing concordances and
collocations.  One can choose the entire vocabulary or select a portion of it.
An exclusion facility is provided which operates on the complete vocabulary or
on the portion selected.

     The command EVERY WORD specifies the entire vocabulary of the text.  The
SELECT WORDS command is used, in conjunction with several subsidiary commands,
to define a given portion of the vocabulary.  The EXCLUDING command can be
used to remove unwanted words and reduce the size of the above collection.  (A
further command INCLUDING is provided in case your exclusion commands remove
too many words.)

     The set of words defined using the SELECT WORDS, EXCLUDING, or INCLUDING
commands is specified by way of several subsidiary commands.  These are termed
set description commands and are described later.
~~~ ~~~~~~~~~~~

     To choose a collection of words one can use either of the following two
constructions, without supplying an exclusion list.
 (a)      EVERY WORD      The entire vocabulary is selected

 (b)      SELECT WORDS    The following set description
                                        ~~~ ~~~~~~~~~~~
          set description        describes the words to be used.
          ~~~ ~~~~~~~~~~~

     The exclusion list can be placed after either of the above constructions
to give the following alternatives.

 (c)      EVERY WORD              The entire vocabulary
          EXCLUDING               excluding
          set description         this set description is selected.
          ~~~ ~~~~~~~~~~~              ~~~ ~~~~~~~~~~~

 (d)      SELECT WORDS            The words specified in
          set description1        set description1 are used,
          ~~~ ~~~~~~~~~~~~        ~~~ ~~~~~~~~~~~~
          EXCLUDING               excluding those words in
          set description2        set description2.
          ~~~ ~~~~~~~~~~~~        ~~~ ~~~~~~~~~~~~

 (e)      EVERY WORD              The entire vocabulary
          EXCLUDING               excluding
          set description1        this set description1,
          ~~~ ~~~~~~~~~~~~             ~~~ ~~~~~~~~~~~~
          INCLUDING               but including
          set description2        set description2
          ~~~ ~~~~~~~~~~~~        ~~~ ~~~~~~~~~~~~

 (f)      SELECT WORDS            The words specified in
          set description1        this set description1
          ~~~ ~~~~~~~~~~~~             ~~~ ~~~~~~~~~~~~
          EXCLUDING               excluding
          set description2        this set description2
          ~~~ ~~~~~~~~~~~~             ~~~ ~~~~~~~~~~~~
          INCLUDING               but including
          set description3        the set description3
          ~~~ ~~~~~~~~~~~~            ~~~ ~~~~~~~~~~~~


 The commands EVERY WORD and SELECT WORDS set description define
                                          ~~~ ~~~~~~~~~~~
 a working set of words. You can then use the EXCLUDING commands to
 remove words from the working set, and the INCLUDING commands to
 add words to the working set. You can repeat the EXCLUDING and
 INCLUDING commands as often as you need to get precisely the
 collection of words that you are interested in.

 Here are a few examples:
            1.     SELECT WORDS
                   *PATTERN           *ing

            2.     SELECT WORDS
                   *PATTERN           *ing
                   EXCLUDING
                   *LIST OF WORDS     running jumping

            3.     EVERY WORD
                   EXCLUDING
                   *PATTERN           *ing
                   INCLUDING
                   *LIST OF WORDS     running jumping

 Example 1 selects all words that end with 'ing'.
 Example 2 selects all 'ing' words apart from 'running' and 'jumping'.
 Example 3 chooses the whole vocabulary less the 'ing' words,
 but with 'running' and 'jumping' included.


 Set description commands
 ~~~ ~~~~~~~~~~~ ~~~~~~~~
     These commands all have a star symbol (*) in column 1, showing that they
are subsidiary to the previous unstarred command.  Three ways of choosing
words are provided: by frequency of occurrence, by an explicit list, and by a
pattern.  Each of the following commands may be repeated as often as
required.

frequency of occurrence - The *FREQUENCY command.
~~~~~~~~~ ~~ ~~~~~~~~~~
     This command is used to choose a set of words each member of which has a
particular frequency of occurrence or lies in a given frequency range.

Examples
~~~~~~~~
 (a)     *FREQUENCY      (100 TO 500)
     This command will select only those words which occur between 100 and 500
times inclusive.

 (b)     *FREQUENCY      1 OR 4 OR >50
     This command will select words which occur exactly once, exactly four
times, or more than 50 times.

General form
~~~~~~~ ~~~~
      *FREQUENCY         expression
                         ~~~~~~~~~~
where expression is one or more terms connected by OR symbols.  A term is
      ~~~~~~~~~~                ~~~~                              ~~~~
one of the following:
 (a)  integer                 for example 10
      ~~~~~~~
      only words occurring exactly integer times will be selected.
                                   ~~~~~~~
 (b) >integer                 for example >10
      ~~~~~~~
      only words occurring more than integer times will be selected.
                                     ~~~~~~~
 (c) <integer                 for example <10
      ~~~~~~~
      only words occurring less than integer times will be selected.
                                     ~~~~~~~
 (d) (integer1 TO integer2)  for example (100 TO 500)
      ~~~~~~~~    ~~~~~~~~
      only words lying in the range integer1 to integer2 inclusive
                                    ~~~~~~~~    ~~~~~~~~
      will be selected. Note that integer1 must be smaller than
                                  ~~~~~~~~
      integer2.
      ~~~~~~~~
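The syntax of a *FREQUENCY expression is small enough to parse directly.  The
Python sketch below turns an expression into a test on a word's frequency;
frequency_predicate is a hypothetical name, and the sketch assumes terms are
separated by OR with spaces on either side, as in the examples.

```python
import re

# One term: "(n TO m)", ">n", "<n" or "n"
TERM = re.compile(r"\((\d+)\s+TO\s+(\d+)\)|([<>]?)(\d+)")


def frequency_predicate(expression):
    """Build a test for a *FREQUENCY expression such as '1 OR 4 OR >50'."""
    checks = []
    for term in re.split(r"\s+OR\s+", expression.strip()):
        m = TERM.fullmatch(term)
        if m is None:
            raise ValueError(f"unrecognised term: {term!r}")
        low, high, op, n = m.groups()
        if low is not None:                       # (integer1 TO integer2)
            lo, hi = int(low), int(high)
            if lo >= hi:
                raise ValueError("integer1 must be smaller than integer2")
            checks.append(lambda f, lo=lo, hi=hi: lo <= f <= hi)
        elif op == ">":                           # >integer
            checks.append(lambda f, k=int(n): f > k)
        elif op == "<":                           # <integer
            checks.append(lambda f, k=int(n): f < k)
        else:                                     # integer
            checks.append(lambda f, k=int(n): f == k)
    return lambda f: any(check(f) for check in checks)


selected = frequency_predicate("1 OR 4 OR >50")
print([f for f in (1, 2, 4, 50, 51) if selected(f)])  # [1, 4, 51]
```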

An explicit list - The *LIST OF WORDS command.
~~ ~~~~~~~~ ~~~~
     This command allows one to specify a set of words of interest by
supplying them explicitly.
Note that when the CLOC itemising strategy is in use, each item in the
explicit list will be mapped to a "word".  Thus you do not need to supply the
exact case of the letters nor include padding letters.
 Example
 ~~~~~~~

       *LIST OF WORDS      this that me you

 General form
 ~~~~~~~ ~~~~
       *LIST OF WORDS      list
                           ~~~~

where list is one or more words separated from each other by one or more
      ~~~~
spaces.

A Pattern - The *PATTERN command
~ ~~~~~~~
     This command specifies a skeletal form of a word, and causes the package
to select only those words which match the specified pattern.  Two reserved
characters are used within a pattern:
 (a) a dummy-symbol      which is .
 (b) a variable-symbol   which is *

     The dummy-symbol stands for any letter.
                                 ~~~ ~~~~~~

     The variable-symbol stands for "any sequence of letters, including none
at all".

     These reserved characters can be used in combination with the letter
                                                                   ~~~~~~
characters defined by the word definition commands, to construct a pattern.
~~~~~~~~~~
 (a)      *PATTERN          run*
 (b)      *PATTERN          *ing
 (c)      *PATTERN          pre*ed

 In (a) all words which start with 'run' are selected.
 In (b) all words which end with 'ing' are selected.
 In (c) all words which start with 'pre' and end in 'ed' are selected.
 (d)     *PATTERN       *ing   *ed
 (e)     *PATTERN       a*  b*  c*

     These examples show how more than one pattern can be included on the same
*PATTERN command line.  Each is separated from the next by at least one space.

     In (d) all words which end in 'ing' or end in 'ed' are selected.  This is
                                         ~~
equivalent to having a *PATTERN line for '*ing' and another one for '*ed'.

     In (e) all words which start with 'a' or 'b' or 'c' are selected.  One
                                           ~~     ~~
can use this feature to produce a full concordance in sections:  first the
'a', 'b', and 'c' words, then the 'd', 'e', and 'f' words, and so on.
 (f)     *PATTERN       ....
 (g)     *PATTERN       .h.a...
 (h)     *PATTERN       *...ing
 In (f) all four letter words will be chosen.
 In (g) all seven letter words with 2nd letter 'h' and 4th letter 'a' will
 be selected.
 In (h) all words of at least six letters which end in 'ing' will be
                     ~~ ~~~~~
 picked out.

     NOTE If, within a pattern, "*" and/or "." are being used as letters, the
     ~~~~
following option can be used to define your own variable and dummy symbols.

The DUMMYaVARIABLEb option
     This option must be used whenever a given pattern is to contain "*" or
"." as letters.  The revised symbols apply for the current *PATTERN command
only.
 Examples:
 (a)   *PATTERN      DUMMY?VARIABLE-    *run-
 (b)   *PATTERN      DUMMY.VARIABLE?    ?...ing*
In (a) the "?" temporarily replaces "." as the dummy-symbol;  the "-"
temporarily replaces "*" as the variable-symbol.  All words which start with
'*run' are selected.
In (b) all words at least seven letters long and ending with 'ing*' are
                 ~~ ~~~~~
selected.
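One way to think about patterns is to translate them into regular
expressions: the dummy-symbol becomes "any one character", the
variable-symbol becomes "any run of characters, possibly empty", and
everything else is matched literally.  The Python sketch below does this for
the default symbols and for the DUMMYaVARIABLEb option.  It is an illustration
under stated assumptions, not CLOC's matching code: the names are
hypothetical, and the case and padding mapping described in note 5 of the
General Form is not modelled.

```python
import re


def pattern_to_regex(pattern, dummy=".", variable="*"):
    """Translate one *PATTERN into a compiled regular expression (sketch).

    dummy stands for exactly one letter, variable for any sequence of
    letters (including none); every other character matches itself.
    """
    parts = []
    for ch in pattern:
        if ch == dummy:
            parts.append(".")
        elif ch == variable:
            parts.append(".*")
        else:
            parts.append(re.escape(ch))   # literal, even if it is "*" or "."
    return re.compile("".join(parts))


def matches(word, patterns, dummy=".", variable="*"):
    """True if the whole word matches any pattern on the *PATTERN line."""
    return any(pattern_to_regex(p, dummy, variable).fullmatch(word)
               for p in patterns)


print(matches("running", ["run*"]))                             # True
print(matches("walked", ["*ing", "*ed"]))                       # True
print(matches("*running", ["*run-"], dummy="?", variable="-"))  # True
```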

General Form
~~~~~~~ ~~~~
 either a) *PATTERN     pattern1 pattern2 pattern3 etc.
 or     b) *PATTERN     DUMMYaVARIABLEb pattern1 pattern2 pattern3 etc.
 Notes:
 1.   In a) DUMMY.VARIABLE* is implied before the first pattern.
 2.   At least one pattern must appear on the command.
 3.   A pattern consists of letters, the dummy-symbol, and the
                            ~~~~~~~
      variable-symbol in any combination.
 4.   In (b) the character "a" becomes the new dummy-symbol, overriding ".",
      the character "b" becomes the new variable-symbol, overriding "*".
 5.   When the CLOC itemising strategy is in use each explicit
      pattern will be carefully mapped to one which will match
      the various "words" in the vocabulary. Thus the pattern 'run*'
      will match "run" "running" "Run" "RUNNING" etc. Padding letters will
      be ignored since the vocabulary words do not contain them. For example,
      a) if ' was a padding letter then .... would match "don't"
      b) if ' was a deferred letter then .... would not match "don't"
        This is because in case a) padding letters are removed before the
      word is stored in the vocabulary table, so it looks like "dont".
        In practice it is sufficient for you to examine the vocabulary table
      printed using the WORDLIST command (q.v.) to find out what "words"
      are in the vocabulary.

     The above set description commands can follow each other.  When they do
so, the set of words chosen will be the sum of the words described by each
control statement.  Words defined on two or more command lines will be counted
once only.
Example
~~~~~~~
       *FREQUENCY          (100 TO 500)
       *LIST OF WORDS      me you we they
       *PATTERN            ...ing

     The above sequence of commands defines a set of words which contains
all words of frequency 100 to 500 inclusive, the words 'me', 'you', 'we', and
'they', and all six letter words which end in 'ing'.

 4.7 TASK SELECTION COMMANDS
     The following commands operate on the previously selected collection of
words.  Each command specifies an action to be performed.  Eleven tasks can be
selected:
 (A)   sorting the chosen words into alphabetic and/or frequency
       order

 (B)   printing a word-index

 (C)   producing a concordance of the chosen words

 (D)+  finding co-occurrences of words

 (E)   discovering the collocations within the context of the
       chosen words.

 (F)+  write out the itemised text

 (G)+  output a newline

 (H)+  output a newpage

 (I)+  output a message

 (J)+  include a comment

 (K)+  FINISH the run of the package

 Each command can be included as often as required. Those marked + above
 do not need to be preceded by any word selection statements.

 (A)  WORDLIST
     This command causes the package to sort the previously chosen collection
of words into order, and print them.  The type of sorted list that is produced
is determined by the keyword in the specification field.  This allows one to
produce word counts in alphabetic order, reverse alphabetic order, etc.  These
are printed across the page; the number per line is determined by the maximum
word length and output line width.  In all word lists each word is preceded by
its frequency of occurrence.

Examples:
~~~~~~~~
    a)   WORDLIST      ALPHA
    b)   WORDLIST      REVALPHA
    c)   WORDLIST      AFREQ
 In a) an alphabetic wordlist is produced.
 In b) a reverse alphabetic wordlist, i.e. one in rhyming order, is
       produced.
 In c) a wordlist in ascending frequency order is printed.

 General form
 ~~~~~~~ ~~~~
        WORDLIST       sorting criterion
                       ~~~~~~~ ~~~~~~~~~
 where sorting criterion can be one of the following:
       ~~~~~~~ ~~~~~~~~~
ALPHA
     This causes the package to sort the words into ascending alphabetic
order.  The collating order for letters is taken from the word definition
commands.

DALPHA
     This causes the package to sort the words into descending alphabetic
order.  The collating order for letters is taken from the word definition
commands.

REVALPHA
     This causes the package to sort the selected words into reverse
alphabetic order, in which words with similar endings sort together.  The
collating order for letters is taken from the word definition commands.

AFREQ
     This causes the package to sort the words into ascending frequency order.
Words having the same frequency of occurrence will be sorted in ascending
alphabetic order.

DFREQ
     This causes the package to sort the selected words into descending
frequency order.  Words having the same frequency of occurrence will be sorted
in ascending alphabetic order.

FIRST
     This causes the package to sort the selected words in the order in which
they first occur in the text.  Note that words are printed across the page.
     ~~~~~ ~~~~~

LAST
     This causes the package to sort the selected words in the order in which
they last occur in the text.  Note that words are printed across the page.
     ~~~~ ~~~~~

ALENGTH
     This causes the package to sort the selected words in ascending length
order, which is the ascending order of their length in characters ignoring
any deferred (or padding) letters.  Words of equal length will be sorted in
ascending alphabetic order.

DLENGTH
     This causes the package to sort the selected words in descending length
order, which is the descending order of their length in characters ignoring
any deferred (or padding) letters.  Words of equal length will be sorted in
ascending alphabetic order.

AXLENGTH
     This causes the package to sort the selected words in ascending extended
length order, which is the order of their length in characters including any
deferred (or padding) letters.  Words of equal length will be sorted in
ascending alphabetic order.

DXLENGTH
     This causes the package to sort the selected words in descending extended
length order, which is the descending order of their length in characters
including any deferred (or padding) letters.  Words of equal length will be
sorted in ascending alphabetic order.

     In each of the above cases, each word printed is preceded by its
frequency of occurrence in the text.  By default, the words and their
frequencies are printed across the page, rather than in columns.  This is done
so that you can easily write a program (say in SNOBOL or BASIC) to reformat
the output from the CLOC package as suitable data for another package, say for
statistical analysis or graph plotting.
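The sorting criteria can be summarised as sort keys.  The Python sketch below
is a hypothetical model that uses plain Python string order in place of the
*LETTERS collating order; ties are broken alphabetically as described above,
and the length used by ALENGTH and DLENGTH leaves out deferred letters.

```python
from collections import Counter


def wordlist(tokens, criterion, deferred=""):
    """Return (frequency, word) pairs sorted by a WORDLIST criterion (sketch)."""
    freq = Counter(tokens)
    first, last = {}, {}
    for i, w in enumerate(tokens):
        first.setdefault(w, i)   # position of first occurrence
        last[w] = i              # position of last occurrence

    def core_len(w):             # length ignoring deferred letters
        return sum(ch not in deferred for ch in w)

    keys = {
        "ALPHA":    lambda w: w,
        "REVALPHA": lambda w: w[::-1],          # rhyming order
        "AFREQ":    lambda w: (freq[w], w),
        "DFREQ":    lambda w: (-freq[w], w),
        "FIRST":    lambda w: first[w],
        "LAST":     lambda w: last[w],
        "ALENGTH":  lambda w: (core_len(w), w),
        "DLENGTH":  lambda w: (-core_len(w), w),
        "AXLENGTH": lambda w: (len(w), w),
        "DXLENGTH": lambda w: (-len(w), w),
    }
    if criterion == "DALPHA":
        words = sorted(freq, reverse=True)
    else:
        words = sorted(freq, key=keys[criterion])
    return [(freq[w], w) for w in words]


tokens = "the cat sat on the mat the end".split()
print(wordlist(tokens, "DFREQ"))
# [(3, 'the'), (1, 'cat'), (1, 'end'), (1, 'mat'), (1, 'on'), (1, 'sat')]
```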

(B) INDEX
     This command instructs the package to print a word index of the selected
words.  The parameters in the specification field allow you to presort the
keywords, and optionally supply the form of text reference that will be used.
 Examples:
 ~~~~~~~~
 a)   INDEX
 b)   INDEX            ALPHA
 c)   INDEX            REVALPHA,REFS P2 L2
 d)   INDEX            NOREFS

     Examples a) and b) produce the same word index.  The keywords are
alphabetically sorted.  Each reference to a line is specified by a simple line
number.

     In c) a reverse alphabetic word index is produced.  Each reference gives
information about "Page number" and "Line within page" assuming that P and L
references are included in the text for each page.

     In d) an alphabetical word index is produced.  No text references of any
kind are printed.

      General Form
      ~~~~~~~ ~~~~
      INDEX           sorting criterion, reference
                      ~~~~~~~ ~~~~~~~~~  ~~~~~~~~~

     where sorting criterion is one of ALPHA, DALPHA, REVALPHA, AFREQ, DFREQ,
           ~~~~~~~ ~~~~~~~~~
FIRST, LAST, ALENGTH, DLENGTH, AXLENGTH, DXLENGTH.  The chosen keyword set is
sorted into this order before the word index is produced.

reference allows each word to be referenced with portions of references which
~~~~~~~~~
are embedded in the text data.  The instruction takes the form:
 Example
 ~~~~~~~
                  REFS A4P2L6
 General Form
 ~~~~~~~ ~~~~
                  REFS letter number letter number ... letter number 
                       ~~~~~~ ~~~~~~ ~~~~~~ ~~~~~~ ~~~ ~~~~~~ ~~~~~~

     The letter must be from A to Z, and identifies an embedded text
         ~~~~~~
reference.  The number of characters printed for the reference is given by
number.  When printed, each reference will be separated from the next by one
~~~~~~
space.
                  NOREFS
 When this keyword is present no text references of any kind will be printed.

 Defaults
 ~~~~~~~~

 1.   When sorting criterion is absent, ALPHA is assumed.
           ~~~~~~~ ~~~~~~~~~
 2.   When reference is absent, an absolute record number is used.
           ~~~~~~~~~
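The REFS instruction can be read as a list of letter and width pairs.  The
Python sketch below parses both spellings used in this guide (REFS A4P2L6 and
REFS P2 L2) and formats the current reference values.  How CLOC pads or
truncates a value that does not fit its width is not stated here;
right-justifying and keeping the rightmost characters is our assumption, and
parse_refs and format_refs are hypothetical names.

```python
import re


def parse_refs(spec):
    """Parse 'REFS P2 L2' or 'REFS A4P2L6' into [(letter, width), ...]."""
    body = spec.strip()
    if not body.startswith("REFS"):
        raise ValueError("expected REFS")
    pairs = re.findall(r"([A-Z])\s*(\d+)", body[4:])
    return [(letter, int(width)) for letter, width in pairs]


def format_refs(pairs, current):
    """Format the current reference values, fields separated by one space.

    Assumption: each value is right-justified in `width` characters and,
    if too long, cut down to its rightmost `width` characters.
    """
    fields = []
    for letter, width in pairs:
        value = str(current.get(letter, ""))
        fields.append(value[-width:].rjust(width))
    return " ".join(fields)


print(repr(format_refs(parse_refs("REFS P2 L2"), {"P": 7, "L": 12})))  # ' 7 12'
```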


(C) CONCORDANCE
     This command instructs the package to print a concordance of the selected
words.  The parameters in the specification field allow the user to presort
the keywords;  to choose the citation style;  to select a citation width;  and
optionally supply a text reference.
 Examples:
 ~~~~~~~~
 a)   CONCORDANCE
 b)   CONCORDANCE      ALPHA,KWIC,CITE 4 BY 4
 c)   CONCORDANCE      REVALPHA,CITE 6 BY 6,REFS P2 L2
 d)   CONCORDANCE      CITE FROM.TO.INCLUSIVE
 e)   CONCORDANCE      REVALPHA,LEFT,CITE FROM/TO/EXCLUSIVE,REFS S2 P3 L2
 f)   CONCORDANCE      REVALPHA,CITE 6 BY 6,NOREFS
 g)   CONCORDANCE      CITE 4 BY 4, ABOUT NODE-1

     Examples a) and b) produce the same concordance.  The keywords are
alphabetically sorted.  Keyword in context citations are printed which have
four words on either side of the word of interest.  Each line is identified by
a simple record number.

     In c) a reverse alphabetic KWIC concordance is produced with six words of
context on either side of the keyword.  Each line printed is prefixed by text
reference information giving "Page number" and "Line within page" assuming
that P and L references are included in the text for each page.

     In d) an alphabetical KWIC concordance is printed.  The keyword is
surrounded by as many words as possible up to and including a "." character.
By this means a sentence of context is printed.

     In e) assuming that "/" has been declared as a "newline" character (see
INPUT DETAILS command), this example will print a reverse alphabetic
concordance.  Each citation will consist of a full line of context, left
justified and prefixed with "section", "page" and "line number" information,
assuming that S, P and L references have been included in the text.

     In f) the same concordance as in c) will be produced but no text
references will precede the citations.

     In g) the same concordance as in b) will be produced but the citations
will be printed with the word before the keyword centralised on the line.
 General Form
 ~~~~~~~ ~~~~
   CONCORDANCE   sorting criterion, style, citation width, offset, reference
                 ~~~~~~~ ~~~~~~~~~  ~~~~~  ~~~~~~~~ ~~~~~  ~~~~~~  ~~~~~~~~~

     where sorting criterion is one of ALPHA, DALPHA, REVALPHA, AFREQ, DFREQ,
           ~~~~~~~ ~~~~~~~~~
FIRST, LAST, ALENGTH, DLENGTH, AXLENGTH, DXLENGTH.  The chosen keyword set is
sorted into this order before the concordance is produced.

style is the type of concordance required.  This can be one of two kinds,
~~~~~
namely:

     1.   KWIC - key word in context, in which the word of interest is
         centralised on the print line.  CENT can be used as a synonym for
         KWIC.

     2.   LEFT - in which the line of context is printed as far to the left as
         possible.

citation width indicates the amount of context to be printed.  This takes the
~~~~~~~~ ~~~~~
form:

 either:  CITE integer1 BY integer2
               ~~~~~~~~    ~~~~~~~~
          in which integer1 words are printed before the keyword, and
                   ~~~~~~~~
          integer2 words are printed after the keyword.
          ~~~~~~~~

 or:      CITE FROMchar1TOchar2INCLUSIVE
                   ~~~~~  ~~~~~
 or:      CITE FROMchar1TOchar2EXCLUSIVE
                   ~~~~~  ~~~~~

     This option causes the package to print the citations between two given
characters char1 and char2.  The left context begins with character char1, and
           ~~~~~     ~~~~~                                          ~~~~~
the right context ends with char2.  When EXCLUSIVE is present the characters
                            ~~~~~
char1 and char2 are removed from the printed line; INCLUSIVE retains them.
~~~~~     ~~~~~
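The FROM...TO form can be sketched in Python as a search outwards from the
keyword for the bounding characters.  This is a hypothetical model: it treats
the text as one string, runs to the start or end of the text when no bounding
character is found, and keeps char1 and char2 for INCLUSIVE but drops them for
EXCLUSIVE, following examples d) and e) above.  The name cite_between is
ours.

```python
def cite_between(text, start, end, char1, char2, inclusive=True):
    """Cut the citation for a keyword occupying text[start:end] (sketch)."""
    lo = text.rfind(char1, 0, start)   # nearest char1 before the keyword, or -1
    hi = text.find(char2, end)         # nearest char2 after the keyword, or -1
    lo = 0 if lo == -1 else (lo if inclusive else lo + 1)
    hi = len(text) if hi == -1 else (hi + 1 if inclusive else hi)
    return text[lo:hi].strip()


text = "It was late. The cat sat down. Then it slept."
k = text.index("cat")
print(cite_between(text, k, k + 3, ".", ".", inclusive=True))   # . The cat sat down.
print(cite_between(text, k, k + 3, ".", ".", inclusive=False))  # The cat sat down
```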

offset is optional and when present takes the form
~~~~~~
     either:      ABOUT NODE+integer
                             ~~~~~~~
         or:      ABOUT NODE-integer
                             ~~~~~~~

     This option allows citations to be printed about words near to the
keyword.  For example, ABOUT NODE+1 causes all citations to be printed as if
the word to the right of the keyword were being used for citations.  Similarly,
ABOUT NODE-1 chooses the word to the left of the keyword when citations are
printed.  The default value of offset is ABOUT NODE+0.
                               ~~~~~~
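A single KWIC citation line can be sketched in Python as below.  It is a
hypothetical model: it assumes the CITE counts are taken around the node word
selected by ABOUT NODE+offset, centres that node at half the line width, and
trims the left context from the left when it is too long.  The name kwic_line
is ours.

```python
def kwic_line(words, position, before=4, after=4, offset=0, width=80):
    """One KWIC citation for words[position], centred on the node (sketch).

    before, after -- as in CITE before BY after
    offset        -- as in ABOUT NODE+offset
    """
    node = position + offset
    if not 0 <= node < len(words):
        raise IndexError("node word lies outside the text")
    left = " ".join(words[max(0, node - before):node])
    right = " ".join(words[node + 1:node + 1 + after])
    room = width // 2 - 1                      # columns available left of the node
    left = left[max(0, len(left) - room):]     # trim from the left if needed
    line = left.rjust(room) + " " + words[node]
    if right:
        line += " " + right
    return line[:width]


words = "now is the winter of our discontent made glorious summer".split()
print(kwic_line(words, 3, 2, 2, 0, 40))   # node 'winter' starts in column 21
print(kwic_line(words, 3, 2, 2, -1, 40))  # ABOUT NODE-1: centred on 'the'
```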

reference allows each citation to be prefixed with portions of references
~~~~~~~~~
which are embedded in the text data.  The instruction takes the form:
 Example
 ~~~~~~~
                  REFS A4P2L6
 General Form
 ~~~~~~~ ~~~~
                  REFS letter number letter number ... letter number 
                       ~~~~~~ ~~~~~~ ~~~~~~ ~~~~~~ ~~~ ~~~~~~ ~~~~~~

     The letter must be from A to Z, and identifies an embedded text
         ~~~~~~
reference.  The number of characters printed for the reference is given by
number.  When printed, each reference will be separated from the next by one
~~~~~~
space.
                  NOREFS
 When this keyword is present no text references of any kind will be printed.

 Defaults
 ~~~~~~~~

 1.   When sorting criterion is absent, ALPHA is assumed.
           ~~~~~~~ ~~~~~~~~~
 2.   When style is absent, KWIC is assumed.
           ~~~~~
 3.   When citation width is absent, CITE 4 BY 4 is assumed.
           ~~~~~~~~ ~~~~~
 4.   When offset is absent, ABOUT NODE+0 is assumed.
           ~~~~~~
 5.   When reference is absent, an absolute record number is used.
           ~~~~~~~~~

(D)+ CO-OCCURRENCE
     This command is used when you know two or more words and need to study
how they occur in a text.  The parameters in the specification field allow you
to choose a style for presenting the results;  to select a citation width and
to optionally supply a text reference.  Subsidiary commands enable you to
choose word pairs or phrases of interest and also to choose a series of words
separated by an arbitrary word distance.  Note:  The CO-OCCURRENCE command
                                          ~~~~
need not be preceded by any word selection commands.
 Examples:
 a)  CO-OCCURRENCE
     *PHRASE              you are
     *PHRASE              now is the winter
 b)  CO-OCCURRENCE        CITE 8 BY 8, REFS P2 L2
     *PHRASE              just man
     *SERIES              a UPTO6 goat
     *SERIES              how GAP2 see
     *SERIES              he UPTO3 miner GAP1 and
 c) CO-OCCURRENCE
     *PATTERN             *ing *ly

     Example a) produces 4 BY 4 KWIC citations of every position in the text
where the phrase "you are" occurs.  After these have been printed, all
occurrences of the phrase "now is the winter" are found.  In both cases
punctuation is ignored, so occurrences are found even when punctuation
separates the words of the phrase.

     In example b), 8 BY 8 citations are printed centralised on the page, each
prefixed with a 2 figure page number and a 2 figure line number.  After all
occurrences of the phrase "just man" have been printed, the three *SERIES
commands are obeyed in order.  The first command finds all occurrences of
"a ...  goat" where "a" and "goat" are separated from each other by 0, 1, 2,
3, 4, 5 or 6 words of context.  The second command finds all occurrences of
"how" and "see" separated from each other by precisely two arbitrary words.
                                             ~~~~~~~~~
The third *SERIES command shows how you can mix the UPTOnumber and GAPnumber
                                                        ~~~~~~        ~~~~~~
options within one specification.  This example will find all occurrences of
"he" "miner" "and" where "he" and "miner" are separated by 0, 1, 2 or 3
arbitrary words, and "miner" and "and" are separated by exactly 1 arbitrary
word.  Example c) shows how you can also supply a list of patterns, which are
searched for in the order given; such a list represents a skeletal form of a
phrase.  When citations are printed, the node for offset purposes is the
first word of a *PHRASE, a *SERIES or a *PATTERN.  The offset option allows
you to shift the citation left or right as required.
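     A *PHRASE, and a *PATTERN list read as a skeletal phrase, can both be
seen as matching one item against each of several consecutive words.  The
Python sketch below assumes that reading (so *ing *ly means a word ending in
"ing" immediately followed by a word ending in "ly"), uses the standard
fnmatch wildcards in place of CLOC's own pattern rules, and works on a word
list from which punctuation has already been removed.  The name find_sequence
is hypothetical.

```python
from fnmatch import fnmatchcase


def find_sequence(words, items):
    """Return the start index of every run of consecutive words matching items.

    Literal words (as in *PHRASE) match only themselves; items containing
    '*' (as in *PATTERN) are treated as fnmatch wildcards.
    """
    n = len(items)
    return [
        i
        for i in range(len(words) - n + 1)
        if all(fnmatchcase(words[i + j], item) for j, item in enumerate(items))
    ]


text = "now is the winter of our discontent".split()
print(find_sequence(text, ["now", "is", "the", "winter"]))
print(find_sequence("she was running quickly and singing softly".split(), ["*ing", "*ly"]))
```

Note that fnmatch also gives "?" and "[...]" special meanings, which CLOC
patterns may not share.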

 General Form
 ~~~~~~~ ~~~~
      CO-OCCURRENCE   style ,  citation width , offset, reference
                      ~~~~~ ~  ~~~~~~~~ ~~~~~ ~ ~~~~~~~ ~~~~~~~~~
      *PHRASE         list
                      ~~~~
      *SERIES         word type number word ... type number word
                      ~~~~ ~~~~ ~~~~~~ ~~~~ ~~~ ~~~~ ~~~~~~ ~~~~
      *PATTERN     pattern1 pattern2 pattern3 etc.
 (or) *PATTERN     DUMMYaVARIABLEb pattern1 pattern2 pattern3 etc.
style is the type of citation required.  This can be one of two kinds, namely:
~~~~~

     1.   KWIC - key word in context, in which the word of interest is
         centralised on the print line.  CENT can be used as a synonym for
         KWIC.

     2.   LEFT - in which the line of context is printed as far to the left as
         possible.


citation width indicates the amount of context to be printed.  This takes the
~~~~~~~~ ~~~~~
form:
 either:  CITE integer1 BY integer2
               ~~~~~~~~    ~~~~~~~~
          in which integer1 words are printed before the keyword, and
                   ~~~~~~~~
          integer2 words are printed after the keyword.
          ~~~~~~~~

 or:      CITE FROMchar1TOchar2INCLUSIVE
                   ~~~~~  ~~~~~
 or:      CITE FROMchar1TOchar2EXCLUSIVE
                   ~~~~~  ~~~~~

     This option causes the package to print the citations between two given
characters char1 and char2.  The left context begins with character char1, and
           ~~~~~     ~~~~~                                          ~~~~~
the right context ends with char2.  When INCLUSIVE is present the characters
                            ~~~~~
char1 and char2 are removed from the printed line.
~~~~~     ~~~~~

offset is optional and when present takes the form
~~~~~~
     either:      ABOUT NODE+integer
                             ~~~~~~~
         or:      ABOUT NODE-integer
                             ~~~~~~~

     This option allows citations to be printed about words near to the
keyword.  For example, ABOUT NODE+1 causes all citations to be printed as if
the word to the right of the keyword were being used for citations.  Similarly,
ABOUT NODE-1 chooses the word to the left of the keyword when citations are
printed.  The default value of offset is ABOUT NODE+0.  The node for offset
                               ~~~~~~
purposes is the first word of a *PHRASE, a *SERIES or a *PATTERN.

reference allows each citation to be prefixed with portions of references
~~~~~~~~~
which are embedded in the text data.  The instruction takes the form:
 Example
 ~~~~~~~
                  REFS A4P2L6
 General Form
 ~~~~~~~ ~~~~
                  REFS letter number letter number ... letter number 
                       ~~~~~~ ~~~~~~ ~~~~~~ ~~~~~~ ~~~ ~~~~~~ ~~~~~~

     The letter must be from A to Z, and identifies an embedded text
         ~~~~~~
reference.  The number of characters printed for the reference is given by
number.  When printed, each reference will be separated from the next by one
~~~~~~
space.
                  NOREFS
 When this option is present no text references of any kind will be printed.

 Defaults
 ~~~~~~~~

 1.   When style is absent, KWIC is assumed.
           ~~~~~
 2.   When citation width is absent, CITE 4 BY 4 is assumed.
           ~~~~~~~~ ~~~~~
 3.   When offset is absent, ABOUT NODE+0 is assumed.
           ~~~~~~
 4.   When reference is absent, an absolute record number is used.
           ~~~~~~~~~

 type number is either UPTOnumber or GAPnumber, and where
 ~~~~ ~~~~~~               ~~~~~~       ~~~~~~
       UPTOnumber means 0, 1, 2, 3, ... up to number arbitrary words may
           ~~~~~~                             ~~~~~~
            occur at this position.

       GAPnumber means exactly number arbitrary words occur at
          ~~~~~~               ~~~~~~
                this position.


 list is one or more words separated from each other by spaces.
 ~~~~


Note
~~~~

     1.   Each word must be separated from the type and number by at least one
                                               ~~~~     ~~~~~~
         space.

     2.   The number can be 0, in which case UPTO0 and GAP0 are equivalent and
              ~~~~~~
         indicate that the two words are to be adjacent.  *PHRASE is the
         degenerate case of a *SERIES in which all numbers are zero.
                                                   ~~~~~~~

     3.   *PHRASE, *SERIES and *PATTERN commands can be repeated as often
         as required.
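     The UPTO and GAP rules can be expressed as a small search.  This Python
sketch is illustrative only: the names parse_series and find_series, the
pre-split lower-case word list, and the choice to report every combination of
positions (so that one "a" may pair with two later occurrences of "goat") are
assumptions rather than documented CLOC behaviour.

```python
import re


def parse_series(spec):
    """Split a *SERIES specification such as 'he UPTO3 miner GAP1 and'.

    Returns the words and, for each gap between them, the (fewest, most)
    number of arbitrary words allowed: UPTOn gives (0, n), GAPn gives (n, n).
    """
    tokens = spec.split()
    if not tokens:
        raise ValueError("empty *SERIES specification")
    words, gaps = [tokens[0]], []
    for i in range(1, len(tokens), 2):
        m = re.fullmatch(r"(UPTO|GAP)(\d+)", tokens[i])
        if m is None or i + 1 >= len(tokens):
            raise ValueError(f"bad *SERIES specification: {spec!r}")
        n = int(m.group(2))
        gaps.append((0, n) if m.group(1) == "UPTO" else (n, n))
        words.append(tokens[i + 1])
    return words, gaps


def find_series(text, words, gaps):
    """Return a tuple of word positions for every occurrence of the series."""
    hits = []

    def extend(pos, k, found):
        if k == len(words):
            hits.append(tuple(found))
            return
        low, high = gaps[k - 1]
        for gap in range(low, high + 1):
            nxt = pos + gap + 1  # skip `gap` arbitrary words
            if nxt < len(text) and text[nxt] == words[k]:
                extend(nxt, k + 1, found + [nxt])

    for i, w in enumerate(text):
        if w == words[0]:
            extend(i, 1, [i])
    return hits


print(find_series("how do i see it".split(), *parse_series("how GAP2 see")))
```

Because UPTO0 and GAP0 both produce the gap (0, 0), a *PHRASE behaves here as
the special case of a *SERIES in which every gap is zero.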


(E) COLLOCATIONS
     This command instructs the package to examine the collocates in the
context surrounding each selected word.  Those collocates which are found to
have a significant affinity to the selected word will have their context
printed, so that closely associated pairs of words can be studied.  A
collocate is counted whenever it occurs within a given number of words to the
left or right of the selected word.  This region is termed a
span, and its size can be chosen using the *SPAN subsidiary command.
~~~~
 Examples
 ~~~~~~~~
     COLLOCATIONS     ALPHA,KWIC,CITE 4 BY 4
     COLLOCATIONS     REVALPHA,CITE 6 BY 6,REFS P2 L2
     COLLOCATIONS     CITE 5 BY 5, ABOUT NODE+1
     COLLOCATIONS     CITE 6 BY 6, ABOUT COLLOCATE
     COLLOCATIONS     CITE FROM.TO.EXCLUSIVE, ABOUT COLLOCATE-1
     COLLOCATIONS     FIRST,CONDENSED
 General Form
 ~~~~~~~ ~~~~
 a) COLLOCATIONS sorting criterion, style, citation width, offset, reference
                 ~~~~~~~ ~~~~~~~~~~ ~~~~~~ ~~~~~~~~ ~~~~~~ ~~~~~~~ ~~~~~~~~~
 b) COLLOCATIONS sorting criterion, CONDENSED
                 ~~~~~~~ ~~~~~~~~~
where sorting criterion is one of ALPHA, DALPHA, REVALPHA, AFREQ, DFREQ,
      ~~~~~~~ ~~~~~~~~~
FIRST, LAST, ALENGTH, DLENGTH, AXLENGTH, DXLENGTH.  The chosen keyword set is
sorted into this order before the collocations are produced.
style is the type of citation required.  This can be one of two kinds, namely:
~~~~~

     1.   KWIC - key word in context, in which the word of interest is
         centralised on the print line.  CENT can be used as a synonym for
         KWIC.

     2.   LEFT - in which the line of context is printed as far to the left as
         possible.

citation width indicates the amount of context to be printed.  This takes the
~~~~~~~~ ~~~~~
form:
 either:  CITE integer1 BY integer2
               ~~~~~~~~    ~~~~~~~~
          in which integer1 words are printed before the keyword, and
                   ~~~~~~~~
          integer2 words are printed after the keyword.
          ~~~~~~~~

 or:      CITE FROMchar1TOchar2INCLUSIVE
                   ~~~~~  ~~~~~
 or:      CITE FROMchar1TOchar2EXCLUSIVE
                   ~~~~~  ~~~~~

     This option causes the package to print the citations between two given
characters char1 and char2.  The left context begins with character char1, and
           ~~~~~     ~~~~~                                          ~~~~~
the right context ends with char2.  When INCLUSIVE is present the characters
                            ~~~~~
char1 and char2 are removed from the printed line.
~~~~~     ~~~~~

offset is optional and when present takes the form
~~~~~~
     either:      ABOUT NODE+integer
                             ~~~~~~~
         or:      ABOUT NODE-integer
                             ~~~~~~~
         or:      ABOUT COLLOCATE+integer
                                  ~~~~~~~
         or:      ABOUT COLLOCATE-integer
                                  ~~~~~~~

     This option allows citations to be printed about words near to the node
or collocate.  For example, ABOUT NODE+1 causes all citations to be printed as
if the word to the right of the node were being used for citations.  Similarly,
ABOUT NODE-1 chooses the word to the left of the node when citations are
printed.  The options ABOUT COLLOCATE+1 and ABOUT COLLOCATE-1 do the same,
except that centralisation is done using the collocate.  When neither
+integer nor -integer is given, +0 is assumed; thus ABOUT COLLOCATE
 ~~~~~~~      ~~~~~~~
centralises the citation on the collocate itself.
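     The following Python sketch shows one way an offset specification might
be interpreted.  The function names are hypothetical, and clamping the shifted
position to the ends of the text is an assumption: the guide does not say what
happens when the shifted word would fall outside the text.

```python
import re


def parse_offset(spec):
    """Parse 'ABOUT NODE+1', 'ABOUT COLLOCATE-1', 'ABOUT COLLOCATE', ...

    Returns (anchor, shift); a missing +integer or -integer means +0.
    """
    m = re.fullmatch(r"ABOUT\s+(NODE|COLLOCATE)\s*([+-]\d+)?", spec.strip())
    if m is None:
        raise ValueError(f"bad offset: {spec!r}")
    return m.group(1), int(m.group(2) or 0)


def centre_position(spec, node_pos, collocate_pos, n_words):
    """Index of the word on which the citation is centralised.

    Assumption: the result is clamped to the first and last word of the text.
    """
    anchor, shift = parse_offset(spec)
    base = node_pos if anchor == "NODE" else collocate_pos
    return min(max(base + shift, 0), n_words - 1)


print(centre_position("ABOUT COLLOCATE+1", 5, 8, 20))
```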

reference allows each citation to be prefixed with portions of references
~~~~~~~~~
which are embedded in the text data.  The instruction takes the form:
 Example
 ~~~~~~~
                  REFS A4P2L6
 General Form
 ~~~~~~~ ~~~~
                  REFS letter number letter number ... letter number 
                       ~~~~~~ ~~~~~~ ~~~~~~ ~~~~~~ ~~~ ~~~~~~ ~~~~~~

     The letter must be from A to Z, and identifies an embedded text
         ~~~~~~
reference.  The number of characters printed for the reference is given by
number.  When printed, each reference will be separated from the next by one
~~~~~~
space.
                  NOREFS
 When this keyword is present no text references of any kind will be printed.
 Defaults
 ~~~~~~~~

 1.   When sorting criterion is absent, ALPHA is assumed.
           ~~~~~~~ ~~~~~~~~~
 2.   When style is absent, KWIC is assumed.
           ~~~~~
 3.   When citation width is absent, CITE 4 BY 4 is assumed.
           ~~~~~~~~ ~~~~~
 4.   When offset is absent, ABOUT NODE+0 is assumed.
           ~~~~~~
 5.   When reference is absent, an absolute record number is used.
           ~~~~~~~~~


For b):

     The CONDENSED option causes the discovered collocates to be listed in a
simple tabular form.  This gives a quick look at an author's word
associations, allowing one to choose accurately which nodes to select and
which collocates to reject.  The following few lines illustrate the format of
the table produced by this command.

          NODE               COLLOCATE            PAIR
          ~~~~               ~~~~~~~~~            ~~~~
          25 ship               99 the               3
          16 target              5 house             2
           8 weston              8 master            8
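     The guide does not define the figures in the CONDENSED table.  Assuming
that the number beside NODE and beside COLLOCATE is each word's frequency in
the text, and that PAIR is the number of times the collocate falls within the
span of the node, a table of that shape could be produced as in this Python
sketch.  The name condensed is hypothetical; the 4 BY 4 default span mirrors
the *SPAN default described below.

```python
from collections import Counter


def condensed(words, nodes, left=4, right=4):
    """Rows of (node freq, node, collocate freq, collocate, pair count).

    Every word within `left` words before or `right` words after an
    occurrence of the node is counted (an unrestricted span).
    """
    freq = Counter(words)
    rows = []
    for node in nodes:
        pairs = Counter()
        for i, w in enumerate(words):
            if w != node:
                continue
            for j in range(max(0, i - left), min(len(words), i + right + 1)):
                if j != i:
                    pairs[words[j]] += 1
        for collocate in sorted(pairs):
            rows.append((freq[node], node, freq[collocate], collocate, pairs[collocate]))
    return rows


for row in condensed("the ship sailed the ship sank".split(), ["ship"], 1, 1):
    print("{:>3} {:<16}{:>4} {:<17}{}".format(*row))
```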

Subsidiary commands to the COLLOCATIONS command
~~~~~~~~~~ ~~~~~~~~ ~~ ~~~ ~~~~~~~~~~~~ ~~~~~~~
The *SPAN command
     This command specifies the range of searching that is done when the
package performs a collocation analysis.  The specification field of this
command defines the range which will be searched in terms of the number of
words to the left and right of the word of interest.
 Example 1
 ~~~~~~~ ~
            *SPAN              4 BY 4
 Example 2
 ~~~~~~~ ~
            *SPAN              4 BY 4 RESTRICTED
General form
~~~~~~~ ~~~~
            *SPAN              integer1 BY integer2 qualifier
                               ~~~~~~~~    ~~~~~~~~ ~~~~~~~~~
where integer1 indicates the number of words to be searched before the word of
      ~~~~~~~~
interest and integer2 indicates the number of words after the word of
             ~~~~~~~~
interest.  qualifier is either UNRESTRICTED or RESTRICTED.  UNRESTRICTED means
           ~~~~~~~~~
that all words in the left and right span will be counted as collocates.
RESTRICTED means that when two occurrences of the node are closer together
than integer1+integer2 words, collocates lying in both spans will be counted
     ~~~~~~~~ ~~~~~~~~
once only, and an occurrence of the node will not be counted as a collocate.

     Note that the citation width may be narrower than the span.  In that
                   ~~~~~~~~ ~~~~~                          ~~~~
case some collocates will appear to be absent from the printed context,
although they have been found in the text.  Normally the citation width
                                                         ~~~~~~~~ ~~~~~
should be no narrower than the span.
                               ~~~~
Default
~~~~~~~
When the *SPAN command is absent the value taken for it will be:
                    *SPAN        4 BY 4
 When qualifier is absent UNRESTRICTED is assumed.
      ~~~~~~~~~
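     The difference between the two qualifiers is easiest to see in code.
This Python sketch counts collocates under both.  It follows the wording
above, but reading RESTRICTED as "count each text position once, and skip
other occurrences of the node" is an interpretation, and the names parse_span
and collocates are hypothetical.

```python
from collections import Counter


def parse_span(spec="4 BY 4"):
    """Parse *SPAN specifications such as '4 BY 4' or '4 BY 4 RESTRICTED'."""
    parts = spec.split()
    if len(parts) not in (3, 4) or parts[1] != "BY":
        raise ValueError(f"bad *SPAN specification: {spec!r}")
    qualifier = parts[3] if len(parts) == 4 else "UNRESTRICTED"
    if qualifier not in ("RESTRICTED", "UNRESTRICTED"):
        raise ValueError(f"bad qualifier: {qualifier!r}")
    return int(parts[0]), int(parts[2]), qualifier == "RESTRICTED"


def collocates(words, node, spec="4 BY 4"):
    """Count the collocates of `node` found within the span."""
    left, right, restricted = parse_span(spec)
    counts, seen = Counter(), set()
    for i, w in enumerate(words):
        if w != node:
            continue
        for j in range(max(0, i - left), min(len(words), i + right + 1)):
            if j == i:
                continue
            if restricted:
                # Overlapping spans: count each text position once,
                # and never count another occurrence of the node.
                if j in seen or words[j] == node:
                    continue
                seen.add(j)
            counts[words[j]] += 1
    return counts


words = "the cat saw the dog".split()
print(collocates(words, "the", "2 BY 2"))
print(collocates(words, "the", "2 BY 2 RESTRICTED"))
```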

The *FREQUENCY command
     This command is used to select significant collocates according to their
frequency of occurrence.  Only those collocates which occur within the
specified frequency limits will have their citations printed.  Note that the
frequency of occurrence of a collocate (the number of times it is found within
the spans of the selected words) is different from the frequency of
occurrence of the same object treated as a word.
                                           ~~~~
 Example
 ~~~~~~~
                    *FREQUENCY       (100 TO 500)
 General Form
 ~~~~~~~ ~~~~
                    *FREQUENCY       expression
                                     ~~~~~~~~~~
 where expression is one or more terms connected by OR symbols.  A
       ~~~~~~~~~~                ~~~~
 term is one of the following:
 ~~~~

 (a) integer                      for example 10
     ~~~~~~~
       Only collocates occurring exactly integer times will be selected.
                                         ~~~~~~~

 (b) >integer                     for example >10
      ~~~~~~~
       Only collocates occurring more than integer times will be selected.
                                           ~~~~~~~

 (c) <integer                     for example <10
      ~~~~~~~
       Only collocates occurring less than integer times will be selected.
                                           ~~~~~~~

 (d) (integer1 TO integer2)      for example (100 TO 500)
      ~~~~~~~     ~~~~~~~
       Only collocates lying in the range integer1 to integer2 inclusive
                                          ~~~~~~~     ~~~~~~~
       will be selected.  Note that integer1 must be smaller than
                                    ~~~~~~~
       integer2.
       ~~~~~~~

Default
~~~~~~~
When the *FREQUENCY command is absent the package assumes the following value
for it.
                   *FREQUENCY           >1
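     The four kinds of term, and their combination with OR, can be sketched in
Python as follows.  The parser is an illustration based on the forms listed
above; the name parse_frequency, the tolerance of extra spaces and the
spelling of OR as a word are assumptions.

```python
import re

TERM = re.compile(r"\s*(?:(\d+)|>\s*(\d+)|<\s*(\d+)|\(\s*(\d+)\s+TO\s+(\d+)\s*\))\s*")


def parse_frequency(expression=">1"):
    """Turn a *FREQUENCY expression such as '10 OR (100 TO 500)' into a test."""
    tests = []
    for term in re.split(r"\bOR\b", expression):
        m = TERM.fullmatch(term)
        if m is None:
            raise ValueError(f"bad term: {term!r}")
        exact, more, less, low, high = m.groups()
        if exact is not None:
            tests.append(lambda c, n=int(exact): c == n)
        elif more is not None:
            tests.append(lambda c, n=int(more): c > n)
        elif less is not None:
            tests.append(lambda c, n=int(less): c < n)
        else:
            low_n, high_n = int(low), int(high)
            if low_n >= high_n:
                raise ValueError("integer1 must be smaller than integer2")
            tests.append(lambda c, a=low_n, b=high_n: a <= c <= b)
    # A count is selected if it satisfies any one of the terms.
    return lambda count: any(test(count) for test in tests)


keep = parse_frequency("10 OR (100 TO 500)")
print([n for n in (5, 10, 99, 100, 500, 501) if keep(n)])
```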

EVERY COLLOCATE
     This command ensures that every collocate chosen by the *SPAN and
*FREQUENCY commands will be considered for selection.

Example and general form
~~~~~~~ ~~~ ~~~~~~~ ~~~~

                  EVERY COLLOCATE

 SELECTCOLLOCATE
 set description
 ~~~ ~~~~~~~~~~~

 This command ensures that only those words in the set description
                                                   ~~~ ~~~~~~~~~~~
 will be considered as collocates.

Example
~~~~~~~

                  SELECTCOLLOCATE
                  *LIST OF WORDS   father mother

 REJECTING
 set description
 ~~~ ~~~~~~~~~~~
      This command allows one to specify an exclusion list for
 collocates.  Its function is to remove insignificant collocates,
 e.g.  'the', 'but' and 'and', from those to be printed, so that only
 the interesting collocates appear in the results.

 Example
 ~~~~~~~

                  REJECTING
                  *PATTERN         run*

 ACCEPTING
 set description
 ~~~ ~~~~~~~~~~~
      This command allows you to add to the collection of possible collocates
 those in the set description. It is most often used to supply words
              ~~~ ~~~~~~~~~~~
 that were excluded because the REJECTING command was too restrictive.

 The above commands can only be supplied in a fixed order. The order is
 similar to that for word selection described earlier, but this time the
 collocate selection commands are used. The commands EVERY COLLOCATE and
 SELECTCOLLOCATE are alternatives and only one can be chosen; if both
 are absent, EVERY COLLOCATE is assumed. Thus we can say:-

 either:    EVERY COLLOCATE
     or:    SELECTCOLLOCATE
            set description
            ~~~ ~~~~~~~~~~~

 The REJECTING and ACCEPTING commands can each be given as often as
 necessary, so that you get just those collocates that you want.

 Examples
 ~~~~~~~~
 a)   REJECTING
      *LIST OF WORDS         the but and

 b)   REJECTING
      *PATTERN               *ing *ed

 c)   REJECTING
      *FREQUENCY             (100 TO 700)

 d)   SELECTCOLLOCATE
      *PATTERN               run*

 e)   SELECTCOLLOCATE
      *PATTERN               run*
      REJECTING
      *LIST OF WORDS         running
 In a) the specific collocates 'the' 'but' 'and' will not be printed.
 In b) all collocates which end in 'ing' or 'ed' will not be printed.
 In c) all collocates whose vocabulary frequency of occurrence is
       between 100 and 700 inclusive will not be printed.
 In d) only those collocates which start with 'run' will be chosen.
 In e) all collocates which start with 'run', except the word 'running',
       will be chosen.
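     The effect of applying these commands in order can be sketched as set
operations in Python.  This is an illustration under stated assumptions: the
names are hypothetical, *PATTERN is imitated with fnmatch wildcards, and
ACCEPTING is assumed to add back only words that were found as collocates in
the first place.

```python
from fnmatch import fnmatchcase


def pattern(*patterns):
    """Stand-in for *PATTERN: match any of the wildcard patterns."""
    return lambda word: any(fnmatchcase(word, p) for p in patterns)


def list_of_words(*words):
    """Stand-in for *LIST OF WORDS."""
    return lambda word: word in words


def select_collocates(candidates, steps):
    """Apply collocate-selection commands, in order, to the candidate set.

    `candidates` are all collocates found in the span; starting from all
    of them corresponds to EVERY COLLOCATE.  `steps` is a list of
    (command, set_description) pairs; SELECTCOLLOCATE may only come first.
    """
    chosen = set(candidates)
    for k, (command, matches) in enumerate(steps):
        picked = {w for w in candidates if matches(w)}
        if command == "SELECTCOLLOCATE" and k == 0:
            chosen = picked
        elif command == "REJECTING":
            chosen -= picked
        elif command == "ACCEPTING":
            chosen |= picked
        else:
            raise ValueError(f"{command} is not allowed at step {k + 1}")
    return chosen


# Example e): SELECTCOLLOCATE *PATTERN run*, then REJECTING *LIST OF WORDS running
found = {"run", "runs", "running", "ran", "the"}
print(sorted(select_collocates(found, [("SELECTCOLLOCATE", pattern("run*")),
                                       ("REJECTING", list_of_words("running"))])))
```

A *FREQUENCY set description, as in example c), would be another predicate of
the same shape, testing each word's vocabulary frequency.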


 (F)+   WRITETEXT
     This command will cause CLOC to print out the itemised text.  Each line
can be prefixed with the text reference of the first word on that line.
 Examples
 ~~~~~~~~
         a)      WRITETEXT
         b)      WRITETEXT         REFS P3 L4
         c)      WRITETEXT         NOREFS

 in a) the text reference will be a simple record number.
 in b) a 3 character page number P and a 4 figure line number L will
       be used as the reference.
 in c) no references of any kind will be printed.

 General form
 ~~~~~~~ ~~~~
     WRITETEXT      reference
                    ~~~~~~~~~

 where reference is optional, and when present takes the form:-
       ~~~~~~~~~

 Example
 ~~~~~~~
                  REFS A4P2L6
 General Form
 ~~~~~~~ ~~~~
                  REFS letter number letter number ... letter number 
                       ~~~~~~ ~~~~~~ ~~~~~~ ~~~~~~ ~~~ ~~~~~~ ~~~~~~

     The letter must be from A to Z, and identifies an embedded text
         ~~~~~~
reference.  The number of characters printed for the reference is given by
number.  When printed, each reference will be separated from the next by one
~~~~~~
space.
                  NOREFS
 When this keyword is present no text references of any kind will be printed.
 Default
 ~~~~~~~
 1.   When reference is absent, an absolute record number is used.
           ~~~~~~~~~

 (G)+   NEWLINE
      This command will insert one or more newlines in the CLOC results file.
 You can use this feature to widen the gap between the results from
 successive tasks.

 Examples
 ~~~~~~~~
  a) NEWLINE
  b) NEWLINE        1
  c) NEWLINE        5

 General form
 ~~~~~~~ ~~~~
     NEWLINE        integer
                    ~~~~~~~

 The command will cause integer newlines to be sent to the CLOC
                        ~~~~~~~
 results file.
 Default
 ~~~~~~~
   When integer is absent, a value of 1 is assumed.
        ~~~~~~~

 (H)+   NEWPAGE
      This command will cause a new page to be thrown on the CLOC results file.

 Example and general form.
 ~~~~~~~ ~~~ ~~~~~~~ ~~~~
        NEWPAGE

 (I)+   MESSAGE
      This command will send the contents of the specification field, and any
 continuation, to the CLOC results file.
 Example
 ~~~~~~~
     MESSAGE        Henry the Fifth (part 1)

 General form
 ~~~~~~~ ~~~~
     MESSAGE        character sequence
                    ~~~~~~~~~ ~~~~~~~~

      The character sequence will be sent to the CLOC results file.
          ~~~~~~~~~ ~~~~~~~~

 (J)+   NOTE
      This command can be used at your convenience to insert a
 commentary about the following or preceding task.  All characters
 in the specification field of this command are ignored.
 Example
 ~~~~~~~
      NOTE            THIS TEXT IS TAKEN FROM WORDSWORTH
 General Form
 ~~~~~~~ ~~~~
      NOTE            character sequence
                      ~~~~~~~~~ ~~~~~~~~

     The character sequence is ignored by the package; it is simply printed
         ~~~~~~~~~ ~~~~~~~~
on the CLOC diagnostic and information channel along with the other control
statements.

(K)+ FINISH
     This command must be the final one in the sequence.  It informs the
package that no further commands are to follow.
 Example and general form
 ~~~~~~~ ~~~ ~~~~~~~ ~~~~
        FINISH


5 EXAMPLES
     Before preparing a large volume of text, or before trying out the CLOC
package on some prepared text, you should run the example programs given
below.  To do this you will need to read the documentation on using the
package on your local computer.  This will provide the basic information you
need to know to run any CLOC job.  You should compare the results produced by
the computer with those given in the example programs to check your
understanding of the command language.  You are also recommended to vary the
given commands in order to gain some feel for their effect.  Often the
examples will contain all the commands you need to solve your given problem,
in which case all you need do is supply your own text.  All the examples in
this section use the following extract from "CAUTION:  LOW FLYING DUCKS" by
the author.

          The University ; "A society of individuals living and working
     together for the advancement of learning and the dissemination of
     knowledge".  (University of York Development Plan).
          In 1617 James I received a petition requesting a University
     for York.  This was followed by a petition to Parliament in 1652,
     and a deputation to the University Grants Committee in 1947.  The
     University officially opened in 1963 with a student population
     comprising 216 undergraduates and 12 postgraduates.
          The site consisted of 190 acres of marshy land and a large
     decrepit Elizabethan mansion, Heslington Hall, destined to become
     the administration building.  Draining the saturated ground was
     accomplished by widening a natural stream and creating a fourteen
     acre artificial lake around which the University was constructed.

     If the above were coded on punched cards using the recommendations in
section 2 of this guide, it would look like the following:-
      =THE =UNIVERSITY : "=A SOCIETY OF INDIVIDUALS LIVING AND WORKING
 TOGETHER FOR THE ADVANCEMENT OF LEARNING AND THE DISSEMINATION OF
 =KNOWLEDGE".   (=UNIVERSITY OF =YORK =DEVELOPMENT =PLAN)  .
      =IN 1617 =JAMES =I RECEIVED A PETITION REQUESTING A =UNIVERSITY
 FOR =YORK.  =THIS WAS FOLLOWED BY A PETITION TO =PARLIAMENT IN 1652,
 AND A DEPUTATION TO THE =UNIVERSITY =GRANTS =COMMITTEE IN 1947.  =THE
 =UNIVERSITY OFFICIALLY OPENED IN 1963 WITH A STUDENT POPULATION
 COMPRISING 216 UNDERGRADUATES AND 12 POSTGRADUATES.
      =THE SITE CONSISTED OF 190 ACRES OF MARSHY LAND AND A LARGE
 DECREPIT =ELIZABETHAN MANSION, =HESLINGTON =HALL, DESTINED TO BECOME
 THE ADMINISTRATION BUILDING.  =DRAINING THE SATURATED GROUND WAS
 ACCOMPLISHED BY WIDENING A NATURAL STREAM AND CREATING A FOURTEEN
 ACRE ARTIFICIAL LAKE AROUND WHICH THE =UNIVERSITY WAS CONSTRUCTED.

     We will assume that you are using a computer that has both upper and
lower case, and that you stored the text in the form in which it was first
written.  The following examples show how CLOC commands are put together to
perform these tasks:-
      1. Alphabetic sorting
      2. The pattern feature
      3. The exclusion list
      4. Producing a concordance
      5. Finding collocations

           Example number 1             Alphabetic Sorting
           ~~~~~~~ ~~~~~~               ~~~~~~~~~~ ~~~~~~~
           The following commands cause CLOC to read the above text and
           sort the vocabulary into ascending alphabetic order.
                 ITEMISEUSING)CLOC
                 *LETTERS)abcdefghijklmnopqrstuvwxyz
                 EVERYWORD
                 WORDLIST)ALPHA
                 FINISH
           The output produced by the computer is a listing of the commands,
           including the defaults and comments, and the results of the
           sorting process.
                    a)  Control statement listing

default                input details  width80
                       ITEMISEUSING)CLOC
                       *LETTERS)abcdefghijklmnopqrstuvwxyz
default                *separators    !"#$%&'()*+,-./0123456789:;<=>?[\]^_`{|}~
the text contains :
     113  running words
      71  distinct words
and the maximum word length is 14 characters

default                output details width120
                       EVERYWORD
                       WORDLIST)ALPHA
                       FINISH

                     b)   The Results.
table of 71 words in ascending alphabetic order
===============================================
    9 a                 1 accomplished      1 acre              1 acres             1 administration    1 advancement
    6 and               1 around            1 artificial        1 become            1 building          2 by
    1 committee         1 comprising        1 consisted         1 constructed       1 creating          1 decrepit
    1 deputation        1 destined          1 development       1 dissemination     1 draining          1 elizabethan
    1 followed          2 for               1 fourteen          1 grants            1 ground            1 hall
    1 heslington        1 i                 4 in                1 individuals       1 james             1 knowledge
    1 lake              1 land              1 large             1 learning          1 living            1 mansion
    1 marshy            1 natural           6 of                1 officially        1 opened            1 parliament
    2 petition          1 plan              1 population        1 postgraduates     1 received          1 requesting
    1 saturated         1 site              1 society           1 stream            1 student           9 the
    1 this              3 to                1 together          1 undergraduates    6 university        3 was
    1 which             1 widening          1 with              1 working           2 york
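     For comparison, the counting and sorting behind these results can be
reproduced with a short Python sketch.  It assumes that capitals are folded
to small letters and that every character outside a to z separates words,
which is what the *LETTERS and default *separators lines above suggest.  The
function names are hypothetical and the layout only imitates CLOC's table.

```python
import re
from collections import Counter

# The sample text of this section, as typed in upper and lower case.
TEXT = """The University ; "A society of individuals living and working
together for the advancement of learning and the dissemination of
knowledge".  (University of York Development Plan).
In 1617 James I received a petition requesting a University
for York.  This was followed by a petition to Parliament in 1652,
and a deputation to the University Grants Committee in 1947.  The
University officially opened in 1963 with a student population
comprising 216 undergraduates and 12 postgraduates.
The site consisted of 190 acres of marshy land and a large
decrepit Elizabethan mansion, Heslington Hall, destined to become
the administration building.  Draining the saturated ground was
accomplished by widening a natural stream and creating a fourteen
acre artificial lake around which the University was constructed."""


def itemise(text):
    """Fold to lower case and treat every character outside a-z as a separator."""
    return re.findall(r"[a-z]+", text.lower())


def word_table(counts, per_row=6):
    """Lay out 'count word' cells alphabetically, per_row cells to a line."""
    width = max(map(len, counts))
    cells = [f"{counts[w]:5d} {w:<{width}}" for w in sorted(counts)]
    return ["".join(cells[i:i + per_row]).rstrip() for i in range(0, len(cells), per_row)]


words = itemise(TEXT)
counts = Counter(words)
print(f"{len(words)} running words, {len(counts)} distinct words, "
      f"maximum word length {max(map(len, counts))}")
print("\n".join(word_table(counts)))
```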



 Example number 2                 The PATTERN Feature
 ~~~~~~~ ~~~~~~                   ~~~ ~~~~~~~ ~~~~~~~
 This example illustrates how the pattern feature can be used to select, from
 the above text, the words which share a common ending, here 'ing'.  The
 selected words are then listed in alphabetic order.
                    ITEMISEUSING)CLOC
                    *LETTERS)abcdefghijklmnopqrstuvwxyz
                    SELECTWORDS
                    *PATTERN) *ing
                    WORDLIST)ALPHA
                    FINISH
         The output produced is a listing of the commands and the results.
                     a)  Control statement listing

default                input details  width80
                       ITEMISEUSING)CLOC
                       *LETTERS)abcdefghijklmnopqrstuvwxyz
default                *separators    !"#$%&'()*+,-./0123456789:;<=>?[\]^_`{|}~
the text contains :
     113  running words
      71  distinct words
and the maximum word length is 14 characters

default                output details width120
                       SELECTWORDS
                       *PATTERN) *ing
                       WORDLIST)ALPHA
                       FINISH
                      b)   The Results.

table of 9 words in ascending alphabetic order
==============================================
    1 building          1 comprising        1 creating          1 draining          1 learning          1 living
    1 requesting        1 widening          1 working
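     The selection in this example can be imitated in Python with the standard
fnmatch module, on the assumption that "*" in a CLOC pattern behaves like a
shell-style wildcard.  The function name select_words is hypothetical; the
vocabulary is copied from the table in Example 1.

```python
from fnmatch import fnmatchcase

# The 71 distinct words of the sample text, copied from the table in Example 1.
VOCABULARY = """a accomplished acre acres administration advancement and around
artificial become building by committee comprising consisted constructed
creating decrepit deputation destined development dissemination draining
elizabethan followed for fourteen grants ground hall heslington i in
individuals james knowledge lake land large learning living mansion marshy
natural of officially opened parliament petition plan population
postgraduates received requesting saturated site society stream student the
this to together undergraduates university was which widening with working
york""".split()


def select_words(vocabulary, *patterns):
    """Words matching any wildcard pattern, in ascending alphabetic order."""
    return sorted(w for w in vocabulary if any(fnmatchcase(w, p) for p in patterns))


print(select_words(VOCABULARY, "*ing"))
```

The same call with "*ed" gives the eight words that appear as nodes in
Example 4.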


 Example Number 3                      The Exclusion List
 ~~~~~~~ ~~~~~~                        ~~~ ~~~~~~~~~ ~~~~
 This example shows how one can exclude a set of words from a previously
 selected set.  The resultant collection is listed in ascending alphabetic
 order.
                    ITEMISEUSING)CLOC
                    *LETTERS)abcdefghijklmnopqrstuvwxyz
                    SELECTWORDS
                    *FREQUENCY) >1
                    EXCLUDING
                    *LISTOFWORDS) a the of and
                    WORDLIST)ALPHA
                    FINISH
    The output produced is a listing of the commands and the alphabetically
    ordered list.
                     a)   Control statement Listing

default                input details  width80
                       ITEMISEUSING)CLOC
                       *LETTERS)abcdefghijklmnopqrstuvwxyz
default                *separators    !"#$%&'()*+,-./0123456789:;<=>?[\]^_`{|}~
the text contains :
     113  running words
      71  distinct words
and the maximum word length is 14 characters

default                output details width120
                       SELECTWORDS
                       *FREQUENCY) >1
                       EXCLUDING
                       *LISTOFWORDS) a the of and
                       WORDLIST)ALPHA
                       FINISH
                    b)  The Results
table of 8 words in ascending alphabetic order
==============================================
    2 by                2 for               4 in                2 petition          3 to                6 university
    3 was               2 york
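     Example 3 first selects every word with a frequency greater than 1 and
then removes the listed words from that set.  The same set arithmetic can be
sketched in Python as below.  frequent_words_excluding is a hypothetical
name, and the sample sentence is invented purely for illustration.

```python
from collections import Counter


def frequent_words_excluding(words, threshold, excluded):
    """Select words occurring more than `threshold` times, drop `excluded`,
    and return (frequency, word) pairs in ascending alphabetic order."""
    freqs = Counter(words)
    selected = {w for w, n in freqs.items() if n > threshold}  # like *FREQUENCY) >1
    selected -= set(excluded)                                   # like EXCLUDING *LISTOFWORDS)
    return [(freqs[w], w) for w in sorted(selected)]            # like WORDLIST)ALPHA


# Invented sample text, not the text analysed in the examples.
sample = ("the university of york and the university was "
          "opened by a petition to york").split()
for count, word in frequent_words_excluding(sample, 1, ["a", "the", "of", "and"]):
    print(count, word)
```

In the invented sample "the" also occurs twice, but the exclusion list
removes it, leaving "university" and "york".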


                                                                       Page 45


 Example number 4                    Producing a concordance
 ~~~~~~~ ~~~~~~                      ~~~~~~~~~ ~ ~~~~~~~~~~~
 The following commands produce a concordance of the selected words, here
 every word ending in "ed", with five words of context cited on either side
 of each node.  The output is centralised on the page and sorted in
 ascending alphabetic order.
                   ITEMISEUSING)CLOC
                   *LETTERS)abcdefghijklmnopqrstuvwxyz
                   SELECTWORDS
                   *PATTERN) *ed
                   CONCORDANCE) KWIC,CITE 5 BY 5
                   FINISH
                      a)   Control statement Listing
default                input details  width80
                       ITEMISEUSING)CLOC
                       *LETTERS)abcdefghijklmnopqrstuvwxyz
default                *separators    !"#$%&'()*+,-./0123456789:;<=>?[\]^_`{|}~
the text contains :
     113  running words
      71  distinct words
and the maximum word length is 14 characters

default                output details width120
                       SELECTWORDS
                       *PATTERN) *ed
                       CONCORDANCE) KWIC,CITE 5 BY 5
                       FINISH
                     b)   The results
concordance of 8 nodes
======================
node  accomplished  occurs 1 times
11                            Draining the saturated ground was accomplished by widening a natural stream

node  consisted  occurs 1 times
8            undergraduates and 12 postgraduates.      The site consisted of 190 acres of marshy land

node  constructed  occurs 1 times
12                              around which the University was constructed.

node  destined  occurs 1 times
9                decrepit Elizabethan mansion, Heslington Hall, destined to become the administration building

node  followed  occurs 1 times
4                                University for York.  This was followed by a petition to Parliament

node  opened  occurs 1 times
6                 Committee in 1947.  The University officially opened in 1963 with a student population

node  received  occurs 1 times
3                       Development Plan).      In 1617 James I received a petition requesting a University

node  saturated  occurs 1 times
10                   the administration building.  Draining the saturated ground was accomplished by widening
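     In the concordance above each node is printed with five words of context
on either side, and the nodes are aligned in a single column.  The following
Python sketch builds lines of the same shape.  It is an illustration, not
CLOC's own algorithm: the function kwic is hypothetical, words are matched
without regard to case, and each line is numbered by word position rather
than by the text line number that CLOC prints.

```python
def kwic(words, nodes, left=5, right=5, width=40):
    """Build key-word-in-context lines for each node, nodes in alphabetic order.

    Assumptions: left/right mirror CITE 5 BY 5 (five words either side);
    matching ignores case; each line starts with the 1-based word position
    rather than the text line number that CLOC prints.
    """
    lines = []
    for node in sorted(set(nodes)):
        for i, word in enumerate(words):
            if word.lower() != node:
                continue
            before = " ".join(words[max(0, i - left):i])
            after = " ".join(words[i + 1:i + 1 + right])
            # Right-justify the left context so the nodes line up in one column.
            lines.append(f"{i + 1:<4}{before:>{width}} {word} {after}".rstrip())
    return lines


text = "Draining the saturated ground was accomplished by widening a natural stream"
for line in kwic(text.split(), ["accomplished", "saturated"]):
    print(line)
```

Right-justifying the left context is what keeps every node in the same
column, which is the sense in which the output is "centralised".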


                                                                       Page 46


 Example number 5               Finding COLLOCATIONS
 ~~~~~~~ ~~~~~~                 ~~~~~~~ ~~~~~~~~~~~~
 The following commands cause the package to scan the context of the selected
 word, "university", and to print examples of its collocations.  As no *SPAN
 command is given, the default span of 4 by 4 is used.  The output is
 centralised on the page and sorted in ascending alphabetic order.
                     ITEMISEUSING)CLOC
                     *LETTERS)abcdefghijklmnopqrstuvwxyz
                     SELECTWORDS
                     *LISTOFWORDS) university
                     COLLOCATIONS) KWIC,CITE 5 BY 5
                     *FREQUENCY) >2
                     FINISH
                     a)   Control statement Listing
default                input details  width80
                       ITEMISEUSING)CLOC
                       *LETTERS)abcdefghijklmnopqrstuvwxyz
default                *separators    !"#$%&'()*+,-./0123456789:;<=>?[\]^_`{|}~
the text contains :
     113  running words
      71  distinct words
and the maximum word length is 14 characters

default                output details width120
                       SELECTWORDS
                       *LISTOFWORDS) university
                       COLLOCATIONS) KWIC,CITE 5 BY 5
default                *span          4 by 4
                       *FREQUENCY) >2
                       FINISH

                     b) The Results
collocation analysis of 1 nodes (cited about node)
==================================================
node  university  occurs 6 times
collocate  the  occurs 9 times
node-collocate pair occurs 6 times
0                                                           The University ; "A society of individuals living
2                        and the dissemination of knowledge".  (University of York Development Plan).      In
5                                       and a deputation to the University Grants Committee in 1947.  The University
5                                       and a deputation to the University Grants Committee in 1947.  The University
6                     University Grants Committee in 1947.  The University officially opened in 1963 with a
12                             artificial lake around which the University was constructed.

node  university  occurs 6 times
collocate  a  occurs 9 times
node-collocate pair occurs 4 times
0                                                           The University ; "A society of individuals living
3                              received a petition requesting a University for York.  This was followed
3                              received a petition requesting a University for York.  This was followed
5                                       and a deputation to the University Grants Committee in 1947.  The University

node  university  occurs 6 times
collocate  of  occurs 6 times
node-collocate pair occurs 3 times
0                                                           The University ; "A society of individuals living
2                        and the dissemination of knowledge".  (University of York Development Plan).      In
2                        and the dissemination of knowledge".  (University of York Development Plan).      In

node  university  occurs 6 times
collocate  in  occurs 4 times
node-collocate pair occurs 3 times
5                                       and a deputation to the University Grants Committee in 1947.  The University
6                     University Grants Committee in 1947.  The University officially opened in 1963 with a
6                     University Grants Committee in 1947.  The University officially opened in 1963 with a
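     A collocation analysis counts, for every occurrence of the node, the
words that fall inside the span around it.  The Python sketch below shows
that counting under stated assumptions: the function collocations is
hypothetical, the span defaults to 4 by 4 as in the listing above, and the
frequency threshold is applied to the node-collocate pair count, which may
differ from the rule CLOC itself uses.

```python
from collections import Counter


def collocations(words, node, span=(4, 4), min_pairs=2):
    """Count the collocates of `node` inside a span of words around it.

    Returns (collocate, collocate frequency, node-collocate pair count)
    triples, most frequent pairs first.  Assumptions: span=(4, 4) mirrors
    the default '*span 4 by 4'; min_pairs applies *FREQUENCY) >2 to the
    pair count; words are already folded to lower case.
    """
    left, right = span
    totals = Counter(words)
    pairs = Counter()
    for i, word in enumerate(words):
        if word == node:
            pairs.update(words[max(0, i - left):i])   # words before the node
            pairs.update(words[i + 1:i + 1 + right])  # words after the node
    ranked = sorted(pairs.items(), key=lambda item: (-item[1], item[0]))
    return [(c, totals[c], n) for c, n in ranked if n > min_pairs]


sample = "a b node c a x a node a b".split()
for collocate, freq, n in collocations(sample, "node", span=(2, 2), min_pairs=1):
    print(f"collocate {collocate} occurs {freq} times, pair occurs {n} times")
```

With the invented sample and a span of 2 by 2, "a" falls inside the span
four times and "b" twice, so only those two collocates pass a threshold of
more than one pair.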



                                                                       Page 48


APPENDIX I
~~~~~~~~ ~
Messages Produced by the CLOC package
~~~~~~~~ ~~~~~~~~ ~~ ~~~ ~~~~ ~~~~~~~
     Three categories of message are printed by the package; these are errors,
                                                                       ~~~~~~
warnings, and comments.
~~~~~~~~      ~~~~~~~~

Error Messages
~~~~~ ~~~~~~~~
     These cause the run of the package to be abandoned.  Where the error is
caused by a mistake in a command, the symbol  1  is printed under the faulty
position in the command,  2  under the second error on the line, and so on up to
 9  .  Error messages take the form ERROR - text of message, where the text is
                                            ~~~~ ~~ ~~~~~~~
one of the following:

             a.  MISSING MANDATORY STATEMENT
                 You have forgotten to include or have misspelt an
                 essential command.

             b.  CONTROL STATEMENT ENDS PREMATURELY
                 The continuation field of 15 spaces was expected but
                 not found.

             c.  INCORRECT CONTROL STATEMENT
                 A mistake has been found on the line, the symbol  1
                 points to it.

             d.  UNKNOWN SYMBOL
                 The item in the specification field has not been
                 recognised.

             e.  CHARACTER ALREADY DEFINED
                 The indicated character has occurred on this or an earlier
                 line.

             f.  NO LETTERS PROVIDED
                 The *LETTERS command is present, but no letters have been
                 listed on it.

             g.  NUMBER IS TOO LARGE
                 The indicated number is too large for the CLOC
                 package to use.

             h.  UPPER VALUE DOES NOT EXCEED LOWER
                 The upper value in a frequency range is not greater than
                 the lower value.

             i.  NO WORDS FOUND
                 The combination of word selection commands has chosen
                 no words.

             j.  FILE NOT PRODUCED BY CLOC MARK mark
                                                ~~~~
                 The file used by the GET TEXT command was not
                 produced by an earlier run of the package.

             k.  ABOVE STATEMENT NOT EXPECTED
                 This line may be a misspelt or spurious command.

             l.  SYMBOL NOT ALLOWED IN THIS CONTEXT
                 The indicated symbol is not permitted there.

             m.  CAPACITY EXCEEDED
                 The text to be processed contains more words than
                 the package is able to handle.

             n.  NUMBER OF REFERENCES EXCEEDS number
                                              ~~~~~~
                 The text contains more text references than the
                 package can accept.

             o.  NO WORD SELECTION COMMANDS PROVIDED
                 Neither the EVERY WORD nor the SELECT WORDS command is
                 present, or the one given is misspelt.

             p.  NUMBER EXPECTED AT THIS POSITION
                 The previous CLOC keyword must be followed by a number.

             q.  A NUMBER CANNOT BE PLACED HERE
                 The previous CLOC keyword must not be followed by a number.
                                                ~~~

             r.  ZERO NUMBER NOT PERMITTED

             s.  SPACE NOT ALLOWED HERE

     The package makes several checks on its operation and in certain
instances may fail with the message SYSTEM ERROR number.  Such an occurrence
                                                 ~~~~~~
should be reported to your local advisory service.

Warning Messages
~~~~~~~ ~~~~~~~~
     These are produced when the package finds a minor mistake in a command,
one not serious enough to cause a fatal error.  The mistake is ignored and
the next command is examined.  The symbol  1  is printed under the faulty
position on the line (and so on, up to  9 ).  Warning messages take the form
WARNING - text of message, where the text is one of the following:
          ~~~~ ~~ ~~~~~~~
      a.  CHARACTER ALREADY DEFINED
          A command such as *SEPARATORS or *PADDING contains a repeated
          character.

      b.  SPURIOUS CHARACTERS FOUND AND IGNORED
          The specification field contains characters which should
          not be present; they will be ignored.

      c.  SET DESCRIPTION SPECIFIES NO WORDS
          The combination of word selection commands resulted in no
          words selected.

      d.  WORD(S) NOT FOUND
          The indicated words on the *LIST OF WORDS command
          are not present in the vocabulary of the text.

      e.  ABOVE ITEM TOO LONG
          The word printed is longer than the system can cope
          with; its trailing letters have been removed.

      f.  NO REFERENCE INFORMATION IN TEXT
          The citation option REFS was chosen but no text references
          were placed in the text.

      g.  NO WORDS MATCH THIS PATTERN
          The current vocabulary does not contain words of this form.

     The package makes several checks on its operation and in certain
instances may produce the message SYSTEM WARNING number.  Such an occurrence
                                                 ~~~~~~
should be reported to your local computer advisory service.

Comment messages
~~~~~~~ ~~~~~~~~
     These are produced when the system has read the text and is about to read
the task selection commands.  One or more of the following comments may be
produced.
      a.  TEXT FILE name SAVED ON date AT time
                    ~~~~          ~~~~    ~~~~
          The text has been read and stored in a permanent
          file to be later used by the GET TEXT command.

      b.  TEXT FILE name ACCESSED.  (SAVED ON date AT time)
                    ~~~~                      ~~~~    ~~~~
          The GET TEXT command has accessed this file.

      c.  THE TEXT CONTAINS:
          integer1 RUNNING WORDS
          ~~~~~~~
          integer2 DISTINCT WORDS
          ~~~~~~~
          AND THE MAXIMUM WORD LENGTH IS integer3 CHARACTERS.
                                         ~~~~~~~

     The text under analysis is integer1 words in length, and the vocabulary
                                ~~~~~~~
used contains integer2 different words.  The longest word or words contain
              ~~~~~~~
integer3 characters.
~~~~~~~
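     The three figures in this comment can be reproduced with a few lines of
Python.  This sketch assumes that a word is a maximal run of characters from
the *LETTERS set and that capitals are folded to small letters before
counting (the examples count "University" as "university").  text_statistics
is a hypothetical name, and the printed lines only loosely follow the
package's own layout.

```python
import re


def text_statistics(text, letters="abcdefghijklmnopqrstuvwxyz"):
    """Return (running words, distinct words, maximum word length).

    Assumptions: a word is a maximal run of characters from `letters`, every
    other character acts as a separator, and the text is folded to lower
    case first.
    """
    words = re.findall(f"[{re.escape(letters)}]+", text.lower())
    return len(words), len(set(words)), max(map(len, words), default=0)


running, distinct, longest = text_statistics(
    'The University; "the dissemination of knowledge".')
print("the text contains :")
print(f"{running:8d}  running words")
print(f"{distinct:8d}  distinct words")
print(f"and the maximum word length is {longest} characters")
```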

                                                                       Page 51


APPENDIX II
~~~~~~~~ ~~
References
~~~~~~~~~~

     1.   "The RUNCLOC macro", Computer Centre Users Manual, University of
          Birmingham.

     2.   "CLOC - An Applications Package in ALGOL 68R", presented to the
          "Applications of ALGOL 68" Conference, April 1975, University of
          Liverpool.

     3.   "English Lexical Studies", J. McH. Sinclair, S. Jones and R. Daley.

     4.   "The COCOA Manual", D.B. Russell, ATLAS Computer Laboratory.

     5.   "Statistical Package for the Social Sciences (SPSS)", N.H. Nie,
          D.H. Bent and C.H. Hull, McGraw-Hill, New York, 1970.

     6.   "CLOC: A Collocation Package", ALLC Bulletin, Vol. 5, No. 2, 1977.

     7.   "RATS: A Middle-level Text Utility System", Smith, Computers and
          the Humanities, Vol. 6, p. 277.

     8.   "JEUDEMO: A Text-Handling System", Bratley, Lusignan and
          Ouellette, in Computers in the Humanities, Edinburgh University
          Press.

     9.   "Computer Analysis of Natural Language", Reed, Birmingham
          University Computer Centre, internal report, 1973.

    10.   "CLOC User Guide", A. Reed, 1975, Computer Centre, University of
          Birmingham.

    11.   "OXEYE: A Text Processing Package for the 1906A", L. Burnard,
          1976, Oxford University Computing Service.

    12.   "CLOC: A General-purpose concordance and collocations generator",
          A. Reed and J. L. Schonfelder, 1979, Aston University.

    13.   "OCP: Oxford Concordance Program", S. Hockey and I. Marriott,
          October 1980, Oxford University Computing Service.

    14.   "Anatomy of a Text Analysis Package", A. Reed, Computer Languages,
          Vol. 9, No. 2, pp. 89-96, 1984.

                                                                       Page 52


APPENDIX III
~~~~~~~~ ~~~
Glossary
~~~~~~~~
 LETTER - One of an arbitrary collection of graphic signs, used to construct
          words.
 SEPARATOR - i) A graphic sign which is not a LETTER
            ii) An arbitrary sequence of i) above.
 WORD      - An arbitrary sequence of LETTERs, generally contiguous, but may
             contain graphic signs which are totally ignored during reading.
 NODE - A particular word about which a concordance can be printed or a
        collocation analysis performed.
 SPAN - The context of words, surrounding a NODE, which is used during
        collocation analysis.
 COLLOCATE - One of the words of context in a SPAN.
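
     The definitions of LETTER, SEPARATOR and WORD can be illustrated with a
small reader.  In this Python sketch, which is an assumption rather than
CLOC's own code, characters in the letter set build words, characters in an
ignore set are skipped without breaking a word (as the WORD entry describes,
and as the *IGNORE command in Appendix IV presumably arranges), and every
other character ends the current word.  read_words is a hypothetical name.

```python
def read_words(text, letters, ignored=""):
    """Split `text` into WORDs following the glossary definitions.

    Characters in `letters` build words; characters in `ignored` are skipped
    without ending a word; any other character is a SEPARATOR.  Folding to
    lower case is an assumption, as is treating `ignored` like *IGNORE.
    """
    words, current = [], []
    for ch in text.lower():
        if ch in ignored:
            continue                        # totally ignored during reading
        if ch in letters:
            current.append(ch)              # part of the current WORD
        elif current:
            words.append("".join(current))  # a SEPARATOR ends the WORD
            current = []
    if current:
        words.append("".join(current))
    return words


LETTERS = "abcdefghijklmnopqrstuvwxyz"
print(read_words("Co-operation at York!", LETTERS))
print(read_words("Co-operation at York!", LETTERS, ignored="-"))
```

Ignoring the hyphen joins "co-operation" into a single word; without it, the
hyphen acts as a separator and two words are read.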

                                                                       Page 53


APPENDIX IV
~~~~~~~~ ~~
CLOC Global Syntax Rules
~~~~ ~~~~~~ ~~~~~~ ~~~~~
     The following "railroad" diagram describes the syntax rules for CLOC
control statements.  Follow the arrows from top to bottom and you will pass
through all compulsory commands.  A diversion of route indicates optional
commands.  A choice of route indicates a choice of commands at that position.
                                                 !
                ...............     .............!..........
                !INPUT DETAILS!<----!                      !
                ~~~~~~~!~~~~~~~     !                      !
                       `----------->!                      !
                          ..........!..........       .....!....
                          !ITEMISE USING  CLOC!       !GET TEXT!
                          ~~~~~~~~~~!~~~~~~~~~~       ~~~~~!~~~~
                               .....!....                  !
                               !*LETTERS!                  !
                               ~~~~~!~~~~                  !
 .--------------------------------->!                      !
 !  ..........                      !                      !
 !--!*PADDING!<-------.             !                      !
 !  ~~~~~~~~~~        !             !                      !
 !  ...........       !             !                      !
 !--!*DEFERRED!<------!             !                      !
 !  ~~~~~~~~~~~       !             !                      !
 !  .............     !<------------!                      !
 !--!*SEPARATORS!<----!             !                      !
 !  ~~~~~~~~~~~~~     !             !                      !
 !  ................  !             !                      !
 !--!*READ AS SPACE!<-!             !                      !
 !  ~~~~~~~~~~~~~~~~  !             !                      !
 !  .........         !             !                      !
 `--!*IGNORE!<--------'             !                      !
    ~~~~~~~~~      ...........      !                      !
                   !SAVE TEXT!<-----!                      !
                   ~~~~~!~~~~~      !                      !
                        `---------->!<---------------------'
                 ................   !
                 !OUTPUT DETAILS!<--!
                 ~~~~~~~!~~~~~~~~   !
                        `---------->!
 .--------------------------------->!<--------------------------------------.-.
 !      .---------.---------.---<---+-->--.--------.-------------.          ! !
 !  ....!.... ....!.... ....!....   !  ...!.. .....!..... .......!.......   ! !
 !  !NEWLINE! !NEWPAGE! !MESSAGE!   !  !NOTE! !WRITETEXT! !CO-OCCURRENCE!   ! !
 !  ~~~~!~~~~ ~~~~!~~~~ ~~~~!~~~~   !  ~~~!~~ ~~~~~!~~~~~ ~~~~~~~!~~~~~~~   ! !
 `<-----'---------'---------'       !     !<-------'             !<---------! !
                                    !     !      .<--------.-----+--->.     ! !
                                    !     ! .....!.... ....!....  ....!.... ! !
                                    !     ! !*PATTERN! !*PHRASE!  !*SERIES! ! !
                                    !     ! ~~~~~!~~~~ ~~~~!~~~~  ~~~~!~~~~ ! !
                                    !     !      `---------`----------`---->' !
                                    !     `---------------------------------->'
                                    !
 .--------------------------------->!
 !        .-------------------------+----------------------.
 !  ......!.......            ......!.....             ....!...
 !  !SELECT WORDS!            !EVERY WORD!             !FINISH!
 !  ~~~~~~~!~~~~~~            ~~~~~~!~~~~~             ~~~~~~~~
 !  .......!.........               !
 !  !Set description!-------------->!
 !  ~~~~~~~~~~~~~~~~~               !
 !        .------------------------>!<-----------------------.
 !        !      ...........        !       ...........      !
 !        !      !EXCLUDING!<-------!------>!INCLUDING!      !
 !        !      ~~~~~!~~~~~        !       ~~~~~!~~~~~      !
 !        !   ........!........     !    ........!........   !
 !        `<--!Set description!     !    !Set description!-->'
 !            ~~~~~~~~~~~~~~~~~     !    ~~~~~~~~~~~~~~~~~
 !--------------------------------->!
 !      .-----------.----------.----+---------.-------------------.------------.
 !  ....!.....  ....!..  ......!......  ......!.......      ......!........    !
 !  !WORDLIST!  !INDEX!  !CONCORDANCE!  !COLLOCATIONS!      !CO-OCCURRENCE!    !
 !  ~~~~!~~~~~  ~~~~!~~  ~~~~~~!~~~~~~  ~~~~~~!~~~~~~~      ~~~~~~!~~~~~~~~    !
 !<-----'-----------'----------'   .......    !                   !<---------. !
 !                                 !*SPAN!<---!        .----.<----+--->.     ! !
 !                                 ~~~!~~~    !        ! ...!..... ....!.... ! !
 !                                    `------>!        ! !*PHRASE! !*SERIES! ! !
 !                              ............  !        ! ~~~~!~~~~ ~~~~!~~~~ ! !
 !                              !*FREQUENCY!<-!        !     `---------`---->' !
 !                              ~~~~~~!~~~~~  !        !  ..........         ! !
 !                                    `------>!        !->!*PATTERN!-------->' !
 !                      .<--------------------'        !  ~~~~~~~~~~           !
 !  .................   !   .................          !  ...........          !
 !  !EVERY COLLOCATE!<--+-->!SELECTCOLLOCATE!          !<-!WRITETEXT!<---------!
 !  ~~~~~~~!~~~~~~~~~   !   ~~~~~~~~!~~~~~~~~          !  ~~~~~~~~~~~          !
 !         !            !   ........!........          !  .........            !
 !         !            !   !Set description!          !<-!NEWLINE!<-----------!
 !         !            !   ~~~~~~~~!~~~~~~~~          !  ~~~~~~~~~            !
 !         `----------->!<----------'                  !  .........            !
 ! .------------------->!<--------------------.        !<-!NEWPAGE!<-----------!
 ! !     ...........    !    ...........      !        !  ~~~~~~~~~            !
 ! !     !REJECTING!<---!--->!ACCEPTING!      !        !  ......               !
 ! !     ~~~~~!~~~~~    !    ~~~~~!~~~~       !        !<-!NOTE!<--------------!
 ! ! .........!.......  !  .......!.........  !        !  ~~~~~~               !
 ! `-!Set description!  !  !Set description!->'        !  .........            !
 !   ~~~~~~~~~~~~~~~~~  !  ~~~~~~~~~~~~~~~~~           !<-!MESSAGE!<-----------'
 `----------------------'<-----------------------------'  ~~~~~~~~~
                                 .<----------------------.
                                 !    ............       !
                                 ! .->!*FREQUENCY!------'!
 Where                           ! !  ~~~~~~~~~~~~       !
                                 ! !  ................   !
             Set description  ---`-!->!*LIST OF WORDS!--'!
                                   !  ~~~~~~~~~~~~~~~~   !
                                   !  ..........         !
                                   `->!*PATTERN!-------->'
                                      ~~~~~~~~~~


