LinuxDevCenter.com: The Writer's Workbench

Published on Linux DevCenter (http://www.linuxdevcenter.com/)
http://www.linuxdevcenter.com/pub/a/linux/2000/05/05/LivingLinux.html

05/05/2000

This week's column describes style and diction, two venerable UNIX tools for checking your writing that have been rewritten for Linux.

Old-timers probably remember these names -- the originals came with AT&T UNIX as part of the much-loved ``Writer's Workbench'' (WWB) suite of tools back in the late 1970s and early 1980s. (There was also a group that planned a ``Reader's Workbench''; we can only guess at what that might have been, but today we do have Project Gutenbook, a new etext reader.)

AT&T unbundled the Writer's Workbench from their UNIX System 7, and as the many flavors of UNIX blossomed over the years, these tools fell by the wayside -- eventually becoming the stuff of UNIX lore.

In 1997, Michael Haardt wrote new Linux versions of these tools from scratch. They support both the English and German languages, and they're now part of the GNU Project; if you don't already have them installed on your system, you can get them from gnu.org.

Let's take a look at some of the things that these tools can do.

Checking text for misused phrases

Use the diction tool to check for wordy, trite, clichéd or misused phrases in a text. It checks for the kind of expressions William Strunk warned us about in his Elements of Style.

According to Andrew Walker's excellent book The UNIX Environment, the diction tool that came with the old Writer's Workbench just found the phrases, and a separate command called suggest would output suggestions. In the GNU version that works for Linux, both functions have been combined in the single diction command.

In GNU diction's output, the flagged words or phrases are enclosed in brackets [like this]. If diction has any suggested replacements, it gives them inside the brackets, preceded by a right arrow, -> like this.

When checking more than just a screenful of text, you'll want to pipe the output to a tool such as less, so that you can peruse it on the screen. For example, to check a file called banquet-speech.txt for clichés or other misused phrases, you'd type:

$ diction banquet-speech.txt | less RET

You could also redirect the output to a file if you wanted to look at it later:

$ diction banquet-speech.txt > banquet-speech.diction RET

Here, the output is written to a text file called banquet-speech.diction.

Checking more than files

If you don't specify a filename, diction reads text from the standard input until you type Control-D on a line by itself -- this is especially useful when you want to check the diction of a single sentence:

$ diction RET

So finally, tonight, let us ask the question 
we wish to state. RET
(stdin):1: [So -> (do not use as intensifier)] finally, tonight, 
let us [ask the question -> ask] [we wish to state -> (cliche, avoid)].
^D
$
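If you only want the flagged phrases themselves, and not the surrounding text, you can filter diction's output. The following is a minimal sketch, not part of diction: the function name flagged_phrases is made up for this example, and it assumes your grep supports the -o option (GNU and BSD grep both do). The sample input is copied from the session above.

```shell
# flagged_phrases: read diction-style output on standard input and
# print each bracketed annotation on a line of its own.
flagged_phrases() {
    grep -o '\[[^]]*]'
}

# Sample input copied from the session above; normally you would pipe
# the output of diction into the function instead.
flagged_phrases <<'EOF'
(stdin):1: [So -> (do not use as intensifier)] finally, tonight,
let us [ask the question -> ask] [we wish to state -> (cliche, avoid)].
EOF
```

To use it on a real file, you would type something like `diction banquet-speech.txt | flagged_phrases | less`.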

To check the text of a Web page, use the text-only Web browser lynx with the -dump and -nolist options to output the plain text of a given URL, and pipe it to diction. (If you expect there to be a lot of output, add another pipe at the end to the less tool so you can peruse it.)

For example, to check the text on the Web page http://example.org/page.html for wordy and misused phrases, you'd type:

$ lynx -dump -nolist http://example.org/page.html | diction | less RET

Checking text for doubled words

One of the things that diction looks for is doubled words -- words repeated in a row. It encloses the second member of the doubled pair in brackets, followed by a right arrow and the text "Double word", like this [this -> Double word.].

If you only want to check a text file for doubled words, and not any of the other things diction checks for, use grep to find only those lines in diction's output that contain the text "Double word", if any. For example, to output all lines containing double words in the file banquet-speech.txt, you'd type:

$ diction banquet-speech.txt | grep 'Double word' RET
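If diction isn't installed, a rough equivalent can be put together with awk. This is a sketch of a different technique, not how diction itself works: the function name doubled_words is made up for this example, and because it strips punctuation before comparing, it will also flag legitimate repeats such as "had had" or a word that both ends one sentence and begins the next.

```shell
# doubled_words FILE: print "FILE:LINE: word" for every word that
# immediately repeats the word before it, ignoring case and
# punctuation. The previous word is remembered across line breaks,
# so a pair split over two lines is reported on the second line.
doubled_words() {
    awk '
    {
        for (i = 1; i <= NF; i++) {
            word = tolower($i)
            gsub(/[^a-z]/, "", word)    # strip punctuation and digits
            if (word != "" && word == prev)
                printf "%s:%d: %s\n", FILENAME, FNR, word
            prev = word
        }
    }' "$1"
}
```

You would call it as `doubled_words banquet-speech.txt`.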

Checking text for readability

The style command analyzes the writing style of a given text. It performs a number of readability tests on the text and outputs their results, and it gives some statistical information about the sentences of the text.

Give as an argument the name of the text file to check. For example, to check the readability of the file banquet-speech.txt, you'd type:

$ style banquet-speech.txt RET

Like diction, style reads text from the standard input if no file name is given.

The various readability formulas that style uses and outputs are as follows:

The Kincaid Formula, originally developed for Navy training manuals, a good readability measure for technical documentation;
the Automated Readability Index (ARI);
the Coleman-Liau Formula;
the Flesch reading ease formula, which gives an approximation of readability from 0 (difficult) to 100 (easy);
the Fog Index, which gives a school grade reading level;
the WSTF Index, a readability indicator for German documents; and
the Wheeler-Smith Index, Lix formula and SMOG-Grading tests, all readability indicators which give a school grade reading level.
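
To get a feel for how such a score is computed, take the Automated Readability Index, which is commonly published as 4.71 x (characters per word) + 0.5 x (words per sentence) - 21.43. The sketch below just applies that arithmetic to three counts you supply. The function name ari and the sample counts are made up for this example, and style's own way of counting characters, words and sentences may give different figures for the same text.

```shell
# ari CHARS WORDS SENTENCES: print the Automated Readability Index
# computed from the three counts, using the commonly published formula.
ari() {
    awk -v c="$1" -v w="$2" -v s="$3" 'BEGIN {
        if (w <= 0 || s <= 0)
            exit 1                      # avoid dividing by zero
        printf "%.2f\n", 4.71 * (c / w) + 0.5 * (w / s) - 21.43
    }'
}

# Hypothetical counts: 100 characters, 20 words, 2 sentences.
ari 100 20 2
```

Longer words and longer sentences both push the score up, which is why the index works as a rough measure of difficulty.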

The sentence characteristics of the text which style outputs are as follows:

Number of characters
Number of words, their average length, and average number of syllables
Number of sentences and average length in words
Number of short and long sentences
Number of paragraphs and average length in sentences
Number of questions and imperatives

Finding difficult sentences

To output just "difficult" sentences of a text, use the -r option followed by a number; style will output only those sentences whose ARI readability index is greater than the number you give.

For example, to output all sentences in the file banquet-speech.txt whose readability is greater than a value of 20, type:

$ style -r 20 banquet-speech.txt RET

Displaying long sentences

You can use style to output sentences longer than a certain length by giving the minimum number of words as an argument to the -l option.

For example, to output all sentences longer than 14 words in the file banquet-speech.txt, type:

$ style -l 14 banquet-speech.txt RET

Spelling

Two additional commands that Walker says were part of the Writer's Workbench have long been standard on Linux: look and spell. Both tools work on the system dictionary file, /usr/dict/words. This file is nothing more than a word list (albeit a very large one), sorted in alphabetical order and containing one word per line. Words that are correct regardless of case are listed in lower-case letters, and words which rely on some form of capitalization in order to be correct (such as proper nouns) appear in that form.

The look tool outputs words in the system dictionary that begin with the text you give as an argument. It's useful for checking to see which words begin with a particular phrase or prefix.

For example, to list all the words in the dictionary that begin with the text "homew", you'd type:

$ look homew RET

This command will output words such as "homeward" and "homework."
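
Since the dictionary is only a word list, a similar result can be had with grep. The following sketch is an approximation of look rather than a description of how it is implemented: the function name prefix_words is made up for this example, the default path is the one named in this column, and the prefix is assumed to contain no characters that are special in a regular expression.

```shell
# prefix_words PREFIX [WORDLIST]: print the lines of WORDLIST that
# begin with PREFIX, ignoring case. WORDLIST defaults to the system
# dictionary path named in this column.
prefix_words() {
    grep -i -- "^$1" "${2:-/usr/dict/words}"
}
```

Typing `prefix_words homew` should then list the same kind of words as the look example.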

When you're unsure whether or not a particular word is spelled correctly, use spell to find out. It reads from the standard input and outputs any words that don't appear in the system dictionary file -- so if a word is potentially misspelled, it will be echoed back on the screen after you type it.

For example, to check if the word "occurance" is spelled correctly, you'd type:

$ spell RET
occurance RET
occurance
^D
$

In this example, spell echoed the word "occurance" after it was typed, meaning that this word was not in the system dictionary and therefore was likely a misspelling. A Control-D was typed to exit spell and return to the shell prompt.
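
The idea behind this kind of check is simple enough to sketch with standard tools: split the text into words, then print the ones that have no matching line in the word list. This is an illustration of the idea only, not how spell is implemented; the function name unknown_words is made up for this example, and the match is case-sensitive, so a capitalized word at the start of a sentence will be reported even if its lower-case form is listed.

```shell
# unknown_words WORDLIST: read text on standard input and print, one
# per line, each distinct word that has no exact line in WORDLIST.
unknown_words() {
    tr -cs 'A-Za-z' '\n' |        # put each word on a line of its own
        grep -v '^$' |            # drop the empty line a leading non-letter can create
        sort -u |                 # list each word only once
        grep -v -x -F -f "$1"     # keep words with no whole-line match
}
```

To check a file against the system dictionary, you would type something like `unknown_words /usr/dict/words < banquet-speech.txt`.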

Next week: How to make and manage documents with SGML-tools.

Michael Stutz is our regular Living Linux columnist and has been exploring the question, "Can design science engineer a free society?" as a primary research interest for the past several years.