Tuesday, April 8, 2008

'What you reach for is your Roget's'

From: Thesaurus Unbound

It was precisely that scientific bent that distinguished his book. By comparison, the organization of Piozzi's and Girard's books, as well as the handful of others published before Roget's 1852 thesaurus, was scattershot. Roget's, which was remarkably successful in its author's lifetime, was a comprehensive system of synonyms and antonyms. Roget built a numbered inventory of 1,000 fundamental ideas, such as "existence," which he listed alongside a set of related words: ens, entity, being, existence. Later, inspired by the Linnaean classification of animals, he arranged these ideas into a hierarchy of six classes, each broken down into smaller nested groupings. Thus, Kendall writes, " 'Perfection' falls under Class V, 'Words relating to the Voluntary Powers,' Division I, 'Individual Volition,' and Section i, 'Volition in General.' " The higher the level, the more abstract the idea; the lower the level, the more specific. Roget considered his book the opposite of a dictionary: you started with the idea and then found the word. His project was so original and so vast in scope that it has taken not just time but the connectivity, the huge databases, and the broad online access of modern information architecture even to begin to surpass it.

Exactly where print reference will be in 10 years' time is still murky, but the writing is clearly on the wall or, if you prefer, the desktop. A recent survey, soon to be published by the Dictionary Society of North America, found that while students use dictionaries as much as they ever did, online versions have overtaken paper. Many students use Thesaurus.com and Dictionary.com (along with Reference.com), which had 15.1 million unique visitors in November 2007. Meanwhile, the 2008 print edition of Quid, formerly one of France's most popular encyclopedias, was canceled last month for want of sales.

Happily, if the computer processing of words is killing reference books, it's also making them better. In particular, word reference is evolving faster, and more cleverly, than any other kind of compendium out there. The innovation is not just a matter of a new medium that lets us get online what we used to turn pages for. There has been an evolutionary leap, too: digitizing words and tracking them over time allows us to see language as it really is, less an abstract code than a dynamic system.

One of the most important spurs to word research is the increased use of the corpus, the term for any large body of written or spoken communication, be it a collection of medieval manuscripts or a folder of sound files. Language scholars of many stripes have long assembled corpora, such as books on particular topics or writings by particular people, in order to analyze the language of the whole. Before the computer era, corpus work required slow, painstaking tabulation. With a computerized corpus, you can search and count (and run almost any other kind of linguistic analysis) far more easily. Corpus linguistics means that lexicographers can mine the language of thousands of people, so that their books reflect English as a whole population speaks and writes it, not just English as Peter Mark Roget spoke it. If Roget's Thesaurus, along with Webster's and Johnson's original dictionaries, is the idiosyncratic cartography of brilliant 18th- and 19th-century explorers, then this stuff is GPS.
