Studer: Classification v. Categorization

Contact: cdent@burningchrome.com

Studer, P.A. (1977). Classification as a general systems construct. In
     B.M. Fry & C.A. Shepherd (Comp.) Information management in the
     1980's: Proceedings of the [40th] ASIS Annual Meeting, Chicago,
     Illinois, September 26-October 1, 1977 (pp. 67, C6-C14, A1-A9).
     White Plains, NY: Knowledge Industry for American Society for
     Information Science.

-=-=-

While reading the Studer article from session 9, it occurred to me that
the literature is inconsistent in its use of the terms classification
and categorization. Studer seems to use the terms almost
interchangeably, especially when he is quoting. That is, where he uses
the term classification, the text he quotes uses category.

He makes it sound like the process of creating classifications is a step
following the creation or identification of categories.

This conflicts with how I've been thinking about the terms. Perhaps
somebody can confirm or reject the following views?

In my view, classification is a sort of artificial process by which
we organize things for presentation or later access. It involves the
arbitrary creation of a group of classes, potentially arranged in a
hierarchy, which have explicit definitions. In other words, a class is
strictly defined and, once populated, its members can be enumerated.

Categorization, on the other hand, is a natural process in the sense
that humans do it out of their cognitive fundament. It is, as Studer
reports, an act of simplification to make apprehension and comprehension
of the environment more efficient. Categories spring up out of necessity
and, because they are designed to replace the details of definition, are
themselves resistant to definition. When provided with a list of stuff we
are able to categorize the stuff, but when asked to list the full contents
of a category we can't.

So to put it more succinctly:

- a class is a defined grouping of entities in which the members
  fulfill the definition of the class and can be listed.
- a category is a cognitive label applied to a non-enumerable grouping
  of entities wherein membership is determined by typicality amongst
  the members and not some overarching definition.
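
The two definitions above can be contrasted in a small Python sketch.
Everything in it is an illustrative assumption and not something from
Studer: the Animal features, the "has feathers" definition, and the use
of a robin as the prototype are all made up for the example. Note that
the typicality score is itself computed by a rule, so at best it
imitates categorization.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Animal:
    # Hypothetical features, each scored from 0.0 to 1.0.
    name: str
    can_fly: float
    has_feathers: float
    sings: float


def is_bird_class(a: Animal) -> bool:
    """Class: membership follows from an explicit definition."""
    return a.has_feathers == 1.0


def typicality(a: Animal, prototype: Animal) -> float:
    """Category (imitated): graded similarity to a typical member."""
    diffs = (
        abs(a.can_fly - prototype.can_fly),
        abs(a.has_feathers - prototype.has_feathers),
        abs(a.sings - prototype.sings),
    )
    return 1.0 - sum(diffs) / len(diffs)


ROBIN = Animal("robin", 1.0, 1.0, 1.0)  # assumed prototype of "bird"
animals = [
    ROBIN,
    Animal("penguin", 0.0, 1.0, 0.0),
    Animal("bat", 1.0, 0.0, 0.0),
]

# A class can be enumerated: every member satisfies the definition.
members = [a.name for a in animals if is_bird_class(a)]

# A category only yields degrees of fit; there is no list to close.
scores = {a.name: typicality(a, ROBIN) for a in animals}
```

Here the class gives a yes or no answer and a complete member list,
while the typicality scores only rank how bird-like each animal is: the
bat scores above zero without ever satisfying the definition.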

This is important to me, in part, because I'm playing around with trying
to determine if computers can ever be actually intelligent or must always
fake it. I vote for the latter because computers cannot categorize.

The ability to categorize seems to be the basis for intelligence.
On-the-fly categorization allows us to place data in an informational
context. Once in that matrix we can do what seems to amount to an endless
recursive dialectic wherein each new synthesis becomes thesis.

Computers can presumably replicate this process, but it is imitation.
Their distinctions must be made by definition, by classification, not
categorization. They can be made to appear to do categorization, but the
alternate representations they provide are rule (definition) based.
Thus far the most promising research in creating seemingly intelligent
machines has used what can be called a brute force approach: supply
the computer with as much information as possible, related in as many
ways as possible. This is the method that IBM used to get Deep Blue to
become a chess champion, and it is the key to the Semantic Web.

If we want to create truly intelligent machines, a key step, then, is
determining how categorization really works. I wonder, though, why we want
intelligent machines. Don't we really just want machines that are
tools to augment our own intelligence? If that's the case, then we are
already there: we simply need to improve on what we have.

