Old Stuff: Classification v Categorization
June 21, 2005
When I was in school I took an Information Architecture class that required a readings journal. Some of those entries deserve revision. (PPB)
I was, at the time, especially interested in the differences, similarities and interrelationships between classification and categorization. What follows began life as "Studer: Classification v. Categorization." The first version was written in November 2001. (PPC)
Studer, P.A. (1977). Classification as a general systems construct. In B.M. Fry & C.A. Shepherd (Comp.) Information management in the 1980's: Proceedings of the [40th] ASIS Annual Meeting, Chicago, Illinois, September 26-October 1, 1977 (pp. 67, C6-C14, A1-A9). White Plains, NY: Knowledge Industry for American Society for Information Science. (PPD)
The Studer article suggests there is a lack of consistency in the literature in the use of the terms classification and categorization. Studer himself uses the terms carelessly, especially when quoting: he uses the term classification in his own text, while the passages he quotes use category. (PPE)
Studer makes it sound as though creating classifications is a step that follows the creation or identification of categories. This conflicts with my interpretation. (PPF)
In my view, classification is an artificial (synthetic, non-fundamental) process by which we organize things for presentation or later access. It involves the arbitrary creation of a group of classes which have explicit definitions and may be arranged in a hierarchy. In other words, a class is strictly defined, and once it is inhabited its inhabitants can be enumerated. (PPG)
Categorization, on the other hand, is a natural process in the sense that humans do it as part of their cognitive fundament. It is, as Studer reports, an act of simplification that makes apprehension and comprehension of the environment more efficient. Categories spring up out of necessity and, because they are designed to replace the details of definition, are themselves resistant to definition. When given a list of things, we are able to categorize them, but when asked to list the full contents of a category, we cannot. (PPH)
So to put it more succinctly: (PPI)
- a class is a defined grouping of entities in which the members fulfill the definition of the class and can be listed. (PPJ)
- a category is a cognitive label applied to a non-enumerable grouping of entities wherein membership is determined by typicality amongst the members and not some overarching definition. (PPK)
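The two definitions can be contrasted in a minimal Python sketch. It assumes, purely for illustration, that things can be described by sets of features, and it approximates "typicality amongst the members" as average Jaccard similarity to members already grouped together, a family-resemblance stand-in chosen here rather than anything Studer proposes. The names `class_members` and `typicality` and the feature sets are hypothetical.

```python
from typing import Callable, Dict, List, Set

Features = Set[str]


def class_members(definition: Callable[[Features], bool],
                  items: Dict[str, Features]) -> List[str]:
    """A class: membership is fixed by an explicit definition,
    so every member can be listed."""
    return sorted(name for name, feats in items.items() if definition(feats))


def overlap(a: Features, b: Features) -> float:
    """Jaccard similarity: shared features over all features."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def typicality(candidate: Features, examples: List[Features]) -> float:
    """A category (approximated): graded resemblance to known members,
    with no overarching definition to pass or fail."""
    if not examples:
        return 0.0
    return sum(overlap(candidate, e) for e in examples) / len(examples)


# Hypothetical feature sets.
items = {
    "sparrow": {"feathers", "flies", "sings", "lays_eggs"},
    "eagle": {"feathers", "flies", "lays_eggs", "hunts"},
    "robin": {"feathers", "flies", "sings", "lays_eggs"},
    "penguin": {"feathers", "swims", "lays_eggs"},
    "bat": {"fur", "flies"},
}

# The class "things that fly" is defined by a rule and can be enumerated.
print(class_members(lambda feats: "flies" in feats, items))
# → ['bat', 'eagle', 'robin', 'sparrow']

# Typicality relative to two birds already grouped together.
known_birds = [items["sparrow"], items["eagle"]]
for name in ["robin", "penguin", "bat"]:
    print(name, round(typicality(items[name], known_birds), 2))
# → robin 0.8, then penguin 0.4, then bat 0.2 (one per line)
```

The bat satisfies the definition of the class but resembles the known birds poorly, while the penguin fails the "flies" rule yet still resembles them: the class has a crisp boundary, the category only degrees. The typicality function is still a fixed formula, though, which is exactly the sense in which the argument below says a computer can only imitate categorization.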
This is important to me, in part, because I'm trying to determine whether computers can ever be truly intelligent or must always fake it. I vote for the latter because computers, thus far, cannot categorize. (PPL)
The ability to categorize may be the basis for intelligence (On Intelligence, by Jeff Hawkins, presents some data to support this, as well as some assertions that may, given time, blow my "thus far" out of the water). On-the-fly categorization allows us to place data in an informational context. Once data are in that matrix, we can engage in what amounts to an endless recursive dialectic in which each new synthesis becomes a new thesis. (PPM)
Computers can presumably replicate this process, but if they do, it is imitation. Their distinctions must be made by definition, by classification, not by categorization. They can be made to appear to categorize, but the alternative representations they provide are rule-based (that is, definition-based). Until recently, the most promising research in creating seemingly intelligent machines has used what can be called a brute-force approach: supply the computer with as much information as possible, related in as many ways as possible. This is the method IBM used to make Deep Blue a chess champion, and it is one of the keys to the Semantic Web. (PPN)
If we want to create truly intelligent machines, we must determine how categorization works. I wonder, though, why we want intelligent machines. What do we gain from them? Don't we instead want machines that are tools to augment our own intelligence? If that's the case, then we already have the understanding we need to make progress: we simply need to improve on what we have. (PPO)
Comments
I may be wrong, as I'm not up on technical uses of categorize / classify, but it sounds to me like what you are really talking about is the distinction between prototype-based, or bottom-up, categorization and top-down, rules-based categorization. (PPQ)
In the first, the computer gets to see a whole lot of examples of things, described in terms of a number of "features", and then works out categories according to some criteria of its own (normally some measure of the similarity or difference between examples in a high-dimensional space defined by the features). This is what plenty of AI architectures, such as neural networks, fuzzy logic systems and genetic algorithms, do. (PPR)
In the second case, rules are stated explicitly: either written by human experts before the categorization begins, or inferred by the computer. (PPS)
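The two cases can be sketched in Python under loudly stated assumptions: items are described by two hypothetical numeric features (weight in kg, height in cm), each category's prototype is simply the centroid of its examples (a nearest-centroid classifier, one of the simplest similarity-based methods, not the neural networks or fuzzy systems named above), and the rule's 7 kg threshold is invented for illustration. All names and numbers are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical feature vector: (weight_kg, height_cm).
Vector = Tuple[float, ...]


def centroid(points: Sequence[Vector]) -> Vector:
    """Average the examples coordinate by coordinate: the category's prototype."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def learn_prototypes(examples: Dict[str, List[Vector]]) -> Dict[str, Vector]:
    """Bottom-up: derive one prototype per label from labelled examples."""
    return {label: centroid(points) for label, points in examples.items()}


def nearest_prototype(item: Vector, prototypes: Dict[str, Vector]) -> str:
    """Assign the item to the label whose prototype is closest (Euclidean distance)."""
    return min(prototypes, key=lambda label: math.dist(item, prototypes[label]))


def rule_based(item: Vector) -> str:
    """Top-down: an explicit rule fixed in advance (the 7 kg threshold is made up)."""
    weight_kg, _height_cm = item
    return "dog" if weight_kg > 7.0 else "cat"


# Hypothetical training examples.
examples = {
    "cat": [(4.0, 25.0), (5.0, 28.0), (3.5, 23.0)],
    "dog": [(20.0, 55.0), (30.0, 60.0), (10.0, 40.0)],
}
prototypes = learn_prototypes(examples)

for item in [(6.0, 30.0), (8.0, 35.0), (25.0, 58.0)]:
    print(item, nearest_prototype(item, prototypes), rule_based(item))
# (6.0, 30.0) cat cat
# (8.0, 35.0) cat dog
# (25.0, 58.0) dog dog
```

On the borderline item the two disagree: the learned prototypes say cat, the handwritten rule says dog. Here the learned criteria are just two averaged points and are still easy to read; in a larger system they would be spread across many parameters, such as a matrix of neuron weights.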
The main difference, it seems to me, is that the first kind of system can work with rules that are hard or impossible to explicate and represent in a way that is comprehensible to humans. The classification scheme might only be representable by a huge matrix of weights between neurons, etc. (PPT)
The rules-based approach is really an assumption that the categorization criteria can be made human-readable. (PPU)
I've certainly put a lot of emphasis on this distinction in the past, but these days I'm not sure how much it really tells us about the actual difference between "real" intelligence and mere computer fakery. (PPV)
Certainly, humans haven't decoded the way the brain represents distinctions and put them into succinct rules. But that doesn't mean much. Even rules based systems will have to be extremely complex to do even simple stuff (look at Cyc). (PPW)
So whether there's actually any more to "real" intelligence than categorization as we already understand it, just done on a grand scale in the brain, I'm not sure. (PPX)
BTW: sorry to say, but I don't like your comments system much. It pops up something that covers the main text of your posting, and because it's not a window, I can't drag it out of the way to reread your post underneath. What's wrong with a separate window? (PPY)