Contact:cdent@burningchrome.com
Hammond, T.H. (1993). Toward a general theory of hierarchy: books, bureaucrats, basketball tournaments and the administrative structure of the nation-state. _Journal of Public Administration Research and Theory 3_(1), 120-145.

Hammond describes how the hierarchical structure of an institution affects how information within it is transformed and used. Hierarchies determine how information is categorized and thus how comparisons are made. They also control how information is aggregated and transmitted, and thus how problems and solutions are discovered and defined. Two of the examples given: in different library classification schemes, adjacency is defined differently because the categorical relationships differ, so serendipitous browsing of the shelves or the catalog yields different results from one scheme to another; in an intelligence organization, the way people filter information and judge relevance controls what the final decision maker at the top of the hierarchy sees and acts upon. Hammond's conclusion is that since hierarchies are present in the nation-state, as in any politicized institution, the organization of the nation-state affects the sorts of problems that can be identified, shared and worked upon by the state. Knowing this helps in understanding the behavior of nation-states.
The flip side of categories being resistant to definition is that they are also difficult to enumerate. If one is able to both define a grouping and enumerate its members, that's probably a classification system.
Studer, P.A. (1977). Classification as a general systems construct. In B.M. Fry & C.A. Shepherd (Comp.) Information management in the 1980's: Proceedings of the [40th] ASIS Annual Meeting, Chicago, Illinois, September 26-October 1, 1977 (pp. 67, C6-C14, A1-A9). White Plains, NY: Knowledge Industry for American Society for Information Science.

-=-=-

While reading the Studer article from session 9 it occurred to me that there seems to be a lack of consistency in the literature between the use of the terms classification and categorization. Studer seems to use the terms almost interchangeably, especially when he is quoting. That is, while he uses the term classification the quoted text uses category. He makes it sound like the process of creating classifications is a step following the creation or identification of categories.

This conflicts with how I've been thinking about the terms. Perhaps somebody can confirm or reject the following views?

In my view classification is a sort of artificial process by which we organize things for presentation or later access. It involves the arbitrary creation of a group of classes, potentially arranged in a hierarchy, which have explicit definitions. In other words a class is strictly defined and, once inhabited, its inhabitants can be enumerated.

Categorization, on the other hand, is a natural process in the sense that humans do it out of their cognitive fundament. It is, as Studer reports, an act of simplification to make apprehension and comprehension of the environment more efficient. Categories spring up out of necessity and, because they are designed to replace the details of definition, are themselves resistant to definition. When provided with a list of stuff we are able to categorize the stuff, but when asked to list the full contents of a category we can't.

So to put it more succinctly:

- a class is a defined grouping of entities in which the members fulfill the definition of the class and can be listed.
- a category is a cognitive label applied to a non-enumerable grouping of entities wherein membership is determined by typicality amongst the members and not some overarching definition.

This is important to me, in part, because I'm playing around with trying to determine if computers can ever be actually intelligent or must always fake it. I vote for the latter because computers cannot categorize.

The ability to categorize seems to be the basis for intelligence. On the fly categorization allows us to place data in an informational context. Once in that matrix we can do what seems to amount to an endless recursive dialectic wherein each new synthesis becomes thesis.

Computers can presumably replicate this process but it is imitation. Their distinctions must be made by definition, by classification, not categorization. They can be made to appear to do categorization but the alternate representations they provide are rules (definition) based.

Thus far the most promising research in creating seemingly intelligent machines has used what can be called a brute force approach: supply the computer with as much information as possible, related in as many ways as possible. This is the method that IBM used to get Deep Blue to become a chess champion, and it is the key to the Semantic Web.

If we want to create truly intelligent machines, a first step, then, is determining how categorization really works.

I wonder, though, why we want intelligent machines. Don't we really just want machines that are tools to augment our own intelligence? If that's the case, then we are already there: we simply need to improve on what we have.
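The class/category distinction drawn above can be made concrete with a small sketch. Everything in it is an assumption made for illustration: the `Animal` record, its four features, the "bird" definition and the use of a robin as prototype do not come from Studer. The sketch shows a class defined by an explicit rule and then enumerated, next to an imitation of categorization by typicality. The typicality score is graded, but it is still computed by a rule, which is the point made above about computers.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Animal:
    # Hypothetical feature set, chosen only to keep the example small.
    name: str
    has_feathers: bool
    lays_eggs: bool
    flies: bool
    sings: bool


FEATURES = [f.name for f in fields(Animal) if f.name != "name"]

ANIMALS = [
    Animal("robin", True, True, True, True),
    Animal("penguin", True, True, False, False),
    Animal("bat", False, False, True, False),
]


def in_bird_class(animal):
    """Classification: membership follows from an explicit definition."""
    return animal.has_feathers and animal.lays_eggs


def typicality(animal, prototype):
    """Imitated categorization: share of features matching a prototype."""
    matches = sum(getattr(animal, f) == getattr(prototype, f) for f in FEATURES)
    return matches / len(FEATURES)


# A class can be enumerated once it is inhabited.
bird_class = [a.name for a in ANIMALS if in_bird_class(a)]

# Typicality is graded rather than yes/no, but a rule still produces it.
robin = ANIMALS[0]
scores = {a.name: typicality(a, robin) for a in ANIMALS}

print(bird_class)
print(scores)
```

The penguin is a full member of the defined class while scoring as only half as typical as the robin; the bat scores above zero without fitting the definition at all.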
This is some email between my step-father and me. Walt has a long history of thinking about databases and has done a great deal of reading and writing on strictly unique identifiers. He's the source for my own feelings about identifiers needing to be meaningless or else they are not identifiers and are thus broken. Discussion of identifiers and labels leads to some discussion of categorization.

-=-=-

From: "Chris Dent" <cdent@burningchrome.com>
To: XXXXX
Sent: Tuesday, November 06, 2001 1:06 AM
Subject: Dewey decimal system

how does your notion of unique, persistent, essentially meaningless identifiers interact with dewey wherein the call number is both a key to the location of the book and a mini language which describes the content of the book. For example 823 is english fiction. If you know the language you can pick up a book and see from the spine what it is potentially about.

This is used as an example of how dewey is less bad than library of congress. It exercises putting knowledge into the world, with language, so you have the opportunity to process less. This is a key feature of augmentation, which I'm keen on.

So is there some distinction between objects that need to have meaningful labels and those that need identifiers?

I'm in class right now, writing this on my pilot so this may be a bit stilted.

-=-=-

Date: Wed, 31 Oct 2001 20:31:30 -0500
From: Walt Woolfolk <XXXXXX>
To: Chris Dent <cdent@burningchrome.com>
Subject: Re: Dewey decimal system

Dewey? Egregious! Clear violation of the doctrine of strict uniqueness (i.e., a stable identifier must be at least unique and at most unique).

There are, of course, many examples of meaningful labels, but there are no justifications I know of for them. That is, a thing possesses a set of descriptive characteristics, some of which are identifying characteristics.
Since it is highly inconvenient (and unstable) to attempt to always refer to a thing by its identifying characteristics (take yourself, for example: what would it take to describe you sufficiently to identify you? and how would it likely change over time?), an identifier is assigned to the thing. By making the id strictly unique it serves the purpose of picking out a single thing and it is immune to change. Any or all of the thing's characteristics remain available for descriptive purposes.

So a thing has an identifier (label) which is best strictly unique plus one or more descriptive characteristics. In a library system an item might have a strictly unique id (e.g., 123456789), some location scheme, and many other descriptive and varying characteristics. The id becomes the primary key to the item in the database and any characteristics or combination of characteristics are potential secondary keys (e.g., author last name).

The argument most often offered for meaningful ids is the convenience of having certain information immediately available when one looks at the id, so you don't have an additional look-up step to access that information. The advantage of this is real, but completely trivial when weighed against the disadvantages. In the case of Dewey, even this minor advantage is offset by the fact that what is included in the id is itself encoded, so you have to look up the meaning of each of the codes anyway. The argument from convenience doesn't stand up.

In fact there are no good reasons for meaningful ids, and I suspect the real reason for them is psychological. Finding out what that reason is would make an interesting project for some grad student with an interest in human cognition.

-=-=-

From cdent@burningchrome.com Tue Nov 6 01:08:10 2001
Date: Thu, 1 Nov 2001 23:59:37 -0500 (EST)
From: cdent@burningchrome.com
To: Walt Woolfolk <XXXXXX>
Subject: Re: Dewey decimal system

On Wed, 31 Oct 2001, Walt Woolfolk wrote:

> Dewey? Egregious! Clear violation of the doctrine of strict
> uniqueness (i.e., a stable identifier must be at least unique and at
> most unique).

Yeah, that's what I thought you would say, which is why I thought I would write. After thinking about it in the bathtub, though, I'm still wondering if the call number is an identifier and not a label. If it is a label, the problem is not that it is meaningful, but that people think it is an identifier (instead of a label).

While for you and me the label still requires a lookup for decoding, for someone who knows the language, no external lookup is required. The call number is a signifier with meaning. People like those sorts of things because they are easy (small) reference chunks to complicated (large) bits of info. This goes back to our categories conversation: people make categories so they don't have to remember all the qualities of a thing in a category, but can instead refer to it by the category label (e.g. bird). Cognitive scaffolding, a prof of mine calls them.

From a database system standpoint it would be an egregious error to use the call number as the primary key to the book as, just like you say, if the interpretation (and thus call number and location) changes you're screwed: that change has to cascade around all over the place.

Presumably some people know this, but when it comes time to physically identify the book (a much different act than logically identifying it) they don't want a unique ID because you'd have to go to some sort of external (to the brain) device to find out where to put it on the shelves (either of the library or the brain).

So, while you've just suggested some PhD research to find out why people want meaningful ids, I find the case already mostly closed, in that the problem is that people and computers don't think alike, and shouldn't think alike. Let the tool do its job, it doesn't think like you and you don't want it to...
Sort of like: computers are relational databases, humans are associative databases. Attempts to model people as relational databases have failed. Attempts to get computers to do associative linking have mostly fallen on their faces.

By association I mean the ability to create undefinable categories. Computers have trouble with that whole lack of definition thing. They want rules.

I might have to quote us into my readings journal for this particular class, if you don't mind?

I just got back from an outdoor rock climbing trip to a nearby roadcut that's been developed into a bit of a climbing area. We got there early enough to get the rope set up before the sun went down, and then the light of the moon through the clouds led the way. It was fantastic.

--
Chris Dent <cdent@burningchrome.com> http://www.burningchrome.com/~cdent/

-=-=-

Date: Fri, 2 Nov 2001 10:28:09 -0500
From: XXXXXX
To: cdent@burningchrome.com
Subject: Dewey or not?

If people want the location (encoded or not) on the book, put it on the book - no problem - just don't put it in the book's id

What is your distinction between id and label?

-=-=-

From cdent@burningchrome.com Tue Nov 6 01:08:27 2001
Date: Fri, 2 Nov 2001 14:28:20 -0500 (EST)
From: cdent@burningchrome.com
To: XXXXX
Subject: Re: Dewey or not?

On Fri, 2 Nov 2001 XXXXX wrote:

> If people want the location (encoded or not) on the book, put it on the
> book - no problem - just don't put it in the book's id

Right, that's basically the difference between a label and id in the way I was saying it. Unfortunately people seem to want to use the label as the id. For example, although the cataloging software for the library here at IU has a title control number which is a unique ID for a resource, its value is so completely obscured by all kinds of crufty things people try to do to get to stuff in non-referential ways.

> What is your distinction between id and label?
(Note I'm making this up as I go along)

Several different descriptions:

database:
  id is primary key
  label is one or more concatenated descriptive fields

categorization:
  id is a _reference_ to something which fulfills a strict definition
  label is a name of something which approaches some high (but undefined) level of typicality of a category (which is itself undefinable)

information architecture:
  (LIS has this notion of a discipline called information architecture which has a whole lot to do with wayfinding, navigation, context generation, signage, etc)
  id is a reference to an entity (say the URL of a web document)
  label is a name for the entity so someone can identify it (somewhat oxymoronic...) (say the words which are the link button, indicating (or, ha, identifying) a link)

More generally I'd say what I'm thinking is that an ID is a unique reference which points to something which fits into a strictly defined class of entities. In a database you only put something in the books table if it is a book or you have _declared_ it a book. When it is in there you need a handle to it, that's the ID.

Labels, on the other hand, are handles to categories of one or more entities which have been associated, for some beneficial reason, into a grouping. The label indicates the group. You can label a database table, but you can also label a bunch of stuff which sort of, but maybe not completely, fits together well for the sake of some exercise.

I'm not sure, does that hang together? I'm potentially trying to shape the world to my brain and not the other way round, which could be broken.

Or: I feel like I'm spewing a bunch of stuff that is potentially interesting, or completely booboo, and I'm not sure which it is.
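The database reading of the id/label split in the thread above can be sketched with Python's standard `sqlite3` module. The table layout, the title, the borrower name and both call numbers are hypothetical; only the id value 123456789 and the "author last name" secondary key come from Walt's message. The sketch assumes one thing the emails argue for: other records refer to the book by its meaningless id, so changing the meaningful label does not cascade.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE books (
        id INTEGER PRIMARY KEY,   -- meaningless, strictly unique identifier
        title TEXT NOT NULL,
        author_last_name TEXT,    -- descriptive: a potential secondary key
        call_number TEXT          -- meaningful label: allowed to change
    );
    CREATE INDEX books_by_call_number ON books (call_number);
    CREATE TABLE loans (
        book_id INTEGER NOT NULL REFERENCES books (id),
        borrower TEXT NOT NULL
    );
    """
)

# Hypothetical sample data.
conn.execute(
    "INSERT INTO books (id, title, author_last_name, call_number) "
    "VALUES (?, ?, ?, ?)",
    (123456789, "An Example Novel", "Doe", "823.914"),
)
conn.execute(
    "INSERT INTO loans (book_id, borrower) VALUES (?, ?)",
    (123456789, "reader-1"),
)


def reclassify(book_id, new_call_number):
    """Change the label; the id and everything keyed on it stay put."""
    conn.execute(
        "UPDATE books SET call_number = ? WHERE id = ?",
        (new_call_number, book_id),
    )


def call_number_for_borrower(borrower):
    """Follow the stable id from a loan to the book's current label."""
    row = conn.execute(
        "SELECT b.call_number FROM loans AS l "
        "JOIN books AS b ON b.id = l.book_id WHERE l.borrower = ?",
        (borrower,),
    ).fetchone()
    return row[0] if row else None


reclassify(123456789, "813.54")
loan_rows = conn.execute("SELECT book_id, borrower FROM loans").fetchall()
print(loan_rows, call_number_for_borrower("reader-1"))
```

Had `call_number` been the primary key, the reclassification would have had to rewrite every loan row as well, which is the cascade the thread warns about.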
Jacob, E.K., & Albrechtsen, H. (1997). Constructing reality: the role of dialogue in the development of classificatory structures. In I.C. McIlwaine (Ed.), Knowledge organization for information retrieval: Proceedings of the 6th International Study Conference on Classification Research, 14-16 June 1997, London (pp. 42-50). The Hague, Netherlands: International Federation of Documentation.

-=-=-

Dovetails nicely with the discussion of ontologies and the semantic web. Ontologies are epistemes. In the utopian view of the semantic web, machines will be able to exchange ontologies to combat heteroglossia. Sounds like dialogue. Such dialogue, as stated, will need to be in unitary languages or at least a close approximation.

I fear there is a danger in the proliferation of unitary languages. If a language is well defined, inference is less fertile. Many a great idea has come from skimming the connotative effluvia of misunderstanding. Evolution results from mutation: from error.

As an aside: this article points out some of the reasons for my resistance to professionalization. In part a profession is achieved by the establishment of a well-constructed language. Such a language can create barriers between those who are considered in the know and those who aren't. Often this is necessary for safety purposes (doctors), but in other situations the creation of a well-constructed language appears to be an excuse to write more papers about the domain because you can't figure out what the domain is (information science).

(I'm aware of the paradox and irony.)
Bowker, G.C. & Star, S.L. (1999). Chapter 3: The ICD as information infrastructure. In _Sorting things out: Classification and its consequences_ (pp. 107-133). Cambridge MA: MIT Press.

-=-=-

A whole slew of information on how systems of classification help to create infrastructure in systems. In there, two items stood out for me.

Quoting the League of Nations: "Rather than omit from the beginning all which are not yet satisfactory, the authors have hoped, by including them and utilizing them for what they are worth, to create a demand for their improvement..."

This models a solution to a frequent stumbling block for "Information Architects" in this day and age. So often people want to come up with a structure before they really know what the resource will be used for. The search for structure becomes so intense that using the resource is delayed and delayed until its eventual value is lost.

I advocate, instead, for situations where the structure is not apparent, the following process:

- get the data
- if it is already chunked in some fashion, give those chunks unique identifiers
- build an information retrieval system that does free text indexing to allow string matching

At this stage we now have a semi-useful resource where there was nothing before. Next:

- as searches reveal user needs:
  - begin tagging resources with metadata
  - and/or reevaluate the chunking of the documents
- use the metadata to create faceted retrieval systems

As Wheatley suggested: information is a process that causes organization. The organizational structures we impose upon information can be revealed in how we use the information. They are structures of convenience and as such we must be prepared to undertake inconvenient work to create them. There's a law of conservation of convenience in there somewhere.

The second interesting point: on page 108 the sentence "No knowledge system exists in a vacuum, it must be rendered compatible with other systems." has been underlined and the comment "Not so!" is nearby. I can't agree with the comment. What about the knowledge systems of the users and the organizations that use the system and within which the system exists? The original system must be able to interoperate with those.
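The two-stage process advocated above (identify chunks, index free text, then add metadata and facets as use reveals needs) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a description of any system mentioned in the chapter: the function names, the in-memory dictionaries, the sample sentences and the "topic" facet are all hypothetical, and whole-word token matching stands in for "string matching".

```python
import re
import uuid
from collections import defaultdict

chunks = {}                # chunk id -> original text
index = defaultdict(set)   # lowercase token -> ids of chunks containing it
tags = defaultdict(dict)   # chunk id -> {facet name: value}


def tokenize(text):
    return re.findall(r"\w+", text.lower())


def add_chunk(text):
    """Stage 1: give the chunk a meaningless unique id and index its words."""
    chunk_id = uuid.uuid4().hex
    chunks[chunk_id] = text
    for token in tokenize(text):
        index[token].add(chunk_id)
    return chunk_id


def search(query):
    """Free text retrieval: ids of chunks containing every query word."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


def tag(chunk_id, facet, value):
    """Stage 2: attach metadata once searches show what users need."""
    tags[chunk_id][facet] = value


def facet_counts(chunk_ids, facet):
    """Faceted retrieval: count the values of one facet across a result set."""
    counts = defaultdict(int)
    for chunk_id in chunk_ids:
        value = tags.get(chunk_id, {}).get(facet)
        if value is not None:
            counts[value] += 1
    return dict(counts)


a = add_chunk("Dewey call numbers encode subject and shelf location")
b = add_chunk("Unique identifiers should carry no meaning")
c = add_chunk("Call numbers act as labels, not identifiers")

tag(a, "topic", "classification")
tag(b, "topic", "identifiers")
tag(c, "topic", "labels")

print(search("call numbers") == {a, c})
print(facet_counts(search("identifiers"), "topic"))
```

The resource is searchable as soon as `add_chunk` has run; `tag` and `facet_counts` can be added later without touching the ids or the index, which is what lets the structure follow the use.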
Bowker, G.C. & Star, S.L. (1999). Chapter 4: Classification, coding and coordination. In _Sorting things out: Classification and its consequences_ (pp. 135-161). Cambridge MA: MIT Press.

-=-=-

Laborious explication of the difficulties of communication between cultures, including constructed cultures such as the ICD. The difficulties are very noticeable in efforts (such as the ICD) to systematize what would be flexible systems of categorization if there were no need for the classification.

This underscores the notion of the suitably restricted domain discussed by Suchman when considering the efficacy of interaction between humans and technology. Technological solutions (of which a classification system is a type) are only able to interact gracefully with a human or group of humans if the domain under consideration is suitably constrained. Constraint in this context applies to both breadth and depth. The ICD is certainly not very constrained.

There's this ongoing discovery of a boundary between two things that can be modelled in various ways:

concept | theory
categorization | classification
craft | science
flexibility | rigidity
adaptability | precision

Those "two things" are both of value and must be respected in the design of any information system. Ignoring or deemphasizing either will result in a failure of the system to be completely effective.