Contact: cdent@burningchrome.com
This is some email between my step-father and me. Walt has a long history of thinking about databases and has done a great deal of reading and writing on strictly unique identifiers. He's the source for my own feelings about identifiers needing to be meaningless or else they are not identifiers and are thus broken. Discussion of identifiers and labels leads to some discussion of categorization.

-=-=-

From: "Chris Dent" <cdent@burningchrome.com>
To: XXXXX
Sent: Tuesday, November 06, 2001 1:06 AM
Subject: Dewey decimal system

How does your notion of unique, persistent, essentially meaningless identifiers interact with Dewey, wherein the call number is both a key to the location of the book and a mini language which describes the content of the book? For example, 823 is English fiction. If you know the language you can pick up a book and see from the spine what it is potentially about.

This is used as an example of how Dewey is less bad than Library of Congress. It exercises putting knowledge into the world, with language, so you have the opportunity to process less. This is a key feature of augmentation, which I'm keen on.

So is there some distinction between objects that need to have meaningful labels and those that need identifiers?

I'm in class right now, writing this on my Pilot, so this may be a bit stilted.

-=-=-

Date: Wed, 31 Oct 2001 20:31:30 -0500
From: Walt Woolfolk <XXXXXX>
To: Chris Dent <cdent@burningchrome.com>
Subject: Re: Dewey decimal system

Dewey? Egregious! Clear violation of the doctrine of strict uniqueness (i.e., a stable identifier must be at least unique and at most unique).

There are, of course, many examples of meaningful labels, but there are no justifications I know of for them. That is, a thing possesses a set of descriptive characteristics, some of which are identifying characteristics.
Since it is highly inconvenient (and unstable) to attempt to always refer to a thing by its identifying characteristics (take yourself, for example: what would it take to describe you sufficiently to identify you, and how would that description likely change over time?), an identifier is assigned to the thing. By making the id strictly unique it serves the purpose of picking out a single thing and it is immune to change. Any or all of the thing's characteristics remain available for descriptive purposes.

So a thing has an identifier (label) which is best strictly unique, plus one or more descriptive characteristics. In a library system an item might have a strictly unique id (e.g., 123456789), some location scheme, and many other descriptive and varying characteristics. The id becomes the primary key to the item in the database, and any characteristics or combination of characteristics are potential secondary keys (e.g., author last name).

The argument most often offered for meaningful ids is the convenience of having certain information immediately available when one looks at the id, so you don't have an additional look-up step to access that information. The advantage of this is real, but completely trivial when weighed against the disadvantages. In the case of Dewey, even this minor advantage is offset by the fact that what is included in the id is itself encoded, so you have to look up the meaning of each of the codes anyway. The argument from convenience doesn't stand up.

In fact there are no good reasons for meaningful ids, and I suspect the real reason for them is psychological. Finding out what that reason is would make an interesting project for some grad student with an interest in human cognition.

-=-=-

From cdent@burningchrome.com Tue Nov 6 01:08:10 2001
Date: Thu, 1 Nov 2001 23:59:37 -0500 (EST)
From: cdent@burningchrome.com
To: Walt Woolfolk <XXXXXX>
Subject: Re: Dewey decimal system

On Wed, 31 Oct 2001, Walt Woolfolk wrote:

> Dewey? Egregious!
> Clear violation of the doctrine of strict
> uniqueness (i.e., a stable identifier must be at least unique and at
> most unique).

Yeah, that's what I thought you would say, which is why I thought I would write.

After thinking about it in the bathtub, though, I'm still wondering whether the call number is an identifier and not a label. If it is a label, the problem is not that it is meaningful, but that people think it is an identifier (instead of a label). While for you and me the label still requires a lookup for decoding, for someone who knows the language, no external lookup is required. The call number is a signifier with meaning. People like those sorts of things because they are easy (small) reference chunks to complicated (large) bits of info.

This goes back to our categories conversation: people make categories so they don't have to remember all the qualities of a thing in a category, but can instead refer to it by the category label (e.g. bird). Cognitive scaffolding, a prof of mine calls them.

From a database system standpoint it would be an egregious error to use the call number as the primary key to the book because, just like you say, if the interpretation (and thus call number and location) changes you're screwed: that change has to cascade around all over the place. Presumably some people know this, but when it comes time to physically identify the book (a much different act than logically identifying it) they don't want a unique ID, because you'd have to go to some sort of external (to the brain) device to find out where to put it on the shelves (either of the library or the brain).

So, while you've just suggested some PhD research to find out why people want meaningful ids, I find the case already mostly closed, in that the problem is that people and computers don't think alike, and shouldn't think alike. Let the tool do its job; it doesn't think like you and you don't want it to...
Sort of like: computers are relational databases, humans are associative databases. Attempts to model people as relational databases have failed. Attempts to get computers to do associative linking have mostly fallen on their faces. By association I mean the ability to create undefinable categories. Computers have trouble with that whole lack of definition thing. They want rules.

I might have to quote us into my readings journal for this particular class, if you don't mind?

I just got back from an outdoor rock climbing trip to a nearby roadcut that's been developed into a bit of a climbing area. We got there early enough to get the rope set up before the sun went down, and then the light of the moon through the clouds led the way. It was fantastic.

--
Chris Dent <cdent@burningchrome.com> http://www.burningchrome.com/~cdent/

-=-=-

Date: Fri, 2 Nov 2001 10:28:09 -0500
From: XXXXXX
To: cdent@burningchrome.com
Subject: Dewey or not?

If people want the location (encoded or not) on the book, put it on the book - no problem - just don't put it in the book's id.

What is your distinction between id and label?

-=-=-

From cdent@burningchrome.com Tue Nov 6 01:08:27 2001
Date: Fri, 2 Nov 2001 14:28:20 -0500 (EST)
From: cdent@burningchrome.com
To: XXXXX
Subject: Re: Dewey or not?

On Fri, 2 Nov 2001 XXXXX wrote:

> If people want the location (encoded or not) on the book, put it on the
> book - no problem - just don't put it in the book's id

Right, that's basically the difference between a label and an id in the way I was saying it. Unfortunately people seem to want to use the label as the id. For example, although the cataloging software for the library here at IU has a title control number which is a unique ID for a resource, its value is so completely obscured by all kinds of crufty things people try to do to get to stuff in non-referential ways.

> What is your distinction between id and label?
(Note I'm making this up as I go along)

Several different descriptions:

database:
  id is primary key
  label is one or more concatenated descriptive fields

categorization:
  id is a _reference_ to something which fulfills a strict definition
  label is a name of something which approaches some high (but undefined) level of typicality of a category (which is itself undefinable)

information architecture:
  (LIS has this notion of a discipline called information architecture which has a whole lot to do with wayfinding, navigation, context generation, signage, etc.)
  id is a reference to an entity (say the URL of a web document)
  label is a name for the entity so someone can identify it (somewhat oxymoronic...) (say the words which are the link button, indicating (or, ha, identifying) a link)

More generally I'd say what I'm thinking is that an ID is a unique reference which points to something which fits into a strictly defined class of entities. In a database you only put something in the books table if it is a book or you have _declared_ it a book. When it is in there you need a handle to it; that's the ID.

Labels, on the other hand, are handles to categories of one or more entities which have been associated, for some beneficial reason, into a grouping. The label indicates the group. You can label a database table, but you can also label a bunch of stuff which sort of, but maybe not completely, fits together well for the sake of some exercise.

I'm not sure, does that hang together? I'm potentially trying to shape the world to my brain and not the other way round, which could be broken.

Or: I feel like I'm spewing a bunch of stuff that is potentially interesting, or completely booboo, and I'm not sure which it is.
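
The database description in the last message (the id is the primary key, the label is a descriptive field) can be sketched roughly as follows. This is only an illustration under assumptions, not something from the emails: the books and loans tables, their columns, the sample title, the call numbers, and the borrower are all hypothetical, and it uses Python's standard sqlite3 module. The id value 123456789 is borrowed from Walt's example.

```python
import sqlite3


def build_catalog():
    """Create a tiny in-memory catalog (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        """
        CREATE TABLE books (
            id INTEGER PRIMARY KEY,          -- meaningless identifier, never changes
            title TEXT NOT NULL,
            author_last_name TEXT NOT NULL,  -- descriptive, possible secondary key
            call_number TEXT NOT NULL        -- label: meaningful, allowed to change
        )
        """
    )
    conn.execute(
        """
        CREATE TABLE loans (
            book_id INTEGER NOT NULL REFERENCES books(id),
            borrower TEXT NOT NULL
        )
        """
    )
    # The label can still be searched quickly without being the primary key.
    conn.execute("CREATE INDEX books_by_call_number ON books(call_number)")
    conn.execute(
        "INSERT INTO books VALUES (123456789, 'Example Novel', 'Doe', '823 DOE')"
    )
    conn.execute("INSERT INTO loans VALUES (123456789, 'hypothetical borrower')")
    return conn


def reclassify(conn, book_id, new_call_number):
    """Change the label only; nothing that references the id is touched."""
    conn.execute(
        "UPDATE books SET call_number = ? WHERE id = ?",
        (new_call_number, book_id),
    )


conn = build_catalog()
reclassify(conn, 123456789, "813 DOE")
row = conn.execute(
    "SELECT b.call_number, l.borrower "
    "FROM loans l JOIN books b ON b.id = l.book_id"
).fetchone()
print(row)
```

Because the loan refers to the book by its meaningless id, reclassifying the book touches one row and nothing has to cascade. Had the call number been the primary key, every table referring to the book would need the same update.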