20011106: Conversation about identifiers, labels and categories

Contact: cdent@burningchrome.com

This is some email between my step-father and me. Walt has a long
history of thinking about databases and has done a great deal of
reading and writing on strictly unique identifiers. He's the source
for my own feelings about identifiers needing to be meaningless or
else they are not identifiers and are thus broken.

Discussion of identifiers and labels leads to some discussion of
categorization.


-=-=-
From: "Chris Dent" <cdent@burningchrome.com>
To: XXXXX
Sent: Tuesday, November 06, 2001 1:06 AM
Subject: Dewey decimal system



how does your notion of unique, persistent, essentially meaningless
identifiers interact with dewey wherein the call number is both a key to the
location of the book and a mini language which describes the content of the
book. For example 823 is English fiction. If you know the language you can
pick up a book and see from the spine what it is potentially about.

This is used as an example of how dewey is less bad than library of
congress. It exercises putting knowledge into the world, with language, so
you have the opportunity to process less.

This is a key feature of augmentation, which I'm keen on.

So is there some distinction between objects that need to have meaningful
labels and those that need identifiers?

I'm in class right now, writing this on my pilot so this may be a bit
stilted.


-=-=-
Date: Wed, 31 Oct 2001 20:31:30 -0500
From: Walt Woolfolk <XXXXXX>
To: Chris Dent <cdent@burningchrome.com>
Subject: Re: Dewey decimal system

Dewey? Egregious!  Clear violation of the doctrine of strict
uniqueness (i.e., a stable identifier must be at least unique and at
most unique).

There are, of course, many examples of meaningful labels, but there
are no justifications I know of for them. That is, a thing possesses
a set of descriptive characteristics, some of which are identifying
characteristics. Since it is highly inconvenient (and unstable) to attempt
to always refer to a thing by its identifying characteristics (take
yourself, for example: what would it take to describe you sufficiently to
identify you? and how would it likely change over time?), an identifier
is assigned to the thing. By making the id strictly unique it serves
the purpose of picking out a single thing and it is immune to change.
Any or all of the thing's characteristics remain available for descriptive
purposes. So a thing has an identifier (label) which is best strictly
unique plus one or more descriptive characteristics.

In a library system an item might have a strictly unique id (e.g.,
123456789), some location scheme, and many other descriptive and varying
characteristics. The id becomes the primary key to the item in the
database and any characteristics or combination of characteristics are
potential secondary keys (e.g., author last name).
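
A minimal sketch of that arrangement, assuming Python's standard sqlite3
module. The table name, column names, and sample values are hypothetical;
they only illustrate a meaningless primary key sitting alongside
descriptive columns that can serve as secondary keys:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE items (
        id          INTEGER PRIMARY KEY,  -- strictly unique, carries no meaning
        location    TEXT,                 -- some location scheme; free to change
        title       TEXT,
        author_last TEXT
    )
    """
)
# A secondary key: an index over a descriptive characteristic.
conn.execute("CREATE INDEX items_by_author ON items (author_last)")

# Invented sample row, for illustration only.
conn.execute(
    "INSERT INTO items (id, location, title, author_last) VALUES (?, ?, ?, ?)",
    (123456789, "A12", "Example Title", "Doe"),
)

# Moving or reclassifying the item changes one descriptive column;
# the id, and anything that refers to it, is untouched.
conn.execute("UPDATE items SET location = ? WHERE id = ?", ("B07", 123456789))

by_id = conn.execute(
    "SELECT location, title FROM items WHERE id = ?", (123456789,)
).fetchone()
by_author = conn.execute(
    "SELECT id FROM items WHERE author_last = ?", ("Doe",)
).fetchall()
```

Lookup by id and lookup by author both reach the same row, but only the
id is guaranteed to pick out exactly one item.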

The argument most often offered for meaningful ids is the convenience
of having certain information immediately available when one looks at
the id, so you don't have an additional look-up step to access that
information. The advantage of this is real, but completely trivial
when weighed against the disadvantages. In the case of Dewey, even this
minor advantage is offset by the fact that what is included in the id
is itself encoded, so you have to look up the meaning of each of the
codes anyway. The argument from convenience doesn't stand up. In fact
there are no good reasons for meaningful ids, and I suspect the real
reason for them is psychological. Finding out what that reason is would
make an interesting project for some grad student with an interest in
human cognition.


-=-=-
From cdent@burningchrome.com Tue Nov  6 01:08:10 2001
Date: Thu, 1 Nov 2001 23:59:37 -0500 (EST)
From: cdent@burningchrome.com
To: Walt Woolfolk <XXXXXX>
Subject: Re: Dewey decimal system

On Wed, 31 Oct 2001, Walt Woolfolk wrote:

> Dewey? Egregious!  Clear violation of the doctrine of strict
> uniqueness (i.e., a stable identifier must be at least unique and at
> most unique).

Yeah, that's what I thought you would say, which is why I thought I
would write.

After thinking about it in the bathtub, though, I'm still wondering if
the call number is an identifier and not a label.

If it is a label, the problem is not that it is meaningful, but that
people think it is an identifier (instead of a label).

While for you and me the label still requires a lookup for decoding,
for someone who knows the language, no external lookup is required.
The call number is a signifier with meaning. People like those sorts
of things because they are easy (small) reference chunks to
complicated (large) bits of info.

This goes back to our categories conversation: people make categories
so they don't have to remember all the qualities of a thing in a
category, but can instead refer to it by the category label (e.g.
bird).

Cognitive scaffolding, a prof of mine calls them.

From a database system standpoint it would be an egregious error to
use the call number as the primary key to the book because, just like you
say, if the interpretation (and thus call number and location) changes
you're screwed: that change has to cascade around all over the place.
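
A rough sketch of that cascade, using plain Python dictionaries rather
than a real database. Every name and value here (books_a, loans_a, the
"A12" and "B07" call numbers, the 1001 id) is made up for illustration:

```python
# Design A: the call number doubles as the key, so other records point at it.
books_a = {"A12": {"title": "Example Title"}}
loans_a = [{"book": "A12", "patron": "p1"}, {"book": "A12", "patron": "p2"}]


def reclassify_a(old, new):
    """Re-key the book, then chase down every record that referenced it."""
    books_a[new] = books_a.pop(old)
    touched = 1
    for loan in loans_a:
        if loan["book"] == old:
            loan["book"] = new
            touched += 1
    return touched


# Design B: a meaningless id is the key; the call number is just a field.
books_b = {1001: {"title": "Example Title", "call_number": "A12"}}
loans_b = [{"book": 1001, "patron": "p1"}, {"book": 1001, "patron": "p2"}]


def reclassify_b(book_id, new):
    """Change one descriptive field; references by id stay valid."""
    books_b[book_id]["call_number"] = new
    return 1


touched_a = reclassify_a("A12", "B07")   # book record plus both loans
touched_b = reclassify_b(1001, "B07")    # book record only
```

In design A the number of records touched grows with everything that
refers to the book; in design B it stays at one.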

Presumably some people know this, but when it comes time to physically
identify the book (a much different act than logically identifying it)
they don't want a unique ID because you'd have to go to some sort of
external (to the brain) device to find out where to put it on the
shelves (either of the library or the brain).

So, while you've just suggested some PhD research to find out why
people want meaningful ids, I find the case already mostly closed, in
that the problem is that people and computers don't think alike, and
shouldn't think alike. Let the tool do its job, it doesn't think like
you and you don't want it to...

Sort of like: computers are relational databases, humans are
associative databases. Attempts to model people as relational
databases have failed. Attempts to get computers to do associative
linking have mostly fallen on their faces. By association I mean the
ability to create undefinable categories. Computers have trouble with
that whole lack of definition thing. They want rules.

I might have to quote us into my readings journal for this particular
class, if you don't mind?

I just got back from an outdoor rock climbing trip to a nearby
roadcut that's been developed into a bit of a climbing area. We got
there early enough to get the rope set up before the sun went down,
and then the light of the moon through the clouds led the way. It was
fantastic.

-- 
Chris Dent  <cdent@burningchrome.com>  http://www.burningchrome.com/~cdent/


-=-=-
Date: Fri, 2 Nov 2001 10:28:09 -0500
From: XXXXXX
To: cdent@burningchrome.com
Subject: Dewey or not?

If people want the location (encoded or not) on the book, put it on the
book - no problem - just don't put it in the book's id

What is your distinction between id and label?

-=-=-
From cdent@burningchrome.com Tue Nov  6 01:08:27 2001
Date: Fri, 2 Nov 2001 14:28:20 -0500 (EST)
From: cdent@burningchrome.com
To: XXXXX
Subject: Re: Dewey or not?

On Fri, 2 Nov 2001 XXXXX wrote:

> If people want the location (encoded or not) on the book, put it on the
> book - no problem - just don't put it in the book's id

Right, that's basically the difference between a label and id in the
way I was saying it.

Unfortunately people seem to want to use the label as the id. For
example, although the cataloging software for the library here at IU
has a title control number which is a unique ID for a resource, its
value is so completely obscured by all kinds of crufty things people
try to do to get to stuff in non-referential ways.

> What is your distinction between id and label?

(Note I'm making this up as I go along)

Several different descriptions:

database:
id is primary key
label is one or more concatenated descriptive fields

categorization:
id is a _reference_ to something which fulfills a strict definition
label is a name of something which approaches some high (but
   undefined) level of typicality of a category (which is itself
   undefinable)

information architecture:
(LIS has this notion of a discipline called information architecture
which has a whole lot to do with wayfinding, navigation, context
generation, signage, etc)
id is a reference to an entity (say the URL of a web document)
label is a name for the entity so someone can identify it (somewhat
   oxymoronic...) (say the words which are the link button, indicating
   (or, ha, identifying) a link)

More generally I'd say what I'm thinking is that an ID is a unique
reference which points to something which fits into a strictly defined
class of entities. In a database you only put something in the books
table if it is a book or you have _declared_ it a book. When it is in
there you need a handle to it, that's the ID.

Labels, on the other hand, are handles to categories of one or more
entities which have been associated for some reason which is
beneficial into a grouping. The label indicates the group. You can
label a database table, but you can also label a bunch of stuff which
sort of, but maybe not completely, fits together well for the sake of
some exercise.
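
One way to picture the split, as a hedged Python sketch. The Book class,
the sample titles, and the label names are all invented, and a random
UUID is only one possible stand-in for a meaningless unique id:

```python
import uuid
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Book:
    title: str
    # Assumption: a random UUID plays the role of the meaningless id.
    id: str = field(default_factory=lambda: uuid.uuid4().hex)


books = [
    Book("Field Guide to Birds"),
    Book("Rock Climbing Basics"),
    Book("Birds in Poetry"),
]

# id: a handle to exactly one declared entity.
by_id = {b.id: b for b in books}

# label: a handle to a loose grouping; groups may overlap.
labels = defaultdict(set)
labels["birds"].update({books[0].id, books[2].id})
labels["outdoors"].update({books[0].id, books[1].id})


def labels_of(book_id):
    """Return every label whose group currently includes this id."""
    return sorted(name for name, members in labels.items() if book_id in members)
```

An id maps to one book and never needs to change; a book can pick up or
lose labels freely without its id being affected.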

I'm not sure, does that hang together?

I'm potentially trying to shape the world to my brain and not the
other way round, which could be broken.  Or: I feel like I'm spewing a
bunch of stuff that is potentially interesting, or completely booboo,
and I'm not sure which it is.

