J. Organizing Knowledge: Indexing, Indexing Languages and Thesauri

Sorted By Creation Time

20011028: Buckland, Vocabulary as a central concept...

Contact:cdent@burningchrome.com


Buckland, M. (1999). Vocabulary as a central concept in library and
     information science. In T. Aparac et al. (Eds.), Digital libraries:
     interdisciplinary concepts, challenges, and opportunities. Proceedings
     of the Third International Conference on Conceptions of Library
     and Information Science [CoLIS3], 23-26 May 1999, Dubrovnik,
     Croatia (p. 3-12). Zagreb: Lokve. Available at
     http://www.sims.berkeley.edu/~buckland/colisvoc.htm

-=-=-

Yes, it's a preprint, but wow, the typos! I usually expect more from
Buckland.

The content disappoints as well. There is this ongoing fetishizing of
classificatory subject heading systems. Buckland's examples, as he himself
states, are obvious and even hackneyed. They allow him to make his point
but why belabor it so?

Syndetic structures are valuable but expensive to create and subject
to the same limitations as the primary controlled vocabulary. I propose
instead systems that provide free-text indexing of the primary vocabulary,
with the option to include, choose and modify queries created out of a
collection of thesauri (in the Roget sense, not the older IS sense).

This will allow the searcher to take advantage of the dynamic nature of
language (about which Buckland seems to complain: he discusses the
enormous capacity of human speech to determine meaning through
interaction and then wants to minimize interaction in searching).

Beyond that, I agree--must agree--that vocabulary is central to
IS. Vocabulary is central to categorization. Categorization underlies IS.


20011028: Batty, WWW -- Wealth, Weariness or Waste


Batty, D. (1998). WWW -- Wealth, Weariness or Waste: Controlled
     vocabulary and thesauri in support of online information access.
     D-Lib Magazine, November 1998. Available at:
     http://webdoc.sub.gwdg.de/edoc/aw/d-lib/dlib/november98/11batty.html

-=-=-
How about this idea:

web interface
takes terms
takes an online thesaurus selection (e.g. WordNet)
takes a distance value (depth of traversal in the thesaurus)

generates queries to Google based on the logic described in the article

This is somewhat like what AltaVista used to do with their queries in
that Java app.

Would this increase precision at all or just raise recall? At the
moment recall is generally pretty high, but people make short queries
because long queries sometimes ruin both recall and precision.
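
A rough sketch of the idea, in Python. Everything here is an assumption
made for illustration: the toy thesaurus stands in for WordNet or any
other online thesaurus, the function names are invented, and the choice
to OR a term with its expansions and AND the groups together is a guess
at a sensible rule, not necessarily the logic Batty describes. The
distance value is the number of hops followed through the thesaurus.

```python
from collections import deque

# Hypothetical toy thesaurus: each term maps to its related terms.
# A real version would look these up in WordNet or a similar source.
TOY_THESAURUS = {
    "car": ["automobile", "vehicle"],
    "automobile": ["car", "motorcar"],
    "vehicle": ["conveyance"],
}


def expand_term(term, thesaurus, distance):
    """Return the term plus every term within `distance` hops of it."""
    seen = [term]
    queue = deque([(term, 0)])
    while queue:
        current, depth = queue.popleft()
        if depth == distance:
            continue  # do not traverse past the requested depth
        for related in thesaurus.get(current, []):
            if related not in seen:
                seen.append(related)
                queue.append((related, depth + 1))
    return seen


def build_query(terms, thesaurus, distance):
    """OR each term with its expansions, then AND the groups together."""
    groups = []
    for term in terms:
        expansions = expand_term(term, thesaurus, distance)
        # Quote multi-word terms so they are searched as phrases.
        quoted = [f'"{t}"' if " " in t else t for t in expansions]
        groups.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(groups)


print(build_query(["car", "repair"], TOY_THESAURUS, 1))
```

With a distance of 1 the query is (car OR automobile OR vehicle) AND
(repair). Raising the distance widens each OR group, which is where the
recall versus precision question starts to bite.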
-=-=-

CDB Enterprises' decision to construct a dual interface to the
Washington Post articles is, to me, an excellent solution. In a
situation of that sort (an article archive), if there were only one
option, I would choose whole-text indexing. Best would be both:
whole-text indexing and a system of tagging articles with terms from a
controlled vocabulary that creates an index.

-=-=-

See also http://www.burningchrome.com/~cdent/slis/l505/papers/slisessay12.htm
for a (not fully formed) discussion of dynamic hierarchy systems. That
is, delaying the creation of hierarchy until it is needed by the user.


20011030: Bush, As We May Think

Bush, V. (1996/1945). As we may think. Interactions, 3(2), 35-46.
     [Originally published in _Atlantic Monthly_, 176(1), 101-108.]
     Available at:
     http://www.theatlantic.com/unbound/flashbks/computer/bushf.htm

-=-=-

See: http://www.burningchrome.com/~cdent/sliswarp/scope/index.cgi#3.3
for various comments on Bush and relating him to a larger discussion
of knowledge development.

Also: http://www.browseup.com/, which claims to allow for the creation
of trails as Bush describes them.

It's both a blessing and a crime that Bush is considered the father of
the World Wide Web. The World Wide Web is a pale imitation of what
Bush imagined. In Bush's mind documents retained their independence
from the linking system within which they lived. Or more importantly,
links remained independent from documents.

Many people feel we've reached the memex and thus have stopped putting
effort into fulfilling the dream. The memex has not been reached and
there is a great deal of work yet to be done. Ted Nelson has made some
progress but his personal marketing skills leave something to be
desired.


20011030: Rosenfeld & Morville, Chapter 5: Labeling systems...

Rosenfeld, L. & Morville, P. (1998). Chapter 5: Labeling systems. In
    _Information architecture for the World Wide Web_ (p. 72-98).
    Beijing: O'Reilly.

Instructions to web designers on how to create labeling systems for
their web sites. Discussion of what labels are and the function they
perform. Reminders to keep labeling systems consistent (in several
dimensions) and meaningful by remembering that they are part of a
system for which conventions must be established: conventions that are
understood by the audience, or quickly learned.

-=-=-

I can't decide about Rosenfeld & Morville: they are so earnest.
Underlying this chapter is a couple of librarian types giving it the
old college try at convincing non-librarian types, in a hurry, that
controlled vocabularies have value outside the card catalog. It's a
valiant effort, but it somehow falls flat. They do a solid job of
describing categories without getting into the theory of
categorization, which would probably cause many to look askance.

The real issue, for me, is that articles like these do little to draw
the theoretical into the real world because they are so quickly dated.
Admittedly, R&M are trying to take something general and make it
specific for the domain of web architecture but what does this do for
us as students? If we need examples to make things real, use short
examples, not instructional manuals. If we are to learn, teach us the
principles. I look at web manuals and grow weary.


20011030: Faceted classifications and thesauri

Faceted classifications and thesauri [Last modified: 1997]. Available
    at:
http://is.gseis.ucla.edu/impact/f95/Papers-projects/Papers/perles.html

In part a description of faceted classification and thesauri which
attempts to draw a relationship between the two and show how they can
be used post-coordinately and pre-coordinately, respectively, contrary
to their traditional roles. Thesaurus construction can be seen as a
process of facet analysis, and thesauri can be used in the process of
creating categories for facet analysis.

-=-=-

This was one of the most confusing descriptions of faceted
classification I've read in quite some time. In fact, if the labels
and proper names weren't there I don't know that I would have been
able to identify it. This is a shame because faceted classification
rocks (as they say in the biz).

I can no longer read anything to do with thesauri or faceted
classification without thinking of Ted Nelson's latest brainchild:
zigzag:

    http://xanadu.com/zigzag/

In typical fashion Ted and crew have gone out of their way to obscure
the simple grace of zigzag by trying to explain what it can be used
for instead of what it is.

Zigzag is a representation system that allows for the easy creation of
multidimensional hierarchies. Information objects are contained in
cells. Cells are arranged in dimensions. Any cell may be in many
dimensions. Dimensions may be traversed in a forward or reverse
direction. All the dimensions a cell is in may be traversed at any
given time. Dimensions may be cells.

So, for example, the set:

  abcdefghijklmnopqrstuvwxyz

could be 26 cells in a dimension (d1) that orders them as shown. The set

  chris

is d2, defined as d1(2,7,17,8,18), counting positions from zero. d3, d4
and d5 are similarly defined to represent:

  a eats cow

d6 is defined as d3,d2,d4,d3,d5 to represent:

  a chris eats a cow

d7 is:

  a cow eats a chris

and is the same data arranged as:

  d3,d5,d4,d3,d2

(From a computer science standpoint, there is nothing particularly
revolutionary about the base of Nelson's architecture: these are
simply doubly linked lists. What makes this interesting is the notion
of dimensions and the explicit inclusion of reuse and recursion. AND:
representation is completely separate from the data; everything is a
reference.)
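
The example can be sketched in a few lines of Python. This is a toy
model under stated assumptions, not Nelson's implementation: the Cell
and Dimension classes and the pick, walk and phrase helpers are invented
names, and plain Python lists of references stand in for the doubly
linked lists. Positions are counted from zero, so "chris" is positions
2, 7, 17, 8 and 18 of the alphabet dimension.

```python
import string


class Cell:
    """Holds one information object and knows nothing about ordering."""

    def __init__(self, content):
        self.content = content


class Dimension:
    """An ordered list of references to cells or to other dimensions."""

    def __init__(self, name, members):
        self.name = name
        self.members = list(members)

    def pick(self, name, positions):
        """Build a new dimension from this one's members at 0-based positions."""
        return Dimension(name, [self.members[i] for i in positions])

    def walk(self, reverse=False):
        """Yield cell contents in order, descending into nested dimensions."""
        members = reversed(self.members) if reverse else self.members
        for member in members:
            if isinstance(member, Dimension):
                yield from member.walk(reverse)
            else:
                yield member.content

    def text(self):
        return "".join(self.walk())


def phrase(dimension):
    """Join the text of each member; assumes every member is a dimension."""
    return " ".join(member.text() for member in dimension.members)


d1 = Dimension("d1", [Cell(letter) for letter in string.ascii_lowercase])
d2 = d1.pick("d2", [2, 7, 17, 8, 18])   # chris
d3 = d1.pick("d3", [0])                 # a
d4 = d1.pick("d4", [4, 0, 19, 18])      # eats
d5 = d1.pick("d5", [2, 14, 22])         # cow

# Dimensions of dimensions: the same data in two arrangements.
d6 = Dimension("d6", [d3, d2, d4, d3, d5])
d7 = Dimension("d7", [d3, d5, d4, d3, d2])

print(phrase(d6))
print(phrase(d7))
```

The two print calls give "a chris eats a cow" and "a cow eats a chris".
No letter is ever copied: the "c" in chris and the "c" in cow are the
same cell, reached through different dimensions.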

Switching back to classification for a moment: zigzag could be a very
helpful tool for creating dynamic citation orders for faceted
classification systems. Documents are cells. Facets are dimensions.
Citation orders are dimensions of dimensions.
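
A small Python sketch of what a dynamic citation order could look like.
The documents, facet names and functions are all hypothetical, and this
uses ordinary dictionaries rather than zigzag itself: each document
carries one value per facet, and a citation order is just a sequence of
facet names chosen at the moment of arrangement.

```python
# Hypothetical documents, each tagged with one value per facet.
DOCUMENTS = [
    {"title": "Doc A", "facets": {"topic": "thesauri", "form": "article"}},
    {"title": "Doc B", "facets": {"topic": "indexing", "form": "book"}},
    {"title": "Doc C", "facets": {"topic": "indexing", "form": "article"}},
]


def class_mark(document, citation_order):
    """Cite the document's facet values in the chosen order."""
    return " : ".join(document["facets"][facet] for facet in citation_order)


def arrange(documents, citation_order):
    """Sort documents by their facet values, taken in citation order."""
    return sorted(
        documents,
        key=lambda doc: [doc["facets"][facet] for facet in citation_order],
    )


for order in (("topic", "form"), ("form", "topic")):
    print(order)
    for doc in arrange(DOCUMENTS, order):
        print("  ", class_mark(doc, order), "-", doc["title"])
```

The tagging is done once; only the citation order changes. Under
(topic, form) the shelf order is C, B, A, and under (form, topic) it is
C, A, B.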


20011030: Sanders, Introduction to data modeling concepts

Sanders, G.L. (1995). Introduction to data modeling concepts. In _Data
     modeling_ (p. 16-38). Danvers, Mass.: Boyd & Fraser.

An introduction to the entity relationship (ER) method as a way of
modeling data prior to the creation of relational database tables.
Describes the difference between and characteristics of entities and
attributes. Provides a clean explanation and demonstration of the
is-a, is-part-of, is-associated-with relationships that can exist
between entities and how these relationships clarify structure in
data.

-=-=-

While this work is primarily designed for database administrators it
does provide a cogent introduction to the concepts used in ontologies,
class structures, and object oriented programming paradigms.

An anecdote: In version 5 of the Perl programming language, object
oriented programming can be accomplished. It's not exactly elegant but
it does work. Objects are created from classes (entities). These
objects have attributes as both data and methods. Using a global
variable called @ISA it is possible to subclass Perl classes to create
entities which have the attributes of the parent class plus additional
attributes. Prior to SLIS I pronounced (in my head) the global
variable as "I suh" not "is a".
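
To make the mapping from ER relationships to class structures concrete,
here is a minimal sketch in Python rather than Perl. The class names are
invented for illustration, and reading is-part-of as a list of parts and
is-associated-with as a list of references is one plausible
interpretation, not Sanders' notation. Python's __bases__ attribute
plays roughly the role that @ISA plays in Perl.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Publication:
    """An entity with two attributes."""
    title: str
    year: int


@dataclass
class Chapter:
    heading: str


@dataclass
class Author:
    name: str


@dataclass
class Book(Publication):
    """is-a: a Book is a Publication and inherits its attributes."""
    # is-part-of: chapters are parts of the book.
    chapters: List[Chapter] = field(default_factory=list)
    # is-associated-with: authors exist independently of the book.
    authors: List[Author] = field(default_factory=list)


book = Book(title="Data modeling", year=1995)
book.chapters.append(Chapter("Introduction to data modeling concepts"))
book.authors.append(Author("G.L. Sanders"))

print(isinstance(book, Publication))
print(Book.__bases__)
```

The subclass gets title and year from its parent and adds attributes of
its own, which is the same arrangement the @ISA anecdote describes.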


20011103: Dillon, Morris, User Acceptance of Information Technology


Dillon, A. & Morris, M. (1996). User acceptance of information
     technology: theories and models. _Annual Review of Information
     Science and Technology_ (p. 3-32). Medford, NJ: Information Today, Inc.

Overview of the primary theories involved in the acceptance of IT:
Innovation Diffusion Theory (ID), Theory of Reasoned Action,
Technology Acceptance Model, Theory of Planned Behavior, and
Socio-Technical Systems Theory.   

-=-=-

Why hasn't faceted classification (FC) caught on outside of limited domains?

ID has a possible explanation. The theory identifies five
characteristics that affect whether an innovation is adopted: relative
advantage, compatibility, complexity, trialability and observability.
In nearly all of these areas FC has missed the mark.

Relative advantage is how much we can gain from the new system
compared to the existing systems. Frequently this is very dependent on 
the other characteristics. In large domains FC hasn't caught on
because the systematic changes required to make it go and to learn it
are quite large. 

Trialability is the ability to try something before you fully commit
to it. This is difficult with faceted systems for large domains
because the value of the faceting does not shine until a significant
portion of the domain has been classified. It is true that, as Jacob
suggests, LCSH could be used to begin a classification system, but who
is going to do that work and why would they if the relative advantage
can't be proven?

Observability is the degree to which the advantage of the change can
be seen. Unfortunately, with many technological or idea-based
innovations, the value of the change can only be seen over the long
term. The long term cannot be viewed by most until after something is
implemented. Visionaries who can see over the longer term without
implementation have trouble convincing the entrenched.

Faceted classification has the appearance, on the surface, of being
very complex. It has proven not as easy to understand as hierarchical
or enumerative classification. This is odd because it seems that human
thought is probably more like FC than it is like hierarchical class
systems.

One area where FC does win is that it can be compatible with existing
systems because the citation order may be adjusted on the fly. A
representation of an FC system as some other system would be possible,
if the resources are tagged appropriately. However, again, without an
implementation to observe and try, this is easy to resist.

Basically, it's a big change, a shift in paradigms, and those sorts of
things require a demonstration that causes a radical adjustment in
people's conceptual understandings. 
