Contact:cdent@burningchrome.com
Buckland, M. (1999). Vocabulary as a central concept in library and information science. In T. Aparac et al. (Eds.), Digital libraries: interdisciplinary concepts, challenges, and opportunities. Proceedings of the Third International Conference on Conceptions of Library and Information Science [CoLIS3] 23-26 May 1999, Dubrovnik, Croatia (p. 3-12). Zagreb: Lokve. Available at http://www.sims.berkeley.edu/~buckland/colisvoc.htm -=-=- Yes, it's a preprint, but wow, the typos! I usually expect more from Buckland. The content disappoints as well. There is this ongoing fetishizing of classificatory subject heading systems. Buckland's examples, as he himself states, are obvious and even hackneyed. They allow him to make his point, but why belabor it so? Syndetic structures are valuable but expensive to create, and they are subject to the same limitations as the primary controlled vocabulary. I propose instead systems that provide free text indexing of the primary vocabulary, with the option to include, choose and modify queries created out of a collection of thesauri (in the Roget sense, not the older IS sense). This will allow the searcher to take advantage of the dynamic nature of language (about which Buckland seems to complain: he discusses the enormous capacity of human speech to determine meaning through interaction and then wants to minimize interaction in searching). Beyond that, I agree--must agree--that vocabulary is central to IS. Vocabulary is central to categorization. Categorization underlies IS.
Batty, D. (1998). WWW -- Wealth, Weariness or Waste: Controlled vocabulary and thesauri in support of online information access. D-Lib Magazine, November 1998. Available at: http://webdoc.sub.gwdg.de/edoc/aw/d-lib/dlib/november98/11batty.html -=-=- How about this idea: a web interface that takes terms, takes an online thesaurus selection (e.g. WordNet), takes a distance value (depth of traversal in the thesaurus), and generates queries to Google based on the logic described in the article. This is somewhat like what AltaVista used to do with their queries in that Java app. Would this increase precision at all, or just raise recall? At the moment recall is generally pretty high, but people make short queries because long queries sometimes ruin both recall and precision. -=-=- CDB Enterprises' decision to construct a dual interface to the Washington Post articles is, to me, an excellent solution. In a situation of that sort (an article archive), if there were only one option, I would choose whole text indexing. Best would be both whole text indexing and a system of tagging articles with terms from a controlled vocabulary that creates an index. -=-=- See also http://www.burningchrome.com/~cdent/slis/l505/papers/slisessay12.htm for a (not fully formed) discussion of dynamic hierarchy systems, that is, delaying the creation of hierarchy until it is needed by the user.
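A rough sketch of the query-expansion idea above, in Python. Everything here is an assumption for illustration: the toy thesaurus stands in for something like WordNet, the function names are made up, and the choice to OR a term's relatives together and AND the groups is one plausible reading, not necessarily the logic Batty describes. No search engine is actually called; the sketch only builds the query string.

```python
from collections import deque

# Hypothetical toy thesaurus: term -> directly related terms.
# A real system might draw these links from WordNet instead.
TOY_THESAURUS = {
    "car": ["automobile", "vehicle"],
    "automobile": ["car", "motorcar"],
    "vehicle": ["conveyance"],
    "repair": ["fix", "mend"],
}


def expand(term, thesaurus, depth):
    """Return the term plus every term within `depth` links, nearest first."""
    seen = [term]
    queue = deque([(term, 0)])
    while queue:
        current, dist = queue.popleft()
        if dist == depth:
            continue  # do not traverse further than the requested distance
        for neighbour in thesaurus.get(current, []):
            if neighbour not in seen:
                seen.append(neighbour)
                queue.append((neighbour, dist + 1))
    return seen


def build_query(terms, thesaurus, depth):
    """OR together each term's expansions, then AND the resulting groups."""
    groups = []
    for term in terms:
        alternatives = expand(term, thesaurus, depth)
        groups.append("(" + " OR ".join(alternatives) + ")")
    return " AND ".join(groups)
```

Calling build_query(["car", "repair"], TOY_THESAURUS, 1) gives a query with one OR-group per original term. Raising the depth widens each group, which is where the recall versus precision question bites: every extra hop adds terms that are less closely related to what the searcher typed.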
Bush, V. (1996/1945). As we may think. Interactions, 3(2), 35-46. [Originally published in _Atlantic Monthly_, 176(1), 101-108.] Available at: http://www.theatlantic.com/unbound/flashbks/computer/bushf.htm -=-=- See: http://www.burningchrome.com/~cdent/sliswarp/scope/index.cgi#3.3 for various comments on Bush and relating him to a larger discussion of knowledge development. Also: http://www.browseup.com/, which claims to allow for the creation of trails as Bush describes them. It's both a blessing and a crime that Bush is considered the father of the World Wide Web. The World Wide Web is a pale imitation of what Bush imagined. In Bush's mind documents retained their independence from the linking system within which they lived. Or, more importantly, links remained independent from documents. Many people feel we've reached the memex and thus have stopped putting effort into fulfilling the dream. The memex has not been reached and there is a great deal of work yet to be done. Ted Nelson has made some progress but his personal marketing skills leave something to be desired.
Rosenfeld, L. & Morville, P. (1998). Chapter 5: Labeling systems. In _Information architecture for the World Wide Web_ (p. 72-98). Beijing: O'Reilly. Instructions to web designers on how to create labeling systems in their web sites. Discussion of what labels are and the function they perform. Reminders to keep labeling systems consistent (in several dimensions) and meaningful by remembering they are part of a system for which conventions must be established: conventions that are understood by the audience, or quickly learned. -=-=- I can't decide about Rosenfeld & Morville: they are so earnest. Underlying this chapter is a couple of librarian types giving it the old college try at convincing non-librarian types, in a hurry, that controlled vocabularies have value outside the card catalog. It's a valiant effort, but it somehow falls flat. They do a good job of describing categories without getting into the theory of categorization, which would probably cause many to look askance. The real issue, for me, is that articles like these do little to draw the theoretical into the real world because they are so quickly dated. Admittedly, R&M are trying to take something general and make it specific for the domain of web architecture, but what does this do for us as students? If we need examples to make things real, use short examples, not instructional manuals. If we are to learn, teach us the principles. I look at web manuals and grow weary.
Faceted classifications and thesauri [Last modified: 1997]. Available at: http://is.gseis.ucla.edu/impact/f95/Papers-projects/Papers/perles.html In part a description of faceted classification and thesauri which attempts to draw a relationship between the two and show how they can be used post-coordinately and pre-coordinately, respectively, contrary to their traditional roles. Thesaurus construction can be seen as a process of facet analysis, and thesauri can be used in the process of creating categories for facet analysis. -=-=- This was one of the most confusing descriptions of faceted classification I've read in quite some time. In fact, if the labels and proper names weren't there I don't know that I would have been able to identify it. This is a shame because faceted classification rocks (as they say in the biz). I can no longer read anything to do with thesauri or faceted classification without thinking of Ted Nelson's latest brainchild: zigzag: http://xanadu.com/zigzag/ In typical fashion Ted and crew have gone out of their way to obscure the simple grace of zigzag by trying to explain what it can be used for instead of what it is. Zigzag is a representation system that allows for the easy creation of multidimensional hierarchies. Information objects are contained in cells. Cells are arranged in dimensions. Any cell may be in many dimensions. Dimensions may be traversed in a forward or reverse direction. All the dimensions a cell is in may be traversed at any given time. Dimensions may be cells. So, for example, the set abcdefghijklmnopqrstuvwxyz could be 26 cells in a dimension (d1) that orders them as shown. The word "chris" is then a dimension d2, defined as d1(2,7,17,8,18), counting positions from zero.
d3, d4 and d5 are similarly defined to represent "a", "eats" and "cow". d6 is defined as d3,d2,d4,d3,d5 to represent "a chris eats a cow". d7 is "a cow eats a chris", which is the same data arranged as d3,d5,d4,d3,d2. (From a computer science standpoint, there is nothing particularly revolutionary about the base of Nelson's architecture: these are simply doubly linked lists. What makes this interesting is the notion of dimensions and the explicit inclusion of reuse and recursion. AND: representation is completely separate from the data; everything is a reference.) Switching back to classification for a moment: zigzag could be a very helpful tool for creating dynamic citation orders for faceted classification systems. Documents are cells. Facets are dimensions. Citation orders are dimensions of dimensions.
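The alphabet example can be sketched in a few lines of Python. This is a loose illustration of the description given here, not Nelson's actual design or API: the class names and the pick helper are invented, and a dimension is modeled as a plain ordered list of references rather than a doubly linked list. Positions are counted from zero, so c, h, r, i, s are 2, 7, 17, 8, 18.

```python
import string


class Cell:
    """Holds a single information object."""

    def __init__(self, value):
        self.value = value

    def text(self):
        return self.value


class Dimension:
    """An ordered run of references to cells; it can itself be used as a cell."""

    def __init__(self, name, members):
        self.name = name
        self.members = list(members)  # references only, nothing is copied

    def forward(self):
        return list(self.members)

    def reverse(self):
        return list(reversed(self.members))

    def text(self, sep=""):
        # Recursion: a member may be a Cell or another Dimension.
        return sep.join(member.text() for member in self.members)


def pick(name, source, positions):
    """Build a new dimension out of cells that already live in `source`."""
    return Dimension(name, [source.members[i] for i in positions])


d1 = Dimension("d1", [Cell(ch) for ch in string.ascii_lowercase])
d2 = pick("d2", d1, [2, 7, 17, 8, 18])   # chris
d3 = pick("d3", d1, [0])                 # a
d4 = pick("d4", d1, [4, 0, 19, 18])      # eats
d5 = pick("d5", d1, [2, 14, 22])         # cow
d6 = Dimension("d6", [d3, d2, d4, d3, d5])
d7 = Dimension("d7", [d3, d5, d4, d3, d2])
```

Here d6.text(" ") and d7.text(" ") spell out the two sentences from the same 26 cells. The point of the sketch is the reuse: the "c" in "chris" and the "c" in "cow" are the same cell object, and d6 and d7 differ only in the order of their references.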
Sanders, G.L. (1995). Introduction to data modeling concepts. In _Data modeling_ (p. 16-38). Danvers, Mass.: Boyd & Fraser. An introduction to the entity relationship (ER) method as a way of modeling data prior to the creation of relational database tables. Describes the difference between entities and attributes, and the characteristics of each. Provides a clean explanation and demonstration of the is-a, is-part-of, and is-associated-with relationships that can exist between entities and how these relationships clarify structure in data. -=-=- While this work is primarily designed for database administrators, it does provide a cogent introduction to the concepts used in ontologies, class structures, and object oriented programming paradigms. An anecdote: in version 5 of the Perl programming language, object oriented programming can be accomplished. It's not exactly elegant but it does work. Objects are created from classes (entities). These objects have attributes as both data and methods. Using a global variable called @ISA it is possible to subclass Perl classes to create entities which have the attributes of the parent class plus additional attributes. Prior to SLIS I pronounced (in my head) the global variable as "I suh" not "is a".
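The three relationships map fairly directly onto class structures. Below is a small sketch in Python rather than Perl; the Vehicle, Car, Engine and Driver classes are made-up examples, not taken from Sanders. Python's `__bases__` attribute plays roughly the role that @ISA plays in Perl: it lists the parent classes.

```python
class Vehicle:
    """A general entity with one attribute."""

    def __init__(self, name):
        self.name = name

    def describe(self):
        return f"{self.name} is a vehicle"


class Engine:
    def __init__(self, horsepower):
        self.horsepower = horsepower


class Driver:
    def __init__(self, name):
        self.name = name


class Car(Vehicle):                   # is-a: a Car is a Vehicle
    def __init__(self, name, engine):
        super().__init__(name)        # inherits the parent's attributes
        self.engine = engine          # is-part-of: the Engine is part of the Car
        self.drivers = []             # is-associated-with: Drivers exist on their own

    def add_driver(self, driver):
        self.drivers.append(driver)


car = Car("wagon", Engine(90))
car.add_driver(Driver("chris"))
```

The subclass gets describe() and name from its parent and adds engine and drivers of its own, which is the "attributes of the parent class plus additional attributes" arrangement described above.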
Dillon, A. & Morris, M. (1996). User acceptance of information technology: theories and models. _Annual Review of Information Science and Technology_ (p. 3-32). Medford NJ: Information Today, Inc. Overview of the primary theories involved in the acceptance of IT: Innovation Diffusion Theory (ID), Theory of Reasoned Action, Technology Acceptance Model, Theory of Planned Behavior, and Socio-Technical Systems Theory. -=-=- Why hasn't faceted classification (FC) caught on outside of limited domains? ID has a possible explanation. The theory names five characteristics of an innovation: relative advantage, compatibility, complexity, trialability and observability. In most of these areas FC has missed the mark. Relative advantage is how much we can gain from the new system compared to the existing systems. Frequently this is very dependent on the other characteristics. In large domains FC hasn't caught on because the systematic changes required to make it go, and to learn it, are quite large. Trialability is the ability to try something before you fully commit to it. This is difficult with faceted systems for large domains because the value of the faceting does not shine until a significant portion of the domain has been classified. It is true that, as Jacob suggests, LCSH could be used to begin a classification system, but who is going to do that work, and why would they if the relative advantage can't be proven? Observability is the degree to which the advantage of the change can be seen. Unfortunately, with many technological or idea-based innovations the value of the change can only be seen over the long term. The long term cannot be viewed by most until after something is implemented. Visionaries who can see over the longer term without implementation have trouble convincing the entrenched. Faceted classification has the appearance, on the surface, of being very complex. It has proven not as easy to understand as hierarchical or enumerative classification.
This is odd because it seems that human thought is probably more like FC than it is like hierarchical class systems. One area where FC does win is compatibility: it can coexist with existing systems because the citation order may be adjusted on the fly. A representation of an FC system as some other system would be possible, if the resources are tagged appropriately. However, again, without an implementation to observe and try, this is easy to resist. Basically, it's a big change, a shift in paradigms, and those sorts of things require a demonstration that causes a radical adjustment in people's conceptual understandings.
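To make "the citation order may be adjusted on the fly" concrete, here is a minimal Python sketch. The three resources, the facet names and the " : " separator are all invented for illustration; the only assumption carried over from the text is that each resource is tagged with a value per facet.

```python
# Hypothetical resources, each tagged with a value for several facets.
RESOURCES = {
    "doc1": {"subject": "birds", "place": "Indiana", "form": "field guide"},
    "doc2": {"subject": "birds", "place": "Ohio", "form": "checklist"},
    "doc3": {"subject": "trees", "place": "Indiana", "form": "field guide"},
}


def class_string(facets, citation_order, sep=" : "):
    """Combine one resource's facet values in the given citation order."""
    return sep.join(facets[name] for name in citation_order if name in facets)


def arrange(resources, citation_order):
    """Return resource ids sorted as that citation order would shelve them."""
    def sort_key(resource_id):
        facets = resources[resource_id]
        return tuple(facets.get(name, "") for name in citation_order)

    return sorted(resources, key=sort_key)
```

With the order ("subject", "place") the two bird documents sit together; with ("place", "subject") the two Indiana documents do. Nothing about the tagging changes between the two arrangements, which is the sense in which a faceted system can be presented as if it were some other, more familiar scheme.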