Ontology is Overrated follow-up

Peter has written a great critique of Ontology is Overrated, covering many of the technical issues that I didn't cover (but said I would, eventually) when I posted about it a few months ago.

So I went back to my original notes and pulled together a few of the pieces I left out. They're a bit rough, but they complement Peter's piece. (Clay's quotes here are from the Etech version of the talk.)

Even Old New York Was Once New Amsterdam

"This is a book about Dresden and it goes in the category East Germany. And Stewart this morning said countries are these kinds of uncomplicated categories. Completely wrong. East Germany turned out to be an unstable category. Cities are real... they are real physical facts. Countries are social fictions. It is easy for a country to evaporate. It is hard for a city to evaporate."

I was born in a city called Port Arthur in August 1969. Four and a half months later, it ceased to exist. It was amalgamated into a new city called Thunder Bay. And yet, there it is on my birth certificate. There are books about its history, old newspapers, and other vestiges of its existence. The fact that it no longer exists as a city doesn't change the fact that as a category it still has value.

And so it is for East Germany--less robust than it used to be, but still a meaningful category. In fact, according to Wikipedia, "To this day, there remain many differences between the former East Germany and West Germany (e.g. in lifestyle, wealth, political beliefs and other matters) and thus it is still common to speak of eastern and western Germany distinctly."

In Bowker and Star's Sorting Things Out there's a section on the International Classification of Diseases (ICD) and how it has evolved. Smallpox, since it has been nearly eradicated, is no longer an active category. AIDS has now entered the ICD. ICD has nothing to do with shelving books, but it is essential for calculating and comparing mortality and morbidity rates.

Even the ICD, a medical classification system which ought to be based on mostly objective criteria, is rich with historical and cultural biases. Cholera, for example, is the first entry because it was the primary concern of the people who worked on the first version of the ICD in the 1850s. The official ICD history (PDF) reveals many more.

The point is this: the world changes, and classification systems change with it. Anyone who engages in information architecture understands this. We can't predict the future, but we can make our best judgements about what it will bring. And, not surprisingly, tagging and folksonomies are perhaps more vulnerable to these changes than other systems. Indeed, only things that are quite simple or continually tagged would be immune from "classification rot."

The LC

Clay goes on a long and, I think, spurious jag about the Library of Congress classification scheme. He sums it up with this:

"The real goal of the categorization schemes for things like libraries is to optimize the physical storage, not the intellectual aspect. It isn't ideas that have to be one place mainly. Ideas can be all over the place. It is a book that has to be one place. And so the assertion that what libraries and librarians are doing now is essentially an uncomplicated extension of this work into the electronic domain underestimates the degree to which what they've been doing previously is managing an entirely different problem which looks in first order like it's the organization of ideas but actually turns out to be the organization of the physical objects that contain the ideas. We've confused the container for the thing contained."

The best response I've read to this point (better than mine, certainly) was posted to the SIGIA mailing list by James Kalbach (emphasis mine):

Traditional libraries need a system to locate physical objects because books ARE physical objects. Unless someone can find a way to shelve a book in two physical locations at once, that criticism is pretty much moot. What Mr Shirky is completely overlooking is the primary means by which users locate these books: the surrogate record (aka the catalogue).

In pre-computer libraries, even the card catalogue had multiple records for each physical item, in some cases a dozen or so access points to a single object. The most common access points being author, title and subject. Variant names, however, have long since been accounted for even with paper card catalogues. And so have multiple subjects. The Book of 5 Rings could very well be found under “Business” AND “War” in a traditional, offline catalogue. Libraries have decoupled books from their physical location long before Mr Shirky was born.

With online public access catalogues (OPACs), the number of access points increases. Essentially all fields of a catalogue record are searchable. Summary texts may be searchable, as well as individual songs on CD, for instance. This is in addition to a subject hierarchy.

An alternative to a subject-oriented classification scheme might be running accession numbers. With this, libraries would simply give books shelf location numbers in the order that they were catalogued. But then, when the average library goer enters the stacks to retrieve a book, there would be no apparent order. The polar bear book would be right next to The Book of 5 Rings.

LCSH and Dewey allow users to browse books on the shelf, fostering serendipitous information discovery. Who hasn’t looked to the left and right of the book you were going for? I’ve found important research for my work that way and I know others do too.

People rarely use LCSH as an initial access point.

Don’t get me wrong – there are lots of really bad things about the way traditional libraries are organized. But Mr Shirky has picked the wrong target, and has made false claims and comparisons. He should have been focusing on the catalogue system or digital libraries in general or OPAC interfaces. Attacking a fairly established, fairly robust system that allows for in-stack subject browsing of physical things, such as LCSH, and then comparing that to the web is really a waste of energy.
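
For what it's worth, the decoupling Kalbach describes is easy to sketch in code. This is only a rough Python illustration, not how any real library system works: the `Book` and `Catalogue` names, the fields, and the call number are all made up for the example. The point is simply that one physical item with one shelf location can have as many access points as you like.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Book:
    title: str
    author: str
    call_number: str      # the single physical shelf location
    subjects: tuple = ()  # any number of subject headings


class Catalogue:
    """Surrogate records: many access points, one shelf location per book."""

    def __init__(self):
        # Maps (access point type, lowercased value) to the matching books.
        self._index = defaultdict(list)

    def add(self, book):
        self._index[("title", book.title.lower())].append(book)
        self._index[("author", book.author.lower())].append(book)
        for subject in book.subjects:
            self._index[("subject", subject.lower())].append(book)

    def find(self, access_point, value):
        # .get() avoids creating empty entries for failed lookups.
        return list(self._index.get((access_point, value.lower()), []))


catalogue = Catalogue()
five_rings = Book(
    title="The Book of 5 Rings",
    author="Miyamoto Musashi",
    call_number="HYPOTHETICAL-001",  # placeholder, not a real LC call number
    subjects=("Business", "War"),
)
catalogue.add(five_rings)

# Two different subject headings lead to the same physical book.
by_business = catalogue.find("subject", "business")
by_war = catalogue.find("subject", "War")
```

Both lookups return the same record with the same call number, which is all the card catalogue was ever doing with its dozen cards per book.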

Search vs. browse

"Browse versus search represents an incredibly radical shift in your trust in link infrastructure, and in the degree of power. Browse says the people making the ontology... have the power and they get to over-ride the users needs. And if the user wants something that hasn't been categorized in the way the ontologist said it should be, the user is out of luck. The search paradigm says the reverse. It says nobody gets to tell you in advance what you need. And at the moment you are looking for it, we will do our best to service it based on this link structure."

Search assumes, of course, that you are able to describe what you're looking for, but that's not always the case. Browse gives you a set of cues about what's available, and allows you to explore. Search gives you a big empty box to fill in. Browse is learnable, while search is, at best, guessable. Browse is transparent--you can observe how the links are organized. Search is opaque; the algorithms that determine your results are corporate secrets. Oh, we know they're based on popularity--but anyone who feels good about that probably didn't go to high school.

I'm being flip, but the reality is that browse and search are almost always complementary. And classification systems provide valuable cues about the breadth and kind of information, its shape and genre, its credibility and the weltanschauung of the system's authors.

Populus invenio scatum

One of the big problems is that Clay seems to think classification is about "is-ness." This is why he uses that snooty word "ontology" all the time, and holds up the periodic table as the gold standard of categorization.

But classification is really about something much simpler and more practical: people finding shit.

In that context, it's hard to understand this religious fervour around tags. The litmus test for good classification is not "does it use tags?" but "does it help people find shit?" There's much more to it than that, but that's the simplest formulation.

Trackbacks

peterme.com / Aug 9, 2005
It seems my post has prompted Gene to dig up his technically-oriented notes on the problems with Shirky's thesis. Some excellent stuff in there... (from "More from Gene on Shirky's 'Overrated'")

 

About this Page

Posted by Gene Smith on Aug 8, 2005.

About the Author

Gene Smith is a principal with nForm, one of Canada's leading user experience consulting firms. He writes about information architecture, interaction design, community, the web and other such topics.
