Ian Davis on Why Tagging Is Expensive
I just posted a little thing called "Ian Davis on Why Tagging is Expensive" over at Tagsonomy. I've copied it here for posterity.
I sometimes feel as though my enthusiasm for tags is flagging, because I've started defending the middle ground (someone has to). Really, though, there are tons of cool things you can do with tag data; we've hardly scratched the surface.
Anyway, this is another attempt to convince people that the future of classification and retrieval is about more choices--and the interaction between those choices--not just tagging.
Last week Ian Davis wrote an interesting post on Why Tagging is Expensive:
On the surface tagging seems to offer a new paradigm of organising information, one that reduces the cost of entry and so enables a long tail of participation to emerge. I've come to realise that the cost isn't removed, instead it's displaced and possibly increased. Tagging bulldozes the cost of classification and piles it onto the price of discovery.
There's a saying I've heard once or twice (I wish I could attribute it): "The cost of metadata is in its application, but the value of metadata is in its use."
Not exactly something you'll be quoting at dinner parties, but it nicely captures the gap between what metadata costs and what it's worth.
The arguments against professional classification (including Clay's views on tagging) have almost always focused on the cost side of the equation. Automated indexing, search and now tagging are seen as ways to drive down classification costs. But as Davis explains, classification costs are only one part of the system:
In my view the total cost of an information retrieval system is the cost of classification plus the cost of discovery. In the formal classification world you have a very small number of people incurring a high cost in order to reduce the costs incurred by a very large number of people. In contrast the tagging world has the unit costs reversed: it's cheap to classify, expensive to find. But the numbers of people involved are large in both cases so you end up with a lot of people paying a tiny cost to classify added to a lot of people paying a high price to discover. I think it's pretty likely that the total cost is going to end up much higher than in the classification scenario.
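To make Davis's arithmetic concrete, here is a toy sketch in Python. It encodes his model (total cost = number of classifiers times the unit cost of classifying, plus number of seekers times the unit cost of discovery). The function name total_cost, the choice of minutes as the unit, and every number in it are hypothetical, picked only to illustrate the shape of the argument; they are not measurements of any real system.

```python
# A toy model of Ian Davis's argument, assuming (hypothetically) that
# effort is measured in minutes and that everyone in a group pays the
# same unit cost. None of these figures come from real data.

def total_cost(classifiers: int, minutes_to_classify: int,
               seekers: int, minutes_to_discover: int) -> int:
    """Total cost of a retrieval system = cost of classification + cost of discovery."""
    classification_cost = classifiers * minutes_to_classify
    discovery_cost = seekers * minutes_to_discover
    return classification_cost + discovery_cost


# Formal classification: a few professionals pay a high cost up front
# so that many seekers pay a low cost to find things.
formal = total_cost(classifiers=10, minutes_to_classify=600,
                    seekers=100_000, minutes_to_discover=2)

# Tagging: many people pay a tiny cost to classify,
# and many people pay a higher cost to find things.
tagging = total_cost(classifiers=100_000, minutes_to_classify=1,
                     seekers=100_000, minutes_to_discover=10)

print(f"formal:  {formal:,} minutes")   # formal:  206,000 minutes
print(f"tagging: {tagging:,} minutes")  # tagging: 1,100,000 minutes
```

With these made-up inputs tagging comes out far more expensive, but the result hinges almost entirely on the assumed discovery cost. Drop the tagging discovery cost from 10 minutes to 1 and the total falls to 200,000 minutes, slightly cheaper than the formal scenario.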
Here's an analogy. I visit a lot of thrift stores. The true cost of an item in a thrift store is a function of the time it takes me to find it, not the price (which is always cheap). A very large thrift store is more likely to have what I want, but at a greater discovery cost. Like del.icio.us, a thrift store is great for serendipitous discovery but not so good for known item retrieval. Put another way, del.icio.us wouldn't be your first choice if you needed articles on Rousseau and the French Revolution, just like the Sally Ann wouldn't be your first choice if you needed a smoking jacket, size 42T.
Where I think Davis might be wrong is in suggesting that the discovery costs are shifted back to the user. In fact, the costs are shifted to search, blogs and other more efficient discovery tools. In large part this is because the domain of tagging systems has been the "big messy" web.
In that case, the "classical" economics of information retrieval don't apply, because there are often multiple ways of finding things. Or because Google, which offsets its infrastructure costs by selling keyword advertising, can radically lower your discovery costs. Or because algorithms can do much of the heavy lifting. Or because users expect results that are "just good enough." Or because users are interested less in finding than in tracking. And so on.
But I'd argue that once the domain is constrained (by subject, by context, by user population, by privacy/security, by business goals, or by some combination of these), the economic principles of classification and retrieval come back into play. Because other discovery tools are either unavailable or suboptimal, poorly designed retrieval systems do shift the burden back to the user. (Karl Fast's thoughts on problems in the middle are worth a read here.)
In that middle ground (and the "big messy" web probably contains millions of cases where local structure is valuable, not to mention information systems that aren't part of the "big messy" web), I think there's a large area where a mixture of emergent, algorithmic, formal and now social classification systems will make for optimal retrieval.

