Tagging tags to make synonyms

I've had this post in draft for about two months, but it never seemed significant enough to finish up. And then Tagtagger came around. And then someone said Tagtagger could actually be useful. And then I thought, "oh I've done something like that."

Over the past six months we've been working on an internal tagging application for a client. One of the features of the application is a simple authority file of tags. This means that tags can be made synonymous, and that one tag is identified as the preferred term and others as alternate terms.

As you've probably guessed, we implemented this by... tagging tags.

The problem for people trying to find things using the tag set was that there were lots of minor variations in the tags. For example, here are four possible variations of Visual Basic that are syntactically different but semantically the same:

  • visual basic
  • Visual Basic
  • vb
  • visualbasic

So we recycled our tagging interface (which incorporates some ideas from Google Suggest and Flickr) to allow people to add synonymous tags. This is how it works:

[Image: Tagging interface detail]

Once a connection has been made between two tags, someone who enters, say, "vb" will be assigned the preferred term "Visual Basic" instead. (Similarly, people who had entered "vb" are re-assigned "Visual Basic" after the synonymization has happened.) The alternate terms are then hidden.
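For the curious, the behaviour described above can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not the client application's code: the `TagAuthority` class, its method names, and the case-insensitive matching are all hypothetical.

```python
class TagAuthority:
    """Hypothetical authority file: maps alternate terms to one preferred term."""

    def __init__(self):
        self._preferred_for = {}  # normalized alternate term -> preferred term
        self.item_tags = {}       # item id -> set of tags stored on that item

    @staticmethod
    def _key(tag):
        # Assumption: tags are matched case-insensitively, ignoring outer spaces.
        return tag.strip().lower()

    def resolve(self, tag):
        """Return the preferred term for a tag, or the tag itself if unmapped."""
        return self._preferred_for.get(self._key(tag), tag.strip())

    def tag_item(self, item, tag):
        """Store the preferred term, whatever variant the person typed."""
        self.item_tags.setdefault(item, set()).add(self.resolve(tag))

    def add_synonym(self, alternate, preferred):
        """Make `alternate` an alternate term for `preferred`."""
        preferred = self.resolve(preferred)
        alt_key = self._key(alternate)
        # Anything that used to resolve to the alternate now follows it.
        for key, target in list(self._preferred_for.items()):
            if self._key(target) == alt_key:
                self._preferred_for[key] = preferred
        self._preferred_for[alt_key] = preferred
        # Re-assign tags that were entered before the synonym existed.
        for tags in self.item_tags.values():
            for old in [t for t in tags if self._key(t) == alt_key]:
                tags.discard(old)
                tags.add(preferred)


authority = TagAuthority()
authority.tag_item("link-1", "vb")            # entered before the synonym exists
authority.add_synonym("vb", "Visual Basic")   # "tagging the tag"
authority.tag_item("link-2", "VB")            # entered afterwards
```

In this sketch both `link-1` and `link-2` end up carrying only "Visual Basic", which mirrors the re-assignment and hiding of alternate terms described above.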

There are some problems, of course, including the fact that we didn't deal with disambiguation (arguably just as important as synonyms). But there it is... tag tagging.

Comments

Fred says...

A good method, agreed. However, there's a huge assumption here - that people will tag. User-tags are extremely difficult to control because users have no proper motivation to tag any content if there is no self interest. There is self interest in publisher tags, though.

A better method (which is what I use in a current project) is to collect statistical data on tag usage and derive weighted values, mapping them to tag relationships (synonyms). With a rich enough system (I'm working with entries coming from more than 10 million sources), you can automatically generate synonyms and clusters. And you won't have the problem of having to count on users' goodwill (or proper usage of the system, like not spamming).

Posted on Nov 5, 2005
Gene says...

I would've loved to use a statistical method, but in this case both the tag set and the user base were small. In fact, in this implementation the user base probably won't exceed 1,000 people.

Also, because it's an internal application, we don't have to rely entirely on self-interested tagging (i.e. there are other social forces at play to encourage people to do the right thing, mostly).

We do use a simple co-occurrence algorithm to suggest recommended tags. But I agree that a statistical system would be great, and a lot more scalable than a person mapping synonymous relationships. As you probably guessed, scaling isn't a huge concern in this case. :)
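For anyone unfamiliar with the idea, a co-occurrence suggester can be sketched roughly as below. It is a generic illustration, not the code from the application: the function names, the `item_tags` structure, and the sample data are all hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_cooccurrence(item_tags):
    """Count how often each pair of tags appears together on the same item."""
    counts = defaultdict(Counter)
    for tags in item_tags.values():
        # Use a set so a tag repeated on one item is only counted once.
        for a, b in permutations(set(tags), 2):
            counts[a][b] += 1
    return counts


def suggest(counts, tag, limit=3):
    """Return up to `limit` tags that most often co-occur with `tag`."""
    return [other for other, _ in counts.get(tag, Counter()).most_common(limit)]


# Made-up sample data, for illustration only.
items = {
    "link-1": ["Visual Basic", "macros", "excel"],
    "link-2": ["Visual Basic", "macros"],
    "link-3": ["excel", "charts"],
}
counts = build_cooccurrence(items)
recommended = suggest(counts, "Visual Basic")
```

With this sample data, "macros" is suggested ahead of "excel" for "Visual Basic" because the two appear together on two items rather than one.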

Posted on Nov 6, 2005
threepointsomething says...

I guess there is one problem with this approach. Consider that a programmer wants to tag his/her links as VB to mean links related to "Visual Basic".

But for a sports person, VB could as well mean, VolleyBall.

The extent to which the tags may be considered synonymous is also important.

For example, a user could tag a tag called "post" with "posts", because he/she finds that the two might be used interchangeably. But this might not be the case for another user, who might want to treat singular and plural tags separately to mean different things.

So making tags synonymous is not that simple. The context in which the tags are synonymous needs to be considered.

Pardon me if my interpretation was wrong.

Posted on Nov 7, 2005

No doubt, tagging tags is quite essential. But the form in which this will actually take place is not quite clear.

There are some problems with this approach.

Consider, for example, a programmer who wants to tag all "Visual Basic" links as VB. But for a sports person, VB could mean "Volley Ball".

Also, the extent to which tags can be considered synonymous should be clarified.

A user, for example, could tag a tag "post" with "posts" to make the two synonymous, because that is how he/she wants to interpret it. But another user might not want the same.

So making tags synonymous is not that simple. The context in which the tags are synonymous needs to be considered.

We should not only consider synonyms, but also hypernyms, hyponyms, holonym and meronyms of words.

Pardon me if my interpretation was wrong.

Posted on Nov 7, 2005
Gene says...

"But for a sports person, VB could as well mean, VolleyBall."

We have two "bundles" for tags which help eliminate these kinds of context problems (I didn't really talk about these in the post but they are relevant to your example).

Also, because it's an internal application it focuses on the organization's business lines... so we're not dealing with all the potential ambiguity and confusion you find on the "big messy" web.

"We should not only consider synonyms, but also hypernyms, hyponyms, holonym and meronyms of words."

Yes, of course. Synonyms are just the most obvious problem. Meronyms are interesting because they get at those part-whole relationships that tags don't handle well (and taxonomies do).

Posted on Nov 8, 2005
Babs says...

I agree that "for a sports person, VB could as well mean, VolleyBall" and that this might make things complicated for a large group of posters covering various subjects.

But if the application is used on a general one-person weblog, this probably wouldn't be an issue. One of the major benefits I can see with tag tagging is that you'd stop having "duplicate" tags in a tag cloud where 'Fun' and 'fun' are two different things.

Interesting concept.

Posted on Nov 11, 2005
direwolff says...

Check out a product called Readware (http://www.readware.com/) for help in dealing w/disambiguation. They've done a wonderful job of developing a search and categorization technology that deals w/this. I could see leveraging that in addressing the disambiguation problem you cite at the end.

Posted on Nov 16, 2005
Eby says...

I'm uncertain if you've seen it but the "Tagged Object Environment" sounds very similar to this. It discusses using tags on objects and tags being objects that can be tagged. Objects can also be used as tags, etc. They have their TagCamp presentation online:

http://toetag.sourceforge.net/

Posted on Nov 27, 2005
Gene says...

Eby - I did see that presentation, but I didn't connect it to the work we were doing. In our case tags aren't objects, though... they're just tags. :)

Posted on Nov 27, 2005

About this Page

Posted by Gene Smith on Oct 31, 2005.

About the Author

Gene Smith is a principal with nForm, one of Canada's leading user experience consulting firms. He writes about information architecture, interaction design, community, the web and other such topics.