After reading the chapter on taxonomy governance in Microsoft Office SharePoint Server 2007 Best Practices by Ben Curry and Bill English, I started thinking about my own personal taxonomy.
Taxonomy is a division into groups or categories. A good example is a Library’s Dewey Decimal Classification. Tagging has become a popular means of classification in blogging and photo sharing communities. But what about the documents on my computer?
Most people keep their documents in folders. They will have a folder for each client, a folder for each class at school. Some people keep a hierarchy. An English paper is under Education -> CSUSB -> 2001 -> English -> 301. This hierarchy is how I stored my files and what also got me in trouble when I realized that files belonged in multiple places because it makes sense for them to be in several areas: If my brother wrote a paper on Conditional Election, should I file it under Family -> Essays -> Jon, or Religion -> Christianity -> Essays, or Religion -> Essays -> Christianity. You can see that Essays doesn’t belong under Family or Christianity and the hierarchy (in this case) is meaningless.
Metadata filesystem tagging is supposed to solve this problem.2 Instead of placing the files in folders, the theory of metadata correctly realizes that we aren’t storing hierarchical information, but descriptive data. We’re trying to describe the contents of a file. So I could tag the paper, “Essay, Christianity, Religion, Family. This does two things: 1) It doesn’t matter how many “tags” are given to an object (or document). The object is not duplicated. And 2) categorically searching for files is easy. This is how I’ve been organizing files the last few years.
This is a great theory, but in practice, it has failed me. First, in order for filesystem metadata to work, one has to be disciplined to do it to every file. This takes time. Lots of time. Second, there are quite a few flaws in the implementation: The first I blame on Apple’s implementation because the metadata isn’t stored in the objects themselves so most backup solutions don’t back up the metadata. The second problem is tag creep. I have too many tags and I forget which ones I’ve used so I have a “money” and a “finance” tag, a “car” and an “auto” tag. If I had spent the time to develop a taxonomy, this wouldn’t be an issue, but I didn’t. The thought didn’t cross my mind. Now I have a mess of inconsistent tags, half my files aren’t tagged because I don’t have time to tag them, and I’m not motivated to do so because I know when my hard drive crashes and I have to restore from backups I’ve lost all my metadata and I would have to start over.
So what do I do now? Well, now I just keep everything in the documents folder, and store any “tags” in the filename and have a workflow that pre-pends the date to the filename. I have some high level folders that pertain more to how I got the file (or the content type) than the logical content. For example, if I scan a statement from my Schwab account it would go under the Documents -> Scan folder as “20080928 schwab statement”. I know it’s a mess but if I have to restore from backup I’ve retained the metadata in the filename.
I think tagging would be more maintainable if I developed a personal high level taxonomy (sort of a micro level taxonomy governance) instead of allowing arbitrary tag names. To see what I mean, this is how I’ve been classifying files (low level): Christianity, Automobile, Schwab, Lisp, Car, Essay, Recipe. But this is how I should classify files (high level): I would create a list of (no more than 10) high level tags like, “Religion, Transportation, Finance, Knowledge” that would have enough foresight to cover all topics and areas in the future (much like Dewey).
But who has time to do that?
I don’t. So my files are a mess, and they will stay that way. But the important thing is I know how they should be organized.