April 13, 2008

I don't have daughters ...

I don’t have daughters, but if I did, I hope I would make this speech. Here is a taste, but go read the whole thing.

“And while we’re at it, how come a girl doesn’t get to blow up the Death Star! Or send ET home? Or defeat Captain Hook! Or Destroy the Ring of Power!”

Thanks to Meg McCarron for the pointer.

Posted by Walter Underwood (wunder@wunderwood.org) at 08:27 PM

April 11, 2008

How much does metadata cost?

It is very hard to find numbers on what metadata really costs, but here is one from a Netflix job posting: $6 per movie for “original, descriptive movie and TV episode synopses.”

Here are links to a Hacking Netflix blog posting (likely to remain a valid URL) and to the Netflix job posting (guaranteed to succumb to link rot as soon as the opening is filled).

The only other published numbers I’ve found are similar: $6.20 to $14.67 per jazz CD, depending on the level of detail, at the Public Library of Cincinnati and Hamilton County in 2003. They were given a collection of 6200 jazz records and were estimating what that gift would cost them. See the article How Much Will It Cost? Making Informed Policy Choices Using Cataloging Standards.

The Netflix numbers are probably closer for an ecommerce or search application. Still, the close agreement in the numbers makes it pretty safe to say “less than $10 per document”.

Remember that the metadata must be updated when the document changes. Maybe “$10 per document per year” is a better number. HP was spending about that much to manage the HP-UX spec (man pages) about ten years ago. That covered all activities, not just metadata.

The Netflix job posting is for six openings, each with a six-week duration. That sounds like a lot of work, but if I assume each writer does three synopses per hour (which seems very fast for finished work) over a 40-hour week, that is still only about 4300 movies. Metadata is very, very expensive.
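Here is that back-of-the-envelope estimate as a few lines of Python. It is only a sketch: the 40-hour week is my assumption (the posting does not state hours), and the names are made up for illustration. The other numbers come from the job posting and the guess above.

```python
# Rough capacity estimate for the synopsis-writing openings.
WRITERS = 6             # openings in the job posting
WEEKS_PER_WRITER = 6    # duration of each opening
HOURS_PER_WEEK = 40     # assumption: a standard full-time week
SYNOPSES_PER_HOUR = 3   # assumption: an optimistic writing rate
PAY_PER_SYNOPSIS = 6    # dollars per synopsis, from the posting


def total_synopses(writers, weeks, hours_per_week, per_hour):
    """Return how many synopses the whole team can finish."""
    return writers * weeks * hours_per_week * per_hour


movies = total_synopses(WRITERS, WEEKS_PER_WRITER, HOURS_PER_WEEK, SYNOPSES_PER_HOUR)
cost = movies * PAY_PER_SYNOPSIS

print(f"{movies} synopses for about ${cost}")
```

Changing the writing rate to something more realistic, say one synopsis per hour, cuts the total to a third, which makes the point even more strongly.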

I have a couple of other stories without dollars, but still instructive.

One publishing company needed to digitize their back content and planned to start a division in the Philippines with 3000 employees to get it done. They found a different way.

I was consulting with a telecom company, and the CEO asked for metadata on every page in their intranet. They had 4M documents.

One final note, since I work for Netflix. All of the Netflix info here is derived from the job posting. No insider information was required or is included in this post.

Posted by Walter Underwood (wunder@wunderwood.org) at 09:02 PM

April 09, 2008

Trash Talk from Historians

Remind me not to get in a position where a historian can unload on me. The History News Network at George Mason University did an unscientific poll of historians rating the Bush presidency, and Bush gets the sharp end of the pen.

In a similar poll four years ago, 81% classified his presidency as a failure. Now, 98% do so, with 61% rating it the worst in history. One of those who placed him in the bottom third thought that it was too early to work out his exact placement in the bottom five alongside Buchanan, Johnson, Fillmore, and Pierce. Another felt that only Buchanan was worse.

The comments are even more damning than the raw numbers. My favorite diss is the unnamed historian who observes that George W. combines the worst characteristics of other failed presidents—“the paranoia of Nixon, the ethics of Harding and the good sense of Herbert Hoover.” Ouch.

Posted by Walter Underwood (wunder@wunderwood.org) at 04:20 PM