Astronomy Metadata and Microformats

As I described in my last post, efforts to tag astronomical images with useful information are well under way. However, the current plans cover only images. That ignores the huge amount of astronomical content delivered as video, audio or prose. Suppose you wanted to find episodes of your favourite podcasts that featured a particular type of astronomical object, video clips of the Andromeda Galaxy, or blog posts describing objects near RA 05:34:32, Dec +22:00:52. None of those would be tagged, because they aren't static images.

What can be done to tag these other types of data? It may be possible to adapt the ID3 tags in MP3 files (usually used to store the artist, the track name and so on) to include some of the AVM standard (PDF). Some of the image metadata is not appropriate for audio, and some extra metadata (e.g. start and stop times within the track) would also be required. Videos could be tagged in similar ways, which would also call for some video-specific tags. One issue may be the huge file sizes of audio and video. Luckily, I think most tag information in modern media files is stored at the start of the file (it is for ID3v2 tags), so it isn't necessary to download the entire file to catalogue it. Even so, having to start transferring these huge media files at all seems slightly inefficient, and this is where I think a more lightweight approach could be useful. So, I suggest that we create an AVM microformat.
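To show why the whole file needn't be downloaded: an ID3v2 tag begins with a 10-byte header made up of the characters "ID3", two version bytes, a flags byte and a four-byte "syncsafe" size in which only the low seven bits of each byte are used. A cataloguer could read just those ten bytes, work out how long the tag is, and then fetch only that many bytes (for example with an HTTP Range request). The function below is a hypothetical helper sketching that calculation, not part of any existing library. It assumes the tag sits at the start of the file, which is the usual case but isn't guaranteed, since ID3v2.4 also allows a tag to be appended at the end.

```python
def id3v2_tag_length(header: bytes) -> int:
    """Return how many bytes at the start of a file hold the ID3v2 tag.

    `header` should be (at least) the first 10 bytes of the file.
    Returns 0 if the file does not start with an ID3v2 tag.
    Hypothetical helper: assumes the tag is prepended, not appended.
    """
    if len(header) < 10 or header[:3] != b"ID3":
        return 0
    major_version = header[3]
    flags = header[5]
    size_bytes = header[6:10]
    # The size is a "syncsafe" integer: 4 bytes with only 7 bits used in each,
    # so the top bit of every byte must be zero.
    if any(b & 0x80 for b in size_bytes):
        raise ValueError("corrupt ID3v2 header: size is not syncsafe")
    size = 0
    for b in size_bytes:
        size = (size << 7) | b
    # The stored size excludes the 10-byte header (and the footer, if any).
    total = 10 + size
    if major_version == 4 and flags & 0x10:  # ID3v2.4 "footer present" flag
        total += 10
    return total
```

For a header whose size bytes are 00 00 02 01, the tag body is 2 × 128 + 1 = 257 bytes, so the first 267 bytes of the file are all a cataloguer would need to read.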

Microformats are a way to use existing code such as HTML and CSS class names (also known as POSH, "plain old semantic HTML", to people who like making up new acronyms) to add metadata to webpages. Microformats are great because they can be applied to existing webpages without changing how they look and feel, while still allowing computers to extract the metadata. One example of a microformat in the wild is hCard, which is based on the vCard standard and describes people, companies and organisations. It is already used on sites such as Twitter to mark up the little icons of the people a user follows.
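As a rough illustration of how a program can pick metadata out of ordinary markup, here is a minimal hCard extractor built on Python's standard html.parser module. It is a simplified sketch under my own assumptions, not a conforming hCard parser: the class name HCardParser is hypothetical, it only handles a few properties (fn, org, nickname as text; url from a link's href; photo from an image's src), and it ignores nested vcards and the value-class pattern.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are not pushed on the stack.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}


class HCardParser(HTMLParser):
    """Collect a few hCard properties from elements inside class="vcard"."""

    def __init__(self, text_props=("fn", "org", "nickname")):
        super().__init__()
        self.text_props = set(text_props)
        self.cards = []   # one dict per vcard found
        self._stack = []  # open elements: [tag, opens_card, prop, text_parts]

    def _in_card(self):
        return any(entry[1] for entry in self._stack)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        opens_card = "vcard" in classes and not self._in_card()
        if opens_card:
            self.cards.append({})
        prop = None
        if opens_card or self._in_card():
            card = self.cards[-1]
            if "url" in classes and tag == "a" and attrs.get("href"):
                card.setdefault("url", attrs["href"])
            if "photo" in classes and tag == "img" and attrs.get("src"):
                card.setdefault("photo", attrs["src"])
            matches = [c for c in classes if c in self.text_props]
            if matches:
                if tag == "img":
                    # An image carries its text value in the alt attribute.
                    card.setdefault(matches[0], attrs.get("alt") or "")
                else:
                    prop = matches[0]
        if tag not in VOID_ELEMENTS:
            self._stack.append([tag, opens_card, prop, []])

    def handle_data(self, data):
        # Text counts towards every property element that is currently open.
        for entry in self._stack:
            if entry[2]:
                entry[3].append(data)

    def handle_endtag(self, tag):
        if not any(entry[0] == tag for entry in self._stack):
            return  # stray closing tag: ignore it
        while self._stack:
            open_tag, _, prop, parts = self._stack.pop()
            if prop:
                text = " ".join("".join(parts).split())  # collapse whitespace
                self.cards[-1].setdefault(prop, text)
            if open_tag == tag:
                break
```

Feeding it something like `<div class="vcard"><a class="url fn" href="http://example.org/">Jane Doe</a></div>` and then reading `parser.cards` gives one dictionary per card with its url and fn. The page's appearance never enters into it; only the class names matter, which is exactly why microformats can be added to existing pages.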

Grabbing data from microformats is quite straightforward too. SixApart has details of an hCard Perl library for parsing hCards from webpages, and this type of tool could be adapted. I have created a page with my proposal for an AVM microformat (implementing the AVM tags that are not image-specific), and it would be great to have feedback. Perhaps you have suggestions for improvements; perhaps you think it is a total waste of my time and effort. Either way, please post your responses in the comments to this post.

Posted in astro blog by Stuart on Wednesday 27th Aug 2008 (15:06 BST) | Permalink