Linking Data

This post is adapted from a post on my work site.

The World Wide Web lets anyone make a website and link it to any other. This openness, and the ground-up approach to making sites and links, has led to the mass of interconnected web pages we see today. It is one of the strengths of the web. However, as more and more data makes its way online, computers are finding it difficult to understand the connections between different data sets. That is where Tim Berners-Lee's idea of Linked Data comes in. Linked Data is a way of describing and storing data on the web so that others (people or computers) can see how the data on web page A is conceptually connected to the data on web page B.

I like the idea, but for a while I’ve been hitting a problem: I couldn't work out how to actually go about creating Linked Data or using it. Most of the websites about Linked Data are very technical and launch into talk of specifications, schemas and ontologies at the slightest provocation. That sort of stuff scares me, as it isn't usually written for people who don't already understand it and often seems to lead to an endless chain of documents to read. If, after two or three hours of reading technical documents, I still don't know how to do something basic, I tend to go and find something else to do.

That is where I was with Linked Data a couple of weeks ago; nice idea but not a clue how to get started. Then, via a conversation with Doug Burke, I noticed that schools in England and Wales have been included in the UK government’s first foray into Linked Data (Scotland and Northern Ireland have separate education systems and aren’t included yet). As LCOGT, my new employer, has many school users in the UK (through the Faulkes Telescope Project), that seemed like a good place for me to finally get my feet wet.

The government’s Linked Data site provides a web address (URI) for each school. At the page for a specific school (e.g. Clifton High School), data about that school can be seen and, importantly, understood by special software. To get started I had to find the URI for each school in the LCOGT database. This involved learning some SPARQL (a query language that is apparently similar to SQL) so that I could search the government’s school database. It turned out that our own data quality wasn’t great, with some schools listed under slightly different names, numbers or postcodes from those in the government database. However, after a bit of manual effort, we got URIs for 684 schools. That meant we could start doing some interesting things.
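A SPARQL query for one school might look something like the Python sketch below. It is only an illustration under stated assumptions: the `ex:` namespace, the `ex:postcode` property and the postcode itself are made-up placeholders rather than the real vocabulary or data of the government data set, and `build_school_query` is a hypothetical helper. Only `rdfs:label` is a standard property.

```python
def escape_literal(text):
    """Escape backslashes and double quotes for a SPARQL string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def build_school_query(name, postcode):
    """Return a SPARQL query that looks up a school URI by name and postcode.

    ASSUMPTION: the ex: namespace and ex:postcode are placeholders, not the
    real vocabulary used by the government data set.
    """
    return f"""PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ex: <http://example.org/school-vocab#>

SELECT ?school WHERE {{
  ?school rdfs:label "{escape_literal(name)}" ;
          ex:postcode "{escape_literal(postcode)}" .
}}
LIMIT 5"""


# "AB1 2CD" is a made-up postcode used purely for illustration.
query = build_school_query("Clifton High School", "AB1 2CD")
print(query)
```

A query like this only matches when the name and postcode are exactly the same in both databases, which is why slightly different spellings needed manual effort to resolve.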

The first thing I did was to download the longitude and latitude of every school that we had a Linked Data address for. I then gridded these and made a heat map (the redder an area, the more schools are in that bit of the country) for English and Welsh schools. The result looks fairly similar to a map showing population density so the good news is that the Faulkes Telescope Project doesn't appear to have much bias in which parts of England and Wales register.
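The gridding step can be sketched in a few lines of Python. This is a minimal version under assumptions, not the code behind the map: the half-degree cell size, the `grid_counts` name and the sample coordinates are all invented for illustration, and the plotting itself is left out.

```python
from collections import Counter


def grid_counts(points, cell_size_deg=0.5):
    """Count how many points fall in each grid cell.

    points: iterable of (longitude, latitude) pairs in degrees.
    Returns a Counter keyed by (column, row) cell indices.
    """
    counts = Counter()
    for lon, lat in points:
        # Floor division puts each point in the cell that contains it and
        # rounds towards minus infinity, so western (negative) longitudes
        # land in the correct cell.
        cell = (int(lon // cell_size_deg), int(lat // cell_size_deg))
        counts[cell] += 1
    return counts


# Made-up coordinates for illustration, not real school locations.
sample_points = [(-3.2, 51.5), (-3.1, 51.6), (-0.1, 51.5), (-3.4, 51.9)]
counts = grid_counts(sample_points)
for cell, n in sorted(counts.items()):
    print(cell, n)
```

Each cell's count would then set how red that cell is drawn on the map.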

That shows the start of what is possible once data get linked. Of course, at this point we were just consuming Linked Data and I thought I should help create some, so we added some Linked Data within the web pages for observations and users. Although it is not visible to a person viewing the web page, it does show up in special software.

Last week I started experimenting with sharing data properly through a special Linked Data format known as RDF. I’m still not entirely sure of the best way to put information into RDF, but I’m creating examples of how it might look and hoping some Linked Data experts might be able to give me some pointers (i.e. corrections rather than links to yet more documentation).
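RDF describes things as simple subject, predicate, object statements, and N-Triples is one of its plain-text forms with one statement per line. The Python sketch below shows how a single observation might be written out that way. It is a guess at the general shape rather than the examples mentioned above: every `example.org` URI is a placeholder, `to_ntriple` is a hypothetical helper, and only the Dublin Core `title` and `creator` properties are real.

```python
def to_ntriple(subject, predicate, obj, is_literal=False):
    """Format one RDF statement as a single N-Triples line.

    Only backslashes, double quotes and newlines are escaped in literals,
    which is enough for this sketch but not a full implementation.
    """
    if is_literal:
        escaped = (
            obj.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
        )
        obj_text = f'"{escaped}"'
    else:
        obj_text = f"<{obj}>"
    return f"<{subject}> <{predicate}> {obj_text} ."


# ASSUMPTION: every example.org URI below is a placeholder.
observation = "http://example.org/observation/1"
triples = [
    to_ntriple(observation, "http://purl.org/dc/terms/title",
               "Example observation", is_literal=True),
    to_ntriple(observation, "http://purl.org/dc/terms/creator",
               "http://example.org/user/1"),
]
print("\n".join(triples))
```

Pointing the creator at a URI rather than a plain name is what makes the data linked: software can follow that address to find out more about the user.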

These are just the first baby steps towards making Linked Data at LCOGT. I've already started wondering what we could do if the Simbad or ADS databases provided Linked Data.

Posted in astro blog by Stuart on Monday 29th Nov 2010 (15:54 GMT) | Permalink