Posts Tagged ‘linked-data’

Linked, open, semantic?

Wednesday, November 11th, 2009

During an interesting session called the ‘Great Global Graph’ at the CETIS conference this week I formed the opinion that, in the recent rush of enthusiasm for ‘linked data’, three ‘memes’ were being conflated. These next three bullets outline my understanding of how these terms have been used in recent discussions, including the CETIS session:

  • Open data: I see this as something expressed as a philosophy or, in more concrete terms, as a policy, such as that espoused by the UK Government. There are aspects of public ownership in this, but also a philosophical approach based on ‘openness’ and a rejection of the economic idea of value in scarcity of information. I think that specific technology does not come into this really: for example one concrete realisation of this policy in the UK is the Freedom of Information Act under which it is perfectly permissible for a data owner to supply data in any reasonable format and medium. Essentially, I generally take ‘open’ to mean accessible to all, notwithstanding conditions of use.
  • Linked data: This one is trickier, as the term is used in quite a precise way by some proponents, based on the principles of linked data from the W3C. There are others who prefer a looser definition. There have been some well-rehearsed arguments about this, which generally come down to whether or not RDF is a prerequisite of linked data. I’ve become inclined to use the term in its more precisely defined sense, in recognition of the efforts going on in this space.
  • Semantic Web: This term introduces ‘semantics’ into the mix, by layering on ontologies allowing inferences to be made from the data itself.

It seems that these terms are often used together in the same discussions, and I suspect I could benefit from some separation of concerns in some of these discussions. It seems to me that the following are true:

  1. data can be open, while not being linked
  2. data can be linked, while not being open
  3. data which is both open and linked is increasingly viable
  4. the Semantic Web can only function with data which is both open and linked

Option 1 satisfies, in part at least, the drive to make available to the public data which has been paid for by the public and which might be useful to it. There are those (and I count myself among them) who generally believe that at present, for example, it would be better to quickly make the data open in some useable form than to delay this unduly while it is processed into RDF. However, there is a reasonable case to be made for not polluting information spaces with poorly prepared datasets.

Option 2 is an approach for organisations which want to take a more resource-oriented approach to managing and exploiting internal information assets. In the CETIS session an interesting idea was floated around how such an approach might go a long way to helping organisations address data-quality issues.

Option 3 seems increasingly viable. There is value in the ‘linked’ aspect, regardless of whether or not semantic layers are introduced. This is how the Web works after all, and much of the impetus behind Web 2.0 seems, to me, to have come from a healthy mixture of addressable and accessible information and human-mediated convention (e.g. ‘hackable’ URLs). Perhaps this is the ‘Great Global Graph’ and it’s just a matter of scale?

I’m very open to comment and argument on any of this. Perhaps I’m worrying unduly about these things being mixed up, but I do sense that this space could benefit from some clarity to match the excitement and endeavour.

No data here – just Linked Concepts

Tuesday, July 21st, 2009

Over the years I’ve found the ‘Semantic Web’ to be an interesting though, at times, faintly worrying concept. It has never had much direct impact on my work, despite my having been embroiled in Web development since, well, pretty much when Web development began. Of late I’ve tried to follow the earnest discussions about how the Semantic Web went all wrong because it was hijacked by the AI enthusiasts, and how it is going to be alright now because a more pragmatic paradigm has gained the upper hand, that of Linked Data.

This post is my tuppence worth provoked by an interesting debate on Twitter recently which was kicked off by Andy Powell who has just blogged about it. It’s worth reading Andy’s post to get the details of this, but in essence, Andy asked if there was a term we could use for Linked Data where the RDF part is not required. This provoked a distributed argument between those who believe that the RDF model is integral to Linked Data, those who believe it shouldn’t be, and those who Don’t Really Care To Be Honest.

I found myself generally in agreement with Paul Miller who made the point:

Despite this undoubted progress, the green shoots of a Linked Data ecology remain delicate. By moving from a message that stresses the value of unambiguous and web-addressable naming (HTTP URIs), providing ‘useful information,’ and enabling people to ‘discover more things’ by linking toward a message that elevates one of the best mechanisms (RDF) for achieving this to become the only permissible approach, we do the broader aims great harm.

It seems to me that there has been progress over the years which a zealous insistence on RDF could jeopardise. I had thought about joining in and blogging about this, and then came across this comment from Dan Brickley via Rob Styles which, I thought, pretty much said it all. He finishes with:

But we needn’t panic if people put non-RDF data up online…. it’s still better than nothing. And as the LOD scene has shown, it can often easily be processed and republished by others. People worry too much! :)

Quite.

But then I read Andy’s post, in which he links to various people including Ian Davis in ‘The Linked Data Brand’. Right up front, Ian states:

This is not a technical issue and its not one of zealots or pragmatists: its a marketing and branding issue.

The term Linked Data was coined to brand a specific class of practices: namely assigning HTTP URIs to abitrary things and making those URIs respond with RDF relating the things to other things.

Here very few of the ‘things’ are documents, instead they are people, places, objects and concepts.

That deliberately excludes many other practices of publishing data on the web such as atom feeds, spreadsheets, APIs and even many existing RDF use cases.

Ah – so, it’s the label which is important, because it denotes an important movement, led by Tim Berners-Lee himself. Interestingly, it’s concerned with a very small part of the general concern of making data available on the Web – actually it’s not even about data per se – it’s about linking concepts.
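
To make the practice Ian describes a little more concrete, here is a rough sketch of what ‘things’ named by HTTP URIs and related to other things might look like as RDF-style triples. It is only an illustration under stated assumptions: the example.org URIs are placeholders I have made up, the helper functions are hypothetical, the FOAF vocabulary is used simply because it is a familiar one for describing people, and treating any string that starts with ‘http’ as a URI is a simplification that a real RDF library would not make.

```python
# Hypothetical identifiers: the example.org URIs are placeholders, not real resources.
ALICE = "http://example.org/people/alice"
BOB = "http://example.org/people/bob"

# Properties borrowed from the FOAF vocabulary.
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
FOAF_KNOWS = "http://xmlns.com/foaf/0.1/knows"

# Each statement is a (subject, predicate, object) triple.
triples = [
    (ALICE, FOAF_NAME, "Alice"),  # object is a plain literal
    (ALICE, FOAF_KNOWS, BOB),     # object is another thing, named by a URI
]


def is_uri(value):
    """Simplifying assumption: anything starting with http(s):// is a URI."""
    return value.startswith("http://") or value.startswith("https://")


def to_ntriples(statements):
    """Serialise the triples in an N-Triples-like form, one statement per line."""
    lines = []
    for subject, predicate, obj in statements:
        if is_uri(obj):
            rendered = "<" + obj + ">"
        else:
            # Escape backslashes and double quotes inside the literal.
            escaped = obj.replace("\\", "\\\\").replace('"', '\\"')
            rendered = '"' + escaped + '"'
        lines.append("<" + subject + "> <" + predicate + "> " + rendered + " .")
    return "\n".join(lines)


def links_from(statements, subject):
    """Return the URIs a given thing points to: the 'linked' part."""
    return [o for s, _, o in statements if s == subject and is_uri(o)]


if __name__ == "__main__":
    print(to_ntriples(triples))
    print(links_from(triples, ALICE))
```

The point of the sketch is that the link from one thing to another is itself a piece of data, addressed by URIs, rather than a hyperlink between two documents.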

Ian goes on to say:

The Semantic Web community has been notorious for its poor marketing over the past decade. Now just when it seems the community has found the right balance between technology and mass appeal it feels like people are trying to rip away that success for their own purposes. That is deliberately emotive language because brands are all about emotion.

I have spent much of my career linking data on the Web, linking eLearning systems to Library OPACs for example. I have occasionally used RDF in the past and am working with it again now. I have used many other technologies. In the last few years I have seen the dawning of an understanding on the part of the mainstream of Web developers and users that this kind of thing might be useful and worth investing some time and effort in. I would argue that the most significant advance in linking data in recent years has been the widespread adoption of cottage-industry XML formats in Web 2.0 mashups. I don’t think people are trying to appropriate the brand, so much as resisting the idea that a term as generic-sounding as ‘Linked Data’ could be owned by what is, in the scheme of things, a small group.

So if I decided to use ‘Linked Data’ to describe linking data in general, it certainly wouldn’t be because I was jumping on a bandwagon – I think that the wheels came off that particular bandwagon years ago.

So that leaves us back at Andy’s question. I’m happy to avoid winding up the Linked Data people by ‘appropriating’ their term but, then, what do I call it when I link data on the Web and I don’t check Sir Tim’s design issues first? Personally, I like ‘Web of Data’. I’ve blogged about this before, but I still believe that this slide from Tom Coates’s Native to a Web of Data presentation (which I suggested to Andy as part of the answer to his original question) sums it up best – I’ve had a print-out of that particular slide stuck up on my office wall for about three years.