Archive for the ‘Programmable Web’ Category

An infrastructure service anti-pattern

Monday, December 7th, 2009

Last week I outlined an idea, that of the service anti-pattern, as part of a presentation I gave to the Resource Discovery Taskforce (organised by JISC in partnership with RLUK). The idea seemed to catch the interest of, and resonate with, several of the taskforce members who were present at the meeting. My presentation was in a style which does not translate well to being viewed in a standalone context (e.g. on Slideshare) so I have decided to write it up here. I would very much welcome comments on this. (The presentation will be published on the Resource Discovery Taskforce pages and I will ask for this post to be linked to from there when it does appear.)

The following diagram is meant to represent a design ‘pattern’ which I have seen often proposed, and sometimes implemented, in the JISC Information Environment (IE) as well as in the wider higher education (HE) sector in general:

anti-pattern.gif

It is my belief that readers who have been involved with the IE for some time will recognise this, at least in a general sense, if not in specific cases. In this arrangement, an aggregation of data is presented to the end user, through the development of a user-facing application or service. The user-facing service will in almost all cases be a web-interface, somewhat similar to the ‘portal’ concept of old but in a centralised, single, global deployment. Because it is generally accepted to be desirable to make such data available to other services (in keeping with the larger goal of interoperability through open standards), one or more machine interfaces or so-called APIs, giving access to the ‘backend’ of the system, will be offered. What this design pattern aspires to is a service implemented to be both user-facing service and machine-facing infrastructure component.

However, I contend that this is, in fact, what software engineers might call an anti-pattern. An anti-pattern is a design approach which seems plausible and attractive but which has been shown, in practice, to be non-optimal or even counter-productive. It’s a pattern because it keeps coming up, which means it’s worth recording and documenting as such. It’s anti because, in practice, it’s best avoided.

There is much which is implicit in this pattern, so I will attempt to surface what I believe are some hidden assumptions in a new version of this diagram: this is what this design pattern, once implemented, reveals:
anti-pattern-extended.gif

In this second diagram, the orange colouring indicates the parts which actually get built and are supported; the yellow indicates the parts which might get built, but which won’t really be supported as a service – in a sense, this is stuff which is believed to work but actually doesn’t; in the case of the users, the yellow colouring indicates that their demand for this service is believed to exist; those components in the diagram which are neither orange, nor yellow, are the product of little more than speculation. In the end, the investment in creating a user-facing application based on an expectation of future demand which doesn’t materialise is wasted while, at the same time, the investment in providing unused machine interfaces is also wasted.

I believe that this design pattern rests on several assumptions which are actually fallacies, and is, therefore, an anti-pattern.

Fallacy 1: “Build it and they will come”:

While infrastructure services can, indeed should, be developed with future opportunity in mind, it is helpful to have an existing and real demand to satisfy, which the new development addresses. If the service is demonstrably useful to users, and is developed effectively with future opportunity in mind, then there is more chance of the service actually working, and of it being attractive to developers working on future opportunities.

Fallacy 2: Interoperability through additional machine interfaces:

Machine interfaces need as much specification, development, testing and maintenance as user-interfaces. Simply making a machine interface available through the adoption of a platform which has a built-in facility offering some standard interface is not enough. A system which proposes to offer three or four APIs is quite likely not going to support any of them adequately. I have argued before that ‘interoperability is not enough’: in fact, this arrangement does not often lead to interoperability, let alone actual exploitation of the capability to interoperate.

Fallacy 3: People/organisations who can make good infrastructure are also going to be good at building end-user-facing services (and vice versa):

Effective infrastructure supports services which in turn support end-users. The skills and knowledge required to support service-providers are generally quite different from those needed to deliver good user-facing services.

I call this the infrastructure service anti-pattern because the result comes from conflated requirements to deliver both infrastructure (machine-to-machine interfaces) and compelling user-facing services and applications. The result can be something which satisfies neither requirement. The users, requirements and priorities are often completely different between these two problem spaces. I suggest that the following are some possible reasons for this anti-pattern appearing:

  • funding (naturally) tends to follow services, happy users and, importantly, new features
  • funders like to see their investment showcased
  • infrastructure is mostly invisible, making it hard to ascertain its impact from users

Proposals for alternative design patterns

Here is a suggested alternative design-pattern:
better-pattern.gif

In this design pattern, the API is developed before any user-facing application, or at least in parallel. An application is developed to exploit this API based on real users’ requirements. No service is developed until such requirements can be identified. This means that an API will be developed, and that it will be in use in at least one case. Opportunities for third party integration for usage of the service are, ideally, identified beforehand. The API is properly supported from the start, or else the service fails completely. The value proposition being offered for further, opportunistic third-party developments, whether real or imagined, is now real and, crucially, supported.
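To make the arrangement a little more concrete, here is a minimal sketch in Python. Everything in it is invented for illustration (the records, the function names, the fields); the only point is that the user-facing function is written purely against the machine-facing one, so the API is exercised by at least one real consumer from the start.

```python
# Hypothetical catalogue: record ids and fields are invented for illustration.
RECORDS = {
    "r1": {"title": "Linked Data basics", "year": 2009},
    "r2": {"title": "Web of Data", "year": 2006},
}


def api_search(term):
    """Machine-facing API: return matching records as plain data."""
    term = term.lower()
    return [
        {"id": record_id, **record}
        for record_id, record in sorted(RECORDS.items())
        if term in record["title"].lower()
    ]


def render_results(term):
    """User-facing application: built only on api_search, never on RECORDS."""
    lines = [f"{r['title']} ({r['year']})" for r in api_search(term)]
    return "\n".join(lines) or "No results"


if __name__ == "__main__":
    print(render_results("data"))
```

A third party wanting the same data would call api_search directly, getting exactly what the supported application gets.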

An interesting alternative to this is the approach of combining the user-facing web pages and the machine-actionable API into one interface, through embedded RDFa for example:
better-pattern2.gif

It remains to be seen how this approach is going to work out over time, but we have seen hints of simpler approaches to combining user and machine interfaces in the past, such as RSS being styled to give a decent human-readable interface, or earlier attempts to do interesting things with XHTML.
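As a rough illustration of a single interface serving both audiences, the sketch below pulls machine-readable statements out of an ordinary web page using only Python's standard library. The page, the URI and the property names are made up, and this is emphatically not a conforming RDFa parser: it only handles an `about` attribute and simple `property` attributes on elements containing plain text.

```python
from html.parser import HTMLParser

# Hypothetical page: a person reads the text, a machine reads the attributes.
PAGE = """
<div about="http://example.org/book/1">
  <h1 property="dc:title">An Example Book</h1>
  <p>Written by <span property="dc:creator">A. N. Author</span>.</p>
</div>
"""


class PropertyExtractor(HTMLParser):
    """Collect (subject, property, text) triples from simple RDFa-like markup."""

    def __init__(self):
        super().__init__()
        self.subject = None   # most recent 'about' URI seen
        self.current = None   # property name awaiting its text value
        self.triples = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "about" in attrs:
            self.subject = attrs["about"]
        self.current = attrs.get("property")

    def handle_data(self, data):
        if self.current and data.strip():
            self.triples.append((self.subject, self.current, data.strip()))
            self.current = None

    def handle_endtag(self, tag):
        self.current = None


def extract(html):
    parser = PropertyExtractor()
    parser.feed(html)
    return parser.triples


if __name__ == "__main__":
    for triple in extract(PAGE):
        print(triple)
```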

I wonder if readers agree that the first diagrams represent an anti-pattern which they recognise. And would the proposed alternatives fare any better?

Linked, open, semantic?

Wednesday, November 11th, 2009

During an interesting session called the ‘Great Global Graph’ at the CETIS conference this week I formed the opinion that, in the recent rush of enthusiasm for ‘linked data’, three ‘memes’ were being conflated. These next three bullets outline my understanding of how these terms have been used in recent discussions, including the CETIS session:

  • Open data: I see this as something expressed as a philosophy or, in more concrete terms, as a policy, such as that espoused by the UK Government. There are aspects of public ownership in this, but also a philosophical approach based on ‘openness’ and a rejection of the economic idea of value in scarcity of information. I think that specific technology does not come into this really: for example one concrete realisation of this policy in the UK is the Freedom of Information Act under which it is perfectly permissible for a data owner to supply data in any reasonable format and medium. Essentially, I generally take ‘open’ to mean accessible to all, notwithstanding conditions of use.
  • Linked data: This one is trickier, as the term is used in quite a precise way by some proponents, based on the principles of linked data from the W3C. There are others who prefer a looser definition. There have been some well-rehearsed arguments about this, which generally come down to whether or not RDF is a pre-requisite of linked data. I’ve become inclined to use the term in its more precisely defined sense, in recognition of the efforts going on in this space.
  • Semantic Web: This term introduces ’semantics’ into the mix, by layering on ontologies allowing inferences to be made from the data itself.

It seems that these terms are often used together in the same discussions, and I suspect I could benefit from some separation of concerns in some of these discussions. It seems to me that the following are true:

  1. data can be open, while not being linked
  2. data can be linked, while not being open
  3. data which is both open and linked is increasingly viable
  4. the Semantic Web can only function with data which is both open and linked

Option 1 satisfies, in part at least, the drive to make available to the public data which has been paid for by the public and which might be useful to it. There are those (and I count myself among them) who generally believe that at present, for example, it would be better to quickly make the data open in some useable form than to delay this unduly while it is processed into RDF. However, there is a reasonable case to be made for not polluting information spaces with poorly prepared datasets.

Option 2 is an approach for organisations which want to take a more resource-oriented approach to managing and exploiting internal information assets. In the CETIS session an interesting idea was floated around how such an approach might go a long way to helping organisations address data-quality issues.

Option 3 seems increasingly viable. There is value in the ‘linked’ aspect, regardless of whether or not semantic layers are introduced. This is how the Web works after all, and much of the impetus behind Web 2.0 seems, to me, to have come from a healthy mixture of addressable and accessible information and human-mediated convention (e.g. ‘hackable’ URLs). Perhaps this is the ‘Great Global Graph’ and it’s just a matter of scale?

I’m very open to comment and argument on any of this. Perhaps I’m worrying unduly about these things being mixed up, but I do sense that this space could benefit from some clarity to match the excitement and endeavour.

No data here – just Linked Concepts

Tuesday, July 21st, 2009

Over the years I’ve found the ‘Semantic Web‘ to be an interesting though, at times, faintly worrying concept. It has never much impacted on my work directly, despite my having been embroiled in Web development since, well pretty much, Web development began. Of late I’ve tried to follow the earnest discussions about how the Semantic Web went all wrong because it was hijacked by the AI enthusiasts, and how it is going to be alright now because a more pragmatic paradigm has gained the upper-hand, that of Linked Data.

This post is my tuppence worth provoked by an interesting debate on Twitter recently which was kicked off by Andy Powell who has just blogged about it. It’s worth reading Andy’s post to get the details of this, but in essence, Andy asked if there was a term we could use for Linked Data where the RDF part is not required. This provoked a distributed argument between those who believe that the RDF model is integral to Linked Data, those who believe it shouldn’t be, and those who Don’t Really Care To Be Honest.

I found myself generally in agreement with Paul Miller who made the point:

Despite this undoubted progress, the green shoots of a Linked Data ecology remain delicate. By moving from a message that stresses the value of unambiguous and web-addressable naming (HTTP URIs), providing ‘useful information,’ and enabling people to ‘discover more things’ by linking toward a message that elevates one of the best mechanisms (RDF) for achieving this to become the only permissible approach, we do the broader aims great harm.

It seems to me that there has been progress over the years which a zealous insistence on RDF could jeopardise. I had thought about joining in and blogging about this, and then came across this comment from Dan Brickley via Rob Styles, which pretty much said it all I thought. He finishes with:

But we needn’t panic if people put non-RDF data up online…. it’s still better than nothing. And as the LOD scene has shown, it can often easily be processed and republished by others. People worry too much! :)

Quite.

But then I read Andy’s post, in which he links to various people including Ian Davis in the Linked Data Brand. Right up front, Ian states:

This is not a technical issue and its not one of zealots or pragmatists: its a marketing and branding issue.

The term Linked Data was coined to brand a specific class of practices: namely assigning HTTP URIs to abitrary things and making those URIs respond with RDF relating the things to other things.

Here very few of the ‘things’ are documents, instead they are people, places, objects and concepts.

That deliberately excludes many other practices of publishing data on the web such as atom feeds, spreadsheets, APIs and even many existing RDF use cases.

Ah – so, it’s the label which is important, because it denotes an important movement, led by Tim Berners-Lee himself. Interestingly, it’s concerned with a very small part of the general concern of making data available on the Web – actually it’s not even about data per se – it’s about linking concepts.

Ian goes on to say:

The Semantic Web community has been notorious for its poor marketing over the past decade. Now just when it seems the community has found the right balance between technology and mass appeal it feels like people are trying to rip away that success for their own purposes. That is deliberately emotive language because brands are all about emotion.

I have spent much of my career linking data on the Web, linking eLearning systems to Library OPACs for example. I have occasionally used RDF in the past and am working with it again now. I have used many other technologies. In the last few years I have seen the dawning of an understanding on the part of the mainstream of Web developers and users that this kind of thing might be useful and worth investing some time and effort in. I would argue that the most significant advance in linking data in recent years has been in the wide-spread adoption of cottage-industry XML formats in Web 2.0 mashups. I don’t think people are trying to appropriate the brand, so much as resisting the idea that a term as generic sounding as ‘Linked Data’ could be owned by what is, in the scheme of things, a small group.

So if I decided to use ‘Linked Data’ to describe linking data in general – it certainly wouldn’t be because I was jumping on a band-wagon – I think that the wheels came off that particular band-wagon years ago.

So that leaves us back at Andy’s question. I’m happy to avoid winding up the Linked Data people by ‘appropriating’ their term but, then, what do I call it when I link data on the Web and I don’t check Sir Tim’s design issues first? Personally, I like ‘Web of Data’. I’ve blogged about this before, but I still believe that this slide from Tom Coates’s Native to a Web of Data presentation (which I suggested to Andy as part of the answer to his original question) sums it up best – I’ve had a print-out of that particular slide stuck up on my office wall for about three years.

OpenID and name authority

Thursday, January 22nd, 2009

In his Science in the Open blog Cameron Neylon has written an interesting post, A Specialist OpenID Service to Provide Unique Researcher IDs? in which he asks:

Good citation practice lies at the core of good science. The value of research data is not so much in the data itself but its context, its connection with other data and ideas. How then is it that we have no way of citing a person?

Cameron suggests that OpenID might offer a solution to this.

I have been very interested in OpenID for some time. I like the relatively agile way in which the standard has evolved. I like the fact that it has been responsive to the developer community. I agree with Andy Powell when he talks about the importance of the capacity for the delegation of the service providing your OpenID – I’ve maintained an OpenID for myself at http://paulwalk.net despite having changed the underlying OpenID identity provider service twice. However, I’ve become frustrated by the way in which OpenID has been deployed and couched almost entirely in terms of its potential to solve the often-exaggerated problem of users needing to maintain too many user accounts (although I confess that I have contributed to this). Personally I maintain a small handful of username/password combinations for accessing hundreds of web services – it’s a minor inconvenience. And as Mike Ellis pointed out in a great post, OpenID: fail:

In a technical sense, OpenID works. But from a usability perspective, it’s absolutely horrible.

I blogged about OpenID a while ago, saying:

I’ve thought for a while that the introduction of URIs for people was the often overlooked yet potentially most interesting aspect of OpenID. In a resource-oriented-architecture, it would seem plausible to suppose that a reliable pointer to a representation of a person would be a useful thing. But when I try to sketch out a useful application for this, I struggle….

The idea of using OpenID as an ‘author identifier’ in scholarly communications has occurred to me before too – specifically in the context of repositories. I agree it could play a part here. At one level this could be seen as an extension of the ongoing persistent identifier issue in the context of web-resources, being applied to people. However, as an OpenID is a URL, it is open to the same criticisms levelled against the use of URLs for papers in an institutional repository for instance (the delegation feature does mitigate this, albeit only slightly).
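For readers unfamiliar with the delegation feature: in OpenID 1.x it relies on two link elements, `openid.server` and `openid.delegate`, placed in the head of the page at the URL you claim as your identity. Changing provider means editing those two lines while the identifier itself stays the same. The sketch below, using only Python's standard library, shows how a consumer might read them; the provider URLs are hypothetical and the parsing is deliberately simplistic (for instance, it ignores multi-valued `rel` attributes).

```python
from html.parser import HTMLParser

# Hypothetical home page served at the URL being used as an OpenID.
HOME_PAGE = """
<html><head>
  <link rel="openid.server" href="https://provider.example/server">
  <link rel="openid.delegate" href="https://provider.example/users/alice">
</head><body>My home page</body></html>
"""


class DelegationFinder(HTMLParser):
    """Collect OpenID 1.x link elements from an HTML page."""

    def __init__(self):
        super().__init__()
        self.links = {}

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attrs = dict(attrs)
        rel = attrs.get("rel") or ""
        if rel.startswith("openid.") and attrs.get("href"):
            self.links[rel] = attrs["href"]


def find_delegation(html):
    finder = DelegationFinder()
    finder.feed(html)
    return finder.links


if __name__ == "__main__":
    print(find_delegation(HOME_PAGE))
```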

One aspect of OpenID which I think might become relevant, if OpenID reaches any kind of critical mass as a public identifier system, will be the way in which a given OpenID could gain authority over time. The only thing you can trust about a newly minted OpenID is that you can interrogate the ‘user’ of the OpenID and verify that they are the agent which ‘controls’ or ‘owns’ it. However, an OpenID will rarely be surfaced without other metadata about the agent – there will be a context in which it is used. In a community of researchers for example, as a particular OpenID is used more and more by a researcher in various contexts and systems, a level of trust will build around the association of that OpenID with an actual person.

For a long while I thought that OpenID might be the answer to a problem arising out of the need for a different user-account in every system we use – not the bogus issue of needing to remember lots of passwords, but the fact that this creates an immediate obstacle to joining up those systems at the level of the user. This issue has become more visible with the systems underpinning social networks. I see all kinds of potential in being able to conclude that while I might not know the person identified here in this system, I can be sure that they are the same person in this other system, because they have the same OpenID. Of course there is all kinds of potential for abuse of such join-up, but I would still like to be able to control such arrangements myself.

Increasingly, I’m annoyed by my social-web activities being constrained unnecessarily by really prosaic limitations in the systems I use. As I said in another post back in September 2007:

Now, it’s certainly not unusual to maintain more than one, unconnected circle of contacts. Many people prefer to keep their professional and their social networks separate. But, and this is the important point, I really don’t want my social networks to be constrained by particular software choices. As I can connect resources across the web in a uniform way to form a network of resources, I want to be able to connect people to form my social network. Perhaps OpenID or something similar could provide the solution.

Imagine a Web where everything you did publicly was linked by the very fact that you were represented by a URL exactly like your blog post, or your photo on Flickr, or your post on Twitter, or your correction to that Wikipedia entry, or your research paper in your institutional repository for that matter…. think of the possibilities.

Library hackers FTW

Friday, November 28th, 2008

Yesterday I went along to Mashed Library UK 2008 in London. Quickly abbreviated to ‘mashlib’, the event was the brain-child of Owen Stephens. Owen did most of the organising, aided by David Flanders who provided the space at Birkbeck College, and our excellent events team at UKOLN. The event was sponsored by UKOLN, using funding from the JISC.

I thought the balance of activities on the day was excellent – a healthy mixture of short presentations, demonstrations and a good amount of hands-on hacking. The group was comprised of commercial vendors (Talis, ExLibris, OCLC), academic-library folk (the majority), a lone representative from the public library world (Paul Bevan for the National Library of Wales), and a few developers from various (mostly JISC-funded) services.

Rob Styles from Talis gave us a demo of the Talis Platform. There is an open API which you can play with – it’s quite impressive. I was very struck by some of the language Rob used in his demo – he talked about dipping, where a result-set from a query (in RSS 1.0 format) is “dipped into” another – with the original data-set accreting more information from the second. (Jim Downing and I had an interesting chat about this over lunch, with Jim proposing that we could visualise data-sets as molecules – having a certain shape which allows them to bond with other molecules which have a complementary shape). Rob also talked about mixing in, in a similar vein. The Talis Platform APIs appear to be quite RESTful, with a good deal of passing URLs around rather than result-sets. I plan to have a closer look at this.
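The ‘dipping’ idea can be sketched in a few lines of Python. To be clear, this is not the Talis Platform API, which I have not reproduced here; it is just a toy showing one result-set accreting information from another, with records matched on a shared URI. All of the data and names are invented.

```python
# Hypothetical result-sets: each item is identified by a URI.
RESULTS = [
    {"uri": "http://example.org/item/1", "title": "First item"},
    {"uri": "http://example.org/item/2", "title": "Second item"},
]
HOLDINGS = [
    {"uri": "http://example.org/item/1", "library": "Example Library",
     "title": "A different title"},
]


def dip(primary, secondary):
    """Return copies of the primary records enriched from the secondary set."""
    extra_by_uri = {item["uri"]: item for item in secondary}
    enriched = []
    for record in primary:
        merged = dict(record)
        for key, value in extra_by_uri.get(record["uri"], {}).items():
            merged.setdefault(key, value)  # add new fields, keep original ones
        enriched.append(merged)
    return enriched


if __name__ == "__main__":
    for record in dip(RESULTS, HOLDINGS):
        print(record)
```

Here the original data always wins where both sets describe the same field; a real service would need a more considered policy for conflicts.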

Timm-Martin Siewert spoke next about the ExLibris Open Platform. I did get a URL for this but it takes me to a page which challenges me for a username and password which I do not have. The Open Platform is, apparently, open to paying customers only. Edward Corrado suggested via a tweet that:

I think they mean open in the sense of the open systems movement of about 20 years ago

Next up was Mark Alcock, standing in for Tim McCormick and representing OCLC, to talk about the WorldCat Developer Network. Mark came armed with a bunch of limited life API keys, so that people could try out some of the WorldCat services. OCLC appear to be offering a spectrum of services, from the commercial pay-for-use variety, to the ‘affiliate’ model – i.e. form a business partnership with us and use our services, to some free services. I’m interested in several of the WorldCat services but am wary of getting too fond of something I cannot, in the end, afford to use. Unfortunately, I did not get time on the day to make use of Mark’s API keys.

I noted that the three vendors represented seem to be spaced evenly along a spectrum of openness, with Talis at the ‘very open’ end of the spectrum, ExLibris at the ‘closed’ end, and OCLC (specifically WorldCat) somewhere in between. I can’t yet see how Talis are going to monetise the completely open model, and I think ExLibris will certainly need to open up somewhat. Perhaps OCLC have hit a sweet-spot of openness? I really don’t know enough about these services in detail, but I noticed some comments from Dorothea Salo which are somewhat critical about the business model behind WorldCat.

Ashley Sanders followed, with a quick description of an Atom (APP) based object store he is developing as part of his work extending the COPAC service. I’m following COPAC developments with interest – I’m very much in favour of the general direction they seem to be taking (I recently blogged about one aspect of this).

Tony Hirst, mashup maestro, gave a tour-de-force demonstration of using Yahoo Pipes and Google Spreadsheets as mashup tools. This went down very well with the technically-minded-but-mostly-not-developers group – especially Yahoo Pipes. I gave a presentation at the Shock of the Social in March 07 where I remarked that the potential of Yahoo Pipes was to do for web development what the spreadsheet did for non-web development before it (Microsoft Excel has been described as the most widely used Integrated Development Environment). Tony showed us how the spreadsheet is certainly relevant in a web-mashup world with his demonstrations of using Google Spreadsheets to mashup data-feeds.

Later on, after lunch, the group got down to some general hackery. On Twitter, Chris Awre (who wasn’t at the event but had been following comments on Twitter) remarked:

Silence from #mashlib08 this afternoon. The mashing must be going well…

And he was right! There was a fair stream of Twitter commentary in the morning – but it dried up as people got absorbed in hacking code and testing interfaces. I saw people exploring the Talis Platform and, in particular, Yahoo Pipes. I expect there will be some blogging about this activity – look out for the official tag:

mashlib08

Andrew McGregor of JISC has already written up his experience of this, as has Jo Alcock – I think these posts describe representative experiences of the event.

Paul Bevan rounded off proceedings with a view from public libraries – the National Library of Wales to be precise. I learned a lot from this presentation about the unique challenges facing the public non-academic sector.

I thoroughly enjoyed the day – kudos to Owen for getting the right balance of people, subjects and activities. There was a ‘buzz’ generated as the day went on which was excellent. I have been to a fair number of ‘hacker’ events where the emphasis is on the tools and the running code – I generally enjoy this kind of thing. But mashlib08 was different – what was really good about this day was that the enthusiasm came from doing stuff with information, more than from the actual development.

I think Tony Hirst deserves a special tip o’ the hat for firing up a real enthusiasm for mashups on the day.

We should definitely do this again!

“Any any any old data”

Tuesday, October 7th, 2008

Over on ZDNet, Paul Miller has blogged some thoughts about what he calls the ‘Data Cloud’. He points out that in the evolution of the ‘cloud computing’ paradigm, the:

…emphasis for much of this wider discussion remains firmly rooted in the realm of computation and storage. On many levels it’s about offloading the costs of scaling and maintaining local infrastructure, and ‘data’ doesn’t really enter the conversation at all. Something is ‘stored,’ but it’s a nameless, faceless, shapeless something that merely exists in order to be stored or computed upon.

Initially, Paul posted the germ of this idea to Twitter, where I responded with a degree of scepticism. Having given it a little thought, I remain sceptical. However, I have realised that my own, internal, ideas of what the ‘Cloud’ entails have informed my scepticism, so I figure it might be worthwhile externalising these ideas. (Note that Paul has helpfully included in his post a variety of definitions from good sources, so I won’t revisit these here.) Like such celebrated memes as ‘Web 2.0’, the meaning of ‘cloud’ in this context is delineated by broad consensus, rather than strict definition. Also, I suggest that the cloud is highly connotative – depending on the exact context within which it is used it can imply much.

theCloud.png

The word itself must surely have come from all those network diagrams which included a cloud to denote the ‘great outdoors’ – i.e. the stuff beyond the local area network. (I actually remember seeing such a diagram years ago with “here be dragons” written inside the cloud).

Anyway, for what it’s worth, here are some of the characteristics which I think are important, and why I disagree (perhaps not very strongly) with Paul:

Remotely hosted:

In a literal, basic sense, if services or data are in the cloud, then they are hosted remotely, on someone else’s infrastructure. The immediate implication might be that the user also doesn’t particularly care, or even know about the details of this arrangement. At one level, this is nothing new – and if the data cloud is just meant to signify data out there, then OK – but this notion is almost as old as computer networking itself, and was certainly present at the birth of the Web.

However, the reason that the cloud meme has gained such traction over the last two years lies in the new possibilities for moving not just data, but applications, services and even infrastructure onto remote servers. Closely aligned with the Cloud in this context is Software as a Service (SaaS), which in contemporary terms means the delivery of application-specific functionality from a remote source, typically to a modern browser.

Ubiquitous:

If it’s in the Cloud, then it is available anywhere. There are many examples of where this statement could be challenged but there is, nonetheless, an expectation that if an application is delivered to me from the Cloud then I ought to be able to access and use it from any connected device with the requisite software. There is a weaker assumption that the requisite software might be simply a modern web browser.

Commodified:

One of the really interesting developments of recent years has been the introduction of infrastructure services to the Cloud. This moves an important aspect of computing services closer to the ‘utility’ model. I know which company ’supplies’ my electricity because they take large amounts of money off me and regularly send me ‘advice’ on how to reduce my bill (in case you’re wondering the best advice is to, “switch off things which are powered by electricity when you’re not using them”). However, I don’t know where that electricity is being generated, and frankly, one lot of electricity is much like another, regardless of who supplies it (in the UK at least!). So, I suggest that commodification works best where the commodity is undifferentiated. The history of computing is filled with examples of evolution towards undifferentiated supply of functionality – abstraction is the method used to achieve this. For example, if I want to run Linux on my servers, then I can use a variety of hardware, without much having to worry about this. If I pay someone else to provide me with Linux servers in the Cloud (this blog is running on one such), then I can get away with not even knowing the specifics of the hardware which hosts my system. To an extent, in trusting your infrastructure to a third party, you are saying “I trust you, look after this lot for me please and don’t bother me with the details”.

In fact, we have now reached the point, with services such as Amazon’s EC2 service, where we can say, “I’d like some computing power please – any old cycles will do”.

And right here is why I think I disagree with Paul. If you believe, as I do, that the Cloud implies a move towards undifferentiated, commodified hardware and services, then I don’t see how to include data, at least most data. How often do you hear a user say, “I’d like some data please – any old data will do”? The value of data is often measured in terms of scarcity, provenance, authority and quality. When Paul describes data as a:

nameless, faceless, shapeless something that merely exists in order to be stored or computed upon.

I think he’s right – this is how data is represented in the Cloud. Where we differ, I guess, is that I think that this is a reasonable and useful way for the Cloud to treat data – it allows the Cloud to become ubiquitous and undifferentiated, freeing up our time to concentrate on what we really care about – our data.

I’ll end with a song……Any old iron, any old iron, Any any any old iron….

COPAC gets RESTful

Wednesday, October 1st, 2008

Just a quick pointer to the really encouraging announcement from the COPAC development blog that individual COPAC records are now addressed with a persistent, and RESTful(ish), URL. The example given is:

…the work “China tide : the revealing story of the Hong Kong exodus to Canada” has a Copac Record Number of 72008715609 and can be linked to with the url http://copac.ac.uk/crn/72008715609

The records are marked up as MODS XML – but this is of secondary importance to me compared to the fact that the records are easily and reliably addressed. I note that Owen Stephens has already commented, saying:

….I need some time/space to think about this, but I’m sure there is some stuff to be exploited here.

My sentiments also.

This is excellent news – it allows a significant service to participate in a resource-oriented-architecture to a greater degree. Well done the COPAC team!
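To illustrate why a stable, predictable address matters, here is a minimal sketch in Python of how a client might move between a Copac Record Number and its URL. The URL pattern is taken from the example quoted above; the function names are my own, and the assumption that a record number is always a string of digits is based only on that single example.

```python
from urllib.parse import urlparse

# URL pattern taken from the COPAC announcement quoted above.
COPAC_CRN_BASE = "http://copac.ac.uk/crn/"


def crn_to_url(crn: str) -> str:
    """Build the persistent URL for a Copac Record Number.

    Assumption: record numbers are all digits, as in the one published example.
    """
    if not crn.isdigit():
        raise ValueError(f"not a plausible Copac Record Number: {crn!r}")
    return COPAC_CRN_BASE + crn


def url_to_crn(url: str) -> str:
    """Recover the record number from a URL of the form above."""
    parts = urlparse(url)
    prefix = "/crn/"
    if parts.netloc != "copac.ac.uk" or not parts.path.startswith(prefix):
        raise ValueError(f"not a COPAC record URL: {url!r}")
    crn = parts.path[len(prefix):]
    if not crn.isdigit():
        raise ValueError(f"no record number found in: {url!r}")
    return crn


if __name__ == "__main__":
    print(crn_to_url("72008715609"))  # → http://copac.ac.uk/crn/72008715609
```

Nothing here touches the network: the point is simply that the identifier and the address are interchangeable, which is what lets other services link to a record without knowing anything else about COPAC.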

The opportunistic developer is allergic to soap

Monday, June 9th, 2008

For some time now I’ve been thinking about what I think of as the ascendency of the opportunistic developer in web application development. The phrase has unfortunate connotations for those who remember the ‘personas’ meme from some years ago, when it was revealed that Microsoft had characterised three types of developer for three of its software development products ([1] and [2]). This post is not directly related to these archetypes (the opportunistic developer was called ‘Mort’ in the meme, a name which has become derogatory). Rather, I’m talking about the developer who, regardless of their ability or their occupation, wants to make quick use of something when they discover it, typically on the web.

The opportunistic developer prefers to use someone else’s service/component in the majority of cases. They will create their own software when necessary, and will choose to do so under certain circumstances, but they will accommodate a certain amount of compromise if it means they can get away with using something off-the-shelf. The opportunistic developer is still a developer, as opposed to a power user: they will still write code, just as little as they can get away with.

The proliferation of freely available web-services with simple APIs has created a happy-hunting-ground for the opportunistic developer – a few years ago they were inhibited by a lack of choice of available services to use. In addition to the usual concerns – stability, provenance, price… ease of use is becoming a more important differentiator.

In the JISC Information Environment, the norm has been to develop SOAP interfaces to services, almost by default. There are, no doubt, reasons why this has made sense in the past. However, if there is one thing which became abundantly clear at last week’s IE Demonstrator/CRIG event, it is that institutional repository developers do not want to have to use SOAP interfaces. Aside from the hard-core which is interested in pushing REST as the approach to use in repository-service interactions, the consensus was that the use of SOAP for public service interfaces, rather than being an enabling mechanism, is actually a barrier to adoption.

Whether RESTful or not, services are going to have to start having very good reasons for not offering very simple APIs over HTTP, if they are to attract the opportunistic developer.
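To make the difference in effort concrete, here is a rough sketch in Python that builds (but does not send) the same record lookup in two ways: as a plain HTTP GET and as a SOAP call. Everything named here is hypothetical and for illustration only: the `example.org` endpoints, the `GetRecord` operation and the `urn:example:records` namespace do not describe any real IE service.

```python
from urllib.request import Request


def simple_http_request(base_url: str, record_id: str) -> Request:
    """A plain GET: the record identifier is just part of the URL."""
    return Request(f"{base_url}/{record_id}", method="GET")


def soap_request(endpoint: str, record_id: str) -> Request:
    """The same lookup as a SOAP 1.1 call: an XML envelope POSTed to one endpoint.

    The operation name and namespace are invented for this sketch.
    """
    envelope = (
        '<?xml version="1.0" encoding="utf-8"?>'
        '<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">'
        "<soap:Body>"
        '<GetRecord xmlns="urn:example:records">'
        f"<recordId>{record_id}</recordId>"
        "</GetRecord>"
        "</soap:Body>"
        "</soap:Envelope>"
    )
    return Request(
        endpoint,
        data=envelope.encode("utf-8"),
        headers={
            "Content-Type": "text/xml; charset=utf-8",
            "SOAPAction": '"urn:example:records/GetRecord"',
        },
        method="POST",
    )


if __name__ == "__main__":
    simple = simple_http_request("http://example.org/records", "12345")
    soap = soap_request("http://example.org/soap", "12345")
    print(simple.get_method(), simple.full_url)
    print(soap.get_method(), soap.full_url, len(soap.data), "bytes of XML")
```

The first request can be pasted into a browser address bar or tried with any HTTP client; the second needs the envelope, the headers and usually a toolkit to generate them from a WSDL file. That gap is, I suspect, much of what the opportunistic developer is reacting to.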

Personal profile portability

Sunday, May 18th, 2008

I haven’t minted a TLA for ages – I think I might be the first to come up with PPP for Personal Profile Portability as a convenient handle to wrap around the current flavour of ‘data portability’ being touted by the major ‘walled-garden’ social network sites.

Both MySpace and Facebook have recently launched initiatives to open up a little….but not too much.

MySpace has announced its Data Availability project with some major partner applications. Essentially, this will encourage the user to manage ‘profile’ information on MySpace, with a view to surfacing this information in other, partner applications (initially Yahoo, eBay, Photobucket and Twitter). It will also allow users to share some data, such as photos, which they have added to the MySpace site. Facebook has a similar initiative called Facebook Connect, initially in partnership with Digg. In both cases, a set of usage policies will be imposed such that the user retains control over what is shared, with the power to revoke the sharing agreement. I’m really encouraged to note that in the case of MySpace’s Data Availability, the mechanism adopted to solve the inter-authentication/authorisation issues between these systems is an implementation of OAuth.

Amit Kapur (MySpace’s Chief Operating Officer) says that Data Availability is:

“…founded first and foremost on allowing users to have comprehensive control over their content and data.”

Dave Morin of Facebook believes that:

“…the next evolution of data portability is [...] about giving users the ability to take their identity and friends with them around the Web, while being able to trust that their information is always up to date and always protected by their privacy settings.”

The extent to which users ‘have control’ over their content and data even while it has been completely locked up within the MySpace and Facebook applications has been argued about extensively. The relationships between these sites, their users, and their users’ data have evolved over the last year or two, as users have become a little more savvy. Pressure from groups such as DataPortability appears to have had an effect, with MySpace also signing up to this recently.

So, it seems as though the walled gardens are opening up, getting ready to participate in the wider web. Or are they?

In a web of distributed social networks, the most likely way in which users might manage their participation would seem (right now) to be through a single entry point. Essentially, if the web of social networks is going to allow ‘single-sign-on’ for the user, and allow a re-use of profile information, and even content, across multiple applications, then one model is to give the user a ‘gateway’ service, where they sign on and manage their ‘account’. Both Facebook and MySpace are going to battle hard to be that gateway service for the masses. Both have accepted that they can no longer remain completely walled gardens – they must open up, just a little, to avoid being eventually marginalised. But now that they are not totally closed, they may find it difficult to retain control. They may find others are waiting to seize the initiative. Enter Google, and its Friend Connect service.

Friend Connect is different to the previous initiatives from Facebook and MySpace. Google’s new offering is designed to provide a ‘middleware’ service, sitting between the big social networks and sundry web applications which might want to exploit the new openings in these services. It also utilises components which have been developed with the OpenSocial API. Friend Connect is, I think, a very significant development, because it shows how more distributed social networks might work. It is significant also in a particular detail – notice how Friend Connect can become a social network of sorts simply by integrating existing social networks. Suddenly, the huge headstart enjoyed by Facebook and MySpace doesn’t look so unassailable. This is, presumably, the real reason why Facebook have taken steps to block Friend Connect.

I suggest that because they have been walled gardens for so long, neither Facebook nor MySpace really know how to succeed as middleware. They have always been the destination – never really a component in someone’s workflow. By contrast, Google has always offered services which the user employs en route to a different destination. Google understands this kind of arrangement fundamentally. Expect to see increasingly desperate measures from MySpace and Facebook to retain control while Google quietly grows its Friend Connect service.