Archive for the ‘Web Infrastructure’ Category

An infrastructure service anti-pattern

Monday, December 7th, 2009

Last week I outlined an idea, that of the service anti-pattern, as part of a presentation I gave to the Resource Discovery Taskforce (organised by JISC in partnership with RLUK). The idea seemed to really catch the interest of and resonate with several of those members of the taskforce who were present at the meeting. My presentation was in a style which does not translate well to being viewed in a standalone context (e.g. on Slideshare) so I have decided to write it up here. I would very much welcome comments on this. (The presentation will be published on the Resource Discovery Taskforce pages and I will ask for this post to be linked to from there when it does appear).

The following diagram is meant to represent a design ‘pattern’ which I have seen often proposed, and sometimes implemented, in the JISC Information Environment (IE) as well as in the wider higher education (HE) sector in general:

anti-pattern.gif

It is my belief that readers who have been involved with the IE for some time will recognise this, at least in a general sense, if not in specific cases. In this arrangement, an aggregation of data is presented to the end user, through the development of a user-facing application or service. The user-facing service will in almost all cases be a web-interface, somewhat similar to the ‘portal’ concept of old but in a centralised, single, global deployment. Because it is generally accepted to be desirable to make such data available to other services (in keeping with the larger goal of interoperability through open standards), one or more machine interfaces or so-called APIs, giving access to the ‘backend’ of the system, will be offered. What this design pattern aspires to is a service implemented to be both user-facing service and machine-facing infrastructure component.

However, I contend that this is, in fact, what software engineers might call an anti-pattern. An anti-pattern is a design approach which seems plausible and attractive but which has been shown, with practice, to be non-optimal or even counter-productive. It’s a pattern because it keeps coming up, which means it’s worth recording and documenting as such. It’s anti, because, in practice, it’s best avoided….

There is much which is implicit in this pattern, so I will attempt to surface what I believe are some hidden assumptions in a new version of this diagram, which shows what this design pattern reveals once it has been implemented:
anti-pattern-extended.gif

In this second diagram, the orange colouring indicates the parts which actually get built and are supported; the yellow indicates the parts which might get built, but which won’t really be supported as a service – in a sense, this is stuff which is believed to work but actually doesn’t; in the case of the users, the yellow colouring indicates that their demand for this service is believed to exist; those components in the diagram which are neither orange, nor yellow, are the product of little more than speculation. In the end, the investment in creating a user-facing application based on an expectation of future demand which doesn’t materialise is wasted while, at the same time, the investment in providing unused machine interfaces is also wasted.

I believe that this design pattern rests on several assumptions which are actually fallacies, and is, therefore, an anti-pattern.

Fallacy 1: “Build it and they will come”:

While infrastructure services can, indeed should, be developed with future opportunity in mind, it is helpful to have an existing and real demand to satisfy, which the new development addresses. If the service is demonstrably useful to users, and is developed effectively with future opportunity in mind, then there is more chance of the service actually working, and of it being attractive to developers working on future opportunities.

Fallacy 2: Interoperability through additional machine interfaces:

Machine interfaces need as much specification, development, testing and maintenance as user-interfaces. Simply making a machine interface available through the adoption of a platform which has a built-in facility offering some standard interface is not enough. A system which proposes to offer three or four APIs is quite likely not going to support any of them adequately. I have argued before that ‘interoperability is not enough’: in fact, this arrangement does not often lead to interoperability, let alone actual exploitation of the capability to interoperate.

Fallacy 3: People/organisations who can make good infrastructure are also going to be good at building end-user-facing services (and vice versa):

Effective infrastructure supports services which in turn support end-users. The skills and knowledge required to support service-providers are generally quite different from those needed to deliver good user-facing services.

I call this the infrastructure service anti-pattern because the result comes from conflated requirements to deliver both infrastructure (machine-to-machine interfaces) and compelling user-facing services and applications. The result can be something which satisfies neither requirement. The users, requirements and priorities are often completely different between these two problem spaces. I suggest that the following are some possible reasons for this anti-pattern appearing:

  • funding (naturally) tends to follow services, happy users and, importantly, new features
  • funders like to see their investment showcased
  • infrastructure is mostly invisible, making it hard to ascertain impact from users

Proposals for alternative design patterns

Here is a suggested alternative design-pattern:
better-pattern.gif

In this design pattern, the API is developed before any user-facing application, or at least in parallel. An application is developed to exploit this API based on real users’ requirements. No service is developed until such requirements can be identified. This means that an API will be developed, and it will be used in at least one case. Opportunities for third party integration for usage of the service are, ideally, identified beforehand. The API is properly supported from the start, or else the service fails completely. The value proposition being offered for further, opportunistic third-party developments, whether real or imagined, is now real and, crucially, supported.
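As a minimal sketch of this arrangement, the user-facing view below is built only from what the machine interface returns, so the API is exercised by at least one real consumer from the start. The record store, the function names and the JSON shape are all hypothetical and chosen purely for illustration.

```python
import json

# Hypothetical in-memory store standing in for the aggregated data.
RECORDS = {
    "rec-1": {"title": "An example record", "creator": "A. N. Author"},
}


def api_get_record(record_id):
    """Machine interface: return a JSON document describing one record."""
    record = RECORDS.get(record_id)
    if record is None:
        return json.dumps({"error": "not found", "id": record_id})
    return json.dumps({"id": record_id, **record})


def render_record_page(record_id):
    """User-facing view: uses only the API, never the store directly."""
    data = json.loads(api_get_record(record_id))
    if "error" in data:
        return f"Sorry, no record with id {data['id']}."
    return f"{data['title']} (by {data['creator']})"


print(render_record_page("rec-1"))
```

Because the page cannot reach the data except through `api_get_record`, any gap in the API shows up immediately as a gap in the user-facing service.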

An interesting alternative to this is the approach of combining the user-facing web pages and the machine-actionable API into one interface, through embedded RDFa for example:
better-pattern2.gif

It remains to be seen how this approach is going to work out over time, but we have seen hints of simpler approaches to combining user and machine interfaces in the past, such as RSS being styled to give a decent human-readable interface, or earlier attempts to do interesting things with XHTML.
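To illustrate the idea of one interface serving both audiences, here is a rough sketch in which the same HTML fragment that a browser would render is also read by a program. The page, the URL and the property names are invented for the example, and the extractor is deliberately naive: it only collects the text of elements carrying a `property` attribute and ignores prefixes, nesting and everything else a real RDFa parser handles.

```python
from html.parser import HTMLParser

# Hypothetical page fragment: readable by people, annotated for machines.
PAGE = """
<div about="http://example.org/records/1">
  <h1 property="dc:title">An example record</h1>
  <p>Written by <span property="dc:creator">A. N. Author</span>.</p>
</div>
"""


class PropertyExtractor(HTMLParser):
    """Collect the text content of elements that carry a `property` attribute."""

    def __init__(self):
        super().__init__()
        self.subject = None   # value of the most recent `about` attribute
        self.current = None   # property we are currently inside, if any
        self.found = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "about" in attrs:
            self.subject = attrs["about"]
        if "property" in attrs:
            self.current = attrs["property"]
            self.found[self.current] = ""

    def handle_data(self, data):
        if self.current is not None:
            self.found[self.current] += data

    def handle_endtag(self, tag):
        # Naive: any closing tag ends the current property.
        self.current = None


extractor = PropertyExtractor()
extractor.feed(PAGE)
print(extractor.subject, extractor.found)
```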

I wonder if readers agree that the first diagrams represent an anti-pattern which they recognise. And would the proposed alternatives fare any better?

Not ready to wave goodbye to email

Wednesday, October 7th, 2009

Last week I posted a remark on Twitter:

Can’t help thinking that the idea that Google Wave will replace email rather misses the point….

The first response to this echoed my view, suggesting that the real nature of Wave is rather harder to explain or understand, and implying that people fall back on a frame of reference with which they are comfortable. It certainly looks as though Google have anticipated this and offered some easily digested marketing messages. However, I also saw responses which suggested that some people still seem to be missing the point. One response insisted that Wave would only be successful if it was ‘integrated’ with email. I must confess that I still don’t understand this – I can’t really imagine what impact an integration between Wave and email would really have.

It seems to me that Wave is an ambitious attempt to exploit the idea that one future for the Web lies in social networked activity clustered around shared artefacts. Such artefacts, often what we still call ‘documents’, have been given the useful label social objects. At the centre of a Wave is a social object, with a series of applied and recorded operational transforms. Wave would therefore seem to be primarily about collaboration, as opposed to email or IM which are primarily concerned with messaging. Another way of looking at this would be to suggest that Wave is ‘object-centric’, as opposed to email which is message oriented with a facility to attach auxiliary objects.

The idea that Wave would replace email seems to be suggesting that we won’t need apples anymore because now we have oranges. This is not to say that Wave might not better fit some use-cases currently served by email – such as the problematic mode of collaborative editing of documents by sharing copies sent as email attachments. But even as we adopt better software for collaboration, there’s not much sign that we’re giving up using email. I don’t know about you, but my email inbox isn’t getting any smaller just because I use Google Docs, IM, Twitter…. Email has been tested quite thoroughly now over a few years, and appears to work quite well for asynchronous messaging!
Wave uses XMPP as its underlying protocol, which is both interesting and important, but it is also slightly misleading as it implies an important connection with ‘instant messaging’, which I think is illusory and unhelpful.

Wave is possible because the barrier of network latency is gradually being reduced. Real-time collaboration across the global network is now viable for many. Of course Wave is not the only game in town – other interesting approaches (mostly also using a variation on the pubsub paradigm) to the real-time Web, such as pubsubhubbub are being actively developed and experimented with. But Google Wave is important – because it’s Google who are doing it. It will gain a lot of publicity, and will likely play its part in driving a culture change allowing real-time collaboration across the global network to ‘go mainstream’. It should be remembered that Google’s Gmail, the poster-child for Web-based email, is still significantly smaller in terms of users than Yahoo and Hotmail.

Because Wave offers APIs to developers and users out of the box, I think it is going to be difficult to say what shape this new offering from Google will take once a significant number of people are using it. The ability to federate Wave services could be significant in this respect.

HEIs Get Facebook Fever (again)

Sunday, June 14th, 2009

LandRun.jpeg

Facebook rolled out its ‘usernames’ function today. This is a new feature at Facebook which allows a user to claim their little bit of the Facebook namespace, along the lines of:

http://www.facebook.com/[preferred_name]/

The process started at 05:00 am UK local time – on a Saturday morning – yet several people in my social and professional networks got up early to claim their personalised Facebook URL. Not all were successful despite this determination, and some ended up having to settle for some variation on their preferred username.

As for me, I enjoyed a rare lie-in :-)

So, why do people think this is important – and worth getting up at 05:00 for? And why am I not ‘bovvered’? From the various commentaries I’ve seen so far – blog posts and Twitter discussion primarily, here are some aspects & motives I’ve identified so far, and some of the issues I have with them.

Fear of someone else registering your preferred username

This seems to be the main reason for the 05:00 land-grab. The motivation for registering a username appears to be, primarily, a defensive one. I guess there’s a sense that this might become important. The majority of people, from my very limited straw-poll, seem to fall into this category. While I don’t personally feel the need, I understand this reasoning.

Wanting to be able to offer a neat & personalised Facebook URL for you or your organisation

This is covered by Brian Kelly – he describes the decision to register a Facebook URL for an organisational Facebook page as a ‘no-brainer’, and lists a few higher-education institutions (HEIs) which have rushed to register a URL.

In his post, Brian asks:

So tell me, what is the logic in having a personal or institutional Facebook account and keeping the long form for its address? Or are the tweets I’ve been seeing simply a minority view from the ideological purists….?

For some people, the personalised URL is immediately important as they intend to use it as a personal ‘identifier’. The motivations here are convenience – such a URL can be much more memorable, and ‘vanity’ – a personalised URL is undoubtedly more satisfying and attractive. (Note, I use the term ‘vanity’ here as it has been used by others in this context and I don’t intend any pejorative sense that this term might convey).

So, why was I lounging in bed rather than rushing to claim my Facebook ID, and why would I hesitate (‘ideological purity’ aside!) before registering and publicising a URL for my HEI?

  1. I have a personal namespace, having registered the domain ‘paulwalk.net’. This is also my OpenID, through the use of delegation (I have already changed OpenID identity provider twice without changing my OpenID). I realise that maintaining a personal domain is not yet a mainstream activity – yet I’m frequently surprised by the fact that many of those generally very tech-savvy people in my professional/social networks do not bother to do this, instead investing a major part of their online identities with companies such as wordpress.com or Facebook.
  2. Do you trust Facebook? How much? Because, by registering a Facebook URL and publicising it, you just tied a potentially major part of your online identity with the fortunes and behaviour of this company. As an individual, this risk might be worth the convenience perhaps. But as an HEI – why would you want to introduce this risk when you already own and manage your own namespace?
  3. As an HEI, you will have, no doubt, invested considerably in establishing a strong URL-based online brand, being careful with search engine optimisation and the like. Why then would you introduce a competing URL which will tend to dilute your primary Web address’s prominence? It may be that some HEIs have, after careful deliberation, decided to base their online identity and the marketing of their organisation on the Facebook platform – but I’d be amazed if this were true. So what exactly is the point in establishing a public Facebook URL for your organisation?
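The delegation mentioned in the first point can be sketched as follows. Under OpenID 1.x, a personal page carries two `<link>` elements, `openid.server` and `openid.delegate`, which point at whichever provider currently does the authenticating; the identifier people actually see is the URL of the page itself. The provider URLs below are hypothetical, and the discovery code is a simplified illustration rather than a conforming OpenID implementation.

```python
from html.parser import HTMLParser


def delegation_page(provider_endpoint, local_id):
    """Return the markup a personal home page would carry for delegation."""
    return (
        "<html><head>"
        f'<link rel="openid.server" href="{provider_endpoint}">'
        f'<link rel="openid.delegate" href="{local_id}">'
        "</head><body>Home page</body></html>"
    )


class LinkCollector(HTMLParser):
    """Collect rel/href pairs from <link> elements."""

    def __init__(self):
        super().__init__()
        self.links = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link" and "rel" in attrs and "href" in attrs:
            self.links[attrs["rel"]] = attrs["href"]


def discover(html):
    """Return (provider endpoint, delegated identifier) found in the page."""
    collector = LinkCollector()
    collector.feed(html)
    return collector.links.get("openid.server"), collector.links.get("openid.delegate")


CLAIMED_ID = "http://paulwalk.net"  # the identifier that is publicised

# Hypothetical providers: switching between them only changes the markup.
before = discover(delegation_page("https://provider-a.example/openid",
                                  "https://provider-a.example/paul"))
after = discover(delegation_page("https://provider-b.example/auth",
                                 "https://provider-b.example/users/paul"))
print(CLAIMED_ID, before, after)
```

The point of the sketch is that `CLAIMED_ID` never appears in the provider-specific part: changing provider means editing two lines of markup on a page you control, not asking everyone to learn a new identifier.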

An expectation that Facebook will become an OpenID identity provider in the future

More tech-savvy users recognise that the Facebook URL they claim could soon become an OpenID. If they are a regular user of Facebook, this could offer a measure of convenience in the sense that their identity provider will be also a service provider which they use frequently. But as the usability issues with OpenID (and there are several) are gradually ironed out, we can expect to see OpenID’s importance as an ‘identifying system’ rather than an authenticating mechanism come to the fore. Using Facebook (or any equivalent service provider) as an identity provider will make less and less sense.

Time will tell

It may be that I am wrong about these issues. However, I have challenged the HEI sector’s desire to jump on the Facebook bandwagon in the past, and I have not seen much evidence to convince me that Facebook is a significant platform for engagement with students. As part of a marketing strategy, it probably makes sense to maintain some sort of presence in Facebook – just as it might make sense to establish a presence in various other systems. But on the public Web, an HEI’s identity must surely be kept independent of any private commercial concern. The mechanisms for ensuring this are well established. And, increasingly, we can begin to apply these mechanisms to our individual identities.



OpenID and name authority

Thursday, January 22nd, 2009

In his Science in the Open blog Cameron Neylon has written an interesting post, A Specialist OpenID Service to Provide Unique Researcher IDs? in which he asks:

Good citation practice lies at the core of good science. The value of research data is not so much in the data itself but its context, its connection with other data and ideas. How then is it that we have no way of citing a person?

Cameron suggests that OpenID might offer a solution to this.

I have been very interested in OpenID for some time. I like the relatively agile way in which the standard has evolved. I like the fact that it has been responsive to the developer community. I agree with Andy Powell when he talks about the importance of the capacity for the delegation of the service providing your OpenID – I’ve maintained an OpenID for myself at http://paulwalk.net despite having changed the underlying OpenID identity provider service twice. However, I’ve become frustrated by the way in which OpenID has been deployed and couched almost entirely in terms of its potential to solve the often-exaggerated problem of users needing to maintain too many user accounts (although I confess that I have contributed to this). Personally I maintain a small handful of username/password combinations for accessing hundreds of web services – it’s a minor inconvenience. And as Mike Ellis pointed out in a great post, OpenID: fail:

In a technical sense, OpenID works. But from a usability perspective, it’s absolutely horrible.

I blogged about OpenID a while ago, saying:

I’ve thought for a while that the introduction of URIs for people was the often overlooked yet potentially most interesting aspect of OpenID. In a resource-oriented-architecture, it would seem plausible to suppose that a reliable pointer to a representation of a person would be a useful thing. But when I try to sketch out a useful application for this, I struggle….

The idea of using OpenID as an ‘author identifier’ in scholarly communications has occurred to me before too – specifically in the context of repositories. I agree it could play a part here. At one level this could be seen as an extension of the ongoing persistent identifier issue in the context of web-resources, being applied to people. However, as an OpenID is a URL, it is open to the same criticisms levelled against the use of URLs for papers in an institutional repository for instance (the delegation feature does mitigate this, albeit only slightly).

One aspect of OpenID, which I think might become relevant if OpenID reaches any kind of critical mass as a public identifier system will be the way in which a given OpenID could gain authority over time. The only thing you can trust about a newly minted OpenID is that you can interrogate the ‘user’ of the OpenID and verify that they are the agent which ‘controls’ or ‘owns’ it. However, an OpenID will rarely be surfaced without other metadata about the agent – there will be a context in which it is used. In a community of researchers for example, as a particular OpenID is used more and more by a researcher in various contexts and systems, a level of trust will build around the association of that OpenID with an actual person.

For a long while I thought that OpenID might be the answer to a problem arising out of the need for a different user-account in every system we use – not the bogus issue of needing to remember lots of passwords, but the fact that this creates an immediate obstacle to joining up those systems at the level of the user. This issue has become more visible with the systems underpinning social networks. I see all kinds of potential in being able to conclude that while I might not know the person identified here in this system, I can be sure that they are the same person in this other system, because they have the same OpenID. Of course there is all kinds of potential for abuse of such join-up, but I would still like to be able to control such arrangements myself.
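The kind of join-up described here can be sketched very simply: if each system records the OpenID that was used to sign in, accounts in unrelated systems can be grouped by that shared URL. The systems and usernames below are invented, and the match is a plain string comparison; a real implementation would need to normalise identifiers first, and would need the user's consent for exactly the reasons given above.

```python
from collections import defaultdict

# Hypothetical account records from three unrelated systems.
ACCOUNTS = [
    {"system": "repository", "username": "pwalk", "openid": "http://paulwalk.net"},
    {"system": "photo-site", "username": "paul_w", "openid": "http://paulwalk.net"},
    {"system": "wiki", "username": "someone", "openid": "http://someone.example"},
]


def group_by_openid(accounts):
    """Group (system, username) pairs by the OpenID they share."""
    groups = defaultdict(list)
    for account in accounts:
        groups[account["openid"]].append((account["system"], account["username"]))
    return dict(groups)


for openid, identities in group_by_openid(ACCOUNTS).items():
    print(openid, identities)
```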

Increasingly, I’m annoyed by my social-web activities being constrained unnecessarily by really prosaic limitations in the systems I use. As I said in another post back in September 2007:

Now, it’s certainly not unusual to maintain more than one, unconnected circle of contacts. Many people prefer to keep their professional and their social networks separate. But, and this is the important point, I really don’t want my social networks to be constrained by particular software choices. As I can connect resources across the web in a uniform way to form a network of resources, I want to be able to connect people to form my social network. Perhaps OpenID or something similar could provide the solution.

Imagine a Web where everything you did publicly was linked by the very fact that you were represented by a URL exactly like your blog post, or your photo on Flickr, or your post on Twitter, or your correction to that Wikipedia entry, or your research paper in your institutional repository for that matter…. think of the possibilities.

Push or pull?

Thursday, January 8th, 2009

A brief comment, as I hop across the North Sea back to Bristol.

With the news that arXiv will now accept deposits from institutional repositories, Dorothea Salo continues her theme about a deposit flow which goes from author, to institutional repository, to subject/discipline repository. Dorothea offers some scenarios, including:

Achaea University adopts a Harvard-style open-access mandate. If she wants her articles in arXiv as well, Dr. Troia must rather annoyingly dual-deposit… unless Achaea’s IR implements a deposit pipeline to arXiv, in which case the most she has to do is tick a ticky-box (and I can imagine ways to abstract away the ticky-box).

In an abstract sense I appreciate the notion of the ‘deposit pipeline’. I also agree with the main point which is about the direction of the flow. Indeed, I have previously characterized the institutional repository as being, or more usually containing, the source repository. However, I remain slightly doubtful about the need for the flow to be initiated by the source. If there were some mechanism by which the subject/discipline repository could be alerted to the appearance of relevant materials in the institutional repository, then doesn’t it make sense for the subject repository to fetch the record/artefact, rather than wait to have it sent? Well, we already have the mechanism: it’s called RSS (or Atom) and it’s already supported by some of our most popular repository software.
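A minimal sketch of the ‘pull’ direction, assuming the institutional repository publishes an Atom feed of recent deposits with a subject term on each entry: the subject repository reads the feed, skips what it has already seen, and picks out the entries it cares about. The feed content, URLs and category terms are made up for the example, and the feed is an inline string rather than something fetched over HTTP.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# Hypothetical feed of recent deposits from an institutional repository.
FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Recent deposits</title>
  <entry>
    <id>http://ir.example.edu/item/101</id>
    <title>Lattice QCD at finite temperature</title>
    <link href="http://ir.example.edu/item/101"/>
    <category term="physics"/>
  </entry>
  <entry>
    <id>http://ir.example.edu/item/102</id>
    <title>Medieval land tenure</title>
    <link href="http://ir.example.edu/item/102"/>
    <category term="history"/>
  </entry>
</feed>
"""


def relevant_entries(feed_xml, wanted_terms, already_seen):
    """Return new entries whose category terms overlap `wanted_terms`."""
    root = ET.fromstring(feed_xml)
    results = []
    for entry in root.findall(ATOM + "entry"):
        entry_id = entry.findtext(ATOM + "id")
        terms = {c.get("term") for c in entry.findall(ATOM + "category")}
        if entry_id in already_seen or not (terms & wanted_terms):
            continue
        link = entry.find(ATOM + "link")
        results.append({
            "id": entry_id,
            "title": entry.findtext(ATOM + "title"),
            "href": link.get("href") if link is not None else entry_id,
        })
    return results


seen = set()
for item in relevant_entries(FEED, {"physics"}, seen):
    seen.add(item["id"])
    print("would fetch:", item["href"])
```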

Come to think of it, an even better approach might be for the subject repository, having been alerted to a new & relevant deposit in the institutional repository, to simply maintain a pointer to the original (optionally creating new and related resources).

In other words, as a certain generation of programmers would put it, pass by reference, not by copy.
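In code, the difference might look something like the sketch below: the subject repository stores only the URL of the original, plus any related resources it creates itself, rather than a copy of the deposited item. The class names and URLs are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Pointer:
    """A reference to an item which stays in its source repository."""
    source_url: str
    related: list = field(default_factory=list)  # resources created locally


class SubjectRepository:
    def __init__(self):
        self.pointers = {}

    def register(self, source_url):
        """Record a pointer to the original; registering twice is harmless."""
        return self.pointers.setdefault(source_url, Pointer(source_url))

    def annotate(self, source_url, related_url):
        """Attach a locally created resource to the pointed-at original."""
        self.register(source_url).related.append(related_url)


repo = SubjectRepository()
repo.register("http://ir.example.edu/item/101")
repo.annotate("http://ir.example.edu/item/101", "http://subject.example.org/reviews/7")
print(repo.pointers)
```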

Europeana, numbers and scalable architectures

Tuesday, December 9th, 2008

I just got around to reading the press release issued after the collapse of Europeana (previously the more easily pronounced ‘European Digital Library’) following its launch a couple of weeks ago. If you go to the site now, you are greeted with the following message:

The Europeana site is temporarily not accessible due to overwhelming interest after its launch (10 million hits per hour).

We are doing our utmost to reopen Europeana in a more robust version as soon as possible.

We will be back by mid-December.

(my emphases)

The press release explains what happened. Or rather, it explains whose fault it is that the site couldn’t cope with the traffic it received. The blame is laid squarely on those pesky ‘experts’ who predicted a peak demand of 5 million hits per hour, and on the public who disregarded this and whose demand reached a peak of 10 million hits per hour. Or 13 million hits. Or nearly 20 million hits. Each of which is claimed in the press release or on the website. We’ll come back to these numbers in a moment.

Piecing together the limited information provided by the website, press release, and a recording of the press-conference following the site being taken down, one arrives at the following sequence of events:

  1. Europeana is launched, following a good deal of publicity
  2. Peak usage approaches 8/10/13/nearly-20 million hits. Site begins to behave unpredictably and to become unresponsive
  3. Hardware capacity doubled from 3 servers to 6 servers
  4. Decision made to take site down to ‘ease pressure’ on it.

In a breath-taking display of ‘spin’, this rather faltering start to Europeana’s fortunes is being hailed as a very positive development. According to spokesman Martin Selmayr in a recorded press conference it demonstrates unequivocally that there is a huge demand for the service. Rather intriguingly we are told that the service fell over because thousands of people were searching for the query term ‘Mona Lisa’ at exactly the same time. When one of the journalists points out that this seems a little suspicious, the spokesman tells him that this is because the press specifically used the ‘Mona Lisa’ as its example when discussing the impending launch of the service – so the crash is partly the press’s fault as well! When another journalist suggests that thousands of concurrent requests for the same resource has some of the characteristics of a distributed-denial-of-service attack, the response is to claim that actually there was a wide range of content searched for. I sensed that the assembled press corps were becoming a little puzzled by this point. Yet another journalist asked why the site would be down until mid-December. Apparently, this is to ‘take pressure off the system’, which doesn’t make a whole lot of sense.

So, what now? It is possible to speculate based on what is hinted at in the press release. Clearly, Europeana did not scale to cope with 10 million hits an hour – double what was predicted. One might suppose that Europeana would experience a peak-load at launch, given the publicity, which might then ease off a little. The Europeana team have already tried to respond by ‘scaling out’ – adding more hardware. In fact they have doubled the hardware, to no avail. If scaling-out was going to work, then why not double again? Surely the cost is not the issue? I suggest that someone has realised that scaling out will not work, and that some deeper adjustments to the system’s architecture will need to be done. In which case, I wonder if it can be achieved by mid-December. And why did they think that scaling out would work in the first place?
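For a sense of scale, the quoted figures can be turned into a per-server rate. This is back-of-the-envelope arithmetic only: it assumes that the load is spread perfectly evenly across servers and through the hour, and that a ‘hit’ means a single request, neither of which the press release confirms.

```python
def per_server_rate(hits_per_hour, servers):
    """Average hits per second that each server would need to handle."""
    return hits_per_hour / 3600 / servers


# Predicted and reported peaks (hits per hour), and the two server counts.
for hits in (5_000_000, 10_000_000):
    for servers in (3, 6):
        rate = per_server_rate(hits, servers)
        print(f"{hits:>10,} hits/hour on {servers} servers: {rate:7.1f} per second each")
```

On these assumptions, 10 million hits an hour is roughly 2,800 hits per second in total, or about 460 per second per server across six servers. If doubling the hardware brought the per-server rate back down to the predicted level and the site still fell over, that would be consistent with the suggestion that the limiting factor lies somewhere in the architecture rather than in the number of machines.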

Europeana seems to be driven by numbers. This seems like an increasingly anachronistic approach to the design and measurement of success of a web-service. A memo about the service states:

The objective of the European Commission is that in 2010, the number of digitised works available online through Europeana should reach 10 million.

Numbers again…. No matter what estimations have been calculated, or plans been laid, this number can’t be much more than arbitrary. Why 10 million? Why not 20? Or 5? These metrics are just not helpful or interesting. If Europeana hadn’t crashed it would be making 2 million objects available now, apparently. I have no way of appreciating the benefit of 5 times as many objects. A better statement would have talked about growth in terms of responding to user feedback perhaps. (Incidentally, in this memo there is also a table of percentage contributions from EU ‘member states’ to Europeana – France has contributed 52% of the content so far, compared with 10% from the UK).

When you’re facing a global scale of usage you need a global-scale architecture to cope. Global isn’t a number. Global implies that you might need to keep growing. Continuously. If your service is an instant hit, you might even need to grow rapidly. There is much which has been learned and developed about this in recent years, and new architectures have evolved to meet huge levels of demand. Another sensible strategy which has emerged is the ‘soft-launch’. Launch the application, and allow word to spread; give yourself some breathing space to tweak the application, fix bugs, and grow. So, while the Europeana engineers are working away to try to meet their mid-December deadline, I urge them to consider two things:

  • forget the numbers you’ve been saddled with – get a global-scale architecture in place, something which allows you to scale out at will, regardless of the numbers. If this means sacrificing functionality then do it nonetheless.
  • don’t be persuaded into a big launch with fanfares – do a soft launch and allow some time for the system to shake-down and for its reputation to grow. If it’s a good service, the users will come.

I hope Europeana is allowed to launch when it’s good and ready – and I hope that it does launch. And I hope that, in time, we can learn from the mistakes of this ambitious project, rather than be fed marketing claims and face-saving spin.

Library hackers FTW

Friday, November 28th, 2008

Yesterday I went along to Mashed Library UK 2008 in London. Quickly abbreviated to ‘mashlib’, the event was the brain-child of Owen Stephens. Owen did most of the organising, aided by David Flanders who provided the space at Birkbeck College, and our excellent events team at UKOLN. The event was sponsored by UKOLN, using funding from the JISC.

I thought the balance of activities on the day was excellent – a healthy mixture of short presentations, demonstrations and a good amount of hands-on hacking. The group was comprised of commercial vendors (Talis, ExLibris, OCLC), academic-library folk (the majority), a lone representative from the public library world (Paul Bevan for the National Library of Wales), and a few developers from various (mostly JISC-funded) services.

Rob Styles from Talis gave us a demo of the Talis Platform. There is an open API which you can play with – it’s quite impressive. I was very struck by some of the language Rob used in his demo – he talked about dipping, where a result-set from a query (in RSS 1.0 format) is “dipped into” another – with the original data-set accreting more information from the second. (Jim Downing and I had an interesting chat about this over lunch, with Jim proposing that we could visualise data-sets as molecules – having a certain shape which allows them to bond with other molecules which have a complementary shape). Rob also talked about mixing in, in a similar vein. The Talis Platform APIs appear to be quite RESTful, with a good deal of passing URLs around rather than result-sets. I plan to have a closer look at this.
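One way to picture ‘dipping’ is as an enrichment step keyed on a shared identifier: each record in the first result-set picks up whatever extra fields the second data-set holds for the same URI. The sketch below is only a reading of the metaphor, not the Talis Platform API; the function name, the record shapes and the URIs are all invented.

```python
def dip(primary, secondary):
    """Enrich each record in `primary` with fields from `secondary`.

    `primary` is a list of dicts, each with a "uri" key; `secondary` maps
    URIs to dicts of extra fields. Values already present in the primary
    record are kept when both sides define the same field.
    """
    enriched = []
    for record in primary:
        extra = secondary.get(record["uri"], {})
        enriched.append({**extra, **record})  # primary values win
    return enriched


# Hypothetical result-set from a catalogue query.
results = [
    {"uri": "http://example.org/book/1", "title": "A first book"},
    {"uri": "http://example.org/book/2", "title": "A second book"},
]

# Hypothetical second data-set holding holdings information.
holdings = {
    "http://example.org/book/1": {"copies": 3, "title": "ignored duplicate"},
}

for row in dip(results, holdings):
    print(row)
```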

Timm-Martin Siewert spoke next about the ExLibris Open Platform. I did get a URL for this but it takes me to a page which challenges me for a username and password which I do not have. The Open Platform is, apparently, open to paying customers only. Edward Corrado suggested via a tweet that:

I think they mean open in the sense of the open systems movement of about 20 years ago

Next up was Mark Alcock, standing in for Tim McCormick and representing OCLC, to talk about the WorldCat Developer Network. Mark came armed with a bunch of limited life API keys, so that people could try out some of the WorldCat services. OCLC appear to be offering a spectrum of services, from the commercial pay-for-use variety, to the ‘affiliate’ model – i.e. form a business partnership with us and use our services, to some free services. I’m interested in several of the WorldCat services but am wary of getting too fond of something I cannot, in the end, afford to use. Unfortunately, I did not get time on the day to make use of Mark’s API keys.

I noted that the three vendors represented seem to be spaced evenly along a spectrum of openness, with Talis at the ‘very open’ end of the spectrum, ExLibris at the ‘closed’ end, and OCLC (specifically WorldCat) somewhere in between. I can’t yet see how Talis are going to monetise the completely open model, and I think ExLibris will certainly need to open up somewhat. Perhaps OCLC have hit a sweet-spot of openness? I really don’t know enough about these services in detail, but I noticed some comments from Dorothea Salo which are somewhat critical about the business model behind WorldCat.

Ashley Sanders followed, with a quick description of an Atom (APP) based object store he is developing as part of his work extending the COPAC service. I’m following COPAC developments with interest – I’m very much in favour of the general direction they seem to be taking (I recently blogged about one aspect of this).

Tony Hirst, mashup maestro, gave a tour-de-force demonstration of using Yahoo Pipes and Google Spreadsheets as mashup tools. This went down very well with the technically-minded-but-mostly-not-developers group – especially Yahoo Pipes. I gave a presentation at the Shock of the Social in March 07 where I remarked that the potential of Yahoo Pipes was to do for web development what the spreadsheet did for non-web development before it (Microsoft Excel has been described as the most widely used Integrated Development Environment). Tony showed us how the spreadsheet is certainly relevant in a web-mashup world with his demonstrations of using Google Spreadsheets to mash up data-feeds.

Later on, after lunch, the group got down to some general hackery. On Twitter, Chris Awre (who wasn’t at the event but had been following comments on Twitter) remarked:

Silence from #mashlib08 this afternoon. The mashing must be going well…

And he was right! There was a fair stream of Twitter commentary in the morning – but it dried up as people got absorbed in hacking code and testing interfaces. I saw people exploring the Talis Platform and, in particular, Yahoo Pipes. I expect there will be some blogging about this activity – look out for the official tag:

mashlib08

Andrew McGregor of JISC has already written up his experience of this, as has Jo Alcock – I think these posts describe representative experiences of the event.

Paul Bevan rounded off proceedings with a view from public libraries – the National Library of Wales to be precise. I learned a lot from this presentation about the unique challenges facing the public non-academic sector.

I thoroughly enjoyed the day – kudos to Owen for getting the right balance of people, subjects and activities. There was a ‘buzz’ generated as the day went on which was excellent. I have been to a fair number of ‘hacker’ events where the emphasis is on the tools and the running code – I generally enjoy this kind of thing. But mashlib08 was different – what was really good about this day was that the enthusiasm came from doing stuff with information, more than from the actual development.

I think Tony Hirst deserves a special tip o’ the hat for firing up a real enthusiasm for mashups on the day.

We should definitely do this again!

Infrastructure

Wednesday, November 26th, 2008

I was recently invited to join the JISC Resource Discovery Infrastructure Taskforce – the first meeting was yesterday. We had been given some background material, and a couple of people (Owen Stephens and Paul Miller) were asked to present ideas around this general area, but the main order of the day was to establish terms of reference and some guiding principles. I had fondly imagined that this would be a fairly rapid exercise – more or less a bureaucratic process before we got down to the nitty-gritty of what problems we needed to solve and how we were going to solve them.

I couldn’t have been more wrong. As the day progressed, I was fascinated to gradually realise that there was a real disconnect between different understandings of what was meant by the term infrastructure. I made a few attempts to pause proceedings while we sorted out a definition, and at the end of the meeting I suggested that a priority for the work which will go on before the next meeting ought to be to establish a reasonable working definition of this term.

Some other scoping issues were dealt with fairly quickly. For example, the infrastructure, whatever that is, would be national in scope, but would serve ‘local’ services. But I think the lack of a general and common understanding of the term ‘infrastructure’ became a real problem yesterday.

Now, I’m not one to insist on precise definitions. Some terms are very useful in spite of, or even because of, the fact that they have no precise definition. I have no problem with using ‘Web 2.0’ for example, even in non-marketing contexts! My own problem was that I have some fairly clear ideas about what infrastructure is – or is not – but it turns out that these are not shared by everyone else. For example:

  • I assume that infrastructure is not generally user-facing. In the sense that a national infrastructure supports local services which support users, I see the primary stakeholders in an infrastructure as being those people who are providing local services. They are the people who both know what users want (or they should do!) and who know what support they need at a national level to make that happen. However, others in the meeting assumed that users would be directly accessing ‘infrastructure services’ in a variety of ways.
  • I imagine a successful infrastructure to be mostly invisible. This turns out to be the opposite view to some at the meeting, who (perfectly reasonably) want UK national infrastructure to be overtly ‘world-class’.
  • I imagine a successful infrastructure to be rather boring. Even train-spotters don’t generally photograph the infrastructure, the track etc…. But some of the discussion yesterday was around innovation and not being afraid to take risks. I guess I see infrastructure as tending to be the product of a conservative development process.

Some at the meeting had notions of infrastructure as clear (to them) as mine (are to me), just different. Some I suspect did not have a clear idea in their mind to start with. These were probably the wiser ones in hindsight. I suspect by the end of the meeting, everyone’s ideas had shifted (mine were in free-fall).

Nonetheless, the discussion was really interesting – I enjoy having my preconceptions challenged – and I met some insightful people. I look forward to the next meeting.

In the meantime – help us out please! Infrastructure – what does it mean to you?

One-way bridges and interim solutions

Tuesday, October 28th, 2008

In my previous post about QR codes I made a couple of points which, after receiving some interesting comments, I’d like to expand on.

“I see them [QR codes] occasionally on blogs/web-pages but I just don’t much see the point of that”

Shortly after making this point, I suggested on a UKOLN internal mailing list that it might make more sense to include a QR code in a cascading style sheet provided for printing, rather than viewing on the screen. If I want to link my blog/post/webpage to some other web resource, I include a hyperlink (which might be displayed as a title, rather than the raw URL, and so occluded on the screen). If I want to link a print-out of my blog/post/webpage to some other web resource, I can include a hyperlink, being careful to display the actual URL, or I can present a QR code. Or both. Tony Hirst also made the point about CSS for printing in a follow-up post to his original.

“I see QR codes as an interim technology, but a potentially useful one, which bridges the gap between paper-based and digital information.”

Some context: QR codes have been around for a while, and are well established in some industrial contexts. However, the aspect of their usage (or of their potential usage) which is of interest to Mia, Tony and Andy, as well as Lawrie, Jon and Mike, who all commented on my previous post, stems from the possibility of wide-spread use by consumers with mobile devices, typically phones.

It seems to me there are two, different, aspects to this:

  1. giving users an easy way of jumping to a virtual resource while they are not immersed in a virtual context
  2. connecting the physical and the virtual worlds

QR codes seem to satisfy the former to some extent. In a comment, Jon mentioned that:

City AM (a free London daily business newspaper) use QR codes printed on the frontpage to drive visitors to their mobile site. It’s a simple idea that does actually work really well. There is clearly great potential for this in any number of marketing/promo activities.

Leaving aside the clunkiness of the iPhone as a QR-reading client device, the user is still required to actively scan the barcode with a handheld camera/scanner. The user must know that they want to, in this case, ‘go’ to the website. I find it hard to imagine that this will ever drive huge volumes of users to such webpages, but I guess that isn’t the point – it’s just ink after all and costs almost nothing…. there’s nothing to lose for the publisher. And even on an iPhone, scanning a QR code to input a URL is still probably quicker and more convenient than typing in a URL read off a piece of paper. As a way of encoding a URL in a machine scannable and readable way on paper, QR codes have the virtue of relative simplicity, very low cost, and a growing capacity among users to exploit them. And having thought more about Tony’s idea for QR codes in the margins of learning materials linking to video clips which supplement the content, I’m now persuaded that this could be worth doing.

In my previous post, I mentioned in passing how it might be interesting to use QR codes in a museum context:

Imagine walking around a museum – scan a QR code attached to an exhibit, load the URL and get a commentary played on the iPhone without needing to supply/hire those dedicated units some institutions supply to visitors.

Now imagine that, rather than having to scan a QR code, my phone automatically knows that it is near a particular exhibit. When I enter the physical space of the museum, I load the virtual space into the browser on my phone. As I stand in front of the physical exhibit, my device orientates me in the virtual space as well. There are existing technologies which might help get us to this point, such as RFID & GPS. Imagine linking this with something like Graffitio for the iPhone….

Lawrie picked up on my point about ‘bridging the gap’, and Mike said:

It seems to me that the ways in which we begin to bridge the gap between virtual and real is something that is pretty permanent…

I agree with Mike’s sentiment absolutely. How virtual and physical worlds interact, and how technology and people mediate between these ‘places’ is already becoming fascinating. However, I think the fact that we’ve all jumped on the ‘bridge’ metaphor is revealing. A bridge is a narrow, limiting connection between two larger places. The bridge represented by QR codes is, furthermore, one way. We need bi-directional connections…. but that’s another blog post.

In the meantime, and coming down to earth, it’s difficult to see how information from QR codes can ever be pushed to the user. And that, I think, is why in many contexts it can only be an interim solution.

“Any any any old data”

Tuesday, October 7th, 2008

Over on ZDNet, Paul Miller has blogged some thoughts about what he calls the ‘Data Cloud’. He points out that in the evolution of the ‘cloud computing’ paradigm, the:

…emphasis for much of this wider discussion remains firmly rooted in the realm of computation and storage. On many levels it’s about offloading the costs of scaling and maintaining local infrastructure, and ‘data’ doesn’t really enter the conversation at all. Something is ‘stored,’ but it’s a nameless, faceless, shapeless something that merely exists in order to be stored or computed upon.

Initially, Paul posted the germ of this idea to Twitter, where I responded with a degree of scepticism. Having given it a little thought, I remain sceptical. However, I have realised that my own, internal, ideas of what the ‘Cloud’ entails have informed my scepticism, so I figure it might be worthwhile externalising these ideas. (Note that Paul has helpfully included in his post a variety of definitions from good sources, so I won’t revisit these here. Like such celebrated memes as ‘Web 2.0’, the meaning of ‘cloud’ in this context is delineated by broad consensus, rather than strict definition. Also, I suggest that the cloud is highly connotative – depending on the exact context within which it is used it can imply much.)

theCloud.png

The word itself must surely have come from all those network diagrams which included a cloud to denote the ‘great outdoors’ – i.e. the stuff beyond the local area network. (I actually remember seeing such a diagram years ago with “here be dragons” written inside the cloud).

Anyway, for what it’s worth, here are some of the characteristics which I think are important, and why I disagree (perhaps not very strongly) with Paul:

Remotely hosted:

In a literal, basic sense, if services or data are in the cloud, then they are hosted remotely, on someone else’s infrastructure. The immediate implication might be that the user also doesn’t particularly care, or even know about the details of this arrangement. At one level, this is nothing new – and if the data cloud is just meant to signify data out there, then OK – but this notion is almost as old as computer networking itself, and was certainly present at the birth of the Web.

However, the reason that the cloud meme has gained such traction over the last two years lies in the new possibilities for moving not just data, but applications, services and even infrastructure onto remote servers. Closely aligned with the Cloud in this context is Software as a Service (SaaS), which in contemporary terms means the delivery of application-specific functionality from a remote source, typically to a modern browser.

Ubiquitous:

If it’s in the Cloud, then it is available anywhere. There are many examples of where this statement could be challenged but there is, nonetheless, an expectation that if an application is delivered to me from the Cloud then I ought to be able to access and use it from any connected device with the requisite software. There is a weaker assumption that the requisite software might be simply a modern web browser.

Commodified:

One of the really interesting developments of recent years has been the introduction of infrastructure services to the Cloud. This moves an important aspect of computing services closer to the ‘utility’ model. I know which company ‘supplies’ my electricity because they take large amounts of money off me and regularly send me ‘advice’ on how to reduce my bill (in case you’re wondering, the best advice is to “switch off things which are powered by electricity when you’re not using them”). However, I don’t know where that electricity is being generated, and frankly, one lot of electricity is much like another, regardless of who supplies it (in the UK at least!). So, I suggest that commodification works best where the commodity is undifferentiated. The history of computing is filled with examples of evolution towards undifferentiated supply of functionality – abstraction is the method used to achieve this. For example, if I want to run Linux on my servers, then I can use a variety of hardware, without much having to worry about this. If I pay someone else to provide me with Linux servers in the Cloud (this blog is running on one such), then I can get away with not even knowing the specifics of the hardware which hosts my system. To an extent, in trusting your infrastructure to a third party, you are saying “I trust you, look after this lot for me please and don’t bother me with the details”.

In fact, we have now reached the point, with services such as Amazon’s EC2 service, where we can say, “I’d like some computing power please – any old cycles will do”.

And right here is why I think I disagree with Paul. If you believe, as I do, that the Cloud implies a move towards undifferentiated, commodified hardware and services, then I don’t see how to include data, at least most data. How often do you hear a user say, “I’d like some data please – any old data will do”? The value of data is often measured in terms of scarcity, provenance, authority, quality. When Paul describes data as a:

nameless, faceless, shapeless something that merely exists in order to be stored or computed upon.

I think he’s right – this is how data is represented in the Cloud. Where we differ, I guess, is that I think that this is a reasonable and useful way for the Cloud to treat data – it allows the Cloud to become ubiquitous and undifferentiated, freeing up our time to concentrate on what we really care about – our data.

I’ll end with a song……Any old iron, any old iron, Any any any old iron….