Archive for the ‘Web Infrastructure’ Category

“Any any any old data”

Tuesday, October 7th, 2008

Over on ZDNet, Paul Miller has blogged some thoughts about what he calls the ‘Data Cloud’. He points out that in the evolution of the ‘cloud computing’ paradigm, the:

…emphasis for much of this wider discussion remains firmly rooted in the realm of computation and storage. On many levels it’s about offloading the costs of scaling and maintaining local infrastructure, and ‘data’ doesn’t really enter the conversation at all. Something is ‘stored,’ but it’s a nameless, faceless, shapeless something that merely exists in order to be stored or computed upon.

Initially, Paul posted the germ of this idea to Twitter, where I responded with a degree of scepticism. Having given it a little thought, I remain sceptical. However, I have realised that my own, internal, ideas of what the ‘Cloud’ entails have informed my scepticism, so I figure it might be worthwhile externalising these ideas. (Note that Paul has helpfully included in his post a variety of definitions from good sources, so I won’t revisit these here. Like such celebrated memes as ‘Web 2.0’, the meaning of ‘cloud’ in this context is delineated by broad consensus, rather than strict definition. Also, I suggest that the cloud is highly connotative - depending on the exact context within which it is used it can imply much.)

theCloud.png

The word itself must surely have come from all those network diagrams which included a cloud to denote the ‘great outdoors’ - i.e. the stuff beyond the local area network. (I actually remember seeing such a diagram years ago with “here be dragons” written inside the cloud).

Anyway, for what it’s worth, here are some of the characteristics which I think are important, and why I disagree (perhaps not very strongly) with Paul:

Remotely hosted:

In a literal, basic sense, if services or data are in the cloud, then they are hosted remotely, on someone else’s infrastructure. The immediate implication might be that the user also doesn’t particularly care, or even know about the details of this arrangement. At one level, this is nothing new - and if the data cloud is just meant to signify data out there, then OK - but this notion is almost as old as computer networking itself, and was certainly present at the birth of the Web.

However, the reason that the cloud meme has gained such traction over the last two years lies in the new possibilities for moving not just data, but applications, services and even infrastructure onto remote servers. Closely aligned with the Cloud in this context is Software as a Service (SaaS), which in contemporary terms means the delivery of application-specific functionality from a remote source, typically to a modern browser.

Ubiquitous:

If it’s in the Cloud, then it is available anywhere. There are many examples of where this statement could be challenged but there is, nonetheless, an expectation that if an application is delivered to me from the Cloud then I ought to be able to access and use it from any connected device with the requisite software. There is a weaker assumption that the requisite software might be simply a modern web browser.

Commodified:

One of the really interesting developments of recent years has been the introduction of infrastructure services to the Cloud. This moves an important aspect of computing services closer to the ‘utility’ model. I know which company ‘supplies’ my electricity because they take large amounts of money off me and regularly send me ‘advice’ on how to reduce my bill (in case you’re wondering, the best advice is to “switch off things which are powered by electricity when you’re not using them”). However, I don’t know where that electricity is being generated, and frankly, one lot of electricity is much like another, regardless of who supplies it (in the UK at least!). So, I suggest that commodification works best where the commodity is undifferentiated. The history of computing is filled with examples of evolution towards undifferentiated supply of functionality - abstraction is the method used to achieve this. For example, if I want to run Linux on my servers, then I can use a variety of hardware without having to worry much about it. If I pay someone else to provide me with Linux servers in the Cloud (this blog is running on one such), then I can get away with not even knowing the specifics of the hardware which hosts my system. To an extent, in trusting your infrastructure to a third party, you are saying “I trust you, look after this lot for me please and don’t bother me with the details”.

In fact, we have now reached the point, with services such as Amazon’s EC2 service, where we can say, “I’d like some computing power please - any old cycles will do”.

And right here is why I think I disagree with Paul. If you believe, as I do, that the Cloud implies a move towards undifferentiated, commodified hardware and services, then I don’t see how to include data, at least most data. How often do you hear a user say, “I’d like some data please - any old data will do”? The value of data is often measured in terms of scarcity, provenance, authority, quality. When Paul describes data as a:

nameless, faceless, shapeless something that merely exists in order to be stored or computed upon.

I think he’s right - this is how data is represented in the Cloud. Where we differ, I guess, is that I think that this is a reasonable and useful way for the Cloud to treat data - it allows the Cloud to become ubiquitous and undifferentiated, freeing up our time to concentrate on what we really care about - our data.

I’ll end with a song……Any old iron, any old iron, Any any any old iron….

Blog commons?

Friday, July 25th, 2008

You may have noticed that I have included a statement on this blog’s ‘home-page’ to the effect that:

This work is licensed under a Creative Commons Attribution 2.0 UK: England & Wales License.

This is standard blurb from the Creative Commons (CC) site. In the context of my blog this means - well, what exactly? Feel free to use anything you find here, for whatever purpose you like, so long as you credit me? What about material I include from elsewhere? What about other people’s comments on my posts? It seems to me that this just isn’t clear enough….

And another problem - I don’t necessarily want to apply the same license, indiscriminately, to all of my posts. I probably want credit/attribution for anything I write here, true, but I might feel differently about commercial re-use of the contents of different posts (although I’m probably deluding myself if I think that my blog has potential for commercial exploitation!).

In point of fact, I actually changed the license on my blog a while ago, to remove the non-commercial use clause from my Creative Commons 2.0 license. I guess this is pretty poor practice as it has, by implication, retrospectively changed the license I applied to past entries. So far, no one has complained…. ;-)

Would it be better practice to attach a license to the text of each post, rather than to the blog as a whole? Is the ‘post’ closer to being a ‘work’ in CC terms? Even better, should I embed the license as a footnote to the content itself? Currently, my CC license declaration is simply an artefact of the user interface I host at http://blog.paulwalk.net/index.php - it doesn’t even appear in the RSS feed. If I licensed each post, rather than the blog as a whole, I could be selective about licensing content (perhaps maintaining a sensible default to avoid unnecessary work). And I could move to a different license later without feeling vaguely guilty. I guess I could include a statement making clear to people who want to post comments on my blog just how their comments are going to be licensed. Or even allow them to select a license themselves….!

It occurred to me that someone might have developed a ‘Creative Commons License plugin’ for Wordpress, the blog engine used to manage this blog. In fact, I found two very easily: WpLicense and the Creative Commons Configurator. However, both of them apply the CC license in a system-wide manner, rather than to each individual post. This is an improvement over my current practice, as the license will show up in the blog’s public RSS feeds for example, but it’s not really what I have in mind. I’m pretty sure I could insert license statements in the necessary templates if it came to it, and maybe code up a plugin to allow me to select from a menu of licenses. However, it occurs to me that I don’t particularly want to use Wordpress as the ‘author’ tool (currently I use Ecto).

Whatever. I can’t help thinking that attaching a license to a blog feels a little like licensing the deployment of a content management system, rather than the content itself. Anyone care to comment?

Repository architecture #83

Monday, July 7th, 2008

At a JISC workshop last Thursday I was invited to present some ideas around an architecture to support and exploit repositories in the UK. I gave the presentation the title Repository Architecture #83 . ;-)

My intention was to suggest some starting principles and then explore how they held up in the face of real-world issues. Here is the slide where I outlined these principles:

presentation.004.png

I also asked the question: “do we actually need a new architecture?” - suggesting that there is already a ubiquitous & successful architecture supporting much/most/(all?) of the functionality we want from repositories. Taking a resource oriented approach also seems to offer all kinds of advantages. Applying this approach is certainly not a new idea - others have been here before. However, I suggest that the resource oriented approach and the service oriented approach can be most effective when used to complement each other. I think that there is still a place for the institutional repository as the collection of systems which surround what I call the source repository. I define the ‘source repository’ as an (ideally) quite simple system which contains:

  • the resources themselves, individually addressed with HTTP URIs
  • simple, item-level metadata records
  • site-map(s) to aid remote search engines
  • public, HTTP interfaces
  • feeds to notify remote agents of the deposit of new resources in the repository (RSS and/or Atom)
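
As a rough illustration of the last item in that list, here is a minimal sketch of how a source repository might announce new deposits in an Atom feed. The function name, the shape of the deposit records and the example.org addresses are all assumptions made for this sketch, and a production feed would need more (an author element, for instance):

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def build_deposit_feed(feed_id, title, updated, deposits):
    """Return an Atom feed (as a string) announcing newly deposited resources.

    `deposits` is a list of dicts with 'uri', 'title' and 'deposited' keys;
    this record shape is an assumption made for the sketch.
    """
    # Serialise Atom as the default namespace rather than with an ns0: prefix.
    ET.register_namespace("", ATOM_NS)
    feed = ET.Element(f"{{{ATOM_NS}}}feed")
    ET.SubElement(feed, f"{{{ATOM_NS}}}id").text = feed_id
    ET.SubElement(feed, f"{{{ATOM_NS}}}title").text = title
    ET.SubElement(feed, f"{{{ATOM_NS}}}updated").text = updated
    for deposit in deposits:
        entry = ET.SubElement(feed, f"{{{ATOM_NS}}}entry")
        # The resource's own HTTP URI doubles as the entry identifier.
        ET.SubElement(entry, f"{{{ATOM_NS}}}id").text = deposit["uri"]
        ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = deposit["title"]
        ET.SubElement(entry, f"{{{ATOM_NS}}}updated").text = deposit["deposited"]
        ET.SubElement(entry, f"{{{ATOM_NS}}}link", href=deposit["uri"])
    return ET.tostring(feed, encoding="unicode")
```

The point of the sketch is that each entry is little more than the resource's URI plus a timestamp, which is about as simple as a public-facing interface can get.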

An ‘institutional’ or ‘subject’ or ‘learning object’ repository contains one or more source repositories plus any systems needed to manage it in its particular context. These larger repositories might be very complex: the important point is that the logical component I call the source repository should be as simple as possible in its public-facing interface: basically a bunch of resources, with an address space. So, a resource is given a Cool URI, and a (probably) simple metadata record is made available, also as a resource with a URI. I suggested that an ORE resource map could be used to relate metadata record to resource - from the point of view of the web or ORE, a metadata record is a resource just like, for example, a PDF of a scholarly paper. Elsewhere, more and richer metadata might be created through mechanisms ranging from automatic metadata creation, to further human effort which might be in the nature of traditional cataloguing by trained and motivated individuals, or ‘crowd-sourced’ tagging by untrained but still motivated people.

Complexity is introduced, where necessary, in services developed to manage and exploit resources held in source repositories. Crucially, such activity does not happen unless there is a clear incentive for it, and then it happens close to the point of incentive. As an example, if a particular domain has a strong need to classify papers then someone might go to the trouble of harvesting, aggregating and text-mining the text of these papers with a view to extracting terms to use for classification. Or something similar might be achieved through the application of a team of professional cataloguers using an agreed vocabulary. However it is done, the new metadata thus created could be made available as a web resource where it could be used and combined with other resources as required.

I was asked to illustrate this with a few diagrams which provoked a fair amount of discussion.

deposit.png discovery.png

The point was made, strongly, that it is subject repositories which have the content, rather than institutional repositories. Regardless of whether this is, or will continue to be true, I think the architectural principles hold up. The business drivers are, I guess, quite different!

I learned a lot from the workshop and had some of these ideas challenged quite robustly. I think they held up but the clarity of presentation could be improved - this is what I will be working on now.

The opportunistic developer is allergic to soap

Monday, June 9th, 2008

For some time now I’ve been thinking about what I see as the ascendancy of the opportunistic developer in web application development. The phrase has unfortunate connotations for those who remember the ‘personas’ meme from some years ago, when it was revealed that Microsoft had characterised three types of developer for three of its software development products. This post is not directly related to these archetypes (the opportunistic developer was called ‘Mort’ in the meme, a name which has become derogatory). Rather, I’m talking about the developer who, regardless of their ability or their occupation, wants to make quick use of something when they discover it, typically on the web.

The opportunistic developer prefers to use someone else’s service/component in the majority of cases. They will create their own software when necessary, and will choose to do so under certain circumstances, but they will accommodate a certain amount of compromise if it means they can get away with using something off-the-shelf. The opportunistic developer is still a developer, as opposed to a power user: they will still write code, just as little as they can get away with.

The proliferation of freely available web-services with simple APIs has created a happy-hunting-ground for the opportunistic developer - a few years ago they were inhibited by a lack of choice of available services to use. Alongside the usual concerns (stability, provenance, price…), ease of use is becoming a more important differentiator.

In the JISC Information Environment, the norm has been to develop SOAP interfaces to services, almost by default. There are, no doubt, reasons why this has made sense in the past. However, if there is one thing which became abundantly clear at last week’s IE Demonstrator/CRIG event, it is that institutional repository developers do not want to have to use SOAP interfaces. Aside from the hard-core which is interested in pushing REST as the approach to use in repository-service interactions, the consensus was that the use of SOAP for public service interfaces, rather than being an enabling mechanism, is actually a barrier to adoption.

Whether RESTful or not, services are going to have to start having very good reasons for not offering very simple APIs over HTTP, if they are to attract the opportunistic developer.
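
To make the difference in effort concrete, here is a small sketch which builds the same lookup request twice: once as a plain HTTP GET URL, and once as the XML envelope a SOAP 1.1 call would need to POST. The endpoint, the operation name and the parameters are all hypothetical, and the SOAP body is simplified (a real service would namespace the operation element and publish a WSDL):

```python
from urllib.parse import urlencode
from xml.sax.saxutils import escape

# Hypothetical endpoint, for illustration only.
BASE_URL = "http://service.example.org/lookup"


def simple_request_url(base_url, **params):
    """Build the URL for a plain HTTP GET: the whole 'API call' is one string."""
    return f"{base_url}?{urlencode(sorted(params.items()))}"


def soap_request_body(operation, **params):
    """Build the XML envelope an equivalent SOAP 1.1 call would have to POST."""
    args = "".join(
        f"<{name}>{escape(str(value))}</{name}>"
        for name, value in sorted(params.items())
    )
    return (
        '<?xml version="1.0"?>'
        '<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">'
        f"<soap:Body><{operation}>{args}</{operation}></soap:Body>"
        "</soap:Envelope>"
    )


if __name__ == "__main__":
    print(simple_request_url(BASE_URL, title="open access", format="json"))
    print(soap_request_body("lookup", title="open access", format="json"))
```

The first form can be pasted into a browser address bar or a one-line script, which is exactly the kind of quick use the opportunistic developer is after; the second usually implies a toolkit before anything can be tried at all.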

Personal profile portability

Sunday, May 18th, 2008

I haven’t minted a TLA for ages - I think I might be the first to come up with PPP for Personal Profile Portability as a convenient handle to wrap around the current flavour of ‘data portability’ being touted by the major ‘walled-garden’ social network sites.

Both MySpace and Facebook have recently launched initiatives to open up a little….but not too much.

MySpace has announced its Data Availability project with some major partner applications. Essentially, this will encourage the user to manage ‘profile’ information on MySpace, with a view to surfacing this information in other, partner applications (initially Yahoo, eBay, Photobucket and Twitter). It will also allow users to share some data such as photos which they have added to the MySpace site. Facebook has a similar initiative called Facebook Connect, initially in partnership with Digg. In both cases, a set of usage policies will be imposed such that the user retains control over what is shared, with the power to revoke the sharing agreement. I’m really encouraged to note that in the case of MySpace’s Data Availability, the mechanism adopted to solve the inter-authentication/authorisation issues between these systems is an implementation of OAuth.

Amit Kapur (MySpace’s Chief Operating Officer) says that Data Availability is:

“…founded first and foremost on allowing users to have comprehensive control over their content and data.”

Dave Morin of Facebook believes that:

“…the next evolution of data portability is [...] about giving users the ability to take their identity and friends with them around the Web, while being able to trust that their information is always up to date and always protected by their privacy settings.”

The extent to which users ‘have control’ over their content and data even while it has been completely locked up within the MySpace and Facebook applications has been argued about extensively. The relationships between these sites, their users, and their users’ data have evolved over the last year or two, as users have become a little more savvy. Pressure from groups such as DataPortability appears to have had an effect, with MySpace also signing up to this recently.

So, it seems as though the walled gardens are opening up, getting ready to participate in the wider web. Or are they?

In a web of distributed social networks, the most likely way in which users might manage their participation would seem (right now) to be through a single entry point. Essentially, if the web of social networks is going to allow ‘single-sign-on’ for the user, and allow a re-use of profile information, and even content, across multiple applications, then one model is to give the user a ‘gateway’ service, where they sign-on and manage their ‘account’. Both Facebook and MySpace are going to battle hard to be that gateway service for the masses. Both have accepted that they can no longer remain as a completely walled garden - they must open up, just a little, to avoid being eventually marginalised. But now that they are not totally closed, they may find it difficult to retain control. They may find others are waiting to seize the initiative. Enter Google, and its Friend Connect service.

Friend Connect is different to the previous initiatives from Facebook and MySpace. Google’s new offering is designed to provide a ‘middleware’ service, sitting between the big social networks and sundry web applications which might want to exploit the new openings in these services. It also utilises components which have been developed with the OpenSocial API. Friend Connect is, I think, a very significant development, because it shows how more distributed social networks might work. It is significant also in a particular detail - notice how Friend Connect can become a social network of sorts simply by integrating existing social networks. Suddenly, the huge headstart enjoyed by Facebook and MySpace doesn’t look so unassailable. This is, presumably, the real reason why Facebook have taken steps to block Friend Connect.

I suggest that because they have been walled gardens for so long, neither Facebook nor MySpace really know how to succeed as middleware. They have always been the destination - never really a component in someone’s workflow. By contrast, Google has always offered services which the user employs en route to a different destination. Google understands this kind of arrangement fundamentally. Expect to see increasingly desperate measures from MySpace and Facebook to retain control while Google quietly grows its Friend Connect service.

Making digitised content available for searching and harvesting(2)

Monday, April 28th, 2008

Back in February I was asked to give a talk to the JISC Digitisation Programme meeting. I blogged about this shortly beforehand asking for comments and suggestions. The response was fantastic - I received a bunch of great suggestions and incorporated many of them into the presentation. Everyone who commented got a public ‘thankyou’ at the event, and I included all names in the slides I used.

I have finally gotten around to making the slides available (someone who was at the meeting has asked for them so they made some sort of impression with someone!).

Thanks again.

Google gives up on supporting OAI-PMH for Sitemaps

Wednesday, April 23rd, 2008

For some time now I have occasionally advised people involved in repository administration that they should consider registering the Base URL of their OAI-PMH interface (if they have one) with Google as a proxy for a Sitemap. Until recently, Google has supported the use of OAI-PMH Base URLs in its Webmaster Tools which site owners can use to create and register sitemaps in order to give hints about the structure of the website to Google’s web-crawler.

A while ago, I noticed that there was no longer any reference to this particular support in any of the documentation and began to suspect that this was being deprecated. Today, Google announced via their official blog that:

…we’ve found that the information we gain from our support of OAI-PMH is disproportional to the amount of resources required to support it. Fewer than 200 sites are using OAI-PMH for Google Sitemaps at the moment.

In order to move forward with even better coverage of your websites, we have decided to support only the standard XML Sitemap format by May 2008. We are in the process of notifying sites using OAI-PMH to alert them of the change.

Fewer than 200 sites…..

There are a few ways of looking at this. Perhaps ‘open access’ repositories are less concerned with Google rankings than the typical website owner. Perhaps the penetration of OAI-PMH in the world is still below any level that Google could find particularly interesting - certainly they never went to great lengths to advertise this support while it lasted. Clearly, Google have come to the end of a ‘trial period’ for their support for this protocol in their main indexing service.

Can we conclude anything from this? Probably not - surely OAI-PMH can thrive without Google Sitemap support? It certainly plays a fairly significant part in my professional life at present! Or should we view this as a symptom of decline….?
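
For a repository which had been relying on its OAI-PMH Base URL, one possible way of producing a standard XML Sitemap instead is to convert the output of a ListRecords request. The sketch below is only that, a sketch: the function name is invented, and it assumes oai_dc records in which the item's web address appears as a dc:identifier beginning with "http", which is common but by no means guaranteed:

```python
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def oai_records_to_sitemap(list_records_xml):
    """Convert an OAI-PMH ListRecords response (oai_dc) into a Sitemap document.

    Assumption: each record exposes the item's web address as a dc:identifier
    beginning with "http"; repositories differ, so check your own output.
    """
    root = ET.fromstring(list_records_xml)
    # Serialise the Sitemap namespace as the default, without a prefix.
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for record in root.iter(f"{{{OAI_NS}}}record"):
        datestamp = record.findtext(f"{{{OAI_NS}}}header/{{{OAI_NS}}}datestamp")
        for identifier in record.iter(f"{{{DC_NS}}}identifier"):
            value = (identifier.text or "").strip()
            if value.startswith("http"):
                url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
                ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = value
                if datestamp:
                    # The OAI-PMH datestamp serves as the last-modified hint.
                    ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = datestamp
                break  # at most one <url> per record
    return ET.tostring(urlset, encoding="unicode")
```

Records with no web-address identifier are silently skipped here; a real conversion would also need to follow resumption tokens to cover the whole repository.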

The official Google announcement is here.

Destination, or workflow component?

Saturday, April 19th, 2008

In a recent post, Facebook Or Twitter - Or Facebook And Twitter , Brian Kelly says:

…in some circles such use of Facebook is being derided with comments such as “It’s a closed garden”, “Its popularity is on the wane” or “Twitter is a better development environment” being made. I have to say that I find that such comments tend to miss the point.

Brian tackles the “popularity on the wane” comment with some web statistics, but leaves the “closed garden” and “better development environment” arguments. I’m not at all sure what the argument is about development environments, but I am very interested in the walled garden aspect - I wrote about this in July last year, and I have seen nothing since to change my mind. I’m not sure I’m deriding Facebook, but I do maintain that it is a walled garden. I still keep an account in Facebook out of interest but I rarely access it.

I attended a session on digital libraries earlier this week at the JISC conference, at which Lorcan Dempsey spoke about how where once the user built their workflow around the library, now the library must build services which fit into the user’s workflow. Facebook, it seems to me, is a destination. I go there sometimes, almost always because someone has uploaded some photos of an event I have attended. I go there for occasional amusement. According to the figures, Facebook is very successful at being a destination. But is it embedded in anyone’s workflow I wonder? Twitter is very much part of my workflow - it is the single most used application on my iPhone.

Twitter is an eminently ‘composable’ service by design, while Facebook is an attractive (for many) destination. Twitter participates in any number of mashups, and has given rise to an extraordinary range of user-interfaces. It fits into people’s workflows because they can choose how to access it. I use a combination of the mobile web interface and SMS: others use these and a variety of rich desktop interfaces.

So I think my response is still: use Twitter or Facebook, or both, or neither. But I believe that Twitter is more interesting, really because its composable nature will allow it to fit all kinds of workflows.

Your mileage may vary :-)

Get off of my cloud

Saturday, February 9th, 2008

Yesterday I left a comment on Brian Kelly’s post, Is That A Pistol In Your Pocket?, where I explained how the iPhone had changed my mind about preferring to carry several dedicated devices which inter-operate, as opposed to carrying one integrated device. At one time I was determined to pursue the former approach, making connections with Bluetooth and, later, Wi-Fi. Essentially, I expected to create a responsive peer-to-peer network of devices, what has been termed a Personal Area Network.

I’ve given up, probably temporarily, on this approach - the sheer ease-of-use of the iPhone trumps my other concerns at this stage in my career/life/biorhythms. But as we approach a world of ubiquitous, networked computing, it seems to me that a new model is emerging. Where once the personal network of peer-to-peer devices seemed an obvious approach, now we might observe that this can be unnecessary: each of our devices is going to be, if it isn’t already, capable of communicating with the global ‘interweb’ at usable speeds.

To give a concrete example: I once aspired to use my PDA (with its larger screen) to act as the pocket display device for photographs I had taken with my mobile phone. Both devices had a Bluetooth interface, so this was the channel to use. I did get this working, but it was never a convenient operation and I eventually stopped bothering.

With today’s equivalent devices, I might do something different: use the mobile phone’s internet connection to post photographs to Flickr for instance, and, on my PDA, directly download the ones I want to display there. Of course with my iPhone, I can go a little step further - I have sufficiently robust access to the web to be able to leave some resources on the web and just view them from there when I want to.

Now, there are plenty of use-cases where one might want one’s devices to inter-operate, and where the web might not provide an easier solution than a short-range, peer-to-peer approach. But some common requirements, particularly around the using and sharing of resources (photos, video, bookmark lists, contacts databases etc.) are ideally served by the web.

So, it seems that it is the area in personal area networks which is diminished in importance: the networking remains, but the very local area has been supplanted by the cloud in some respects.

What do IM and social networks have in common?

Monday, September 24th, 2007

I haven’t used a dedicated instant messaging (IM) client for many months. I do occasionally use text-chat facilities when they are built into other tools - notably Skype at the moment. Last week however, a colleague sent me their contact details on four of the available IM networks:

  • AOL/AIM
  • Yahoo
  • MSN
  • Google

Because I cannot control what my ID or ‘screen-name’ will be on each of these, I am forced to use different IDs for some. I would love to be able to use my OpenID for all these, but none of the above networks offers an OpenID consuming interface. If I were to rely on IM more than I do, then I would want to establish my ‘presence’ on each of the networks in which I have contacts or ‘buddies’. Using an aggregation client (like the excellent Adium for the Mac, or if you prefer a web-solution, Meebo) makes this just about manageable. My presence can be maintained on all four of these networks while running a single client. But the networks are not joined - buddies on one network cannot talk directly to buddies on another. They are also not interoperable (although Google do at least show willing by supporting the Jabber protocol).

I’m quickly remembering why it was that I gradually gave up on IM in the first place….

So, now, as well as buddies, I have friends, thanks to social networking systems like Facebook and Twitter. At one time I was maintaining three IM networks, with many actual contacts spread across them, often with several identities each. Now I’m doing the same for several unconnected, and mostly closed, social network systems. One popular aspect of such new systems is their support for an extended ‘status’. Where IM allows the user to indicate if they are online, ‘busy’ etc, Facebook and Twitter (among others) encourage the user to give a little more detail.

Attempts have been made to build aggregation clients such as MoodBlast which allow the user to update their status across several social-networking systems. The developers behind MoodBlast have removed support for updating Facebook however, claiming that this is motivated by a threat of legal action from Facebook.

Now, it’s certainly not unusual to maintain more than one, unconnected circle of contacts. Many people prefer to keep their professional and their social networks separate. But, and this is the important point, I really don’t want my social networks to be constrained by particular software choices. As I can connect resources across the web in a uniform way to form a network of resources, I want to be able to connect people to form my social network. Perhaps OpenID or something similar could provide the solution.
Update: Michael C. Harris says that Facebook have restored the ability for third-party apps to update a user’s status - see his comment below for a link to some details about this - thanks Michael.
