Together with Sheridan Brown, I have been tasked with developing some guidelines and a metadata ‘application profile’ for institutional repositories (IRs) in the UK. We are calling this work RIOXX. This post focusses on the application profile more than the guidelines, and describes phase 1 of the project, which aims to deploy this application profile across IRs in the UK by the first quarter of 2013.

Objectives:
- to develop an application profile which enables open access repositories to expose metadata more consistently and which, in particular, conveys information about how the item being described in the metadata was funded
- to develop general guidelines for repositories which support the use of the application profile
- to support such technical development as is necessary to implement these recommendations and the application profile in common repository platforms
- to develop these such that they pave the way for a likely CERIF-based solution in the medium to long term
I have been asked to provide a position paper for next week's Future of Interoperability Standards meeting hosted by CETIS. This blog post is one I have been meaning to write for ages so I'm offering it as a position paper of sorts. UKOLN has been charged by JISC with the task of supporting the development of Dublin Core Application Profiles (DCAPs) in a number of areas. While I have not (so far) had much direct involvement in this work I have developed, over the last year or so, a real interest in the process of developing them.
During an interesting session called the 'Great Global Graph' at the CETIS conference this week I formed the opinion that, in the recent rush of enthusiasm for 'linked data', three 'memes' were being conflated. These next three bullets outline my understanding of how these terms have been used in recent discussions, including the CETIS session: Open data: I see this as something expressed as a philosophy or, in more concrete terms, as a policy, such as that espoused by the UK Government.
Over the years I've found the 'Semantic Web' to be an interesting though, at times, faintly worrying concept. It has never much impacted on my work directly, despite my having been embroiled in Web development since, well, pretty much when Web development began. Of late I've tried to follow the earnest discussions about how the Semantic Web went all wrong because it was hijacked by the AI enthusiasts, and how it is going to be alright now because a more pragmatic paradigm has gained the upper hand, that of Linked Data.
In his Science in the Open blog Cameron Neylon has written an interesting post, A Specialist OpenID Service to Provide Unique Researcher IDs? in which he asks: "Good citation practice lies at the core of good science. The value of research data is not so much in the data itself but its context, its connection with other data and ideas. How then is it that we have no way of citing a person?"
I just got around to reading the press release issued after the collapse of Europeana (previously the more easily pronounced 'European Digital Library') following its launch a couple of weeks ago. If you go to the site now, you are greeted with the following message: "The Europeana site is temporarily not accessible due to overwhelming interest after its launch (10 million hits per hour). We are doing our utmost to reopen Europeana in a more robust version as soon as possible."
Just a quick pointer to the really encouraging announcement from the COPAC development blog that individual COPAC records are now addressable at a persistent, RESTful(ish) URL. The example given is: ...the work "China tide : the revealing story of the Hong Kong exodus to Canada" has a Copac Record Number of 72008715609 and can be linked to with the URL http://copac.ac.uk/crn/72008715609 The records are marked up as MODS XML - but this is of secondary importance to me compared to the fact that the records are easily and reliably addressed.
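The pattern is simple enough that the persistent URL can be derived directly from the Copac Record Number. A minimal Python sketch of this (the function name is my own; the CRN is the one from the announcement):

```python
COPAC_BASE = "http://copac.ac.uk/crn/"

def copac_record_url(crn: str) -> str:
    """Build the persistent URL for a given Copac Record Number (CRN)."""
    return COPAC_BASE + str(crn)

# The example record from the announcement:
print(copac_record_url("72008715609"))
# http://copac.ac.uk/crn/72008715609
```

The point, of course, is not the trivial string concatenation but that a stable identifier maps predictably to a stable URL, which is exactly what makes the records easy to cite and link.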
I haven't minted a TLA for ages - I think I might be the first to come up with PPP for Personal Profile Portability as a convenient handle to wrap around the current flavour of 'data portability' being touted by the major 'walled-garden' social network sites. Both MySpace and Facebook have recently launched initiatives to open up a little... but not too much. MySpace has announced its Data Availability project with some major partner applications.
Here's an interesting approach. Bernhard Haslhofer at Media Spaces has developed OAI2LOD Server, a system which harvests metadata with OAI-PMH, processes the records to create a triple store and exposes interfaces to this for linked-data clients, SPARQL clients and web-browsers. According to the web-page: The OAI2LOD Server exposes any OAI-PMH compliant metadata repository according to the Linked Data guidelines. This makes things and media objects accessible via HTTP URIs and queryable via the SPARQL protocol.
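What "accessible via HTTP URIs" means in practice is that a Linked Data client dereferences an item's URI with content negotiation, asking for RDF where a browser would get HTML. A sketch of that request in Python (the host and path below are my own invention, not taken from the OAI2LOD documentation):

```python
import urllib.request

# Hypothetical item URI of the kind an OAI2LOD server might mint for a
# harvested OAI-PMH record (host and path are assumptions for illustration).
item_uri = "http://lod.example.org/record/oai:example.org:123"

# A Linked Data client asks for a machine-readable representation via
# content negotiation; a web browser would receive HTML for the same URI.
req = urllib.request.Request(item_uri, headers={"Accept": "application/rdf+xml"})

# Dereferencing the URI would return RDF triples (not executed here):
# with urllib.request.urlopen(req) as resp:
#     rdf_xml = resp.read()

print(req.get_header("Accept"))  # application/rdf+xml
```

The same URI serving different representations to different clients is the crux of the Linked Data guidelines the OAI2LOD web page refers to.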
Back in February I was asked to give a talk to the JISC Digitisation Programme meeting. I blogged about this shortly beforehand asking for comments and suggestions. The response was fantastic - I received a bunch of great suggestions and incorporated many of them into the presentation. Everyone who commented got a public 'thankyou' at the event, and I included all names in the slides I used. I have finally gotten around to making the slides available (someone who was at the meeting has asked for them, so they made some sort of impression on someone!)
For some time now I have occasionally advised people involved in repository administration that they should consider registering the Base URL of their OAI-PMH interface (if they have one) with Google as a proxy for a Sitemap. Until recently, Google supported the use of OAI-PMH Base URLs in its Webmaster Tools, which site owners can use to create and register sitemaps in order to give Google's web-crawler hints about the structure of the website.
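The mechanism behind this is straightforward: given a repository's Base URL, a harvester (Google's crawler included) discovers records by issuing standard OAI-PMH requests such as ListIdentifiers. A small Python sketch of how such a request URL is built (the EPrints-style base URL below is a made-up example):

```python
from urllib.parse import urlencode

def list_identifiers_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build an OAI-PMH ListIdentifiers request against a repository's Base URL."""
    if resumption_token:
        # Per the OAI-PMH spec, resumptionToken is an exclusive argument:
        # it replaces the other request parameters when paging through results.
        params = {"verb": "ListIdentifiers", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListIdentifiers", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)

# A hypothetical repository Base URL:
print(list_identifiers_url("http://eprints.example.ac.uk/cgi/oai2"))
# http://eprints.example.ac.uk/cgi/oai2?verb=ListIdentifiers&metadataPrefix=oai_dc
```

Because every compliant repository answers these same requests, a single registered Base URL can stand in for a full sitemap.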
I was pleased to be invited by Brian Fuchs to a 'Million Books Workshop' at Imperial College, London last Friday. A fascinating day, in the company of what was, for me, an unusual group of 20-30 linguists, classical scholars and computer scientists. The morning session consisted of three presentations (following an introduction from Gregory Crane which I missed thanks to the increasingly awful transport system between London and the South West) which brought us up to speed with some advances in OCR, computer aided text analysis and translation, and classification.
I have been invited to give a short presentation to the JISC Digitisation Programme on Friday, giving an overview of different ways of exposing content and metadata. I'll be talking to projects which are concerned with Cultural Heritage content which is being surfaced in websites to support eLearning. Formats vary tremendously. This is the complete list:
- 18th century parliamentary papers
- 19th century pamphlets online
- A digital library of core e-resources on Ireland
- Archival sound recordings 2
- British Cartoon Archive digitisation project
- British Governance in the 20th century: Cabinet papers, 1914-1975
- British Library 19th century newspapers
- British Library archival sound recordings project
- British newspapers 1620-1900
- Electronic ephemera: Digitised selections from the John Johnson collection
- First World War poetry digital archive
- Independent Radio News Archive digitisation
- InView: Moving images in the public sphere
- Medical journal backfiles
- Modern Welsh journals online
- NewsFilm online
- Online historical population reports
- Portsmouth University: Historic boundaries of Britain
- Pre-Raphaelite resource site
- Scott Polar Research Institute: Freeze Frame – Historic polar images
- The East London theatre archive
- UK theses digitisation project

Aside from the obvious stuff like OAI-PMH, Google, RSS, what should I be talking about?
RLG Programs has conducted a survey of partner institutions which have “multiple metadata creation centers” to: ...gain a baseline understanding of current descriptive metadata practices and dependencies, the first project in our program to change metadata creation processes. Some intriguing statements in this summary post (I look forward to getting hold of the report when it's completed). For example: 76 listed the tools they used to create metadata.
An interesting post from Philipp Keller on Tag history and gartners hype cycles from back in May of this year which I missed first time around. Now part of me thinks it must be possible to plot just about anything on the Gartner Hype Cycle, but it can be a useful tool for provoking reflection and discussion. Note how Philipp indicates that we now find ourselves in the Trough of Disillusionment in 2007.
Good to see that 37signals have adopted (optional) OpenID support for Basecamp. I was already using Basecamp with the older, local user account credentials, but the system allowed me to swap to using my OpenID very easily. Good work, as ever, by 37signals. However, I can't help thinking that this is only half the story. The 'ID' in OpenID is, at one level, about identity as in 'Identity Card'. But my OpenID is also an identifier, which is fundamental in the context of the web.
There follows a pretty self-indulgent exercise in mixing metaphors and stretching them beyond breaking point. I'm sure I'm not the first person who has speculated about similarities between Web 2.0 and Punk. I've mused over this with Liz and Brian at UKOLN for example. Brian and I have pushed the 'music-genre-as-analogy-for-what-we-do' a fair bit (Brian even blogged about being a Dedicated Follower of Fashion, getting cited in the Wikipedia entry for this song title in the process!)
An interesting post by Mike Neuenschwander on the Burton Group Identity Blog. I'm not certain I agree entirely with the main thrust of Mike's argument, which he offers as an axiom: There are no identifiers, only attributes That is to say, things are identified by their existence as a collection of attributes in a given context. Some of Mike's claims, such as "most people have [...] several dozen nicknames" seem a little exaggerated.
Well, I finally got around to sorting out my own OpenID (paulwalk.net), following the excellent instructions provided by Simon Willison. As I find myself signing up for more and more remote services, nearly all of which ask me to create yet another user account, the potential value of a user-controlled, decentralised identity system becomes clearly apparent. Like many others, I have been interested by Yahoo Pipes, enough to create a Yahoo account for the purpose of trying it out.
Scott Golder and Bernardo A. Huberman of HP Labs have written a paper called The Structure of Collaborative Tagging Systems based on research done primarily with data from del.icio.us. They identify seven kinds of tags: Identifying What (or Who) it is About. Overwhelmingly, tags identify the topics of bookmarked items. These items include common nouns of many levels of specificity, as well as many proper nouns, in the case of content discussing people or organizations.