From IEEE Internet Computing
by Tom Heath • Talis Information
The Semantic Web is a global information space of linked data, designed for machine consumption rather than human use. Right? Well, yes and no. It’s true to say that machine-readable data, given explicit semantics and published online, coupled with the ability to link data in distributed data sets, are the Semantic Web’s key selling points. Together, these features allow aggregation and integration of heterogeneous data on an unprecedented scale, and machines will do the grunt work for us.
However, without a human somewhere in this process to reap the rewards of these new capabilities, the endeavor is meaningless. Far from removing humans from the equation, a Web of machine-readable data (the Semantic Web; we also call it the “Web of data”) creates significant challenges and opportunities for human-computer interaction.
To date, the Semantic Web community has mostly been busy developing the technical infrastructure to make the Web of data feasible in principle and publishing linked data sets to make it a reality. If we’re to meet the challenges and fully exploit the opportunities of a Web of data, we need to move beyond this initial phase and work to understand how it changes the Web’s user interaction paradigm.
In this column, I’ll discuss some ways in which our interaction with the Web of data might differ from how we interact with the established Web of documents, and what this might mean for both users and producers of Web content.
Semantic Web: from vision to reality
In 1999, Jakob Nielsen wrote about a looming crisis: the Web was growing at a phenomenal rate, and without closer attention to user interface principles, he predicted it would become an unusable mass of documents [1]. Almost 10 years have passed since then, and the Web is undergoing another seismic shift: the emergence of the Web of data, or Semantic Web, which has been envisioned for more than a decade and builds on many years’ work on the underlying technology. Although we might refer to them as distinct concepts, the Web of data isn’t a separate entity from the existing Web of hypertext documents; it is more akin to another layer of cloth interwoven with the Web as we know it.
Now, though, the headline statistics for the Web’s growth aren’t quoted in terms of Web pages or Web sites. Instead, people talk about numbers of triples published in the Web of data using the Resource Description Framework (RDF), and the number of links these triples create between distributed data sets.
RDF is a W3C specification for making statements about things in machine-readable form. These statements each consist of a subject, predicate, and object, hence the name triples. In most cases, the subject of a triple is a uniform resource identifier (URI) that can identify anything the data publisher chooses, be that a person, a place, a document in the Web, an abstract concept — in short, anything. Predicates specify the nature of the relationship between the subject and object, are drawn from vocabularies published in the Web, and are identified by URIs. An RDF triple’s object is usually a string literal or another URI. When the object is a URI from a different namespace — that is, it identifies something in an external data set — the RDF triple creates a link between those data sets, replacing isolated data islands with a giant, distributed data set built on top of the Web architecture — a true Web of data.
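As a minimal sketch of the triple model just described, the Python below represents each statement as a plain (subject, predicate, object) tuple and picks out the triples whose object URI lies in a different namespace from the subject. The example.org and example.com URIs, the helper names, and the shortcut of treating a URI's host as its namespace are all assumptions made for illustration; only the FOAF predicate URIs are real vocabulary terms. A real application would use an RDF library rather than tuples.

```python
from urllib.parse import urlparse

FOAF = "http://xmlns.com/foaf/0.1/"

# Each statement is a (subject, predicate, object) triple.
# Subjects and predicates are URIs; objects are URIs or string literals.
triples = [
    ("http://example.org/people/alice", FOAF + "name", "Alice"),
    ("http://example.org/people/alice", FOAF + "knows",
     "http://example.org/people/bob"),
    ("http://example.org/people/alice", FOAF + "based_near",
     "http://example.com/places/london"),
]


def namespace(uri):
    """Approximate a URI's namespace by its host (an assumption of this sketch)."""
    return urlparse(uri).netloc


def is_uri(value):
    """Treat values starting with http:// or https:// as URIs, anything else as a literal."""
    return value.startswith(("http://", "https://"))


def cross_dataset_links(statements):
    """Return the triples whose object is a URI in a different namespace from the subject."""
    return [
        (s, p, o)
        for s, p, o in statements
        if is_uri(o) and namespace(o) != namespace(s)
    ]


links = cross_dataset_links(triples)
```

Here only the `based_near` statement counts as a link between data sets, because its object lives under a different host from its subject.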
When members of the grassroots Linking Open Data project last tried to calculate the current size of the Web of data, their conservative estimates suggested that data sets in the Web contained more than 2 billion RDF triples, 3 million of which were links across data sets [2]. The rate of growth in this Web is so great that any future estimates are likely to be out of date as soon as they’re published.
One other RDF feature is worth noting: it enables easy integration of triples contained in any number of documents distributed across the Web. Source documents can be merged painlessly, without the graph that results from this merge needing to conform to a particular schema. One consequence of this is a major reduction in the headaches associated with integrating heterogeneous data.
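The schema-free merge can be sketched in a few lines, assuming each source document has already been parsed into a set of triples. The document contents and example URIs here are hypothetical, and the sketch ignores blank nodes, which a real RDF merge has to keep distinct between sources.

```python
ALICE = "http://example.org/people/alice"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
FOAF_KNOWS = "http://xmlns.com/foaf/0.1/knows"
DC_CREATOR = "http://purl.org/dc/terms/creator"

# Two hypothetical source documents, each already parsed into a set of triples.
doc_a = {
    (ALICE, FOAF_NAME, "Alice"),
    (ALICE, FOAF_KNOWS, "http://example.org/people/bob"),
}
doc_b = {
    (ALICE, FOAF_NAME, "Alice"),  # the same statement, published elsewhere
    ("http://example.com/papers/42", DC_CREATOR, ALICE),
}

# Merging is a set union: no shared schema is needed, and identical
# statements from different documents collapse into one.
merged = doc_a | doc_b
```

The two documents use different vocabularies and describe different kinds of things, yet the union is still a valid graph; the repeated name statement appears only once.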
Throw out your homepage!
In the Web of documents, individuals and organizations often take great care to develop visually attractive homepages that send out just the right message to their target audience. If RDF enables data from multiple sources to be easily integrated to form a coherent view of a particular thing, what does this mean for how we publish data on the Web? It means that the Web page as we know it is dead.
Developers of Web 2.0 mashups have been demonstrating this for some time, integrating data from a handful of different sources to present a novel view that none of the source data sets alone can provide. The Web of data is the logical extension, letting developers create links between data sources that are themselves exposed on the Web for others to reuse to build large-scale, ad hoc mashups, while simultaneously reducing the challenges of integrating heterogeneous data.
Documents will always be useful data containers, but in many cases, I predict they will become nothing more than that. In the Semantic Web, you can’t assume you have control over how the information you publish will be presented — it’s just data. Thinking at the visual design level, RDF represents an extension of the long-established principle of separating content from presentation. For some content creators, this might ring alarm bells — how will brand be maintained if you have less control over presentation? For others, it will represent an opportunity to free themselves from visual design concerns, concentrate first on publishing relevant, high-quality data, and let others build the views they want rather than those that someone else assumes they need.
At the data level, publishers can have some influence over which external sources their data links to, primarily by creating these links themselves and publishing them for others to consume. However, in the Web of data, no one can control with any degree of certainty the sources with which their data is integrated; enabling serendipitous reuse is exactly the point! As already discussed, data published in the Web in a reusable form enables new views that have value beyond the sum of the parts and that the original creators might not have anticipated.
It’s for these reasons that I suggest we throw away our homepages. Researchers well know the challenge of connecting all the pieces of their professional activities into a coherent whole: the projects, the papers, the committee and editorial board memberships, the blog entries and photo albums, all scattered across isolated islands on the Web, maybe replicated on their personal Web site or connected by strands of hypertext; or maybe not, owing to the effort involved.
A homepage for the Web of data takes a different shape. At its most basic level, it could simply be a collection of RDF triples that tie together data we want to express that’s distributed across numerous locations. The machines’ job is then to assemble this data into a coherent view, ready for human consumption.
To put my money where my mouth is, next time I get business cards printed, I won’t be including my homepage address. Instead, I’ll put my URI on the card, safe in the knowledge that a human being with a browser, Semantic or otherwise, can look up that URI and find some of what the Web has to say about me.
What should a Semantic Web browser look like?
Extending these ideas, we can see that the document in which a particular RDF graph is published becomes primarily an indicator of provenance, rather than representing the definitive packaging of a certain slice of data or content.
Of far greater relevance than the documents themselves are the things described in those documents — the people, places, and concepts. So far, I’ve talked about a Web of data, but I’m really using that term as shorthand for “Web of data about things” — any things. We might not be able to retrieve a car over HTTP, but we can identify it with an HTTP URI and use the Web to retrieve its description in RDF.
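Looking up a thing's description over HTTP can be sketched as follows, using Python's standard `urllib.request`. The car URI is a made-up example and the `build_rdf_request` helper is an assumption of this sketch; the request simply states, through the `Accept` header, that an RDF/XML description is preferred. Whether a server honors that preference is up to the publisher.

```python
import urllib.request


def build_rdf_request(thing_uri):
    """Build an HTTP GET request asking for an RDF description of a thing."""
    return urllib.request.Request(
        thing_uri,
        headers={"Accept": "application/rdf+xml"},
    )


# A hypothetical URI identifying a car, not a document about the car.
request = build_rdf_request("http://example.org/things/my-car")

# Sending the request is left out because the URI above is illustrative:
#     with urllib.request.urlopen(request) as response:
#         rdf_xml = response.read()
```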
It’s at the level of “things” that browsers for the Web of data should operate. Providing simple browsers for RDF triples, and the documents in which they’re published, is one option for enabling people to interact with this information space. We saw this trend with some of the earliest Semantic Web browsers, but it rather misses the point. The one-page-at-a-time style of browsing, which we know well from the Web of documents, would make nothing of the potential we now have for integrated views of data assembled from numerous locations.
So, Semantic Web browsers must not simply echo the underlying representation of the data [3]. Instead, they must treat “things,” in the broadest sense, as first-class citizens of the interface. A particular thing of interest should take center stage, with the browser assembling relevant information seamlessly behind the scenes.
We’re seeing shades of this trend in Semantic Web browsers such as The Tabulator [4] and DBpedia Mobile [5], in which the thing of interest is of greater importance, and specific documents simply supply fragments of data that together make up a broader picture. Despite these moves in the right direction, we still have some way to go.
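To make the thing-centric idea concrete, here is a minimal sketch that assembles a view of one thing from statements gathered across several documents, keeping the source document alongside each value as an indicator of provenance. It is not how The Tabulator or DBpedia Mobile work; the `view_of` function, the four-element statement layout, and the example URIs are assumptions for illustration.

```python
from collections import defaultdict

ALICE = "http://example.org/people/alice"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
DC_CREATOR = "http://purl.org/dc/terms/creator"

# Hypothetical statements with provenance:
# (subject, predicate, object, source document).
quads = [
    (ALICE, FOAF_NAME, "Alice", "http://example.org/people/alice.rdf"),
    ("http://example.com/papers/42", DC_CREATOR, ALICE,
     "http://example.com/papers.rdf"),
    ("http://example.org/people/bob", FOAF_NAME, "Bob",
     "http://example.org/people/bob.rdf"),
]


def view_of(thing, statements):
    """Group everything said about a thing, whichever document said it."""
    outgoing = defaultdict(list)  # statements with the thing as subject
    incoming = defaultdict(list)  # statements with the thing as object
    for s, p, o, source in statements:
        if s == thing:
            outgoing[p].append((o, source))
        elif o == thing:
            incoming[p].append((s, source))
    return {"outgoing": dict(outgoing), "incoming": dict(incoming)}


alice_view = view_of(ALICE, quads)
```

The view is keyed by the thing and its properties rather than by document, so the interface can put the thing at center stage and still show where each fragment came from.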
Conventional browsers have largely failed to deliver on the original vision of the Web as a read/write medium. Although this vision is slowly being realized at a general level through, for example, blogs, wikis, and specialized annotation interfaces such as Flickr, there remains a significant degree of indirection when it comes to editing Web documents. In some cases, this process still involves starting an editor for HTML documents, making appropriate changes, and then starting another application (such as an FTP client) to publish the updated document.
Browsers for the Semantic Web, which I suggest we call “thing browsers,” have an opportunity to enable a far greater degree of direct manipulation in their interfaces. Different types of objects afford different types of actions, and knowing the type of object on which the user is focused should let browsers provide a menu of actions specialized for this object type, and perhaps even adapt these according to the context.
For example, if the user is currently browsing a person, the browser could let the user send a message to that person, share an object with them, or arrange a meeting, without any of these functions having been explicitly listed as actions that can be invoked on these individuals. Instead, the Semantic Web at large can provide the necessary knowledge and services on which to offer such functionality, such as statements describing “arrange meeting with” as a valid action for a thing of type “person,” or definitions of what constitutes a meeting, or venue suggestions that are tailored to the relationship between the two parties and the time of day.
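One way such a type-driven menu might work is sketched below. The `supportsAction` predicate and the example.org vocabulary are hypothetical, invented for this illustration; `rdf:type` and `foaf:Person` are real terms. The browser looks up the types of the thing in focus and then collects whatever actions the available data declares for those types.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF_PERSON = "http://xmlns.com/foaf/0.1/Person"
# Hypothetical predicate stating that things of a type afford an action.
SUPPORTS_ACTION = "http://example.org/vocab/supportsAction"

graph = {
    ("http://example.org/people/alice", RDF_TYPE, FOAF_PERSON),
    (FOAF_PERSON, SUPPORTS_ACTION, "send message to"),
    (FOAF_PERSON, SUPPORTS_ACTION, "arrange meeting with"),
    ("http://example.org/vocab/Place", SUPPORTS_ACTION, "show on map"),
}


def actions_for(thing, statements):
    """Return the actions declared for any type the thing belongs to."""
    types = {o for s, p, o in statements if s == thing and p == RDF_TYPE}
    return sorted(
        o for s, p, o in statements
        if p == SUPPORTS_ACTION and s in types
    )


menu = actions_for("http://example.org/people/alice", graph)
```

Nothing in Alice's own description lists these actions; they follow from her type, and the statements about that type could come from anywhere on the Web.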
Clearly, a Web of data can’t offer direct manipulation of real-world things, such as cars and dogs, which are not, and never will be, online. However, in a Web where we can explicitly reference anything, not just documents, there’s great potential to reduce the degree of indirection in Web interfaces. We no longer have to refer to Web pages about things but can refer to the things themselves.
In case there was any doubt, this is no overnight endeavor but a trend that will take years to be realized and could take many different forms. Giving a keynote talk at the 2007 World Wide Web conference, Bill Buxton of Microsoft Research made the claim that “The diversity of ‘Web browsers’ tomorrow will match the diversity of ‘ink browsers’ (a.k.a. paper) today — in terms of diversity of form, function, location, and importance.” I don’t get the impression that Buxton was thinking about the Web of data when making this statement, but the claim stands up nonetheless — a true Web of things will require similar diversity in the interfaces through which we exploit it. The browser is just one approach.
A back button for the Semantic Web?
Accepting the shift from document to thing, and from predefined views to those assembled dynamically, will require not just completely new interfaces but also several changes to the interaction widgets with which we’re already familiar. If browsing becomes not just about moving from one document to another by following links, but about integrated views of data assembled from various sources, then the notion of the “back” button takes on a slightly different meaning in the interface. Rather than moving between documents, the back button in a Semantic Web browser should move the user to previously viewed things. More significantly, a form of “undo” button, as you might find in a word processor, could be of critical importance in an environment in which vast amounts of data can be assembled at minimal cost, but not all of it will be pertinent to the job in hand.
The range of potential sources from which data will be available about a certain thing will be immense. Imagine entering a URI for “London” into your Semantic Web browser’s address bar. All the data available on the Web about London can’t feasibly be presented in one interface; users will need to decide which sources to add in depending on their current task or context, or will need the browser to make this decision intelligently for them, with the ability to undo the addition of any particular sources. This functionality becomes even more critical if automated reasoning is carried out on Semantic Web data, creating knowledge that wasn’t previously explicit in any of the individual data sources.
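The ability to undo the addition of a source can be sketched with a simple stack. The `ThingView` class, its method names, and the example URIs are assumptions made for illustration, not a description of any existing browser.

```python
class ThingView:
    """A view of one thing, assembled from sources that can be undone."""

    def __init__(self, thing):
        self.thing = thing
        self._sources = []  # stack of (source_uri, triples), oldest first

    def add_source(self, source_uri, triples):
        """Add one source's triples to the view."""
        self._sources.append((source_uri, frozenset(triples)))

    def undo(self):
        """Drop the most recently added source; return its URI, or None."""
        if not self._sources:
            return None
        source_uri, _ = self._sources.pop()
        return source_uri

    def triples(self):
        """Recompute the merged view from the sources still in place."""
        merged = set()
        for _, source_triples in self._sources:
            merged |= source_triples
        return merged


LONDON = "http://example.org/places/london"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

view = ThingView(LONDON)
view.add_source("http://example.org/geo.rdf",
                {(LONDON, RDFS_LABEL, "London")})
view.add_source("http://example.com/weather.rdf",
                {(LONDON, "http://example.com/vocab/forecast", "rain")})
undone = view.undo()
```

Keeping each source's triples separate and recomputing the merged view means that undoing one source never removes a statement that another remaining source also asserts.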
How to manage the assembly of these data sources becomes a critical issue. When several colleagues and I evaluated the deployment of various Semantic Web technologies to delegates at the 2006 European Semantic Web Conference, one of the key themes to emerge was “coherence” [6]. Delegates had been presented with various Semantic Web applications for use at the conference; they expected data to be integrated across these and presented as a coherent whole. For various reasons described elsewhere [6], this wasn’t possible, leading to a suboptimal user experience and confusion for delegates.
Key to developing Web of data browsers will be look-up services such as Sindice [7], which provide a means to find other RDF documents on the Semantic Web that mention a particular thing. This kind of service might help ensure that the user experience is coherent, that is, that it includes all the data the user expects. However, ensuring that a particular view of data is useful is another issue.
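The core idea behind such a look-up service can be illustrated with a small in-memory index from thing URIs to the documents that mention them. This is not Sindice's API or implementation; the function names, the document URLs, and their contents are hypothetical.

```python
from collections import defaultdict

ALICE = "http://example.org/people/alice"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
DC_CREATOR = "http://purl.org/dc/terms/creator"

# Hypothetical crawled documents: document URL -> triples found in it.
documents = {
    "http://example.org/alice.rdf": [(ALICE, FOAF_NAME, "Alice")],
    "http://example.com/papers.rdf": [
        ("http://example.com/papers/42", DC_CREATOR, ALICE),
    ],
    "http://example.net/bob.rdf": [
        ("http://example.net/people/bob", FOAF_NAME, "Bob"),
    ],
}


def build_index(docs):
    """Map each URI used as a subject or object to the documents mentioning it."""
    index = defaultdict(set)
    for doc_url, triples in docs.items():
        for s, p, o in triples:
            index[s].add(doc_url)
            if o.startswith(("http://", "https://")):
                index[o].add(doc_url)
    return index


def documents_mentioning(index, thing_uri):
    """Return the documents that mention a thing, in a stable order."""
    return sorted(index.get(thing_uri, ()))


index = build_index(documents)
hits = documents_mentioning(index, ALICE)
```

A browser could feed the returned document URLs into its view of the thing, which addresses completeness; judging which of those documents are relevant or trustworthy remains a separate problem.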
Any system aiming to integrate heterogeneous data on an ad hoc basis and present this to users will need to adopt sophisticated models of relevance, quality, and trust that are sensitive to the user’s current task and its context. How that might be achieved is a question for another day.
References
1. J. Nielsen, “User Interface Directions for the Web,” Comm. ACM, vol. 42, no. 1, 1999, pp. 65–72.
2. C. Bizer et al., “Linked Data on the Web (LDOW 08),” Proc. 17th Int’l World Wide Web Conf. (WWW 08), ACM Press, 2008, pp. 1265–1266.
3. D. Karger and M.C. Schraefel, “The Pathetic Fallacy of RDF,” Proc. 3rd Int’l Semantic Web User Interaction Workshop (SWUI 06), 5th Int’l Semantic Web Conf., 2006.
4. T. Berners-Lee et al., “Tabulator: Exploring and Analyzing Linked Data on the Semantic Web,” Proc. 3rd Int’l Semantic Web User Interaction Workshop (SWUI 06), 5th Int’l Semantic Web Conf., 2006.
5. C. Becker and C. Bizer, “DBpedia Mobile: A Location-Enabled Linked Data Browser,” Proc. Workshop on Linked Data on the Web (LDOW 08), Central Europe (CEUR) Workshop Proc., 2008.
6. T. Heath, J. Domingue, and P. Shabajee, “User Interaction and Uptake Challenges to Successfully Deploying Semantic Web Technologies,” Proc. 3rd Int’l Semantic Web User Interaction Workshop (SWUI 06), 5th Int’l Semantic Web Conf., 2006.
7. G. Tummarello, E. Oren, and R. Delbru, “Sindice.com: Weaving the Open Linked Data,” Proc. 6th Int’l Semantic Web Conf. and 2nd Asian Semantic Web Conf. (ISWC+ASWC 07), Springer, 2007, pp. 552–565.
Tom Heath is a researcher in the Platform Division at Talis. His research interests include recommendation, trust, and social networks in a linked data/Semantic Web context. Heath is a PhD candidate in computer science at The Open University’s Knowledge Media Institute and has a first degree in psychology. He is a member of the IEEE and the ACM. Contact him at tom.heath@talis.com.