A skim-read introduction to linked data
Introductions
Software Engineer
Information Architect
Some egg sucking
The internet
![...provides a means to connect machines](/staticarchive/96e8126774edae98e4a074de9e98c0dfe4c8a9c7.png)
The internet ≠ the web
![World Wide Web](/staticarchive/7953ba1c93ddffe8e7a1b0c0c3c69b28afbc32c5.png)
The web
![...provides a means to connect documents](/staticarchive/533355da9de4faf21572098cf6aafa5a97795bce.png)
The web = the internet
The web = the internet + links
The web = the internet + links + documents
...or...
The web = the internet + http + html
Web standards
https://en.wikipedia.org/wiki/Web_standards#Common_usage
When a web site or web page is described as complying with web standards, it usually means that the site or page has valid or nearly valid HTML, CSS and JavaScript. The HTML should also meet accessibility and semantic guidelines.
We tend to obsess over the documents:
- "Semantic" HTML
- HTML, HTML2, HTML3, HTML4, XHTML, HTML5...
- DOM
- ?Flash?
- microformats (µfs)
- CSS
- document accessibility
- validation
...at the expense of the links:
The trouble is...
...HTML has always been...
Everything that's good about the web comes from links
- wikis
- blog permalinks
- citations
- sharing
- social bookmarking
- twitter
- socialness in general...
If you can point at something you can talk about it and share it
The web = the internet + http + html
<aside>
On SEO (sorry)
- in the days of Alta Vista and Yahoo! mk 1 search was all about documents
- metatags...
- ...and keyword density
- Google changed the game with PageRank by making search...
- ...less about keyword density...
- ...and more about link density
</aside>
Magazines are made of pages...
...websites are made of links
One problem with the web
- people aren't really interested in documents
- they're interested in things
We need to get from this...
![a web of documents](/staticarchive/533355da9de4faf21572098cf6aafa5a97795bce.png)
...to this
![a web of things](/staticarchive/392887acaaad47e534390f2c7d3910f2bf380a2e.png)
The other problem with the web...
...people can parse documents and extract meaning...
![meaning](/staticarchive/86730109075aab200f7a66d169965ef174be7578.png)
...but machines can't
![no meaning](/staticarchive/9bcaf26fc4375ddf9dfff31b3995906a8ee0cf10.png)
We need to help machines to understand the web...
...so machines can help us to understand things
The semantic web
Mk 1
- an attempt to make documents that machines could understand
- using RDF
- before we go any further...
<aside>
RDF
- is a way to model data
- is not a serialisation
- can be serialised as XML
- or inside HTML (as RDFa)
- or N3
- or Turtle
- or even json
</aside>
The RDF data model
RDF
- is based on triples
- subject, predicate, object
- <The sky> <has the colour> <blue>
More examples
- <Yves> <was born in> <France>
- <France> <is part of> <the EU>
- <Michael> <was born in> <the UK>
- <The UK> <is part of> <the EU>
- <Yves> <likes> <open data>
- <Michael> <likes> <open data>
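A minimal sketch of the triples above as plain Python tuples, assuming nothing more than exact string matching on the labels. The helper names `match` and `born_in_eu` are made up for illustration; real RDF uses URIs rather than labels.

```python
from typing import Iterable, Iterator, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# The example statements from the slides, one tuple per triple.
TRIPLES = [
    ("Yves", "was born in", "France"),
    ("France", "is part of", "the EU"),
    ("Michael", "was born in", "the UK"),
    ("the UK", "is part of", "the EU"),
    ("Yves", "likes", "open data"),
    ("Michael", "likes", "open data"),
]


def match(triples: Iterable[Triple],
          subject: Optional[str] = None,
          predicate: Optional[str] = None,
          obj: Optional[str] = None) -> Iterator[Triple]:
    """Yield every triple that agrees with the given parts; None is a wildcard."""
    for s, p, o in triples:
        if subject in (None, s) and predicate in (None, p) and obj in (None, o):
            yield (s, p, o)


def born_in_eu(triples: Iterable[Triple]) -> list:
    """Follow two links: person -> birthplace -> the EU."""
    triples = list(triples)
    members = {s for s, _, _ in match(triples, predicate="is part of", obj="the EU")}
    return [s for s, _, o in match(triples, predicate="was born in") if o in members]


print(born_in_eu(TRIPLES))  # ['Yves', 'Michael']
```

Nobody stated that Yves was born in the EU. It falls out of joining two triples on a shared node, which is the whole trick.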
So what happened
- lots of people made interlinked foaf files
- but there was a shortage of vocabularies to describe other things
- so the foaf files only really linked to other foaf files
- it all became a little document-y
- and less link-y
Semweb mk 1 = the internet + http + rdf
<yet_another_aside>
On REST
- described by Roy Fielding
- about the proper use of HTTP (POST, GET, PUT, DELETE)
- separated the resource (abstract) from...
- the representation (concrete-ish)
If I ask for a document about Yves...
- ...my user agent (browser) sends additional information...
- ...about what I accept in the form of accept headers
- I may accept French but prefer English
- I may accept JSON but prefer RDF
- I may accept full fat xhtml but prefer xhtml-mp
The key is
- a URI identifies a resource
- not a representation
- I ask for a resource - the server returns the best possible representation
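A rough sketch of how a server might pick "the best possible representation" from an Accept header. It is a simplification: wildcards such as `*/*` and media type parameters other than `q` are ignored, and the function names `parse_accept` and `choose` are invented for this example.

```python
from typing import List, Optional, Tuple


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Turn 'a/b;q=0.8, c/d' into [(media_type, quality), ...], best first."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0]
        if not media_type:
            continue
        quality = 1.0  # q defaults to 1 when it is omitted
        for field in fields[1:]:
            name, _, value = field.partition("=")
            if name.strip() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat an unreadable q as "not wanted"
        prefs.append((media_type, quality))
    # sorted() is stable, so entries with equal q keep their header order
    return sorted(prefs, key=lambda pref: pref[1], reverse=True)


def choose(header: str, available: List[str]) -> Optional[str]:
    """Return the most preferred media type the server can produce, or None."""
    for media_type, quality in parse_accept(header):
        if quality > 0 and media_type in available:
            return media_type
    return None


# "I may accept JSON but prefer RDF"
ACCEPT = "application/rdf+xml, application/json;q=0.5"
print(choose(ACCEPT, ["text/html", "application/json"]))            # application/json
print(choose(ACCEPT, ["application/json", "application/rdf+xml"]))  # application/rdf+xml
```

The URI never changes between those two calls. Only the representation that comes back does.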
Content negotiation - what I want / what I accept
![I'd like this resource about Yves, I speak English but I can just about get by in French and I'd like it for my mobile, please](/staticarchive/eea2dc1075096d9ebb53e9160410c4e116843a7a.png)
Content negotiation - what I'm given
![Can't do you English but I've got French and can send as xhtml-mp. Here you go](/staticarchive/191aea16048a55cbe55c2e082156d2734b66318c.png)
Not always successful
- the resource may exist
- but with no representation that matches your accept headers
- 406 - Not acceptable
- My favourite HTTP code :-)
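The failure case can be sketched in a few lines, here for language rather than format. The `respond` function and the German sample text are hypothetical, and a real server is allowed to ignore the header and send something anyway instead of a 406.

```python
from typing import Dict, List, Tuple


def respond(accepted_languages: List[str],
            available: Dict[str, str]) -> Tuple[int, str]:
    """Return (status, body) for a resource that exists in the `available` languages.

    `accepted_languages` is assumed to be ordered most preferred first.
    """
    for language in accepted_languages:
        if language in available:
            return 200, available[language]
    # The resource exists, but not in any form the client said it accepts.
    return 406, "Not Acceptable"


# Hypothetical resource about Yves that only exists in German.
ABOUT_YVES = {"de": "Ein Dokument über Yves."}

print(respond(["en", "fr"], ABOUT_YVES))  # (406, 'Not Acceptable')
print(respond(["en", "de"], ABOUT_YVES))  # (200, 'Ein Dokument über Yves.')
```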
Content negotiation - what I want / what I accept
![I'd like this resource about Yves, I speak English but I can just about get by in French and I'd like it for my mobile, please](/staticarchive/eea2dc1075096d9ebb53e9160410c4e116843a7a.png)
Content negotiation - what I'm not given
![I've got that resource but can only do German. 406](/staticarchive/3193185c02adcc7e475008b619f5a0002fce7c7c.png)
An honorary mention for One Web
The Web is designed as a universal space. Its universality is its most important facet. I spend many hours giving talks just to emphasize this point. The success of the Web stems from its universality as do most of the architectural constraints.
#timbl
- So one URI per resource; many representations per resource
- Mobile demands a different representation...
- ...but not a different resource...
- ...and not a different URI
- so no mobile. subdomains or .mobi domains etc
If you can point at something you can talk about it and share it... universally
</yet_another_aside>
Back to linked data
Linked data
- is sort of a second pass at the semantic web
- still uses rdf (although that can be contentious)
- as the name implies puts the emphasis back on links
- is about things not documents
Linked data = the internet + http + rdf
Linked data = web standards
Design issues for linked data
- Use URIs as names for things
- Use HTTP URIs so that people can look up those names
- When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL)
- Include links to other URIs, so that they can discover more things.
Use URIs as names for things (my emphasis)
- separating resource from representation (REST) gets us so far
- but it's still about getting resources as information documents...
- ...and we still think primarily about information resources
The map is not the territory
- /programmes/b006m86d is an information resource about Eastenders
- you can ask for and get it as html or xhtml-mp or rdf...
- but it is not Eastenders
- is an information resource about Yves
- but it is not Yves
Non-information resources
- why do we need the distinction?
- is saying represents Yves good enough?
- Yves was not created when was created
- The person who created (Yves) did not create Yves
- I am not friends with
- The people who created /programmes/b006m86d did not create "Eastenders"
We want to be able to make different claims about the thing and the document about the thing
So we need URIs for non-information resources - stuff that you can't send down wires
What happens if someone asks for a non-information resource?
![I'd like Yves, and by the way I speak English but I can just about get by in French](/staticarchive/2b9e5d1d4dabb5b306fcb45ff298a834d2f6e8ac.png)
What happens if someone asks for a non-information resource?
![Yves will not fit down the wires but (303) I can give you some information about him in English](/staticarchive/48dfe66f5203566d141199f42189fc8b6cf6f601.png)
Comparing the microformats and linked data approaches
Given 2 websites...
![One owned by Michael, one owned by Yves](/staticarchive/c4ec71f4f4225dd663f064c7e144e044874831d7.png)
The microformats approach: XFN @rel
![Colleague, friend, met](/staticarchive/1ab360e1d786c75a80b1ce325243bf058de62810.png)
The linked data approach: 2 new non-information resources...
![one for Michael, one for Yves](/staticarchive/5e7584e12499a0e93cfd322cdc145826b6074da4.png)
...each with a homepage...
![via foaf:homepage](/staticarchive/1c31d0b3d321c508b4c0470ff2a7259abc6fe1b0.png)
...tying together the 2 people, not the 2 documents
![via foaf:knows](/staticarchive/3d97078d8fd9263ec0e70751889058dc73782002.png)
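The same picture as data, in a small sketch. The `example.org` URIs are placeholders, not anybody's real identifiers; the FOAF namespace is the real one. Note that `foaf:knows` joins the two people, and the homepages hang off each person.

```python
FOAF = "http://xmlns.com/foaf/0.1/"

# Hypothetical URIs: "#me" URIs name the people, the bare URLs are their homepages.
MICHAEL = "http://example.org/michael#me"
YVES = "http://example.org/yves#me"

TRIPLES = {
    (MICHAEL, FOAF + "homepage", "http://example.org/michael"),
    (YVES, FOAF + "homepage", "http://example.org/yves"),
    (MICHAEL, FOAF + "knows", YVES),  # person to person, not page to page
}


def homepages_of_acquaintances(person: str, triples: set) -> list:
    """Find who `person` knows, then look up each of their homepages."""
    known = {o for s, p, o in triples if s == person and p == FOAF + "knows"}
    return sorted(o for s, p, o in triples
                  if s in known and p == FOAF + "homepage")


print(homepages_of_acquaintances(MICHAEL, TRIPLES))  # ['http://example.org/yves']
```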
Designing URIs for non-information resources
Hashes
- you can give the non-information resource a hash URI
- /programmes/b006m86d#programme is the URI for Eastenders
- if you attempt to dereference /programmes/b006m86d#programme it can't be sent down the wires...
- ...but there's no need for a 303...
- .../programmes/b006m86d is content negotiated to either HTML or RDF depending on what you accept
- and your user agent looks for a #programme inside it
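Why no 303 is needed: the part after the hash never reaches the server. A small sketch using the standard library, with `example.org` standing in for the real host (an assumption, the slides only give the path):

```python
from urllib.parse import urldefrag


def split_hash_uri(uri: str) -> tuple:
    """Split a hash URI into (what gets requested, what is looked up locally)."""
    document_url, fragment = urldefrag(uri)
    return document_url, fragment


# Hypothetical host; the path and fragment follow the slides.
EASTENDERS = "http://example.org/programmes/b006m86d#programme"

document_url, fragment = split_hash_uri(EASTENDERS)
print(document_url)  # http://example.org/programmes/b006m86d
print(fragment)      # programme
```

The user agent requests the document URL, gets a representation back, and then looks inside it for the thing named by the fragment.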
Slashes
- or you can give the non-information resource a completely different path
- is the URI for Holborn
- if you attempt to dereference it can't be sent down the wires...
- ...so it 303s (see other)...
- to - an information resource about Holborn...
- ...which is then content negotiated to either HTML or RDF depending on what you accept
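The slash approach, sketched as a toy server and client with no network involved. The paths `/id/holborn` and `/doc/holborn` are invented for this example, and the Accept handling is reduced to an ordered list of media types.

```python
from typing import Dict, List, Tuple

Response = Tuple[int, Dict[str, str], str]  # (status, headers, body)

# Hypothetical paths: a thing, and the document that describes it.
THINGS = {"/id/holborn": "/doc/holborn"}
DOCUMENTS = {
    "/doc/holborn": {
        "text/html": "(HTML page about Holborn)",
        "application/rdf+xml": "(RDF/XML describing Holborn)",
    }
}


def handle(path: str, accept: List[str]) -> Response:
    """Toy server: 303 for things, content negotiation for documents."""
    if path in THINGS:
        # Holborn will not fit down the wires: point at a document instead.
        return 303, {"Location": THINGS[path]}, ""
    if path in DOCUMENTS:
        representations = DOCUMENTS[path]
        for media_type in accept:  # assumed ordered most preferred first
            if media_type in representations:
                return 200, {"Content-Type": media_type}, representations[media_type]
        return 406, {}, ""
    return 404, {}, ""


def fetch(path: str, accept: List[str]) -> Response:
    """Toy client: follow 303 redirects until something else comes back."""
    status, headers, body = handle(path, accept)
    while status == 303:
        status, headers, body = handle(headers["Location"], accept)
    return status, headers, body


print(handle("/id/holborn", ["application/rdf+xml"])[:2])
# (303, {'Location': '/doc/holborn'})
print(fetch("/id/holborn", ["application/rdf+xml"])[2])
# (RDF/XML describing Holborn)
```

Two round trips instead of one: that is the cost of keeping the thing and the document about the thing on separate URIs.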
URIs in pictures: slash + 303 + conneg
![slash uris](/staticarchive/f5ae5c1389adb72005569487b2ef02de75fcb3ce.png)
You need to be able to configure your server for 303s and content negotiation.
URIs in pictures: hash + conneg
![hash uris](/staticarchive/98b1cc6312d16085d2a0420169a3b5b5739c3a16.png)
Cheaper setup - no need to set up for 303s although you still need content negotiation. Fewer round trips to server.
URIs in pictures: RDFa
![RDFa uris](/staticarchive/9312e9ebd2147fbac30d3f5abc12fa7f9ef003b5.png)
Cheapest setup - no need to set up for 303s or content negotiation.
So, what's the point?
Different people know (or claim to know) different things about the same topic
- The BBC knows when it's played a record by The Fall
- knows all the records The Fall have ever released
- knows The Fall are from Salford...
- knows New Order are from Salford...
- The BBC knows when it's played a record by New Order
- etc
Linked data is a web-scale database
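A sketch of what "web-scale database" means in practice: two publishers who have never met use the same URIs, so their data joins without any agreed schema. The URIs, predicate labels and dates below are all made up for illustration.

```python
# Hypothetical shared URIs for the two bands.
THE_FALL = "http://example.org/artists/the-fall#artist"
NEW_ORDER = "http://example.org/artists/new-order#artist"

# One source knows about plays (invented dates)...
BROADCASTER = {
    (THE_FALL, "played on", "2009-03-01"),
    (NEW_ORDER, "played on", "2009-03-02"),
}
# ...another knows where bands are from.
GAZETTEER = {
    (THE_FALL, "is from", "Salford"),
    (NEW_ORDER, "is from", "Salford"),
}


def plays_by_artists_from(place: str, *sources: set) -> list:
    """Merge every source into one set of triples, then join on the artist URI."""
    merged = set().union(*sources)
    local_artists = {s for s, p, o in merged if p == "is from" and o == place}
    return sorted((s, o) for s, p, o in merged
                  if p == "played on" and s in local_artists)


for artist, date in plays_by_artists_from("Salford", BROADCASTER, GAZETTEER):
    print(artist, date)
```

Neither source could answer "when were Salford bands played?" on its own. The merged set can.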
A special mention for owl:sameAs
owl:sameAs
- allows us to declare that non-information resources across the web are the same thing
- so the BBC's The Fall is the same as...
- ...the MusicBrainz The Fall is the same as...
- ...the DBpedia The Fall is the same as...
- ...the last.fm The Fall is the same as...
- ...etc
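One way to picture the chain above is as groups of URIs that all name the same thing. A sketch using a small union-find, assuming sameAs is treated as symmetric and transitive; the `.example` URIs are placeholders, not the real identifiers these sites use.

```python
from typing import Dict, Iterable, List, Set, Tuple


def equivalence_classes(same_as_pairs: Iterable[Tuple[str, str]]) -> List[Set[str]]:
    """Group URIs into sets that are connected by sameAs statements."""
    parent: Dict[str, str] = {}

    def find(uri: str) -> str:
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # shorten the path as we walk it
            uri = parent[uri]
        return uri

    for a, b in same_as_pairs:
        parent[find(a)] = find(b)  # merge the two groups

    groups: Dict[str, Set[str]] = {}
    for uri in list(parent):
        groups.setdefault(find(uri), set()).add(uri)
    return list(groups.values())


# Hypothetical URIs for The Fall (and one pair for New Order) on four sites.
SAME_AS = [
    ("http://bbc.example/the-fall#artist", "http://musicbrainz.example/the-fall#artist"),
    ("http://musicbrainz.example/the-fall#artist", "http://dbpedia.example/The_Fall"),
    ("http://dbpedia.example/The_Fall", "http://lastfm.example/the-fall"),
    ("http://bbc.example/new-order#artist", "http://dbpedia.example/New_Order"),
]

for group in equivalence_classes(SAME_AS):
    print(sorted(group))
```

Only three statements mention The Fall, yet all four URIs end up in one group, because each link extends the chain.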
When sameAs goes wrong
- Language is ambiguous
- We often use the same labels to refer to different things in different contexts
An example stolen from Tom Heath
- I have in my house a bottle of Bowmore single malt whisky...
- ...which is one bottle of one vintage (Bowmore 18 year old single malt)...
- ...which is one vintage of one brand (Bowmore single malt)...
- ...which is distilled by the Bowmore Distillery
- The Distillery, the brand, the vintage and my bottle are not the same thing...
- ...but we use the same label
When we declare sameAs we need to be careful
- If I said my bottle of Bowmore whisky was the sameAs the Bowmore Distillery and someone else said the Bowmore Distillery was created in 1779...
- ...then by implication my bottle would have been created in 1779...
- ...which would be quite a vintage
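The Bowmore accident, sketched as a naive reasoner. It copies every statement across a sameAs link in both directions, one step only (no transitive chains), which is enough to show the problem. The URIs and the "created in" label are placeholders.

```python
# Hypothetical URIs for my bottle and for the distillery.
BOTTLE = "http://example.org/my-house/bottle-of-bowmore"
DISTILLERY = "http://example.org/distilleries/bowmore"

CLAIMS = {
    (BOTTLE, "owl:sameAs", DISTILLERY),   # my careless claim
    (DISTILLERY, "created in", "1779"),   # someone else's claim
}


def infer(triples: set) -> set:
    """Copy statements across each sameAs link, in both directions."""
    same = {(s, o) for s, p, o in triples if p == "owl:sameAs"}
    same |= {(b, a) for a, b in same}  # sameAs is symmetric
    inferred = set(triples)
    for a, b in same:
        for s, p, o in triples:
            if s == a and p != "owl:sameAs":
                inferred.add((b, p, o))
    return inferred


for triple in sorted(infer(CLAIMS) - CLAIMS):
    print(triple)
# ('http://example.org/my-house/bottle-of-bowmore', 'created in', '1779')
```

Nobody ever said the bottle dates from 1779. Two individually harmless claims from two different people did.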
When using sameAs you need to decide
- is this the same thing in a different context
- or does the different context make it a different thing
- which can get tricky
Ceci n'est pas une pipe...
...and this is not Hamlet
![Photo of book cover of Hamlet](/staticarchive/fab927fd17dbf716264ba22026a6947b61efba17.jpg)
It is...
- a photograph of...
- ...one copy of...
- ...one edition of...
- ...one version of...
- ...Hamlet
<aside>
FRBR
- is a library science thing
- it stands for Functional Requirements for Bibliographic Records
- it's used to break down the book example into works, expressions, manifestations and items
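A sketch of the four FRBR levels as nested Python dataclasses, lined up against the Hamlet photo. The class layout and the descriptions are illustrative guesses, not part of FRBR itself, and the photograph sits outside the model altogether.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Work:            # the abstract creation: "Hamlet"
    title: str


@dataclass(frozen=True)
class Expression:      # one version of the work
    work: Work
    description: str


@dataclass(frozen=True)
class Manifestation:   # one edition of that version
    expression: Expression
    description: str


@dataclass(frozen=True)
class Item:            # one physical copy of that edition
    manifestation: Manifestation
    description: str


def lineage(item: Item) -> List[str]:
    """Walk from a single copy back up to the work it embodies."""
    manifestation = item.manifestation
    expression = manifestation.expression
    return [item.description, manifestation.description,
            expression.description, expression.work.title]


# Hypothetical descriptions, for illustration only.
hamlet = Work("Hamlet")
version = Expression(hamlet, "one version of the text")
edition = Manifestation(version, "one paperback edition")
copy = Item(edition, "the copy in the photograph")

print(" -> ".join(lineage(copy)))
# the copy in the photograph -> one paperback edition -> one version of the text -> Hamlet
```

Each level gets its own identity, so a claim about the copy (who owns it) never leaks onto the work (who wrote it).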
FRBR can also be used to describe music
![Music ontology](/staticarchive/7a27d643ca8bbacdabd3a97bf8669a8704c68bb7.jpg)
</aside>
Once you've minted a URI for a non-information resource
- anybody else can make claims about that resource (including sameAs)
- the claims may be true... or not
- we still need trust mechanisms
- [and we should be particularly careful when minting URIs for people]
Linked data can describe anything
There are vocabularies available for
And if an ontology doesn't exist
What's been made so far
Linking open data
- A community driven project...
- ...to take data sets published under a liberal licence...
- ...and express them as linked data.
The LOD cloud - baby steps
![Linking open data cloud in 2007](/staticarchive/15dac13790867102fb9cc5c9f9907cdf400c37b9.png)
The LOD cloud - today
![Linking open data cloud in 2009](/staticarchive/182def81ebb7559fba4d64b6e7d77fd34a4dd36b.png)