A skim-read introduction to linked data

Introductions

Software Engineer

Information Architect

BBC Audio and Music

Some egg sucking

The internet

...provides a means to connect machines

The internet ≠ the web

World Wide Web

The web

...provides a means to connect documents

The web = the internet

The web = the internet + links

The web = the internet + links + documents

...or....

The web = the internet + http + html

Web standards

https://en.wikipedia.org/wiki/Web_standards#Common_usage

When a web site or web page is described as complying with web standards, it usually means that the site or page has valid or nearly valid HTML, CSS and JavaScript. The HTML should also meet accessibility and semantic guidelines.

We tend to obsess over the documents:

"Semantic" HTML
HTML, HTML2, HTML3, HTML4, XHTML, HTML5...
DOM
?Flash?
microformats (µfs)
CSS
document accessibility
validation

...at the expense of the links:

HTTP
URIs

The trouble is...

...HTML has always been...

...a bit of a hack

Everything that's good about the web comes from links

wikis
blog permalinks
citations
sharing
social bookmarking
twitter
socialness in general...

If you can point at something you can talk about it and share it

The web = the internet + http + html

<aside>

On SEO (sorry)

in the days of AltaVista and Yahoo! mk 1, search was all about documents
metatags...
...and keyword density
Google changed the game with PageRank by making search...
...less about keyword density...
...and more about link density

</aside>

Magazines are made of pages....

...websites are made of links

One problem with the web

people aren't really interested in documents
they're interested in things

We need to get from this...

a web of documents

..to this

a web of things

The other problem with the web...

...people can parse documents and extract meaning...

meaning

...but machines can't

no meaning

We need to help machines to understand the web...

...so machines can help us to understand things

The semantic web

Mk 1

an attempt to make documents that machines could understand
using RDF
before we go any further...

<aside>

RDF

is a way to model data
is not a serialisation
can be serialised as XML
or inside HTML (as RDFa)
or N3
or Turtle
or even json

</aside>

The RDF data model

RDF

is based on triples
subject, predicate, object
<The sky> <has the colour> <blue>

More examples

<Yves> <was born in> <France>
<France> <is part of> <the EU>
<Michael> <was born in> <the UK>
<the UK> <is part of> <the EU>
<Yves> <likes> <open data>
<Michael> <likes> <open data>
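
The triples above can be sketched as plain Python tuples. This is only an illustration of the data model, not an RDF library: the `match` and `born_in_eu` helpers are made-up names, and the plain strings stand in for what would be URIs in real RDF.

```python
# Each triple is a (subject, predicate, object) tuple.
# Plain strings stand in for the URIs that real RDF would use.
TRIPLES = {
    ("Yves", "was born in", "France"),
    ("France", "is part of", "the EU"),
    ("Michael", "was born in", "the UK"),
    ("the UK", "is part of", "the EU"),
    ("Yves", "likes", "open data"),
    ("Michael", "likes", "open data"),
}


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def born_in_eu(triples):
    """Follow two links: person -> birthplace -> the EU."""
    eu_members = {s for s, _, _ in match(triples, p="is part of", o="the EU")}
    return sorted(s for s, _, o in match(triples, p="was born in") if o in eu_members)


print(born_in_eu(TRIPLES))
```

Nobody stated that Yves or Michael was born in the EU. The answer falls out of following links between triples, which only works because both statements use exactly the same name for each country.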

So what happened

lots of people made interlinked foaf files
but there was a shortage of vocabularies to describe other things
so the foaf files only really linked to other foaf files
it all became a little document-y
and less link-y

Semweb mk 1 = the internet + http + rdf

<yet_another_aside>

On REST

described by Roy Fielding
about the proper use of HTTP (GET, POST, PUT, DELETE)
separated the resource (abstract) from...
the representation (concrete-ish)

If I ask for a document about Yves...

...my user agent (browser) sends additional information...
...about what I accept in the form of accept headers
I may accept French but prefer English
I may accept JSON but prefer RDF
I may accept full fat xhtml but prefer xhtml-mp

The key is

a URI identifies a resource
not a representation
I ask for a resource - the server returns the best possible representation

Content negotiation - what I want / what I accept

I'd like this resource about Yves, I speak English but I can just about get by in French and I'd like it for my mobile, please

Content negotiation - what I'm given

Can't do you English but I've got French and can send as xhtml-mp. Here you go

Not always successful

the resource may exist
but with no representation that matches your accept headers
406 - Not acceptable
My favourite HTTP code :-)
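
A rough sketch of the negotiation just described, in Python. It is deliberately simplified and is not how any particular server does it: `parse_accept` and `negotiate` are hypothetical helper names, and wildcards such as `*/*` and language-range prefix matching are ignored.

```python
def parse_accept(header):
    """Parse an Accept-style header into (value, q) pairs, highest q first."""
    prefs = []
    for part in header.split(","):
        pieces = [p.strip() for p in part.split(";")]
        value = pieces[0]
        if not value:
            continue
        q = 1.0  # a missing q parameter means full preference
        for param in pieces[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0  # treat an unreadable weight as "not acceptable"
        prefs.append((value, q))
    # sorted() is stable, so ties keep the order the client listed them in
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)


def negotiate(accept_header, available):
    """Return (status, chosen) for the best representation on offer."""
    for value, q in parse_accept(accept_header):
        if q > 0 and value in available:
            return 200, value
    return 406, None  # the resource exists, but not in a form you accept


# "I speak English but I can just about get by in French"
print(negotiate("en, fr;q=0.5", {"fr", "de"}))
print(negotiate("en, fr;q=0.5", {"de"}))
```

The same function works for `Accept` (formats) and `Accept-Language` (languages), because both headers share the comma-separated, q-weighted shape.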

Content negotiation - what I want / what I accept

I'd like this resource about Yves, I speak English but I can just about get by in French and I'd like it for my mobile, please

Content negotiation - what I'm not given

I've got that resource but can only do German. 406

An honorary mention for One Web

The Web is designed as a universal space. Its universality is its most important facet. I spend many hours giving talks just to emphasize this point. The success of the Web stems from its universality as do most of the architectural constraints. #timbl
So one URI per resource; many representations per resource
Mobile demands a different representation...
...but not a different resource...
...and not a different URI
so no mobile. or .mobi etc

If you can point at something you can talk about it and share it... universally

</yet_another_aside>

Back to linked data

Linked data

is sort of a second pass at the semantic web
still uses rdf (although that can be contentious)
as the name implies puts the emphasis back on links
is about things not documents

Linked data = the internet + http + rdf

Linked data = web standards

Design issues for linked data

Use URIs as names for things
Use HTTP URIs so that people can look up those names
When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL)
Include links to other URIs, so that they can discover more things.

Use URIs as names for things (my emphasis)

separating resource from representation (REST) gets us so far
but it's still about getting resources as information documents...
...and we still think primarily about information resources

The map is not the territory

/programmes/b006m86d is an information resource about EastEnders
you can ask for and get it as html or xhtml-mp or rdf...
but it is not EastEnders
a document about Yves is an information resource about Yves
but it is not Yves

Non-information resources

why do we need the distinction?
is saying a document about Yves represents Yves good enough?
Yves was not created when that document was created
The person who created that document (Yves) did not create Yves
I am not friends with a document
The people who created /programmes/b006m86d did not create "EastEnders"

We want to be able to make different claims about the thing and the document about the thing

So we need URIs for non-information resources - stuff that you can't send down wires

What happens if someone asks for a non-information resource?

I'd like Yves, and by the way I speak English but I can just about get by in French

What happens if someone asks for a non-information resource?

Yves will not fit down the wires but (303) I can give you some information about him in English

Comparing the microformats and linked data approaches

Given 2 websites...

One owned by Michael, one owned by Yves

The microformats approach: XFN @rel

Colleague, friend, met

The linked data approach: 2 new non-information resources...

one for Michael, one for Yves

...each with a homepage...

via foaf:homepage

...tying together the 2 people, not the 2 documents

via foaf:knows
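
As a sketch, the linked data version comes down to three statements. The FOAF namespace and the `foaf:homepage` and `foaf:knows` properties are real. The URIs for the two people and their homepages are invented for illustration, and `to_ntriples` is a hypothetical helper that only handles URI-valued triples.

```python
FOAF = "http://xmlns.com/foaf/0.1/"

# Hypothetical URIs: one for each person, distinct from their homepages.
MICHAEL = "http://michael.example.org/#me"
YVES = "http://yves.example.org/#me"

TRIPLES = [
    (MICHAEL, FOAF + "homepage", "http://michael.example.org/"),
    (YVES, FOAF + "homepage", "http://yves.example.org/"),
    # The relationship is between the two people, not the two websites.
    (MICHAEL, FOAF + "knows", YVES),
]


def to_ntriples(triples):
    """Serialise URI-only triples as N-Triples, one statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


print(to_ntriples(TRIPLES))
```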

Designing URIs for non-information resources

Hashes

you can give the non-information resource a hash URI
/programmes/b006m86d#programme is the URI for EastEnders
if you attempt to dereference /programmes/b006m86d#programme it can't be sent down the wires...
...but there's no need for a 303...
.../programmes/b006m86d is content negotiated to either HTML or RDF depending on what you accept
and your user agent looks for a #programme inside it
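
The reason no 303 is needed can be shown with Python's standard `urllib.parse.urldefrag`: the fragment is split off on the client side and never reaches the server. The host name below is a placeholder; the path and the `#programme` fragment are the ones used on this slide.

```python
from urllib.parse import urldefrag

# Placeholder host; path and fragment as on the slide.
thing_uri = "http://example.org/programmes/b006m86d#programme"

# A user agent drops the fragment before making the HTTP request.
document_uri, fragment = urldefrag(thing_uri)

print(document_uri)  # the URI actually requested from the server
print(fragment)      # the part the client looks up inside the response
```

So the server only ever sees a request for the document, and content negotiation happens on that request as usual.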

Slashes

or you can give the non-information resource a completely different path
one such URI is the URI for Holborn
if you attempt to dereference it, Holborn can't be sent down the wires...
...so it 303s (see other)...
...to a second URI - an information resource about Holborn...
...which is then content negotiated to either HTML or RDF depending on what you accept
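
The slash approach can be sketched as a toy request handler. Everything here is an assumption made for illustration: the `/id/` and `/doc/` path layout, the `respond` and `get` function names, and the placeholder bodies. The negotiation is also cruder than real HTTP (it ignores q-values and wildcards).

```python
# Hypothetical layout: /id/... names the thing, /doc/... names the document.
REPRESENTATIONS = {
    "/doc/holborn": {
        "text/html": "<h1>Holborn</h1>",
        "application/rdf+xml": "<rdf:RDF>...</rdf:RDF>",  # placeholder body
    },
}


def respond(path, accept):
    """Return (status, headers, body) for a GET on path."""
    if path.startswith("/id/"):
        # The thing itself can't be sent: point at a document about it.
        doc_path = "/doc/" + path[len("/id/"):]
        if doc_path in REPRESENTATIONS:
            return 303, {"Location": doc_path}, ""
        return 404, {}, ""
    formats = REPRESENTATIONS.get(path)
    if formats is None:
        return 404, {}, ""
    # Simplified negotiation: the first listed media type we can serve wins.
    for media_type in (m.split(";")[0].strip() for m in accept.split(",")):
        if media_type in formats:
            return 200, {"Content-Type": media_type}, formats[media_type]
    return 406, {}, ""


def get(path, accept):
    """Follow at most one 303 redirect, as a user agent would."""
    status, headers, body = respond(path, accept)
    if status == 303:
        status, headers, body = respond(headers["Location"], accept)
    return status, headers, body


print(respond("/id/holborn", "text/html"))
print(get("/id/holborn", "application/rdf+xml"))
```

Note the two round trips: one to be told where the document lives, one to fetch it.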

URIs in pictures: slash + 303 + conneg

slash uris

You need to be able to configure your server for 303s and content negotiation.

URIs in pictures: hash + conneg

hash uris

Cheaper setup - no need to set up for 303s although you still need content negotiation. Fewer round trips to server.

URIs in pictures: RDFa

RDFa uris

Cheapest setup - no need to set up for 303s or content negotiation.

So, what's the point?

Different people know (or claim to know) different things about the same topic

The BBC knows when it's played a record by The Fall
Someone else knows all the records The Fall have ever released
Someone else knows The Fall are from Salford...
...and knows New Order are from Salford...
The BBC knows when it's played a record by New Order
etc
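
A small sketch of why this matters, using invented data. Two publishers describe the same artists using the same identifiers, so their statements can simply be pooled. The identifiers, the predicate names and the `plays_by_artists_from` helper are all hypothetical.

```python
# Illustrative data only: two publishers, one shared set of identifiers.
PLAYS = {  # what a broadcaster knows
    ("play:1", "features", "artist:the-fall"),
    ("play:2", "features", "artist:new-order"),
}
ORIGINS = {  # what some other source knows
    ("artist:the-fall", "is from", "place:salford"),
    ("artist:new-order", "is from", "place:salford"),
}


def plays_by_artists_from(place, *datasets):
    """Answer a question that neither dataset can answer on its own."""
    merged = set().union(*datasets)  # merging triples is just set union
    artists = {s for s, p, o in merged if p == "is from" and o == place}
    return sorted(s for s, p, o in merged if p == "features" and o in artists)


print(plays_by_artists_from("place:salford", PLAYS, ORIGINS))
print(plays_by_artists_from("place:salford", PLAYS))
```

With both datasets the query finds both plays; with the broadcaster's data alone it finds nothing, because the broadcaster never said where anyone is from.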

Linked data is a web-scale database

A special mention for owl:sameAs

owl:sameAs

allows us to declare that non-information resources across the web are the same thing
so the BBC's The Fall is the same as...
...the MusicBrainz The Fall is the same as...
...the DBpedia The Fall is the same as...
...the last.fm The Fall is the same as...
...etc

When sameAs goes wrong

Language is ambiguous
We often use the same labels to refer to different things in different contexts

An example stolen from Tom Heath

I have in my house a bottle of Bowmore single malt whisky...
...which is one bottle of one vintage (Bowmore 18 year old single malt)...
...which is one vintage of one brand (Bowmore single malt)...
...which is distilled by the Bowmore Distillery
The Distillery, the brand, the vintage and my bottle are not the same thing...
...but we use the same label

When we declare sameAs we need to be careful

If I said my bottle of Bowmore whisky was the sameAs the Bowmore Distillery and someone else said the Bowmore Distillery was created in 1779...
...then by implication my bottle would have been created in 1779...
...which would be quite a vintage
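
The whisky problem can be reproduced with a few lines of Python. This is a much reduced stand-in for what a real reasoner does with `owl:sameAs`: the `infer` function is hypothetical, it only copies statements across subjects, and the identifiers are made up.

```python
SAME_AS = "owl:sameAs"


def infer(triples):
    """Copy every statement across sameAs links (subjects only)."""
    # Group identifiers that have been declared to be the same thing.
    alias = {}
    for s, p, o in triples:
        if p == SAME_AS:
            group = alias.get(s, {s}) | alias.get(o, {o})
            for member in group:
                alias[member] = group
    inferred = set(triples)
    for s, p, o in triples:
        if p != SAME_AS:
            # Whatever is said about one alias is now said about all of them.
            for twin in alias.get(s, {s}):
                inferred.add((twin, p, o))
    return inferred


claims = {
    ("my-bottle", SAME_AS, "bowmore-distillery"),   # my careless claim
    ("bowmore-distillery", "created in", "1779"),   # someone else's claim
}

print(("my-bottle", "created in", "1779") in infer(claims))
```

Neither claim is absurd on its own. The absurd one only appears once the two are combined, which is why a sameAs link is a stronger statement than it looks.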

When using sameAs you need to decide

is this the same thing in a different context
or does the different context make it a different thing
which can get tricky

Ceci n'est pas une pipe...

...and this is not Hamlet

Photo of book cover of Hamlet

It is...

a photograph of...
...one copy of...
...one edition of...
...one version of...
...Hamlet

<aside>

FRBR

is a library science thing
it stands for Functional Requirements for Bibliographic Records
it's used to break down the book example into works, expressions, manifestations and items
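
One way to picture the four FRBR levels is as a chain of records, each pointing at the level above it. The class and field names below are illustrative and not taken from any FRBR implementation. Mapping "version", "edition" and "copy" from the Hamlet slide onto expression, manifestation and item is an assumption about how the example is meant to be read.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Work:
    title: str


@dataclass(frozen=True)
class Expression:
    work: Work
    description: str


@dataclass(frozen=True)
class Manifestation:
    expression: Expression
    description: str


@dataclass(frozen=True)
class Item:
    manifestation: Manifestation
    description: str


hamlet = Work("Hamlet")
version = Expression(hamlet, "one version")
edition = Manifestation(version, "one edition")
my_copy = Item(edition, "one copy")


def work_of(item):
    """Walk up from a physical item to the abstract work."""
    return item.manifestation.expression.work


print(work_of(my_copy).title)
```

Each level is a different thing with its own identity, so a claim made about the copy on the shelf is not automatically a claim about the play.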

FRBR can also be used to describe music

Music ontology

</aside>

Once you've minted a URI for a non-information resource

anybody else can make claims about that resource (including sameAs)
the claims may be true... or not
we still need trust mechanisms
[and we should be particularly careful when minting URIs for people]

Linked data can describe anything

There are vocabularies available for

erm,
etc

And if an ontology doesn't exist

you can make one

What's been made so far

Linking open data

A community driven project...
...to take data sets published under a liberal licence...
...and express them as linked data.

The LOD cloud - baby steps

Linking open data cloud in 2007

The LOD cloud - today

Linking open data cloud in 2009

Fin!

Questions?