
Zeitgeist - the most shared BBC links on Twitter


Theo Jones | 10:45 UK time, Wednesday, 14 July 2010

Zeitgeist is a prototype that highlights the most shared BBC webpages on Twitter: a digest linking people to the hottest BBC pages. The project is part of a larger exploration of how the BBC can use real-time trending data to enrich user experiences. One of our recent projects, for example, shows how the artists played on BBC radio are trending on other music services.

We developed Zeitgeist as a simple information source for users and to give our production teams insight into users' interests and behaviour. There are some interesting commercial alternatives available which are worth checking out, but we had some specific requirements for our prototype.

The system uses a custom-built ingest chain to search for tweets containing a BBC URL. Because it runs in real time, these links come and go depending on what Twitter users are talking about. You can see this 'liveness' by choosing a short time period, or take a broader view over a longer one.

Zeitgeist uses each web page's URL and metadata to determine where it comes from and to assign it a category. Categories give each link a context for the user and a means of navigating deeper.
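As a rough illustration of categorising a link from its URL first and its in-page metadata second, here is a minimal Python sketch. The path-to-category table, the category names and the `section` meta tag are hypothetical assumptions; the post does not say which rules Zeitgeist actually uses.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical path-prefix rules; the real Zeitgeist rules are not published.
PATH_CATEGORIES = {
    "/news": "News",
    "/sport": "Sport",
    "/iplayer": "iPlayer",
    "/programmes": "Programmes",
}


class MetaReader(HTMLParser):
    """Collect <meta name="..." content="..."> values from an HTML page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            fields = dict(attrs)
            name, content = fields.get("name"), fields.get("content")
            if name and content is not None:
                self.meta[name.lower()] = content


def is_bbc_host(host):
    host = host.lower()
    return host == "bbc.co.uk" or host.endswith(".bbc.co.uk")


def categorise(url, html=""):
    """Return a category label for a BBC URL, or None for non-BBC URLs."""
    parts = urlparse(url)
    if not is_bbc_host(parts.hostname or ""):
        return None
    # 1. Cheap check: the URL path alone often identifies the section.
    for prefix, category in PATH_CATEGORIES.items():
        if parts.path == prefix or parts.path.startswith(prefix + "/"):
            return category
    # 2. Fallback: a (hypothetical) "section" meta tag in the page itself.
    reader = MetaReader()
    reader.feed(html)
    reader.close()
    return reader.meta.get("section", "Other")
```

Checking the path first means the page's HTML is only needed when the URL alone is not recognised.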

Links are ranked by tweet count (including retweets) over the chosen time period. Each entry shows the page title, category, media type, a short description and when it was first tweeted. The publication date is shown where available, since it's not only new pages that get picked up on Twitter.
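The ranking amounts to counting tweets per URL inside a time window, plus remembering when each URL was first seen. A minimal Python sketch, assuming each tweet or retweet has already been reduced to a hypothetical `(timestamp, url)` pair:

```python
from collections import Counter
from datetime import datetime, timedelta


def rank_links(tweets, now, period, limit=10):
    """Rank URLs by tweet count (retweets included) within [now - period, now].

    `tweets` is an iterable of (timestamp, url) pairs, one per tweet or retweet.
    Returns up to `limit` (url, count) pairs, most-tweeted first.
    """
    start = now - period
    counts = Counter(url for ts, url in tweets if start <= ts <= now)
    return counts.most_common(limit)


def first_tweeted(tweets):
    """Map each URL to the earliest time it was seen in the stream."""
    first = {}
    for ts, url in tweets:
        if url not in first or ts < first[url]:
            first[url] = ts
    return first


# Usage: the last hour's chart from a few sample tweets.
now = datetime(2010, 7, 14, 10, 45)
sample = [
    (now - timedelta(minutes=50), "http://www.bbc.co.uk/news/a"),
    (now - timedelta(minutes=5), "http://www.bbc.co.uk/news/a"),
    (now - timedelta(hours=3), "http://www.bbc.co.uk/sport/b"),
    (now - timedelta(minutes=1), "http://www.bbc.co.uk/sport/b"),
]
print(rank_links(sample, now, timedelta(hours=1)))
# [('http://www.bbc.co.uk/news/a', 2), ('http://www.bbc.co.uk/sport/b', 1)]
```

Note that the sport link's "first tweeted" time is three hours ago even though only one of its tweets falls inside the hour, which is why the two are tracked separately.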

We have a different view for BBC employees (shown below), which shows the tweet history of each page, a full list of tweets, the most retweeted messages, hashtags and keywords. We are unable to show this to everyone because the messages would need to be moderated.

Zeitgeist detail page screenshot
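Summaries such as top hashtags and most retweeted messages are straightforward counts over the tweets linking to a page. A minimal Python sketch; the simplified hashtag pattern and the idea of counting retweets by their original tweet's id are assumptions, not a description of the staff view's actual code:

```python
import re
from collections import Counter

# A hashtag is '#' followed by word characters; a simplification of Twitter's rules.
HASHTAG = re.compile(r"#(\w+)")


def top_hashtags(texts, limit=5):
    """Count hashtags case-insensitively across the tweets linking to one page."""
    counts = Counter(tag.lower() for text in texts for tag in HASHTAG.findall(text))
    return counts.most_common(limit)


def most_retweeted(retweet_source_ids, limit=5):
    """Given the original-tweet id of each retweet, return the most retweeted ids."""
    return Counter(retweet_source_ids).most_common(limit)


texts = ["Great episode #DoctorWho", "#doctorwho #bbc tonight", "no tags here"]
print(top_hashtags(texts))  # [('doctorwho', 2), ('bbc', 1)]
```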

We use the Twitter streaming API to access the Gardenhose sample stream, which provides a subset of the full Twitter message stream at about 100 messages per second, and to track "BBC" as a keyword. These messages are then fed into a pipeline of processes written in Ruby and connected by queues provided by a fast and reliable messaging server.
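The general shape of processes connected by queues can be sketched as follows. This is Python rather than Ruby, and it uses in-process `queue.Queue` objects and threads as a stand-in for the messaging server, so it illustrates the pattern only; `run_stage`, `build_pipeline`, `drain` and the `STOP` sentinel are hypothetical names.

```python
import queue
import threading

STOP = object()  # sentinel: tells a stage to forward the signal and exit


def run_stage(func, inbox, outbox):
    """Apply func to each message from inbox; forward results that are not None."""
    while True:
        msg = inbox.get()
        if msg is STOP:
            outbox.put(STOP)
            return
        result = func(msg)
        if result is not None:  # returning None drops the message
            outbox.put(result)


def build_pipeline(funcs):
    """Connect one worker thread per function with queues; return (first, last) queues."""
    queues = [queue.Queue() for _ in range(len(funcs) + 1)]
    for func, inbox, outbox in zip(funcs, queues, queues[1:]):
        threading.Thread(target=run_stage, args=(func, inbox, outbox), daemon=True).start()
    return queues[0], queues[-1]


def drain(outbox):
    """Collect pipeline output until the STOP sentinel arrives."""
    results = []
    while (msg := outbox.get(timeout=5)) is not STOP:
        results.append(msg)
    return results


first, last = build_pipeline([
    str.strip,                                           # tidy the raw message
    lambda text: text if "bbc.co.uk" in text else None,  # keep BBC links only
])
for message in ["  see http://www.bbc.co.uk/news  ", "no link here", STOP]:
    first.put(message)
print(drain(last))  # ['see http://www.bbc.co.uk/news']
```

Because each stage only sees its own inbox and outbox, a stage can be developed, tested or replaced without touching its neighbours.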

These are the stages that each incoming tweet goes through:

  1. Twitter combines each retweet with its original tweet, so the two are split apart and both messages are delivered to the pipeline
  2. A tweet from the API contains a lot of extraneous data which needs to be removed, such as the user's page background colour
  3. Links in the message are extracted and resolved by following redirects and expanding shortened links; bit.ly provides an API for this
  4. Only tweets containing links to BBC pages are kept. Automatically generated tweets from BBC accounts are filtered out, and links to certain pages are also removed, as they skew the results
  5. These are saved to the database
  6. The link category is determined by its domain and in-page metadata
Zeitgeist ingest chain diagram
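Steps 1, 2 and 4 can be sketched as small functions over tweet dictionaries. The field names (`retweeted_status`, `entities.urls`, `expanded_url`, `user.screen_name`) follow Twitter's JSON tweet format, but the functions, the kept-field list and the blocked-accounts set are hypothetical, and the sketch assumes URLs have already been resolved by step 3.

```python
from urllib.parse import urlparse

# Fields later stages need; everything else (e.g. user.profile_background_color) is dropped.
KEEP_FIELDS = ("id", "text", "created_at")


def split_retweet(tweet):
    """Step 1: yield the tweet itself, plus the original tweet embedded in a retweet."""
    yield tweet
    original = tweet.get("retweeted_status")
    if original is not None:
        yield original


def strip_tweet(tweet):
    """Step 2: reduce a full API tweet to a small dict."""
    slim = {key: tweet[key] for key in KEEP_FIELDS if key in tweet}
    slim["screen_name"] = (tweet.get("user") or {}).get("screen_name")
    links = (tweet.get("entities") or {}).get("urls", [])
    candidates = (link.get("expanded_url") or link.get("url") for link in links)
    slim["urls"] = [url for url in candidates if url]
    return slim


def is_bbc_url(url):
    host = (urlparse(url).hostname or "").lower()
    return host == "bbc.co.uk" or host.endswith(".bbc.co.uk")


def filter_bbc(tweet, blocked_accounts):
    """Step 4: drop automated accounts, keep only BBC links; None means 'discard'."""
    if tweet["screen_name"] in blocked_accounts:
        return None
    bbc_links = [url for url in tweet["urls"] if is_bbc_url(url)]
    return {**tweet, "urls": bbc_links} if bbc_links else None
```

Stripping early (step 2) keeps every later message small, which matters when the same tweet passes through several queues.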

We split these steps into separate processes for two reasons: it's easier to develop and test a process if it does only one thing and, more importantly, it lets us balance the different parts of the system according to load. For example, only one process is needed to strip data out of tweets, but ten are needed to resolve URLs. Balancing the load this way lets us maintain a steady throughput of messages without any stage becoming overloaded.
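URL resolution is the slow, network-bound stage, which is why it gets the most workers. The Python sketch below follows redirects with HTTP HEAD requests (the comments below note that Zeitgeist follows redirects for shorteners other than bit.ly, which it expands through the bit.ly API) and spreads the work over a pool of ten threads. `head_location`, `resolve`, `resolve_all` and the five-hop limit are assumptions for illustration; threads stand in for separate processes, and network errors are not handled.

```python
import http.client
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urljoin, urlsplit

REDIRECT_STATUSES = {301, 302, 303, 307, 308}


def head_location(url, timeout=5):
    """Send an HTTP HEAD request; return the absolute redirect target, or None."""
    parts = urlsplit(url)
    conn_class = (http.client.HTTPSConnection if parts.scheme == "https"
                  else http.client.HTTPConnection)
    conn = conn_class(parts.netloc, timeout=timeout)
    path = (parts.path or "/") + (f"?{parts.query}" if parts.query else "")
    try:
        conn.request("HEAD", path)
        response = conn.getresponse()
        location = response.getheader("Location")
        if response.status in REDIRECT_STATUSES and location:
            return urljoin(url, location)  # Location may be relative
        return None
    finally:
        conn.close()


def resolve(url, get_location=head_location, max_hops=5):
    """Follow redirects until a URL stops redirecting, a loop appears, or max_hops is hit."""
    seen = {url}
    for _ in range(max_hops):
        target = get_location(url)
        if target is None or target in seen:
            break
        seen.add(target)
        url = target
    return url


def resolve_all(urls, get_location=head_location, workers=10):
    """Resolve many URLs concurrently, preserving input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda u: resolve(u, get_location), urls))
```

Passing `get_location` as a parameter keeps the resolver testable offline: a dictionary's `.get` method can stand in for the network.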

Building Zeitgeist has meant handling large data sets at high speed. As a rough guide, the ingest chain handles about 300,000 tweets an hour; of those, 900 contain links, 500 of which link to the BBC. We have also found that short lists work well, as tweet counts drop off steeply lower down the chart and, as you might expect, the majority of links point to BBC News articles.
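Putting those rough hourly figures on a per-second basis shows how much the pipeline filters out (simple arithmetic on the numbers above, nothing more):

```latex
\begin{aligned}
\frac{300\,000\ \text{tweets}}{3600\ \text{s}} &\approx 83\ \text{tweets/s} \\
\frac{900}{300\,000} &= 0.3\%\ \text{of tweets contain links} \\
\frac{500}{900} &\approx 56\%\ \text{of those link to the BBC} \\
\frac{3600\ \text{s}}{500} &= 7.2\ \text{s between BBC links, on average}
\end{aligned}
```

The average of roughly 83 tweets per second sits below the peak of 150 tweets per second mentioned in the comments.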

Zeitgeist is now up and running for a limited period and we trust that you'll find it an interesting resource. We think a system like this could feed into BBC Search as a ranking algorithm, provide an additional real-time feed for News recommendations, or power a 'news on the move' mobile service. In any case, it shows how audiences can help shape and prioritise content.

Visit the BBC prototype

Comments

  • Comment number 1.

    Zeitgeist looks interesting; it reminds me a bit of Shownar, but obviously covers much more than just programmes. Does Zeitgeist make any attempt to match up content that is available at multiple URLs (e.g. /programmes and /iplayer)?

    Also, you mentioned using the bit.ly API. What benefits does this provide over sending an HTTP HEAD request to the URL and seeing where it redirects to?

  • Comment number 2.

    Hi @lucas42...

    1. It doesn't at the moment; it is based purely on bbc.co.uk URLs, but we could add special rules for programmes (PIDs) later.

    2. I think bit.ly prefer you to use the API if you're expanding lots of links; I guess it's more efficient for them. We just follow redirects for other shortening services.

  • Comment number 3.

    Hi,

    Very interesting ingest chain. What is the first process, the one connected to the Gardenhose, written in: Ruby, Node.js or something else?

  • Comment number 4.

    Hi Tom,

    The entire pipeline is written in Ruby. The process that connects to the Twitter API is a custom client built on existing Ruby libraries. We've found this is perfectly capable of handling our use case of up to 150 tweets/second.

    I'll be publishing a detailed technical blog post later this week. Watch this space!

    Regards,
    Sean


  • Comment number 5.

    I DON'T CARE!!

    I do care that you're always playing with MY money to mess around with these fleeting, COMMERCIAL technologies.

    HELL'S TEETH, BBC, WRITE YOUR OWN!

  • Comment number 6.

    Can't believe they stole a very meaningful and powerful word and used it for something so irrelevant. Ugggh (vomits)



    A word that has also been associated with an anti-corporation movement

