Hypothetical architecture to federate well known micro blogging services

Background

You are probably familiar with popular social networking websites like Facebook and MySpace, and may have experienced the intuitive user experience they offer. Most social networking websites give each user a publicly accessible profile page containing that user’s personal information, such as a biography, photos, videos and other shareable items. Many of them now go further and display the micro blog updates associated with each user profile. For example, a website X may display the Twitter updates of user Y on his profile page. Whenever user Y updates his status on Twitter, his profile on website X is updated automatically; in other words, it is synchronized with Twitter.

Requirement

Imagine I’m a web developer planning to build a social networking website that hosts profiles of different people. To keep up with the current trend, I need to display micro blog updates on each user’s profile page. If the user happens to have a Twitter account, I have to display his Twitter feed on the profile page. Obviously, most users have accounts on several micro blogging websites besides Twitter, so I have to offer connectivity and synchronization with those as well.

Problem

In the end I decided to display micro blog updates from five popular websites: Twitter, Flickr, YouTube, Facebook and WordPress (you can substitute any other five). Whenever the user makes a change on one of these websites, my website MUST synchronize the change and display it on the user’s profile page. For instance, if the user publishes a post on WordPress, my website should automatically publish a link (a shortened URL of the post) on his profile. But how?

As you know, most micro blogging websites provide their own API for accessing their content (such as feeds, users and search) programmatically via simple HTTP requests. The responses are in XML or JSON format. To consume their content in my web application, I have to write a separate client script for each of these websites. If any website modifies its API, I have to rewrite my script too.

Different websites produce different output, and there is no unified format across the sites. Twitter’s status update notification response is different from Flickr’s new photo upload notification response. So I have to be aware of every website I’m pulling content from, and parse and convert their responses into a common format before I can make use of them.
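
The conversion step described above could be sketched roughly as follows. This is only an illustration: the class names (`Update`, `UpdateAdapter`, `StatusAdapter`, `PhotoAdapter`) and every field name in the raw maps are hypothetical placeholders, not the real response fields of Twitter or Flickr.

```java
import java.util.HashMap;
import java.util.Map;

// Common, service-independent representation of one update (hypothetical).
final class Update {
    final String source;
    final String author;
    final String text;
    final String link;

    Update(String source, String author, String text, String link) {
        this.source = source;
        this.author = author;
        this.text = text;
        this.link = link;
    }
}

// One adapter per service converts that service's parsed response into an Update.
interface UpdateAdapter {
    Update adapt(Map<String, String> raw);
}

// Field names such as "screen_name" and "status" are assumed for illustration only.
final class StatusAdapter implements UpdateAdapter {
    @Override
    public Update adapt(Map<String, String> raw) {
        return new Update("twitter", raw.get("screen_name"), raw.get("status"), raw.get("url"));
    }
}

// A photo upload carries a title rather than a status text, so the adapter builds one.
final class PhotoAdapter implements UpdateAdapter {
    @Override
    public Update adapt(Map<String, String> raw) {
        String text = "New photo: " + raw.get("title");
        return new Update("flickr", raw.get("owner"), text, raw.get("photo_page"));
    }
}
```

The profile page then only deals with `Update` objects, but somebody still has to write and maintain one adapter per website.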

Another issue is synchronization. I should provide at least near real time synchronization with the above five websites. The simplest and most common way is to poll each website periodically for new content. To get the most up-to-date content, I have to poll all five websites repeatedly. This is rather unpleasant and an overhead for my web server.
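
To make the polling overhead concrete, here is a minimal sketch of one polling pass. The `Poller` class is hypothetical, and the `Supplier` passed to `register` merely stands in for a real HTTP request; it is assumed to return the numeric ids of a service's recent updates.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Remembers the highest update id seen per service so each pass reports only new items.
final class Poller {
    private final Map<String, Supplier<List<Long>>> sources = new HashMap<>();
    private final Map<String, Long> lastSeen = new HashMap<>();

    // 'fetch' is a placeholder for an HTTP call to the service's API.
    void register(String service, Supplier<List<Long>> fetch) {
        sources.put(service, fetch);
        lastSeen.put(service, 0L);
    }

    // One polling pass: every registered service is contacted, changed or not.
    List<String> pollOnce() {
        List<String> fresh = new ArrayList<>();
        for (Map.Entry<String, Supplier<List<Long>>> e : sources.entrySet()) {
            long seen = lastSeen.get(e.getKey());
            long newest = seen;
            for (long id : e.getValue().get()) {
                if (id > seen) {
                    fresh.add(e.getKey() + ":" + id);
                    newest = Math.max(newest, id);
                }
            }
            lastSeen.put(e.getKey(), newest);
        }
        return fresh;
    }
}
```

A scheduler such as `ScheduledExecutorService.scheduleAtFixedRate` could call `pollOnce()` every few seconds. The cost is that each pass contacts all five websites for every profile, even when nothing has changed.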

To summarize the above, the following problems arise:

  1. I have to write API clients for each website and keep up with their API updates.
  2. I have to validate, parse and convert their content into a common format myself.
  3. I have to poll each website repeatedly for new content, which could overload my server.

Figure: Problems arising from the current approach

Solution

To solve the above problems, I’d like to suggest a new architecture for consuming micro blog updates from well known websites.

Figure: Suggested architecture for federation

This architecture consists of well known micro blogging services (publishers), their subscribers (nodes) and a centralized hub. There can be many hubs, and each hub can have many nodes subscribed to it. In this architecture, I decided to keep the complexity in the central hub and keep the nodes simple.

This architecture is somewhat different from Google’s PubSubHubbub. In PubSubHubbub, publishers can push their updates to the hub, notifying it of new content. In this architecture, however, the hub polls its publishers and pulls new content itself. This is because there is no way to make the micro blogging services notify the hub when new content arrives. If there is a way to do that, please suggest it to me.

In this architecture, all I have to do is tell the hub which micro blogging services I’m interested in. The hub then takes responsibility for pulling new content and notifying me of the changes.

Hub – This is a server application written with Java NIO and the critical component of this architecture. It shields its clients by taking on the following responsibilities:

  • The hub already implements the client scripts for several well known micro blogging websites, including Twitter and Flickr, so consumers do not need to write API clients of their own. This makes their development process more efficient and convenient.
  • The hub pulls content from several micro blogging services and converts it into a common, unified format that is completely detached from the original formats. Consumers are no longer required to validate, parse and convert the responses of micro blogging services themselves, because the hub takes on that responsibility.
  • The hub periodically polls each micro blogging service for new updates. Whenever the hub receives new updates, it notifies its subscribed nodes. All update notifications flowing from the hub to the nodes are in a single unified format (RSS/Atom), so consumers are no longer required to poll for new content.
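
The subscribe-and-notify part of these responsibilities might look roughly like the sketch below. The design is not finished, so everything here is an assumption: the `Hub` class, its `subscribe` and `publish` methods, and the use of an in-process callback where a real node would more likely register a callback URL.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical hub core: keeps track of which nodes want updates from which service.
final class Hub {
    // service name -> callbacks of the nodes subscribed to that service
    private final Map<String, List<Consumer<String>>> subscribers = new HashMap<>();

    // A node tells the hub which micro blogging service it is interested in.
    void subscribe(String service, Consumer<String> node) {
        subscribers.computeIfAbsent(service, k -> new ArrayList<>()).add(node);
    }

    // Called by the hub's own polling code once it has converted a new update
    // into the unified format (for example an Atom entry). Returns the number
    // of nodes that were notified.
    int publish(String service, String unifiedEntry) {
        List<Consumer<String>> nodes =
                subscribers.getOrDefault(service, Collections.<Consumer<String>>emptyList());
        for (Consumer<String> node : nodes) {
            node.accept(unifiedEntry);
        }
        return nodes.size();
    }
}
```

A node would subscribe once, for instance `hub.subscribe("twitter", entry -> showOnProfile(entry))` with a hypothetical `showOnProfile` method, and never poll Twitter itself.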

Implementation

Currently I’m in the process of designing the hub. I’ll use Java as the programming language, and the code will be hosted at Google Code. I’ll publish the design documents soon and keep you informed of progress by extending this post.

I warmly welcome your comments on this architecture. Perhaps you could contribute by adding an API client for a well known micro blogging service like Flickr, or help me enhance this architecture by introducing the concept of web hooks.

6 thoughts on “Hypothetical architecture to federate well known micro blogging services”

  1. How do you plan to handle authentication? If it’s OAuth that’s fine, but what if users have to give their username/password to a 3rd party in order to aggregate all this content from social media sites?! Seems a bit far fetched 🙂 BTW, this sounds a lot like what FriendFeed does (http://friendfeed.com/). Will this be something like an open source version of FriendFeed, where users can host the aggregator themselves?

    1. Thanks Chintana for the comment,
      This architecture should be considered a service rather than an aggregation of feeds. If you need to get Twitter updates for a given user profile, you don’t have to write a client script from scratch. Instead, you delegate the hard part to the hub by invoking several methods on it. The hub does the authentication, fetching and formatting of feeds from various feed sources and forwards the final outcome to your website/callback URL. If you need to get Flickr updates for a profile, the process is the same. So this architecture is aimed at developers rather than end users.

      When it comes to authentication, the hub takes care of authenticating user profiles against the various feed sources. OpenID, SAML and a claims based security token service can be used at the hub level. At the moment I haven’t worked out the security model in depth, but I will in the future.

      The hub can employ Google’s Social Graph API to discover a user’s different profiles based on a single profile URL submitted to the hub. Based on this profile URL, the hub can discover several profiles for the same user and may attempt to establish connections to them using the preferred authentication method. So micro blogging services are not hard coded to the user profile; instead they are dynamic and discoverable.

      I think this is what differentiates this architecture from FriendFeed.

      Thanks,
      Dunith

    1. Thanks Arindan,
      Actually, several micro blogging aggregation services already exist. The well known “Yahoo Query Language” is a leading one. Using those services, making a mashup of feeds is a simple task.
      I’m thinking about making the micro blog mashup more semantic rather than just cloning the feeds as they are.

      • Dunith
