Perhaps you are familiar with well-known social networking websites like Facebook and MySpace and have experienced the intuitive user experience they offer. Most social networking websites give each user a publicly accessible profile page containing that user's personal information: a biography, photos, videos and any other shareable things. Many of these websites now go beyond that and display the micro blog updates associated with each user profile. For example, a website X may display the Twitter updates of user Y on his profile page. Whenever user Y updates his status on Twitter, his profile on website X is updated automatically; in other words, it is synchronized with Twitter.
Imagine I'm a web developer planning to build a social networking website that hosts profiles of different people. To keep up with the current trend, I need to display micro blog updates on each user's profile page. If the user happens to have a Twitter account, I have to display his Twitter feed on the profile page. Of course, many users have accounts on several micro blogging websites besides Twitter, so I have to offer connectivity and synchronization with more than one service.
Finally, I decided to display micro blog updates from five well-known websites: Twitter, Flickr, YouTube, Facebook and WordPress (you can substitute any other five). Whenever the user makes a change on one of these websites, my website MUST synchronize the change and display it on the user's profile page. For instance, if the user publishes a blog post on WordPress, my website should automatically publish a link (a shortened URL of the post) on his profile. But how?
As you know, most micro blogging websites provide their own API for accessing their content (such as feeds, users and search) programmatically via simple HTTP requests. The responses are in XML or JSON format. To consume their content in my web application, I have to write a client script for each of these websites. If any website modifies its API, I have to rewrite the corresponding script too.
Different websites produce different output, and there is no unified format across all of them. Twitter's status update notification response is different from Flickr's new photo upload notification response. So I have to know the details of every website I'm pulling content from, and parse and cast their responses into a common format before I can make use of them.
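The "parse and cast into a common format" step could be sketched roughly as below. This is only an illustration under my own assumptions: the class names (`UnifiedUpdate`, `UpdateAdapter`, `StatusAdapter`, `PhotoAdapter`) and the field keys in the raw maps are hypothetical and are not the real response formats of Twitter or Flickr.

```java
import java.util.Map;

// Hypothetical common shape that every service-specific response is cast into.
final class UnifiedUpdate {
    final String service;
    final String author;
    final String summary;

    UnifiedUpdate(String service, String author, String summary) {
        this.service = service;
        this.author = author;
        this.summary = summary;
    }
}

// One adapter per service hides that service's own field names.
interface UpdateAdapter {
    UnifiedUpdate toUnified(Map<String, String> rawFields);
}

// Adapter for a status-update style response (field keys are assumptions).
final class StatusAdapter implements UpdateAdapter {
    @Override
    public UnifiedUpdate toUnified(Map<String, String> raw) {
        return new UnifiedUpdate("twitter", raw.get("screen_name"), raw.get("text"));
    }
}

// Adapter for a photo-upload style response (field keys are assumptions).
final class PhotoAdapter implements UpdateAdapter {
    @Override
    public UnifiedUpdate toUnified(Map<String, String> raw) {
        return new UnifiedUpdate("flickr", raw.get("owner"), "New photo: " + raw.get("title"));
    }
}
```

With adapters like these, the profile page only ever deals with `UnifiedUpdate` objects, but every new service still needs its own adapter, which is exactly the maintenance burden described above.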
Another issue is synchronization. I should provide at least near real-time synchronization with the five websites above. The simplest and most common way is to poll each website periodically for new content. To always have the latest content, I have to poll all five websites repeatedly. This is rather unpleasant and adds overhead to my web server.
To summarize the scenarios above, I face the following problems:
- I have to write an API client for each website and keep up with their API changes.
- I have to validate, parse and cast their content into a common format myself.
- I have to poll each website repeatedly for new content, which puts extra load on my server.
To solve the problems above, I'd like to suggest a new architecture for consuming micro blog updates from well-known websites.
This architecture is made up of well-known micro blogging services (publishers), their subscribers (nodes) and a centralized hub. There can be many hubs, and each hub can have many nodes subscribed to it. In this architecture, I decided to keep the complexity in the central hub and keep the nodes simple.
This architecture is somewhat different from Google's PubSubHubbub. In PubSubHubbub, publishers notify the hub when they have new updates, so updates are effectively pushed into the hub. In this architecture, the hub polls its publishers and pulls new content from them. This is because there is no way to tell the micro blogging services to notify the hub when new content arrives. If you know of a way to do that, please suggest it.
In this architecture, all I have to do is tell the hub which micro blogging services I'm interested in. The hub then takes responsibility for pulling the new content and notifying me of the changes.
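To make "telling the hub what I'm interested in" a little more concrete, here is a minimal sketch of the bookkeeping a hub might do on its side. Everything in it is an assumption of mine rather than part of a finished design: the class name `SubscriptionRegistry`, the idea that a node identifies itself by a callback URL, and the use of plain service names as keys.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical registry kept by the hub: service name -> callback URLs of subscribed nodes.
final class SubscriptionRegistry {
    private final Map<String, Set<String>> nodesByService = new HashMap<>();

    // A node declares interest in a service by registering its callback URL.
    // Registering the same URL twice has no additional effect.
    void subscribe(String service, String callbackUrl) {
        nodesByService.computeIfAbsent(service, k -> new LinkedHashSet<>()).add(callbackUrl);
    }

    // A node withdraws its interest in a service.
    void unsubscribe(String service, String callbackUrl) {
        Set<String> nodes = nodesByService.get(service);
        if (nodes != null) {
            nodes.remove(callbackUrl);
        }
    }

    // Read-only view of the nodes the hub must notify when the service has new content.
    Set<String> subscribersOf(String service) {
        return Collections.unmodifiableSet(
                nodesByService.getOrDefault(service, Collections.emptySet()));
    }
}
```

A node would then only need something like `registry.subscribe("twitter", "http://example.org/callback")` (a made-up URL) and could leave all the pulling to the hub.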
Hub – This is a server application written with Java NIO and the critical component of this architecture. It shields its clients by taking on the following responsibilities itself.
- The hub already implements the client scripts for several well-known micro blogging websites, including Twitter and Flickr. So consumers do not need to write API clients of their own, which makes their development process more efficient and convenient.
- The hub pulls content from several micro blogging services and converts it into a common, unified format that is completely detached from the original formats. So consumers are no longer required to validate, parse and cast the responses of the micro blogging services themselves, because the hub takes on that responsibility.
- The hub periodically polls each micro blogging service to pull new updates. Whenever the hub receives new updates, it notifies its subscribed nodes. All update notifications flowing from the hub to the nodes are in a single unified format (RSS/Atom). So consumers are no longer required to poll for new content.
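The poll-then-notify responsibility could look roughly like the sketch below. It is not the actual hub design, which is still in progress: the names `Publisher`, `SubscriberNode` and `PollingHub` are hypothetical, entries are plain strings standing in for unified RSS/Atom entries, and de-duplication by comparing whole entries is a simplifying assumption.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical view of one micro blogging service as seen by the hub.
interface Publisher {
    List<String> fetchLatest(); // entries already converted to the common format
}

// Hypothetical callback through which a subscribed node receives updates.
interface SubscriberNode {
    void onUpdate(String entry);
}

final class PollingHub {
    private final List<Publisher> publishers = new ArrayList<>();
    private final List<SubscriberNode> nodes = new ArrayList<>();
    // Entries already delivered; grows without bound in this sketch.
    private final Set<String> seen = new HashSet<>();

    synchronized void addPublisher(Publisher publisher) {
        publishers.add(publisher);
    }

    synchronized void addNode(SubscriberNode node) {
        nodes.add(node);
    }

    // One polling round: pull from every publisher, push only unseen entries to the nodes.
    synchronized void pollOnce() {
        for (Publisher publisher : publishers) {
            for (String entry : publisher.fetchLatest()) {
                if (seen.add(entry)) {
                    for (SubscriberNode node : nodes) {
                        node.onUpdate(entry);
                    }
                }
            }
        }
    }

    // Repeats pollOnce at a fixed period; the caller is responsible for shutting it down.
    ScheduledExecutorService start(long periodSeconds) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(this::pollOnce, 0, periodSeconds, TimeUnit.SECONDS);
        return scheduler;
    }
}
```

The point of the sketch is that the polling loop lives in one place: nodes only implement `onUpdate` and never talk to the micro blogging services directly.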
Currently I'm in the process of designing the hub. I'll use Java as the programming language, and the code will be hosted at Google Code. I'll publish the design documents soon and keep you informed of progress by extending this post.
I warmly welcome your comments on this architecture. Perhaps you could contribute by adding an API client for a well-known micro blogging service like Flickr, or you could help me enhance this architecture by introducing the concept of web hooks.