Real-time web: Making the web real time with push technology


The annoying son and the angry father

A boy tells his father, “Hey dad! I want to go to the zoo right now!” So his father takes him to the car and starts driving to the zoo. The boy is so excited about the trip and so eager to get to the zoo that, while they are travelling, he repeatedly asks his father, “Are we there yet?” And his father replies, “No.”

Boy: Are we there yet?
Dad: No

Boy: Are we there yet?
Dad: No

Boy: Are we there yet?
Dad: No

The boy is so annoying that his angry father tells him, “Will you stop asking the same question? Otherwise I’ll take you back home!”

The World Wide Web in the good old days

If you read the above story carefully, you can find a similarity between it and the traditional web. It is exactly analogous to an RSS reader repeatedly polling a web server for new content: the annoying boy is the RSS reader and the angry father is the web server.

This has been the nature of the web from its beginning. Since the web is based on the client-server architecture, the client always has to initiate the request and the server has to respond. The client doesn’t know exactly when new content becomes available on the server, so it has to poll the server periodically. That’s why your email client repeatedly polls the mail server for new email, and why your news reader repeatedly polls the news server for the latest news.

This “pull model” causes many problems when it comes to bandwidth consumption. Continuous polling generates more traffic on the network and can overload the server hardware. Because of these issues, several alternatives have come up during the last few years.
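
To put rough numbers on that waste, the small Java simulation below (not a real network client, Java 16+ for the record) counts how many periodic polls come back empty and how stale each new item is by the time a poll picks it up. The class name, the publish times and the one-minute interval are made-up values chosen only for illustration.

```java
import java.util.List;

/** Toy model of the pull model: counts polls that return nothing new. */
public class PollingCost {

    /** Outcome of polling for a fixed duration at a fixed interval. */
    record Result(int totalPolls, int emptyPolls, long worstDelaySeconds) {}

    /**
     * @param publishTimes    seconds at which the server publishes new items (hypothetical)
     * @param intervalSeconds how often the client asks "Are we there yet?"
     * @param durationSeconds how long the client keeps polling
     */
    static Result simulate(List<Long> publishTimes, long intervalSeconds, long durationSeconds) {
        int totalPolls = 0;
        int emptyPolls = 0;
        long worstDelay = 0;
        long lastPoll = 0; // items published after this moment are "new" at the next poll
        for (long t = intervalSeconds; t <= durationSeconds; t += intervalSeconds) {
            totalPolls++;
            boolean gotSomething = false;
            for (long published : publishTimes) {
                if (published > lastPoll && published <= t) {
                    gotSomething = true;
                    worstDelay = Math.max(worstDelay, t - published); // how stale the item was on arrival
                }
            }
            if (!gotSomething) {
                emptyPolls++;
            }
            lastPoll = t;
        }
        return new Result(totalPolls, emptyPolls, worstDelay);
    }

    public static void main(String[] args) {
        // Three new items within one hour, polled once a minute.
        Result r = simulate(List.of(610L, 1530L, 3001L), 60, 3600);
        System.out.println(r); // prints Result[totalPolls=60, emptyPolls=57, worstDelaySeconds=59]
    }
}
```

With three items in an hour and a one-minute interval, 57 of the 60 polls return nothing, and an item can still wait up to 59 seconds. Shortening the interval cuts that delay but multiplies the empty polls; a push model avoids the trade-off by sending one message per new item.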

What is real-time web?

“The Real-Time Web is a paradigm based on pushing information to users as soon as it’s available – instead of requiring that they or their software check a source periodically for updates. It can be enabled in many different ways and can require a different technical architecture. It’s being implemented in social networking, search, news and elsewhere – making those experiences more like Instant Messaging and facilitating unpredictable innovations. Early benefits include increased user engagement (“flow”) and decreased server loads, but these are early days. Real-time information delivery will likely become ubiquitous, a requirement for almost any website or service.” – Read Write Web, 2009

Practically speaking, social networks took the first steps toward making the web real time. If you are a Facebook or Twitter fanatic, you may have experienced their real-time news feeds and status updates, which feel much like instant messaging. Once a friend tags you in a photograph, Facebook notifies you immediately. Whenever a new tweet arrives, Twitter notifies you. Likewise, social networks offer their users more real-time experiences day by day.

How do we make the web real time?

As I mentioned earlier, there have been several attempts to make the traditional web real time. Some of them are very straightforward, while others are technically challenging. Below I describe several of these approaches.

1. Web hooks

The concept of a WebHook is simple. A WebHook is an HTTP callback: an HTTP POST that occurs when something happens; a simple event-notification via HTTP POST.

A web application implementing WebHooks will POST a message to a URL when certain things happen. When a web application enables users to register their own URLs, the users can then extend, customize, and integrate that application with their own custom extensions or even with other applications around the web. For the user, WebHooks are a way to receive valuable information when it happens, rather than continually polling for that data and receiving nothing valuable most of the time.
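
Both sides of a WebHook fit in a few lines of plain JDK code (Java 11 or later). In this sketch the built-in `com.sun.net.httpserver.HttpServer` stands in for the user’s registered callback URL, and `java.net.http.HttpClient` stands in for the application that fires the event. The `Publisher` class, the `/hook` path and the JSON payload are hypothetical; real services define their own payload formats.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class WebHookDemo {

    /** Publisher side (hypothetical): keeps users' callback URLs and POSTs to each when something happens. */
    static class Publisher {
        private final List<URI> callbacks = new ArrayList<>();
        private final HttpClient client =
                HttpClient.newBuilder().version(HttpClient.Version.HTTP_1_1).build();

        void register(URI callback) {
            callbacks.add(callback);
        }

        /** Fires the event: one HTTP POST per registered URL; returns the status codes. */
        List<Integer> fire(String jsonPayload) throws IOException, InterruptedException {
            List<Integer> statuses = new ArrayList<>();
            for (URI url : callbacks) {
                HttpRequest request = HttpRequest.newBuilder(url)
                        .header("Content-Type", "application/json")
                        .POST(HttpRequest.BodyPublishers.ofString(jsonPayload))
                        .build();
                statuses.add(client.send(request, HttpResponse.BodyHandlers.discarding()).statusCode());
            }
            return statuses;
        }
    }

    /** Subscriber side: exposes a local endpoint, registers it, fires one event, returns what arrived. */
    static String roundTrip(String payload) throws Exception {
        BlockingQueue<String> received = new LinkedBlockingQueue<>();
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/hook", exchange -> {
            try (InputStream in = exchange.getRequestBody()) {
                received.add(new String(in.readAllBytes(), StandardCharsets.UTF_8));
            }
            exchange.sendResponseHeaders(204, -1); // 204 No Content: acknowledge receipt
            exchange.close();
        });
        server.start();
        try {
            Publisher publisher = new Publisher();
            publisher.register(URI.create("http://127.0.0.1:" + server.getAddress().getPort() + "/hook"));
            List<Integer> statuses = publisher.fire(payload);
            if (!statuses.equals(List.of(204))) {
                throw new IllegalStateException("unexpected statuses " + statuses);
            }
            return received.poll(5, TimeUnit.SECONDS);
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip("{\"event\":\"photo.tagged\",\"user\":\"Y\"}"));
    }
}
```

The subscriber never polls: it simply waits until the publisher calls its URL.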

Read more about web hooks here

2. HTTP server push (HTTP Streaming)

HTTP streaming is another elegant way of getting content as soon as it is published. Here, the web server sends, or pushes, data to the web browser, which is the opposite of the traditional client-server request flow.

In this mechanism, the web server doesn’t terminate the connection after response data has been served to a client. Instead, it leaves the connection open so that when an event arrives, it can immediately be sent to one or more clients. Otherwise the data would have to be queued until the client’s next request arrives. Several web servers offer this functionality through CGI (Common Gateway Interface).
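
The following is a minimal Java sketch of HTTP streaming, assuming Java 11+ and the JDK’s built-in HTTP server. It uses HTTP/1.1 chunked transfer encoding rather than CGI: the server sends its headers once, then writes each event to the same open response and flushes it, while the client consumes lines as they arrive. The `/stream` path, the one-event-per-line format and the 100 ms gap between events are assumptions made for the demo.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

/** HTTP streaming: one long-lived response carries several events. */
public class StreamingDemo {

    static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Server side: keeps the response open and writes each event as it "happens". */
    static HttpServer startServer(List<String> events, long gapMillis) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/stream", exchange -> {
            exchange.getResponseHeaders().set("Content-Type", "text/plain; charset=utf-8");
            exchange.sendResponseHeaders(200, 0); // length 0 => chunked, open-ended body
            try (OutputStream out = exchange.getResponseBody()) {
                for (String event : events) {
                    pause(gapMillis); // stand-in for waiting until the next event occurs
                    out.write((event + "\n").getBytes(StandardCharsets.UTF_8));
                    out.flush(); // push this event now instead of waiting for the response to end
                }
            } // closing the body ends the stream
        });
        server.start();
        return server;
    }

    /** Client side: a single GET, then lines are consumed as they arrive. */
    static List<String> consume(List<String> events) throws Exception {
        HttpServer server = startServer(events, 100);
        try {
            URI uri = URI.create("http://127.0.0.1:" + server.getAddress().getPort() + "/stream");
            HttpClient client = HttpClient.newBuilder().version(HttpClient.Version.HTTP_1_1).build();
            HttpResponse<Stream<String>> response =
                    client.send(HttpRequest.newBuilder(uri).GET().build(), HttpResponse.BodyHandlers.ofLines());
            List<String> received = new ArrayList<>();
            try (Stream<String> lines = response.body()) {
                lines.forEach(line -> {
                    System.out.println("received: " + line); // printed roughly 100 ms apart
                    received.add(line);
                });
            }
            return received;
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) throws Exception {
        consume(List.of("tweet 1", "tweet 2", "tweet 3"));
    }
}
```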

Read more about HTTP server push here

3. XMPP

XMPP is a protocol that can be used to send instant messages. It is based on XML: messages are exchanged as XML stanzas. Using XMPP, a server can push new content to its clients, including browsers, desktop applications and mobile devices.
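
Because XMPP messages are XML stanzas sent over a long-lived XML stream, a server can push one at any time without waiting for the client to ask. The Java sketch below only builds a chat `<message/>` stanza of the shape defined in RFC 6121; the class name and addresses are placeholders, and connecting, authenticating and sending are assumed to be handled by an XMPP client library.

```java
/** Builds an XMPP <message/> stanza as a string (illustration only). */
public class StanzaDemo {

    /** Escapes XML special characters so user text cannot break the stanza. */
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&apos;");
    }

    /** A chat message from one address to another with a plain-text body. */
    static String message(String from, String to, String body) {
        return "<message from='" + escape(from) + "' to='" + escape(to) + "' type='chat'>"
                + "<body>" + escape(body) + "</body>"
                + "</message>";
    }

    public static void main(String[] args) {
        System.out.println(message("hub@example.com", "alice@example.net", "New photo from Y"));
    }
}
```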

Read more about XMPP here

4. Comet

Comet is an umbrella term for a web application model in which a long-held HTTP request allows a web server to push data to a browser without the browser explicitly requesting it. There are various methods to achieve this model.

Hidden iframes, Ajax with long polling and XMLHttpRequest streaming are some of the techniques that follow this Comet model.
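
Ajax long polling is the easiest Comet technique to sketch. In the Java model below, `awaitEvent` stands in for the body of an HTTP handler that holds each request open until an event arrives or a timeout expires; on a timeout the browser simply issues a new request. The class, its methods and the 5-second timeout are hypothetical, and a production server would park held requests asynchronously instead of blocking one thread per client.

```java
import java.util.Optional;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Server-side core of long polling: hold the request until there is something to say. */
public class LongPollDemo {

    private final BlockingQueue<String> pending = new LinkedBlockingQueue<>();

    /** Called by whatever produces events (new tweet, new photo, ...). */
    void publish(String event) {
        pending.add(event);
    }

    /**
     * What a handler would do for each long-poll request: wait up to timeoutMillis.
     * An empty result means "nothing yet"; the browser then re-polls immediately.
     */
    Optional<String> awaitEvent(long timeoutMillis) throws InterruptedException {
        return Optional.ofNullable(pending.poll(timeoutMillis, TimeUnit.MILLISECONDS));
    }

    public static void main(String[] args) throws Exception {
        LongPollDemo channel = new LongPollDemo();

        // Simulate an event that happens 200 ms after the browser's request arrives.
        Thread producer = new Thread(() -> {
            try {
                Thread.sleep(200);
                channel.publish("a friend tagged you in a photo");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();

        long start = System.nanoTime();
        Optional<String> event = channel.awaitEvent(5_000); // the "held" request
        long waitedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        System.out.println(event.orElse("(timeout)") + " after ~" + waitedMs + " ms");
        producer.join();
    }
}
```

Unlike plain polling, the response goes out as soon as the event exists rather than at the next polling tick.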

Read more about Comet here

5. PubSubHubbub

PubSubHubbub is a simple, open, server-to-server, webhook-based publish/subscribe (pubsub) protocol, designed as an extension to Atom and RSS.

Currently it is one of the most rapidly adopted technologies for pushing new content from the server side to the client side. Parties (servers) speaking the PubSubHubbub protocol can get near-instant notifications (via webhook callbacks) when a topic (feed URL) they’re interested in is updated.

This architecture is composed of a publisher, subscribers and a hub. The publisher delegates the responsibility of distributing new content to the hub, and subscribers first subscribe to the hub in order to get new content delivered. Whenever new content is added, the publisher notifies the hub about the change, and the hub “pushes” the content to its subscribers in real time.
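
The publisher, hub and subscriber roles can be modelled in memory to show the fan-out. This Java sketch assumes no HTTP at all: `Hub` and its methods are hypothetical, and `fetchFeed` is a stub for the hub re-reading the topic URL after a publisher’s ping. `subscriptionRequest` builds the form-encoded body a subscriber would POST to a hub using the protocol’s `hub.mode`, `hub.topic` and `hub.callback` parameters. The real protocol also verifies each subscription by calling the callback with a `hub.challenge` and delivers content by POSTing to the callback, and version 0.3 additionally required a `hub.verify` parameter; all of that is omitted here.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.stream.Collectors;

/** In-memory model of publisher -> hub -> subscribers; no HTTP involved. */
public class HubDemo {

    /** Form-encoded body a subscriber would POST to the hub to subscribe to a topic. */
    static String subscriptionRequest(String topic, String callback) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps parameter order stable
        params.put("hub.mode", "subscribe");
        params.put("hub.topic", topic);
        params.put("hub.callback", callback);
        return params.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                        + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    /** Hypothetical hub: remembers subscribers per topic and fans out on each ping. */
    static class Hub {
        private final Map<String, List<Consumer<String>>> subscribers = new HashMap<>();
        private final Function<String, String> fetchFeed; // stub for fetching the topic URL

        Hub(Function<String, String> fetchFeed) {
            this.fetchFeed = fetchFeed;
        }

        void subscribe(String topic, Consumer<String> callback) {
            subscribers.computeIfAbsent(topic, t -> new ArrayList<>()).add(callback);
        }

        /** Publisher's ping: the hub fetches the feed once and pushes it to every subscriber. */
        void publish(String topic) {
            List<Consumer<String>> targets = subscribers.getOrDefault(topic, List.of());
            if (targets.isEmpty()) {
                return; // nobody is interested in this topic
            }
            String content = fetchFeed.apply(topic);
            targets.forEach(callback -> callback.accept(content));
        }
    }

    public static void main(String[] args) {
        String topic = "http://example.com/feed.atom";
        System.out.println(subscriptionRequest(topic, "http://subscriber.example.org/callback"));

        Hub hub = new Hub(t -> "<feed><entry>new post</entry></feed>");
        hub.subscribe(topic, content -> System.out.println("A got " + content));
        hub.subscribe(topic, content -> System.out.println("B got " + content));
        hub.publish(topic); // A and B are both notified; the publisher only talked to the hub
    }
}
```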

Read more about pubsubhubbub here

A hypothetical architecture to federate well-known microblogging services


Background

Perhaps you are familiar with well-known social networking websites like Facebook and MySpace and have experienced the intuitive user experience they offer. Most social networking websites offer their users a publicly accessible profile page containing personal information about that user, such as a biography, photos, videos and other shareable things. Now most social networking websites go beyond that and display the microblog updates associated with each user profile. For example, a website X may display the Twitter updates of user Y on his profile page. Whenever user Y updates his status on Twitter, his profile on website X is updated automatically; in other words, it is synchronized with Twitter.

Requirement

Imagine I’m a web developer planning to build a social networking website that hosts the profiles of different people. To keep up with the current trend, I need to display microblog updates on the profile page of a given user. If the user happens to have a Twitter account, I have to display his Twitter feed on the profile page. Obviously, most users also have accounts on several microblogging websites other than Twitter, so I have to offer my users broader connectivity and synchronization.

Problem

Finally, I decided to display microblog updates from five well-known websites: Twitter, Flickr, YouTube, Facebook and WordPress (you can substitute any other five). Whenever the user makes a change on one of these websites, my website MUST synchronize the change and display it on the user’s profile page. For instance, if the user publishes a blog post on WordPress, my website should automatically publish a link (a shortened URL of the post) on his profile. But how?

As you know, most microblogging websites provide their own APIs to access their content (such as feeds, users and search) programmatically via simple HTTP requests. The responses are in XML or JSON format. To consume their content in my web application, I have to write a separate client script for each of these websites. If any website modifies its API, I have to rewrite my script too.

Different websites produce different output, and there is no unified format across them. Twitter’s status update notification response is different from Flickr’s new photo upload notification response. So I have to be aware of every website I’m pulling content from, and parse and cast their responses into a common format before I can use them.
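
One way to keep the site-specific parsing in one place is an adapter per service that maps each raw response onto a single common type. In this Java sketch (Java 16+, for the `record`) the responses are assumed to be already parsed into `Map<String, String>`, so no XML or JSON library is needed, and field names such as `status`, `uploaded` and `page` are hypothetical stand-ins, not the real Twitter or Flickr schemas.

```java
import java.time.Instant;
import java.util.List;
import java.util.Map;

/** One adapter per service, all producing the same Update type. */
public class NormalizeDemo {

    /** The single format the rest of the website works with. */
    record Update(String service, String author, String text, String link, Instant published) {}

    /** Only an adapter knows its service's field names and date format. */
    interface ServiceAdapter {
        Update toUpdate(Map<String, String> raw);
    }

    // Hypothetical, simplified field names: NOT the real Twitter or Flickr API schemas.
    static final ServiceAdapter TWITTER = raw -> new Update(
            "twitter", raw.get("user"), raw.get("status"), raw.get("url"),
            Instant.parse(raw.get("time")));                              // ISO-8601 timestamp

    static final ServiceAdapter FLICKR = raw -> new Update(
            "flickr", raw.get("owner"), raw.get("title"), raw.get("page"),
            Instant.ofEpochSecond(Long.parseLong(raw.get("uploaded"))));  // Unix seconds

    public static void main(String[] args) {
        List<Update> timeline = List.of(
                TWITTER.toUpdate(Map.of("user", "userY", "status", "Off to the zoo",
                        "url", "https://example.com/status/1", "time", "2010-01-05T10:15:30Z")),
                FLICKR.toUpdate(Map.of("owner", "userY", "title", "Lion at the zoo",
                        "page", "https://example.com/photo/2", "uploaded", "1262687400")));
        timeline.forEach(System.out::println); // same shape regardless of origin
    }
}
```

Supporting a sixth website then means writing one more adapter; the code that renders profile pages only ever sees `Update`.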

Another issue is synchronization. At the very least, I should provide near-real-time synchronization with the above five websites. The simplest and most common way is to poll each website periodically for new content. To get the most up-to-date content, I have to poll all five websites repeatedly. This is rather unpleasant and an overhead for my web server.

Summarizing the above scenarios, the following problems arise:

  1. I have to write API clients for each website and keep up with their API updates.
  2. I have to validate, parse and cast their content into a common format by myself.
  3. I have to poll each website repeatedly for new content, which can overload my server.

Problems arising from the current approach

Solution

To solve the above problems, I’d like to suggest a new architecture for consuming microblog updates from well-known websites.

Suggested architecture for federating

This architecture is made up of well-known microblogging services (publishers), their subscribers (nodes) and a centralized hub. There can be many hubs, and each hub can have many nodes subscribed to it. In this architecture, I decided to keep the complexity in the central hub and keep the nodes simple.

This architecture is somewhat different from Google’s PubSubHubbub. In PubSubHubbub, publishers push their updates to the hub and notify it about new content. In this architecture, however, the hub polls its publishers for new content (that is, it pulls new content). This is because there is no way to ask the microblogging services to notify the hub when new content arrives. If there is a way to do that, please suggest it.

In this architecture, all I have to do is tell the hub which microblogging services I’m interested in. The hub then takes responsibility for pulling the new content and notifying me of changes.

Hub: This is a server application written with Java NIO. It is the critical component of this architecture, shielding its clients by taking on the following responsibilities:

  • The hub implements the client scripts for several well-known microblogging websites, including Twitter and Flickr, so consumers do not need to implement API clients of their own. This makes their development process more efficient and convenient.
  • The hub consumes (pulls) content from several microblogging services and converts it into a common format that is unified and completely detached from the original formats. Client consumers are therefore no longer required to validate, parse and cast the responses of microblogging services themselves, because the hub takes on that responsibility.
  • The hub periodically polls each microblogging service for new updates. Whenever it receives new updates, it notifies its subscribed nodes. All update notifications flowing from the hub to the nodes are in a single unified format (RSS/Atom), so consumer clients are no longer required to poll for new content.
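
The hub’s responsibilities can be sketched together in Java (Java 16+, for the `record`): it owns one poller per service, remembers which items it has already delivered, and pushes only new items to subscribed nodes as Atom `<entry>` fragments. The `FederationHub` class, its methods and the in-memory `Supplier` pollers are hypothetical placeholders for the real API clients; a real hub would run `pollOnce` on a schedule (for example with a `ScheduledExecutorService`) and deliver entries over HTTP.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical hub: polls each service, skips items already seen, pushes Atom entries to nodes. */
public class FederationHub {

    /** An update already converted to the hub's common format. */
    record Update(String id, String title, String link) {}

    private final Map<String, Supplier<List<Update>>> pollers = new LinkedHashMap<>(); // one per service
    private final Map<String, List<Consumer<String>>> nodes = new LinkedHashMap<>();   // subscribers per service
    private final Set<String> delivered = new HashSet<>();                             // "service/id" keys

    void addService(String service, Supplier<List<Update>> poller) {
        pollers.put(service, poller);
    }

    /** A node tells the hub which services it is interested in. */
    void subscribe(String service, Consumer<String> node) {
        nodes.computeIfAbsent(service, s -> new ArrayList<>()).add(node);
    }

    /** One polling round over all services; returns how many new updates were pushed. */
    int pollOnce() {
        int fresh = 0;
        for (Map.Entry<String, Supplier<List<Update>>> service : pollers.entrySet()) {
            for (Update u : service.getValue().get()) {
                if (!delivered.add(service.getKey() + "/" + u.id())) {
                    continue; // already pushed in an earlier round
                }
                fresh++;
                String entry = toAtomEntry(u);
                nodes.getOrDefault(service.getKey(), List.of()).forEach(node -> node.accept(entry));
            }
        }
        return fresh;
    }

    /** Minimal Atom <entry>; a valid one also needs <updated> and an IRI for <id>. */
    static String toAtomEntry(Update u) {
        return "<entry><id>" + xml(u.id()) + "</id><title>" + xml(u.title())
                + "</title><link href=\"" + xml(u.link()) + "\"/></entry>";
    }

    static String xml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;").replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        List<Update> twitter = new ArrayList<>(); // stands in for the remote service's timeline
        FederationHub hub = new FederationHub();
        hub.addService("twitter", () -> List.copyOf(twitter));
        hub.subscribe("twitter", entry -> System.out.println("node1 <- " + entry));

        twitter.add(new Update("1", "Off to the zoo", "https://example.com/1"));
        System.out.println(hub.pollOnce()); // 1: pushed to node1
        System.out.println(hub.pollOnce()); // 0: nothing new, so no notification
    }
}
```

Because the hub de-duplicates, the second polling round reaches no node at all: the repeated polling cost stays inside the hub instead of being paid by every node.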

Implementation

Currently I’m in the process of designing the hub. I’ll use Java as the programming language, and the code will be hosted on Google Code. I’ll soon publish the design documents and keep you updated on progress by extending this post.

I warmly welcome your comments on this architecture. Perhaps you could contribute by adding an API client for a well-known microblogging service like Flickr, or help me enhance this architecture by introducing the concept of web hooks.