As if...
As I've stated before, the aggregate flow of all tweets in on the order of 10-20 messages per second, based on peeking at the message sequence numbers. It seems readily apparent that they've chosen the wrong architecture for this.
The tweets themselves should be aggregated with a sequence number, and user sequence number into a stream which should get copied to all of the boxes handling User Interface. Deleting a message would be handled by reposting it to the same queue with no data.
The subscription lists should be another stream.
The user database could be yet another stream.
All of those streams should aggregate out to about 10 kbytes/second. The process of splitting out the work to UI boxes is one of straight forward partitioning of the load, and maintaining a list of tweet sequence numbers for each person to see. The aggregated total of all of the three streams would sit on each UI box so they didn't have to get any of it from across the net.
That's my basic idea for scaling twitter. Comments welcome.
No comments:
Post a Comment