I did some guesstimates in Excel to see if I could figure out why Twitter has issues with scaling. The results are surprising. I guesstimated that the total byte flow of the raw tweets to/from the database would be on the order of 3 KB per second.
You can see the spreadsheet for yourself (thanks to Google Docs).
The raw flow is surprisingly low, mostly because we're talking about unique tweets, with no duplication to all of a person's followers, etc.
At the heart of it, any reasonable PC could do it... the bandwidth is very low.
The key question is architecture. If it's done right, the backbone of Twitter has this very low bandwidth (but very important) flow multicast out to a bunch of front-end machines that handle the subscribers' pipes. If you keep it basic, at about 10 messages per second, even if everyone had 5000 subscribers, you'd still only have to put out 50k messages per second for the whole Twitter farm... not unreasonable if you broke it up into pieces.
I think I could build a Twitter clone. I think a lot of us could. I think it would be a cool hacking challenge to see who could do it with the smallest collection of VMware machines against a simulated load/userbase.
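Here's a rough sketch of that arithmetic in Python, in case you want to play with the numbers yourself. The 10 tweets per second, 3 KB per second, and 5000 subscribers are the guesstimates from this post; the farm size of 10 front-end machines is purely a hypothetical number I picked for illustration, and the function name is made up too.

```python
# Back-of-the-envelope fan-out estimate using the guesstimates from this post.
TWEETS_PER_SECOND = 10        # unique tweets per second (guesstimate)
RAW_BYTES_PER_SECOND = 3_000  # raw tweet byte flow, "on the order of 3 KB" (guesstimate)
FOLLOWERS_PER_USER = 5_000    # deliberately pessimistic: everyone has 5000 subscribers
FRONT_END_MACHINES = 10       # hypothetical farm size, NOT from the spreadsheet


def fanout_estimate(tweets_per_s, followers, machines):
    """Return (total deliveries per second, deliveries per second per machine)."""
    total = tweets_per_s * followers
    return total, total / machines


total, per_machine = fanout_estimate(
    TWEETS_PER_SECOND, FOLLOWERS_PER_USER, FRONT_END_MACHINES
)
# Average size per tweet implied by the two guesstimates above.
bytes_per_tweet = RAW_BYTES_PER_SECOND / TWEETS_PER_SECOND

print(f"implied average size per tweet: {bytes_per_tweet:.0f} bytes")
print(f"total deliveries: {total} per second")
print(f"per front-end machine: {per_machine:.0f} per second")
```

With those inputs the total comes out to 50,000 deliveries per second, or 5,000 per machine if the load splits evenly across 10 boxes. Real follower counts are nowhere near uniform, so treat this as an upper-bound style sanity check rather than a capacity plan.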
What do you folks think?
--Mike--