A look behind the scenes of pollly
At seerow we always push to find time to implement our own ideas and projects. One of these projects is poll.ly, which, after one complete rewrite of the frontend and about four complete rewrites of the backend (but more on that later), we could finally release last December.
The purpose of pollly is to provide a tool with which you can easily find ideas together with your friends or colleagues. You create a poll, and everyone who has the link to it can create a user, add their own ideas, and vote on them.
pollly was designed to be as slim as possible, so that really everyone can use it at first sight. We had long and hard discussions about which features we needed to scrap in order to keep the tool as easy to use as possible. We ended up dropping almost all the ideas that were floating around, and we think it was the right decision. One frequently discussed idea was the need for user accounts. Everybody agreed that users should be able to continue suggesting and voting on other devices; the disagreement was about how to achieve that goal. The obvious approach is a full user system: the user creates an account and can then log in on different devices to continue voting and suggesting. The other approach: just let the user state their name, and BAM, that's it! When they come back on a different device, they are offered a list of all the names registered on the poll, select their own (hopefully), and continue where they left off. Well now wait a minute: they could also select somebody else's name and vote on that person's behalf. After discussing this point vigorously, we ended up with the easy solution. A login (with a mandatory email address attached to it) was just not what we imagined for pollly. And so it came to be that everyone can easily fake votes on behalf of other people. But here at seerow we really believe in the good nature of people on the internet.
Now let us take a deeper look at how pollly was built. As mentioned in the beginning, pollly underwent a lot of changes and codebase rewrites, and as pollly grew, so did we along with it. You might think: what the hell is this guy talking about, pollly is just a simple webapp where you can cast some votes. I have to admit, you are right, it's nothing more, and that's how it started out: a simple PHP backend working with a straightforward SQL database, and a very simple AngularJS application as the frontend. But who doesn't like to dream big? Dream of all the thousands of concurrent sessions our tool might have to handle. And with this dream came the realization that our cheap hosting might run into performance problems handling all those imaginary requests. So we turned to the buzzword of 2015: scalability. Yeah, in 2015 stuff needs to scale, and a $10-a-month hosting plan with MySQL certainly does not scale. That's where we turned to the Google AppEngine, because that scales pretty well and you only pay for what you actually use.
We decided to go with golang as the programming language, running it on the AppEngine and using the Datastore (the NoSQL database running on the GAE). The AppEngine in conjunction with the Datastore automatically scales up the performance of the server as needed. So we dug in head on, without all too much thought. None of us had any real experience with NoSQL, and so, as can be expected, we normalized our application like madmen: a poll object had option objects, which in turn had vote objects, and a poll object also had link objects and all the user objects attached to it. This was built pretty quickly and it ran very, very smooth. But then we had a look at the billing table on our GAE. Wow, $2! Just for some testing. If our application got a little traction on reddit, we would certainly lose all our capital to the AppEngine costs alone. What was going on? Well, a single GET request to our API would trigger hundreds of read operations on the Datastore, because every poll object reads all of its user objects, every user object reads all of the according votes and suggestions, and so on. This can spiral out of control very quickly. So what's the plan? Use NoSQL the way it's intended to be used: with a really flat structure. Just store all the data in a poll object. What can go wrong? Scaling goes wrong, because the GAE only allows a certain number of writes on an entity per second. But every vote now needs to be stored on the poll object, which reduces the possible votes per second on a poll drastically.
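To make the read-cost difference concrete, here is a minimal Go sketch. The struct names and the cost model are our own illustration, not pollly's actual code: in the normalized layout, serving one poll means one Datastore read per related entity, while the flat layout packs everything into a single entity and costs one read.

```go
package main

import "fmt"

// Vote records who voted for which option.
type Vote struct {
	User   string
	Option string
}

// Poll is the hypothetical flat layout: everything a GET /poll/{id}
// needs lives in one entity, so serving it costs a single Datastore read.
type Poll struct {
	Title   string
	Users   []string
	Options []string
	Votes   []Vote
}

// readOpsNormalized estimates Datastore reads for the normalized layout:
// one read for the poll entity, plus one per user, option, and vote entity.
func readOpsNormalized(p Poll) int {
	return 1 + len(p.Users) + len(p.Options) + len(p.Votes)
}

// readOpsFlat estimates reads for the flat layout: always one.
func readOpsFlat(p Poll) int {
	return 1
}

func main() {
	p := Poll{
		Title:   "Where to eat?",
		Users:   []string{"ann", "ben", "cem"},
		Options: []string{"pizza", "ramen"},
		Votes:   []Vote{{"ann", "pizza"}, {"ben", "ramen"}, {"cem", "pizza"}},
	}
	fmt.Println("normalized reads:", readOpsNormalized(p)) // 1+3+2+3 = 9
	fmt.Println("flat reads:", readOpsFlat(p))             // 1
}
```

Even this toy poll costs nine reads per request when normalized; multiply by real traffic and the billing table grows fast, which is exactly what we saw.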
To solve this issue, we had to use sharding, something we had never even heard about. Luckily, a lot of other people on the internet know a lot about this subject. The main idea is to create many shards in the database for every object; a shard can be considered a copy of an existing object. When we write to the database, we don't write to the object itself but to a randomly chosen shard. This way we can write to the same logical object multiple times per second by using different shards. The tricky part is consolidating the different shards. To do this, you can write a worker which iterates over all shards and writes the result to the true poll object. We used a different approach: instead of writing to a real poll object, we iterate over all the shards and write the result directly to the cache. If there is a request for, let's say, /poll/3, we check if there is something in the cache. If so, we serve what's in the cache. If the cache is empty, or older than some threshold, we iterate over the shards and write the results to the cache. With this we arrived at the solution which is currently running on http://poll.ly. As of today we're still missing all the requests we dreamed about, but it runs flawlessly and with really minimal costs. The $10-a-month hosting (mentioned earlier in the post) would have cost us way more.
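The sharding idea can be sketched in a few lines of Go. This is a simplified in-memory simulation under our own names (`ShardedPoll`, `numShards`), not the real Datastore/memcache code: each vote lands on a random shard, and a consolidation pass merges all shards into the result that would be written to the cache.

```go
package main

import (
	"fmt"
	"math/rand"
)

const numShards = 8

// VoteShard stands in for one shard entity: a partial copy of a poll's
// vote counts. Writes go to a randomly chosen shard, so concurrent votes
// rarely contend on the same entity.
type VoteShard struct {
	Counts map[string]int // option -> votes recorded on this shard
}

// ShardedPoll holds all shards for one poll.
type ShardedPoll struct {
	Shards [numShards]VoteShard
}

func NewShardedPoll() *ShardedPoll {
	p := &ShardedPoll{}
	for i := range p.Shards {
		p.Shards[i].Counts = make(map[string]int)
	}
	return p
}

// Vote writes to one random shard instead of the poll itself,
// sidestepping the per-entity write-rate limit.
func (p *ShardedPoll) Vote(option string) {
	p.Shards[rand.Intn(numShards)].Counts[option]++
}

// Consolidate iterates over all shards and merges them into the totals
// that would be written to the cache (memcache in our setup).
func (p *ShardedPoll) Consolidate() map[string]int {
	total := make(map[string]int)
	for _, s := range p.Shards {
		for opt, n := range s.Counts {
			total[opt] += n
		}
	}
	return total
}

func main() {
	p := NewShardedPoll()
	for i := 0; i < 10; i++ {
		p.Vote("pizza")
	}
	p.Vote("ramen")
	totals := p.Consolidate()
	fmt.Println(totals["pizza"], totals["ramen"]) // 10 1
}
```

In the real setup, `Consolidate` runs only when the cached result for a poll is missing or stale; every other request is served straight from the cache, so reads stay cheap while writes stay parallel.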
If you have questions about pollly, just ask them in the comment section; we will be happy to answer them. Let us know if you found the post informative. And last but certainly not least, go check out http://poll.ly and let us know how to improve it.