2010-11-23

Google App Engine Evaluation 

Some think Google App Engine is awesome, others think it's shit.  I've been working with it for several months doing everything from simple blogs to searching and even real time games.  What I've learned is that some things are great and other things really are shit.

Personally I've found GAE 1.3.8 to be lacking in many ways, but there are also many great things about it and, as with everything, you need different tools for different jobs.

This review has been updated for version 1.5

Uses Python 2.5 

GAE uses Python, which is just great.  Python really is an awesome language and Google uses it in many of their products and services.  I've long since come to love Python as the best language there is in general, and one of its greatest strengths is how productive you can be with it.

Python has been gaining a lot of ground recently in running websites and I really prefer to not work with anything else anymore.  Being able to use Python is a big plus in my book, and Google also offers Java which a lot of people think is great too, although I personally could never warm up to it.

Of course, you can only use version 2.5, but that's understandable: it's the version Google took quite some time ago and made their own implementation of, since the standard implementation can't run "restricted" environments.

There have been talks about version 2.7 but what I'd like to see is version 3.

Reasonably High Free Quotas 

Another really great thing about GAE is that it'll host your site for free as long as it doesn't get too large or complex.  In many cases you can actually serve quite a large number of users with only the free quotas, and hopefully you'll have created a revenue stream by the time you need to enable billing for additional bandwidth and CPU.

No Naked Domains 

A naked domain is a second-level domain without the third-level part, like example.com.  With GAE you can only use domains like www.example.com.  This isn't really much of a problem in itself, since you can simply set up a redirect from the naked domain.

But a lot of people are worried about losing PageRank from being forced to change their domain if they move to GAE.  Personally I'm mostly unhappy because I do not want www in my domain.  And the way I see it, there really is no excuse for not supporting naked domains.  It really should not be that hard.

No JOIN Statements 

One of the most common complaints about GAE is that you cannot do JOINs.  I personally haven't had much of a problem with this at all, because in many cases you can quite easily redesign your data structures.
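
As a rough illustration of that kind of redesign, the sketch below uses plain Python dictionaries as a stand-in for Datastore entities.  None of it is GAE API, and the names `authors`, `posts` and `author_name` are made up for the example.  Instead of joining posts to authors at read time, the author's name is copied onto each post when it's written.

```python
# Plain dictionaries stand in for Datastore entities; nothing here is GAE API.
authors = {"a1": {"name": "Alice"}}
posts = {}


def save_post(post_id, author_id, title):
    """Store a post with the author's name copied in (denormalized)."""
    posts[post_id] = {
        "title": title,
        "author_id": author_id,
        # Duplicated on write so that reads need no second lookup (no JOIN).
        "author_name": authors[author_id]["name"],
    }


def render_post(post_id):
    """Build a byline from a single lookup."""
    post = posts[post_id]
    return "%s by %s" % (post["title"], post["author_name"])


save_post("p1", "a1", "Hello GAE")
print(render_post("p1"))
```

The trade-off is that a renamed author has to be updated on every post, so this works best for data that's read far more often than it changes.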

But for some applications, no JOINs is simply a deal-breaker and thus a major reason why a lot of companies can't use GAE.  However, with App Engine For Business, Google estimates that they'll have dedicated, full-featured SQL servers for your application by Q2 2011.

No SSL for Your Domain 

As with the lack of JOINs, this is a deal-breaker for a lot of companies.  You can get SSL, but only through example.appspot.com.  Google are however working to provide limited support for this by the end of this year with "App Engine For Business".  (And maybe something for "normal" users too.)

Scalable Storage 

The biggest selling point for GAE is how scalable it is.  You can create databases as big as you like and they will remain just as fast.  You can also rest easy knowing that all your data is backed up and that if a server breaks another one will take over.

But it's important to realize that this scalability comes at a price and that price is latency.  Getting an entry from the Datastore by ID usually takes about 20 ms and sometimes peaks above 50 ms.  A query however, even for a single entry, can take 100 ms and can sometimes peak to several times that.

Many applications don't have a problem with this amount of latency but for other applications it can be very bad.  On the other hand, it usually doesn't take much longer to do more in one call.  Indeed, the silver bullet for making an application on GAE fast is to reduce the number of calls (to any and all services) it makes by batching.
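
To put rough numbers on that, here's a back-of-the-envelope model in plain Python.  The 20 ms figure is the get-by-ID latency mentioned above; the 1 ms of extra cost per entity inside a batch is purely an assumption for illustration, not something I've measured.

```python
# Back-of-the-envelope latency model; the numbers are illustrative, not measured.
GET_LATENCY_MS = 20      # typical get-by-ID latency quoted above
PER_ITEM_EXTRA_MS = 1    # assumed extra cost per entity inside one batch call


def sequential_ms(n):
    """Total wait when each entity is fetched in its own call."""
    return n * GET_LATENCY_MS


def batched_ms(n):
    """Total wait when all entities are fetched in a single call."""
    return GET_LATENCY_MS + n * PER_ITEM_EXTRA_MS


for n in (1, 10, 50):
    print(n, sequential_ms(n), batched_ms(n))
```

In the Python SDK the batched case roughly corresponds to passing a list of keys to `db.get()` instead of calling it in a loop, and to `memcache.get_multi()` on the Memcache side.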

Memcache 

Memcache has become an important part of modern web development and it's great that GAE provides it.  Because of the Datastore's latency issues you should use Memcache to cache basically everything.  Calling Memcache only takes 5 to 10 ms, but you should certainly batch your calls anyway.  Memcache can also be used for global locks.  Never memcache Datastore entries directly; convert them to dictionaries first and they'll pickle faster.
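
Here's a minimal sketch of both ideas: a read-through cache that stores plain dictionaries, and a lock built on "add only if absent".  `FakeMemcache` is an in-memory stand-in written for this example so the code runs anywhere, and `load_from_datastore` is a hypothetical slow lookup; on GAE you'd use the real `memcache` module, whose `add()` has the same only-if-absent behavior.

```python
import pickle


class FakeMemcache(object):
    """In-memory stand-in with get/set/add/delete; not the real service."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        value = self._data.get(key)
        return None if value is None else pickle.loads(value)

    def set(self, key, value):
        self._data[key] = pickle.dumps(value)

    def add(self, key, value):
        # Succeeds only if the key is absent, which makes it usable as a lock.
        if key in self._data:
            return False
        self.set(key, value)
        return True

    def delete(self, key):
        self._data.pop(key, None)


cache = FakeMemcache()
datastore_reads = []


def load_from_datastore(user_id):
    """Hypothetical slow lookup; returns a plain dict rather than an entity."""
    datastore_reads.append(user_id)
    return {"id": user_id, "name": "user-%s" % user_id}


def get_user(user_id):
    """Read-through cache: try the cache first, fall back to the Datastore."""
    key = "user:%s" % user_id
    user = cache.get(key)
    if user is None:
        user = load_from_datastore(user_id)
        cache.set(key, user)
    return user


get_user(7)
get_user(7)                       # second call is served from the cache
print(len(datastore_reads))

print(cache.add("lock:game", 1))  # lock acquired
print(cache.add("lock:game", 1))  # already held by someone else
cache.delete("lock:game")         # release the lock
```

Keep in mind that the real Memcache may evict any key at any time, so a lock like this can silently disappear and shouldn't be the only thing protecting critical data.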

File-uploads to the Blobstore 

You can accept file uploads using the Blobstore, which now also allows you to create and write files.  The good thing about the Blobstore is that you can access file-like objects for reading the files.  The bad thing is that you can't use the Blobstore unless you enable billing.

The first major missing feature of the Blobstore is that even though you create a special URL for each upload, you can't tell it to abort if a certain file size is reached.  The second is that you can't track the upload progress.  A third useful feature would be the ability to read at least the head of the file being uploaded (and the filename) so that you can abort it if it's the wrong type.

This matters as much because you don't want to waste the user's time as because you don't want users to waste your resources.

Task Queues and Cron Jobs 

Task Queues are a very useful feature that allows you to have GAE access a URL independently of any user request so that you can process things in the background.  The maximum request time is much higher (10 minutes) than the normal 30 seconds, which allows you to do bigger jobs.

While Tasks are created by your application at will, Cron Jobs allow you to schedule a request every hour, day, etc., which is useful for many things.  These requests will also have the time limit increased in the next release.

No Global Application Data 

Each instance can hold data, but it's not global and instances get recycled.  Memcache is global and can be locked, but it may forget data at any time.  And the Datastore is slow.  There are of course ways of working around this, but a persistent version of Memcache, using quotas, would be immensely useful for a variety of applications.  This is being addressed with the upcoming Backends.

Latency 

Latency is a big issue with GAE.  The Datastore can have high latency and all other services have some latency too.  Many applications spend most of their time waiting for something and that means the user is waiting too.  There's also some latency whenever a new instance has to be started.  And in addition to the unavoidable latency of sending data around the world, it can take 200 ms just for the request to go through GAE, even with the application doing nothing.

Most sites just spit out a page that the user then stays on for a little while, but if you need to have any kind of action "in real time", with GAE you have to count on being 0.3 - 0.5 seconds late even without your application doing anything.

I've written a library for running real time applications like games.  It uses Memcache as the "primary" storage together with bulk preloading, application caching, locking and Tasks that continuously back up to the Datastore.  It manages to both read and write (with locks) a bunch of things in less than 50 ms for most requests, but then it takes 300 ms more that I can't do anything about.

Data Import and Export 

When it comes to moving data to GAE there are a number of problems: you have to create a small program that posts the data to your site, you have to write a script that parses and saves it, you're limited to 30 seconds on each request, it uses your quota and your site may become unavailable for some time.

Of course, there is a tool that can do this for you, but I think it's actually more work to learn, configure and set up, and it still uses your quota.  Especially now that the request limit for Tasks is being increased to 10 minutes, I'd say it's easier to upload a few big data chunks to the Blobstore and then have some Tasks go through them.
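
The splitting itself is the easy part.  Below is a small sketch in plain Python of cutting parsed rows into batches, where each batch would be handed to one Task.  The helper name `chunked` and the batch size of 4 are made up for the example; in practice you'd pick a size that lets each Task finish comfortably within its time limit.

```python
def chunked(rows, size):
    """Yield successive lists of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


# Ten fake rows standing in for lines parsed out of an uploaded file.
rows = ["row-%d" % i for i in range(10)]
batches = list(chunked(rows, 4))
print([len(batch) for batch in batches])
```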

Long Polling 

Long Polling, also known as "Comet" and various other names, is a technique where instead of sending a request and then getting a response you send part of a request and get part of a response and then keep the connection open in order to use it like a traditional socket where you're not limited by the request/response cycle.  This allows the server to send out information about something whenever its state changes, instead of the client having to ask repeatedly.
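
The core idea, a request that simply waits until there is something to say, can be modeled in a few lines of plain Python.  This is only a toy sketch using a thread and a queue in one process, with made-up helper names (`long_poll`, `publish_later`); it has nothing to do with how GAE handles requests.

```python
import queue
import threading

updates = queue.Queue()


def long_poll(timeout_s):
    """Block until an update arrives; return None if the timeout passes first."""
    try:
        return updates.get(timeout=timeout_s)
    except queue.Empty:
        return None


def publish_later(delay_s, message):
    """Simulate a state change on the server after `delay_s` seconds."""
    threading.Timer(delay_s, updates.put, args=[message]).start()


publish_later(0.1, "state changed")
result = long_poll(2.0)   # returns as soon as the update is published
print(result)
```

The client asks once and gets its answer the moment the state changes, rather than asking every few hundred milliseconds and mostly hearing "nothing new".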

The Channel API attempts to provide this feature but it's not really worth talking about since it doesn't really provide the features you want, only simplifies doing the "work-around".

Terrible Spikes and Unavailability 

GAE is quite an unstable service and latency spikes are common.  Even if you don't have a real time application the spikes can be terrible for your users.  If your primary market is not in America you'll be noticeably better off.

It's also worth noting that the amount of CPU a Datastore operation costs sometimes changes seemingly in accordance with the system load.  I get less done before my quota runs out if I do bigger processing jobs while the Americans are awake.

At one time when GAE was having a lot of problems the cost suddenly increased to between 100 and 1000 times as much as normal and swallowed my quota in literally seconds.  I'm hopeful that I would've gotten my money back but I'm still glad I didn't have billing enabled.

Additionally, every now and then GAE cannot serve your site.  Usually it's either spikes, when just a few random pages fail to load, or longer downtime for maintenance made necessary by the spikes.  Google is working on a 99.9% uptime agreement for businesses, but that still leaves about a minute and a half every day for users to look elsewhere.
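
The arithmetic behind that figure is simple enough to check.  The helper below is just an illustration (the function name is made up); 99.9% uptime leaves 0.1% of the 1440 minutes in a day, which is about 1.44 minutes.

```python
MINUTES_PER_DAY = 24 * 60


def allowed_downtime_minutes(uptime_percent, period_minutes=MINUTES_PER_DAY):
    """Downtime a given uptime percentage still permits within the period."""
    return period_minutes * (100.0 - uptime_percent) / 100.0


print(round(allowed_downtime_minutes(99.9), 2))
```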

No Search API 

Search has become such a ubiquitous feature and since it's Google you'd expect GAE to have some kind of built-in full-text search service.  The feature is listed in their Issue Tracker with the status "Started" but it's not on their Roadmap.

Because of the various limitations on GAE (mostly the Datastore and what libraries you can use), making advanced features like search is not an easy task, and making a really good search is just impossible.

This is actually quite a big letdown and may even be a deal-breaker.  If your site is not popular enough to get indexed fully so you can use Google Site search, you'll have to look for other similar services.  But what you really want is a search tailored to your own application's needs.

Now with version 1.5 there's finally a Search API on its way.

Conclusion 

A lot of people defend GAE with "the right tool for the job" argument, but you still have to recognize that GAE does have many flaws.  It's hard to complain when it's Google, and I'm tempted to say that this tool is a poor choice for any job, but that's of course not completely true.

There are a variety of sites that will run perfectly on GAE.  The problem is that a lot of people are misled when Google keeps talking about how scalable it is: the applications that run best on GAE do Not need to scale.  On the other hand, sites that do, or will, need to scale will usually Also need a lot of advanced features that GAE simply does not allow for.

Some of those features are finally on their way, like SSL on your own domain and SQL servers, but it's already been two years since GAE was first released.  GAE will really become a great option once you can write files and use binary-backed libraries to do more advanced things.

GAE is awesome when the website is not the primary business.  If you have a company that does something and need a small website that informs about what that company does, GAE is just awesome because Python allows you to make, and maintain, the site easily and quickly.  Google will also host the site for free.

For the same reasons it's also great for blogs or news sites where all you need are "pages", and no "functionality".  But you can go further and run a webstore or a simple social network on GAE as well (when SSL on your domain is ready).  And you can scale these sites to have as much content and as many users as you like without problems.

You can grow as big as you like, but you will have problems making the features expected of such a huge site.  Search, real time action and other complex features are some of the things that are still hard or even impossible to make on GAE.

Google are certainly working on some of these things, and given enough time maybe they'll eventually be able to give us everything we want.  Personally, I run all of my websites on GAE almost exclusively because of "Free and Python".  While the other features are nice, I could set those things up myself on any cloud while not having the limits of GAE.

If you only need a simple site or you're short on money, GAE is certainly a great choice.  But otherwise you need to think twice before choosing GAE.  Don't just go with it because it's from Google and therefore expect it to be capable of everything, because it's not.




