Slight shift in our application strategy

Mike McGrath mmcgrath at redhat.com
Fri Feb 27 20:17:15 UTC 2009


Background:  Shortly after we started scaling out to remote sites we
noticed that parts of some of our applications had issues over the VPN
link.  The initial ticket was a bit off about what we were actually
looking at, but for reference:

https://fedorahosted.org/fedora-infrastructure/ticket/281

To try to keep this short: after lots of tests we discovered that
applications that run lots of queries are the core of our issue.  We have
OK bandwidth to all of our sites, but latency is high enough that it's
become too expensive to actively run applications at these sites.  Every
query that gets run at a remote site seems to take a minimum of 0.3 to 0.5
seconds for the complete round trip.  As we mature and as features get
added, lots of our apps need more queries.  We can and should go through
and make these more efficient, but that's going to happen over a long
time.  We just don't have the number of people we need to track trends on
each page of each application and convert all the SQL to its most
efficient form.
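
To put rough numbers on that, here is a minimal back-of-the-envelope
sketch.  It assumes a page issues its queries one after another, each
paying the full round trip; the function name and the query counts are
made up for illustration, and only the 0.3 to 0.5 second range comes from
our tests.

```python
def page_latency(num_queries: int, round_trip_s: float) -> float:
    """Rough lower bound on page time when queries run sequentially.

    Hypothetical helper: ignores query execution time, rendering, and
    any pipelining or connection reuse tricks.
    """
    if num_queries < 0 or round_trip_s < 0:
        raise ValueError("inputs must be non-negative")
    return num_queries * round_trip_s


if __name__ == "__main__":
    # Query counts are illustrative, not measured from any of our apps.
    for n in (1, 10, 50):
        low = page_latency(n, 0.3)   # best observed round trip
        high = page_latency(n, 0.5)  # worst of the observed minimums
        print(f"{n:>3} queries: {low:.1f}s to {high:.1f}s")
```

The point is only that the cost grows linearly with the number of
queries, so a page that is fine locally gets painful fast at a remote
site.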

Instead we're going to convert all of our remote application servers to
passive/backup servers.  Up until now we've generally been using our
remote sites to scale load.  Now, though, we can't really do that.  They
still play an important role in keeping us fairly HA (our SPOF is still
our data layer).  Having a multi-master data layer of PostgreSQL and MySQL
just won't be a win for us at our size at this time.

So what does this mean for the future?  Scaling at our app layer will
just have to happen in a centralized location.  This won't scale forever,
but I think for the near and middle term of Fedora's future it's what
we're going to have to bank on.  We can continue to focus on better
caching at our proxy layer, which will continue to be active at each
remote site.

All of what I've written here is probably very obvious to most of you, and
it is.  The difference now is that we have much better data on the
interaction between our app servers and the data layer, and better metrics
for how long those interactions take.  So darnit, I'm not going to spend
all the time I did with testing and metrics and not write some long
summary about it!  :)

	-Mike



