Submitted by CSÉCSY László on Thu, 08/28/2008 - 01:43.
And if the site is aware of it, and it can be solved by restoring a backup (I heard this somewhere, maybe on IRC?), why doesn't the site itself do (or trigger) the restore?
Submitted by Gábor Hojtsy on Thu, 08/28/2008 - 10:01.
It does. The schedule is "down" when, for some reason, the usually working database query returns nothing. We have two levels of caching on top of it to guard against the database coming back with bad data: bad data is never cached, and a cache that is found to be bad is cleared. We have been all over this code a bazillion times and built in various self-healing mechanisms so that it heals itself as soon as possible (it caches when it has data and drops the cache when the cache is bad).
I think I have done everything I can to make it work flawlessly, but I did not succeed.
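The self-healing rules described above (never cache bad data, drop a cache entry once it is found bad, refill as soon as the query works again) could look roughly like this. This is a hypothetical Python sketch, not the site's actual Drupal code: the `SelfHealingCache` class, the `is_valid` check that treats an empty result as bad, and the two plain dicts standing in for the two cache levels are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical: a schedule is just a list of session titles here.
Schedule = List[str]


def is_valid(data: Optional[Schedule]) -> bool:
    """Assumption: an empty or missing result is the 'bad data' case."""
    return bool(data)


class SelfHealingCache:
    """Two hypothetical cache levels over one query: a fast in-memory
    level and a shared 'persistent' level (plain dicts in this sketch)."""

    def __init__(self, query: Callable[[], Optional[Schedule]]) -> None:
        self.query = query
        self.memory: Dict[str, Schedule] = {}
        self.persistent: Dict[str, Schedule] = {}

    def get(self, key: str) -> Optional[Schedule]:
        # Check the faster level first, then the shared one.
        for level in (self.memory, self.persistent):
            cached = level.get(key)
            if cached is None:
                continue
            if is_valid(cached):
                self.memory[key] = cached  # promote into the fast level
                return cached
            del level[key]  # self-heal: a cache found bad is dropped
        # Nothing usable cached: run the database-like query again.
        data = self.query()
        if is_valid(data):
            self.memory[key] = data
            self.persistent[key] = data
            return data
        return None  # bad data is reported as "down" and never cached


if __name__ == "__main__":
    responses = iter([[], ["Keynote", "Lunch"]])
    cache = SelfHealingCache(lambda: next(responses))
    print(cache.get("schedule"))  # empty result: nothing is cached
    print(cache.get("schedule"))  # query works again: result is cached
```

Because a bad result is never stored, the next request simply retries the query, which is what lets the page recover on its own once the database answers properly again.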
ugly little bug
So in the end it turned out that the trouble with the schedule was a timezone issue...
When the schedule cache was rebuilt for a user whose timezone differed from the website's, the sessions no longer fit into the schedule.
Gabor found the bug and properly squashed it.
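A minimal, hypothetical sketch of how this kind of bug can arise: a schedule grid built in the timezone of whoever happens to renew the cache, but stored in a cache shared by everyone. The session data, the fixed UTC+2 site offset, the slot grid, and all function names are made up for illustration. The thread does not say how the real fix was done, so `cached_schedule_fixed` shows just one possible approach (always build the shared cache in the site timezone).

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Tuple

# Hypothetical data: session start times stored in UTC.
SESSIONS = {
    "Keynote": datetime(2008, 9, 1, 7, 0, tzinfo=timezone.utc),
    "Lunch": datetime(2008, 9, 1, 10, 0, tzinfo=timezone.utc),
}
# Hypothetical site timezone as a fixed UTC+2 offset (real code would use zoneinfo).
SITE_TZ = timezone(timedelta(hours=2))
# Hypothetical slot grid, expressed in site-local time.
SLOTS = ("09:00", "12:00")

Grid = Dict[str, List[str]]


def build_schedule(tz: timezone) -> Tuple[Grid, List[str]]:
    """Place each session into a slot by its local start time in tz."""
    grid: Grid = {slot: [] for slot in SLOTS}
    unplaced: List[str] = []
    for title, start in SESSIONS.items():
        slot = start.astimezone(tz).strftime("%H:%M")
        if slot in grid:
            grid[slot].append(title)
        else:
            unplaced.append(title)  # session no longer fits the grid
    return grid, unplaced


def cached_schedule_buggy(cache: dict, viewer_tz: timezone) -> Tuple[Grid, List[str]]:
    # Bug: built in the renewing user's timezone, but stored under one shared key.
    if "schedule" not in cache:
        cache["schedule"] = build_schedule(viewer_tz)
    return cache["schedule"]


def cached_schedule_fixed(cache: dict) -> Tuple[Grid, List[str]]:
    # One possible fix: the shared cache is always built in the site timezone.
    if "schedule" not in cache:
        cache["schedule"] = build_schedule(SITE_TZ)
    return cache["schedule"]


if __name__ == "__main__":
    shared: dict = {}
    cached_schedule_buggy(shared, timezone.utc)  # a UTC visitor renews the cache
    print(cached_schedule_buggy(shared, SITE_TZ))  # everyone now gets the broken grid
    print(cached_schedule_fixed({}))
```

Building the shared cache in one fixed timezone keeps the slot grid consistent for every visitor; any per-user conversion can then happen at display time. Keying the cache by timezone would be another option, at the cost of one cache entry per timezone.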