1. I have corrected the e-mail settings so that outgoing e-mails from these forums should be sent now. If you tried to Register or Reset your Password, please try again!
    Dismiss Notice

bulk walk of collections without causing pain

Discussion in 'Census: General Discussion' started by Quicktiger, Feb 1, 2012.

  1. Quicktiger

    Quicktiger Guest

    Over the last few days, I've found some ways I might be able to bulk walk various large collections without causing anyone (especially Sony) any pain.  I'd love feedback on how I can sync what I store to what Sony provides without causing high load on any of Sony's hardware.  I'm more than willing to increase my hardware, but my DB host is fairly beefy as it is, and the entire DB easily resides in my server's RAM.

    Specific attention is currently on the characters collection due to the data complexity involved and the ever-growing size of the collection.

    (1) "just walk it"  In this mode, I perform a simple walk starting at start=0 with a limit (15 currently due to timeouts with larger values.)  This is a brute force walk and is not guaranteed to catch all updates since the collection is returned in whatever order the back-end happens to have it in.

    (2) "walk by last_update" or "walk by id" -- this is a straight forward walk again, but with c:sort=id or c:sort=last_update.  This seems to time out after a while no matter how small the limit.

    (3) "smart walk by id or last_update" -- id is actaully easier as they are guaranteed to be unique.  This mode will return N items by using c:sort=id&id=>M where M starts out at 0 (I don't want the bogus record 0 anyway!) and is set to be the highest ID returned in each block.

    (4) Any variation of the above, but with c:show=id,last_update and a smart comparison between what I have in my database for that record and the last_update timestamp.  If it has changed, this would mean I'd have to directly query that record.  As I see a churn of about 30% each day, this would be upwards of 200,000 queries or so.

    So far, (1) works, but takes about 23,000 seconds (6 hours) to complete.  (2) times out after the offset gets large.  (3) I have not tried with ID, but I have tried with last_update.  The trick here is that sometimes, when returning 25 records, all of them have the same start time, so the collection may become unordered for the records that match that specific time, and using c:sort=last_update,id will get the infamous gateway timeout.

    I believe for long term (4) is the right answer, but using id instead of last_update.  Trying to only get changed-since records but in ID order seems to be slower than walking the whole database, and eliminates the need to track the last time I used for last_update.

    Much of the time is waiting on a Sony server to return the query, and a small but increasing time is on my side processing it.  I can do the next query in parallel to cause my side's time to be effectively zero, leaving only the 8-10 seconds per query.

    --Quicktiger

     
  2. DanKinney

    DanKinney Guest

    Part of the performance problem you are seeing from us is due to how we set up the primary database machine.  It was scaled improperly (my fault) and we are taking steps to improve it.

    The biggest deficiency is the amout of RAM on this box.  It only has 8 GB and we are going to increasing that soon.

    While we have 2 machines, it is effectively a 1 + 1 - one host with a real-time "hot spare" for redundancy.  The primary host handles the reads AND writes (updates from the game).  We are going to be moving that to a 6 host setup that allows for all hosts to serve requests.  We are also going to "shard" the data so that half of the data will be served from different hosts.

    These changes will allow us to keep the entire database resident within RAM.

    The other issue pertains to the reliability of the last_update field.  This week's data migration involved multiple steps and touches on the data outside of the normal update pipeline.  I am hopeful that it will settle down and you'll be able to rely on this field once more.  If not, we'll add a mechanism that you can use to track changes.

    -dan

     
  3. feldon30

    feldon30 Guest

    Dan, This is fantastic news. We've been able to build a robust site with the depth of information available so far. But of course there are always going to be queries which currently won't complete within the time allotted. Can't wait to see what we can do with more RAM and a beefier setup. ;)
     
  4. Dethdlr

    Dethdlr Guest

    Dan,

    Any guestimates on when the server upgrades are going to happen?  This thing is going to fly once that happens.

    Dethdlr

     
  5. DanKinney

    DanKinney Guest

    Not yet.  The first step is to build the internal version of the cluster.  I hope that will be done on Monday.  Once we have that in place, we'll be able to update our operational plan and know exactly what we need to do for production.

    There is an interim step of simply updating the RAM of the existing machine.  We are doing that in parallel.

    I'll keep you guys up to date on our progress.

    -dan

     
  6. DanKinney

    DanKinney Guest

    Just to explain why you didn't see much progress from me last week, I was out for 3 days moving into a new house.  I'm back online now and should be back to speed on Monday.

    -dan

     
  7. Quicktiger

    Quicktiger Guest

    Hope you enjoy your new digs!

    I've been playing around a bit here and there, but nothing major yet.  The bulk of my time recently has gone into making it possible for people to reschedule a refresh of specific data, and then have it update in the background using Resque.  Very cool stuff so far.

     
  8. Zoltaroth

    Zoltaroth Guest

    Dan's new place:

    <img src="http://cdn.freshome.com/wp-content/uploads/2010/05/iron-man-home.jpg" width="597" height="377" />

     
  9. DanKinney

    DanKinney Guest

    Yeah...and Gwyneth Paltrow is my new assistant.

    -dan

     
  10. Lantis

    Lantis Guest

    Daymn... <img src="/station/images/smilies/385970365b8ed7503b4294502a458efa.gif" border="0" />

    Is SOE hiring? <img src="/station/images/smilies/97ada74b88049a6d50a6ed40898a03d7.gif" border="0" />

     

Share This Page