
Bulk data and experimentation

Discussion in 'Census: General Discussion' started by Quicktiger, Jan 12, 2012.

  1. Quicktiger

    Quicktiger Guest

    I assume it's clear that I am very interested in a bulk transfer of data, in as close to real time as possible, from Sony.  Ideally this would be for all public information I can get.  I love data and love mashing it up in new ways, and I think this is the only way to satisfy that love with the EQ2 data.

    I am therefore pushing the limits of what the data site can do, and finding new and exciting ways to receive that all too common "gateway timed out" response, which I know means I am murdering a server somewhere at Sony.

    For example, today's fun was that I learned I can actually use &last_update=]<timestamp>&c:sort=last_update and actually walk the list of last updated characters.  This is extremely cool and useful to me.  Of course, it doesn't solve the problem of DELETED items from collections...  but we'll get to that on another thread.

    However, I found that this field is not fully indexed, or at least is indexed in a way that seems strange to me.  The further back in time I go, the longer the server seems to take to respond.  If I go back far enough, I need to adjust the limit, and if I go back further still, I get that infamous timeout.  Also, any date in the future will cause the timeout.
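    A minimal sketch of how a client might build the query described above while guarding against the two failure cases mentioned (future timestamps and oversized limits). The base URL path, the `c:limit` parameter name, and the `MAX_LIMIT` cap are assumptions for illustration; the thread itself only confirms `last_update=]<timestamp>` and `c:sort=last_update`.

```python
import time

# Assumed collection URL; the thread names only the host, not the full path.
BASE_URL = "http://census.daybreakgames.com/get/eq2/character/"
MAX_LIMIT = 100  # hypothetical client-side cap, not a documented server limit


def build_walk_url(since, limit=25, now=None):
    """Build a query for characters updated at or after `since`, oldest first."""
    now = int(time.time()) if now is None else int(now)
    since = int(since)
    if since > now:
        # The thread reports that future timestamps time out, so refuse them locally.
        raise ValueError("timestamp is in the future")
    # Clamp the page size so a typo cannot turn into a very expensive request.
    limit = max(1, min(int(limit), MAX_LIMIT))
    # "=]" is the operator quoted in the thread; "c:limit" is an assumed parameter name.
    return f"{BASE_URL}?last_update=]{since}&c:sort=last_update&c:limit={limit}"
```

    Refusing bad input before it leaves the client costs nothing and avoids sending requests that are already known to time out.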

    I clearly want to walk the edge of immediate without killing the fun for everyone else.  I'm starting this thread in hopes that we can come up with a way to do bulk transfers of data, even partial transfers for things like "all guild members", without killing the server on Sony's end.  Clearly that is an explicit requirement in any use of this system.

    I see two goals here:

    (1)  Bulk data transfer in a backend-friendly, reliable way

    (2)  Safety to experiment with new queries without fear of losing access to the API, or causing others pain by requiring us to register differently, or making life harder for anyone in any way.

    Part of (2) is, I think, rejecting some queries that are going to be "too hard" from inspection, like c:sort on a field that is not indexed, but then providing a way for us to request an index on that field.  Indexes are likely not free in terms of space or time.
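    The "reject by inspection" idea could look something like the sketch below: look at the query parameters alone and refuse a `c:sort` on a field that has no index. The `INDEXED_FIELDS` set is hypothetical, since the thread does not say which fields are actually indexed.

```python
# Hypothetical set of indexed fields; the real set is not given in this thread.
INDEXED_FIELDS = frozenset({"last_update", "id"})


def check_query(params, indexed=INDEXED_FIELDS):
    """Return (ok, reason) for a dict of query parameters, by inspection only."""
    sort_field = params.get("c:sort")
    if sort_field is None:
        return True, "no sort requested"
    if sort_field in indexed:
        return True, f"sort on indexed field '{sort_field}'"
    # Reject cheaply and tell the caller how to proceed.
    return False, f"'{sort_field}' is not indexed; request an index before sorting on it"
```

    The check never touches the data, so it is cheap enough to run on every request, and the rejection message doubles as the pointer to the index-request path.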

     
  2. Bella

    Bella Guest

    I don't understand why you need bulk data. I don't believe that was ever the intent here for this data. I can see you wanting to cache information temporarily, but it would seem that bulk transfer is beyond the scope of what is being offered here. As for what you describe, that the further back you go the slower it gets, that is not surprising and should be expected. You are getting a great deal more information as you go back. They have warned us about being "good" with the data. Breaking up requests into smaller chunks, I would think, would fall right in line with being good.
     
  3. Proopai

    Proopai Guest

    I am with you on this one, Quicktiger. The site and plan I have in mind for the data is going to require pulling mass amounts of data from the server (and I would only update once every 24 hours or so), but I want to be able to get as much accurate data as possible, as often as possible.

    I too am finding that the further I get past the start of the list, the more I tend to get the gateway timed out error. When I sort by server it stops, and I can do a little bit more, but I still tend to get the error.  I got to the point where I am pulling just the character ID numbers and then having a second process pull each ID number's information.  I am sure there is a more reasonable path to doing this, but that will take time to get.
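    The two-pass approach described here (IDs first, details second) might be sketched as follows. Both `list_ids` and `fetch_details` are hypothetical callables supplied by the caller, for example thin wrappers around HTTP requests; nothing here is the actual API client.

```python
def fetch_in_two_passes(list_ids, fetch_details, batch_size=10):
    """First collect IDs only, then fetch full records in small batches.

    list_ids() yields character IDs; fetch_details(ids) returns a dict
    mapping each ID in the batch to its record. Both are caller-supplied.
    """
    ids = list(list_ids())
    records = {}
    # Small batches keep each request cheap for the server.
    for start in range(0, len(ids), batch_size):
        batch = ids[start:start + batch_size]
        records.update(fetch_details(batch))
    return records
```

    Setting `batch_size=1` reproduces the one-request-per-ID process described above; larger batches trade fewer round trips for heavier individual requests.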

     
  4. Quicktiger

    Quicktiger Guest

    This isn't how a database should act.  I don't know what engine they are using, but if you have an index (and I suppose I'm requesting that it be...) on last_update, then it is quite easy for the engine to start anywhere and retrieve a number of items at that point.

    Remember that I'm not saying "give me all toons since" but just "give me 25 since" and will walk through those 25.
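    Walking "25 since" repeatedly is a cursor-style walk: each page starts where the previous one ended. A minimal sketch, assuming a hypothetical `fetch_page(since, limit)` callable that returns records sorted by `last_update` ascending with a stable order among ties:

```python
def walk_updates(fetch_page, since, page_size=25):
    """Yield records in last_update order, one page of `page_size` at a time.

    fetch_page(since, limit) is a hypothetical callable returning up to
    `limit` records with last_update >= since, sorted ascending. Each record
    is a dict with "id" and "last_update" keys.
    """
    cursor = since
    seen_at_cursor = set()  # IDs already yielded that share the cursor timestamp
    while True:
        page = fetch_page(cursor, page_size)
        fresh = [r for r in page
                 if not (r["last_update"] == cursor and r["id"] in seen_at_cursor)]
        for record in fresh:
            yield record
        if len(page) < page_size or not fresh:
            # Short page: caught up. No fresh rows: more than a page of records
            # share one timestamp, so this simple cursor cannot advance.
            return
        last = page[-1]["last_update"]
        if last != cursor:
            seen_at_cursor = set()
        # Remember the rows at the boundary so the next page does not repeat them.
        seen_at_cursor.update(r["id"] for r in page if r["last_update"] == last)
        cursor = last
```

    Because the query uses "at or after", the last timestamp of one page reappears at the start of the next; the `seen_at_cursor` set is what keeps those boundary rows from being yielded twice.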

     
  5. Bella

    Bella Guest

    It may not be a matter of just indexing though but of joins.  We have no idea how many tables across how many systems we are looking at here. 

    I'm still trying to understand the need for the bulk load however.

     
  6. Proopai

    Proopai Guest

    Well, the reason I am looking at needing bulk is that the system I am making is going to allow you to compare your armor to others of the same class, not only on your server but on other servers, to give you an idea of what you can upgrade to.  That's the plan I have at least, but this requires me to cache the data in my own database and then use that information to display what I need.

     
  7. DanKinney

    DanKinney Guest

    I think this is an interesting thread.  I am thankful for the conversation.

    The goal of this API is to enable these types of applications without having to require a bulk download of data.  If there is a performance issue, we will work to identify and resolve it quickly.

    I am quite certain that there will be types of uses that will require bulk download and I am not completely opposed to it.  The costs to this are going to be substantial to many people - bandwidth, storage, scaling, technical, etc.  There are going to be issues of data "latency".  Not many people are going to be able to do it well.  I want the API to take the burden off of that need.

    However, I do think that the next cool thing is going to come from finding the interesting new thing that finds the social magic between in-game items or interaction and what triggers the emotional connection between humans.  The innovation will be in finding the right connections; the right questions that are worth answering.  

    -dan




     
  8. Proopai

    Proopai Guest

    One thing I was looking at doing is allowing people access to the data I have via API calls to my server, which in turn would let them get information secondhand from SOE's server and not have to hit that server until my site did its update.

    There are other ways, but this was one that I thought might work better.

     
  9. DanKinney

    DanKinney Guest

    Now, that is something that I have a problem with.  I would like to maintain that this API should be the source of truth.  I don't want to enable multiple alternative universes out there.

    If you are going to provide the data, provide the data.  Otherwise, let us handle it.

    If you want to use your site to cache a request to the API, that is different.  But providing a separate path to the data is probably unwise.
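    The distinction drawn here, caching a request to the API rather than republishing the data, could be sketched as a small time-limited cache keyed by request URL. The `fetch` callable and the 300-second default are assumptions for illustration only.

```python
import time


class RequestCache:
    """Cache responses per request URL for a short time (TTL in seconds)."""

    def __init__(self, fetch, ttl=300, clock=time.time):
        self._fetch = fetch    # hypothetical callable: url -> response body
        self._ttl = ttl
        self._clock = clock    # injectable so the expiry logic can be tested
        self._entries = {}     # url -> (fetched_at, body)

    def get(self, url):
        now = self._clock()
        entry = self._entries.get(url)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]    # still fresh: no upstream request
        body = self._fetch(url)
        self._entries[url] = (now, body)
        return body
```

    Since every entry expires and is refetched from the API, the API stays the source of truth and the cache only absorbs repeated identical requests.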

    -dan

     
  10. Quicktiger

    Quicktiger Guest

    In my eq2mission.flame.org site I do export data via YAML, JSON, and XML which I generate, but I made an immediate decision not to export data gathered from other sources, census.daybreakgames.com being the only source so far.

    Any data that is cached always has the "last updated at" tag.  I also make it clear that there is a lag.  Also, this data is only HTML, so eyeballs are the intended destination of the cached data.

    My own mashing up of the data I gather, however, I may choose to export at some point.  It's a tricky topic.  I do believe census.daybreakgames.com is the authority of absolute truth, and that is something that SoE may want to clearly state.  A recommended (but I hope not required) cache policy for developers would also be good, I think.

     
  11. Proopai

    Proopai Guest

    OK, the way I stated it was not taken the way I meant it.  SOE is the only source. What I meant is the way Quicktiger is handling the data on his site, not that I would give people whatever data I wanted. Sorry if it sounded like that. I agree 100% that Sony needs to be the ONLY TRUE data source.

     
  12. Bella

    Bella Guest

    This is why I have issue with the bulk and I have seen your site.  I have issue with the "feeds" that I saw on there.

    http://eq2mission.flame.org/about/developer

    I don't believe you should be providing these feeds.  Caching data for performance is one thing.  Full-blown loads and then providing exports is another.

    The feeds that SOE is providing can provide the querying that is needed for almost everything I can think of that I would consider "legit" for these feeds and the privilege of using them.

     
  13. DanKinney

    DanKinney Guest

    I can state that I would like census.daybreakgames.com to be the clear authority of absolute truth for this information.  This is why we are doing it.  I can't realistically stop you from providing feeds to others from your site(s).  In fact, you may be adding features that we don't provide in our feeds.  If you find a niche, I encourage you to fill it.

    One way to resolve this is to put attribution on your site that points folks back to census.daybreakgames.com for basic feeds to encourage others to come back to SOE for basic information.  Please don't provide un-differentiated feeds - that will only confuse new developers.

    I am going to be working on providing more tools and documentation for the feeds as soon as I can.  We are trying to grow the capabilities of this API on (at least) 3 different axes - availability, performance, additional games.

    -dan

     
  14. Quicktiger

    Quicktiger Guest

    I've experimented with the newly working c:sort=last_update and last_update=]<timestamp> method of retrieving data.  I've seen some odd results...

    If I walk the list (say from time stamp 1328050456 onwards) I will get some new characters, and some updated ones.  If I then wait a few hours and walk the SAME list again, which should be pretty much fixed ordering, I will get new characters and even some updates in there.

    That is, between time stamps of any two values, when walking them, I would expect the data to be mostly unchanged.
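    One way to pin down this kind of discrepancy is to keep the result of each walk and diff them. A minimal sketch, assuming each walk has been saved as a list of dicts with `id` and `last_update` keys (a hypothetical storage format):

```python
def diff_walks(first, second):
    """Compare two walks of the same time range.

    Returns (added, changed): IDs that appear only in the second walk, and
    IDs present in both whose last_update differs.
    """
    before = {r["id"]: r["last_update"] for r in first}
    after = {r["id"]: r["last_update"] for r in second}
    added = sorted(i for i in after if i not in before)
    changed = sorted(i for i in after if i in before and after[i] != before[i])
    return added, changed
```

    A non-empty `added` list for a range that lies entirely in the past is the symptom described above: a record appeared with a `last_update` older than the time of the first walk.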

    Does making a character public when it was private change last_update?  Are there any corner cases where last_update may not be changed but the character will appear in the database?

    Also, are there load balancer issues coming into play where my request may go to one server this time, and another next, which may have slightly differing views of the database?  How long would it be expected for those to sync up?

    If I have to lag Sony's database by a few hours this should not pose a problem in practical terms.  Even a day would not be that painful.  I'd expect things to stabilize after about an hour across your server farm, more likely sooner.
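    Lagging behind the live data can be expressed as a walk window whose upper bound trails the current time. A minimal sketch; the one-hour default mirrors the guess above and is not a measured value, and `last_cursor` is a hypothetical name for the timestamp where the previous walk stopped.

```python
import time


def stable_window(last_cursor, lag_seconds=3600, now=None):
    """Return (start, end) timestamps to walk, keeping `end` a fixed lag behind now.

    Returns None when nothing older than the lag remains to be walked.
    """
    now = int(time.time()) if now is None else int(now)
    end = now - lag_seconds
    if end <= last_cursor:
        return None
    return last_cursor, end
```

    The client would walk from `start` and stop once records pass `end`, leaving the most recent hour untouched until a later run when the servers have had time to converge.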

     
  15. Quicktiger

    Quicktiger Guest

    Specifically, between 1328050062 and 1328050083 a new character was added to the collection, which walking this time range previously did not detect.

    I fear that walking the list without any ordering is going to hurt your servers eventually, as the data will become large enough that the walking that takes 23000 seconds now (about 6 hours or so) will eventually be prevented as being a strain on resources.  I'd get that.

    However, I really would like to be able to continue to do things like the leaderboards (not just show individual character rankings) on any character or guild field, or even mash-ups of multiple fields.

    http://eq2mission.flame.org/leaderboards and http://eq2mission.flame.org/summary are things I would miss doing, even when the new players site appears.

     
  16. DanKinney

    DanKinney Guest

    I would do your tests again.  Yesterday saw a great number of changes...not only the change in the format, but we converted every character into the format in the database.  All of this happened while the game was up and updating.

    If things were stable before the migration, they will return to that.

    -dan

     
