Limitations of lj-stat
2718.us blog, Sun, 13 Apr 2008 20:23:47 +0000
http://2718.us/blog/2008/04/13/limitations-of-lj-stat/

To the best of my knowledge and research, my LJ-code-base Site Statistics page (lj-stat) has the most comprehensive list of sites running off of LiveJournal’s codebase (if you know of any that I’ve missed, please let me know). The main point, though, is the comparative statistics. This is where things get strange. LJ and most of the sites provide a pretty statistics page at /stats.bml, and in most (or all?) instances, stats.bml says at the top (this is from LJ itself):

Raw data can be picked up here.

where “here” links to /stats/stats.txt.  On at least one site, stats.bml has this text, but stats/stats.txt returns a 404.  On at least one site, both stats.bml and stats/stats.txt return 404.  Since it looks to me like the whole point of providing stats.txt was to provide a more machine-readable set of stats that didn’t require loading a full web page and screen-scraping, I have no intention of trying to screen-scrape the info I want.
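The cases described here (raw file present, stats page present but raw file missing, both missing) can be told apart from HTTP status codes alone, with no scraping. What follows is only a rough sketch in Python, not how lj-stat is implemented; the function names `stats_urls`, `fetch_status`, and `classify_site` and the host `example.org` are made up for illustration.

```python
import urllib.error
import urllib.request


def stats_urls(site_root):
    """Return (stats page URL, raw stats URL) for a site root."""
    root = site_root.rstrip("/")
    return root + "/stats.bml", root + "/stats/stats.txt"


def fetch_status(url, timeout=10):
    """Return the HTTP status code for url, or None if no response was received."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404.
        return err.code
    except OSError:
        # DNS failure, refused connection, timeout, and similar.
        return None


def classify_site(page_status, raw_status):
    """Label a site by which of its two stats URLs answered with 200."""
    if raw_status == 200:
        return "raw stats available"
    if page_status == 200:
        return "stats page only"
    return "no stats"


# Hypothetical usage; example.org is a placeholder, not an LJ-based site.
page_url, raw_url = stats_urls("http://example.org/")
# label = classify_site(fetch_status(page_url), fetch_status(raw_url))
```

Checking the raw file first means a site is only ever treated as usable when stats.txt itself is there, which matches the decision not to fall back to screen-scraping.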

Now, to make things even stranger, some sites are missing what I’d call “key” stats from their stats.txt files.  In particular, the one I care most about is the “active in some way in the past 30 days” measure since I think that’s the best measure of the vitality of a site (well, either that, or what portion of the total userbase it represents).  Stranger still is that some sites report numbers in stats.txt that not only don’t match stats.bml, but make no sense whatsoever (DeadJournal perpetually reports only 10 accounts updating in the past 30 days, even though stats.bml has more sensible numbers).
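To make the “portion of the total userbase” measure concrete, and to show one way of coping with a missing key, here is a small Python sketch. It assumes, purely for illustration, that the raw data has already been reduced to `name value` lines; the key names `total_accounts` and `active_30_days` are placeholders rather than the actual stats.txt field names, and the numbers are invented.

```python
def parse_counts(text):
    """Parse 'name value' lines into a dict of integer counts.

    Lines that are not exactly a name followed by an integer
    (blank lines, '#' comments, anything else) are skipped.
    """
    counts = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2 or parts[0].startswith("#"):
            continue
        try:
            counts[parts[0]] = int(parts[1])
        except ValueError:
            continue
    return counts


def active_share(counts, active_key="active_30_days", total_key="total_accounts"):
    """Return active accounts as a fraction of all accounts, or None if unknown."""
    active = counts.get(active_key)
    total = counts.get(total_key)
    if active is None or not total:
        # Key missing, or total is zero: the share cannot be computed.
        return None
    return active / total


# Invented sample data in the assumed format.
complete = parse_counts("total_accounts 2000\nactive_30_days 500\n")
missing = parse_counts("total_accounts 2000\n")
print(active_share(complete))  # 0.25
print(active_share(missing))   # None
```

Returning `None` rather than zero keeps “this site does not report the number” distinct from “this site reports no activity”.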

Unrelated to the content of stats.txt is the “Speed Index” column. It is based on the rate of transfer reported by libcurl when retrieving stats.txt: the speed index of a site is its transfer rate expressed as a percentage of the fastest transfer rate. What I don’t quite understand is how InsaneJournal is always at least twice as speedy as any other site, and often 4x or 6x as speedy. It actually made me wonder if my server and theirs were somehow in the same datacenter or something, but there are at least a dozen hops between us (which is more than from my server to some other LJ-based sites), so maybe it does have something to do with the servers themselves and not just network conditions.
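The speed index calculation is just a normalization against the fastest site. A minimal Python sketch, assuming the transfer rates have already been collected into a dict; the function name `speed_index`, the site names, and the rates below are all invented for illustration.

```python
def speed_index(transfer_rates):
    """Map each site to its transfer rate as a percentage of the fastest rate."""
    fastest = max(transfer_rates.values(), default=0)
    if fastest <= 0:
        # No sites, or no usable measurements: report zero for every site.
        return {site: 0.0 for site in transfer_rates}
    return {site: 100.0 * rate / fastest for site, rate in transfer_rates.items()}


# Invented rates in bytes per second; the unit does not matter as long
# as it is the same for every site.
rates = {"site-a": 400000.0, "site-b": 100000.0, "site-c": 50000.0}
print(speed_index(rates))
# {'site-a': 100.0, 'site-b': 25.0, 'site-c': 12.5}
```

With this definition, a site that is consistently 4x faster than the rest pushes every other site down to 25 or below, which is the pattern described above.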

Please let me know if you have any suggestions about enhancements to lj-stat. Also, feel free to encourage the sites that don’t provide stats.txt to start providing it, and the sites whose numbers are clearly wrong to fix them.
