Brizzled: http://brizzled.clapper.org/id/88

  • schlenk · 9 months ago
    Please clarify the results under 'Percentage of requests served within a certain time'; they look weird. Is each figure a kind of 'median' time for x% of the 100,000 total requests, or is it the time per request for 50% of the requests? The values for SocketServer, at least, look really weird.
  • Brian Clapper · 9 months ago
    Those numbers are straight from the ApacheBench output. My understanding is that the figures mean, for instance, that the Scala server served 50% of the total requests within 81 milliseconds. Yes, the SocketServer percentage statistics are very odd, but they're also very consistent across runs. I haven't had time to dig into why they're so strange; I'm betting the answer hinges on ApacheBench's interpretation of the term "requests served". When I do have time to dig into it further, I'll post my findings. If someone beats me to it, I'd love to hear about it.
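The percentile table is easier to read with a concrete model. The Python sketch below assumes (this is not confirmed anywhere in this thread) that ab sorts the individual request times and, for each percentage p below 100, reports the time at index int(n * p / 100) of that sorted list, with 100% mapped to the longest request. Under that assumption every figure is a per-request bound: a "50% 81" row would mean half of the requests each finished in 81 ms or less. The latencies used below are invented for illustration.

```python
from typing import Dict, List

# Rows in the order ab prints them (assumed to match ab's output).
PERCENTS = [50, 66, 75, 80, 90, 95, 98, 99, 100]


def percentile_table(times_ms: List[float]) -> Dict[int, float]:
    """Return {percent: time_ms} for a list of per-request times.

    Assumes ab-style indexing: sort the times, take the element at
    index int(n * p / 100), and map 100% to the longest request.
    """
    if not times_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(times_ms)
    n = len(ordered)
    table = {}
    for p in PERCENTS:
        if p >= 100:
            table[p] = ordered[-1]                # longest request
        else:
            table[p] = ordered[int(n * p / 100)]  # index is always < n
    return table
```

With ten hypothetical samples, `percentile_table([1, 1, 1, 1, 1, 1, 2, 2, 2, 3000])` maps 50% to 1 and every row from 90% through 100% to 3000, which shows how one slow request can own the top of the table without moving the median.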
  • Glyph Lefkowitz · 9 months ago
    Interesting results. Thanks for sharing them.

    I am sad that Twisted fared so poorly in this comparison. Twisted isn't really heavily optimized; I wish we had more time to do performance analysis on it.

    One thing you might want to try to validate these results, though, would be to re-run the test with httperf. I'm not suggesting that Twisted would fare better as a result, but when doing benchmarks of my own I've found 'ab' to be buggy and sometimes grossly misreport its results. httperf, on the other hand, has been very reliable.
  • Brian Clapper · 9 months ago
    When time permits, I'll do just that, and I'll post the results.

    I'm also disappointed at Twisted's performance. Its programming model tends to pervade an application; ripping it out or replacing it could be annoying.
  • Jean-Paul Calderone · 9 months ago
    I quickly hacked (~3-line diff, but it was definitely a hack) your Twisted web server version to use 3 cores on a 4-core machine (leaving the last core to ab). Performance went from ~1400 r/s to ~6000 r/s. I assume the main advantage the Scala version has over the rest is that it actually exploits more of the available hardware than any of the Python versions, so this change puts the Twisted web version on an even footing with it. :) I realize this isn't in the spirit of your "naive approach" servers, but I thought I'd point it out in case you hadn't considered simply using more processes instead of switching all of your development to a new language. This should be easy, since your case seems to involve little or no state shared between requests; much of Twisted is geared towards the opposite, where it's harder to split things up across processes.
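Jean-Paul's diff isn't shown, and his change was to the Twisted server itself. The sketch below illustrates the same general idea using only Python's standard library rather than Twisted: bind one listening socket, then os.fork() several workers that all call accept() on it, so connections are spread across processes (and cores) instead of being serialized behind a single interpreter's GIL. It assumes a POSIX system and a handler that, like the test servers, returns a hard-coded string; `prefork`, `worker_loop`, and `stop` are hypothetical names.

```python
import os
import signal
import socket
from typing import List, Tuple

# Hard-coded reply, like the simple test servers (body is 12 bytes).
RESPONSE = (b"HTTP/1.0 200 OK\r\n"
            b"Content-Type: text/plain\r\n"
            b"Content-Length: 12\r\n"
            b"\r\n"
            b"Hello world\n")


def worker_loop(listener: socket.socket) -> None:
    """Accept connections forever and answer each with RESPONSE."""
    while True:
        conn, _ = listener.accept()
        with conn:
            try:
                data = b""
                # Read until the blank line that ends the HTTP headers.
                while b"\r\n\r\n" not in data:
                    chunk = conn.recv(4096)
                    if not chunk:
                        break
                    data += chunk
                conn.sendall(RESPONSE)
            except OSError:
                pass  # client went away; keep serving others


def prefork(port: int = 0, workers: int = 3,
            backlog: int = 128) -> Tuple[int, List[int]]:
    """Bind once, then fork `workers` children that share the socket.

    Returns (bound_port, child_pids). POSIX only (uses os.fork).
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(("127.0.0.1", port))
    listener.listen(backlog)
    bound_port = listener.getsockname()[1]
    pids = []
    for _ in range(workers):
        pid = os.fork()
        if pid == 0:              # child: serve until killed
            try:
                worker_loop(listener)
            finally:
                os._exit(0)       # never return into the parent's code
        pids.append(pid)
    listener.close()              # parent keeps no copy of the socket
    return bound_port, pids


def stop(pids: List[int]) -> None:
    """Terminate and reap the worker processes."""
    for pid in pids:
        os.kill(pid, signal.SIGTERM)
        os.waitpid(pid, 0)
```

The default of three workers mirrors the "3 cores on a 4-core machine" setup above. In a real deployment, the worker count and any state shared between requests would need more thought.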
  • Brian Clapper · 9 months ago
    Multiple processes is definitely one of the ways we've been scaling--though in our production servers, it's not as easily accomplished (due to some shared state) as it is with these brain-dead test servers of mine. And you're right: Converting all of our code to a new language isn't a palatable idea at all. Much as I would love to be working in Scala, it's difficult to justify a big code conversion.
  • Ken Faulkner · 9 months ago
    I'm very dubious about that SocketServer-based result, particularly the 2ms for 90% of the queries. One trivial way to speed it up is to use a thread pool instead of creating a new thread from scratch for every request. Specifically, try the http://code.activestate.com/recipes/574454/ recipe.

    On my machine (MBP, 2.4 GHz), I went from 2896 rps to 3759 rps.

    Any chance you could try the same recipe and see how the results compare to what you already have?

    I also questioned checking character by character for the newline, but in the big scheme of things it didn't add to the execution time.

    Ken
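Ken's linked recipe isn't reproduced here. As a rough stand-in, the Python 3 sketch below replaces socketserver's thread-per-request `ThreadingMixIn` with a fixed `concurrent.futures.ThreadPoolExecutor`, and reads the request with buffered `rfile.readline()` calls instead of checking one character at a time for the newline. `ThreadPoolMixIn`, `PooledTCPServer`, `HelloHandler`, `serve_in_background`, and the pool size of 8 are hypothetical choices for illustration.

```python
import socketserver
import threading
from concurrent.futures import ThreadPoolExecutor


class ThreadPoolMixIn:
    """Hand each accepted request to a fixed pool instead of a new thread."""
    pool_size = 8  # hypothetical default; tune per machine

    def __init__(self, *args, **kwargs):
        self._pool = ThreadPoolExecutor(max_workers=self.pool_size)
        super().__init__(*args, **kwargs)

    def process_request(self, request, client_address):
        # Called by serve_forever() for every accepted connection.
        self._pool.submit(self._handle, request, client_address)

    def _handle(self, request, client_address):
        try:
            self.finish_request(request, client_address)
        except Exception:
            self.handle_error(request, client_address)
        finally:
            self.shutdown_request(request)

    def server_close(self):
        super().server_close()
        self._pool.shutdown(wait=True)


class PooledTCPServer(ThreadPoolMixIn, socketserver.TCPServer):
    allow_reuse_address = True


class HelloHandler(socketserver.StreamRequestHandler):
    def handle(self):
        # Buffered readline() instead of a char-by-char newline scan.
        self.rfile.readline()  # request line, e.g. b"GET / HTTP/1.0\r\n"
        while self.rfile.readline() not in (b"\r\n", b"\n", b""):
            pass  # skip any headers
        body = b"Hello world\n"
        self.wfile.write(b"HTTP/1.0 200 OK\r\n"
                         b"Content-Length: %d\r\n\r\n" % len(body) + body)


def serve_in_background(server):
    """Run serve_forever() on a daemon thread; returns the thread."""
    t = threading.Thread(target=server.serve_forever, daemon=True)
    t.start()
    return t
```

A pool only removes the per-request thread start-up cost; in CPython the handlers still take turns holding the GIL, so it doesn't add CPU parallelism.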
  • Brian Clapper · 9 months ago
    Run it yourself. Tell me what YOU see. I get a similar result every time. Another colleague thinks the interpretation of that benchmark should be, "50% of the requests were delivered in 1 millisecond or less". That interpretation makes more sense to me. It also says that there's one freakin' big outlier in that data set.

    And, yes, a threadpool is a trivial way to speed it up. In fact, the Scala Actor implementation does exactly that: It multiplexes 1,000 lightweight actors across a thread pool. (The Scala library handles that. I didn't have to do that myself.) And I have done exactly that same thing with Java, using java.util.concurrent.

    Twisted also does that, to a degree. But there are limits to the scalability of threads in CPython, owing to the GIL. A better solution (one Guido himself often recommends when explaining why he refuses to remove the GIL) is to use a process pool, not a thread pool. See Jean-Paul Calderone's comment above, for instance.
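Brian's "one big outlier" reading can be checked with a little arithmetic. The Python sketch below uses hypothetical data (999 requests at 1 ms plus one at 3000 ms, not the actual benchmark output) to show that a single slow request pulls the mean to about 4 ms, while the median stays at 1 ms and 99.9% of requests still finish within 1 ms. `summarize` and `share_within` are hypothetical helper names.

```python
from statistics import mean, median
from typing import Dict, Sequence


def summarize(times_ms: Sequence[float]) -> Dict[str, float]:
    """Mean versus median and worst case for a list of request times."""
    return {
        "mean": mean(times_ms),
        "median": median(times_ms),
        "max": max(times_ms),
    }


def share_within(times_ms: Sequence[float], limit_ms: float) -> float:
    """Fraction of requests that individually finished within limit_ms."""
    return sum(1 for t in times_ms if t <= limit_ms) / len(times_ms)


# Hypothetical data: 999 fast requests and a single 3-second outlier.
sample = [1] * 999 + [3000]
stats = summarize(sample)
```

The gap between `stats["mean"]` and `stats["median"]` is the signature of a skewed distribution: an average-based figure looks bad, while a percentile row such as "50% 1" still describes what almost every client saw.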
  • Ken Faulkner · 9 months ago
    I'll eventually run it myself, but was wondering what it looked like on your system. No matter.
    Yeah, normally I have a small process pool with a thread pool inside each process; that usually works out well.

    Do I take it that 50% of the queries were delivered in 1ms or less (in TOTAL), or that each of those requests (in the 50%) was delivered in 1ms? AB always gets me with its terminology. :)
  • Brian Clapper · 9 months ago
    Ken,

    AB's docs suck. I could not find any documentation telling me how to interpret the percentage stats, and I did not have time to dig through AB's source code to figure it out. I think the second interpretation ("each of the requests in the 50% were delivered in 1ms or less") makes more sense, though, given the data.

    As for running it again here, I ran it a number of times, after getting those weird results the first time for the SocketServer implementation. Each time I ran the test, the stats were pretty much the same. I haven't had time to dig into it further--especially since I have no intention of actually using a SocketServer implementation at work. I included it mostly for comparison.
  • Zor · 9 months ago
    So your application returns hard-coded strings, and throughput is the bottleneck, right?

    Possibly of interest:
    http://jessenoller.com/code/pycon_jnoller_multi...
    http://code.google.com/p/unladen-swallow/wiki/P...
  • Brian Clapper · 9 months ago
    Yes, that's an accurate description of the test servers. I'll look at those links. Thanks.
  • zor · 9 months ago
    Probably of some interest to you
    http://us.pycon.org/2009/conference/schedule/ev...
  • Cyril B. · 8 months ago
    It seems you missed the SocketServer.request_queue_size option [1], which sets the backlog size. By default it's set to 5, which is way too low for the tests you ran. After setting it to 50, here are my results for 50 requests/sec (with warm-up, same columns as yours):

    fapws3 : 5886.33 / 8.494 / 8 / 9 / 11 / 13
    Scala : 5467.53 / 9.145 / 9 / 11 / 12 / 20
    SocketServer : 3354.10 / 14.907 / 15 / 15 / 16 / 20

    That was done on Ubuntu 9.04 with Python 2.6, Scala 2.7.3, and Java 1.6.0_13, on an Intel Core 2 Duo at 1.86 GHz.

    [1] http://docs.python.org/library/socketserver.htm...
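Cyril's fix amounts to a one-line subclass. In the Python 3 `socketserver` module (Python 2's `SocketServer` has the same attribute), `TCPServer.server_activate()` passes `request_queue_size` to `listen()`, and the class default is 5. The sketch below raises it to 50, the value Cyril used; `BigBacklogServer` is a hypothetical name.

```python
import socketserver


class BigBacklogServer(socketserver.ThreadingTCPServer):
    """ThreadingTCPServer with a deeper listen() backlog.

    server_activate() calls self.socket.listen(self.request_queue_size).
    With the stock value of 5, connection attempts that arrive while the
    backlog is full may be dropped or delayed by the kernel.
    """
    request_queue_size = 50
    allow_reuse_address = True
```

Any handler class works with it, for example `BigBacklogServer(("127.0.0.1", 8080), MyHandler)`, where `MyHandler` stands in for whatever request handler the server already uses.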
  • Brian Clapper · 8 months ago
    Good catch, Cyril. Thanks.
  • Johann Romefort · 6 months ago
    You might also want to have a look at Naggati, which is Scala Actors + MINA, written by one of the Twitter folks: http://robey.lag.net/2009/03/02/actors-mina-and...
  • Brian Clapper · 6 months ago
    Great pointer, Johann. Thanks.
  • Sebastian · 6 months ago
    I tried Robey's Scala/MINA combination, but there is practically no documentation, which doesn't help if you don't have extensive Scala knowledge. So a sample benchmark implementation would be much appreciated :)
  • Den Shabalin · 4 months ago
    Using Stackless (http://zope.stackless.com/) instead of CPython might give some extra performance gain.