On Mindcraft's April 1999 Benchmark


I just compiled [Apache] using ... almost all of the modules disabled ... I'm using the highperformance-conf.dist config file from the distribution." See also Karthik's post on linux-kernel and its followups. This sounds rather like the behavior Mindcraft reported ("After the restart, Apache performance climbed back to within 30% of its peak from a low of about 6% of the peak performance").

Kernel issue #2: Wake-One vs. The Thundering Herd



(Note: According to the Linux Scalability Project's paper on the thundering herd problem, a "task exclusive" wake-one patch is now integrated into the 2.3 kernel; however, according to Andrea, as of 2.4.0-test10 it still wakes up processes in the same order they were put to sleep, which is not optimal from a caching point of view. Waking them in the reverse order would be more efficient. See also the Nov 2000 measurements by Andrew Morton (andrewm@uow.edu.au): post 1, post 2, and Linus' reply.)

- Phillip Ezolt, 5 May 1999, in linux-kernel ( Overscheduling DOES occur with high web server loads. ): "When running a SPECWeb96 strobe with Alpha/linux I found that 18% of the time was spent in scheduling." (Russinovich talked about something like this in his critique of Linux.) This post sparked a lively thread in linux-kernel (now in its second week). It looks like Apache and the scheduler are up for some changes.

- Rik van Riel, 6 May 1999, in linuxperf ( Re: [linuxperf] Possible fix for Mindcraft Apache problem ): ... The main problem with the web benchmark remains. The way Apache and Linux 'cooperate' is a problem: when a signal is received, all processes are woken and the scheduler must choose one of the many newly runnable processes ... The real solution is to switch from wake-all semantics to wake-one semantics. This will avoid the runqueues Phillip Ezolt at DEC experienced. The good news is that it's a simple patch that can probably be done within a few days...

- Tony Gale, 6 May 1999, in linuxperf ( Re: [linuxperf] Possible fix for Mindcraft Apache problem ): Apache uses file locking to serialise access to the accept call. This can be very costly on some systems. I haven't found the time to run the numbers on Linux yet for the 10 or so different server models that can be employed to see which is the most efficient. Check Stephens UNPv1 2nd Edition Chapter 27 for details.

- Andrea Arcangeli, 12 May 1999, in linux-kernel ( [patch] wake_one to accept(2) [was Re: Overscheduling DOES occur with high web server load.], 2.2.8_andrea1.bz2 ): I released a new andrea patch against 2.2.8. This new one has my new straightforward wake-one on accept(2) code (but to get the improvement you must make sure that your apache tasks are sleeping in accept(2); a strace -p `pidof apache` should tell you that). You can find the patch here.

  David Miller's response to the above: ...on every new TCP connection there will be two spurious and unneeded wakeups. These originate in the write_space socket callback because we free up the SYN frame, which wakes up listening socket sleepers. I have been working on this exact issue today.

- Ingo Molnar, 13 May 1999, in linux-kernel ( Re: [RFT] 2.2.8_andrea1 wake-one [Re: Overscheduling DOES occur with high web server loads.] ): note that pre-2.3.1 already has a wake-one implementation for accept() ... and more coming up.

- Phillip Ezolt (ezolt@perf.zko.dec.com), 14 May 1999, in linux-kernel ( Great News!! Was: [RFT] 2.2.8_andrea1 wake-one ): I've been doing some more SPECWeb96 tests, and with Andrea's patch to 2.2.8 (ftp://ftp.suse.com/pub/people/andrea/kernel/2.2.8_andrea1.bz) **On identical hardware, I get web-performance nearly identical to Tru64!** ...

      Tru64      4ms
      2.2.5      100ms
      2.2.8      9ms
      2.2.8_a    4ms

  ... I get web-performance almost identical to Tru64, according to this Iprobe data. The number of SPECWeb96 maxOps per second has increased as well. **Please put the wakeone patch in the 2.2.X kernel if it isn't already.**

  Larry Sendlosky tried this patch, and says: Your 2.2.8 patch really helps apache performance on a single cpu system, but there is really no performance improvement on a 2 cpu SMP system.
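The difference between wake-all and wake-one semantics discussed in the posts above can be illustrated with a small counting model. This is only a toy sketch in Python, not kernel code: it assumes every worker is asleep in accept() when each connection arrives, and the figures of 100 workers and 1000 connections are made up for illustration.

```python
def count_wakeups(num_sleepers, num_connections, wake_one):
    """Return (total_wakeups, wasted_wakeups) for a toy accept() model.

    Assumption: all workers are asleep when each connection arrives, and
    exactly one worker ends up handling the connection.
    """
    total = 0
    wasted = 0
    for _ in range(num_connections):
        # wake-all: every sleeper becomes runnable; wake-one: only one does.
        woken = 1 if wake_one else num_sleepers
        total += woken
        # All woken workers except the winner go straight back to sleep.
        wasted += woken - 1
    return total, wasted


if __name__ == "__main__":
    for label, flag in (("wake-all", False), ("wake-one", True)):
        total, wasted = count_wakeups(100, 1000, flag)
        print(label, "total wakeups:", total, "wasted:", wasted)
```

In this model the wasted wakeups under wake-all grow with the number of sleeping workers, which is the scheduler load the posts above describe; real behaviour also depends on how many workers are busy at any moment.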



For the SMP case, see Kernel issue #3 below. Also see:

- Dimitris Michailidis, 14 May 1999, in linux-kernel ( [PATCH] scheduler fixes and improvements ) -- several improvements to the 2.2.8 scheduler.
- Andrea Arcangeli (andrea@suse.de), 21 May 1999, in linux-kernel ( Re: andrea buffer code (2.2.9-C.gz) ) -- update. Some SMP bottleneck issues might also be addressed.

Kernel issue #3: SMP Bottlenecks in 2.2 Kernel



Juergen Schmidt, 19 May 1999, in linux-kernel ( Bad Apache Performance with the linux SMP ), asked why Apache performs poorly under SMP. Andi Kleen replied that the TCP data copy most likely runs completely serialized. This can be fixed by replacing the

    skb->csum = csum_and_copy_from_user(from, skb_put(skb, copy), copy, 0, &err);

in tcp.c:tcp_do_sendmsg with

    unlock_kernel();
    skb->csum = csum_and_copy_from_user(from, skb_put(skb, copy), copy, 0, &err);
    lock_kernel();

The patch does not violate any locking requirements in the kernel... [To fix your connection refused errors,] try:

    echo 32768 > /proc/sys/fs/file-max
    echo 65536 > /proc/sys/fs/inode-max

Overall it should be clear that the current Linux kernel doesn't scale to CPUs for system load (user load is fine). I blame the Linux vendors for advertising it, although it is not true. ... All of these problems are being addressed. [2.3 will be fixed first; then the changes will be backported to 2.2.]

[Note: Andi's TCP unlocking fix appears in 2.2.9-ac3.]

Andrea Arcangeli responded, describing his own version of this fix ( ftp://ftp.suse.com/pub/people/andrea/kernel/2.3.3_andrea2.bz2 ) as less cluttered: If you look at my patch (the second one, in the first one I missed the reacquire_kernel_lock done before returning from schedule, woops :) then you'll see my approach to address the unlock-during-uaccess. My patch doesn't change tcp/ip, ext2, etc... it only affects uaccess.h and usercopy.c. I don't like to have unlock_kernel everywhere.

Juergen Schmidt, 26 May 1999, on linux-kernel and new-httpd ( Linux/Apache and SMP - my fault ), retracted his earlier problem report: I reported "disastrous" performance for Linux and Apache on a SMP system. I downloaded clean kernel sources (2.2.8 and 2.2.9) to double-check. These do not have the reported penalty for running on SMP system. My mistake was to use the kernel sources I had installed (which I patched from 2.2.5 to 2.2.8 after seeing the first very bad results). These sources had already been modified before I got the machine. They should have been thrown away in the first instance. Please excuse my confusion.

Others have reported modest performance gains (around 20%) when Andrea's SMP fix is used, but only when using large files (100 kilobytes).

Juergen is now done with his testing. Unfortunately, he neglected to compile Apache with -DSINGLE_LISTEN_UNSERIALIZED_ACCEPT, which (according to Andrea) significantly hurt Apache performance. That Juergen missed this suggests the setting is too hard to discover. To make it easier to get good performance in the future, we need the wake-one patch added to a stable kernel (say, 2.2.10), and we need Apache's configuration script to notice that the system is being compiled for 2.2.10 or later, and automatically select SINGLE_LISTEN_UNSERIALIZED_ACCEPT.

Other Apache users can help solve performance problems



Mike Whitaker (mike@altrion.org), 22 May 1999, in linuxperf ( High load under Apache 1.3.3/mod_perl 1.16/Linux 2.2.7 SMP ), described a performance problem: A typical webserver is a dual PII450 with split httpds, typically 300 static httpds to serve the pages and proxy to 80-100 mod_perl httpds for the adverts. Unneeded modules are disabled and hostname lookups are turned off, as any sensible person would. There's typically between one and three mod_perl hits/page on top of the usual dozen or so inline images... The kernel (2.2.7) has MAX_TASKS upped to 4090, and the unlock_kernel/lock_kernel around csum_and_copy_from_user() in tcp_do_sendmsg that Andi Kleen suggested. The performance is.. interesting. The load on the machine fluctuates between 10 and 120, while the user CPU goes from 15% (80% idle, machine *crawling*) to 180% (85% idle, machine *crawling*), approximately once every minute and a quarter. vmstat shows the number of processes in a run state; it ranges from 0 when load is low to 30-40 to 60-40, while the static servers manage 60-70 hits per second. Without the dynamic httpd's everything *flies*...

After being advised to try a kernel with wake-one support, he wrote back: We're up with 2.3.3 plus Andi Kleen's tcp_do_sendmsg patch plus Apache sleeping in accept() on one production server, and comparing it against a 2.2.7 plus tcp_do_sendmsg patch plus Apache sleeping in flock(). Identical systems (dual PII450, 1G, two disk drivers). As far as I can tell, the wake-one patch seems to be doing its job: the 2.2.7 machine still manages loads of three figures, while 2.3.3 has not yet managed a load of one. Unfortunately, observation suggests that about one connection in ten is being lost/ignored by the 2.3.3 machine/Apache combination (Network error - connection reset by peer).

His next update, posted on 25 May, reads: More progress at the bleeding edge. (Remember that the config is split static/mod_perl httpd's, with a very CPU-intensive mod_perl script serving advertisements as an SSI as the probable bottleneck.) Linux kernel version 2.2.9 plus the 2.2.9_andrea3 (wake-one) patch seems to work. It can handle hits at a speed that suggests it's pushing the ad server to the limit. (As I said in a previous note, avoid 2.2.8 like the plague: it trashes HDs - see threads on linux-kernel for details.) However... once idle CPU drops to zero (i.e. it's spending most of its time processing advert requests), everything goes unpleasantly pear-shaped, with a load of 400+, and the number of httpd's on both types of server *well* above MaxClients (in fact, suspiciously close to MaxClients + MinSpareServers). A spike in demand can trigger this, and once it happens it is difficult to get out of this state. The counterintuitive next step is to *REDUCE* MaxClients and hope that the TCP listen queue can absorb a load spike. In practice this works. (Aside: this is a perfect case for using something like Eddieware's load balancing DNS.)

- Eric Hicks, 26 May 1999, in linux-kernel ( Apache/kernel problem? ): ... I am having major problems with the fact that a single PII 400MHz, or a single AMD 400, will outrun a dual PII 450 when Apache requests are made. ... HTTP server tests. Data: 100 1MByte MPEG files stored on local drives. Results:
  - AMD 400MHz K6, 128MB, Linux 2.0.36: handles 1000 simultaneous clients @ 57.6Kbits/sec.
  - PII 400MHz, 512MB, Linux 2.0.36: handles 1000 simultaneous clients @ 57.6Kbits/sec.
  - Dual PII 450MHz, 512MB, Linux 2.0.36 and 2.2.8: handles far fewer clients concurrently @ 57.6Kbits/sec (and even then, clients were experiencing 5 second connection delays and 'reset by peer' and 'connection time out' errors).


I advised him not to try 2.2.9_andrea3, but he said he would and would report back.

Kernel issue #4: Interrupt Bottlenecks



According to Zach, the Mindcraft benchmark's use of four Fast Ethernet cards and a quad SMP system exposes a bottleneck in Linux's interrupt processing; the kernel spent a lot of time in synchronize_bh(). (A single Gigabit Ethernet card would lessen this bottleneck.) Mingo claims that TCP throughput scales better with more CPUs in 2.3.9 than in 2.2.10; however, he hasn't yet tried it with multiple Ethernets. Steve Underwood and Steven Guo have also posted comments about reducing interrupts under heavy load.

See also Linus's "State of Linux" talk at Usenix '99, where he talks about the Mindcraft benchmark and SMP scalability, and SCT's Jan 2000 comments on progress in scalability.

Softnet is coming! Kernel 2.3.43 adds the new softnet networking changes. Softnet modifies the interface to the network card drivers, which means that every driver needs to be updated. In return, large SMP systems should see much higher network performance. (For more information, see Alexey's softnet-howto or his 15 February post about how to convert old drivers to softnet.)

The thread Gigabit Ethernet Pipeline Bottlenecks, posted in February 2000, has many interesting tidbits about which interrupt (and other) bottlenecks remain and how they are being addressed in the 2.3 kernel. Ingo Molnar's post of 27 Feb 2000 describes interrupt-handling improvements in the IA32 code in great detail. These improvements will be integrated into the core kernel in 2.5, it seems.

Kernel issue #5: Mysterious network slowdown



This is a bug and not a scaling issue. Many 2.2 users reported that networking speeds can sometimes slow down to 1-10% of normal. The slowdown often comes with high ping times and heavy network traffic. Users have reported that temporarily cycling the interface fixes the problem.

- Oystein Skandsen reported on 29 June 1999: We have experienced TCP performance slowdowns after upgrading to the new 2.2 series. I can restore normal performance by taking down the interface and reinserting the eepro100 module into the kernel. Once that is done, the performance is good for a few days, or even weeks.
- David Stahl reported on 29 June 1999: I have 3 computers running 2.2.10 [with multiple] 3COM 905b PCI [cards] ... After approximately two days of uptime I will begin to notice ping times jump to 7-20 secs on the local network. As others have noted, there is no loss. There is just some damn high latency. ... It seems to depend on network load -- lighter loads mean longer periods between problems. It's also gradual. It starts at 4 second pings, then 7 second pings around 20 minutes later, and then it goes up to 12-20 second pings 30 minutes later.
- Another eepro100 report. A tulip report. Less repeatable.
- David Stahl wrote on 13 July 1999: What DID fix the problem was a private reply from someone else (sorry about the credit, but i'm not in the mood to sieve 10k emails right now), to try the alpha version of the latest 3c59x.c driver from Donald Becker (http://cesdis.gsfc.nasa.gov/linux/drivers/vortex.html). 3c59x.c:v0.99L 5/28/99 is the version that fixed it, from ftp://cesdis.gsfc.nasa.gov/pub/linux/drivers/test/3c59x.c
- On 23 Sep 1999, Alexey posted a one-line patch that clears up a similar mysterious slowdown. 2.2.13 and Red Hat 6.1 already have this patch applied. The patch was applied to three Red Hat 6.0 systems that I know of, which have Masq support compiled in and are connected to cable modems. It fixed a bug that caused extremely high pings following even short bursts of heavy TCP transfers to a distant host.
- Rickard Cedergren and Michael Brown reported on linux-kernel on 21 October that while Alexey's patch significantly improved the problem, it is still not completely eliminated.
- Tony Hoyle has been experiencing some long delays with 2.2.13.
- Jeremy Fitzhardinge reported a new delay. The replies indicate that it is likely due to a Tulip driver.

Kernel issue #6: 2.2.x/NT TCP slowdown



Petru Paler, 10 July 1999, in linux-kernel ( [BUG] TCP connections between Linux and NT ), reported that any type of TCP connection between Linux 2.2.10 and an NT Server 4 Service Pack 5 machine slows down to a crawl. With 2.0.37, the problem was less severe (6kbytes/sec). A tcpdump log of a slow connection let Andi Kleen see that NT took a long time to ACK a particular data packet, which was causing Linux to stall.

Solved: false alarm! It wasn't Linux's fault at all. It turned out that NT had to be told not to use full duplex mode with the ethernet card.

Kernel issue #7: Scheduler



Phil Ezolt, 22 January 2000, in linux-kernel ( Re: Interesting analysis on linux kernel threading from IBM ): When I run the SPECWeb96 test here, I see both a large number of running processes and a lot of context switches. ... Here's an example of the vmstat information:

    procs memory swap io system cpu
    r b w swpd free buff cache si so bi bo in cs us sy id
    ...
    24 0/0 2320 2066936 590088 61061464 0/0 0/0 8680 7404 3 96 1
    24 1 0/0 2320 22065752 590088 61061464 0/0 0/0 8680 10920 3 95 1

Notice: 24 running processes and ~7000 context switches. That is a lot. Every second, 7000*24 goodnesses are calculated, not the (20*3) that a desktop system sees. This is a matter of scalability. A better scheduler means better scalability. Don't tell me that benchmark data is useless. Unless you can give me data from real systems and tell me where their faults are, benchmark data is all we have. SPECWeb96 pushes Linux to the limit until it bleeds. I'm going to tell you where it is bleeding. You have two options: fix it or bury your head in the sand. It might not be what your system is seeing today, but it will be in the future. Would you rather fix it right away or wait until someone else does? ... Here's an interesting fact. During my runs, I see 98% contention on the [2.2.14] kernel lock, and it is accessed A LOT. I don't have much memory support so I don't know how it compares to 2.3.40. Andrea will likely be kind enough to send me a patch. Then I can see if the situation has improved.

[Phil's data refers to the web server that is undergoing the SPECWeb96 testing. It is an ES40 4 CPU Alpha EV6 with Red Hat 6.0, kernel v2.2.14 and Apache v1.3.9 with SGI performance patches; the interfaces receiving the load are two ACEnic Gigabit Ethernet cards.]

Kernel issue #8: SMP bottlenecks in 2.4 kernel



Manfred Spraul, 21 April 2000, in linux-kernel ( [PATCH] f_op->poll() without lock_kernel() ): kumon@flab.fujitsu.co.jp noticed that select() caused high contention for the kernel lock, so here is a patch that removes lock_kernel() from poll(). [tested] with 2.3.99-pre5.

Although there was some debate about whether this was a wise change at this late date, Linus and David Miller were enthusiastic. Looks like one more bottleneck bites the dust.

On 26 April 2000, kumon@flab.fujitsu.co.jp posted benchmark results in linux-kernel with and without the lock_kernel() in poll(). In the same thread, a kernel patch was posted to improve checksum performance, and another patch was posted for Apache 1.3 to force it to align its buffers to 32 word boundaries. Linus praised the Dean Gaudet patch, which was rumored to speed up SPECWeb results by 3%. This thread was interesting. See also LWN's coverage, and the paragraph below, in which Kumon presents some benchmark results and another patch.

Kernel issue #9: csum_partial_copy_generic



kumon@flab.fujitsu.co.jp, 19 May 2000, in linux-kernel ( [PATCH] Fast csum_partial_copy_generic and more ), reports a 3% reduction in total CPU time compared to 2.3.99-pre8 on i686 by optimizing the cache behavior of csum_partial_copy_generic. ZD's WebBench was used for the workload. He adds: The benchmark we used has almost same setting as the MINDCRAFT ones, but the apache setting is [changed] slightly not to use symlink checking. We used 24 independent clients, and the maximum number of apache processes was 16. A four-way XEON processor system is used, and the performance is twice and more than a single CPU performance.

ZD's benchmarks with 2.2.6 showed that a 4 CPU system was only 1.5x faster than a single CPU. Kumon reports a > 2x speedup. This is about the same speedup that NT 4.0 SP3 saw with 4 CPUs for 24 clients. It's encouraging to hear that things might have improved in the 11 months since the 2.2.6 testing. Kumon indicated that the major improvement came between pre3 and pre5, from the poll optimization: Until pre4 (I forget exact version), kernel-lock prevents performance improvement. If you are able to retrieve l-k emails between Apr 20-25, these mails will help explain the background.

    subject: namei() query
    subject: [PATCH] f_op->poll() without lock_kernel()
    subject: lockless poll() (was Re: namei() query)
    subject: "movb" spin-unlock (was Re: namei() query)


Kumon posted again on 4 September 2000, noting that his changes had not yet been merged into the kernel.

Kernel issue #10: getname(), poll() optimizations



On 22 May 2000, Manfred Spraul posted a patch on linux-kernel which optimized kmalloc(), getname(), and select() a bit, speeding up apache by about 1.5% on 2.3.99-pre8.

Kernel issue #11: Reducing lock contention, poll overhead in 2.4



On 30 May 2000, Alexander Viro posted a patch that got rid of a big lock in close_flip() and _fput(), and asked for testing. Kumon ran a benchmark and reported: I measured viro's ac6D patches with WebBench using the 4cpu Xeon systems. I applied to 2.4.0-test1 instead of ac6. The patch decreased stext_lock time by 50% and OS time by 4%. ... do_select incurs some overhead from kmalloc/kfree. This can easily be eliminated by using a small array on the stack.

Kumon then posted a patch that avoids kmalloc/kfree in select() and poll() when the number of fds involved is less than 64.

Kernel issue #12: Poor disk seek behavior in 2.2, new elevator code in 2.4



On 20 July 2000, Robert Cohen (robert@coorong.anu.edu.au) posted a report in linux-kernel listing netatalk (appletalk file sharing) benchmarks comparing 2.0, 2.2, and several versions of 2.4.0-pre. The elevator code in 2.4 seems helpful (some versions can handle 5 benchmark clients instead o... The test4 and the test5pre2 are not as successful. They manage 2 clients on a 128Meg server well, so they're doing much better than 2.2. But they choke and go seek bound with 4 clients. Things have changed a lot since test1-ac22.

In an update, he reported that the *only* 2.4 kernel versions that could handle 5 clients were 2.4.0-test1-ac22-riel and 2.4.0-test1-ac22-class 5+; everything before and after (up to 2.4.0-test5pre4) can only handle 2.

Robert Cohen posted a patch on 26 Sept 2000. It included a simple program that demonstrated the problem. Jens Axboe (axboe@suse.de) responded that he and Andrea had a patch almost ready for 2.4.0-test9-pre5 that fixes this problem.

Robert Cohen posted an update on 4 Oct 2000 with benchmark results for many kernels. This showed that the problem was still present in 2.4.0-test9.

Kernel issue #13: Fast Forwarding / Hardware Flow Control



Jamal (hadi@cyberus.ca) posted a note in linux-kernel on 18 September 2000 describing proposed changes to the network driver interface of the 2.4 kernel. The changes include hardware flow control and other refinements. He says: Robert Olson and I decided after the OLS that we were going to try to hit the 100Mbps (148.8Kpps) routing peak by year end. I fear the bar has been raised. Robert is already hitting with 2.4.0-test7 ~148Kpps with an ASUS CUBX motherboard carrying PIII 700 MHz coppermine with about 65% CPU utilization. I was able to achieve a steady value of 110Kpps using a single PII-based Dell machine. As an example of how to use feedback values, a modified tulip driver was created by Alexey for Linux 2.2 and mod'ed over time by Robert. ... I believe we could have done better with the mindcraft tests with these changes in 2.2 (and HW FC turned on). [update] BTW: I was informed that Linux users were not allowed to modify the hardware during those tests. I don't think they could have used these modifications if they were available then.

Kernel tuning issue: hitting TIME_WAIT



Takashi Richard Horikawa posted a report in linux-kernel on 30 March 2000 listing SPECWeb96 results for both 2.2.14 and 2.3.41. Performance between a 2.2.14 client and a 2.2.14 server was poor because so few ports were in use that a port had not finished TIME_WAIT by the time that port number was needed again for a new connection.

It is possible to tune the client or server to use a larger port range, e.g. with

    echo 1024 65535 > /proc/sys/net/ipv4/ip_local_port_range

to avoid bumping into this situation when trying to simulate large numbers of clients with a small number of client machines. Mr. Horikawa resolved the problem on 2 April 2000.

Suggestions on future benchmarks
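One check worth making when planning client machines follows from the TIME_WAIT issue described above: a client can open new connections to a single server address only as fast as its local ports leave TIME_WAIT. The following Python sketch does that arithmetic. The 60-second TIME_WAIT duration and the narrow 1024-4999 range are assumptions chosen for illustration, not figures taken from Mr. Horikawa's report; only the 1024-65535 range comes from the text above.

```python
def sustainable_connection_rate(port_low, port_high, time_wait_seconds):
    """Upper bound on new connections per second from one client address
    to one server address, if every closed connection holds its local
    port in TIME_WAIT for time_wait_seconds."""
    ports = port_high - port_low + 1
    return ports / time_wait_seconds


if __name__ == "__main__":
    TIME_WAIT_SECONDS = 60  # assumed value, not from the report
    for low, high in ((1024, 4999), (1024, 65535)):  # first range is illustrative
        rate = sustainable_connection_rate(low, high, TIME_WAIT_SECONDS)
        print("ports %d-%d: about %.0f connections/sec" % (low, high, rate))
```

If the request rate you intend to generate per client machine is above this bound, either widen the port range or add client machines.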



- Become familiar with the linux-kernel and Apache mailing lists, as well as the Linux newsgroups on Usenet (try DejaNews power searches in forums matching "*linux*"). Check whether people there agree with your configuration.
- Be open about your benchmark; post intermediate results, and see if anyone has suggestions for improvements. Expect to spend at least a week discussing ideas on these mailing lists.
- If possible, use a modern benchmark like SPECWeb99 rather than the simple ones used by Mindcraft.
- To better simulate conditions on the Internet, consider injecting latency between the client and the server.
- If possible, benchmark both single and multiple CPUs, and single and multiple Ethernet interfaces. Be aware that the networking performance of version 2.2.x of the Linux kernel does not scale well as you add more CPUs and Ethernet cards. This is mostly true of static pages and cached dynamic pages. Noncached dynamic pages take a lot of CPU time and should scale well with additional CPUs. Where possible, use a cache to save frequently generated dynamic pages; this speeds up dynamic page delivery.
- When testing dynamic content: don't use the old model of running a separate process for each request; nobody running a big web site uses that interface anymore, as it's too slow. Use a modern interface for dynamic content generation (e.g. Apache mod_perl).

Configuring Linux
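The /proc settings quoted earlier in this article can be collected in one place so that they are applied the same way before each benchmark run. The Python sketch below is one hedged way to do that. The paths and values are exactly those quoted above for 2.2-era kernels and may not exist on other kernels; the helper names are hypothetical, and by default the sketch only prints the equivalent echo commands rather than writing anything.

```python
# Settings quoted in this article (2.2-era kernels); adjust for your system.
PROC_SETTINGS = {
    "/proc/sys/fs/file-max": "32768",
    "/proc/sys/fs/inode-max": "65536",
    "/proc/sys/net/ipv4/ip_local_port_range": "1024 65535",
}


def render_commands(settings):
    """Return the equivalent shell commands, one per setting."""
    return ["echo %s > %s" % (value, path) for path, value in settings.items()]


def apply_settings(settings, dry_run=True):
    """Write each value to its /proc file (needs root); dry_run only prints."""
    for command, (path, value) in zip(render_commands(settings), settings.items()):
        if dry_run:
            print(command)
        else:
            with open(path, "w") as proc_file:
                proc_file.write(value + "\n")


if __name__ == "__main__":
    apply_settings(PROC_SETTINGS, dry_run=True)
```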



Tuning problems probably accounted for less than a 20% performance decrease in Mindcraft's test, so as of 3 October 1999, most people will be happy with a stock 2.2.13 kernel or whatever comes with Red Hat 6.1. The 2.4 kernel will help with SMP performance when it becomes available.

Here are some notes if you want to see what people going for the utmost were trying in June:

- As of June 1, Linux kernel 2.2.9 plus 2.2.9_andrea3 has been mentioned as performing well on a dual-processor task (see above). (2.2.9_andrea3 seems to include both a wake-one scheduler fix and an SMP unlock_kernel fix.) andrea3 works only on x86; PPCs and Alphas will need to apply another wake-one or tcp copy kernel_unlock fix. Jan Gruber writes: "the 2.2.9_andrea3 patch doesn't work with SMP support disabled. Andrea told me to use ftp://ftp.suse.com/pub/people/andrea/kernel-patches/2.2.9_andrea-VM4.gz instead."
- On 7 June, Andrea Arcangeli asked: If you are going to do bench I would like if you would bench also the patch below. ftp://e-mind.com/pub/andrea/kernel-patches/2.2.9_andrea-perf1.gz
- On 11 Oct 1999, Andrea Arcangeli posted his list of pending 2.2.x patches, waiting to go into 2.2.13 or so. These patches might improve performance of SMP systems and of systems subjected to heavy I/O. They might be worth considering if you encounter bottlenecks.
- The truly daring may wish to try using the kernel-mode http server, khttpd, as a front-end for Apache. It speeds up static web page fetches tremendously. It's at version 0.1, so use caution.
- linux-kernel (week 1 and week 2) is currently (8 June 1999) discussing Apache benchmarking. Linus Torvalds is in principle bullish on using khttpd or something like it, and points out that NT is doing the same kind of thing.

Configuring Apache



- The usual optimizations should be applied (all unused modules should be left out when compiling, host name lookup should be disabled, and symbolic links should be followed; see http://www.apache.org/docs/misc/perf-tuning.html).
- Apache should be compiled to block in accept, e.g. env CFLAGS='-DSINGLE_LISTEN_UNSERIALIZED_ACCEPT' ./configure
- The http://www.arctic.org/~dgaudet/apache/1.3/top_fuel.patch may be worth applying. PC Week used top_fuel in their latest benchmarks. (See also Dean Gaudet's interesting comments in new-httpd and linux-kernel.) According to top_fuel.patch, using mod_mmap_static for a set of documents can reduce request times by 18 to 9.
- For static file benchmarks, try compiling mod_mmap_static into Apache (see http://www.apache.org/docs/mod/mod_mmap_static.html) and configuring Apache to memory-map the static documents, e.g. by generating a config file that lists every file under /www/htdocs, each prefixed with the mmapfile directive.
- According to several people, Squid could be used as a front-end for Apache to speed up static web page fetches.
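The mmapfile config file mentioned in the list above can be generated with a short script. This is a minimal Python sketch, assuming the document root is /www/htdocs as in the text; the output path /www/conf/mmap.conf and the function name are assumptions made for illustration. It walks the tree and emits one mmapfile line per regular file, skipping symbolic links.

```python
import os


def mmapfile_directives(doc_root):
    """Return one 'mmapfile <path>' line per regular file under doc_root."""
    lines = []
    for dirpath, dirnames, filenames in os.walk(doc_root):
        dirnames.sort()  # visit subdirectories in a stable order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            # Regular files only; symbolic links are skipped.
            if os.path.isfile(path) and not os.path.islink(path):
                lines.append("mmapfile " + path)
    return lines


if __name__ == "__main__":
    doc_root = "/www/htdocs"            # from the text above
    output_path = "/www/conf/mmap.conf"  # assumed location
    if os.path.isdir(doc_root):
        with open(output_path, "w") as conf:
            conf.write("\n".join(mmapfile_directives(doc_root)) + "\n")
```

The generated file would then be pulled into the main server configuration; it needs regenerating, and Apache restarting, whenever the set of static documents changes.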

Similar reading


- Some Usenet posts show slow Apache or Linux performance:
  - "Apache may not be as fast as people think?", 1999/04/05, comp.infosystems.www.servers.unix: "...when we run WebBench to test the requests/sec and total throughput, Microsoft IIS 4 is 3 times faster for both Linux and Mac OS X."
  - "Re: Apache vs IIS 4: IIS 4 3 times faster", 1999/04/02, comp.infosystems.www.servers.unix: "Why are you surprised? It was well-known that Apache is slow. I haven't tested IIS but I did compare Apache to a few other servers last year. I found some that were three to four times faster."
- Methods to profile the kernel:
  - Kernel Spinlock Metering for Linux IA32 -- tools to measure SMP spinlock contention. Also, see the test results comparing version 2.2 to version 2.3, and an example of someone using spinlock measurement to find and fix kernel bottlenecks in 2.3.19.
  - Andrea Arcangeli's ikd patch, with gprof-style kernel profiling (original announcement)
  - Ingo Molnar's ktracer -- for version 2.1.x. Example of ktracer output.
  - Christoph Lameter's perfstat patch, at Captech's Linux Performance, Stability and Scalability Project. Also, see their 25 Oct 99 post about linuxperf.
- How to profile user programs:
  - The old favourite: compile with -pg and analyze the output with gprof.
  - Mikael Pettersson's x86 performance-monitoring counters patch. Supports 2.3.22/2.2.13. List of related tools.
  - Using hardware performance counters with Linux, by David Mentre
  - PCL - The Performance Counter Library -- supports many architectures.
  - Stephan Meyer's MSR patch -- only supports up to 2.2.6. No longer actively developed.
  - Richard Gooch's MSR and PTC patches -- only supports 2.2. Requires devfs.
- A few linux-kernel posts:
  - "2.2.5 optimizations for web benchmarks?", 16 Apr 1999 -- Karthik Prabhakar, who is about to do serious SPECWeb96 benchmarking, asks some important questions. The followups are very interesting.
  - "Re: 2.2.5 optimizations for web benchmarks?", 16 Apr 1999 -- Dean Gaudet's response. An Apache insider offers some interesting insights.
  - "[patch] New Scheduler", 9 May 1999 -- Rik van Riel started the thread about possible scheduler changes.
- The smbtorture benchmark lets you test an SMB server like the big boys
- Rik van Riel's Linux Performance Tuning site
- The Linux Scalability Project
- The C10K Problem - Why can't Johnny service 10000 clients?
- Banga and Druschel's paper on web server benchmarking
- Linus's "State of Linux" talk at Usenix '99, where he talks about the Mindcraft benchmark and SMP scalability.
- My NT vs. Linux Server Benchmark Graphs page
- A post on comp.unix.bsd.freebsd.misc from June '99 which mentions that FreeBSD has SMP scaling properties similar to Linux's on tests like those run by Mindcraft.
- Mike Abbott from SGI has posted performance patches for Apache 1.3.9.