The 64000 connection question


I've been spending some time pushing the limits of The Server Framework, my IO Completion Port based socket server framework, to see how many connections my servers can handle and what happens when system resources run out. Earlier postings on the subject are here and here.

This morning I fired up one of my older server boxes and ran the server on that rather than on my dev box. It effortlessly managed 64000 concurrent connections.

Previously I've had problems getting above ~30000 connections. The machines I've been testing on have been my main development box and my laptop (both run Windows XP and both have 1GB of RAM). This morning I ran the server on a Windows 2000 box with 650MB of RAM and the server managed around 64000 concurrent connections before it hit the non-paged pool limit. It looks like it actually hit the limit this time, with 129MB of non-paged pool allocated, rather than just stopping early (as my Windows XP machines seem to do).
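As a rough sanity check on those figures, the average non-paged pool cost per connection can be worked out from the two numbers above. This is only a back-of-the-envelope sketch: the 129MB is the system-wide total, so it includes whatever the OS and drivers were already using, and the result is an upper bound on the per-connection cost. The function name is made up for illustration.

```python
def pool_per_connection_kb(pool_mb: float, connections: int) -> float:
    """Average non-paged pool per connection, in KB.

    pool_mb is the system-wide non-paged pool in use, so the result
    overstates the true per-connection cost (baseline usage is included).
    """
    if connections <= 0:
        raise ValueError("connections must be positive")
    return pool_mb * 1024 / connections


# Figures from the Windows 2000 run: 129MB of pool at ~64000 connections.
print(f"{pool_per_connection_kb(129, 64000):.2f} KB per connection")
```

That works out to roughly 2KB of non-paged pool per connection on that box.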

So, the question now is what's wrong with my Windows XP systems?

7 Comments

What is the result on the Win 2003 Server platform? Are server OS platforms optimized for better performance than XP? How about non-paged resource consumption apart from your service on the XP machines and the Win2k machine respectively?

I haven't yet tested on Win 2003 Server. I'll need to use a virtual machine for that and the tests take much longer in a virtual machine. It's on my list.

I haven't seen it said anywhere that there's a more limited number of network connections available on XP.

Other resources are fine: there's plenty of memory and the server has only 10 threads. I can't remember what the handle count was (I'll check next time I run the tests).

I've now tried with Norton Antivirus uninstalled (no change) and with the XP firewall off (no change). I've used LSPFix to display the list of LSPs installed and there's nothing unusual or non-standard.

Windows Server 2003 lets me have well over 60k connections; amusingly, the NT 4 server image that I was using as one of my clients can create more outbound connections than any of my XP Pro boxes. Anyway, interesting as all this is, I don't really need to explore any more. I know that my framework will allow servers that handle 64,000+ concurrent connections on Windows Server 2003 and that's good enough for me (and most of my clients) right now.

well, think of XP as an improved win 98 :)

A couple of findings from my side that I thought I'd share with you. You might find some of it interesting, and might be able to offer me your feedback too.

1) The NP Pool indicator in Task Manager that shows you per-process NP Pool is way off! E.g. right now I have 707 connections on my server and it's showing 450K NP Pool usage for my server process. The only proper indicator is the NP Pool usage for the whole system on the Performance tab of Task Manager. Note that this information (as well as max NP Pool for the system) can also be got by loading up WinDbg, using it to kernel debug the system, and running the command !vm, which also gives you the number of locked pages currently on the system, a real bonus.

PoolMon can also be used for some NP Pool info gathering.

2) If you note down initial NP Pool usage for the system in Task Manager, run your server, and let a large number of connections be established, what do you see as the delta (change in) NP Pool usage? I'm interested in that figure as a per-socket usage figure. Currently I'm seeing ridiculous amounts, ~20 KB per socket created (not connection established) by the server process. So I'm wondering what I'm doing wrong. (I haven't set SO_SNDBUF to zero yet though.)

3) Right now I'm tracking down an NP Pool leak. On my Win2k3 test box, the NP Pool peak is at 190 MB odd. 20 MB of NP Pool is in use by the system, so around 170 MB is free for my server process. Unfortunately the server leaks it over 9-10 hours and then crashes with a WSAENOBUFS. At that point, checking system-wide NP Pool usage in the kernel debugger shows that you're perilously close to the limit!

Interested in hearing what you think :)
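The measurement described in point 2 and the budget in point 3 boil down to two small calculations, sketched here. The function names and the before/after readings are hypothetical, chosen only to reproduce the ~20 KB per socket figure quoted above; the 190 MB and 20 MB figures are the ones from the comment.

```python
def per_socket_kb(before_mb: float, after_mb: float, sockets: int) -> float:
    """Per-socket NP Pool cost from system-wide readings taken before and
    after the sockets were created (e.g. from Task Manager's Performance tab)."""
    if sockets <= 0:
        raise ValueError("sockets must be positive")
    return (after_mb - before_mb) * 1024 / sockets


def sockets_that_fit(free_pool_mb: float, cost_kb: float) -> int:
    """How many sockets fit in the remaining NP Pool at a given per-socket cost."""
    return int(free_pool_mb * 1024 // cost_kb)


# Hypothetical readings: 20 MB before, 40 MB after creating 1024 sockets.
print(per_socket_kb(20, 40, 1024))      # KB per socket

# Figures from point 3: ~190 MB peak, 20 MB already in use by the system.
free_mb = 190 - 20
print(sockets_that_fit(free_mb, 20))    # sockets at ~20 KB each
```

At ~20 KB per socket, 170 MB of free pool is exhausted by fewer than 9000 sockets, which is why the per-socket figure matters so much.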

A follow-up: I tracked down the source of my NP Pool leak using PoolMon. It led me to a pool tag that five-odd drivers used. One of them belonged to a third-party packet capture app. I had an old version installed, and it had a bug in its driver that led to the NP Pool leak. Once I saw the driver listed, I googled immediately, saw the issue, and uninstalled the app.

Now back to normal :) My server is extremely stable, has shown uptime of 5 days without any resource leaks at all or performance vagaries.

Ultimately, with SNDBUF and RCVBUF set to 0, I'm seeing 6-10K of NP Pool usage per socket.
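For anyone wanting to try the same thing, setting the buffer sizes to zero is a pair of setsockopt calls using the standard SO_SNDBUF and SO_RCVBUF options. The sketch below uses Python's socket module for brevity rather than the C++ Winsock calls a server like the ones discussed here would use, and the helper name is made up. It reads the values back because the operating system decides what it actually applies; some platforms clamp a request for zero to a small minimum instead.

```python
import socket


def zero_socket_buffers(sock: socket.socket) -> tuple[int, int]:
    """Request zero-sized send and receive buffers on a socket and return
    the (SO_SNDBUF, SO_RCVBUF) values the OS reports afterwards."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 0)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 0)
    return (
        sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF),
        sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF),
    )


with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    snd, rcv = zero_socket_buffers(s)
    print("SO_SNDBUF:", snd, "SO_RCVBUF:", rcv)
```

With the stack's own buffering turned off, the application's buffers are used directly for pending I/O, so this is a trade-off to measure rather than a setting to apply blindly.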

That's good news!
