I've just completed phase 2 of this project and it now works just like we originally envisaged. Managed code can be decorated with custom attributes and the XLL uses reflection to work out which classes within an assembly are being exposed to Excel. It then looks for [WorksheetFunction] attributes and exposes individual functions to Excel whilst working out all of the marshalling requirements itself. Additional attributes allow you to specify that some parameters are optional or have default values. More attributes allow for the definition of custom Excel menus and the provision of the code behind them. By annotating a suitable constructor you can have an IManageAddins interface passed into your managed class when it's constructed and this lets you call back into the XLL framework and/or Excel to do things like enable and disable menu items, etc. Several custom types allow us to pass ranges of Excel cells into and out of managed code, and our standard marshalling can deal with converting Excel ranges that contain consistent types to arrays of native managed types. Volatile and command equivalent worksheet functions are supported and, all in all, it all seems to work pretty nicely.
When the XLL is loaded it reads its configuration from a config file and loads the specified assemblies, parses them for attributes, builds and registers the appropriate data structures and wires up the XLL entry points to the appropriate marshaller and from there into managed code.
The next phase is likely to be the inclusion of a 'results object' which can be used to store and manipulate returned ranges by name. At this point I'll have something similar to the system I wrote in C++ back in 2001, but this time all of the complex stuff just works and the developers can settle back and get on with writing the business logic.
]]>I started out by writing the data feed. This was a simplified version of the echo server test harness that I'd extended to use a controllable TCP receive window. The data feed is just a client that generates random data in packets that have a simple header (length, sequence number, type) and sends them to the server that it's connected to. It doesn't get anything sent back and it uses the various options that I discussed last time to enable it to push data as fast as it can without using all your machine resources.
Next was a server. This was pretty easy to put together with the framework. The server listens on two ports, one for data feeds and one for feed clients. Data feeds connect and send data that needs to be distributed. Feed clients connect and receive any data that arrives at the server. The servers listening on the two ports are connected together by a "distributor" (the only piece of custom code in the whole server). The distributor maintains a collection of feed clients and allows the data feeds server to send all data that arrives to all clients. Since, at present, this example is a simple broadcast of all of the inbound data to all of the outbound clients we use the CBufferHandle class that was developed for use in an auction server and that allows you to send the same buffer to multiple connections. The framework's data buffers contain all of the I/O Completion port housekeeping data that they require as well as the data and this means that you can't simultaneously send a single buffer to multiple connections. The CBufferHandle class allows you to attach multiple sets of 'housekeeping data' to a buffer so that you can safely send it on multiple connections. The key point about this is that if you're sending the same data to each connection then you don't need to duplicate it.
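To make the "don't duplicate the data" point concrete, here is a minimal sketch of the idea rather than the real CBufferHandle. It assumes the payload can be held in a `std::shared_ptr` and that each send carries its own small housekeeping record; `SharedPayload`, `SendHandle` and `Broadcast` are names invented for this illustration and are not part of the framework.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a framework data buffer: the payload is shared
// between all sends and is never copied per connection.
using SharedPayload = std::shared_ptr<const std::string>;

// Hypothetical per-send 'housekeeping data'. In the real framework this is
// the I/O completion port bookkeeping; here it is just a couple of fields.
struct SendHandle
{
    SharedPayload payload;     // reference to the one shared copy of the data
    int connectionId;          // which connection this send belongs to
    std::size_t bytesSent;     // per-connection progress
};

// Creates one housekeeping record per connection, all referring to the
// same payload.
std::vector<SendHandle> Broadcast(
    const SharedPayload &payload,
    const std::vector<int> &connectionIds)
{
    assert(payload);

    std::vector<SendHandle> handles;
    handles.reserve(connectionIds.size());

    for (int id : connectionIds)
    {
        handles.push_back(SendHandle{payload, id, 0u});
    }

    return handles;
}
```

Each connection gets its own `SendHandle`, but the bytes themselves exist once and are released when the last handle goes away.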
Finally the Feed Client. Again this is just another hacked around version of the standard echo server test client. This can make multiple connections to a server and simply accepts data.
With these pieces in place I can run up the server, add several feed clients and then run up one or more data feeds, and watch. The server behaves as expected and performance is pretty good, though I'd like to expose some performance counters so that it's easier to monitor what's going on. With everything on a 1Gb LAN things are good, but, as expected, with the data feed on a machine with a 1Gb connection to the server and a feed client on a machine with a 100Mb connection the client's connection gets swamped and the server uses an unrestrained amount of non-paged pool trying to send everything it can.
So, the next steps are to take the write completion driven flow control that's used in the Data Feed and finally implement a connection filter based version that can be used in the server on its Feed Client connections. This will need to keep track of the number of outstanding write completions and buffer a configurable number of writes when it is unable to send. Then, when it can send, it should send from the buffered data first and then resume sending directly. If it gets to the point where it has buffered a configurable 'too much' then it should allow the user to configure a policy for dealing with the situation (throw away new writes or throw away buffered writes...). Once that's done some performance counters on the server to clearly show the data flow and we're almost there.
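The buffering and overflow policy described above might look something like the following. This is only a sketch under stated assumptions, not the planned connection filter: `PendingWriteBuffer` and `OverflowPolicy` are invented names, writes are represented as strings, "throw away buffered writes" is interpreted as discarding the oldest buffered write, and there is no locking.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <utility>

// Hypothetical policy names for the 'too much buffered' situation.
enum class OverflowPolicy
{
    DropNewWrites,       // keep what is buffered, discard the incoming write
    DropBufferedWrites   // discard the oldest buffered write to make room
};

// Hypothetical buffer for writes that cannot be sent right now.
class PendingWriteBuffer
{
public:
    PendingWriteBuffer(std::size_t maxBuffered, OverflowPolicy policy)
        : m_maxBuffered(maxBuffered), m_policy(policy)
    {
        assert(maxBuffered > 0);
    }

    // Buffers a write. Returns false if some data had to be thrown away.
    bool Add(std::string data)
    {
        if (m_writes.size() < m_maxBuffered)
        {
            m_writes.push_back(std::move(data));
            return true;
        }

        if (m_policy == OverflowPolicy::DropBufferedWrites)
        {
            m_writes.pop_front();
            m_writes.push_back(std::move(data));
        }

        return false;
    }

    // When sending is possible again, buffered data goes out first, in order.
    bool TakeNext(std::string &data)
    {
        if (m_writes.empty())
        {
            return false;
        }

        data = std::move(m_writes.front());
        m_writes.pop_front();
        return true;
    }

    std::size_t Size() const { return m_writes.size(); }

private:
    const std::size_t m_maxBuffered;
    const OverflowPolicy m_policy;
    std::deque<std::string> m_writes;
};
```

Once `TakeNext()` returns false the buffer is drained and the connection can go back to sending directly.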
]]>The main problem I have is that I get this with text files where I've never done anything 'special' to create "extended attributes" (whatever they are). The solution which doesn't involve replacing my NAS with something that is running Windows ;) seems to be to copy the data out of the file and recreate the file from notepad... I'd rather the error dialog had a "strip the extended attributes from the copied file" option, better still if it had some way of showing what these attributes were so that I might be able to decide if I wanted to keep them...
As it is it seems that something is going around and adding "extended attributes" to my files without asking me and the result is that they're difficult to move around my network...
]]>Unfortunately there's a whole area that I haven't previously paid a great deal of attention to (mainly because I didn't have many clients that had these kinds of problems); by default the framework doesn't do high data flow, low number of connections servers especially well. That's a bit of an exaggeration, but you do have to hack at the framework code a little to make the simple changes that are required to get decent throughput.
The problem is that as of version 5.2.1 the framework doesn't provide an easy way to set a connection's TCP receive window and although it's easy enough to hack in a change to the framework it should be something that's made more accessible. Of course, the reason for this is that my test servers don't need or use this...
I currently have several clients that are doing high data flow servers. They're writing market data distribution servers, automatic execution engines and other trading related stuff. One of them has some performance questions and to be able to answer his questions I first need to have a similar demo server architecture that I can use to experiment on. Step one was adjusting the test client to be able to send lots and lots of data on a connection very quickly. The test client can do 'lots of connections' and 'x data every y ms' but couldn't easily do 'as much data as possible in the shortest time'.
One of the problems with a simplistic IO Completion Port design for sending data is that you can keep pushing data into the TCP stack long after the stack can keep sending to the peer. Unless you keep track of how many of your writes are outstanding you could easily just chew up all of the resources on your machine and queue vast amounts of data for transmission. A connection filter based solution to this problem has been planned for the 5.3 release for some time but other work has got in the way. So, to understand the problems I wrote a send-data flow control system into the test harness directly with the intention of moving it out into a connection filter once I had it working correctly.
Doing this kind of flow control involves the following; keeping track of the number of overlapped writes that you issue, keeping track of the number of overlapped writes that complete, stopping issuing writes when you have more than X outstanding writes, restarting the data flow when the number of outstanding writes drops below a certain limit. This allows you to keep the amount of data that is currently waiting to be sent by the TCP stack to a manageable amount and helps to restrict your use of non-paged pool. If you don't do this (and just issue writes until you have no more data to send) then you can use an arbitrary amount of non-paged pool and since that's a finite resource this kind of behaviour can cause unexpected server failures.
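The bookkeeping in that list can be sketched as a small counter with two thresholds. This is an illustration only, not the code in the test harness: `WriteFlowControl` and its method names are invented here, and the sketch is single threaded, whereas real completions arrive on I/O threads and the counter would need to be protected accordingly.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: tracks overlapped writes that have been issued but
// have not yet completed, and says when to stop and restart the data flow.
class WriteFlowControl
{
public:
    WriteFlowControl(std::size_t stopAt, std::size_t resumeBelow)
        : m_stopAt(stopAt), m_resumeBelow(resumeBelow),
          m_outstanding(0), m_paused(false)
    {
        assert(resumeBelow <= stopAt);
    }

    // Call before issuing a write. Returns false if the caller should hold
    // off because too many writes are already outstanding.
    bool TryBeginWrite()
    {
        if (m_paused)
        {
            return false;
        }

        if (++m_outstanding >= m_stopAt)
        {
            m_paused = true;
        }

        return true;
    }

    // Call from the write completion handler. Returns true when enough
    // writes have completed that the data flow should be restarted.
    bool OnWriteCompleted()
    {
        assert(m_outstanding > 0);

        --m_outstanding;

        if (m_paused && m_outstanding < m_resumeBelow)
        {
            m_paused = false;
            return true;
        }

        return false;
    }

    std::size_t Outstanding() const { return m_outstanding; }

private:
    const std::size_t m_stopAt;
    const std::size_t m_resumeBelow;
    std::size_t m_outstanding;
    bool m_paused;
};
```

Using separate stop and resume thresholds avoids flapping between paused and running on every single completion.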
So, the first thing I did was run the existing test harness with some settings that caused it to send as much data as it could and I watched it chew up resources until it crashed. What I also saw, with dismay, was that it completely failed to stress the echo server that it was testing. Looking at network utilisation showed that the test harness, although sending data as fast as a tight loop could, was actually squirting very small amounts of data onto the wire in intermittent bursts. Not what I was after at all; but not too surprising when you think about how TCP throttles data flow...
Step two was to prevent the test harness from crashing by adding the flow control based on outstanding writes. Of course this didn't change the amount of data flowing but it did make the test more reliable; it now reliably ran to completion even if I increased the number of connections.
Step three was to fire up WireShark and take a look at what was actually happening on the wire; though I had a fairly good idea of what I would see. The network trace clearly showed the TCP stacks regulating the data flow using the TCP receive window. Lots of TCP ZeroWindow situations as the TCP stack on the server told the TCP stack on the client to stop sending because its buffers were full.
Adjusting the server to use a larger receive window on its connections was fairly straightforward; I added a new callback to the framework that was called straight after each socket was created and which was passed the socket. This can now be implemented in a client's server code to adjust the socket in any way they want prior to connection establishment. I'm actually going to add specific send and receive buffer setting code as it's likely to be used reasonably regularly but the generic callback can be used to set any other stuff that users might want and that might not currently be included in the framework.
Once the server and client could be set up with decent sized TCP receive windows and the client could be configured to send data at a controlled maximum rate it was possible to start doing some experiments. It seems that you don't need a great many pending writes to keep the network busy; just keep pushing data in until the write completions slow right down and then switch to only pushing more data in as the pending writes complete. You then tend to use around 110% of your TCP receive window in non-paged pool and you get a network connection full of data.
Next on the list is writing a more focused server and clients. I need a data feed which pushes lots of data into the server and several clients that can subscribe to the data. I expect the auction server work that I did a while back will come in handy here; basically we'll send out data to many clients without copying the buffer that it's in for each client. Once I've got that working I can start to look at performance issues... Though right now things look pretty good...
]]>This new server is based on the server that was developed for PayPoint in 2006 for their Littlewoods football pools games. It's the fourth ISO 8583 transaction server that we've built for them and it's actually built on quite an old version of the server framework. The project didn't have the scope to upgrade to the new framework version so several small fixes have been back ported to allow for easier operation on Windows Server 2003 (the original code was written with a Windows NT 4 target!). The move to Vista/Server 2008 would require some more changes as the server exposes performance counters and the code that I used to install these doesn't work with Vista (see here); updating the code to use our latest Perfmon Tools library would solve that though.
The original server design has stood the test of time well. During message processing there are limited reasons to lock shared data and limited need to allocate memory as the response message is built into the message buffer that was used for the original request message. The performance (in messages per second) that we achieved back in 2002 was a surprise to the client and the new server is even better given the improvements in hardware.
I'd forgotten how well the development of these ISO 8583 servers fits with TDD and unit testing in general. The server operates on a very formal spec which essentially boils down to "given this message as input, do this to the database, this with MSMQ, log things here and respond with this response" for a set of input messages, and this makes it ideal for interaction based mocks and unit tests. Simply read the spec, write tests for each input message and response combination and write the code. If you structure things right then all of the message processing can be tested in isolation from your database layer which can then be unit tested using some setup and tear-down scripts and some known data scenarios. Once all that's done a "black box" server test can throw a selection of messages at the server on multiple concurrent connections and give the whole thing a nice thrash.
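As a rough illustration of the "given this message, do this to the database and respond with this" style of test, here is a minimal sketch. Everything in it is invented for the example: the `IDatabase` interface, the `Request` structure, the message types and the response strings are not taken from the real server or from ISO 8583.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical database interface that the message processing depends on.
class IDatabase
{
public:
    virtual ~IDatabase() {}
    virtual void RecordSale(const std::string &account, int amount) = 0;
};

// Interaction based mock: it does no real work, it just logs each call so
// that a test can check exactly what the code under test asked for.
class MockDatabase : public IDatabase
{
public:
    void RecordSale(const std::string &account, int amount) override
    {
        calls.push_back(
            "RecordSale(" + account + ", " + std::to_string(amount) + ")");
    }

    std::vector<std::string> calls;
};

// Hypothetical, much simplified request message.
struct Request
{
    std::string type;
    std::string account;
    int amount;
};

// Hypothetical message processor: one input message in, one response out,
// with any database work going through the interface.
std::string ProcessMessage(const Request &request, IDatabase &database)
{
    if (request.type == "SALE")
    {
        database.RecordSale(request.account, request.amount);
        return "SALE_OK";
    }

    return "REJECTED";
}
```

A test then feeds in one message, checks the response, and compares the mock's call log with what the spec says should have happened.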
Of course TDD doesn't guarantee that you've covered everything, but it does help to give a warm feeling as you go into integration testing. If nothing else, when problems are discovered you know that your scope to break things is limited by the tests that you have in place.
]]>It seems that my plan to "stick a breakpoint in mscoree.dll's _CorExeMain()" wasn't such a good idea after all. With the new updates installed and with an x64 process using the Win32 debug API to run a CLR 1.0 or 1.1 app (x86) the breakpoint in mscoree.dll's _CorExeMain() never gets hit. Luckily switching to sticking a breakpoint in mscorwks.dll's _CorExeMain() instead seems to work on both x86 and x64, running either x86 or x64 (where appropriate) CLR apps and on both a clean install of Vista and on patched systems. So, now if the app we're launching is a CLR app we ignore the start address entirely and use mscorwks.dll's _CorExeMain() as our start address. This seems to give a reliable way to halt a CLR app after it has started up, when it's in a stable state, and before it starts to do anything. Just what I need to inject my code.
]]>In addition to the blog postings JP has produced cfix, a unit testing framework for C++. I haven't had a chance to look at it too deeply yet, but the documentation looks good and the source is available from SourceForge under the GPL.
]]>When running a CLR app under the Win32 debug interface you only ever seem to hit the native entry point if you're running under WOW64. In all other situations you don't hit the native entry point ever. If you rely on it to pause your debug tools once the process is completely loaded and ready to roll then you need to stick a break point in _CorExeMain in mscoree.dll. What's more, if you're on x64 you might not even be able to access the native entry point's memory...
Well, that seems to have changed... Upon running up my "Debug Tools" test harness a couple of days ago I found I had some test failures when launching CLR 1.0 apps for debug from a Win32 debugger running on an x64 system. On my system only CLR 2.0 apps run as native x64, so, in effect the Win32 debugger was launching a Win32 CLR application whilst running under the WOW64 layer. The behaviour now seems to be identical to running a Win32 CLR application from a Win32 debugger on an x86 system; which, I suppose, is good. The downside is that I've no idea when this change was rolled out and I now have no sure fire way of building a VM box with the old style behaviour to see if I can write some code that works with both fixed and unfixed CLR start up semantics. I guess I can try a clean install of Vista x64...
]]>When you have an active TCP/IP connection that you wish to terminate cleanly you need to initiate a TCP/IP protocol level shutdown sequence by calling shutdown(). This sends the appropriate packets between the two TCP/IP stacks (server and client) and terminates the connection. Once this is done you can close the socket by calling closesocket(); this cleans up the resources used by the socket (and associated data structures) within your program. Closing the socket without initiating the protocol level shutdown sequence implicitly triggers the shutdown sequence. This is explained here "Graceful shutdown, linger options, and socket closure".
Simple servers written using our server framework tend to operate as follows: When an incoming connection is detected an asynchronous read is issued, this increments a reference count on our socket class. When a read completes the last thing that happens before the function returns to the calling code within the framework is that a new read is issued. If the client closes the connection the pending read within the server returns with 0 bytes read, this is interpreted as a 'client close' and no further reads can be issued on the socket. This, eventually, causes the reference count on the socket to fall to 0 and the socket gets cleaned up. Part of that clean up involves calling closesocket(). If the server wants to terminate the connection then it calls Shutdown(ShutdownSend) on its socket to indicate to the client that it has no more data to send and this eventually results in the client shutting down its socket and the server socket cleanup sequence that I described earlier.
Due to the way the server is designed, there's some 'clever stuff' in there to make sure that if you have several writes pending but not yet issued by the framework then the call to shutdown() occurs after the last write has actually been passed off to the TCP/IP stack.
The socket class also exposes a Close() method which calls closesocket() on the socket directly; that is it doesn't do 'clever stuff' to deal with outbound data that is 'in flight' within the framework. You probably don't want to call Close() unless you don't care if the data gets to the other end or not; or if you know that there's no data 'in flight'.
It gets more complex...
Due either to my misreading of the docs for closesocket() or to the fact that they were originally less clear and have since been clarified, it was my belief that a graceful shutdown using closesocket() would block. Since one of the most important design decisions of the framework is that work done on the I/O threads should not block, the default behaviour for the automatic socket closure that happens when a socket is being cleaned up is for the close to be a 'hard' or 'abortive' close. That is, we deliberately choose not to linger. Because this isn't always what you want (no kidding!) there's some code in there that allows a user to intercept the default behaviour and, potentially, call CloseSocket() themselves or marshal the CloseSocket() call off to their own threads so that it could block them instead. However, graceful shutdowns that occur due to closesocket() do not block, so, it seems, most of that code isn't really needed...
Some of the example servers in the 5.2.1 release get this wrong. It doesn't cause them to lose data, since they're not doing anything that complex, but a more complex server that has been modelled on one of the examples may have problems. If you've been having this problem then I'm sure you'd have contacted me already, but, if not, do get in touch and I'll help sort things out for you.
So, in summary, at present, in version 5.2.1 of the framework or before, you should generally be calling Shutdown() to terminate your connections and the framework will deal with the resource cleanup and eventual call of closesocket() itself. You can call Close() but you shouldn't do that unless you KNOW that there cannot be any data 'in flight' that the server has sent but that the client might not have received, OR you don't care if the data gets to the client.
This will become nicer in 5.3, I hope. I plan to make "standard" connection termination easier to manage and provide access to the, currently private, AbortiveClose() method on the socket class; this sets the socket's linger options in such a way that the socket is closed immediately and all pending data is discarded. What's especially useful is that this also sends a RST (reset) on the TCP/IP connection and this closes the connection without putting the closer into the TIME_WAIT state; which is useful sometimes.
Of course this didn't seem right. I sent myself a test email and that worked. I checked the webmail interface and the mailbox was really empty. I bothered the guy who runs my mail hosting via messenger and he explained that he'd changed the smtp server last night. He now uses qpsmtpd and it has a plugin that checks emails for known spam urls and filters these spam messages out.
I'm still not convinced... So far most of the legitimate email that I should get on a daily basis is arriving OK; newsgroup notifications, NAS alerts, etc but one of my NAS devices doesn't seem to be getting through... And if that's not working, who else is having problems?
Overall the lack of spam is nice, if a little weird and ever so slightly retro. Assuming it is actually working correctly then I think it's a great improvement. However, I can't help feeling slightly cut off from the 'heart beat' of the internet.
If anyone sends me an email to my jetbyte account and doesn't get a reply then you could try sending to my gmail account, which is the obvious address, or leave a comment here... Fingers crossed you won't need to...
]]>What's the best way of dealing with this kind of problem?
]]>If your comment is refused you should get a message telling you why; the reason is logged, but, unfortunately the full comment text isn't. The best approach if you have a legitimate comment that you can't post is to either email me, or leave a simple comment that explains that you can't comment ;) (I know...) Anyway, if you do that then I can remove the offending, over zealous, blacklist entry and post your comment for you...
]]>The poster laments the fact that if you're doing TDD then the test fails first and then you write the code and then it works and therefore you know the test is testing the correct thing but if you have existing code then, well, it doesn't work that way. It only doesn't work that way if you're being lazy.
The example given is that an already developed component that has tests is now made multi-threaded. The poster decided that simply running all the existing tests in parallel would test the component for thread-safety. And was then surprised to find that it didn't due to how the tests all tested their own, isolated, instance of the component...
Hmm. Personally I feel that if you're trying to write a test to prove that something is thread safe then you need to write a test that deliberately puts the thing under test in a situation where the lack of thread safety shows itself. You don't just assume that running existing tests together and having them work will somehow prove something... Writing tests for multi-threaded code is hard. You need to think about it. Abdicating thinking and then somehow pushing this failure back onto the tests themselves is, er, rather crap.
As I've said before, the tests act as scaffolding for the code and the code acts as tests for the tests. If either is wrong or is changed so that previously held beliefs are no longer true then the tests fail. You don't write tests for tests you write tests for code and if either disagrees with what should happen then the test fails. It's like aircraft having multiple redundant systems, they should agree, or there's a fault.
The original poster's problem is that he didn't actually bother to write a test for the situation that he wanted to test. I think he should be asking "who tests the tester"...
]]>My variation on this idea is that it all tends to be in one process. Work items are passed from one 'processor' to another via queues and each processor can run multiple threads to process multiple work items in parallel. In simple systems you end up with a "pipeline" and work items flow from one end to another; more complex systems may be modelled as networks of processors. You can tune the system by adjusting the number of threads in each processor's thread pool and can also do things like having different processors run at different thread priorities (if you really want to). Since a work item is only ever being accessed by a single processor at a time, the data in the work item doesn't need any locking. If a processor needs to access data which can be shared (either by instances of a processor or by different processors) then normal locking is required but the situations where locking IS needed are greatly reduced.
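The queue that sits between two processors could be sketched like this. It is an assumption-laden illustration rather than my actual code: `WorkQueue` is an invented name, and it uses the standard library's mutex and condition variable where the real thing would more likely use Windows primitives. Each worker thread in a processor would loop on `Pop()`, do its work on the item without any locking of the item itself, and then `Push()` it onto the next processor's queue.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <utility>

// Hypothetical queue connecting one 'processor' to the next. Only the queue
// is locked; a work item is owned by one processor at a time.
template <typename WorkItem>
class WorkQueue
{
public:
    WorkQueue() : m_closed(false) {}

    // Hands a work item on to whichever processor reads from this queue.
    void Push(WorkItem item)
    {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            assert(!m_closed);
            m_items.push_back(std::move(item));
        }
        m_ready.notify_one();
    }

    // Blocks until an item is available or the queue has been closed.
    // Returns false only when the queue is closed and fully drained.
    bool Pop(WorkItem &item)
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_ready.wait(lock, [this] { return m_closed || !m_items.empty(); });

        if (m_items.empty())
        {
            return false;
        }

        item = std::move(m_items.front());
        m_items.pop_front();
        return true;
    }

    // Tells waiting worker threads that no more work will arrive.
    void Close()
    {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_closed = true;
        }
        m_ready.notify_all();
    }

private:
    std::mutex m_mutex;
    std::condition_variable m_ready;
    std::deque<WorkItem> m_items;
    bool m_closed;
};
```

Tuning then comes down to how many threads call `Pop()` on each queue.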
I find it interesting that the Dr. Dobbs article points out that 'careful measurement is required'. I agree, this is one of those situations where it's vitally important to include performance monitoring (via perfmon counters?) from the outset. Unless you can see how many threads are active at each stage in the pipeline and how many work items are in each of the queues then you simply cannot tune the system in a meaningful manner.
]]>The suggestion is, essentially, to use a timer with a longer range before roll-over rather than GetTickCount() with its 49.7 day roll-over. In Vista and later we could just use GetTickCount64() but on earlier platforms that's not available to us. My commenter's solution was to build a GetTickCount64() on top of GetTickCount() and use that. Given that adjusting the code for Vista support via the real GetTickCount64() was on my list of things to do, I decided to also take a look at the potential of the hybrid approach suggested by my commenter.
Switching to using a greater range means that we can remove much of the complexity which was there to protect us from the rollover as this will now only occur after around 584942417.4 years of machine up-time rather than after 49.7 days...
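For anyone who wants to check the arithmetic, the two roll-over figures can be reproduced as follows. The function names are invented for this sketch, and the years figure assumes 365-day years, which is what gives the value quoted above.

```cpp
#include <cassert>
#include <cmath>

const double kMillisecondsPerDay = 1000.0 * 60.0 * 60.0 * 24.0;

// A 32-bit millisecond counter wraps after 2^32 ms.
double DaysUntil32BitRollover()
{
    return std::ldexp(1.0, 32) / kMillisecondsPerDay;
}

// A 64-bit millisecond counter wraps after 2^64 ms; 365-day years assumed.
double YearsUntil64BitRollover()
{
    return std::ldexp(1.0, 64) / kMillisecondsPerDay / 365.0;
}
```

These come out at roughly 49.7 days and roughly 584942417.4 years respectively.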
In the zip file that accompanies this article there are two timer queues under test. The first, CCallbackTimerQueue uses a hybrid GetTickCount64() implementation that will work on any platform as it uses GetTickCount() to do the work and the timer queue manages the upper 32-bits itself. The second, CCallbackTimerQueueEx, uses the real GetTickCount64() call and will only run on Windows Vista or later platforms. You can build for pre-Vista systems by editing the Admin\TargetWindowsVersion.h header file and adjusting the values for NTDDI_VERSION and _WIN32_WINNT.
The native Vista version of the code is the simplest so I'll discuss that first. There are several additional issues that need to be dealt with if we are building our own GetTickCount64() and these get in the way of the simpler code...
The first thing, of course, is that I had the tests that were written for the previous versions of the code to make it easier for me to make these changes to the internals of the code. I did this before in part 15 when I fiddled around with the internals to make the code more scalable. The presence of the tests makes this kind of change quite fun; I can concentrate on hacking away at the old design and know that if I change some functionality that is covered by my tests then I should find out as soon as I run the tests. Looking at the header for CCallbackTimerQueueEx the first thing that you'll notice is that I've removed a couple of constructors; there's now no need to allow the user to tune the maximum timeout allowed. Next you'll see that the actual data structures used for the queue have been simplified; we only need one queue now rather than two and the timers are keyed by ULONGLONG rather than Millisecond (DWORD). There are fewer helper functions and we use an instance of IProvideTickCount64 rather than IProvideTickCount. Looking at the code itself, I've hardcoded the maximum timeout to one less than INFINITE which gives us the whole usable range of a DWORD for timeouts. I don't see any advantage in expanding the length of the timeouts that you can set to be ULONGLONGs as 49.7 days should be long enough for anyone ;) and, if it isn't, the user can set another timer when that one expires and build a longer timeout using the current implementation. Since all of the multiple queue stuff can go, setting timers is now simpler and we can go back to the functionality from part 15 where calling SetTimer() does NOT cause timed out timers to be handled automatically (I was never really comfortable with that change anyway!).
InsertTimer() is simpler as we're only ever dealing with a single timer queue and rather than all the complexity that we had before for dealing with a timer that spans a rollover we can now simply disallow timers that do that; I don't feel too bad about doing this as I think it's reasonable to specify that the code doesn't support setting timers that cross a 584942417.4 years rollover point. GetNextTimeout() is now massively simplified as all it needs to do is look at the timeout value and compare it with now to see if it has expired. And that's it.
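A cut down sketch of the single queue idea is shown below. It is not the code from the zip file: `SingleQueueTimers` is an invented class, the current tick count is passed in as a parameter rather than coming from an IProvideTickCount64 instance, timers are identified by a plain int, and there are no callbacks or locking.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <map>

const std::uint32_t kInfiniteTimeout = 0xFFFFFFFFu;          // same value as INFINITE
const std::uint32_t kMaximumTimeout = kInfiniteTimeout - 1;  // one less than INFINITE

// Hypothetical single timer queue keyed by absolute 64-bit timeout.
class SingleQueueTimers
{
public:
    // Returns false for a timer that would cross the 64-bit rollover point;
    // such timers are simply not supported.
    bool SetTimer(std::uint64_t now, std::uint32_t timeoutMillis, int timerId)
    {
        assert(timeoutMillis <= kMaximumTimeout);

        if (now > std::numeric_limits<std::uint64_t>::max() - timeoutMillis)
        {
            return false;
        }

        m_timers.emplace(now + timeoutMillis, timerId);
        return true;
    }

    // Milliseconds until the next timer is due: 0 if it has already
    // expired, kInfiniteTimeout if no timers are set.
    std::uint32_t GetNextTimeout(std::uint64_t now) const
    {
        if (m_timers.empty())
        {
            return kInfiniteTimeout;
        }

        const std::uint64_t due = m_timers.begin()->first;

        if (due <= now)
        {
            return 0;
        }

        return static_cast<std::uint32_t>(
            std::min<std::uint64_t>(due - now, kMaximumTimeout));
    }

private:
    std::multimap<std::uint64_t, int> m_timers;  // ordered by absolute timeout
};
```

Because the map is ordered by absolute timeout, the first entry is always the next timer to expire.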
CCallbackTimerQueue is more complex, but not massively so. The complexity arises due to how I maintain the high 32-bits of the 64-bit counter. Since the code works in terms of the 32-bit counter value returned by GetTickCount() and we know that this wraps every 49.7 days I figure we can spot the wrap (now is less than the last time we checked) and use the event to increment the high 32-bit counter. The only potential risk is that we don't spot the wrap, that is, we don't call GetTickCount() for 49.7 days and the counter wraps and then becomes more than the last time we called GetTickCount(). To prevent this unlikely situation, the timer queue sets its own internal maintenance timer for the 32-bit counter roll over point. All this timer does is go off and reset itself, but, I think, this is enough to cause GetTickCount() to be called often enough to prevent any problems...
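The wrap spotting can be sketched in a few lines. This is an illustration of the technique rather than the code in CCallbackTimerQueue: `TickCount64Extender` is an invented name and the 32-bit tick value is passed in as a parameter instead of being read from GetTickCount().

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper that extends a wrapping 32-bit millisecond counter to
// 64 bits by counting the wraps itself.
class TickCount64Extender
{
public:
    explicit TickCount64Extender(std::uint32_t initialTicks)
        : m_lastTicks(initialTicks), m_highPart(0)
    {
    }

    // Must be called at least once per wrap period (about 49.7 days),
    // otherwise a wrap can go unnoticed.
    std::uint64_t Update(std::uint32_t nowTicks)
    {
        if (nowTicks < m_lastTicks)
        {
            ++m_highPart;   // 'now' is less than last time: the counter wrapped
        }

        m_lastTicks = nowTicks;

        return (static_cast<std::uint64_t>(m_highPart) << 32) | nowTicks;
    }

private:
    std::uint32_t m_lastTicks;
    std::uint32_t m_highPart;
};
```

The internal maintenance timer exists purely to guarantee that something like `Update()` runs often enough for the "less than last time" check to be reliable.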
The tests need to change a little due to the way that SetTimer() no longer implies HandleTimeouts() and because of the internal maintenance timer that is set upon construction.
The duplication in the code bothers me, so I expect the next instalment will deal with that, and any bugs that people report!
Code is here and the new rules apply.
Note: This release has been rushed, I haven't had a chance to check any of the builds except the VS 2008 and VS 2005 ones. I'll check the rest and fix any problems when I get back from Zermatt.
]]>