June 22, 2009

SSPI Negotiation; NTLM and Kerberos clients and servers

I've been working on a library that works in a similar way to our SChannel code and allows the use of the Microsoft "Negotiate" Security Support Provider Interface (SSPI) to provide NTLM and Kerberos support (see SPNEGO for more details). Things are going well and, in general, using Negotiate, NTLM or Kerberos is easier than using SChannel. The structure that was originally born to work with OpenSSL and then adapted for SChannel works well with the new security providers. It has also been especially useful to have all of the various high level tests that I developed for the SChannel code, which deal with the mundane issues of negotiation and data flow using the 'async connector' model. This model was originally developed for the OpenSSL code, but I didn't have tests back then. Although the tests all needed to be adapted to work with the new SSPIs, it was relatively painless and having all of that in place already has helped to speed up the development work.

One thing that I found interesting with this new development was that whilst the SChannel SSPI was geared up to handle an unpredictable byte stream and incomplete messages via SEC_E_INCOMPLETE_MESSAGE error code, the Negotiate, NTLM and Kerberos SSPIs were not. I'm guessing that this is mostly down to the fact that SChannel (basically SSL/TLS etc) is, by design, network stream based and has all of the appropriate message framing built in to the messages. That said, SChannel was different to OpenSSL in that with OpenSSL the library provided incomplete data accumulation and buffering and you just kept shoving data in until the OpenSSL library had a complete message or two at which point it pushed some cleartext out at you. With SChannel you're told that there's not enough data to provide a complete message via the SEC_E_INCOMPLETE_MESSAGE error return and you have to buffer this data yourself and add further data to it when it arrives and then retry the operation. With Negotiate, etc, you have to go one step further and provide your own framing of the messages and perform your own accumulation so that you can present complete messages to the SSPIs for processing; they simply fail if you provide incomplete messages. This wasn't a great problem as much of the message accumulation code had been written for the SChannel code. The only difference being that this time I manage the framing as well rather than relying on the SSPI to inform me of incomplete messages by SEC_E_INCOMPLETE_MESSAGE. In general it makes the whole Negotiate/Kerberos/NTLM method more versatile.
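The accumulation and framing described above can be sketched roughly as follows. This is a minimal illustration and not the library's code: the MessageAccumulator class and the wire format (a 4-byte big-endian length followed by the payload) are assumptions made for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: accumulates bytes from the network and hands back
// complete messages. Assumed wire format: a 4-byte big-endian payload length
// followed by the payload (an opaque token or protected message).
class MessageAccumulator
{
public:
    // Append whatever the last read returned; it may be a fragment of a
    // message or may contain several messages.
    void Append(const std::uint8_t *pData, std::size_t length)
    {
        m_buffer.insert(m_buffer.end(), pData, pData + length);
    }

    // Copies the next complete payload into 'message' and returns true, or
    // returns false if more data is needed.
    bool NextMessage(std::vector<std::uint8_t> &message)
    {
        const std::size_t headerSize = 4;

        if (m_buffer.size() < headerSize)
        {
            return false;
        }

        const std::size_t payloadSize =
            (static_cast<std::size_t>(m_buffer[0]) << 24) |
            (static_cast<std::size_t>(m_buffer[1]) << 16) |
            (static_cast<std::size_t>(m_buffer[2]) << 8) |
            static_cast<std::size_t>(m_buffer[3]);

        if (m_buffer.size() - headerSize < payloadSize)
        {
            return false;
        }

        const auto first = m_buffer.begin() + headerSize;
        message.assign(first, first + payloadSize);
        m_buffer.erase(m_buffer.begin(), first + payloadSize);
        return true;
    }

private:
    std::vector<std::uint8_t> m_buffer;
};
```

Only payloads returned by NextMessage() would be handed on to the security provider, so it never sees a partial message.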

The Negotiate servers are also slightly different to the SSL/TLS servers in terms of authentication and message protection. With Negotiate you have two levels of message protection; you can either 'seal', that is encrypt, or 'sign', which just generates a message authentication code so that tampering can be detected. You can decide which method of protection you need on a per message basis so that you can send most messages as signed messages and then send particularly sensitive data encrypted. The 'async connector' model works reasonably well with this; you can specify which method of protection you want to be the default for the connection (none, sign or seal) and then any data sent in the 'usual way' with socket.Write() uses that default. I've then added some functions at the 'socket server' or 'connection manager' level to provide for sending data with another protection level; you pass the socket, the data and the required protection level to the function and it does the right thing for you. At the other end we just do the right thing as the kind of message protection in use is included in the message framing and so the remote end knows what to do with the data. From an authentication point of view SSL gives the ability to verify the server and the option of requiring verification for the client. NTLM provides a guarantee of client authentication for the server but only Kerberos provides authentication of the server for the client.
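One way to carry the protection level in the framing is sketched below. The frame layout (one protection byte, a 4-byte big-endian length, then the payload) and the names Protection, BuildFrame and ReadProtection are hypothetical; the library's real framing isn't described here. The payload is assumed to have already been signed or sealed by the security provider before it is framed.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical protection levels, mirroring 'none', 'sign' and 'seal'.
enum class Protection : std::uint8_t
{
    None = 0,
    Sign = 1,
    Seal = 2
};

// Assumed frame layout: [protection : 1 byte][length : 4 bytes, big-endian][payload].
std::vector<std::uint8_t> BuildFrame(
    Protection protection,
    const std::vector<std::uint8_t> &payload)
{
    const std::size_t length = payload.size();

    std::vector<std::uint8_t> frame;
    frame.reserve(5 + length);
    frame.push_back(static_cast<std::uint8_t>(protection));
    frame.push_back(static_cast<std::uint8_t>((length >> 24) & 0xFF));
    frame.push_back(static_cast<std::uint8_t>((length >> 16) & 0xFF));
    frame.push_back(static_cast<std::uint8_t>((length >> 8) & 0xFF));
    frame.push_back(static_cast<std::uint8_t>(length & 0xFF));
    frame.insert(frame.end(), payload.begin(), payload.end());
    return frame;
}

// The receiver reads the first byte to learn whether it needs to decrypt,
// verify a signature or simply pass the payload through.
// Throws std::out_of_range if the frame is empty.
Protection ReadProtection(const std::vector<std::uint8_t> &frame)
{
    return static_cast<Protection>(frame.at(0));
}
```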

Finally the Negotiate security providers give the server the opportunity to impersonate the authenticated client. You can do this with SChannel if you set the system and certificates up properly but it's much more straightforward with the Negotiate protocols; mainly due to the fact that Windows uses these internally for file server access, etc.

These changes will be included in the 6.1 release which currently has no scheduled release date; they'll be a separately licensable option.


Posted by Len at 09:04 PM | Comments (0) | Categories: Socket Servers

June 19, 2009

Where's the catch(...)

As of the next release of the server framework use of catch(...) handlers at thread boundaries, in 'no throw' functions and in destructors will be configurable.

At present, in v6.0 and earlier of the server framework, we use catch(...) judiciously at thread boundaries in functions that are either marked as 'no throw' (with throw()) or which are 'implicitly no throw' and in some destructors. Generally these handlers let the user of the framework know about the unexpected exception via a callback. Generally there's not a lot you can do about it except log it and hope to find the cause later. I've always felt that this approach was the lesser of two evils. It has definitely led to more robust servers for some clients where the client code has a hard to track down bug that only appears rarely and the server can continue to serve clients with the occasional 'glitch'. The alternative is that the server shuts down which may or may not be better. The catch(...) approach gives you the option, generally.

Some clients have asked that we don't do this. They prefer to let unhandled exceptions bring the server down and then have a post mortem debugger deal with it. As of the next release of the framework you can configure when and where catch(...) are used with several configuration options in your Config.h file.

Actually implementing this change was reasonably straightforward; essentially it means converting code like this:


try
{
    doThing();
}
catch(const CException &e)
{
    // tell someone...
}
catch(...)
{
    // tell someone...
}

To code like this:

try
{
    doThing();
}
catch(const CException &e)
{
    // tell someone...
}
CONDITIONAL_MACRO_THING
{
    // tell someone...
}

Where the expansion of the CONDITIONAL_MACRO_THING depends on the settings you configured in Config.h. If you go for the default settings then CONDITIONAL_MACRO_THING expands to catch(...) and there is no change in functionality. If you decide to turn off catch(...) handlers then the CONDITIONAL_MACRO_THING needs to expand to a catch that won't catch anything; that is, one that is trying to catch something that a) can't be thrown and b) won't already be being caught. To achieve this I create a simple new exception class which is never thrown and so CONDITIONAL_MACRO_THING expands to catch(const thingThatIsNeverThrown &) if you turn off catch(...). Seems to work well and gives everyone the option to deal with these issues in a way that's appropriate for their project.
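A minimal sketch of how such a macro could be put together follows. All of the names here (EXAMPLE_CATCH_UNHANDLED_EXCEPTIONS, EXAMPLE_CATCH_ALL, CNeverThrownException and Run) are made up for the example and the CException class is a stand-in; the framework's real option names live in Config.h and aren't shown in this post.

```cpp
#include <string>

// Hypothetical configuration switch; 1 keeps the catch(...) handlers.
#ifndef EXAMPLE_CATCH_UNHANDLED_EXCEPTIONS
#define EXAMPLE_CATCH_UNHANDLED_EXCEPTIONS 1
#endif

// A type that is never thrown, so a handler for it can never be entered.
class CNeverThrownException
{
};

#if EXAMPLE_CATCH_UNHANDLED_EXCEPTIONS
#define EXAMPLE_CATCH_ALL catch(...)
#else
#define EXAMPLE_CATCH_ALL catch(const CNeverThrownException &)
#endif

// Stand-in for the framework's own exception type.
class CException
{
};

// Runs doThing() and reports which handler, if any, dealt with an exception.
// With the switch set to 0 an unexpected exception escapes from Run() instead.
std::string Run(void (*doThing)())
{
    try
    {
        doThing();
        return "ok";
    }
    catch(const CException &)
    {
        return "CException";
    }
    EXAMPLE_CATCH_ALL
    {
        return "unexpected";
    }
}
```

The nice property of this shape is that the handler body stays in the source either way; only the thing it catches changes.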

These changes will be included in the 6.1 release which currently has no scheduled release date.


Posted by Len at 10:03 AM | Comments (0) | Categories: Socket Servers

June 14, 2009

Allocating page aligned buffers

Back in October 2007 I briefly looked at what seemed, at the time, to be a simple change to the server framework so that you had the option to use buffers that were aligned to page boundaries. This could help you scale better as one of the two problems with scalability at the time was the 'locked pages limit'; there's a finite limit to the number of memory pages that can be locked at one time and in some circumstances data in transit via sockets is locked in memory whilst it is sent. Reducing the number of pages used, by making sure that buffers were aligned on page boundaries and so used the fewest pages possible, can help if your server is hitting this limit.

Anyway, I proposed a simple change which was immediately shot down by a commenter for being too simple. The change was to use VirtualAlloc() to allocate our I/O buffers on page boundaries; the reason it was too simple is that VirtualAlloc() works in terms of the system's allocation granularity and not arbitrary sizes. This meant that the proposed changes, whilst simple, wasted oodles of memory.
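To put a rough number on that waste, here's a small sketch of the arithmetic. It assumes the commonly seen values of a 4096 byte page and a 65536 byte allocation granularity; the real values should be read from GetSystemInfo() at run time. RoundUp and WastePerBuffer are names made up for the example.

```cpp
#include <cstddef>

// Rounds size up to the next multiple of granularity (granularity must be > 0).
constexpr std::size_t RoundUp(std::size_t size, std::size_t granularity)
{
    return ((size + granularity - 1) / granularity) * granularity;
}

// Bytes of each granularity-sized region left unused when every buffer is
// given its own separate allocation.
constexpr std::size_t WastePerBuffer(std::size_t bufferSize, std::size_t granularity)
{
    return RoundUp(bufferSize, granularity) - bufferSize;
}
```

Under those assumptions a 4096 byte buffer allocated on its own occupies a 65536 byte region, which leaves 61440 bytes of that region unusable for other buffers.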

I thought about it some more and then went away and made some more complicated changes. The results then sat around on a development branch for some time as no clients were desperate for the changes and I never had time to profile them.

Well, I've finally profiled them and they perform pretty well and they are, after all, entirely optional, and so they're going to be included in the next revision of the server framework.

The changes are that the CBufferAllocator can now either be passed flags which tell it to use page aligned buffers, in which case it uses a very simple fixed sized memory allocator that can return page aligned allocations of arbitrary size, OR you can pass the CBufferAllocator an instance of IAllocateFixedSizedMemory and it will use your own allocator.

The fixed sized memory allocator is very simple and pretty basic. It does deliver page aligned fixed sized memory blocks with little wastage. It doesn't return memory that has been allocated from it and then released back to it back to the operating system. It simply keeps it in its free list for later reuse. This may or may not be a problem to you.

Anyway, the performance of the allocator is such that it's pretty much on par with the default allocator that's used for non aligned memory and the fact that the buffers can be page aligned means that each pending send operation could take up one page less than if you don't use the page aligned allocator. That may make a big difference to your scalability.
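The 'one page less' point is easy to check with a little arithmetic. The helper below is only an illustration (PagesSpanned is a made up name) and it assumes a 4096 byte page in the examples that follow it.

```cpp
#include <cstddef>
#include <cstdint>

// Number of pages touched by a buffer of 'size' bytes starting at 'address'.
constexpr std::size_t PagesSpanned(
    std::uintptr_t address,
    std::size_t size,
    std::size_t pageSize)
{
    return size == 0
        ? 0
        : ((address + size - 1) / pageSize) - (address / pageSize) + 1;
}
```

A 4096 byte buffer that starts on a page boundary touches one page; the same buffer starting half way through a page touches two, and so two pages would need to be locked whilst the data is in transit.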

These changes will be included in the 6.1 release which currently has no scheduled release date.


Posted by Len at 07:20 PM | Comments (0) | Categories: Socket Servers

June 13, 2009

Structured exception translation is now optional

I've had a couple of requests from clients recently that they be able to handle any structured exceptions in the server framework themselves. Up until now all framework threads install a SEH translator and catch the resulting C++ exceptions in their outer catch handlers and report the error to the framework user via various mechanisms. This generally works well and prevents exceptions going unreported, but sometimes users want to integrate the framework with code that deals with uncaught structured exceptions in other ways.

The latest version of the framework will include two new configuration defines that can be set in the Config.h file that will selectively turn off the structured exception translator installation. If JETBYTE_TRANSLATE_SEH_EXCEPTIONS is set to 0 then no translation will occur which will mean that structured exceptions are allowed to propagate out of framework threads and remain uncaught. If JETBYTE_TRANSLATE_SEH_EXCEPTIONS_IN_DEBUGGER is set to 0 then the translator is installed unless the code is running in the debugger at the time when the translator is installed. Note that the defaults if not explicitly configured remain as before.
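The decision that these two defines control can be sketched like this. The two JETBYTE_ names come from the description above, but the values chosen, and the ShouldInstall and ShouldInstallSehTranslator helpers, are assumptions for the example; on Windows the debugger check would typically come from IsDebuggerPresent() and the translator itself would be installed with _set_se_translator().

```cpp
// Example values that a Config.h might set; they are not the framework defaults.
#define JETBYTE_TRANSLATE_SEH_EXCEPTIONS 1
#define JETBYTE_TRANSLATE_SEH_EXCEPTIONS_IN_DEBUGGER 0

// Hypothetical helper: decides whether a thread should install the translator.
constexpr bool ShouldInstall(
    bool translate,
    bool translateInDebugger,
    bool debuggerPresent)
{
    return !translate
        ? false                          // translation turned off completely
        : (debuggerPresent && !translateInDebugger)
            ? false                      // running under a debugger, so leave SEH alone
            : true;
}

// Same decision, driven by the configuration defines above.
inline bool ShouldInstallSehTranslator(bool debuggerPresent)
{
    return ShouldInstall(
        JETBYTE_TRANSLATE_SEH_EXCEPTIONS != 0,
        JETBYTE_TRANSLATE_SEH_EXCEPTIONS_IN_DEBUGGER != 0,
        debuggerPresent);
}
```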

These changes will be included in the 6.1 release which currently has no scheduled release date.


Posted by Len at 09:30 AM | Comments (0) | Categories: Socket Servers

June 11, 2009

Why not to compile as 64 bit...

Here's a nice piece by Rico Mariani about why Visual Studio is unlikely to go 64 bit any time soon. In a nut shell, unless you have very large data sets that need to be kept in memory you might be worse off as a 64 bit process than you are as a 32 bit process on a 64 bit operating system. Food for thought.


Posted by Len at 02:06 PM | Comments (0) | Categories: Socket Servers

June 10, 2009

New Windows Services library

I'm currently working on a new version of the Windows Services library that ships as part of the Licensed I/O Completion Port Server Framework. The Services library allows you to turn your server into a Windows Service very easily and also allows you to run it as a normal executable inside the debugger, etc. It integrates nicely with our Performance Monitoring library for exposing perfmon counters and comes with several example servers that show you how to use it (see here and here).

The library hasn't changed much for many years. It was originally put together back in around 1997 as part of our initial set of reusable code and, although it was refactored a little when the rest of the framework underwent major reworking, it wasn't that broken and so it wasn't changed a great deal.

The Windows Service API has changed considerably since those early days (see here for the latest changes) and we've had various clients asking for various new bits of functionality for a while and often these have been shoe-horned into a client specific version of the library. I'm now taking all of these changes and updating the mainline code to support all of the new service API features and redesigning it to include a few things that we didn't support, or didn't do as well as we could, the first time around. We now support multiple services within a single exe in a convenient manner (and can still run them under the debugger easily), for example.

The library still includes support for installing and running 'multiple instances' of a single service, where you might wish to run two identical services with slightly different configurations. This is a favourite feature with clients as it gives them an extra level of isolation when running their servers.

The code changes are well under way and will be included into the next release of the server framework. If you have any suggestions for things that you'd like included then please get in touch.


Posted by Len at 09:48 AM | Comments (0) | Categories: Socket Servers

June 03, 2009

Bug Psychology and how you can get stuck in a rut...

Eric Lippert has an interesting blog posting (here) on how sometimes you can be so focused on fixing the bug you fail to step back and take a better look at the actual problem that you're trying to solve.

I'm guilty of this and, with me at least, it doesn't only apply to bug fixes. Sometimes I can become overcommitted to a design to the point where I don't recognise that it's just one design option; I treat it as the only design option. I suppose it's to be expected, especially when you've already committed a lot of thought to the design you've chosen. You build up momentum in the direction that you're heading and it can take a long time to slow down and turn around.

A classic example of this is how my Practical Testing series continued to refine my originally chosen design and continued to deal with all of the various edge cases that came from using GetTickCount() when using GetTickCount64() made most of the issues go away and fabricating a version of GetTickCount64() for where it wasn't available was pretty straight forward. It took a reader comment to kick me out of my rut.
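For anyone who hasn't hit the problem, a 32 bit millisecond counter wraps roughly every 49.7 days. The sketch below shows the usual unsigned subtraction trick and one simple way of fabricating a 64 bit count from a 32 bit one. It isn't the code from the Practical Testing series; the names are made up, it takes the 32 bit value as a parameter rather than calling GetTickCount() so that it can be tested, and it isn't thread safe.

```cpp
#include <cstdint>

// Unsigned subtraction gives the right answer across a single wrap.
constexpr std::uint32_t Elapsed32(std::uint32_t start, std::uint32_t now)
{
    return now - start;
}

// Hypothetical emulation of a 64 bit tick count on top of a 32 bit one.
// Update() must be called at least once per wrap period or a wrap is missed.
class TickCount64Emulator
{
public:
    std::uint64_t Update(std::uint32_t now32)
    {
        if (now32 < m_last)
        {
            ++m_wraps;      // the 32 bit counter has wrapped since the last call
        }

        m_last = now32;

        return (static_cast<std::uint64_t>(m_wraps) << 32) | now32;
    }

private:
    std::uint32_t m_last = 0;
    std::uint32_t m_wraps = 0;
};
```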

This kind of thing is actually one of the main reasons that I blog about my technical stuff. I want to have the chance that people who are far smarter than me might see what I'm doing, gasp at its stupidity, and take the time to correct my misunderstandings.


Posted by Len at 09:25 AM | Comments (0) | Categories: Geek Speak

June 01, 2009

Race condition during service shutdown

There's a race condition in the service shutdown code which is most likely to show up if there's an exception thrown from your implementation of ContinueService(), PauseService() or StopService() but that could show up during any service shutdown sequence.

This race condition is present in all versions of the Service Library and so far has only been reported by one client. A fix is available, please contact me directly if you need it, or think you need it. The fix will be included in the 6.1 release which currently has no scheduled release date.

Thanks to Richard and Lyndel at Vexis for reporting this issue.


Posted by Len at 09:47 AM | Comments (0) | Categories: Socket Servers

Bug fix in performance counter instance activation code

There's a bug in all releases of our performance counter library that may cause the creation of an instance to fail if its name was previously used by an instance that has since been released. Rather than being given a new slot, the new instance is connected to the data structure of the previously released instance.

The bug is in PerformanceDataBlock.cpp, the else if around line 167 in AllocateObjectInstance() should be changed from:


if (pInstance->NameLength == 0 && !pFirstFreeInstance)
{
    pFirstFreeInstance = pInstance;
    firstFreeInstanceIndex = i;
}
else if (0 == memcmp(
    reinterpret_cast<const BYTE *>(pInstance) + pInstance->NameOffset,
    unicodeInstanceName.c_str(),
    unicodeInstanceNameLength))
{
    allocationDisposition = ConnectedExisting;

to

if (pInstance->NameLength == 0 && !pFirstFreeInstance)
{
    pFirstFreeInstance = pInstance;
    firstFreeInstanceIndex = i;
}
else if (pInstance->NameLength == unicodeInstanceNameLength &&
    0 == memcmp(
        reinterpret_cast<const BYTE *>(pInstance) + pInstance->NameOffset,
        unicodeInstanceName.c_str(),
        unicodeInstanceNameLength))
{
    allocationDisposition = ConnectedExisting;

Thanks to Steve and Ramzi at NetIQ for the bug report, analysis and bug fix.
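Why the extra length check matters can be shown with a cut down model. This is not the library's data structure; InstanceSlot and the two Matches functions are made up for the example, and lengths are assumed to be held in bytes with a length of 0 marking a released slot whose old name bytes are still in place.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical, simplified instance slot.
struct InstanceSlot
{
    std::size_t nameLength;     // in bytes; 0 means the slot has been released
    wchar_t name[32];           // the bytes are left behind after a release
};

// The comparison without the length check: stale or longer names can match.
bool MatchesOriginal(const InstanceSlot &slot, const std::wstring &name)
{
    const std::size_t nameBytes = name.length() * sizeof(wchar_t);

    return nameBytes <= sizeof(slot.name) &&
        0 == std::memcmp(slot.name, name.c_str(), nameBytes);
}

// The comparison with the length check: only a live slot of the same
// length can match.
bool MatchesFixed(const InstanceSlot &slot, const std::wstring &name)
{
    const std::size_t nameBytes = name.length() * sizeof(wchar_t);

    return nameBytes <= sizeof(slot.name) &&
        slot.nameLength == nameBytes &&
        0 == std::memcmp(slot.name, name.c_str(), nameBytes);
}
```

In this model a released slot that still holds the bytes of "worker" matches a new "worker" instance with the first function but not with the second.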

This problem affects all released versions of the performance counter library. The fix will be included in the 6.1 release which currently has no scheduled release date. If you have problems applying this fix then get in touch.


Posted by Len at 08:59 AM | Comments (0) | Categories: Socket Servers

May 14, 2009

#pragma unmanaged

I've just spent a little too long trying to track down a bug in a mixed mode DLL that I'm building. The DLL exposes a set of entry point functions that are defined as taking a single pointer argument and lies to the application that hosts it so that the application can call it with various numbers of arguments. The arguments could change from call to call or from 'session' to 'session'. This all works fine thanks to the wonders of __stdcall and the fact that my DLL knows what arguments to expect and so can unmarshall them correctly by working from offsets from the single known argument that the entry point functions actually take. All of this has been working for ages but I've recently begun adjusting the code to make it more flexible with regards to the hosts that it supports and in the process of adjusting it the parameter marshalling code stopped working.

What began happening was that the first argument could be accessed fine and all of the other arguments simply didn't seem to exist. Whereas I could previously walk along my 'argument format string' and then step along memory from the address of the first argument and locate and decode all subsequent arguments now I just got what looked like random memory...

The problem was that the entry points into the DLL were now being compiled as managed code rather than as unmanaged code. This was something that took me far too long to work out, but once I did work it out it was obvious (if I scrolled down the call stack far enough I could see an unmanaged to managed code transition outside of my DLL before my entry point was hit). This transition did some clever managed stuff to the arguments being passed in and rendered my "offsets from arg1" method of decoding useless.

Luckily once I realised what was going on I simply specified that the entry point was unmanaged (#pragma unmanaged) and everything started working again... There's now a nice big comment in the entry point's file explaining what the problem is if I ever see the symptoms that I was seeing. Hopefully it will prevent some wasted time if it happens again...
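The shape of the fix is roughly as follows. This is a sketch only: EntryPoint and ENTRY_CALL are made up names, the function body is a placeholder, and the guards on _MSC_VER and _MANAGED are there so that the fragment also compiles outside of a /clr build.

```cpp
// __stdcall only exists on Microsoft compilers; elsewhere the macro is empty.
#if defined(_MSC_VER)
#define ENTRY_CALL __stdcall
#else
#define ENTRY_CALL
#endif

// _MANAGED is defined when compiling with /clr. Everything between the two
// pragmas is compiled as native code, so callers reach the entry point
// without an unmanaged to managed transition touching the arguments.
#if defined(_MANAGED)
#pragma unmanaged
#endif

extern "C" int ENTRY_CALL EntryPoint(const int *pFirstArg)
{
    // Placeholder body; the real entry point decodes further arguments
    // by working from offsets from this first one.
    return *pFirstArg;
}

#if defined(_MANAGED)
#pragma managed
#endif
```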


Posted by Len at 06:19 PM | Comments (0) | Categories: Geek Speak

May 12, 2009

Building an inproc ATL COM object as x86 and x64

I'm currently wrapping a server's client side API in an ATL COM object for a client. The COM object will be used to communicate with the server from managed code or VB or other COM compatible systems. It's a fairly straightforward process as the original 'C' DLL interface client API was built with this kind of thing in mind and I've wrapped enough 'C' APIs in C++ objects in the past to make the whole process relatively painless.

However, there's always something...

I'm running the code through my automated build process before release and I find that the x64 builds of the COM object are failing to link with a missing XXX_ProxyFileInfo error (error LNK2001: unresolved external symbol MYTypeLib_ProxyFileInfo). A clean rebuild works fine and building again after a successful build works fine. Switching from a successful x86 build to an x64 build fails as does switching from a successful x64 build to building in x86 mode.

The problem is that the MIDL compiler generates different code for x86 marshalling and x64 marshalling and since this code is architecture specific it doesn't get built if you compile the resulting proxy file MyTypeLib_p.c for the wrong architecture. Since the _p.c file is generated by the MIDL compiler based on the idl file, the dependency checking for subsequent builds doesn't find a need to rebuild the file as it's up to date. When you switch architectures the file is still considered up to date but doesn't build any code as the marshalling code is for the wrong architecture.

This took a while to track down...

Anyway, the fix is either to have the _p.c file generated in the output directory (which is correctly differentiated by architecture) or to append the architecture to the file name. I decided on the latter as it's easier to work with in the dlldatax.c file that compiles the _p.c file. Once the MIDL rule is changed to build a file that looks like MyTypeLib_p-x86.c and MyTypeLib_p-x64.c the dlldatax.c can be made conditional on the _WIN64 define and can pull in the correct marshalling code for the architecture that's being compiled. Note that the other files that the MIDL step generates are not architecture specific.

Update: Unfortunately it seems that the dependency check that the MIDL tool uses doesn't check all of the output files that it's set to produce. So now I'm in the (slightly better) position of the build failing sometimes because the architecture specific proxy code file doesn't exist... Looks like I may have to write a custom build rule after all.


Posted by Len at 09:50 AM | Comments (0) | Categories: Geek Speak

May 08, 2009

Embedded assembly programming

Well, I've finally done something that I've been meaning to do for a long time. I've written some non-trivial assembly language code. Up until recently I wasn't expecting this to be embedded assembly, but it actually seems like a sensible way to get into this low level stuff. Programming an 8bit RISC microcontroller in assembly is considerably easier than trying to do something with a PC. The chips are cheap (as chips), the tools are free, there's an active user community and the electronics required is relatively simple.

Right now the code is heavily based on someone else's code; but hey, isn't that the way we always start out? The good thing is that I have the complete development system working: a VM with the tool chain, simulator and programmer all working. I have a breadboarded circuit with my Atmel ATTiny2313 in it and I have a challenging project to keep me interested.

Already I can see how this new project will change how I think about the coding that I do for my day job. Testing embedded assembly could be interesting, for once I'm working on code where performance really does matter and it's relatively easy to reason about it. It's interesting stuff!


Posted by Len at 07:50 AM | Comments (0) | Categories: Geek Speak

May 05, 2009

Everything you need to know about timers and periodic scheduling in the server framework

Often when you're writing a server or client application with the framework you will find yourself needing to perform operations periodically or after a timeout. The framework provides a light weight timer queue that can be used to schedule timers and that uses minimal system resources so that there's no problem with having multiple timers per connection; even on systems with 10,000 connections.

Continue reading "Everything you need to know about timers and periodic scheduling in the server framework"

Posted by Len at 01:25 PM | Comments (0) | Categories: Socket Servers

April 28, 2009

May your software scale, and other curses...

I'm in the process of upgrading another client to v6.0. These guys write multi-user game systems and have a fairly complex CLR hosting custom application server. Anyway, I was hoping that we'd have a few easy performance wins from the changes that have gone into v6.0 and these gaming guys are possibly even more hung up on performance than my banking and trading clients.

The good news is that the changes in how we convert numbers to strings have drastically improved performance in their debug builds where they have lots of logging going on. The bad news is that performing better when generating log lines from multiple threads that are then written to disk asynchronously has put more strain on the next stage in that particular pipeline. The problem now is very similar to the problem that you have if you don't bother to use any kind of write completion flow control on your socket connections; you keep pushing overlapped, asynchronous writes into the system and each one of these takes up some resources. If you're pushing items into the system faster than the system is processing them then you're using system resources in an uncontrolled manner. Given that the resource in question is 'non paged pool memory' and that it's a finite resource this is a Bad Thing...

The problem is that, in the debug build, we have multiple threads generating log messages and firing them off into a queue to be written asynchronously by the file writer class (see here for details). Since the log messages are being produced faster than the file writer and the operating system is able to complete the asynchronous writes the system as a whole starts to eat 'non paged pool'. This can be seen from the NP Pool counter in task manager and from the fact that a 'tail' of the log file that is being written begins to show that the timestamps on the latest log lines begin to lag behind real time. Once the load on the server eases the log system catches up but during the log message oversupply phase the server can use an unlimited amount of non paged pool and that's a Bad Thing (though on Vista and Windows Server 2008 it's less of a bad thing than it used to be (see here for details)...).

A solution to this would be to limit the number of writes that the asynchronous file writer can have pending at any one time. This can either be done before the write is dispatched to the queue or after it is removed from the queue but before the actual write is issued; or both... The problem of applying the limit before the message is dispatched to the queue is that you then effectively remove the asynchronous nature of the log system when the number of pending writes reaches a certain limit. This prevents the number of pending messages from growing out of control by throttling the log message producers... If, instead, you apply the limit to the actual write operations then you have simply moved the resource usage from system controlled 'non paged pool' to the application memory used to manage the queue... Given that I currently have but a single client who has the potential to generate this problem, and given it takes a particular style of stress testing to get it to happen I think I'll opt for the pending limit being applied before queue dispatch. This will provide users for which this is a problem with a way of removing the problem at the expense of performance and should anyone then have a performance issue we'll address that at the time...
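The 'limit before queue dispatch' option can be sketched as a simple counting limiter. This isn't the framework's file writer; PendingWriteLimiter and its members are hypothetical names, and it uses the standard library's mutex and condition variable rather than anything Windows specific.

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>

// Hypothetical limiter: producers take a slot before queueing a write and
// the write completion handler gives the slot back.
class PendingWriteLimiter
{
public:
    explicit PendingWriteLimiter(std::size_t maxPending)
        : m_maxPending(maxPending), m_pending(0)
    {
    }

    // Blocks the log message producer until a slot is free; this is the
    // throttling that removes the asynchronous behaviour once the limit is hit.
    void Acquire()
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_slotFree.wait(lock, [this] { return m_pending < m_maxPending; });
        ++m_pending;
    }

    // Non blocking variant for callers that would rather drop or defer.
    bool TryAcquire()
    {
        std::lock_guard<std::mutex> lock(m_mutex);

        if (m_pending >= m_maxPending)
        {
            return false;
        }

        ++m_pending;
        return true;
    }

    // Called once per completed write; each call must pair with a
    // successful Acquire() or TryAcquire().
    void Release()
    {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            --m_pending;
        }

        m_slotFree.notify_one();
    }

    std::size_t Pending() const
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_pending;
    }

private:
    const std::size_t m_maxPending;
    mutable std::mutex m_mutex;
    std::condition_variable m_slotFree;
    std::size_t m_pending;
};
```

A producer would call Acquire() before handing a log line to the writer and the writer would call Release() from its completion handling, which caps the number of writes in flight at maxPending.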

As I said over on the Joel On Software forum a while back; "Just remember that getting to the point where you CAN write something that is massively scalable and performs well is only the start of your problems..."


Posted by Len at 01:16 PM | Comments (0) | Categories: Geek Speak , Socket Servers

April 21, 2009

Interesting piece on thread pools

Herb Sutter has just published an interesting article over at DDJ on correctly using thread pools: Use Thread Pools Correctly: Keep Tasks Short and Nonblocking.

It's not rocket science and it doesn't deal with platform issues but it's a useful summary of why the Socket Server thread pools operate as they do. Note that on Windows you can use IO Completion Ports to manage the work queue into the thread pool and this can keep the number of threads that are scheduled to run at the optimum number so that the pool operates at the 'correct load' for most of the time even in the presence of blocking tasks.


Posted by Len at 08:48 AM | Comments (0) | Categories: Geek Speak , Socket Servers