WebSockets - The deflate-stream extension is broken and badly designed

As I mentioned here, the WebSockets protocol is, at this point, a bit of a mess due to the evolution of the protocol and the fact that it’s being pulled in various directions by various interested parties. I’m just ranting about some of the things that I find annoying…

The WebSockets protocol is designed to be extended, which is all well and good. Extensions can, at present, be formally specified by RFCs or be “private use” extensions with names that are prefixed with an “x-”. So far the only “official” extension is the deflate-stream extension that’s detailed in the WebSockets protocol RFC itself.

Unfortunately, the deflate-stream extension isn’t really an ideal example of how future extensions should work. A far better example would be the alternative deflate-application-data extension that’s detailed here.

So, what’s wrong with deflate-stream?

The WebSockets protocol is very open to extension, possibly too open.

4.8. Extensibility

The protocol is designed to allow for extensions, which will add capabilities to the base protocols. The endpoints of a connection MUST negotiate the use of any extensions during the opening handshake. This specification provides opcodes 0x3 through 0x7 and 0xB through 0xF, the extension data field, and the frame-rsv1, frame-rsv2, and frame-rsv3 bits of the frame header for use by extensions.

The negotiation of extensions is discussed in further detail in Section 9.1. Below are some anticipated uses of extensions. This list is neither complete nor proscriptive.

o Extension data may be placed in the payload data before the application data.

o Reserved bits can be allocated for per-frame needs.

o Reserved opcode values can be defined.

o Reserved bits can be allocated to the opcode field if more opcode values are needed.

o A reserved bit or an “extension” opcode can be defined which allocates additional bits out of the payload data to define larger opcodes or more per-frame bits.
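To make those extension points a little more concrete, here is a minimal sketch of where they live in the first byte of a frame header. It assumes the bit layout of the published RFC 6455 framing (FIN in the top bit, then RSV1 to RSV3, then a 4-bit opcode); the function names are mine and purely illustrative.

```python
def parse_first_byte(first: int) -> dict:
    """Split the first header byte into FIN, RSV1-3 and the 4-bit opcode."""
    return {
        "fin": bool(first & 0x80),
        "rsv1": bool(first & 0x40),
        "rsv2": bool(first & 0x20),
        "rsv3": bool(first & 0x10),
        "opcode": first & 0x0F,
    }


def is_extension_opcode(opcode: int) -> bool:
    """True for the opcode ranges the quoted text leaves to extensions."""
    return 0x3 <= opcode <= 0x7 or 0xB <= opcode <= 0xF


# Hypothetical header byte: FIN set, RSV1 set, opcode 0x1 (text).
header = parse_first_byte(0xC1)
print(header, is_extension_opcode(header["opcode"]))
```

The point is that an extension working within these hooks still leaves the header readable by anyone who understands the base framing.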

But even with all of this available, the deflate-stream extension goes further. It completely replaces the wire format of the WebSockets protocol with a stream of compressed data. That is, all framing and headers are included in the compression, and you can’t decompress a portion of the stream without decompressing everything that came before it. This means that any form of proxy that wants to inspect the contents of WebSocket traffic has to decompress the whole stream to look at the individual frames; it can’t even use the header information to skip the data portions of the frames, as the headers are also compressed.

The WebSocket protocol doesn’t really even hint at this being the intended use of the extension mechanism. In fact, it seems to suggest that whilst extensions can fiddle with the header contents and the frame contents, they can’t simply replace the entire wire format. There are those who think the inclusion of “connection-level” extensions is a bad idea, and I agree with them. After all, if you follow the example of the deflate-stream extension, surely the obvious solution to the rather broken message-based aspect of the protocol is simply to write an extension that replaces the entire WebSocket wire format with one that pleases you better; x-raw-tcp-stream, anyone?
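A rough sketch of the proxy's problem, using Python's zlib as a stand-in for whatever a real endpoint would use. It assumes raw DEFLATE (hence `wbits=-15`), short unmasked frames with a 2-byte header, and a sync flush after each frame; the helper names are hypothetical.

```python
import zlib


def make_frame(payload: bytes) -> bytes:
    """Build an unmasked text frame; payloads under 126 bytes only."""
    assert len(payload) < 126
    return bytes([0x81, len(payload)]) + payload


def frame_lengths(wire: bytes) -> list:
    """Walk plain frames by reading each header and skipping its payload."""
    lengths, pos = [], 0
    while pos < len(wire):
        length = wire[pos + 1] & 0x7F
        lengths.append(length)
        pos += 2 + length
    return lengths


frames = [make_frame(b"first message"), make_frame(b"second, longer message")]
plain_wire = b"".join(frames)

# With deflate-stream the frames, headers included, go through one compressor.
compressor = zlib.compressobj(wbits=-15)
deflated_wire = b"".join(
    compressor.compress(f) + compressor.flush(zlib.Z_SYNC_FLUSH) for f in frames
)

# A proxy has to inflate from the start of the stream before it can see headers.
recovered = zlib.decompressobj(wbits=-15).decompress(deflated_wire)
print(frame_lengths(plain_wire), frame_lengths(recovered))
```

On the plain wire, `frame_lengths` hops from header to header without touching the payloads. On the deflated wire there are no headers to hop between until everything has been inflated.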

If the fact that the deflate-stream extension isn’t exactly an ideal example of how extensions should be built isn’t enough of a reason to remove it from the RFC, then surely the fact that it doesn’t actually work as expected might be? It doesn’t seem likely. The problem is that, now that WebSockets requires client-to-server data frames to be masked, the data being sent from client to server is sufficiently random that it can’t be usefully compressed. So, at best, deflate-stream simply burns server-side CPU when dealing with inbound WebSocket data, and at worst it makes the data stream from client to server bigger than it need be.

Ideally you’d simply not mask if you were going to use deflate-stream, but that doesn’t seem to be an option; there appear to be concerns that the extremely unlikely situation that client-to-server masking is supposed to protect against could still occur using unmasked compressed data. Of course, the attacker would have to know exactly how the data was going to be deflated and then somehow come up with payload data that results in a useful stream of deflated data; but really…
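The effect of masking on compression is easy to see for yourself. The following is a minimal sketch, not a real client: it assumes short frames with a 4-byte XOR masking key chosen per frame, uses zlib's deflate as the compressor, and seeds the random generator only so that runs are repeatable. All helper names are illustrative.

```python
import random
import zlib


def mask(payload: bytes, key: bytes) -> bytes:
    """XOR each payload byte with the 4-byte masking key, cycling the key."""
    return bytes(b ^ key[i % 4] for i, b in enumerate(payload))


def client_frame(payload: bytes, key: bytes) -> bytes:
    """Masked text frame: header with mask bit set, key, masked payload."""
    assert len(payload) < 126
    return bytes([0x81, 0x80 | len(payload)]) + key + mask(payload, key)


def server_frame(payload: bytes) -> bytes:
    """Unmasked text frame, as sent from server to client."""
    assert len(payload) < 126
    return bytes([0x81, len(payload)]) + payload


rng = random.Random(1234)  # fixed seed for repeatability; real keys must be unpredictable
message = b"The quick brown fox jumps over the lazy dog"

# The same message sent 200 times in each direction.
unmasked = b"".join(server_frame(message) for _ in range(200))
masked = b"".join(
    client_frame(message, bytes(rng.getrandbits(8) for _ in range(4)))
    for _ in range(200)
)

unmasked_deflated = zlib.compress(unmasked)
masked_deflated = zlib.compress(masked)

print("server to client:", len(unmasked), "->", len(unmasked_deflated))
print("client to server:", len(masked), "->", len(masked_deflated))
```

The unmasked stream is the same bytes over and over and shrinks dramatically. The masked stream carries the same message, but each frame is XORed with a different key, so the compressor finds almost nothing to match.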

Ripping deflate-stream out of the RFC entirely and/or replacing it with deflate-application-data would seem to be the obvious course of action, but there seems to be resistance to this. Making it possible to enable deflate-stream in one direction only, i.e. from server to client, would also seem to be a potential win as you’d get your deflated stream in the direction that it can work and not waste time on data that can’t be meaningfully compressed.