Using Wireshark to debug UDP communication issues

2009-08-19

A customer of mine has been having some problems with communication between a UDP server and their load test client. The UDP server implements the ENet protocol, which provides reliable data transfer over UDP. The problem showed up as the client not receiving ENet level ACKs for some of its reliable data. The Wireshark log from the client machine showed the client resending the data when the ENet retransmission timeout expired, and also showed that the ACKs for these packets never arrived. Communication otherwise continued normally until the client disconnected due to a final timeout for the missing ACK.

A quick look at the server source told me that this situation should never be able to occur. The test harness for the ENet protocol code also had plenty of tests for the correct generation of ACKs, and all of these passed. The Wireshark log from the client machine showed that the server had clearly processed the packet for which the ACK was missing: the application level response had been sent and we could see it in the log on the client. Application level responses had also been sent for later packets, and our ENet protocol implementation wouldn’t have allowed that if the server really hadn’t received the missing packet, because all of the sequenced packets due for delivery after the packet that hadn’t been ACKed would have been queued. So it looked like the server had received the packet in question, and my review of the server code showed that the server MUST have generated an ACK for it. It seemed like the datagram containing that ACK was just being reliably lost…
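To make that reasoning concrete, here is a minimal sketch of sequenced, reliable delivery on the receiving side. It is not the ENet implementation or its wire format; the class name `SequencedChannel` and its `receive` method are hypothetical, and details such as sequence number wraparound are ignored. The point it illustrates is that packets arriving after a gap are held back until the gap is filled, so an application level response to a later packet implies that every earlier packet was received (and ACKed).

```python
class SequencedChannel:
    """Hypothetical receiver for reliable, in-order packets (not ENet's real code)."""

    def __init__(self):
        self.next_expected = 1   # sequence number the application is waiting for
        self.pending = {}        # out-of-order packets held until the gap is filled

    def receive(self, seq, data):
        """Handle one incoming packet; return (ack_seq, delivered_payloads)."""
        # Every reliable packet that arrives is ACKed, including duplicates,
        # so a sender whose ACK went missing gets another one on retransmit.
        ack_seq = seq

        # Store the packet unless it has already been delivered.
        if seq >= self.next_expected:
            self.pending.setdefault(seq, data)

        # Deliver as many consecutive packets as we now hold.
        delivered = []
        while self.next_expected in self.pending:
            delivered.append(self.pending.pop(self.next_expected))
            self.next_expected += 1
        return ack_seq, delivered


channel = SequencedChannel()
print(channel.receive(1, "first"))   # delivered immediately
print(channel.receive(3, "third"))   # ACKed but queued: packet 2 is missing
print(channel.receive(2, "second"))  # fills the gap, releases 2 and 3 together
```

With a receiver like this, the application cannot respond to packet 3 before packet 2 has arrived, which is why the responses visible in the client log were good evidence that the server had received the packet whose ACK was missing.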

The first thing to realise when you’re debugging network traffic with Wireshark is that your log only contains what the machine that you’re logging on is seeing. To really understand what’s going on you may need to log in multiple places; usually a log from each end of the connection is adequate, but on more complex network topologies it helps to have a log from each network segment. Without all of these logs you’re only getting part of the picture. The fact that the client machine doesn’t receive a particular datagram doesn’t necessarily mean that the server never generated it.

Once my customer started taking a Wireshark log from the server machine as well as from the client machine it quickly became clear that the problem wasn’t our code. The server log showed the server generating and sending the ACKs that the client log showed as missing. The problem wasn’t in the client or the server code but somewhere in the networking infrastructure in between them.
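The comparison of the two logs was done by eye, but the same check can be sketched in code. The following is an illustration only: it assumes the datagrams from each capture have already been exported into simple records, and the `Datagram` type and `lost_in_transit` function are hypothetical names, not part of Wireshark. It also assumes that nothing between the two machines rewrites addresses or payloads (no NAT, for example), so that a datagram looks the same in both logs.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple


class Datagram(NamedTuple):
    """One UDP datagram as seen in a capture (hypothetical record format)."""
    src: str        # sender as "ip:port"
    dst: str        # receiver as "ip:port"
    payload: bytes  # UDP payload


def lost_in_transit(sent_log: Iterable[Datagram],
                    received_log: Iterable[Datagram],
                    sender: str) -> List[Datagram]:
    """Return datagrams that `sender` transmitted but the other end never saw.

    Counters are used rather than sets so that identical retransmissions
    are matched one for one between the two logs.
    """
    sent = Counter(d for d in sent_log if d.src == sender)
    received = Counter(d for d in received_log if d.src == sender)
    missing = sent - received  # keeps only datagrams seen more often at the sender
    return list(missing.elements())


SERVER, CLIENT = "10.0.0.1:5000", "10.0.0.2:6000"

# Capture taken on the server: it sent two ACKs and an application response.
server_log = [
    Datagram(SERVER, CLIENT, b"ACK 1"),
    Datagram(SERVER, CLIENT, b"ACK 2"),
    Datagram(SERVER, CLIENT, b"RESPONSE 2"),
]

# Capture taken on the client: the second ACK never arrived.
client_log = [
    Datagram(SERVER, CLIENT, b"ACK 1"),
    Datagram(SERVER, CLIENT, b"RESPONSE 2"),
]

for datagram in lost_in_transit(server_log, client_log, SERVER):
    print("sent by server, never seen by client:", datagram.payload)
```

Anything this reports was sent by one end and never seen by the other, which points at the network in between rather than at either application.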