Dealing with Emergent Complexity by Improving Software Engineering Processes

by Thomas Maufer on 22 January 2009 - 10:03:37 AM

Efficiently Addressing Emergent Complexity Requires Improved Processes

For an example of a process improvement, look at uninitialized pointers and buffer overflow opportunities in string handling routines. Both of these have always been relatively easy to catch, even by a manual code review. They are much more likely to be caught now because people have seen these problems over and over (and over!) again and have learned what to look for through a very painful education process. But networking product code is complex not only because the code itself is intricate, but also because it doesn't execute in isolation. Protocol implementations are a very special (and especially difficult) kind of software because these programs have no control over their inputs, and because input validation is difficult.

Even the best programmers struggle to imagine the nearly limitless ways in which protocol exchanges can go wrong. The two types of bugs I just discussed are localized to small blocks of code and fairly easy to spot (again, you have to know what to look for). But when networked programs interact with other programs, when complex function call chains exist within a single program, and when different people write the various parts of the code, it's much easier for mistakes to emerge from this complexity.

In the TCP/IP network model, there are no delivery guarantees at the IP layer. The Internet Protocol (IP) layer is connectionless (it's also referred to as "stateless"). It has really simple functionality (and relatively low complexity), partly because it's not reliable. That's really important!

Moreover, the networks that IP runs over are worse than simply not reliable:

  • Network traffic may be corrupted, including being truncated or even having extra data appended.

There is a very weak header checksum in the IP header, but it doesn't protect the rest of the packet (the IP payload). In fact, it barely protects the IP header!
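To make "barely protects" concrete, here is a minimal sketch of the 16-bit ones'-complement sum used for the IP header checksum. The sample header bytes and the function name `internet_checksum` are illustrative assumptions, not taken from any particular stack. Because the sum does not depend on the order of the 16-bit words, swapping the source and destination addresses leaves the checksum unchanged.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """16-bit ones'-complement sum of 16-bit words, then complemented."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# Illustrative 20-byte IPv4 header with the checksum field (bytes 10-11) zeroed.
header = bytes.fromhex("45000073000040004011" "0000" "c0a80001c0a800c7")

# Swap the source address (bytes 12-15) and destination address (bytes 16-19).
swapped = header[:12] + header[16:20] + header[12:16]

# Different headers, identical checksum: the corruption goes unnoticed.
assert swapped != header
assert internet_checksum(swapped) == internet_checksum(header)
```

Any corruption that merely reorders 16-bit words, or that adds to one word what it subtracts from another, slips through in the same way.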

Many people might assume that the MAC checksum protects the frame. While it's true that the MAC (Ethernet) checksum is much stronger than the IP header checksum, it only protects the frame when it's on the wire -- not when it's inside a switch or router! So a packet can be fine when it arrives at a switch, be corrupted inside the switch, and when it leaves the switch on the outbound interface, the packet will have a newly calculated checksum that *will* be correct, but the packet is no longer the same as the one that arrived!
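The recalculation problem can be sketched in a few lines. This is a simplified model, not real switch behavior: `zlib.crc32` stands in for the Ethernet frame check, and `add_fcs`, `fcs_ok`, and `faulty_switch` are hypothetical helper names. The frame arrives intact, is damaged in the switch's memory, and leaves with a freshly computed checksum that the receiver accepts.

```python
import zlib


def add_fcs(payload: bytes) -> bytes:
    """Append a CRC-32 over the payload (simplified stand-in for the Ethernet FCS)."""
    return payload + zlib.crc32(payload).to_bytes(4, "little")


def fcs_ok(frame: bytes) -> bool:
    """Check that the trailing 4 bytes match the CRC-32 of the rest of the frame."""
    payload, fcs = frame[:-4], frame[-4:]
    return zlib.crc32(payload).to_bytes(4, "little") == fcs


def faulty_switch(frame: bytes) -> bytes:
    """Hypothetical switch: verify on ingress, corrupt in memory, re-checksum on egress."""
    assert fcs_ok(frame)          # the frame was fine on the inbound wire
    payload = bytearray(frame[:-4])
    payload[0] ^= 0x01            # a single bit flips while the frame is buffered
    return add_fcs(bytes(payload))  # outbound checksum is computed over the damaged data


sent = add_fcs(b"pay 100 to alice")
received = faulty_switch(sent)

assert fcs_ok(received)               # the receiver's check passes...
assert received[:-4] != sent[:-4]     # ...but the payload is not what was sent
```

A bit flip on the wire itself, with no recalculation, would fail `fcs_ok`; it is the hop-by-hop recomputation that hides the damage.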

Finally, the MAC checksum, while admittedly stronger than the IP header's checksum, can't detect a wide class of multi-bit corruptions that can change the packet without affecting the checksum. These classes of corruption are therefore undetectable.

The only guaranteed way to ensure that a received packet is identical to what was sent is to use a cryptographically strong checksum that depends on a securely negotiated session key.
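One common construction of this kind is a keyed message authentication code such as HMAC. The sketch below uses Python's standard `hmac` and `hashlib` modules and assumes the session key has already been negotiated securely by some other means; the key value and the helper names `protect` and `verify` are illustrative only.

```python
import hashlib
import hmac
from typing import Optional

TAG_LEN = hashlib.sha256().digest_size  # 32 bytes for SHA-256


def protect(key: bytes, payload: bytes) -> bytes:
    """Append an HMAC-SHA256 tag computed with the shared session key."""
    return payload + hmac.new(key, payload, hashlib.sha256).digest()


def verify(key: bytes, message: bytes) -> Optional[bytes]:
    """Return the payload if the tag matches, otherwise None."""
    if len(message) < TAG_LEN:
        return None
    payload, tag = message[:-TAG_LEN], message[-TAG_LEN:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    # compare_digest avoids leaking where the first mismatching byte is
    return payload if hmac.compare_digest(tag, expected) else None


# Assumption: both ends already share this key; a real key would come from a key exchange.
session_key = b"illustrative-session-key-only"

message = protect(session_key, b"pay 100 to alice")
tampered = b"pay 900 to alice" + message[-TAG_LEN:]

assert verify(session_key, message) == b"pay 100 to alice"
assert verify(session_key, tampered) is None
```

Unlike a checksum that any intermediate device can recompute, the tag cannot be regenerated over altered data without the key.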


  • Traffic may be duplicated, sometimes spectacularly.

  • Traffic may be reordered or delayed by varying amounts.

When network traffic traverses WANs, some of the above effects might be more likely than in LAN scenarios, but they can appear anywhere. It's really hard to write code that can efficiently expect the unexpected.
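As one small illustration of what coping with duplication and reordering involves, here is a sketch of a receiver that uses sequence numbers to drop duplicates and release data in order. The class name `ReorderBuffer` and the assumption that every datagram carries a sequence number starting at 0 are hypothetical; a production receiver would also need to bound the buffer and time out on gaps.

```python
from typing import Dict, List


class ReorderBuffer:
    """Deliver payloads in sequence order, ignoring duplicates."""

    def __init__(self) -> None:
        self.next_seq = 0                     # next sequence number to deliver
        self.pending: Dict[int, bytes] = {}   # early arrivals held until the gap fills

    def receive(self, seq: int, payload: bytes) -> List[bytes]:
        """Accept one datagram and return whatever is now deliverable in order."""
        if seq < self.next_seq or seq in self.pending:
            return []  # already delivered or already buffered: a duplicate
        self.pending[seq] = payload
        ready = []
        while self.next_seq in self.pending:
            ready.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return ready


# Arrival order with reordering and duplicates.
arrivals = [(1, b"b"), (0, b"a"), (0, b"a"), (3, b"d"), (2, b"c"), (1, b"b")]

buf = ReorderBuffer()
delivered: List[bytes] = []
for seq, payload in arrivals:
    delivered.extend(buf.receive(seq, payload))

assert delivered == [b"a", b"b", b"c", b"d"]
```

Even this toy version has to decide how much to buffer and what counts as a duplicate, which hints at why the general problem is hard to handle efficiently.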

  • Networks connect implementations of the same standards written by different people. This has nothing to do with malicious network exploits.

  • Interoperability (or the lack of it) means two communicating implementations won't behave exactly the same in all circumstances. This divergence of behavior causes or exposes bugs that would stay hidden if the implementation only ever received standards-compliant traffic.

Is a bug still a bug if it only appears for certain classes of input? Absolutely! A developer can't possibly predict what kinds of broken traffic their code will be presented with in real-world networks. Code that only accepts standards-compliant traffic would be too brittle to use in an open IP network and would crash under the slightest provocation.

  • Software has bugs and some network traffic packets will *start out* broken, at least in the eyes of the receiver. Whatever damage the network does before the packets arrive at the receiver will serve to make those packets worse, not better. Packets that start out broken will not be fixed by the network.

Even if a receiver can tell that a packet is wrong, sometimes there is enough good in it that the receiver can figure out what the sender meant. This is the basis for Postel's Law: "Be conservative in what you do and liberal in what you accept from others." In practice, that's really easy to say but very hard to do. Inferring what the sender meant when packets arrive over an actively malicious network is very hard. It's not surprising that programmers write code that isn't perfect.

TCP is connection-oriented and reliable because some applications need more reliability than IP provides (i.e., none at all). TCP exists to provide a reliable, ordered byte stream. UDP is connectionless like IP, and simply exists to provide a multiplexing layer above IP so that multiple UDP-based applications can share the same IP address by using different UDP ports. Again, UDP is stateless (connectionless). The UDP checksum (like the TCP checksum) does cover the payload as well as the header, but it is the same weak 16-bit ones'-complement sum that IP uses for its header, and it is optional for UDP over IPv4. The implication of UDP being stateless is that application developers have to implement their own customized reliability mechanisms. Unfortunately, it's not easy to figure out how TCP works and reimplement just the pieces that an application needs. Achieving reliability is hard, especially when code is going to be deployed in aggressively hostile environments.
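To give a feel for what a "customized reliability mechanism" means, here is a deliberately tiny stop-and-wait sketch run over a simulated lossy link rather than real UDP sockets. Everything in it (`LossyLink`, `Receiver`, `send_all`, the drop patterns) is a hypothetical illustration; a real implementation would also need timers, checksums, windowing, and protection against hostile peers.

```python
from typing import Iterable, List, Optional, Tuple


class LossyLink:
    """Simulated one-way link that drops the transmissions at the given indices."""

    def __init__(self, drop_indices: Iterable[int]) -> None:
        self.drop = set(drop_indices)
        self.sent = 0

    def deliver(self, item):
        index = self.sent
        self.sent += 1
        return None if index in self.drop else item


class Receiver:
    """Accepts datagrams in order and acknowledges each sequence number it has seen."""

    def __init__(self) -> None:
        self.expected = 0
        self.received: List[bytes] = []

    def on_datagram(self, seq: int, payload: bytes) -> Optional[int]:
        if seq == self.expected:
            self.received.append(payload)
            self.expected += 1
        # Re-acknowledge duplicates so a sender whose ack was lost can move on.
        return seq if seq < self.expected else None


def send_all(messages: List[bytes], data_link: LossyLink, ack_link: LossyLink,
             receiver: Receiver, max_tries: int = 5) -> int:
    """Send each message until it is acknowledged; return the number of transmissions."""
    transmissions = 0
    for seq, payload in enumerate(messages):
        for _ in range(max_tries):
            transmissions += 1
            arrived: Optional[Tuple[int, bytes]] = data_link.deliver((seq, payload))
            if arrived is None:
                continue  # data lost: the sender "times out" and retransmits
            ack = ack_link.deliver(receiver.on_datagram(*arrived))
            if ack == seq:
                break     # acknowledged, move on to the next message
        else:
            raise TimeoutError(f"gave up on sequence number {seq}")
    return transmissions


receiver = Receiver()
count = send_all([b"a", b"b", b"c"],
                 data_link=LossyLink({1}),  # the second data transmission is lost
                 ack_link=LossyLink({0}),   # the first acknowledgement is lost
                 receiver=receiver)

assert receiver.received == [b"a", b"b", b"c"]
assert count == 5  # three messages needed five transmissions
```

Even this minimal version has to handle lost data, lost acknowledgements, and duplicate delivery, which is a small slice of what TCP already does.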

But what do we do about this? Graduating from college was hard, but people do it all the time. The answer lies in the second part of the statement I quoted from the eWeek article:

.

That's the key, really: Automated processes. But the processes need teeth: The right tools.
In the final segment of this blog posting, we'll look at how Mu is able to integrate with the software development life cycle to provide testing solutions that embrace, rather than ignore, the complexity inherent in the behavior of network protocol implementations.
