10.05.2017 · Artur Andrzejak · Universität Heidelberg

Prevention Instead of Restarting: What Can Be Done Against Software Aging?

Software aging is the gradual performance degradation of running software processes due to resource or memory leaks, accumulation of rounding errors, or other types of corrupted process state. The underlying software defects typically manifest after a long incubation time and are frequently discovered only in production. Consequently, such problems can incur substantial follow-up costs, including decreased availability and performance of production systems, the need for workarounds (e.g., controlled restarts), and high debugging effort to localize the causes.

In this talk we first present the results of an empirical study of more than 400 software aging issues found in 11 Apache Software Foundation (Java) projects. We discuss several surprising findings, including the prevalence of seemingly easy-to-fix defects such as incorrect exception handling, and the fact that most defects are found by manual code review. Our findings lead to several tangible implications for more effective prevention and diagnosis of software aging.

In the second part we focus on the challenge of finding leak-related defects during the software testing processes widely deployed in industry today. The inherent problem is that software tests should run as fast as possible, which conflicts with the long incubation time of visible failures. To address this difficulty we propose an approach for automated leak detection that compares the memory allocation behavior of successive software versions under development. This approach comes in two flavors. To detect whether new memory leaks may have been introduced at all, we compare the overall memory usage of two software versions and apply statistical anomaly detection methods. In the second variant we compare the behavior of each code location that allocates heap memory. The latter variant can pinpoint the root causes of potential memory leaks, but it increases testing overhead and requires good code coverage by the existing test code.
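
The two flavors can be illustrated with a minimal sketch. It is not the method presented in the talk: the abstract does not name the statistics used, so a z-score on memory growth slopes is assumed here as a stand-in for the anomaly detection, and all function names, thresholds, and allocation-site labels are hypothetical.

```python
from statistics import mean, stdev


def growth_slope(samples):
    """Least-squares slope of memory usage (bytes) over the sample index."""
    n = len(samples)
    x_mean = (n - 1) / 2
    y_mean = mean(samples)
    num = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(samples))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


def is_anomalous(baseline_runs, candidate_run, z_threshold=3.0):
    """Flavor 1 (assumed statistic): flag the new version if its overall
    memory growth is far above the growth seen in runs of the old version."""
    slopes = [growth_slope(run) for run in baseline_runs]
    mu = mean(slopes)
    sigma = max(stdev(slopes), 1e-9)  # avoid division by zero
    z = (growth_slope(candidate_run) - mu) / sigma
    return z > z_threshold


def suspicious_sites(old_sites, new_sites, ratio=2.0, min_bytes=1024):
    """Flavor 2 (assumed rule): compare live bytes per allocation site.

    Both arguments map a site label to the bytes still allocated at the
    end of the same test run; sites that grew by more than `ratio` and
    hold at least `min_bytes` are reported.
    """
    flagged = []
    for site, new_bytes in new_sites.items():
        old_bytes = old_sites.get(site, 0)
        if new_bytes >= min_bytes and new_bytes > ratio * old_bytes:
            flagged.append(site)
    return sorted(flagged)


# Hypothetical measurements: three runs of the old version, one of the new.
baseline = [
    [100, 101, 100, 102, 101],
    [100, 100, 101, 101, 102],
    [100, 102, 101, 101, 100],
]
leaky = [100, 110, 120, 130, 140]
print(is_anomalous(baseline, leaky))

old = {"Cache.put:42": 1000, "Parser.read:10": 500}
new = {"Cache.put:42": 5000, "Parser.read:10": 520, "Pool.open:7": 2048}
print(suspicious_sites(old, new))
```

The first function only answers whether something changed, which keeps the overhead low. The second needs per-site data from a heap profiler and only covers sites the tests actually execute, which mirrors the trade-off described above.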