A Large Scale Study On the Effectiveness of Manual and Automatic Unit Test Generation

Beatriz Souza; Patrícia Machado

Beatriz Souza UFCG
Patrícia Machado UFCG

Abstract

Recently, an increasing amount of effort has been devoted to implementing tools that generate unit test suites automatically. Previous studies have investigated the effectiveness of these tools by comparing automatically generated test suites (ATSs) to manually written test suites (MTSs). Most of these studies report that ATSs can achieve higher code coverage, or even mutation coverage, than MTSs, particularly when suites are generated from defective code. However, these studies usually consider a limited number of classes or subject programs, and the adoption of such tools in industry is still low. This work aims to compare the effectiveness of ATSs and MTSs when applied as regression test suites. We conduct an empirical study using ten Java programs (1368 classes) that already have MTSs, and we apply two sophisticated tools that automatically generate test cases: Randoop and EvoSuite. To evaluate the effectiveness of the test suites, we use line coverage and mutation coverage. Our results indicate that MTSs are, in general, more effective than ATSs with respect to the investigated metrics. Moreover, the number of generated test cases may not indicate a test suite's effectiveness. Furthermore, there are situations in which ATSs are more effective, and even situations in which ATSs and MTSs can be complementary.

Keywords: mutation testing, automatic test generation, empirical studies