Twitter Diffy Spots Bugs in Services by Comparing the Responses

Twitter has open sourced Diffy, an automated testing tool used in production for discovering potential bugs in new code running on Apache Thrift and HTTP-based services.

Unlike unit testing tools, which run predefined sets of tests against individual pieces of code, Diffy behaves like a functional testing tool running against an entire web service. Diffy acts as a proxy, receiving HTTP requests and forwarding each of them both to a known-good service and to a potentially buggy one, the latter containing the new code under test. It then receives the responses from both services and compares them to discover any regressions in the code.
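The mirroring idea can be illustrated with a minimal Python sketch. This is not Diffy's code: the services are modeled as plain functions rather than network endpoints, and the names `mirror_request`, `stable_service`, and `candidate_service` are hypothetical, chosen only for illustration.

```python
from typing import Any, Callable, Dict

Request = Dict[str, Any]
Response = Dict[str, Any]
Service = Callable[[Request], Response]


def mirror_request(request: Request, known_good: Service, candidate: Service) -> Dict[str, Any]:
    """Send the same request to both services and record whether the responses match."""
    expected = known_good(request)
    actual = candidate(request)
    return {
        "request": request,
        "expected": expected,
        "actual": actual,
        "match": expected == actual,
    }


# Hypothetical stand-ins for two deployed versions of the same service.
def stable_service(request: Request) -> Response:
    return {"status": 200, "total": request["price"] * request["quantity"]}


def candidate_service(request: Request) -> Response:
    # Simulated regression: the new code forgets to multiply by the quantity.
    return {"status": 200, "total": request["price"]}


result = mirror_request({"price": 5, "quantity": 3}, stable_service, candidate_service)
print(result["match"])  # False: the candidate returns a different total
```

A plain equality check like this one would also flag responses that differ only in a timestamp or a random identifier.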

Responses from the two services are likely to show small inherent differences caused by timestamps, random value generators, or other sources of non-determinism. To eliminate the false positives such differences would generate, Diffy actually runs against three services: two running identical known-good code and a candidate running the new code. The responses from the first two instances should be identical except for non-deterministic noise (timestamps, random values, etc.), and any field that differs between them is ignored when comparing with the candidate’s response. Only the differences that remain after the noise is removed are reported as potential regressions.
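The noise-filtering step can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Diffy's implementation: responses are assumed to be JSON-like nested dictionaries, a single request is examined, and the helper names `flatten`, `differing_fields`, and `regressions` are hypothetical.

```python
from typing import Any, Dict, Set


def flatten(value: Any, prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts into {"a.b": leaf} paths so fields can be compared one by one."""
    if isinstance(value, dict) and value:
        flat: Dict[str, Any] = {}
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else str(key)
            flat.update(flatten(child, path))
        return flat
    return {prefix: value}


def differing_fields(left: Any, right: Any) -> Set[str]:
    """Return the paths whose values differ, or that exist on one side only."""
    a, b = flatten(left), flatten(right)
    missing = object()  # sentinel that never equals a real value
    return {p for p in a.keys() | b.keys() if a.get(p, missing) != b.get(p, missing)}


def regressions(primary: Any, secondary: Any, candidate: Any) -> Set[str]:
    """Fields that differ in the candidate but not between the two known-good instances."""
    noise = differing_fields(primary, secondary)
    return differing_fields(primary, candidate) - noise


# Two instances of the known-good code and one candidate answering the same request.
primary = {"user": {"id": 7, "name": "ada"}, "generated_at": "12:00:01", "request_id": "a1"}
secondary = {"user": {"id": 7, "name": "ada"}, "generated_at": "12:00:02", "request_id": "b2"}
candidate = {"user": {"id": 7, "name": "Ada"}, "generated_at": "12:00:03", "request_id": "c3"}

print(sorted(regressions(primary, secondary, candidate)))  # ['user.name']
```

Here `generated_at` and `request_id` differ between the two known-good instances, so they are treated as noise, and only `user.name` is reported. A single request is a weak sample, since a noisy field can coincidentally match between the two good instances, so a practical tool would presumably aggregate such observations over many requests.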