How to fix flaky tests

After working in the Test Automation industry for several years, I have learned that it’s easy to write hundreds of tests, but it’s often difficult to maintain those tests and keep them useful.
Flaky tests are one of the biggest hurdles in maintaining reliable automation frameworks. Often we hear testers complaining that tests fail on CI but pass locally, or that some tests fail intermittently. Such failures cause teams to lose interest in automation and testers to lose faith in their own tests.
So it’s really important to get to the root cause of those failing tests. Here are some steps that help me debug flaky tests:
The first thing to keep in mind is that when a test fails, there is definitely something wrong – flakiness is not magic. Trust your tests and start debugging!

It is easy to ignore failing tests

How many times have you found a failing test and just blindly rerun it until it passed? It’s important that as good test automators, we don’t ignore these random failures, but instead quarantine them and then systematically fix them. Similarly, how many times have you seen a failing test and automatically assumed the problem is in your test code?
Don’t forget to first verify that there isn’t a bug in the application causing the failure.
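Quarantining can be as simple as keeping a registry of known-flaky tests and excluding them from the main suite while they are investigated. A minimal Python sketch, with hypothetical test names:

```python
# Known-flaky tests under investigation; keeping them out of the main
# CI run stops them from eroding trust in the green/red signal.
# These names are made up for illustration.
QUARANTINED = {"test_checkout_flow"}

def should_run_in_main_suite(test_name):
    """Return True if the test belongs in the main (reliable) CI run."""
    return test_name not in QUARANTINED

assert should_run_in_main_suite("test_login")              # stable test runs
assert not should_run_in_main_suite("test_checkout_flow")  # quarantined
```

Most test runners offer a native way to do this, e.g. tags or markers that a CI job can exclude.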

Run Locally

One of the first things I do upon finding a test failing on my CI system is to run the failing test locally. If a test is failing on CI but not locally, that can indicate differences in how the tests are being run. I usually start by checking that the build machine has the same environment configuration as my local machine (e.g. browser, device, OS).
If there is no difference in environment, then I look for network or timing issues. Sometimes a CI box takes longer to perform certain actions than the local machine; in that case, custom wait methods can be helpful.
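A custom wait usually means polling for a condition instead of sleeping a fixed amount, so the same test tolerates a slow CI box and a fast local machine. A minimal, framework-agnostic sketch in Python; the `condition` callable stands in for whatever check your test needs:

```python
import time

def wait_until(condition, timeout=10.0, interval=0.5):
    """Poll `condition` until it returns a truthy value or `timeout`
    seconds elapse. Replaces fixed sleeps, which behave differently
    on a slow CI box than on a fast local machine."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = condition()
        if result:
            return result
        time.sleep(interval)
    raise TimeoutError("condition not met within %.1fs" % timeout)
```

For example, `wait_until(lambda: element.is_displayed(), timeout=15)` with a Selenium-style element: a generous timeout for CI, and a short polling interval so the test is not slowed down locally.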

Isolate the failing tests

Group the failing tests and try to find a common theme. Are the tests failing on a particular browser, device, or piece of functionality? Often I find that there is a common theme between failing tests. As an iOS automation engineer, I often find tests failing either on iOS 8.1 or 7.1, or failing only on iPads or only on iPhones irrespective of the operating system. This gives me a good base to start debugging.
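Grouping failures by attribute is easy to script. A small sketch, assuming each failure record carries the environment it ran on (the records here are made up):

```python
from collections import Counter

# Hypothetical failure records exported from a CI run.
failures = [
    {"test": "test_login",  "device": "iPad",   "os": "8.1"},
    {"test": "test_search", "device": "iPad",   "os": "7.1"},
    {"test": "test_cart",   "device": "iPhone", "os": "8.1"},
]

# Count failures per device and per OS to surface a common theme.
by_device = Counter(f["device"] for f in failures)
by_os = Counter(f["os"] for f in failures)

print(by_device.most_common(1))  # → [('iPad', 2)]
```

Even this trivial tally can reveal that a "random" failure is actually concentrated on one device or OS version, which narrows the search considerably.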

Understand the functionality

When tests fail randomly, look for a pattern in the failures. Often I have found that tests fail around a certain functionality. For example, I once noticed that tests were failing most often around one particular piece of functionality in my app. I had a chat with a developer and told them that this area of the app did not look very stable. After some investigation we found that there were issues in the app around that functionality which were causing the failures. The developer fixed the issues and the tests became stable.
So, next time your tests are flaky, do check whether they are flaky around a particular piece of functionality.

Run tests in combination

Sometimes a test passes when run individually but fails when run in combination with other tests. If that is the case, I check which test(s) ran previously and run them together. Sometimes a previous test puts the device or browser into a state from which the next test cannot continue, and if you are not resetting the device or browser between tests, that can result in failures.
For example, I had some tests which used stubs, and I was adding a stub at the beginning of each test and deleting it at the end. I was not doing this before and after every single test, because not all scenarios used stubs. Initially the tests were fine, but after some days they started failing, and some other tests also started failing which were otherwise stable and ran successfully when run individually.
After some debugging I found that whenever a test with a stub failed before reaching its end, the stub was never removed, and all following tests ran with the stubbed data instead of real data.
The solution we followed was to separate out the stubbed scenarios and add/delete stubs before/after each scenario, so we never end up in this hanging state.
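That fix can be sketched with a context manager that guarantees cleanup even when a scenario fails partway through. `StubServer` here is a hypothetical stand-in for whatever stubbing tool you use:

```python
import contextlib

class StubServer:
    """Hypothetical stub server, standing in for a real stubbing tool."""
    def __init__(self):
        self.active_stubs = []
    def add_stub(self, name):
        self.active_stubs.append(name)
    def remove_all(self):
        self.active_stubs.clear()

@contextlib.contextmanager
def stubbed(server, name):
    """Add a stub before the scenario and always remove it afterwards,
    even if the scenario fails midway through."""
    server.add_stub(name)
    try:
        yield server
    finally:
        server.remove_all()  # runs on success *and* on failure

server = StubServer()
try:
    with stubbed(server, "payment_api"):
        raise RuntimeError("scenario failed midway")
except RuntimeError:
    pass
assert server.active_stubs == []  # the stub was still cleaned up
```

Most test frameworks offer an equivalent setup/teardown hook; the important part is that cleanup runs on the failure path too, so one broken test cannot leak stubbed data into the next.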

It’s not always tests

Recently I found a test which was crashing the app only on iOS 8.2, while the same tests were passing on iOS 7.1. It was quite easy to debug this problem: after looking at the stack trace, I did not find anything that could have been caused by the tests, so I paired up with a developer, who figured out that the crash happened because the API we were using did not work on iOS 8.2.
So next time you see a flaky test, do not just jump into fixing the tests; they could simply be doing their job of telling you that something is wrong in your app.

Do not get overwhelmed by failing tests

It is quite easy to get frustrated while trying to find the root cause of these randomly failing tests. Often it is not straightforward, and you may not find anything even after spending a good couple of hours. This is part of the joy of being a tester.
It can help to think of a strategy. If different tests are failing on different devices, then stick to one device first, fix all the tests, and then move to the next one.
Pair up with another tester or developer. I always find solutions quickly when I pair with someone, and it’s always good to have another pair of eyes.
Set time limits. For example: if you do not find the issue within a certain time limit, seek someone’s help or put things on hold for a while, freshen up, and then start again. The key thing is not to get overwhelmed and give up. You wrote the tests in the first place, so you can definitely fix them!
Thanks for reading.
~ Preet
