UI Automation Out of Control
When many people think of test automation they envision rudimentary scripts with hard-coded events and data that manipulate user interface objects much the same way a customer might interact with the software to accomplish a pre-defined, robot-like task. Perhaps this is the reason there is a plethora of tools available to business analysts or super-users hired as ‘black-box’ testers to help them record and playback (or list keywords to sequentially step through) some contrived set of steps they think a customer might perform. Sure…it’s cool to watch windows open and close, and the mouse cursor move across the desktop as if by magic. But, for anyone with half-a-brain the visual amazement lasts for for oh….about 1.7 seconds….after that it is mind numbingly boring! Unfortunately, this automation is usually short lived, requires tremendous overhead in terms of maintenance costs, and contributes to the exceedingly high percentage of failed or less than successful automation projects.
I will say that in general I am not a big fan of GUI automation for a litany of reasons, but mostly because it is misused, or applied to tests that are simply more effectively and efficiently performed by a person. However, I also understand the GUI automation does provide value when used in the right context. I am not against GUI automation, but I certainly don’t advocate automating a test just because we think we can, or because we have some bizarre idealistic belief that everything should be automated.
For example, in one situation I spoke with a tester whose manager wanted him to maintain a legacy test designed to detect the correct color of an arrow symbol after an action was performed. If the action completed correctly the arrow was green; and if it was unsuccessful the arrow appeared red. Now, besides the fact that we could have just as easily automated a test to check the HRESULT value, this test could have been executed by a user within a reasonable time, there was little probability of change in this area of the code, and there were no dependencies. However, the manager insisted this GUI test run despite this test which used image comparison as an oracle was notoriously problematic. (This shouldn’t be surprising since many image comparison oracles are notoriously problematic and throw an inordinate number of false positives.)
The tester said the manager claimed by automating this test it would negate a tester from having to execute the test manually thus saving time. What??? This tester was spending hours per week chasing down false positives and tweaking the automation to “make it work” on the daily builds just to make his manager happy. So, although this feature was used repeatedly by hundreds of people dog-fooding the daily build, another few thousand people around the company self-hosting internal releases, and thousands of customers using beta releases this particular manager determined continued tweaking of this test would save some tester’s time!
In another example a tester inquired how to automate a a test to determine IF the order of the slides in a power point presentation had changed between different copies of a .ppt file. Of course, the question was followed by a flurry of responses suggesting creating a base set of images of each slide in the deck, and then using an image comparison tool to identify changes. I responded a bit differently. First, there are several ways to programmatically detect file changes, and if we detect changes in the binary properties we can easily open the Power Point presentation in slide sorter view and take a few seconds (depending on the number of slides) and visually compare it against an original. Sure it is a bit slower than an automated test, but I really suspect it would be more effective and probably even more efficient in the long run. I also wondered how many times this “test” would actually need to be ran during whatever project this person was working on (it wasn’t PowerPoint) in comparison to the hours/days it would take to develop such a test, and the ensuing maintenance nightmare.
These are just 2 examples of the misuse of automated UI testing that I think illustrate a few important points:
- Not all automated UI tests save time!
Tests that require constant massaging and tweaking because they constantly throw false positives take up a huge amount of a tester’s time in wasted maintenance. - Sometimes a human is a more efficient oracle than a computer algorithm!
Sure, just about anything a computer does can be automated to some degree in some fashion, but there really are clearly some tests where it is more prudent and simpler to rely on a tester. - Don’t rely on automation to emulate your customers!
Test automation does not effectively emulate a human user. Sure, we have test methods in some of our internal automation frameworks to slow down simulated keystrokes (the actual keys are not being pressed on the keyboard), or simulate multiple or repeated clicks on a control or the mouse, and other tricks that try to emulate various user behaviors; however, test automation is generally poor at detecting behavioral issues such as usability, ease of use, or other customer value type assessments. Rely on the feedback from internal and external customers who are dog-fooding, self-hosting, and beta-testing your product (and act on it). - Go under the covers!
I think many testers rely too heavily on UI automation because they think it emulates user behavior (although most things such as populating a text box are simulated via Windows APIs), or perhaps because they don’t know how to dig into the product below the surface of the UI. Whatever the case, think about the specific purpose of the test. If it is easier to check a return value, or call an API to change a setting then go deep…and stop messing around on the surface. (It only complicates the test, wastes valuable machine cycles, reduces reuse across multiple versions, and often leads to long term maintenance costs. (For an example of this see my previous post.) - Constantly massaging code contributes to false negatives!
I have seen many cases where a tester designs a a UI automated test, and then tweaks a bit here and there to get it to run. Often times this tweaking contributes to a tests ineffectiveness in exposing problems, and may even hide other problems. Also, some tweaks are geared around synchronization issues (sync’ing the automated test with the system under test) and involve artificially slowing down the automation (usually by stopping or ‘sleeping’ the automated test process for a specific period of time). Other tweaks might hard-code parameters that then make the test fail on a different resolution or non-portable across different environments. - STOP trying to automate every damn test!
As I stated before…just because we can automate something doesn’t mean that we should try to automate everything! We need to make rational decisions about what tests to automate, and what is the best approach to automating that test.
It is easy to be lured in by the siren call of UI automation. I write automated tests to free up my time to design and develop more and different tests, and so I don’t have to sit in front of the computer executing redundant tests, or constantly massage code to make it run. Automation is a great tool in the arsenal of competent professionals who understand its capabilities and know how to exploit its potential. But, it is one of many tools in our toolbox; and the best tool is the one sitting on our shoulders. Use it!
-
WesleyB7712 Aug 2009 1:36 PM#
I couldn’t agree with you more.
We had tests in a previous team of mine that tested parts of MSTIME (Microsoft Timed Interactive Multimedia Extensions) by comparing a known good screenshot to a screenshot taken using the binary being tested. I arrived a few years after the code was written, and the nightmare was not over. There were a lot of problems. Due to the sheer magnitude of environments and platforms and machines being tested, there were always a high number of false positives.
Sometimes certain video cards on the machine would colour a pixel differently (FAIL), or a frame would be skipped (FAIL), or it would render slowly (FAIL), or ClearType was enabled on the machine (FAIL), or (FAIL) or (FAIL) or (FAIL) or (FAIL).
It was very frustrating. At the very least though, I learned the hard way that automating everything does not solve all your problems.
-
I.M.Testy13 Aug 2009 1:52 AM#
Thanks Wesley. These are some great examples of things that could cause problems with visual comparison in automation.
There is some interesting work being done with visual comparison such as histograms, color and pixel tolerance maps, etc.
This is all interesting work, but the simple fact is that sometimes it is easier, more reliable, and cheaper to validate UI visuals via usability testing, end-to-end or user scenario testing, exploratory testing, dog-fooding, self-hosting, beta feedback, etc.
Many of us have been down that hard way path...but sometimes the school of hard knocks provides the best education.