Paul is passionate about quality and testing. He plans and implements automation program architecture for platforms, frameworks, and tools. His techniques are widely used in industries such as telecom, stock trading, e-commerce, and healthcare.
In the United States, the IRS is the Internal Revenue
Service; these are the tax people. I’m oversimplifying the process here, but once
per year, we submit tax documentation related to our income. Based on the
amount of money we made, and the taxes already paid, we either owe additional
taxes or we are due a refund. If, however, your income tax documentation looks
“suspicious”, you’ll be flagged for an audit: a formal, structured
examination of your tax records that can go back several years; I hear they can
be rather unpleasant. But I’m heading off topic.
In a previous blog post, Don’t Eat Stale Automation, I wrote about performing audits on our automation so that we can get rid of automated scripts that are no longer providing value; if the scripts are not providing value, they’re likely costing us too much to execute and maintain. In this post, I write about how automation audits help with other maintenance activities as well.
As I’ve been known to say in some of my talks, we don’t
release the exact same software twice, meaning we’ve changed one or more things
between Release N and Release N+1. Most often, changes are fixes and additions to existing code, but they may also be the removal of capabilities.
So, how does that affect our automation?
We have several questions to ask when our application’s
code is changed:
Should we add any automation to help us test
this changed code? If applying some technology to help us do our jobs is
valuable, then yes, yes, we should.
Are all our automation scripts still valid? If
not, what should we do about it? Should we fix the scripts that are no longer
valid, or should we remove them?
Did we expect the changed code to break any of
our existing automation? If so, did those scripts, in fact, break? If not,
perhaps we have a flawed understanding of what our automation is doing for us.
We cannot answer these questions unless we know and
understand what our automation is doing for us. Automation audits help us
obtain this information in two ways:
Script reviews when the automation is first
created. These reviews can range from structured code reviews to informal
script walkthroughs. The intent of these reviews is to ensure, as best we can,
that we understand what the scripts are doing for us.
Audits of our scripts at appropriate times.
“Appropriate times” is a context-dependent term, but it will usually be associated
with times of change: change of the product, change of the scripts, change of
the automation tool version, change of supporting infrastructure (e.g.
operating system or database versions).
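As one illustration of keeping this audit knowledge current, a team could track review metadata per script and flag the ones overdue for an audit. The sketch below is hypothetical Python; the field names, scripts, and the 180-day threshold are assumptions, not from any particular tool.

```python
from datetime import date, timedelta

# Hypothetical audit records for automated scripts; the fields and
# names here are illustrative.
SCRIPTS = [
    {"name": "login_smoke", "last_reviewed": date(2019, 1, 10), "owner": "paul"},
    {"name": "checkout_regression", "last_reviewed": date(2018, 3, 2), "owner": "dana"},
]

def scripts_due_for_audit(scripts, as_of, max_age_days=180):
    """Return the names of scripts whose last review is older than max_age_days."""
    cutoff = as_of - timedelta(days=max_age_days)
    return [s["name"] for s in scripts if s["last_reviewed"] < cutoff]

print(scripts_due_for_audit(SCRIPTS, as_of=date(2019, 2, 1)))
# checkout_regression was last reviewed almost a year earlier
```

A report like this is only a prompt, of course; the audit itself is still a human activity.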
We must obtain and maintain this knowledge for our
automation to continue being valuable. If this knowledge is not current, we
cannot trust that the automation is performing the expected tasks for us; this lack
of trust will cause us not to use the automation and we’ll receive no value
from the effort used to create it. That’s bad business.
We should also audit the logs, results, and error
messages that are produced by our automation. Though those topics could have been included here, I think they are important enough to warrant their own blog
post. Stay tuned!
I was a speaker at
the 2019 Automation Guild Conference. As part of the conference, I participated in a live Q&A session,
but we ran out of time before I answered all the questions. I decided to blog
the answers to some of those questions.
I have seen the term ‘test coverage’ used too many times. Is investing in automation just because I can run 500 test cases instead of 100 worth it? Or, is it better to look at what current changes are and only plan to run that automation?
I, like many others of my generation, grew up watching
Sesame Street; it might even have been on twice a day, but I’m not sure about
that. My favorite character was Mr. Snuffleupagus, but my second favorite was The
Count. Maybe it was the accent, or maybe it was the counting. I loved the
counting: one test script, ah, ah, ah… two test scripts, ah, ah, ah… You get my point.
But could we have watched too much Sesame Street? We
have developed a fascination with counting things. Sometimes that’s good;
knowing the quantity of something can certainly be valuable. That said,
sometimes counting things is not beneficial. Generally, test case or test
script counting is not beneficial.
The first of the two Automation Guild questions above makes me think of counting: “is investing in automation just because I can run 500 test cases instead of 100 worth it?” As usual, the answer depends on your context, your situation. Deciding to automate is a business decision because there’s an opportunity cost there: if you’re spending time automating, you’re not spending time doing some other activity. The value that the automation provides must be greater than the effort to create and maintain the automation; if there’s insufficient value, then don’t automate. We need to recoup that opportunity cost in some way. Also, remember that automation is in support of testing activities. If creating, running, and maintaining 500 test scripts is not helping your testing effort, then don’t do it, even if it’s possible to do it.
The second question is related to the first but requires a different answer, that question being: “is it better to look at what current changes are and only plan to run that automation?” There’s
a tendency today to think “we have all this automation, so we may as well run
everything on each deployment”. We’ll certainly get more coverage more often,
but this additional coverage comes with a cost.
The more scripts we execute on each deployment, the
longer it takes to get feedback about the quality of the deployment. We can
somewhat mitigate this by first running the scripts that are “likely to
exercise the changed product code” and then running the remaining scripts. Note,
however, there’s an opportunity cost associated with running the remaining
scripts; someone must review the results of the executions. For some
organizations, the opportunity cost is sufficiently low, or the value is
sufficiently high, to make running all their scripts on every deployment a
valuable endeavor; for other (dare I say “most”) organizations, the opportunity
cost will likely outweigh the value.
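A minimal sketch of that ordering idea, assuming the team maintains a mapping from each script to the product areas it exercises (the script names and areas below are illustrative):

```python
# Hypothetical mapping of automated scripts to the product areas
# they exercise; a real team would maintain this deliberately.
SCRIPT_AREAS = {
    "login_smoke": {"auth"},
    "checkout_regression": {"cart", "payment"},
    "search_suite": {"search"},
}

def prioritize(scripts_to_areas, changed_areas):
    """Order scripts so those likely to exercise the changed code run first."""
    first = [s for s, areas in scripts_to_areas.items() if areas & changed_areas]
    rest = [s for s in scripts_to_areas if s not in first]
    return first + rest

# A deployment that changed payment code gets checkout scripts first:
print(prioritize(SCRIPT_AREAS, {"payment"}))
```

The feedback on the changed code arrives early; the remaining scripts can run afterward if their opportunity cost is justified.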
Getting back to The Count, counting things is valuable in
some cases. From a testing standpoint, the question of “how many automated test
scripts do we have?” is generally unhelpful and may, in fact, detract from
important testing considerations. From a standpoint of execution duration or code
hygiene, however, the number of automated scripts can be interesting because
that number may affect business decisions pertaining to owning and executing
automation. So, let’s keep counting, but let’s be responsible about it; it’s what
The Count would want.
I was a speaker at
the 2019 Automation Guild Conference. As part of the conference, I participated in a live Q&A session,
but we ran out of time before I answered all the questions. I decided to blog
the answers to some of those questions.
When we talk about BDD, the Gherkin itself is a layer. So, in that case, what could be an ideal pattern for layers? I guess too many layers will make it tough to maintain.
As readers may know, BDD is a hot-button topic for me. Though I wrote about it at length in this article, in the spirit of TL;DR, my stance is that BDD is a framework to attain shared understanding about our user stories or work items; the automation piece is merely a convenience. That’s not to say, however, that the convenience isn’t valuable; it’s just not the primary value.
For the sake of this specific question, I’ll just address
the implementation of “the convenience”, better known as “the automation”.
Much like the question
addressed in my previous blog post, the question posed above has two parts.
The first part is “what could be an ideal pattern for layers” when using BDD
tools. Generally, there is no such thing as “ideal”, only the “most appropriate
for my situation”. That said, I’ve used the layers below in more than one implementation.
For those not familiar with it, SpecFlow is basically the .NET equivalent to Cucumber. To map the layers above to the layers from my Automation Guild session:
SpecFlow Feature Files, Gherkin = Scripts
SpecFlow Step Files = Middle layer
Page Objects = Lower Layer
Notice that in the mapping above, the SpecFlow files are the implementation of the script and middle layers; those files don’t reside on top of those layers.
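Since SpecFlow is a .NET tool, here is the same three-layer shape sketched in plain Python instead; the class, method, and step names are illustrative, and a real lower layer would drive a browser rather than set a field:

```python
class LoginPage:
    """Lower layer: a page object; a real one would drive the browser."""
    def __init__(self):
        self.logged_in_user = None

    def log_in(self, user, password):
        # Simulated here; imagine find-element/click/type calls instead.
        self.logged_in_user = user

def step_user_logs_in(page, user):
    """Middle layer: the step implementation that the script layer's
    Gherkin text 'Given the user <user> logs in' would bind to."""
    page.log_in(user, password="not-a-real-secret")  # hypothetical credential
    return page.logged_in_user == user

# Script layer: the scenario exercises the step, which uses the page object.
page = LoginPage()
assert step_user_logs_in(page, "alice")
```

The Gherkin scenario and the step file together play the roles of the script and middle layers; the page object remains the lower layer, just as in the mapping above.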
This brings us to the second part of the previously stated question, “too many layers
will make it tough to maintain”. Yes and no; we’ll take the “no” part first. As
mentioned in the previous paragraph, my implementation replaces other layers’
implementations with SpecFlow, therefore, no additional layers are added, so we
should not have “too many” layers. If, however, we decide to implement on top
of the rest of our layers, we might
have “too many” layers; this is the “yes” part of the previous scenario.
To reduce the impact of “too many” layers, each layer we
add to our automation stack must
provide some sort of value. If there is value to implementing SpecFlow on top
of your existing layers, then, do so! To address the “tough to maintain” part
we need to understand that this frequently has to do with the challenge, hence the cost, of triaging automation script failures. This part of the maintenance
cost comes from the difficulty of diagnosing a failure when we can’t get enough
detail on where and why an error occurred. The general cure for this difficulty
is adding appropriate logging at each layer then allowing that logging to
trickle up to the user so that they have a good idea of what the failure meant
at each layer. For example:
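A hypothetical sketch in Python of that trickle-up, using exception chaining so each layer adds a message meaningful at that layer without obscuring the one below it (all names are illustrative):

```python
class ElementNotFound(Exception):
    pass

class StepFailed(Exception):
    pass

def click_submit():
    # Lower layer (page object): the raw, technical failure.
    raise ElementNotFound("locator '#submit' not found on LoginPage")

def step_submit_credentials():
    # Middle layer (step): wraps the error with what it means at this
    # layer, without discarding the original.
    try:
        click_submit()
    except ElementNotFound as exc:
        raise StepFailed("could not submit the login credentials") from exc

# Script layer: reports the failure at every level.
try:
    step_submit_credentials()
except StepFailed as exc:
    print(f"Scenario 'User logs in' failed: {exc}")
    print(f"  caused by: {exc.__cause__}")
```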
Notice that at each layer a message was added, and that
message was representative of what the error meant at that layer. No errors or messages were obfuscated or eliminated.
Having this level of transparency is essential to reducing the maintenance cost
of triaging failures.
I hope this answer was not too vague. If you did find it so, my apologies, but you’re in luck. My talk at the Spring 2019 STPCon is about creating an automation stack. You can find the details here and you can contact me for a discount code! You can also check out a bit of an audio preview at that link.
I was a speaker at the 2019 Automation Guild Conference. As part of the conference, I participated in a live Q&A session, but we ran out of time before I answered all the questions. I decided to blog the answers to some of those questions.
For an organization that is in the beginning stage of the automation journey, automation seems to be test team focused. How do we bring the systems team in sync with the automation exercise so that requirement changes don’t lead to too much maintenance overhead (which leads to trust issues)?
There are two topics to unpack in this question. The first topic is the statement “automation seems to be test team focused”. In general, I don’t find this to be a problem. When starting a new endeavor, it can be strategic to limit the scope of some activities in that endeavor. In this case, limiting the scope to “applying technology to help the test team” seems appropriate to me.
The second topic is the phrase “bring the systems team in sync with the automation exercise”; for purposes of this post, I’ll assume “systems team” means something like “IT”, “Ops”, “computing infrastructure”, or the like. In general, corporate business goals set the direction and all the team members should be working in support of achieving those goals. Developers should be creating software that is in line with these goals; likewise, testers should gear their testing and automation toward achieving those goals. The systems team should also be aligned with achieving the business goals; that team is, of course, part of the company too.
The complication is that systems teams’ goals are not always directly aligned with corporate goals. Often, their incentives, reviews, and bonuses are tied to things like maximizing system uptime, reducing cost, and infrastructure standardization. These are not bad things to work toward but they are often at odds with delivering new software; the incentive for systems teams often revolve around minimizing change, but software delivery requires change by its very nature.
So, what to do? Talk. Talk to the systems team, let them know how you are attempting to achieve corporate business goals. Partner with them; work together to achieve those goals. Though I’m greatly oversimplifying the creation of the partnership, I’ve found it to be very effective in getting goals accomplished. This post may be of interest as well.
In late 2016, I wrote this blog post describing how I define automation for testing. For the TL;DR crowd, here’s the definition I typically use for automation:
automation is the judicious application of technology to help humans do their jobs.
I still use this definition today. When I talk to clients, this definition gives me additional latitude in how I help them by introducing automation as a force-multiplier for testers as opposed to strictly a replacement for tester tasks.
Why am I bringing this up again and why now?
Over the last couple of months, I’ve seen social media posts and been involved in chats about automation not being feasible, valuable, appropriate, etc. early in the development process. Though I’ve heard, and probably said, similar things over the years, I now think I have my brain sorted enough to write about this topic. Based on the above definition of automation, I suggest a different mindset regarding “too early”.
First, here are the thoughts I have regarding some common refrains I hear regarding early automation:
It’s too early to automate. If we apply the above automation definition, this is not true. It may be too early to build traditional automation based on test cases like smoke or regression testing scripts, but we may be able to write some software to help testers during this time of churn. Data creation and log collation scripts are typical candidates for early automation, as are scripts that “turn the crank” for testers; these kinds of automation free testers from repetitive drudgery so they can spend more time on knowledge work. The article here may offer some ideas.
The GUI is not stable yet. To this, I say: so what? Not all automation needs to be driven via the GUI. In particular, I like to ask the question, “are you automating the GUI or are you automating through the GUI?” If the latter, automating “behind the GUI”, i.e. API or service-based automation, can provide value while not having that value eroded by GUI churn.
That interface isn’t finished yet. Again, I say, so? Automation is not an all-or-nothing venture; if we can obtain value by having technology help us do part of our jobs in a way that provides value, we would be remiss in our duties if we didn’t explore such opportunities.
When discussing a specific task, some questions I like to ask regarding “should we automate this task now” are:
How many times will we perform this task?
What value will we gain from automating this task?
When could we start using the automation?
When do we expect the automation to be available? Note that we can answer this question even if we only know when part of the automation will be available; this subset may still be useful enough to provide value.
Do we expect the automation of this task to be long-lived or disposable?
Will this automation bring more value than cost? If so, why do we think that?
Of course, the answers to the questions above will vary between teams, applications, and the specific tasks themselves. Though we should not automate every task and not all tasks are candidates for early automation, we should seriously consider automating any task where that automation would provide appropriate value…regardless of when in the application’s lifecycle we start that automation.
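The value-versus-cost question above can be sketched as simple arithmetic; every number below is an assumption to replace with your own context:

```python
def worth_automating(runs_expected, manual_minutes_per_run,
                     build_minutes, maintain_minutes_total):
    """True if the time saved exceeds the time spent building and maintaining."""
    saved = runs_expected * manual_minutes_per_run
    spent = build_minutes + maintain_minutes_total
    return saved > spent

# 200 expected runs at 5 manual minutes each, versus 8 hours to build
# and roughly 4 hours of maintenance over the automation's life:
print(worth_automating(200, 5, 8 * 60, 4 * 60))  # 1000 minutes saved > 720 spent
```

Real value is rarely only time saved, and real cost is rarely only time spent, but even a back-of-the-envelope version of this calculation forces the business conversation.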
While writing my previous blog post on automation execution environments, I had these thoughts bouncing around in my head as well. Additionally, I’m cleaning out and rearranging my home office, which reminded me of all the times my mom told me to clean my room. It’s a weird place, my head. Anyway…
What’s running on your desktop right now? Here is a
subset of the things running on my desktop: Outlook, Chrome, Firefox, Keep,
OneNote, Visual Studio, Bash, Skype, and Teams; again, that’s just a subset and
I haven’t even listed any processes running in the background. Additionally,
I’m not always running all these applications. Though not everyone is running
the same applications as everyone else, very few of us close all other programs when we run automation on our desktops.
At some point, we will receive a new deployment of our
application and we need to execute some of our automation against that new
deployment. Since more people than just us are interested in the results of
that execution, we also need to have an accessible record of the results. I
call this execution a run-of-record for that deployment, i.e. the run that matters. It’s the one that lets us know
“hey, we think we have a good deploy here” or “hey, we think something is amiss
with this deployment”.
Often, especially when we first start our automation endeavor, we make our runs-of-record on our desktops. There are challenges with this approach that may be inconsistent with the intent of our automation run. As part of an automation run, we usually set our app to a known, steady state: we set up processes, data, user credentials, etc. so that we can execute specific scenarios. We tend to execute the scripts against the same sets of data and application configurations, so we approximate the same execution steps on each script execution. I liken this “known, steady state” to a manufacturing cleanroom, a room designed such that it does not interfere with the manufacturing process. Cleanrooms are intended to reduce the unpredictability that can be introduced by unwanted particles and airflows.
If we take such care when configuring our application,
why shouldn’t we take similar care when setting up our execution environment?
For most of the software we test, a user device implicitly becomes part of our
application’s “execution ecosystem”; it becomes part of the application because
some portion of the application’s code runs on the hardware owned by the user.
It’s the same for automation execution; the environment on which we execute our
tests implicitly becomes part of the application. The extent of “becomes part
of” varies from case to case; for example, for browser-based automation, the
browser and the device are quite intertwined with our application software,
whereas if we are testing via APIs the amount that the execution environment is
part of the application is typically far less.
Since our execution environment becomes part of our application, we want to be similarly deliberate in how we configure this environment as we were when we configured the application itself. It’s often difficult to exercise that level of configuration when we use our desktops for our runs-of-record. If we use dedicated execution hardware for these runs, we have more control over that environment. We have control of the environment’s configuration, including the specifications of the processor, disk, and memory; we also have control over what software is installed and running in that environment.
Now, we can set up our execution environment to be our
version of a cleanroom. We can know exactly what is installed and running and
ensure that we only run the minimum OS background tasks; we can also ensure no
other programs are running that might cause fluctuation in our application
software’s behavior. We can now have some level of confidence that the part of
our application that runs on the user’s device will run consistently and,
therefore, our automation will execute consistently.
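One way to gain confidence that a cleanroom run was actually clean is to record what the environment was alongside each run-of-record; a minimal Python sketch, with illustrative fields:

```python
import platform
import socket
from datetime import datetime, timezone

def environment_record():
    """Capture what the execution environment looked like for this run."""
    return {
        "hostname": socket.gethostname(),
        "os": platform.platform(),
        "python": platform.python_version(),
        "captured_at": datetime.now(timezone.utc).isoformat(),
    }

# Stored with the run's results, this lets us compare environments
# between runs when investigating inconsistent behavior.
record = environment_record()
print(sorted(record))
```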
Does this reflect the real world? Not really. Our users
are more like us than a cleanroom; they have multiple applications running,
multiple browser tabs, streaming music, etc. They have what we might call a dirty room, i.e. a room whose
cleanliness we can neither know nor control.
Running our automation in dirty room environments exposes our
application execution to these real-world situations. The downside, however, is
this reduces our repeatability.
How do we decide whether to execute in a cleanroom or a
dirty room? Like so much in software development, it depends on what we are
trying to do.
Are we trying to determine if a deployed version of the
application is egregiously broken? If so, a cleanroom
may be a better approach. Executing in a cleanroom environment helps to reduce
the number of variables on a per execution basis; the more predictable our
execution environment, the less likely that an issue is due to a fluctuation
caused by something “dirty” in that environment.
On the other hand, are we trying to chase out problems
and expose risks? In this case, a dirty room may be a better approach. The more
that our execution environment is like a user environment, the more likely we
are to find issues that pertain to a user’s environment. Dirty room testing is
valuable because it better represents actual user situations.
Do we have to choose between cleanrooms and dirty rooms?
Perhaps, but that is a business decision. If it’s sufficiently valuable to use multiple
execution environments, then we should consider doing so. Perhaps our cleanroom
environment is for traditional automated smoke scripts, giving us a quick,
stable automation run to test the minimum viability of a new deployment. Once
we are ready for more in-depth testing, perhaps that’s the time to run our
scripts, the same ones as before or others created expressly for this purpose,
in our dirty room.
Could it be that mom wasn’t always right? I’m sure not telling her that.
You’re the person responsible for your team’s initial automation suite. Using your desktop development computer, you created all the scripts, tested them extensively, and now they are ready to be used. They are based on your existing smoke test suite, so you plan to run them each time there is a deploy; the test team won’t need to perform that activity anymore. The automation is a rousing success, saving multiple hours of tester time every week! You plan a well-earned vacation; you’ll only be gone for a week.
Who’s going to run the automation in your absence?
Your infrastructure for automatic script execution isn’t ready yet, so a lucky team member is conscripted into executing the automation on your behalf. As part of your due diligence, you help your team member get the automation downloaded or installed on their desktop development machine. Now, you have them run it just to, you know, be sure they can run it. They can’t. The scripts fail to execute. You work with your team member to sort out the issues:
They need a specific directory in a specific location on their local disk; they create the directory.
You have a file path that’s hardcoded to your user directory; you make a code modification and redeploy the automation.
You make a call to an external program that’s not on their desktop; you help them get it installed appropriately.
Finally, the scripts run, but they don’t pass; perhaps there is an issue in the application. To assess the situation, you run the scripts on your machine. They pass as expected. After further investigation, you discover that your team member’s machine is older and less powerful than yours; it runs the automation slower, causing the application to time out while waiting for responses.
Oh crap! You’re supposed to leave for vacation!
While this specific story didn’t happen, I’ve seen parts of it happen and heard of other parts of it happening. The situation here is very much akin to “works on my machine”. The original author developed the automation to run on their machine without consideration of others needing to run the automation; in this case, “others” might be other team members or some automated system such as a continuous delivery pipeline activity. In other words, the automation was not developed for portability.
Portability has two main facets:
Execution environment: the story above mostly shows this facet. Automation must not be bound to a single computer.
System under test: most of us have more than one environment against which we need to test, meaning we usually have more than one environment against which to execute our automation. In most cases, automation must not be bound to a single test environment.
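For the system-under-test facet, a minimal sketch of pulling the target environment from configuration rather than baking it into the code; the environment-variable name and URLs below are assumptions:

```python
import os

def system_under_test():
    """Resolve the target environment from configuration, with a default."""
    return os.environ.get("SUT_BASE_URL", "https://qa.example.com")

# The same automation can now point at QA, staging, or any other
# environment by setting SUT_BASE_URL, with no code changes.
base_url = system_under_test()
login_endpoint = f"{base_url}/api/login"
print(login_endpoint)
```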
Here are some potential danger signs of which we should beware:
Anything that’s hardcoded. Hardcoded values limit our ability to be flexible. Most language ecosystems have idioms for handling application configuration; it’s no different when that application is automation. Learn and exploit the idioms for your language’s ecosystem.
Code that relies on specific file paths to exist. While not necessarily something to unconditionally avoid, we need to understand that the more specific the file paths are, the more brittle the automation will be. A better approach is to use paths that are relative to where the automation software is located. Also, when possible, make the automation create directories instead of assuming they already exist.
Code that relies on external applications. Again, it’s not necessarily something to unconditionally avoid, but we need to make provisions for installing the necessary software and give helpful error messages when an application can’t be found.
Code that relies on individual user configuration. Though this can be a powerful capability, we must realize that provisioning each user with the necessary attributes is not a zero-effort activity, especially when we need to work with other organizations for user provisioning.
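A few of the danger signs above, addressed in a short Python sketch; the directory layout and program name are illustrative, and a temp directory stands in for the automation’s install location:

```python
import shutil
import tempfile
from pathlib import Path

# Paths anchored to a known base directory and created on demand,
# rather than hardcoded to one user's machine and assumed to exist:
base_dir = Path(tempfile.mkdtemp())
results_dir = base_dir / "results" / "screenshots"
results_dir.mkdir(parents=True, exist_ok=True)

# A required external program checked for up front, with a helpful
# message instead of a cryptic failure mid-run:
def require_program(name):
    path = shutil.which(name)
    if path is None:
        raise RuntimeError(
            f"'{name}' is required but was not found on PATH; "
            "install it before running the automation.")
    return path

print(results_dir.exists())  # the directory was created, not assumed
```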
It may not be feasible to develop our automation to be executable on every computer in the company, but developing for portability increases the number of team members that can participate in the automation endeavor and portability is the first step into automated execution as part of a continuous delivery pipeline.
Hot on the heels of my previous blog, “Is It Automated?”, I was inspired to ponder this question: “Do automation scripts always pass or fail?”. As with my first StickyMinds article, I was inspired to think about this due to an email exchange with automation expert Greg Paskal; I blame and thank him for being this post’s catalyst.
Greg sent a question to our colleague Mark Bentsen and me asking about vocabulary for non-Boolean test results. His question pertained specifically to test scripts that were likely to fail eventually. Some of his results indicated that the scripts were taking longer and longer to execute. They were not yet reporting failures, but the behavior indicated a looming issue. Mark had the idea of adding another column to the execution report that depicts other aspects of the results, in addition to the original pass/fail. This idea is a quick, relatively inexpensive way to report the additional information, but it didn’t sit well with me…at first.
Let me explain.
I’ve long been a fan of pass/fail results for traditional “test-script-based” automation: if no assertions fail, then the script has a result status of pass, otherwise, this status is “fail”. A status of “fail” is an alert that someone needs to evaluate and act upon the results.
At a previous company, we had result statuses of “pass”, “fail”, and “warn”; “warn” was required by the test team. When pressed for how “warn” would be used, the response was “to indicate that the results would need to be looked at to determine pass or fail”. Fair enough; today I’d call those instances something like indeterminate, but I was not so sophisticated back then. I was, however, sophisticated enough to notice that there was a bit of “metrics massaging” when reporting “percent pass” and, in most cases, those scripts with a result status of “warn” would get treated as “didn’t fail”, and therefore “pass”.
In retrospect, most teams I’ve worked with eventually build two buckets of these result statuses. These buckets generally take one of two formats:
Pass or “didn’t pass”, where all “didn’t pass” scripts are treated as a fail
Fail or “didn’t fail”, where all “didn’t fail” scripts are treated as a pass
These realizations shaped my opinion to be “if we aren’t going to treat scripts with different results differently, then why bother with different results?”
Fast forwarding to today, Greg’s question and Mark’s response got me thinking: what are other legitimate states of automation results apart from pass and fail? I discovered that, for me, the problem isn’t really the non-Boolean result statuses; the problem is treating different result statuses the same. The key is presenting the statuses in a way that screams “Hey human, come look at this! There may be a problem!” so that they are not treated the same.
Great! We figured out that we need to present statuses in a way that is understandable by our team and also conveys the appropriate sense of urgency (added dimension on a dashboard, additional result status types, etc.). Now, we need the data to determine and present our new reporting facet(s). In Greg’s case, he had the data to make the “slowing down” determination, but many of us may not be logging at this level of detail. If not, we need to log additional data from our automation executions. We should look at our logs and determine if we have or could add information that would help us add one or more reporting facets. If we have access to them, cross-referencing automation logs with the logs from the program being tested may be immensely helpful. Imagine being able to mark a test script’s results as “indeterminate” because all the steps passed but our script detected a warning in the product’s execution logs!
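In Greg’s situation, the raw material is per-run durations; here is a sketch of deriving a “slowing” status from them, with an illustrative window size and threshold:

```python
def trend_status(durations, window=3, slowdown_ratio=1.5):
    """Flag a script whose recent runs (in seconds) average much longer
    than its earlier runs; thresholds here are illustrative."""
    if len(durations) < window * 2:
        return "pass"  # not enough history to judge a trend
    earlier = sum(durations[:window]) / window
    recent = sum(durations[-window:]) / window
    return "slowing" if recent > earlier * slowdown_ratio else "pass"

print(trend_status([10, 11, 10, 14, 17, 21]))  # → slowing
print(trend_status([10, 11, 10, 10, 11, 10]))  # → pass
```

A status like “slowing” alongside pass/fail is exactly the kind of “hey human, come look at this” signal discussed above; it does not replace the Boolean result, it augments it.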
I’ve focused on automation results in this post because that’s what Greg’s email was about. It’s possible that this extends to non-automation results as well, but, to paraphrase Alton Brown, that’s a topic for another show. Also, a big thanks to Greg and Mark for the inspiration.
Pop quiz, hotshot. There’s a bomb on a bus. Once the bus goes 50 miles an hour, the bomb is armed. If it drops below 50, it blows up. What do you do? What do you do? — Howard Payne, Speed, 1994
Probably the most famous line from the 1994 movie Speed, “Pop quiz, hotshot”, got me thinking about testing and automation. Specifically, regarding whether something is automated or not. As an automation developer/leader, I’ve often been asked pop quiz questions like “Is X automated yet?” and “When will Y be automated?”
I’ve begun thinking that “Is it automated?” isn’t a Yes/No question.
I subscribe to a broad definition of automation; I think that automation is an enabler and a force multiplier. When looking at it from that point of view, we can treat traditional, test-case-based automation as an implementation of automation as opposed to the definition of automation itself. This idea of enabler not only gives us the additional latitude to decide which activities to automate but also to decide how much of an activity to automate.
When we realize that we can decide how much of an activity to automate, we need no longer concern ourselves with “complete” automation versus “zero” automation on that activity. This is an important point because automation is not an all-or-nothing proposition. We can start putting a value assessment on parts of an activity which allows us to stop automating that activity once we determine that further automation on it is no longer a good business value. It also helps prevent us from creating Rube Goldberg machines that often cost more to maintain than the value that they provide.
So, back to the original question, “is it automated?” Simply answering “yes” or “no” leads us back to an all-or-nothing mindset; this is what we want to avoid. In the broad approach, the question leads us to more helpful questions such as:
Is it valuable to automate additional parts of this activity?
Are the automatable parts of this activity valuably automated?
Are there activities that are more valuable that we should automate first or instead?
Is the maintenance cost of automating this activity likely to erode the value we initially received?
Is this activity worth automating at all?
Understanding that “is it automated?” is not a question with a binary answer allows us to focus on enabling business value with our automation implementations, as opposed to “automation at all cost”.