Why QA Teams Should Own Test Data Management
Sherlock Holmes once impatiently blurted out, “Data! Data! Data! I can’t build bricks without clay!”
This quote is often used to highlight the importance of a data-driven approach: using data to guide actions and measure outcomes. But in software testing, the lesson is quite literal. We can’t test without data. And we can’t have good testing without good test data.
And we can’t have a good product or a good user experience without good testing. That is why I strongly advocate that QA professionals treat Test Data Management (TDM) as an integral part of the overall test strategy, not as a tactical challenge to be figured out just so they can run their planned tests.
Admittedly, Test Data Management is tricky, and doing it effectively takes planning. It is important to think holistically about how the different data types drive content and functionality so you can identify the data dependencies your test scripts will require. The test data strategy also needs to account for the volume of data needed, since testing is enhanced by making the environment as realistic and “prod-like” as possible. Data privacy must be accounted for as well: if production data is to be loaded into lower environments, some masking or scrubbing may be required.
So I recommend considering the data dependencies, volume needs, and privacy requirements when looking at the tools and/or processes to set up the test data.
I have recently worked with several teams to help them generate the data they need for testing on demand.
Where possible, I recommend that the QA team own this piece because of its strong dependency on data and its direct connection to proper testing. The two solutions I want to dive into in this article are repurposing the existing automation framework to build test data, and using a third-party tool to create it.
Automation framework for test data
Naturally, this solution has some prerequisites, the most basic being a functioning automation framework. Interestingly, though, due to a separation of duties, the same organization might have manual testers painstakingly creating test data by hand, or waiting on another group to prepare it for them, without ever thinking to leverage the existing automation framework to create a specific, robust, and “refreshable” test data set. There is a level of effort here, since scripting the steps required to produce the desired data set takes time. But the same fundamental ROI calculation of automating a test vs. running it manually multiple times applies, and the effort should show efficiency gains quickly.
There are numerous ways to develop these scripts, depending on both the automation framework in place and the relationship between the applications being tested and the data source. For the sake of example, consider a Selenium WebDriver automation framework where the application under test is a benefits administration platform redesign. The application has several transactional workflows that present different content and paths based on the demographics and prior elections of the population. For testing purposes, we need to create test users with a wide degree of variance in indicative data, including age, marital status, number of dependents, medical coverage, and so on. If we use the Selenium framework to create the data types needed for new feature testing and regression, not only can we save a lot of test execution time upfront, we are also simultaneously preparing a streamlined path for introducing the new feature changes into the baseline automation scripts.
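To make this concrete, here is a minimal sketch of what a data-seeding script might look like in a Python Selenium WebDriver framework. The URL, element locators, and enrollment profile fields are hypothetical placeholders; a real implementation would reuse the page objects and helpers your framework already has.

```python
# Minimal sketch: reusing a Selenium WebDriver framework to seed test data.
# The URL, locators, and profile fields are hypothetical placeholders for a
# benefits-enrollment workflow; a real script would reuse existing page objects.
from selenium import webdriver
from selenium.webdriver.common.by import By

# Demographic variations we want represented in the test data set.
PROFILES = [
    {"first": "Ada",  "last": "Lovelace", "dob": "1985-12-10",
     "marital_status": "Married", "dependents": "2", "plan": "Family PPO"},
    {"first": "Alan", "last": "Turing",   "dob": "1995-06-23",
     "marital_status": "Single",  "dependents": "0", "plan": "Individual HDHP"},
]

def create_enrollment(driver, profile):
    """Walk the enrollment workflow once to create one participant record."""
    driver.get("https://benefits-test.example.com/enroll")  # hypothetical test URL
    driver.find_element(By.ID, "firstName").send_keys(profile["first"])
    driver.find_element(By.ID, "lastName").send_keys(profile["last"])
    driver.find_element(By.ID, "dateOfBirth").send_keys(profile["dob"])
    driver.find_element(By.ID, "maritalStatus").send_keys(profile["marital_status"])
    driver.find_element(By.ID, "dependentCount").send_keys(profile["dependents"])
    driver.find_element(By.ID, "planSelection").send_keys(profile["plan"])
    driver.find_element(By.ID, "submitEnrollment").click()

if __name__ == "__main__":
    driver = webdriver.Chrome()
    try:
        for profile in PROFILES:
            create_enrollment(driver, profile)
    finally:
        driver.quit()
```

Because these same steps mirror the enrollment regression flow, maintaining a script like this also helps keep the baseline automation suite current as the feature evolves.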
Test Data Tools
One resource that I don’t believe Quality Assurance testers use often enough is test data management tools. TDM tools can perform a variety of very helpful functions to make your life easier, including data creation, refreshing, and scrambling. Like every tool, each one comes with strengths and weaknesses, and it’s a good idea to evaluate several to identify which will best meet your particular needs. Just to get your search started, I’ll give a brief overview of my three favorite TDM tools, though this list is by no means exhaustive.
- LISA Solutions – What I appreciate most about LISA is that you can import test data from multiple sources such as XML, CSV, log files, and Excel sheets to create a virtual dataset. Once imported, you can segregate or integrate your data to make it most useful and to right-size the data set you need.
- Datprof – Datprof makes it fairly quick and easy to mask sets of production data or otherwise sensitive data to help protect privacy (a minimal sketch of this kind of masking follows the list), though it can also create synthetic data.
- CA Datamaker – Very good at generating synthetic data sets in volume and storing them in a test data repository for on-demand use.
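To illustrate the kind of masking these tools automate, here is a minimal, tool-agnostic sketch in Python. The record fields and masking rules are assumptions made for the example; dedicated TDM tools layer on capabilities such as referential integrity across tables and repeatable masking runs.

```python
# Minimal, tool-agnostic sketch of masking sensitive fields before loading
# production data into a lower environment. Field names and masking rules
# are assumptions for illustration only.
import hashlib

def mask_ssn(ssn: str) -> str:
    """Replace an SSN with a deterministic token so related records still match up."""
    digest = hashlib.sha256(ssn.encode()).hexdigest()
    return f"XXX-XX-{int(digest[:8], 16) % 10000:04d}"

def mask_record(record: dict) -> dict:
    """Return a copy of the record with identifying fields masked."""
    masked = dict(record)
    masked["ssn"] = mask_ssn(record["ssn"])
    email_hash = int(hashlib.sha256(record["email"].encode()).hexdigest(), 16)
    masked["email"] = f"user{email_hash % 10000}@example.com"
    # Non-identifying fields (plan, coverage tier) stay untouched so the
    # masked data still exercises the same application logic.
    return masked

if __name__ == "__main__":
    prod_row = {"ssn": "123-45-6789", "email": "jane.doe@corp.com",
                "plan": "Family PPO", "coverage_tier": "Employee + Spouse"}
    print(mask_record(prod_row))
```

The key design point, whatever tool you choose, is that masking should be deterministic enough to preserve relationships across tables while removing anything that could identify a real person.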
Earlier in this article, I stated that QA teams should take more ownership of Test Data Management. I hope the methods and tools discussed here give you some ideas on how to make your test team more efficient and enhance the quality of your testing with a more realistic and robust data set.
Josh Brenneman
Josh Brenneman is the Delivery & Talent Director at tapQA. He has 10+ years of experience delivering value to organizations in the areas of strategic quality, delivery, release management, and testing.
Have a QA question?
Our team would love to help!