Tag Archives: Automation

How Test Data Builders Help You Sleep

This isn’t a full-on sales pitch for data builders. Rather, it is only a testimonial of how they make my life easier.

Some developers have to start with an empty database. Others have some minimally-populated db that they add to on an as-needed basis. Still others can use a scrubbed copy of production data, which is excellent for judging what the execution plans will be.

If you’re in one of those first 2 scenarios, you may have already built scripts to create the test data you need. Perhaps you have a generic script to create a single record for a given table and then miscellaneous UPDATEs to make each new row fit what you need. If you have production-grade data, you more likely have SELECT scripts to locate the data you need for a particular test.

Unfortunately, what all of that has in common is that none of those approaches is manageable or consistent enough to support a suite of automated test cases for any development team with more than a couple of people. Let’s tackle this by looking at how data builders can help you out here.

Consistency

This isn’t about the ACID principles, it’s about staying DRY (Don’t Repeat Yourself).

At the lowest level of the stack, the values that can be placed into a test row of data are constrained by what the table allows. Data types, foreign keys, nullability, check constraints, etc. You may have business logic that resides in your procs or application code that adds to what is allowed in. But you need to resign yourself to the fact that the table’s constraints are the final arbiter as to what’s allowed, even if you don’t expect particular data values to end up in there.

What the data builder does for you here is allow you to define what is and isn’t needed for inserting data by creating a single pinch point from which you can react to any changes to the underlying table. Consider the alternative: you have multiple “scaffolding” scripts – probably embedded in various test cases – which would all need to be modified if the table gets a new non-nullable, non-defaulted column. With a data builder in place, you only need to account for the new column there and the rest of your test code can continue on its merry way. The only places where you have to make any additional edits are for test cases that specifically deal with this new column.
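
To make that pinch point concrete, here’s a minimal sketch of what such a builder might look like. The table, columns, and test schema are purely illustrative, not from any particular system:

    -- Hypothetical builder for a dbo.Customer table. Every parameter has a
    -- valid default, so a test only supplies what it cares about.
    CREATE PROCEDURE test.CustomerBuilder
        @CustomerName nvarchar(100) = N'Test Customer',
        @Email        nvarchar(255) = N'test@example.com',
        @IsActive     bit           = 1,
        @CustomerId   int           = NULL OUTPUT
    AS
    BEGIN
        SET NOCOUNT ON;

        -- If the table gains a new non-nullable, non-defaulted column, this
        -- INSERT (plus one new defaulted parameter) is the only thing to change.
        INSERT INTO dbo.Customer (CustomerName, Email, IsActive)
        VALUES (@CustomerName, @Email, @IsActive);

        SET @CustomerId = SCOPE_IDENTITY();
    END;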

Focus

Remember from my intro post on test data builders that you only need to supply values for the columns that are material to the test at hand. Most production INSERT procs still expect you to provide valid data for most of the parameters. They just save you the trouble of having to keep the INSERT statement updated with table changes.

By removing irrelevant parameters from test code, you can immediately see which data elements are relevant to a test method.
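
As a sketch of what that focus looks like in practice, here’s how a test might call the hypothetical builder above. Only the one value that matters to the test is spelled out; the test class, the function under test, and the tSQLt assertion wiring are stand-ins:

    -- Hypothetical tSQLt test: the only value supplied is the one under test.
    CREATE PROCEDURE CustomerTests.[test inactive customers are excluded from the mailing list]
    AS
    BEGIN
        DECLARE @CustomerId int;

        -- Arrange: an inactive customer; every other column gets the builder's defaults.
        EXEC test.CustomerBuilder
            @IsActive   = 0,
            @CustomerId = @CustomerId OUTPUT;

        -- Act: call the (illustrative) code under test.
        DECLARE @MailingListCount int;
        SELECT @MailingListCount = COUNT(*)
        FROM dbo.GetMailingList()          -- hypothetical function under test
        WHERE CustomerId = @CustomerId;

        -- Assert
        EXEC tSQLt.AssertEquals @Expected = 0, @Actual = @MailingListCount;
    END;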

Manageability

If you have some pet generator scripts for complex objects or are using “found data” for your tests, creating builders for that same complex data may seem burdensome. It doesn’t have to be.

First, consider that when you are using the other methods of making your test data, you’re still committing to supplying all those values and maintaining all of those INSERTs. Do the same work in a test data builder, and you will have less code to edit: only 1 “script”/builder per table. If you need an easy way to make a complex object, create a wrapper proc that, in one call, produces an entity and all of its requisite “child” data.

Following this approach allows you to ignore any child details that you don’t care about. For example, you want to make test orders that have been paid for with a credit card (instead of a gift card). Each payment method used has its own business rules about refunds. You need this test data so you can later assert that the dollar amount on the order available for refund is valid.

Put a data builder in place for the whole order which only takes in a payment method and an item to purchase. What this upper-level builder will do is call all of the necessary table-level builders to make sure you not only get a correct order header record, but order detail, payment processing, shipping information, etc. that the application would need to have in place for the code under test to fully operate.
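
A sketch of that kind of composite builder follows, assuming the table-level builders it delegates to (test.OrderHeaderBuilder, test.PaymentBuilder, and so on) already exist with defaulted parameters of their own; all of the names here are illustrative:

    -- Hypothetical higher-level builder: one call produces an order plus the
    -- child rows the code under test expects, by delegating to table-level builders.
    CREATE PROCEDURE test.OrderBuilder
        @PaymentMethod varchar(20) = 'CreditCard',
        @ItemSku       varchar(50) = 'TEST-SKU-001',
        @OrderId       int         = NULL OUTPUT
    AS
    BEGIN
        SET NOCOUNT ON;

        DECLARE @CustomerId int;

        -- Each call below is a table-level builder; anything not supplied here
        -- falls back to that builder's own defaults.
        EXEC test.CustomerBuilder     @CustomerId = @CustomerId OUTPUT;
        EXEC test.OrderHeaderBuilder  @CustomerId = @CustomerId, @OrderId = @OrderId OUTPUT;
        EXEC test.OrderDetailBuilder  @OrderId = @OrderId, @ItemSku = @ItemSku;
        EXEC test.PaymentBuilder      @OrderId = @OrderId, @PaymentMethod = @PaymentMethod;
        EXEC test.ShippingInfoBuilder @OrderId = @OrderId;
    END;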

With this higher-level builder, you only had to specify the details you need. And since it’s a data builder that uses other data builders, all parameters are optional, which allows for maximum reuse. Sure, this breaks the single pinch point concept for dealing with structural table changes. But even half-decent code analysis tools can help you out with finding dependency chains when it’s time to refactor.

Predictability

The problem with found data for testing is that it can run out. We run an ecommerce platform and in order to keep our databases as lean as possible, we remove all data older than N days from the front-end databases, leaving the information in an archive area should it be needed.

Sure, with any successful business, there should be enough orders paid by credit card to keep test code happy in between data refreshes to non-prod systems. But what if you have to keep rerunning a test or have a long set of test cases that require data which has 9 criteria elements? Even if you find a handful of these “Chupacabra” rows, you will likely use them up before your next data refresh.

Another problem: if you do most of your sales around a holiday, what are the odds that the situation you need occurs at the opposite end of the calendar year? This type of data skew can happen with other dimensions besides time. It could be the data collected by different types of sensors, the country you’re doing business in, etc.

You may have to simulate the result of multiple, asynchronous activities. To do this by triggering each step in an accelerated fashion can be tricky. To do this in support of a db unit test can be close to impossible. This is because a lot of these activities are carried out in application code. In order to go down this road, you’d have to figure out how to call/trigger that application code from T-SQL.

With a test data builder, missing or hard-to-reproduce data isn’t a problem. You know what qualities the test data for these cases needs to have, and you make it. No more troubleshooting a test failure only to find out that it ran out of data.

Isolation

Not the “I” in ACID. This is more about the “P” in “Pain reliever”. Just kidding. The “P” is for “Parallel execution.”

Whether you use production data or test data generated outside the scope of the test code, it’s still “found data”. The number one approach to speeding up the execution of automated test cases is to run them in parallel. With found data, you now have to worry that 2 tests which modify records will find the same test data and cause a failure for the test case because they both tried to operate on the same records.

With each test case leveraging a test data builder, the data it needs is generated within the “scope” of the test code. That means you don’t have to require that tests execute in serial or worry that parallel execution will see intermittent failures.

Some data is NOT on an island

There are 2 classes of data that don’t work this way: queuing and reporting.

The “get work to do” procs for handling a queuing table usually work off the principle that “I want the next N records WHERE ready = true and unassigned = true” (I’m paraphrasing). That WHERE clause doesn’t usually have anything to do with the primary key values of the rows you just created nor the queue item’s payload that you tailored for the test case.
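
For illustration, that “get work to do” pattern tends to look something like the sketch below; the queue table, columns, and locking hints are stand-ins rather than a recommendation for any specific system:

    -- Illustrative dequeue proc: claim the next N ready, unassigned items,
    -- regardless of which test (or user) created them.
    CREATE PROCEDURE dbo.GetNextQueueItems
        @WorkerId  int,
        @BatchSize int = 10
    AS
    BEGIN
        SET NOCOUNT ON;

        UPDATE TOP (@BatchSize) q
        SET    q.AssignedTo = @WorkerId
        OUTPUT inserted.QueueItemId, inserted.Payload
        FROM   dbo.WorkQueue AS q WITH (READPAST, UPDLOCK, ROWLOCK)
        WHERE  q.Ready = 1
          AND  q.AssignedTo IS NULL;
    END;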

When you run an integration test, it’ll just pick up the next N rows, regardless of which test case created the data. Now all of the custom-crafted data each test case made gets picked up by 1 of those executions, and the rest are left with an empty result set.

The only real solution is to run those particular tests in serial. Typically, these tests don’t make up a large portion of the test suite, so running them serially shouldn’t be a big performance hit. Also, such tests can be limited to nightly runs rather than per-commit runs.

For reporting, the test assertions usually follow a pattern of “when the table contains X, the report should show Y”. If you happen to be able to assert at a granularity you can control, such as reporting on a single customer’s orders, then data builders will work just fine. However, if your reports are aggregating larger sets of data, your only real choice is to have a copy of the database dedicated for report testing. This allows you to have what is mostly a read-only database for report testing and a read-write system for most of the application and stored proc testing. I’m using “report” in a broad sense here, not just SSRS reports.

Edge Cases

Let’s say your application already does ALL of its data access via stored procedures. You may then ask “why wouldn’t I just use my existing INSERT procs to create test data?” Well, for application integration tests, that may be a way to go. The limitation with that approach shows up when you attempt to perform database unit tests or need to assert failures on dirty data.

Common QA practice is to create test cases that ensure fixed bugs remain fixed (no regressions down the line). Let’s say you had a bug which allowed bad data into your table. Now, you need to test the existing procs to know that they respond correctly should they encounter bad data. Sure, you ran a “fix it” script, but can you guarantee that it’ll never happen again? Of course not, even if it’s unlikely. That’s why you run automated tests: to make sure you account for situations that have occurred (99% of the time).

You fixed your INSERT proc by adding some conditional logic to respond to the bad value. You need to make sure that your SELECT and UPDATE procs behave correctly in the face of bad data. The data builder doesn’t have that protection logic. It just creates what you need for the test. Without it, you’re back to using your production INSERT proc, followed by a custom UPDATE meant to dirty up the data. Taking 2 steps instead of 1 just to set up the data you need isn’t the way to go here.
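
Here’s a sketch of the difference in setup, reusing the hypothetical order builders from earlier (the @OrderTotal parameter and the invalid value are made up for illustration):

    -- One-step setup with a builder: write the "bad" value directly so the
    -- SELECT/UPDATE procs can be tested against it.
    DECLARE @OrderId int;
    EXEC test.OrderHeaderBuilder
        @OrderTotal = -50.00,        -- value the fixed INSERT proc now rejects
        @OrderId    = @OrderId OUTPUT;

    -- Two-step alternative without a builder (illustrative):
    -- EXEC dbo.InsertOrder ...;     -- production proc now blocks the bad value
    -- UPDATE dbo.OrderHeader SET OrderTotal = -50.00 WHERE OrderId = @OrderId;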

What About EF?

I definitely don’t want to get into the Entity Framework vs. stored proc religious argument here. Yes, Entity Framework can operate as a data builder of sorts because of its ability to read the table metadata and create INSERT methods from that. However, you still have to provide default values for required columns, just as you would with a SQL-based data builder proc. On this point, both options work out to a similar level of effort.

EF doesn’t really help you when creating db unit tests. There’s a better than average chance that if you’re writing db unit tests, you’re implementing the tests in T-SQL, regardless of whether you’re using SSDT SQL Unit tests, tSQLt, T.S.T., or some other test framework. Calling EF methods from T-SQL code is cumbersome just to do something T-SQL can handle well anyway.

Lastly, if you’re working on a database team that isn’t as familiar with EF/.Net, the long-term success of your test code will be harder to achieve.

(Another) Intro to SQL Test Data Builders

When I started drafting my thoughts on data builders, I had a single post in mind. That was good in theory, but the reality is that there is much more ground to cover than a single post should hold. Nearly every aspect of this topic has quite a bit of depth that can be explored.

For now, I’ll start with an overview.

Greg Lucas’ blog post is where I was first introduced to relational data builders. He, in turn, credits David Laing, but I have to give Mr. Lucas the credit because his post is where I first encountered the idea, and I greatly appreciate it. This first post will look a lot like Mr. Lucas’ post. That’s why it’s “Another” intro. My plan is to expand on the details.

What is a data builder?

Data builders are methods or stored procedures whose sole job is to make sure that test code has appropriate data to use. The key advantage of a data builder is that you can create whatever data state is necessary while only specifying the values you care about. What I will be focusing on are database stored procedures which are capable of creating the database records you need in order for the test code to succeed.
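
As a minimal sketch of the idea (the table, columns, and builder name here are only for illustration), compare a raw INSERT with a builder call:

    -- Without a builder: the test supplies every required column itself.
    INSERT INTO dbo.Customer (CustomerName, Email, IsActive, CreatedDate, Region)
    VALUES (N'Test Customer', N'test@example.com', 1, GETDATE(), 'US');

    -- With a builder: the test names only the value it actually cares about;
    -- the proc fills in valid defaults for everything else.
    DECLARE @CustomerId int;
    EXEC test.CustomerBuilder @IsActive = 0, @CustomerId = @CustomerId OUTPUT;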


T-SQL Test Automation (Part 4 of 4): Are we there yet?

The series at a glance:

  1. Get back on course
  2. Live in the now, look to the future
  3. Time to focus
  4. Are we there yet?

HELP!! The run-on sentences have been banished, but I’m now trapped in my office by a mixed metaphor generator and a punctuation doubler is guarding the door for it!!

If you have been following along as I published this series, thank you. If you sat down and read the whole thing front to back at once, thank you very much. If you read this and think I didn’t even cover everything I should have, you have my eternal gratitude.

Huh?

If you recall, I started this whole thing with a simple premise: get started. Make a rough plan and get going. It doesn’t matter what part of your situation you tackle first. Just get that boulder rolling (see Newton’s First Law).

We’ll deal with the finish line in a minute.

T-SQL Test Automation (Part 3 of 4): Time to focus

The series at a glance:

  1. Get back on course
  2. Live in the now, look to the future
  3. Time to focus
  4. Are we there yet?

After reading part 1 of this series, you were [hopefully] motivated to step up your testing work. Encouragement, yes!! From part 2, you learned the importance of knowing where you are and where you want to be. This time around, we’re going to illustrate where to spend your time and effort next.

WARNING: I still haven’t located a decent automated run-on sentence extractor!

Get out of the manual mindset

I’ve seen it dozens of times. Developers get so accustomed to their 9-keystroke shortcuts that they don’t bother looking for a better way. How many times have you shoulder-surfed a co-worker and told them something like this:

T-SQL Test Automation (Part 2 of 4): Live in the Now, Look to the Future

The series at a glance:

  1. Get back on course
  2. Live in the now, look to the future
  3. Time to focus
  4. Are we there yet?

In the first post of this series on T-SQL Test Automation, I laid the groundwork of why you should care and some pretty corny clichés as to how to get started.

First, some housekeeping is in order. I’m not going to dwell on the topic of how to test your T-SQL code in this series. Instead, I’m saving that for the future. If you can’t wait, there are several fine resources and frameworks out there.

Back to business. Part 1’s clichés are gone and all we are left with is the stark reality of “the now”. No frilly tools. No push-button test executions. No manager-distracting shiny reports automatically emailed to anyone who cares (and a few who don’t). Welcome to your starting point.

T-SQL Test Automation (part 1 of 4): Get back on course

The series at a glance:

  1. Get back on course
  2. Live in the now, look to the future
  3. Time to focus
  4. Are we there yet?

WARNING: Excessive metaphors and clichés ahead!

Motivation

Often in life, the hardest thing to do is to just get started on something. Pay attention to how many of your thoughts begin with “I should…” Yep, same here. So, my most recent “I should” was “I should really use this blog site since I bothered to pay for a domain name.”

And on that theme, I offer you a plan to do the same… for the code in your SQL Server database.