When I started drafing my thoughts with Data Builders, I had a single post in mind. That was good in theory, but reality is that there is much more ground to cover than a single post should hold. Nearly every aspect of this topic has quite a bit of depth that can be explored.
For now, I’ll start with an overview.
Greg Lucas’ blog post is where I was first introduced to relational data builders. He further credits David Laing, but I have to give Mr. Lucas credit because that is where I first encountered the idea, which I greatly appreciate. This first post will look a lot like Mr. Lucas’ post. That’s why it’s “Another” intro. My plan is to expand on the details.
What is a data builder?
Data builders are methods or stored procedures whose sole job is to make sure that test code has appropriate data to use. The key advantage of a data builder is that you can create whatever data state is necessary while only specifying the values you care about. What I will be focusing on are database stored procedures which are capable of creating the database records you need in order for the test code to succeed.
The primary audience I’m writing this for is any developer who needs data to exist in a relational database in order to test the code they’ve written. That could be a database developer who needs to unit test a stored procedure they’ve written. It could be an application developer who needs the database to have a specific set of data in order to run an integration test. No matter the exact use case, the common thread is that you need the database to hold a known state related to whatever entities you’re running tests on.
I write about T-SQL because it’s where my pay check comes from and because PL/SQL drives me batty. For what I’m going to cover, you only need a relational database that supports stored procedures in the database engine. The concepts still apply if you’re using a non-relational platform such as a document database or Key-Value store. However, if you’re using those sorts of platforms, it’s much more likely that all of your entities’ rules are contained in application code and all you’re doing with the data store is calling “.Save()” and “.Get()”. In that case, the Builder Pattern for application code is probably more relevant for you and you probably won’t need independent data builders.
What does it look like?
For now, I’ll wrap up with a basic code outline. There are some details going on here that I’ll call out in later posts. I’m using the tried and true customer/order database for examples because it’s something most people undersand and it happens to be what I actually work on day-to-day.
Let’s say you need to create a customer in order to run some tests. Your table looks like this:
CREATE TABLE Sales.dbo.Customer ( CustomerCode INT NOT NULL IDENTITY(1,1) , FirstName NVARCHAR(30) NULL , LastName NVARCHAR(40) NULL , EmailAddress VARCHAR(100) NOT NULL , AddressLine1 NVARCHAR(50) NULL , AddressLine2 NVARCHAR(50) NULL , City NVARCHAR(50) NULL , State NVARCHAR(3) NULL , PostalCode NVARCHAR(12) NULL , PrimaryPhone VARCHAR(20) NULL , SecondaryPhone VARCHAR(20) NULL , CreatedTS DATETIME2(4) NOT NULL CONSTRAINT DEF_dbo_Customer_CreatedTS DEFAULT (SYSDATETIME()) , CreatedBy VARCHAR(30) NOT NULL CONSTRAINT DEF_dbo_Customer_CreatedBy DEFAULT (SUSER_SNAME()) , ModifiedTS DATETIME2(4) NULL , ModifiedBy VARCHAR(30) NULL , CONSTRAINT PK_Customer_CustomerCode PRIMARY KEY CLUSTERED (CustomerCode) ); GO CREATE UNIQUE NONCLUSTERED INDEX IX_U_Customer_EmailAddress ON Sales.dbo.Customer (EmailAddress); GO
I know it’s not the most ideal design if you’re a purist, but I’m intentionally keeping this example simple.
The data builder for such a table could look like this:
CREATE PROCEDURE Builder.Sales_dbo_Customer @FirstName NVARCHAR(30) = NULL , @LastName NVARCHAR(40) = NULL , @EmailAddress VARCHAR(100) = NULL , @AddressLine1 NVARCHAR(50) = NULL , @AddressLine2 NVARCHAR(50) = NULL , @City NVARCHAR(50) = NULL , @State NVARCHAR(3) = NULL , @PostalCode NVARCHAR(12) = NULL , @PrimaryPhone VARCHAR(20) = NULL , @SecondaryPhone VARCHAR(20) = NULL , @CreatedTS DATETIME2(4) = NULL , @CreatedBy VARCHAR(30) = NULL , @ModifiedTS DATETIME2(4) = NULL , @ModifiedBy VARCHAR(30) = NULL AS BEGIN; SELECT @CreatedTS = COALESCE(@CreatedTS, SYSDATETIME()) , @CreatedBy = COALESCE(@CreatedBy, OBJECT_NAME(@@PROCID)) , @EmailAddress = COALESCE( @EmailAddress, 'user' + REPLACE( REPLACE( CONVERT(VARCHAR(30),SYSDATETIME(), 121) ,':','') ,' ','') + '@fakeemail.org' ) ; DECLARE @ReturnValues TABLE (CustomerCode INT NOT NULL); INSERT INTO Sales.dbo.Customer ( FirstName , LastName , EmailAddress , AddressLine1 , AddressLine2 , City , [State] , PostalCode , PrimaryPhone , SecondaryPhone , CreatedTS , CreatedBy , ModifiedTS , ModifiedBy ) OUTPUT INSERTED.CustomerCode INTO @ReturnValues (CustomerCode) SELECT @FirstName , @LastName , @EmailAddress , @AddressLine1 , @AddressLine2 , @City , @State , @PostalCode , @PrimaryPhone , @SecondaryPhone , @CreatedTS , @CreatedBy , @ModifiedTS , @ModifiedBy ; SELECT CustomerCode FROM @ReturnValues; END; GO
The key takeaways are:
- No input parameters are required in order for a record to be made. Any required columns receive a value, whether the caller provides a value or not.
- The identity column value is returned as a data set rather than as an output parameter.
- The procedure only creates one record at a time. If we need to create multiple rows, we could create a multi-row wrapper proc or switch to an implementation that uses user-defined table types.
- In the procs I actually use, I also use TRY…CATCH error handling. It’s left out here for brevity’s sake.
- The auto-generation of the email address could also be wrapped into a function call if we needed to re-use that expression.
- The use of COALESCE is to make it easier to maintain should we need to consider more than 2 options, which would be the limit of ISNULL without nesting them.
- This is shorter than Greg Lucas’ code example because he includes a conditional operator to handle both tSQLt (FakeTable) and non-tSQLt use cases.
Ad hoc usage
To use this proc, you could use any of the following calls. With solely calling the proc and not saving the output value, this is what you’d use if you only need to investigate a bug.
EXECUTE Sales.Builder.Sales_dbo_Customer; EXECUTE Sales.Builder.Sales_dbo_Customer @State = 'WY' ; EXECUTE Sales.Builder.Sales_dbo_Customer @EmailAddress = 'firstname.lastname@example.org' , @City = 'Buffalo' , @State = 'NY' ; EXECUTE Sales.Builder.Sales_dbo_Customer @PrimaryPhone = '310-123-4567' ;
Notice how that is much simpler than hand-crafting inserts each time you need to deal with a table. Because the required elements are handled in the proc body, there is minimal visual noise in the calling code. Only the attributes you care about for your test need to be specified.
Also, because this is a stored procedure, you can called it from anywhere: T-SQL, C#, Powershell, etc. As you open up more languages & platforms from which you can use this functionality, you greatly increase the value of the data builders. This is because you have a single pinch point to manage when your table changes. I’ll go into more depth on this later.
In the upcoming posts on data builders, I will go into more detail on these topics:
- The pros & cons of using data builders.
- Examples of calling data builders from T-SQL unit tests.
- Detailed findings on implementation details, including code consistency, refactoring, naming conventions, etc.
- Alternative options for creating test data.
Thank you for reading this far. I greatly appreciate any ideas or comments you have, especially if you want more information on a specific aspect of data builders for SQL databases.