Automate the Generation of Stored Procedures for Your Database


 and  J. Byer Hill
This article assumes you're familiar with T-SQL
Get the sample code for this article.

SUMMARY

Design-time automation makes coding faster and ensures that all the procedures generated use the same naming conventions and structure. In an effort to improve their coding efficiency in a large SQL project, the authors wrote a set of design-time stored procedures that generate run-time stored procedures, and have used them in project after project ever since. Recently, the authors updated their procedures to make use of SQL Server 2000 features, including user-defined functions. This article covers the creation and execution of these dynamic T-SQL scripts to automate the coding of common database stored procedures.




Some time ago, we began development on a rather large n-tier client/server project. During the initial planning, we decided to require a set methodology for accessing the numerous tables in the database. Four base stored procedures would be used to perform selects, inserts, updates and record deletions on each table. Although the required set of stored procedures would be similar in design, each table's unique column structure would mandate that significant details of each individual procedure would vary, thus making the writing of all the required procedures very tedious indeed. It became quite clear that the stored procedure writing process had to be automated.

 

What started out as an experiment turned into a core set of design-time stored procedures that write the base set of run-time stored procedures for all database tables in any given project. By building and using these design-time procedures, we not only saved ourselves hours of tedium, but saved our client some money as well, and came away with some nifty code that we still use today.

These design-time procedures have been updated to take advantage of some of the features of SQL Server™ 2000, specifically, user-defined functions (UDFs). So now the code is more modular, and we have additional functions available to us for other tasks.

Another bonus that design-time automation offers is the guarantee that the hundreds of generated procedures are consistently structured and follow a standard naming convention. In our case, every generated run-time procedure name takes the form prApp_TableName_Task, where Task is Select, Insert, Update, or Delete. The procedures for the Customers and Orders tables would look something like this:

prApp_Customers_Delete

prApp_Customers_Insert

prApp_Customers_Select

prApp_Customers_Update

prApp_Orders_Delete

prApp_Orders_Insert

prApp_Orders_Select

prApp_Orders_Update

 

As you can see, this convention adds a great deal of organization to our database, making any given procedure easy to locate and making each procedure name self-describing. Developers will be able to quickly find and create code. And to top it off, future team members on the project will find the code and procedures easy to follow. Of course, if you already have a different naming convention in place, simply change a few lines of code and your conventions can be used instead.

These four design-time procedures are not set in stone, but are meant to act as a template for use in other projects. They are installed in the project database, and if needed, they are modified to suit the needs of the specific application. For example, in several of our applications, we added code to maintain an audit trail record in a separate database every time a record was modified.


A Simple Example

Before we begin, let's take a look at a simple example using the Order_Details table (whose name has been changed to replace a space with an underscore character) from the Northwind database. (Although spaces and other characters are allowed in object names, we recommend that you use regular identifiers for object names to prevent issues when using these automated stored procedures. See "Using Identifiers" in SQL Server Books Online for more information.)

The first task is to run the design-time procedure in order to create the run-time procedure that updates data in the Order_Details table:

EXEC pr__SYS_MakeUpdateRecordProc 'Order_Details'

 

Running the design-time procedure will produce the T-SQL script shown in Figure 1 as output. When this T-SQL script is run, it creates a new Update stored procedure for the Order_Details table. All columns are accounted for as parameters in the new stored procedure code, but notice how the primary key columns (OrderID and ProductID) show up in the Where clause, while the other non-key fields are part of the Update statement's Set clause. Our design-time procedure examined the Order_Details metadata stored in the SQL Server system tables and used this info to create the appropriate T-SQL script as output, which will then create the final run-time procedure when executed.
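Figure 1 is not reproduced here, so the following is only a sketch of what the generated script plausibly looks like, reconstructed from the description above and from the Northwind column definitions. The exact comments, formatting, and DROP logic of the real output are assumptions.

```sql
-- Sketch of the generated run-time procedure (layout assumed, not the actual Figure 1).
-- Every column becomes a parameter; key columns go in WHERE, the rest in SET.
CREATE PROC prApp_Order_Details_Update
    @OrderID   int,
    @ProductID int,
    @UnitPrice money,
    @Quantity  smallint,
    @Discount  real
AS
UPDATE Order_Details
SET    UnitPrice = @UnitPrice,
       Quantity  = @Quantity,
       Discount  = @Discount
WHERE  OrderID   = @OrderID
AND    ProductID = @ProductID
```

None of the Order_Details columns allows nulls, so no parameter in this sketch carries a default.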

This execution only produced output; it did not create the new run-time procedure. The design-time procedure can, however, execute the T-SQL script it generated as a final step. To do this, we run the design-time procedure again, this time passing the value 1 as a bit flag in the optional second parameter:

EXEC pr__SYS_MakeUpdateRecordProc 'Order_Details', 1

This will not only show the output as before, but also execute the output, thus creating the run-time procedure.

 

Now let's take a look at the design-time procedure code that created this new application-specific run-time stored procedure.


SQL Server System Tables and Views

In order to create these design-time stored procedures, we must know how to extract the table's column definition information from the SQL Server system tables and Information Schema views. First, we need to find the columns themselves, and then find out which are the primary key columns, what datatype each column supports, and whether the column allows nulls.

Figure 2  View System Tables

It might be helpful to know that SQL Server Enterprise Manager allows you to view the system tables by changing a property of the server registration, as shown in the dialog in Figure 2. If you right-click the server in Enterprise Manager and then select "Edit SQL Server Registration properties," you will be presented with a dialog. At the bottom of the dialog, you will see a checkbox that's labeled "Show system databases and system objects." Checking that option turns it on to view system objects, or you can choose to turn it off and keep your table list view simpler and easier to read.


Parsing a Table's Columns

The syscolumns table provides much of the necessary metadata information, such as the column name, ID, length, and nullability. It is also used in conjunction with the sysindexes table to determine the primary key field(s) of the table. You will also see that the INFORMATION_SCHEMA.COLUMNS view will be used to retrieve the default values of our columns.

Since all procedures are looking for the same metadata information, it would be nice to encapsulate this in a separate piece of code for modularity and maintainability. As we stated before, earlier versions of SQL Server did not have UDFs, which made modularity like this a difficult task. But, with SQL Server 2000 user-defined functions, we decided to take the code a step further and modularize the common features of our four design-time stored procedures. Five new UDFs were created to deal with the system tables and information schema views, encapsulating all the metadata retrieval.

In order to properly create the new run-time stored procedures, we need to know the following metadata column information for the table in question:

  • The name of the column
  • The ID of the column
  • The datatype of the column
  • The maximum length of the column (applicable to character and binary data)
  • The precision of the column—or how many digit places its values have (for decimal and numeric data)
  • The scale of the column—or how many places exist after the decimal point (for decimal and numeric data)
  • Whether the column is nullable
  • Whether the column is part of the primary key
  • Whether the column is an Identity column
  • The default value of the column

 

For the most part, this information comes from the syscolumns table, with a couple of exceptions. The default value is actually retrieved from the INFORMATION_SCHEMA.COLUMNS view. The datatype name is extracted from the systypes table, and a more complex combination of syscolumns, sysindexes, and sysindexkeys is used to determine if a column is part of the primary key. It was so complex, in fact, that we encapsulated this functionality into its own UDF.

Let's take a look at the main function shown in Figure 3, which reveals more of this metadata information. This UDF is not overly complex. As you can see, most of the metadata (column name, colid, length, precision, scale, IsNullable, and type name) is returned without modification, apart from some simple column renaming. The rest takes a little more work. For the primary key metadata, we created another UDF that determines whether a column is part of the table's primary key, and a further UDF retrieves the column's default value. We will examine both of these additional UDFs shortly.

Let's take a look at the alternate type and the identity status. The eighth bit (128) of the status field of syscolumns indicates whether or not the column is an identity column. (This is especially important to know when creating Insert and Update scripts.) Our simple formula performs a bitwise AND (&) of the status with this value and then wraps the result in the Sign function. If the bit is set, meaning the column is an identity column, c.status & 128 will return a value of 128. Otherwise, a value of 0 will be returned. The Sign function returns 1 for positive values, -1 for negative values, and 0 for zero. So, if the column being evaluated is an identity column, a value of 1 will be returned; otherwise, 0 is returned.

The alternate type is used to indicate whether the datatype requires additional information (length or precision and scale) when being defined. We are categorizing character and binary datatypes as an alternate type of one, decimals and numerics as two, and all other datatypes as zero. This value is used in the stored procedures to determine if length or precision and scale values need to be added to the parameter definitions.
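The identity test and the alternate type described above can be sketched as a single query. This is not the code from Figure 3 (which is not reproduced here); the column aliases and the exact list of type names in each category are assumptions chosen to match the description.

```sql
-- Sketch only: column metadata for one table, SQL Server 2000 system tables.
DECLARE @sTableName varchar(128)
SET @sTableName = 'Order_Details'

SELECT  c.[name]             AS sColumnName,
        c.colid              AS nColumnID,
        t.[name]             AS sTypeName,
        c.length             AS nColumnLength,
        c.prec               AS nColumnPrecision,
        c.scale              AS nColumnScale,
        c.isnullable         AS bIsNullable,
        -- bit 128 of status marks an identity column; SIGN maps 128 to 1
        SIGN(c.status & 128) AS bIsIdentity,
        -- 1 = needs a length, 2 = needs precision and scale, 0 = neither
        CASE
            WHEN t.[name] IN ('char', 'varchar', 'nchar', 'nvarchar',
                              'binary', 'varbinary') THEN 1
            WHEN t.[name] IN ('decimal', 'numeric') THEN 2
            ELSE 0
        END                  AS nAlternateType
FROM    syscolumns c
        INNER JOIN systypes t ON c.xusertype = t.xusertype
WHERE   c.id = OBJECT_ID(@sTableName)
ORDER BY c.colid
```

Because the join uses xusertype, a column declared with a user-defined type reports that type's own name and falls into category 0 in this sketch.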


Finding Primary Key Columns

As you saw, finding column information is not too difficult. Finding out if a field is part of a primary key, however, requires a little more effort. There is a list of fields that can be retrieved, but these fields are found in a combination of the syscolumns, sysindexes, and sysindexkeys tables and must be compared to our requested column (passed in to the UDF in the @sColumnName parameter). Thus, the task of finding the primary key is easier to achieve in a separate user-defined function because we can encapsulate this work into a single function call.

Let's walk through this function and see what's really going on:

CREATE FUNCTION dbo.fnIsColumnPrimaryKey
    (@sTableName varchar(128), @sColumnName varchar(128))
RETURNS bit
AS
BEGIN
    DECLARE @nTableID int,
            @nIndexID int,
            @i int

    SET @nTableID = OBJECT_ID(@sTableName)

The function takes two parameters, the table name and the column name, and returns a bit indicating whether the specified column is part of the primary key for that table. We then declare the variables that will be used in the function and assign an initial value. Now comes the fun part: finding the primary key information. We begin by retrieving the Index ID for the primary key index of the table, as shown in the following code:

    SELECT @nIndexID = indid
    FROM   sysindexes
    WHERE  id = @nTableID
    AND    indid BETWEEN 1 AND 254
    AND    (status & 2048) = 2048
    ORDER BY indid

    IF (@nIndexID IS NULL)
        RETURN 0

 

We now assign the Index ID of the table's primary key index to the variable @nIndexID. The twelfth bit (2048) of the status column indicates if the index is a primary key index. If there is no primary key index, no records will be returned, leaving @nIndexID with a null value. So, if @nIndexID contains a null value, we leave the function, returning a 0 value. In other words, since there is no primary key index, the column cannot be part of a primary key. Now we need to check the requested column (@sColumnName) against the list of columns in the primary key index.

    IF @sColumnName IN
        (SELECT sc.[name]
         FROM   sysindexkeys sik
                INNER JOIN syscolumns sc
                    ON sik.id = sc.id AND sik.colid = sc.colid
         WHERE  sik.id = @nTableID
         AND    sik.indid = @nIndexID)
    BEGIN
        RETURN 1
    END

    RETURN 0
END

 

Using the IndexID we retrieved earlier, we select the name of the column(s) from the join of syscolumns and sysindexkeys table. These tables are joined using the column ID and the object ID. The Where clause sets the criteria so that we only select columns from indexes related to our requested table (sik.id = @nTableID), and only for the primary key index (sik.indid = @nIndexID). If @sColumnName is in this retrieved list of columns, we return a value of 1; otherwise, we return a value of 0, which indicates that a match was not found.
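As a quick check of the function, it can be called directly for one key column and one non-key column. This assumes the renamed Order_Details table from the earlier example still carries its primary key on OrderID and ProductID; the column aliases are only illustrative.

```sql
-- Should return 1 for a key column and 0 for a non-key column,
-- assuming Order_Details has its primary key on (OrderID, ProductID).
SELECT dbo.fnIsColumnPrimaryKey('Order_Details', 'OrderID')  AS bOrderIDIsKey,
       dbo.fnIsColumnPrimaryKey('Order_Details', 'Quantity') AS bQuantityIsKey
```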


Column Default Values

When a record is inserted into a table, if no value is supplied for a particular column and that column has a default value, the default is used to populate the column's data. Since our newly produced table-insert procedure has a parameter for all possible columns that can be inserted, and since a variable must contain a value, even if it is a null value, the table's defaults won't be used. Essentially, by explicitly supplying every column with a value (even null), we override the defaults of the columns. In order to counteract this characteristic of our created procedures, we need to be able to supply the defaults when inserting the data. We will see how the default values are used in our automated procedures a bit later in this article, but for now, let's examine how we get those default values in the first place.
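The override behavior described above is easy to reproduce with a temporary table. The table and column names below are made up for the demonstration, and the ISNULL call in the third insert is only one plausible way a generated procedure could reapply a default; the article's own insert procedure may do it differently.

```sql
-- Hypothetical demo table: nQty has a default of 1.
CREATE TABLE #DefaultDemo (
    nID  int NOT NULL,
    nQty int NULL DEFAULT (1)
)

-- Column omitted: the default is applied.
INSERT #DefaultDemo (nID) VALUES (1)

-- Explicit NULL supplied: the default is overridden and NULL is stored.
INSERT #DefaultDemo (nID, nQty) VALUES (2, NULL)

-- Reapplying the default by hand when the parameter arrives as NULL.
DECLARE @nQty int   -- never assigned, so it is NULL like an omitted parameter
INSERT #DefaultDemo (nID, nQty) VALUES (3, ISNULL(@nQty, 1))

SELECT nID, nQty FROM #DefaultDemo ORDER BY nID   -- nQty: 1, NULL, 1

DROP TABLE #DefaultDemo
```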

The UDF that we will use simply references the built-in INFORMATION_SCHEMA.COLUMNS view, which supplies the default value of a column. It is easier to use this view to retrieve the default value than to use the sysconstraints system table. The next UDF will additionally simplify this process by wrapping the logic of finding the default into a simple function call:

CREATE FUNCTION dbo.fnColumnDefault(@sTableName varchar(128),
                                    @sColumnName varchar(128))
RETURNS varchar(4000)
AS
BEGIN
    DECLARE @sDefaultValue varchar(4000)

    SELECT @sDefaultValue = dbo.fnCleanDefaultValue(COLUMN_DEFAULT)
    FROM   INFORMATION_SCHEMA.COLUMNS
    WHERE  TABLE_NAME = @sTableName
    AND    COLUMN_NAME = @sColumnName

    RETURN @sDefaultValue
END

 

Column default values are stored wrapped in a pair of parentheses, which our implementation does not need. So as you can see, we pass the field COLUMN_DEFAULT to another function, fnCleanDefaultValue, which strips the parentheses off, and then returns the actual default.

For example, if a column named nQty has a default of 1, the COLUMN_DEFAULT value would actually contain (1). For a default of "Enter Text Here", we would get ("Enter Text Here"). Here is the source for that UDF:

CREATE FUNCTION dbo.fnCleanDefaultValue(@sDefaultValue varchar(4000))
RETURNS varchar(4000)
AS
BEGIN
    RETURN SubString(@sDefaultValue, 2, DataLength(@sDefaultValue)-2)
END
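A few direct calls show what the function returns. The input strings here are hand-written stand-ins for COLUMN_DEFAULT values, not values read from a real table.

```sql
SELECT dbo.fnCleanDefaultValue('(1)')                   AS sNumberDefault,   -- 1
       dbo.fnCleanDefaultValue('(''Enter Text Here'')') AS sTextDefault,     -- 'Enter Text Here'
       dbo.fnCleanDefaultValue('(getdate())')           AS sFunctionDefault  -- getdate()
```

Only the outer parentheses are removed. A string default keeps its surrounding quotes, which is convenient when the value is later concatenated into generated T-SQL.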

Now we have all the metadata information that we will need to actually create our automated procedures.

 


Dynamically Execute T-SQL

Dynamic T-SQL execution is the other essential feature of our stored procedures, as it allows you to write a generic T-SQL script that in turn writes a T-SQL script. It is the T-SQL EXECUTE statement that allows the generic T-SQL script to actually execute its specific output and create the run-time stored procedures to be used by the application.

EXECUTE, or EXEC, actually has two abilities: it can execute existing stored procedures and dynamically execute a SQL statement passed in as a string. It's the latter ability that we will be using in conjunction with the column metadata we extracted earlier to automatically create the stored procedures. A simplified view of the process would be filling a large varchar variable with the stored procedure code (using the metadata) required to create the stored procedure and then dynamically executing the contents of the varchar variable once complete, creating the new stored procedure.

Let's start by examining a simple example of dynamic T-SQL:

CREATE PROC prGetAuthor
    @au_id char(11)
AS
    DECLARE @sExec varchar(8000)

    SET @sExec = 'SELECT * FROM authors WHERE au_id = ''' + @au_id + ''''

    EXEC (@sExec)

In this example, we pass in an author's ID and concatenate it with a SELECT statement that retrieves an author from the authors table.

 

Let's say we call the procedure like this:

EXEC prGetAuthor '123-45-6789'

The prGetAuthor procedure would create a SQL statement as shown in this line:
SELECT * FROM authors WHERE au_id = '123-45-6789'

This statement would then be executed in the EXEC statement and would return the author with the ID of 123-45-6789. As you are about to see shortly, our design-time procedures take this feature to a much higher level.

 

You probably already know this, but we should note that this is not a recommended use of dynamic T-SQL. Anytime dynamic T-SQL code is accessible to the outside world, the possibility of a SQL injection attack exists. We have made it a practice to use dynamic T-SQL for administrative and task purposes only, and never expose this functionality in any procedures that are accessible by anyone other than the system or an administrator.
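For comparison, here is a sketch of the same lookup written with the sp_executesql system procedure, which passes the value as a parameter instead of concatenating it into the statement. The procedure name prGetAuthorParam is made up for this illustration, and this variant is not part of the article's download.

```sql
-- Hypothetical parameterized variant of prGetAuthor.
CREATE PROC prGetAuthorParam
    @au_id varchar(11)
AS
    DECLARE @sExec nvarchar(4000)

    -- The statement text is constant; the ID travels as a typed parameter,
    -- so quote characters in @au_id cannot change the statement's structure.
    SET @sExec = N'SELECT * FROM authors WHERE au_id = @id'

    EXEC sp_executesql @sExec, N'@id varchar(11)', @id = @au_id
```

This approach helps only where values are being substituted. The design-time procedures build object names and whole statements, which cannot be parameterized this way, so restricting them to administrators remains the relevant safeguard.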


Creating Stored Procedures

The first step in creating these design-time procedures is fairly standard. The procedure is defined, variables are declared, and some of them are initialized. A quick glance at this code reveals no surprises, but rather sets up the rest of the procedure. We do create two special string variables, one of which holds a tab character while the other contains a carriage return and line feed. These could also be set up as UDFs but we decided not to so that we could leave it as an exercise for the reader. These are used to help with the formatting of the code output. Let's take a look at the beginning of this procedure, which is shown in Figure 4.

Again, no great T-SQL revelations will be found here. Before we do anything else, we check to see if our table has a primary key. This prevents our code from creating potentially damaging run-time procedures. Then we simply set up our variables and some default values. The next part of the procedure sets up the DROP statement for the new proc, just in case our new procedure already exists, creates some comments, and creates the actual procedure definition (the first few lines of Figure 1). You can modify this code to create a new run-time stored procedure only if it does not yet exist (and to do nothing if it already does). This new logic could also be controlled by the addition of a third optional parameter, @bIfExistsDoNothing. We will leave this as a simple exercise for the reader.

Our next code snippet begins the process of creating the dynamic T-SQL. Code is added to drop an existing procedure and for the new procedure definition (see Figure 5). Notice how we use the second (optional) parameter, @bExecute, to determine if we are going to actually run the code. In our automated procedure's definition, this parameter is optional, and defaults to a value of 0, meaning we would not actually execute the code.
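Figure 5 is not reproduced here, so the following is a minimal sketch of the drop-then-create pattern under stated assumptions: the variable names other than @bExecute and @sProcText, the placeholder procedure body, and the sysobjects existence check are all illustrative choices rather than the article's code.

```sql
DECLARE @sProcName varchar(128),
        @sDropText varchar(8000),
        @sProcText varchar(8000),
        @bExecute  bit,
        @sCRLF     char(2)

SET @sCRLF     = CHAR(13) + CHAR(10)
SET @sProcName = 'prApp_Demo_Select'   -- hypothetical procedure name
SET @bExecute  = 0                     -- 0 = print only, 1 = print and execute

-- Script that drops the procedure if it already exists.
SET @sDropText = 'IF EXISTS (SELECT * FROM sysobjects WHERE [name] = '''
               + @sProcName + ''' AND [type] = ''P'')' + @sCRLF
               + '    DROP PROC ' + @sProcName

-- Placeholder body standing in for the generated procedure text.
SET @sProcText = 'CREATE PROC ' + @sProcName + @sCRLF
               + 'AS' + @sCRLF
               + 'SELECT 1 AS nPlaceholder'

PRINT @sDropText
PRINT 'GO'
PRINT @sProcText
PRINT 'GO'

IF (@bExecute = 1)
BEGIN
    -- CREATE PROC must be the first statement in its batch,
    -- so the drop and the create run as separate EXEC calls.
    EXEC (@sDropText)
    EXEC (@sProcText)
END
```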

Next comes an interesting technique. We use the fnTableColumnInfo UDF as the source for a cursor, knowing that fnTableColumnInfo is a table-valued function. Instead of having to write the more complex T-SQL that this function uses in each of the four automated procedures, we now only have to reference the UDF in the cursor's declaration. After declaring the cursor, we then open it and fetch the first record into a set of variables that hold the metadata information, which we will use in order to create our new procedure (see Figure 6).

Of course, we will establish a loop, using the WHILE statement, that will continue as long as we have a valid row to retrieve (@@FETCH_STATUS = 0). Now we are ready to parse the column info and create the key segments for our new stored proc.
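The cursor skeleton can be sketched as follows. To keep the example self-contained it reads from INFORMATION_SCHEMA.COLUMNS rather than from fnTableColumnInfo, whose column list is not shown in this article; the cursor and variable names are likewise assumptions.

```sql
DECLARE @sTableName  varchar(128),
        @sColumnName varchar(128),
        @sTypeName   varchar(128)

SET @sTableName = 'Order_Details'

-- Stand-in source; the article's procedures select from fnTableColumnInfo here.
DECLARE crColumns CURSOR LOCAL FAST_FORWARD FOR
    SELECT COLUMN_NAME, DATA_TYPE
    FROM   INFORMATION_SCHEMA.COLUMNS
    WHERE  TABLE_NAME = @sTableName
    ORDER BY ORDINAL_POSITION

OPEN crColumns
FETCH NEXT FROM crColumns INTO @sColumnName, @sTypeName

WHILE (@@FETCH_STATUS = 0)
BEGIN
    -- The real procedures build the parameter list and clauses here.
    PRINT @sColumnName + ' ' + @sTypeName

    FETCH NEXT FROM crColumns INTO @sColumnName, @sTypeName
END

CLOSE crColumns
DEALLOCATE crColumns
```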

In the next code sample, we begin to loop through the cursor and use the column metadata to create the code. You will notice three key variables being modified: @sKeyFields, @sSetClause, and @sWhereClause. The first of these is used to create the parameter list for the stored procedure (inside the CREATE PROC block of Figure 1). The second is used for the SET clause of the UPDATE statement in Figure 1. The last variable is used for the WHERE clause near the end of Figure 1. Now we should examine the first part of this code (see Figure 7).

Figure 7 contains the code that creates the parameter list for our new stored procedure. The first IF statement checks to see if we have already appended data into our variable. If we have, we add a comma and a carriage return/line feed, so that we end each parameter in the parameter list correctly. If we did not do this check, and instead just appended the comma at the end of this segment, we would end up with one too many commas. By appending the comma before adding the next column, we prevent this problem.

Next, we append a tab character, an at symbol (@), the name of the column, a space, and the type name of the column—a simple concatenation of characters and metadata information. Then we have to see if we need additional information about the datatype. We check to see if the type requires information on precision, scale, or length. If any is required, we additionally append the value(s) wrapped in parentheses (as required by T-SQL syntax).

Finally, if the column is not an identity column, and the column accepts null values or is a timestamp (which cannot be updated explicitly because SQL Server maintains it automatically), then we add "= NULL" to the parameter definition. For example, the parameters generated for the columns of the discounts table in the pubs database would look like this:

    @discounttype varchar(40),
    @stor_id char(4) = NULL,
    @lowqty smallint = NULL,
    @highqty smallint = NULL,
    @discount decimal(4, 2)
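The parameter-list steps described above can be sketched as follows. Figure 7 is not reproduced here, so this is an approximation: the table variable stands in for the metadata function and is filled by hand with the discounts columns, and every name except @sKeyFields is an assumption.

```sql
-- Hand-entered metadata standing in for the column-info UDF.
DECLARE @tColumns TABLE (
    nColumnID int, sColumnName varchar(128), sTypeName varchar(128),
    nLength int, nPrecision int, nScale int,
    bNullable bit, bIdentity bit, nAlternateType int)

INSERT @tColumns VALUES (1, 'discounttype', 'varchar',  40,   NULL, NULL, 0, 0, 1)
INSERT @tColumns VALUES (2, 'stor_id',      'char',     4,    NULL, NULL, 1, 0, 1)
INSERT @tColumns VALUES (3, 'lowqty',       'smallint', NULL, NULL, NULL, 1, 0, 0)
INSERT @tColumns VALUES (4, 'highqty',      'smallint', NULL, NULL, NULL, 1, 0, 0)
INSERT @tColumns VALUES (5, 'discount',     'decimal',  NULL, 4,    2,    0, 0, 2)

DECLARE @sKeyFields varchar(2000), @sTab char(1), @sCRLF char(2),
        @sColumnName varchar(128), @sTypeName varchar(128),
        @nLength int, @nPrecision int, @nScale int,
        @bNullable bit, @bIdentity bit, @nAlternateType int

SET @sKeyFields = ''
SET @sTab  = CHAR(9)
SET @sCRLF = CHAR(13) + CHAR(10)

DECLARE crColumns CURSOR LOCAL FAST_FORWARD FOR
    SELECT sColumnName, sTypeName, nLength, nPrecision, nScale,
           bNullable, bIdentity, nAlternateType
    FROM   @tColumns
    ORDER BY nColumnID

OPEN crColumns
FETCH NEXT FROM crColumns INTO @sColumnName, @sTypeName, @nLength, @nPrecision,
                               @nScale, @bNullable, @bIdentity, @nAlternateType
WHILE (@@FETCH_STATUS = 0)
BEGIN
    -- Comma goes before every parameter except the first.
    IF (@sKeyFields <> '')
        SET @sKeyFields = @sKeyFields + ',' + @sCRLF

    SET @sKeyFields = @sKeyFields + @sTab + '@' + @sColumnName + ' ' + @sTypeName

    -- Alternate type 1 needs a length; 2 needs precision and scale.
    IF (@nAlternateType = 1)
        SET @sKeyFields = @sKeyFields + '(' + CAST(@nLength AS varchar(10)) + ')'
    ELSE IF (@nAlternateType = 2)
        SET @sKeyFields = @sKeyFields + '(' + CAST(@nPrecision AS varchar(10))
                        + ', ' + CAST(@nScale AS varchar(10)) + ')'

    -- Nullable (or timestamp) non-identity columns become optional parameters.
    IF (@bIdentity = 0 AND (@bNullable = 1 OR @sTypeName = 'timestamp'))
        SET @sKeyFields = @sKeyFields + ' = NULL'

    FETCH NEXT FROM crColumns INTO @sColumnName, @sTypeName, @nLength, @nPrecision,
                                   @nScale, @bNullable, @bIdentity, @nAlternateType
END

CLOSE crColumns
DEALLOCATE crColumns

PRINT @sKeyFields
```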

 

Note that the discounts table has no primary key and would not allow proper automated code generation. These automated procedures rely on a primary key to determine how data will be updated. The automated procedure code could be modified to use all columns in the new procedure's WHERE clause if no primary key existed, or to look for a unique key and use its columns for the WHERE clause. In other words, tables should have primary keys if at all possible—it's a basic principle of database design.

Next, take a look at the code that creates the SET statement clause for our new procedure's UPDATE statement (see Figure 8). Notice how we only process columns that are not part of the primary key. Again, if you want to be able to update all columns, including those in the primary key, you could simply remove the IF statement. (Toggling this If logic on and off can be an added feature controlled by yet another optional parameter.) Like the last segment you saw, we append a comma to our variable, as needed. In this case, however, if the variable has no data (meaning we haven't added any columns yet), we put the SET clause in our variable to start it off.

Next, we append a tab character, the column name to be updated, the string " = @", and the column name again. For our Order_Details table, we would end up with this:

    SET    UnitPrice = @UnitPrice,
           Quantity = @Quantity,
           Discount = @Discount

 

Next, we have the code that creates the WHERE clause for the new procedure. You will notice the code segment starts with an ELSE statement. This is the else condition of the primary key check—meaning we only run this code if the column is part of the primary key (see Figure 9).

Again, we either start the variable with the WHERE clause, or we append an AND clause, depending on whether or not it is the first item in the WHERE clause. Next, we append a tab character, the column name, the string " = @" and again, the column name. The results for our Order_Details example would look like this:

    WHERE  OrderID = @OrderID
    AND    ProductID = @ProductID
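The SET and WHERE branches can be sketched together, since they are the two halves of one IF...ELSE on the primary key flag. Figures 8 and 9 are not reproduced here; the table variable is filled by hand with the Order_Details columns, and all names other than @sSetClause and @sWhereClause are assumptions.

```sql
-- Hand-entered metadata: column name plus a primary key flag.
DECLARE @tColumns TABLE (nColumnID int, sColumnName varchar(128), bPrimaryKey bit)

INSERT @tColumns VALUES (1, 'OrderID',   1)
INSERT @tColumns VALUES (2, 'ProductID', 1)
INSERT @tColumns VALUES (3, 'UnitPrice', 0)
INSERT @tColumns VALUES (4, 'Quantity',  0)
INSERT @tColumns VALUES (5, 'Discount',  0)

DECLARE @sSetClause varchar(2000), @sWhereClause varchar(2000),
        @sColumnName varchar(128), @bPrimaryKey bit,
        @sTab char(1), @sCRLF char(2)

SET @sSetClause   = ''
SET @sWhereClause = ''
SET @sTab  = CHAR(9)
SET @sCRLF = CHAR(13) + CHAR(10)

DECLARE crColumns CURSOR LOCAL FAST_FORWARD FOR
    SELECT sColumnName, bPrimaryKey FROM @tColumns ORDER BY nColumnID

OPEN crColumns
FETCH NEXT FROM crColumns INTO @sColumnName, @bPrimaryKey

WHILE (@@FETCH_STATUS = 0)
BEGIN
    IF (@bPrimaryKey = 0)
    BEGIN
        -- Non-key column: start the SET clause or continue it with a comma.
        IF (@sSetClause = '')
            SET @sSetClause = 'SET'
        ELSE
            SET @sSetClause = @sSetClause + ',' + @sCRLF

        SET @sSetClause = @sSetClause + @sTab + @sColumnName + ' = @' + @sColumnName
    END
    ELSE
    BEGIN
        -- Key column: start the WHERE clause or continue it with AND.
        IF (@sWhereClause = '')
            SET @sWhereClause = 'WHERE'
        ELSE
            SET @sWhereClause = @sWhereClause + @sCRLF + 'AND'

        SET @sWhereClause = @sWhereClause + @sTab + @sColumnName + ' = @' + @sColumnName
    END

    FETCH NEXT FROM crColumns INTO @sColumnName, @bPrimaryKey
END

CLOSE crColumns
DEALLOCATE crColumns

-- Assemble and print the UPDATE statement.
PRINT 'UPDATE Order_Details' + @sCRLF + @sSetClause + @sCRLF + @sWhereClause
```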

 

Before ending the WHILE loop, we need to fetch the next row from the cursor and place the metadata values into our variables again. Once the loop completes, we close and de-allocate the cursor. Now we have all the information needed to print and optionally create our new run-time procedure (see Figure 10).

Finally, the design-time stored procedure will print the T-SQL it created for the new run-time stored procedure by first adding a carriage return/line feed to the SET clause (for formatting purposes only). Next, we add our key fields (our procedure's parameters) and the keyword AS (required for a stored procedure's definition). Then, the UPDATE statement is appended, along with the name of the table that is to be updated. Finally, we append the SET clause variable and the WHERE clause variable, completing our procedure definition. As noted, the @sProcText variable that contains the T-SQL for the new run-time stored procedure can optionally be executed, and if executed, the new run-time stored procedure will be added to the database schema.


Conclusion

This is only one of the four automated stored procedures that we developed. Of course, each procedure will vary as required. For example, the design-time procedure that creates the run-time delete procedures only uses the primary key fields of each table, since that is all that is required to delete a row in any table. The source for all the user-defined functions and stored procedures is available for download at the link at the top of this article.

Many other features can be added to these procedures, some of which we mentioned already, such as the ability to accommodate bracketed object names, verification of object existence, creation of an audit trail, and use of an alter statement if the generated procedure already exists (great for maintaining security of the generated procedures and incorporating XML). You can also create additional settings in a table and use those to assist in the code generation. In other words, these procedures can be the starting point for other automated code generation tasks, or they may suit your needs as is. In either case, this code should help you save a lot of time and effort, and may even help you explore other interesting T-SQL techniques.


Code download available at: StoredProcedures.exe (108KB)
