Reposted from: http://msdn.microsoft.com/en-us/library/cc895212.aspx
With the 2008 release, SQL Server Integration Services (SSIS) continues its advance in the enterprise data integration arena. Integration Services offers an entire architecture that combines the required elements for building solutions that provide high scalability and performance.
A well-architected data extraction, transformation and load (ETL) system should be able to respond to changes in the environment or to other external factors, ideally without editing the source code. Typical examples of these changing factors are:
Unfortunately, data integration practitioners often fail to build such flexibility into their solutions, which affects the reliability of the system and the cost of maintaining and deploying it.
The good news is that SSIS comes with a range of options that help you when building solutions that respond favorably to these scenarios. These options are:
In essence, these three methods let you modify the values of package properties (such as connection strings, variable values, and network drive paths) each time you run the package, without the need to edit the package in Business Intelligence Development Studio (BIDS). Because package configurations are the most versatile, but perhaps the most complex, of these methods, the remainder of this article focuses on explaining the main concepts and considerations for their implementation.
When you add configurations to a package, you are basically exposing package properties and allowing them to be updated with a new value that comes from a file, table, environment variable, registry key or parent package variable. Package configurations are disabled by default, and have to be enabled on each package in which you want to use them. You can enable package configurations by choosing the Package Configurations option on the SSIS menu as shown in Figure 1.
Figure 1. The SSIS menu and the Package Configurations option.
After the Package Configurations Organizer is visible, you have to check the box next to the Enable Package Configurations option. See Figure 2 for more details.
Figure 2. The Package Configurations Organizer.
From the Package Configurations Organizer, you can launch the Package Configuration Wizard to create and edit configurations; you can also remove configurations and set the order in which they are applied. The functionality of the Package Configuration Wizard is covered in the following section.
Package configurations are available in Integration Services in both SQL Server 2005 and 2008. While they offer identical options, they differ in the order in which configurations are applied to the package at execution time.
In SQL Server 2005 Integration Services, all package configurations, except Parent Package Variable configurations, are applied before applying the options specified with the DTExec command. Parent Package Variables are applied after applying DTExec options.
In SQL Server 2008 Integration Services, package configurations are applied twice, before and after applying the options of the DTExec command prompt utility. This should be seen as an improvement, since you can now use the /SET, /CONF or /CONN options of the DTExec utility to alter the original definition of the configurations. For example, you can now use the /CONN option to alter the connection manager being used in SQL Server configurations - something that was not possible with the 2005 version.
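As a rough illustration of the DTExec options mentioned above, the following Python sketch assembles a command line that points a package at a configuration file with /CONF and overrides one property with /SET. The package and file names are hypothetical, the helper function is not part of SSIS, and the quoting rules for values that contain spaces or semicolons should be checked against the dtexec documentation before running the resulting command.

```python
def build_dtexec_args(package_file, config_file=None, set_values=None):
    """Return a dtexec argument list (a sketch; nothing is executed here)."""
    args = ["dtexec", "/F", package_file]
    if config_file is not None:
        # /CONF names a configuration file to apply at run time.
        args += ["/CONF", config_file]
    for property_path, value in (set_values or {}).items():
        # /SET takes the property path and the new value, separated by ';'.
        args += ["/SET", f"{property_path};{value}"]
    return args


# Hypothetical package, file, and value names, used only for illustration.
args = build_dtexec_args(
    "LoadSales.dtsx",
    config_file="LoadSales.dtsConfig",
    set_values={
        r"\Package.Variables[User::ConfigTarget].Properties[Value]": "FromSet"
    },
)
print(" ".join(args))
```

Building the arguments as a list keeps each option and its value together, which makes it easier to review what will be overridden before the package is actually run.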
Package configurations can fail when they are being applied. When that happens, warning messages are generated, the values stored inside the package (the design-time values) are used, and the package is executed. Including logging capabilities in your packages is a good way to capture failures in package configurations.
You can save some development time by using package templates. If you anticipate creating a large number of packages that use a common set of package configurations, you can simply include the required package configurations in a template, and then use the template when creating new packages.
The data flow task in SSIS does not allow changes to the pipeline structure at run time (number of columns, column names, and data types), and package configurations should not be used to attempt such changes in dataflow pipeline metadata, since it would cause validation errors in the package.
There are five different types of configurations you can use in SSIS, and while all of them serve the same purpose of updating the value of package properties, their actual behavior and implementation differ from each other. The package configuration types are:
Each configuration type, except the environment variable type, offers a direct and an indirect method for specifying the location of the configuration information.
With this type of configuration, the configuration value and the path of the property being configured are saved in an XML file. Here is a sample of a very simple XML configuration file:
<?xml version="1.0" ?>
<DTSConfiguration>
  <DTSConfigurationHeading>
    <DTSConfigurationFileInfo GeneratedBy="MARINER\rsalas" GeneratedFromPackageName="Test XML Config - MSDN Article" GeneratedFromPackageID="{E108046D-9862-443B-B98A-4DCC342AE239}" GeneratedDate="6/29/2008 11:30:12 AM" />
  </DTSConfigurationHeading>
  <Configuration ConfiguredType="Property" Path="\Package.Variables[User::ConfigTarget].Properties[Value]" ValueType="String">
    <ConfiguredValue>Run time Value from config file - direct Method</ConfiguredValue>
  </Configuration>
</DTSConfiguration>
An XML configuration file has two parts. The header contains metadata about the file itself, such as its creator, the name and ID of the package that was used when creating the file, and the creation date and time. The configuration section is where the paths to the properties being updated and the configuration values to be used are stored. In the example above, there is only one configuration entry, whose Path attribute points to the Value property of a variable declared at the package level called [User::ConfigTarget]:
Path="\Package.Variables[User::ConfigTarget].Properties[Value]"
And the value of the variable will be updated using the string contained in the ConfiguredValue element:
<ConfiguredValue>Run time Value from config file - direct Method</ConfiguredValue>
The good news is that you do not need to create the XML file manually, since the Package Configuration Wizard can create it for you.
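If you want to inspect a configuration file outside of BIDS, its structure is simple enough to read with a general-purpose XML parser. The following Python sketch lists the configuration entries of the sample shown earlier; the list_entries helper is hypothetical and the sample header is shortened for brevity.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample configuration file shown earlier.
SAMPLE = r"""<DTSConfiguration>
  <DTSConfigurationHeading>
    <DTSConfigurationFileInfo GeneratedBy="MARINER\rsalas" />
  </DTSConfigurationHeading>
  <Configuration ConfiguredType="Property"
      Path="\Package.Variables[User::ConfigTarget].Properties[Value]"
      ValueType="String">
    <ConfiguredValue>Run time Value from config file - direct Method</ConfiguredValue>
  </Configuration>
</DTSConfiguration>"""


def list_entries(xml_text):
    """Return (path, value type, configured value) for each entry."""
    root = ET.fromstring(xml_text)
    return [
        (c.get("Path"), c.get("ValueType"), c.findtext("ConfiguredValue"))
        for c in root.findall("Configuration")
    ]


entries = list_entries(SAMPLE)
for path, value_type, value in entries:
    print(path, value_type, value, sep=" | ")
```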
Let’s walk through an example to demonstrate how you can use package configurations to change the value of a variable.
First, create a new package, add a variable of string type called ConfigTargetObject and assign the string “Development time Value” as its value as shown in Figure 3.
Figure 3. Creating a new package variable.
Then, enable package configurations by going to the SSIS menu and choosing the Package Configurations option. After the Package Configurations Organizer opens, check the Enable Package Configurations option, and click the Add button to open the Package Configuration Wizard as shown in Figure 4.
Figure 4. The Select Configuration Type page of the Package Configuration Wizard.
On the first page of the wizard, you can choose the configuration type, which in this case is an XML configuration file. Then you have two options for providing the location of the configuration file.
For this example we have chosen the direct method.
On the next page of the wizard, look for the variable we created at the beginning of this example, and check the box next to the Value property as illustrated in Figure 5.
Figure 5. The Select Properties to Export page of the Package Configuration Wizard.
On this screen, you can select multiple properties, because XML configuration files can contain multiple configuration entries. Click the Next button one more time to see a summary of the configuration entry and provide a name, as in Figure 6.
Figure 6. The Completing the Wizard page of the Package Configuration Wizard.
When you click the Finish button, the configuration file is created and a new entry appears in the Package Configurations Organizer, as shown in Figure 7. Notice that the file is created by exporting the values inside the package at the time the wizard runs. This means that you still have to edit the XML file and adjust the configuration values according to your needs. In this example, we provide “Run-time value from config file” as the new value for the variable being configured. To do that, open the file in a text editor and change the line:
<ConfiguredValue>Development time Value</ConfiguredValue>
To:
<ConfiguredValue>Run-time value from config file</ConfiguredValue>
Figure 7. The Package Configurations Organizer after creating an XML configuration file.
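The manual edit described in this walkthrough can also be scripted, which may be convenient when the same file has to be adjusted for several environments. The Python sketch below replaces the ConfiguredValue of one entry, identified by its Path attribute. The set_configured_value helper is hypothetical, and the property path is an assumption that follows the pattern of the earlier sample, using the ConfigTargetObject variable from this walkthrough.

```python
import xml.etree.ElementTree as ET


def set_configured_value(xml_text, property_path, new_value):
    """Return the XML text with the matching entry's value replaced."""
    root = ET.fromstring(xml_text)
    updated = 0
    for config in root.findall("Configuration"):
        if config.get("Path") == property_path:
            node = config.find("ConfiguredValue")
            if node is None:
                # Create the element if the entry has no value yet.
                node = ET.SubElement(config, "ConfiguredValue")
            node.text = new_value
            updated += 1
    if updated == 0:
        raise KeyError(f"No configuration entry for {property_path}")
    return ET.tostring(root, encoding="unicode")


# Assumed path for the walkthrough variable; header omitted for brevity.
PATH = r"\Package.Variables[User::ConfigTargetObject].Properties[Value]"
ORIGINAL = (
    "<DTSConfiguration>"
    '<Configuration ConfiguredType="Property" Path="' + PATH + '" ValueType="String">'
    "<ConfiguredValue>Development time Value</ConfiguredValue>"
    "</Configuration>"
    "</DTSConfiguration>"
)

edited = set_configured_value(ORIGINAL, PATH, "Run-time value from config file")
print(edited)
```

To work on a file rather than a string, the same logic could be applied to the tree returned by ET.parse and saved with its write method.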
Now that you have seen how to create an XML file configuration, let’s list some important aspects that will help you get the most out of this type of configuration:
Figure 8. The prompt seen when reusing an existing configuration file.
With this type of package configuration, you have to create an environment variable for each package property you intend to update, and store the configuration value as the value of the environment variable. As you can see in Figure 9, the indirect method is not available when you use this type of configuration; it would not make much sense, because the indirect method is itself based on environment variables.
Figure 9. Creating a new Environment variable configuration.
On the first page of the wizard, you have to choose the environment variable to be used from a drop-down list. Then, the next page, shown in Figure 10, lets you choose which object property is going to be affected by the configuration you are creating. Notice that selecting multiple properties is not allowed for this type of configuration.
Figure 10. Setting a single value for an environment variable configuration.
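Because each environment variable carries exactly one configuration value, a script that launches a package can prepare the variables before the package starts. The Python sketch below only builds the environment; the variable name SSIS_CONFIG_TARGET and the helper are hypothetical, and it assumes that the process running the package reads the variable from the environment it was started with.

```python
import os


def environment_for_package(overrides, base=None):
    """Return a copy of the environment with configuration values added."""
    env = dict(os.environ if base is None else base)
    # Environment variable values are always strings.
    env.update({name: str(value) for name, value in overrides.items()})
    return env


# Hypothetical variable name that an environment variable configuration
# in the package would refer to.
base_env = {"PATH": "/usr/bin"}
env = environment_for_package({"SSIS_CONFIG_TARGET": "Run-time value"}, base=base_env)
print(env["SSIS_CONFIG_TARGET"])

# A launcher could then pass this mapping to the child process, for example:
# subprocess.run(["dtexec", "/F", "LoadSales.dtsx"], env=env)
```

Copying the environment instead of modifying os.environ keeps the override limited to the launched process.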
Now, let’s review some considerations you should keep in mind when using this type of package configuration:
This configuration type lets you store configuration values in Windows registry entries, in much the same way as environment variable configurations do. Figure 11 shows the first page of the configuration wizard when you select registry entry as the type to be used.
Figure 11. Creating a new Registry entry configuration.
After you select the registry entry configuration type from the drop-down list, you have to choose the method to be used. The first option is the direct method, where the wizard expects a valid registry key name that exists under the Windows registry HKEY_CURRENT_USER key. The second option is the indirect method, where you provide the name of an environment variable that in turn contains the registry key name to be used by the configuration. The indirect method gives you the flexibility to change the name of the registry key, or to point to a different one, by updating the environment variable value.
Let’s see a couple of examples of the value expected by the package configuration wizard in the registry entry field. If you want to use a registry key that exists directly under HKEY_CURRENT_USER, as in Figure 12, the expected value is:
SSISPkgConfig
Figure 12. A sample registry entry configuration.
If you create a registry key to be used by the configuration that is not directly under HKEY_CURRENT_USER key, as shown in Figure 13, then the wizard expects this value:
SSISPkgConfig\config1
Figure 13. A sample of a nested registry entry configuration.
On the next page of the wizard, select the object property you want to update through the configuration, as shown in Figure 14.
Figure 14. Setting a single value for a registry entry configuration.
There are a few other things you need to keep in mind when using this type of configuration:
When you execute a package (the child) from another package (the parent) via the Execute Package task, you can use Parent Package Variable configurations in the child package to pass variable values from the parent.
Despite its name, this configuration type has to be set up in the child package. In the Package Configuration Wizard, in the child package, you have to specify the name of the variable (that exists in the parent) that holds the desired configuration value, as shown in Figure 15.
Figure 15. Creating a new Parent package variable configuration.
Notice that the child package is unaware of the existence of the parent package, and the name of the variable that you enter is not validated when you create the configuration. When using the direct method, you have to type the variable name exactly as it appears in the parent package. Alternatively, you can select an environment variable that contains the name of the parent package variable, thus adding the flexibility the indirect method offers.
The next page of the wizard allows you to select the property to be updated, in the same way as when using registry entry or environment variable configurations.
Finally, let’s review some considerations and facts that are relevant when working with parent package variable configurations:
This configuration type offers almost the same level of flexibility and functionality as XML configuration files, with the difference that configuration information is stored in a SQL Server table. The table can be created in any database that is accessible by the package at execution time. You can use the Package Configuration Wizard to create the table. This is the default structure of the table:
CREATE TABLE [dbo].[SSIS Configurations]
(
    ConfigurationFilter NVARCHAR(255) NOT NULL,
    ConfiguredValue NVARCHAR(255) NULL,
    PackagePath NVARCHAR(255) NOT NULL,
    ConfiguredValueType NVARCHAR(20) NOT NULL
)
These fields are used as follows:
When you create SQL Server package configurations, you first have to choose which method to use to provide the connection information for the configuration table, as Figure 16 shows.
Figure 16. Creating a new SQL Server configuration.
With the direct method, the connection information, configuration table and filter are stored inside of the package. The indirect method instead allows storing that information in an environment variable. Notice that both methods use an SSIS connection manager, and its connection string is hard-coded inside of the package. Therefore, if you use this configuration type, it is a good practice to ensure that the connection string in this connection manager can be updated from an external source. A common approach is to use a separate configuration (XML, Registry Key or environment variable) to update the connection manager when required.
The next page in the configuration wizard lets you choose the set of properties to be targeted by the configuration being created. As you can see in Figure 17, selecting multiple properties is allowed.
Figure 17. Setting multiple values for SQL Server configurations.
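To make the role of the configuration table more concrete, the Python sketch below stores two entries and retrieves the ones that belong to a given filter. SQLite is used here purely as a stand-in for SQL Server so that the sketch runs anywhere; the column names follow the default table structure shown earlier, while the filter names, property paths, and helper functions are hypothetical.

```python
import sqlite3


def create_config_table(conn):
    # Same columns as the default SSIS table, with SQLite types.
    conn.execute(
        """CREATE TABLE "SSIS Configurations" (
               ConfigurationFilter TEXT NOT NULL,
               ConfiguredValue TEXT,
               PackagePath TEXT NOT NULL,
               ConfiguredValueType TEXT NOT NULL)"""
    )


def add_entry(conn, config_filter, value, package_path, value_type="String"):
    conn.execute(
        'INSERT INTO "SSIS Configurations" VALUES (?, ?, ?, ?)',
        (config_filter, value, package_path, value_type),
    )


def entries_for_filter(conn, config_filter):
    """Return (PackagePath, ConfiguredValue) pairs for one filter."""
    rows = conn.execute(
        'SELECT PackagePath, ConfiguredValue FROM "SSIS Configurations" '
        "WHERE ConfigurationFilter = ? ORDER BY PackagePath",
        (config_filter,),
    )
    return rows.fetchall()


conn = sqlite3.connect(":memory:")
create_config_table(conn)
# Hypothetical filters and property paths.
add_entry(conn, "LoadSales", "Run-time value",
          r"\Package.Variables[User::ConfigTarget].Properties[Value]")
add_entry(conn, "LoadCustomers", "Other value",
          r"\Package.Variables[User::ConfigTarget].Properties[Value]")
print(entries_for_filter(conn, "LoadSales"))
```

Each configuration defined in a package refers to one filter, so several packages can share a single table while reading only their own rows.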
Now, let’s go through some important considerations that will help you to understand this configuration type better, and to avoid common implementation issues:
"ConfigurationManagerName";"Schema.ConfigurationTableName";"ConfigurationFilter"
As with the direct method, you have to account for extra logic if you need to modify the connection string inside of "ConfigurationManagerName".
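The three-part value used by the indirect method can be assembled by a small helper when environment variables are provisioned by a script. The Python sketch below follows the format shown above; the function and the sample names are hypothetical, and it simply rejects parts that contain a double quote rather than guessing at an escaping rule.

```python
def sql_config_string(connection_manager, table, config_filter):
    """Build the "manager";"table";"filter" value for the indirect method."""
    parts = (connection_manager, table, config_filter)
    for part in parts:
        if '"' in part:
            # No escaping rule is assumed here, so refuse embedded quotes.
            raise ValueError(f"Unexpected double quote in {part!r}")
    return ";".join(f'"{part}"' for part in parts)


# Hypothetical connection manager and filter names.
value = sql_config_string("ConfigDB", "[dbo].[SSIS Configurations]", "LoadSales")
print(value)
```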
Package configuration is the natural way to parameterize Integration Services packages and to put your ETL solution in a better position to seamlessly respond to possible changes in the environment. With five types and two methods available, package configuration is a sophisticated mechanism that can be combined in a number of ways, and the time invested in understanding its behavior and the options available is well worth it.
About the author. Rafael Salas is a Senior Consultant at Mariner, a BI-focused consulting firm, where he specializes in helping organizations improve performance through Business Intelligence and Data Warehousing solutions. He has been a SQL Server evangelist since he started using the 2005 CTP. He is a SQL Server MVP, MCTS, and an active member of the user communities, where he provides guidance on the use of the SQL Server tools.