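10. There are also questions such as the different ways to load files: SSIS, Bulk Insert, and so on.
1. How to use variables
2. How to set up an ETL architecture
3. Error handling
4. Character conversion handling
5. How ETL works together with MDM and DQS
6. How to run SSAS processing through SSIS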
Senior Level:
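Architecture for optimal ETL performance: which components and tasks should be avoided, and which alternatives improve ETL performance in which situations.
Have you ever written data cleansing software? Describe it in detail, including the algorithms used.
Differences between the 2012 and 2008 versions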
Dynamic parameterization for different connections
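Optimization: what is the largest data volume you have run, and how long did it take?
Which components have the greatest impact on performance?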
Question 1 - True or False - Using a checkpoint file in SSIS is just like issuing the CHECKPOINT command against the relational engine. It commits all of the data to the database.
False. SSIS provides a Checkpoint capability which allows a package to restart at the point of failure.
Question 2 - Can you explain what the Import\Export tool does and the basic steps in the wizard?
The Import\Export tool is accessible via BIDS or executing the dtswizard command.
The tool identifies a data source and a destination in order to move data within one database, between instances, or from a database to a file (or vice versa).
Question 3 - What are the command line tools to execute SQL Server Integration Services packages?
DTEXECUI - When this command line tool is run, a user interface is loaded in order to configure each of the applicable parameters to execute an SSIS package.
DTEXEC - This is a pure command line tool where all of the needed switches must be passed into the command for successful execution of the SSIS package.
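As a rough illustration of passing every switch on the command line, the sketch below assembles a dtexec argument list in Python. The helper name build_dtexec_args, the package path and the variable name are hypothetical; the switches shown (/File, /Set, /CheckPointing, /CheckFile) are standard dtexec options. The sketch only builds the arguments and does not run anything.

```python
def build_dtexec_args(package_path, variables=None, checkpoint_file=None):
    """Return the argument list for running a file-system package with dtexec."""
    args = ["dtexec", "/File", package_path]
    for name, value in (variables or {}).items():
        # /Set expects "propertyPath;value"; this path targets a user variable.
        prop = rf"\Package.Variables[User::{name}].Properties[Value]"
        args += ["/Set", f"{prop};{value}"]
    if checkpoint_file:
        # Turn checkpointing on and point the run at a checkpoint file.
        args += ["/CheckPointing", "on", "/CheckFile", checkpoint_file]
    return args


# Hypothetical package path and variable, for illustration only.
args = build_dtexec_args(r"C:\ssis\LoadSales.dtsx", {"BatchDate": "2012-01-31"})
print(" ".join(args))
```

On a machine with SSIS installed, the list could be handed to subprocess.run(args, check=True). Values that contain spaces would need quoting when the command is typed by hand.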
Question 4 - Can you explain the SQL Server Integration Services functionality in Management Studio?
You have the ability to do the following:
Login to the SQL Server Integration Services instance
View the SSIS log
View the packages that are currently running on that instance
Browse the packages stored in MSDB or the file system
Import or export packages
Delete packages
Run packages
Question 5 - Can you name some of the core SSIS components in the Business Intelligence Development Studio you work with on a regular basis when building an SSIS package?
Connection Managers
Control Flow
Data Flow
Event Handlers
Variables window
Toolbox window
Output window
Logging
Package Configurations
Question Difficulty = Moderate
Question 1 - True or False: SSIS has a default means to log all records updated, deleted or inserted on a per table basis.
False, but a custom solution can be built to meet these needs.
Question 2 - What is a breakpoint in SSIS? How is it set up? How do you disable it?
A breakpoint is a stopping point in the code. The breakpoint gives the Developer\DBA an opportunity to review the status of the data, the variables and the overall status of the SSIS package.
10 unique conditions exist for each breakpoint.
Breakpoints are set up in BIDS. In BIDS, navigate to the control flow interface. Right click on the object where you want to set the breakpoint and select the 'Edit Breakpoints...' option. To disable a breakpoint, clear its check box in the same dialog.
Question 3 - Can you name 5 or more of the native SSIS connection managers?
OLEDB connection - Used to connect to any data source requiring an OLEDB connection (e.g., SQL Server 2000)
Flat file connection - Used to make a connection to a single file in the File System. Required for reading information from a File System flat file
ADO.Net connection - Uses the .Net Provider to make a connection to SQL Server 2005 or other connection exposed through managed code (like C#) in a custom task
Analysis Services connection - Used to make a connection to an Analysis Services database or project. Required for the Analysis Services DDL Task and Analysis Services Processing Task
File connection - Used to reference a file or folder. The options are to either use or create a file or folder
Excel
FTP
HTTP
MSMQ
SMO
SMTP
SQLMobile
WMI
Question 4 - How do you eliminate quotes from being uploaded from a flat file to SQL Server?
In the SSIS package on the Flat File Connection Manager Editor, enter quotes into the Text qualifier field then preview the data to ensure the quotes are not included.
Additional information: How to strip out double quotes from an import file in SQL Server Integration Services
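The effect of the Text qualifier setting can be illustrated outside SSIS. The sketch below uses Python's csv module, where quotechar plays a similar role: the qualifier is stripped from each value, and a delimiter inside a qualified value is kept as data. The sample rows are made up for the illustration.

```python
import csv
import io

# Made-up flat file content: two values contain a comma inside the quotes.
raw = 'id,name,city\n1,"Smith, John","New York"\n2,"Doe, Jane",Boston\n'

# quotechar acts like the Text qualifier: quotes are removed on read and
# commas inside a qualified value do not split the column.
rows = list(csv.reader(io.StringIO(raw), delimiter=",", quotechar='"'))
for row in rows:
    print(row)
```

Without a qualifier, "Smith, John" would be split into two columns and the quote characters would be loaded into the table as part of the data.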
Question 5 - Can you name 5 or more of the main SSIS tool box widgets and their functionality?
For Loop Container
Foreach Loop Container
Sequence Container
ActiveX Script Task
Analysis Services Execute DDL Task
Analysis Services Processing Task
Bulk Insert Task
Data Flow Task
Data Mining Query Task
Execute DTS 2000 Package Task
Execute Package Task
Execute Process Task
Execute SQL Task
etc.
Question Difficulty = Difficult
Question 1 - Can you explain one approach to deploy an SSIS package?
One option is to build a deployment manifest file in BIDS, then copy the directory to the applicable SQL Server then work through the steps of the package installation wizard
A second option is using the dtutil utility to copy, move, delete or verify the existence of an SSIS Package
A third option is to login to SQL Server Integration Services via SQL Server Management Studio, navigate to the 'Stored Packages' folder, then right click on one of the child folders or an SSIS package to access the 'Import Packages...' or 'Export Packages...' option.
A fourth option in BIDS is to navigate to File | Save Copy of Package and complete the interface.
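For the dtutil option, a scripted deployment might look roughly like the sketch below, which builds the argument lists for copying a file-system package into MSDB and for checking that it arrived. The helper names, server name and package names are hypothetical; /FILE, /SQL, /COPY, /EXISTS, /DestServer and /SourceServer are standard dtutil switches. Nothing is executed here.

```python
def build_dtutil_copy(source_file, dest_server, dest_name):
    """Arguments to copy a .dtsx file into MSDB on dest_server."""
    return ["dtutil", "/FILE", source_file,
            "/DestServer", dest_server,
            "/COPY", f"SQL;{dest_name}"]


def build_dtutil_exists(server, package_name):
    """Arguments to check whether a package exists in MSDB on server."""
    return ["dtutil", "/SQL", package_name,
            "/SourceServer", server,
            "/EXISTS"]


# Hypothetical names, for illustration only.
print(" ".join(build_dtutil_copy(r"C:\ssis\LoadSales.dtsx", "PRODSQL01", "LoadSales")))
print(" ".join(build_dtutil_exists("PRODSQL01", "LoadSales")))
```

Keeping the commands in a script makes a deployment repeatable across environments, which the wizard-based options do not offer as easily.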
Question 2 - Can you explain how to set up a checkpoint file in SSIS?
The following items need to be configured on the properties tab for SSIS package:
CheckpointFileName - Specify the full path to the Checkpoint file that the package uses to save the value of package variables and log completed tasks. Rather than using a hard-coded path, it's a good idea to use an expression that concatenates a path defined in a package variable and the package name.
CheckpointUsage - Determines if/how checkpoints are used. Choose from these options: Never (default), IfExists, or Always. Never indicates that you are not using Checkpoints. IfExists is the typical setting and implements the restart at the point of failure behavior. If a Checkpoint file is found it is used to restore package variable values and restart at the point of failure. If a Checkpoint file is not found the package starts execution with the first task. The Always choice raises an error if the Checkpoint file does not exist.
SaveCheckpoints - Choose from these options: True or False (default). You must select True to implement the Checkpoint behavior.
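The restart behaviour these three properties describe can be simulated in a few lines. The sketch below is only a model of the idea, not SSIS itself: run_package and the task names are hypothetical, and the checkpoint is stored as a JSON list of completed task names, whereas a real SSIS checkpoint file is XML and also holds variable values.

```python
import json
import os
import tempfile


def run_package(tasks, checkpoint_file, usage="IfExists", save_checkpoints=True):
    """Run (name, callable) tasks in order; return the names run this time."""
    if usage == "Always" and not os.path.exists(checkpoint_file):
        # Mirrors the rule that Always fails without a checkpoint file.
        raise FileNotFoundError("CheckpointUsage=Always requires a checkpoint file")
    completed = []
    if usage in ("IfExists", "Always") and os.path.exists(checkpoint_file):
        with open(checkpoint_file) as fh:
            completed = json.load(fh)  # tasks finished by the failed run
    executed = []
    for name, action in tasks:
        if name in completed:
            continue  # already done before the failure, so skip it
        action()  # an exception here leaves the checkpoint file in place
        executed.append(name)
        completed.append(name)
        if save_checkpoints:
            with open(checkpoint_file, "w") as fh:
                json.dump(completed, fh)
    if os.path.exists(checkpoint_file):
        os.remove(checkpoint_file)  # a successful run cleans up the file
    return executed


state = {"fail": True}


def load():
    if state["fail"]:
        raise RuntimeError("simulated failure")


tasks = [("Extract", lambda: None), ("Load", load), ("Cleanup", lambda: None)]
chk = os.path.join(tempfile.mkdtemp(), "LoadSales.chk")
try:
    run_package(tasks, chk)   # fails in Load, checkpoint records Extract
except RuntimeError:
    pass
state["fail"] = False
print(run_package(tasks, chk))  # prints ['Load', 'Cleanup']
```

The second run skips Extract because the checkpoint recorded it as complete, which is the IfExists behaviour described above.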
Question 3 - Can you explain different options for dynamic configurations in SSIS?
Use an XML file
Use custom variables
Use a database per environment with the variables
Use a centralized database with all variables
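To make the XML file option more concrete, the sketch below reads property paths and values from a small configuration document shaped like an SSIS .dtsConfig file. The sample content, the variable names and the helper read_configured_values are illustrative assumptions; a file generated by BIDS also carries a heading element that is omitted here.

```python
import xml.etree.ElementTree as ET

# Illustrative configuration content, trimmed to the parts that matter.
DTSCONFIG = r"""<DTSConfiguration>
  <Configuration ConfiguredType="Property"
                 Path="\Package.Variables[User::ServerName].Properties[Value]"
                 ValueType="String">
    <ConfiguredValue>DEVSQL01</ConfiguredValue>
  </Configuration>
  <Configuration ConfiguredType="Property"
                 Path="\Package.Variables[User::BatchSize].Properties[Value]"
                 ValueType="Int32">
    <ConfiguredValue>5000</ConfiguredValue>
  </Configuration>
</DTSConfiguration>"""


def read_configured_values(xml_text):
    """Map each configured property path to its value (as a string)."""
    root = ET.fromstring(xml_text)
    return {c.get("Path"): c.findtext("ConfiguredValue")
            for c in root.iter("Configuration")}


for path, value in read_configured_values(DTSCONFIG).items():
    print(path, "=", value)
```

Keeping one such file per environment (development, test, production) lets the same package run unchanged while only the configured values differ.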
Question 4 - How do you upgrade an SSIS Package?
Depending on the complexity of the package, one or two techniques are typically used:
Recode the package based on the functionality in SQL Server DTS
Use the Migrate DTS 2000 Package wizard in BIDS then recode any portion of the package that is not accurate
Question 5 - Can you name five of the Perfmon counters for SSIS and the value they provide?
SQLServer:SSIS Service
SSIS Package Instances - Total number of simultaneous SSIS Packages running
SQLServer:SSIS Pipeline
BLOB bytes read - Total bytes read from binary large objects during the monitoring period.
BLOB bytes written - Total bytes written to binary large objects during the monitoring period.
BLOB files in use - Number of binary large objects files used during the data flow task during the monitoring period.
Buffer memory - The amount of physical or virtual memory used by the data flow task during the monitoring period.
Buffers in use - The number of buffers in use during the data flow task during the monitoring period.
Buffers spooled - The number of buffers written to disk during the data flow task during the monitoring period.
Flat buffer memory - The total amount of memory, in bytes, used by all flat buffers in the data flow task during the monitoring period.
Flat buffers in use - The number of blocks of memory in use by the data flow task at a point in time.
Private buffer memory - The total amount of physical or virtual memory used by data transformation tasks in the data flow engine during the monitoring period.
Private buffers in use - The number of blocks of memory in use by the transformations in the data flow task at a point in time.
Rows read - Total number of rows produced by the sources in the data flow task.
Rows written - Total number of rows offered to the destinations in the data flow task.