SSIS Incremental Updates

This article is reposted from http://sqlblog.com/blogs/andy_leonard/archive/2007/07/09/ssis-design-pattern-incremental-loads.aspx

Andy Leonard

Andy Leonard is CSO of Linchpin People and SQLPeople, an SSIS Trainer, Consultant, and developer; SQL Server database and data warehouse developer, community mentor, engineer, and farmer. He is a co-author of SQL Server 2012 Integration Services Design Patterns. His background includes web application architecture and development, VB, and ASP. Andy loves the SQL Server Community!

SSIS Design Pattern - Incremental Loads

Introduction

Loading data from a data source to SQL Server is a common task. It's used in Data Warehousing, but increasingly data is being staged in SQL Server for non-Business-Intelligence purposes.

Maintaining data integrity is key when loading data into any database. A common way of accomplishing this is to truncate the destination and reload from the source. While this method ensures data integrity, it also loads a lot of data that was just deleted.

Incremental loads are faster and use fewer server resources. Only new or updated data is touched in an incremental load.

When To Use Incremental Loads

Use incremental loads whenever you need to load data from a data source to SQL Server.

Incremental loads are the same regardless of which database platform or ETL tool you use. You need to detect new and updated rows - and separate these from the unchanged rows.

Incremental Loads in Transact-SQL

I will start by demonstrating this with T-SQL:

0. (Optional, but recommended) Create two databases: a source and destination database for this demonstration:

CREATE DATABASE [SSISIncrementalLoad_Source]

CREATE DATABASE [SSISIncrementalLoad_Dest]

1. Create a source table named tblSource with the columns ColID, ColA, ColB, and ColC; make ColID the primary key:

USE SSISIncrementalLoad_Source
GO

CREATE TABLE dbo.tblSource
(ColID int NOT NULL
,ColA varchar(10) NULL
,ColB datetime NULL constraint df_ColB default (getDate())
,ColC int NULL
,constraint PK_tblSource primary key clustered (ColID))

2. Create a Destination table named tblDest with the columns ColID, ColA, ColB, ColC:

USE SSISIncrementalLoad_Dest
GO

CREATE TABLE dbo.tblDest
(ColID int NOT NULL
,ColA varchar(10) NULL
,ColB datetime NULL
,ColC int NULL)

3. Let's load some test data into both tables for demonstration purposes:

USE SSISIncrementalLoad_Source
GO

-- insert an "unchanged" row
INSERT INTO dbo.tblSource (ColID,ColA,ColB,ColC) VALUES(0, 'A', '1/1/2007 12:01 AM', -1)

-- insert a "changed" row
INSERT INTO dbo.tblSource (ColID,ColA,ColB,ColC) VALUES(1, 'B', '1/1/2007 12:02 AM', -2)

-- insert a "new" row
INSERT INTO dbo.tblSource (ColID,ColA,ColB,ColC) VALUES(2, 'N', '1/1/2007 12:03 AM', -3)

USE SSISIncrementalLoad_Dest
GO

-- insert an "unchanged" row
INSERT INTO dbo.tblDest (ColID,ColA,ColB,ColC) VALUES(0, 'A', '1/1/2007 12:01 AM', -1)

-- insert a "changed" row
INSERT INTO dbo.tblDest (ColID,ColA,ColB,ColC) VALUES(1, 'C', '1/1/2007 12:02 AM', -2)

4. You can view new rows with the following query:

SELECT s.ColID, s.ColA, s.ColB, s.ColC
FROM SSISIncrementalLoad_Source.dbo.tblSource s
LEFT JOIN SSISIncrementalLoad_Dest.dbo.tblDest d ON d.ColID = s.ColID
WHERE d.ColID IS NULL

This should return the "new" row - the one loaded earlier with ColID = 2 and ColA = 'N'. Why? The LEFT JOIN and WHERE clauses are the key. Left Joins return all rows on the left side of the join clause (SSISIncrementalLoad_Source.dbo.tblSource in this case) whether there's a match on the right side of the join clause (SSISIncrementalLoad_Dest.dbo.tblDest in this case) or not. If there is no match on the right side, NULLs are returned. This is why the WHERE clause works: it goes after rows where the destination ColID is NULL. These rows have no match in the LEFT JOIN, therefore they must be new.

This is only an example. You occasionally find database schemas that are this easy to load. Occasionally. Most of the time you have to include several columns in the JOIN ON clause to isolate truly new rows. Sometimes you have to add conditions in the WHERE clause to refine the definition of truly new rows.

Incrementally load the row ("rows" in practice) with the following T-SQL statement:

INSERT INTO SSISIncrementalLoad_Dest.dbo.tblDest (ColID, ColA, ColB, ColC)
SELECT s.ColID, s.ColA, s.ColB, s.ColC
FROM SSISIncrementalLoad_Source.dbo.tblSource s
LEFT JOIN SSISIncrementalLoad_Dest.dbo.tblDest d ON d.ColID = s.ColID
WHERE d.ColID IS NULL
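
The new-row load can also be tried without a SQL Server instance. The following is a minimal sketch in Python, assuming an in-memory SQLite database stands in for the two demo databases (both tables live in one database here, and the dates are stored as ISO text). The helper names `make_demo_db` and `load_new_rows` are hypothetical and only serve the illustration.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory database holding both demo tables (assumption:
    one SQLite database replaces the two SQL Server databases)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE tblSource (ColID INTEGER PRIMARY KEY, ColA TEXT, ColB TEXT, ColC INTEGER);
        CREATE TABLE tblDest (ColID INTEGER NOT NULL, ColA TEXT, ColB TEXT, ColC INTEGER);
        INSERT INTO tblSource VALUES (0, 'A', '2007-01-01 00:01', -1);  -- unchanged
        INSERT INTO tblSource VALUES (1, 'B', '2007-01-01 00:02', -2);  -- changed
        INSERT INTO tblSource VALUES (2, 'N', '2007-01-01 00:03', -3);  -- new
        INSERT INTO tblDest VALUES (0, 'A', '2007-01-01 00:01', -1);
        INSERT INTO tblDest VALUES (1, 'C', '2007-01-01 00:02', -2);
        """
    )
    return conn


# Same anti-join as in the article: source rows with no destination match.
NEW_ROWS_SQL = """
    SELECT s.ColID, s.ColA, s.ColB, s.ColC
    FROM tblSource s
    LEFT JOIN tblDest d ON d.ColID = s.ColID
    WHERE d.ColID IS NULL
"""


def load_new_rows(conn):
    """Insert the new rows and return how many were added."""
    before = conn.total_changes
    conn.execute("INSERT INTO tblDest (ColID, ColA, ColB, ColC) " + NEW_ROWS_SQL)
    return conn.total_changes - before
```

Calling `load_new_rows` a second time on the same connection adds nothing, because every source ColID now has a match in the destination. That repeatability is what makes the pattern safe to schedule.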
5. There are many ways by which people try to isolate changed rows. The only sure-fire way to accomplish it is to compare each field. View changed rows with the following T-SQL statement:

SELECT d.ColID, d.ColA, d.ColB, d.ColC
FROM SSISIncrementalLoad_Dest.dbo.tblDest d
INNER JOIN SSISIncrementalLoad_Source.dbo.tblSource s ON s.ColID = d.ColID
WHERE ( (d.ColA != s.ColA) OR (d.ColB != s.ColB) OR (d.ColC != s.ColC) )

This should return the "changed" row we loaded earlier with ColID = 1 and ColA = 'C'. Why? The INNER JOIN and WHERE clauses are responsible - again. The INNER JOIN goes after rows with matching ColIDs because of the JOIN ON clause. The WHERE clause refines the result set, returning only rows where the ColIDs match and ColA, ColB, or ColC differ. This is important: if there's a difference in any, some, or all of the columns (except ColID), we want to update the row.

Extract-Transform-Load (ETL) theory has a lot to say about when and how to update changed data. You will want to pick up a good book on the topic to learn more about the variations.

To update the data in our destination, use the following T-SQL:
UPDATE d
SET d.ColA = s.ColA
,d.ColB = s.ColB
,d.ColC = s.ColC
FROM SSISIncrementalLoad_Dest.dbo.tblDest d
INNER JOIN SSISIncrementalLoad_Source.dbo.tblSource s ON s.ColID = d.ColID
WHERE ( (d.ColA != s.ColA) OR (d.ColB != s.ColB) OR (d.ColC != s.ColC) )
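
One caveat with comparing columns using `!=` is that a comparison involving NULL evaluates to unknown rather than true, so a row whose only difference is a NULL on one side is not reported as changed. The sketch below illustrates this in Python against an in-memory SQLite database. It is an assumption-laden stand-in, not the author's code: SQLite's `IS NOT` operator serves as the NULL-safe inequality (T-SQL spells this differently), and the row with ColID = 3 is extra test data that is not part of the article.

```python
import sqlite3

# {op} is filled with either "!=" (as in the article) or "IS NOT"
# (SQLite's NULL-safe inequality, used here only for illustration).
CHANGED_SQL = """
    SELECT d.ColID
    FROM tblDest d
    INNER JOIN tblSource s ON s.ColID = d.ColID
    WHERE d.ColA {op} s.ColA OR d.ColB {op} s.ColB OR d.ColC {op} s.ColC
    ORDER BY d.ColID
"""


def make_demo_db():
    """Demo tables plus one hypothetical row (ColID = 3) whose destination ColC is NULL."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE tblSource (ColID INTEGER PRIMARY KEY, ColA TEXT, ColB TEXT, ColC INTEGER);
        CREATE TABLE tblDest (ColID INTEGER NOT NULL, ColA TEXT, ColB TEXT, ColC INTEGER);
        INSERT INTO tblSource VALUES (0, 'A', '2007-01-01 00:01', -1);
        INSERT INTO tblSource VALUES (1, 'B', '2007-01-01 00:02', -2);
        INSERT INTO tblSource VALUES (3, 'X', '2007-01-01 00:04', -4);
        INSERT INTO tblDest VALUES (0, 'A', '2007-01-01 00:01', -1);
        INSERT INTO tblDest VALUES (1, 'C', '2007-01-01 00:02', -2);
        INSERT INTO tblDest VALUES (3, 'X', '2007-01-01 00:04', NULL);
        """
    )
    return conn


def changed_ids(conn, null_safe):
    """Return the ColIDs reported as changed by the chosen comparison."""
    op = "IS NOT" if null_safe else "!="
    return [row[0] for row in conn.execute(CHANGED_SQL.format(op=op))]
```

With this data the plain `!=` comparison reports only ColID 1, while the NULL-safe variant also reports ColID 3. Which behavior is wanted depends on whether NULL-to-value transitions count as changes in the system being loaded.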

Incremental Loads in SSIS

Let's take a look at how you can accomplish this in SSIS using the Lookup Transformation (for the join functionality) combined with the Conditional Split (for the WHERE clause conditions) transformations.

Before we begin, let's reset our database tables to their original state using the following query:

USE SSISIncrementalLoad_Source
GO

TRUNCATE TABLE dbo.tblSource

-- insert an "unchanged" row
INSERT INTO dbo.tblSource (ColID,ColA,ColB,ColC) VALUES(0, 'A', '1/1/2007 12:01 AM', -1)

-- insert a "changed" row
INSERT INTO dbo.tblSource (ColID,ColA,ColB,ColC) VALUES(1, 'B', '1/1/2007 12:02 AM', -2)

-- insert a "new" row
INSERT INTO dbo.tblSource (ColID,ColA,ColB,ColC) VALUES(2, 'N', '1/1/2007 12:03 AM', -3)

USE SSISIncrementalLoad_Dest
GO

TRUNCATE TABLE dbo.tblDest

-- insert an "unchanged" row
INSERT INTO dbo.tblDest (ColID,ColA,ColB,ColC) VALUES(0, 'A', '1/1/2007 12:01 AM', -1)

-- insert a "changed" row
INSERT INTO dbo.tblDest (ColID,ColA,ColB,ColC) VALUES(1, 'C', '1/1/2007 12:02 AM', -2)

Next, create a new project using Business Intelligence Development Studio (BIDS). Name the project SSISIncrementalLoad:

Once the project loads, open Solution Explorer and rename Package1.dtsx to SSISIncrementalLoad.dtsx:

When prompted to rename the package object, click the Yes button. From the toolbox, drag a Data Flow Task onto the Control Flow canvas:

Double-click the Data Flow task to edit it. From the toolbox, drag and drop an OLE DB Source onto the Data Flow canvas:

Double-click the OLE DB Source connection adapter to edit it:

Click the New button beside the OLE DB Connection Manager dropdown:

Click the New button here to create a new Data Connection:

Enter or select your server name. Connect to the SSISIncrementalLoad_Source database you created earlier. Click the OK button to return to the Connection Manager configuration dialog. Click the OK button to accept your newly created Data Connection as the Connection Manager you wish to define. Select "dbo.tblSource" from the Table dropdown:

Click the OK button to complete defining the OLE DB Source Adapter.

Drag and drop a Lookup Transformation from the toolbox onto the Data Flow canvas. Connect the OLE DB connection adapter to the Lookup transformation by clicking on the OLE DB Source and dragging the green arrow over the Lookup and dropping it. Right-click the Lookup transformation and click Edit (or double-click the Lookup transformation) to edit:

When the editor opens, click the New button beside the OLE DB Connection Manager dropdown (as you did earlier for the OLE DB Source Adapter). Define a new Data Connection - this time to the SSISIncrementalLoad_Dest database. After setting up the new Data Connection and Connection Manager, configure the Lookup transformation to connect to "dbo.tblDest":

Click the Columns tab. On the left side are the columns currently in the SSIS data flow pipeline (from SSISIncrementalLoad_Source.dbo.tblSource). On the right side are columns available from the Lookup destination you just configured (from SSISIncrementalLoad_Dest.dbo.tblDest). Follow these steps:

1. We'll need all the columns returned from the destination table, so check all the checkboxes beside the columns in the destination. We need these columns for our WHERE clauses and for our JOIN ON clauses.

2. We do not want to map all the columns between the source and destination - we only want to map the columns named ColID between the database tables. The Mappings drawn between the Available Input Columns and Available Lookup Columns define the JOIN ON clause. Multi-select the Mappings between ColA, ColB, and ColC by clicking on them while holding the Ctrl key. Right-click any of them and click "Delete Selected Mappings" to delete these columns from our JOIN ON clause.

3. Prepend the text "Dest_" to each column's Output Alias. These columns are being appended to the data flow pipeline, and the prefix lets us distinguish between Source and Destination columns farther down the pipeline:

Next we need to modify our Lookup transformation behavior. By default, the Lookup operates as an INNER JOIN - but we need a LEFT (OUTER) JOIN. Click the "Configure Error Output" button to open the "Configure Error Output" screen. On the "Lookup Output" row, change the Error column from "Fail component" to "Ignore failure". This tells the Lookup transformation "If you don't find an INNER JOIN match in the destination table for the Source table's ColID value, don't fail." - which also effectively tells the Lookup "Don't act like an INNER JOIN, behave like a LEFT JOIN":
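
To make the effect of "Ignore failure" concrete, here is a small Python sketch of what the configured Lookup does to each pipeline row. It is only a model of the behavior described above, not SSIS code: the function name `lookup_ignore_failure` and the dictionary-based cache are assumptions made for the illustration.

```python
LOOKUP_COLUMNS = ("ColID", "ColA", "ColB", "ColC")


def lookup_ignore_failure(source_rows, dest_rows, key="ColID", prefix="Dest_"):
    """Append Dest_-prefixed columns to every source row.

    A source row with no match is kept and gets None in every Dest_ column,
    which is the LEFT JOIN behavior. Dropping or failing on such rows would
    correspond to the INNER JOIN default.
    """
    cache = {row[key]: row for row in dest_rows}  # reference table held in memory
    for row in source_rows:
        match = cache.get(row[key])
        out = dict(row)
        for col in LOOKUP_COLUMNS:
            out[prefix + col] = None if match is None else match[col]
        yield out


source = [
    {"ColID": 0, "ColA": "A", "ColB": "2007-01-01 00:01", "ColC": -1},
    {"ColID": 1, "ColA": "B", "ColB": "2007-01-01 00:02", "ColC": -2},
    {"ColID": 2, "ColA": "N", "ColB": "2007-01-01 00:03", "ColC": -3},
]
dest = [
    {"ColID": 0, "ColA": "A", "ColB": "2007-01-01 00:01", "ColC": -1},
    {"ColID": 1, "ColA": "C", "ColB": "2007-01-01 00:02", "ColC": -2},
]

joined = list(lookup_ignore_failure(source, dest))
```

All three source rows survive the lookup. The row with ColID = 2 carries None in `Dest_ColID`, which is exactly what the downstream split needs in order to recognize it as new.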

Click OK to complete the Lookup transformation configuration.

From the toolbox, drag and drop a Conditional Split Transformation onto the Data Flow canvas. Connect the Lookup to the Conditional Split as shown. Right-click the Conditional Split and click Edit to open the Conditional Split Editor:

Expand the NULL Functions folder in the upper right of the Conditional Split Transformation Editor. Expand the Columns folder in the upper left side of the Conditional Split Transformation Editor. Click in the "Output Name" column and enter "New Rows" as the name of the first output. From the NULL Functions folder, drag and drop the "ISNULL( <> )" function to the Condition column of the New Rows condition:

Next, drag Dest_ColID from the columns folder and drop it onto the "<>" text in the Condition column. "New Rows" should now be defined by the condition "ISNULL( [Dest_ColID] )". This defines the WHERE clause for new rows - setting it to "WHERE Dest_ColID Is NULL".

Type "Changed Rows" into a second Output Name column. Add the expression "(ColA != Dest_ColA) || (ColB != Dest_ColB) || (ColC != Dest_ColC)" to the Condition column for the Changed Rows output. This defines our WHERE clause for detecting changed rows - setting it to "WHERE ((Dest_ColA != ColA) OR (Dest_ColB != ColB) OR (Dest_ColC != ColC))". Note "||" is used to convey "OR" in SSIS Expressions:
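
The routing logic of the Conditional Split can be sketched in a few lines of Python. This is an illustrative model under stated assumptions: rows arrive as dictionaries that already carry the `Dest_` columns, and the function name `conditional_split` is hypothetical. Note also that Python's `!=` treats None as an ordinary value, which is not how NULL behaves in SSIS expressions, so NULL handling is out of scope here.

```python
COMPARED_COLUMNS = ("ColA", "ColB", "ColC")


def conditional_split(rows):
    """Send each row to the first output whose condition it satisfies."""
    outputs = {"New Rows": [], "Changed Rows": [], "Unchanged Rows": []}
    for row in rows:
        if row["Dest_ColID"] is None:
            # ISNULL([Dest_ColID]): the lookup found no destination match
            outputs["New Rows"].append(row)
        elif any(row[col] != row["Dest_" + col] for col in COMPARED_COLUMNS):
            # (ColA != Dest_ColA) || (ColB != Dest_ColB) || (ColC != Dest_ColC)
            outputs["Changed Rows"].append(row)
        else:
            # default output, renamed "Unchanged Rows" in the article
            outputs["Unchanged Rows"].append(row)
    return outputs


rows = [
    {"ColID": 0, "ColA": "A", "ColB": "t1", "ColC": -1,
     "Dest_ColID": 0, "Dest_ColA": "A", "Dest_ColB": "t1", "Dest_ColC": -1},
    {"ColID": 1, "ColA": "B", "ColB": "t2", "ColC": -2,
     "Dest_ColID": 1, "Dest_ColA": "C", "Dest_ColB": "t2", "Dest_ColC": -2},
    {"ColID": 2, "ColA": "N", "ColB": "t3", "ColC": -3,
     "Dest_ColID": None, "Dest_ColA": None, "Dest_ColB": None, "Dest_ColC": None},
]

result = conditional_split(rows)
```

The order of the conditions matters. A new row also differs from its all-NULL `Dest_` columns, so the "New Rows" test has to come first to keep such rows out of "Changed Rows".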

Change the "Default output name" from "Conditional Split Default Output" to "Unchanged Rows":

Click the OK button to complete configuration of the Conditional Split transformation.

Drag and drop an OLE DB Destination connection adapter and an OLE DB Command transformation onto the Data Flow canvas. Click on the Conditional Split and connect it to the OLE DB Destination. A dialog will display prompting you to select a Conditional Split Output (those outputs you defined in the last step). Select the New Rows output:

Next connect the OLE DB Command transformation to the Conditional Split's "Changed Rows" output:

Your Data Flow canvas should appear similar to the following:

Configure the OLE DB Destination by aiming at the SSISIncrementalLoad_Dest.dbo.tblDest table:

Click the Mappings item in the list to the left. Make sure the ColID, ColA, ColB, and ColC source columns are mapped to their matching destination columns (aren't you glad we prepended "Dest_" to the destination columns?):

Click the OK button to complete configuring the OLE DB Destination connection adapter.

Double-click the OLE DB Command to open the "Advanced Editor for OLE DB Command" dialog. Set the Connection Manager column to your SSISIncrementalLoad_Dest connection manager:

Click on the "Component Properties" tab. Click the ellipsis (button with "...") beside the SQLCommand property:

The String Value Editor displays. Enter the following parameterized T-SQL statement into the String Value textbox:

UPDATE dbo.tblDest SET ColA = ? ,ColB = ? ,ColC = ? WHERE ColID = ?

The question marks in the previous parameterized T-SQL statement map by ordinal to columns named "Param_0" through "Param_3". Map them as shown below - effectively altering the UPDATE statement for each row to read:

UPDATE SSISIncrementalLoad_Dest.dbo.tblDest
SET ColA = SSISIncrementalLoad_Source.dbo.tblSource.ColA
,ColB = SSISIncrementalLoad_Source.dbo.tblSource.ColB
,ColC = SSISIncrementalLoad_Source.dbo.tblSource.ColC
WHERE ColID = SSISIncrementalLoad_Source.dbo.tblSource.ColID
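
Ordinal parameter mapping is easy to get wrong, so a short illustration may help. The sketch below uses Python's sqlite3 module, whose `?` placeholders are also bound by position. It is an analogy for the OLE DB Command mapping, not the component itself, and the function name `update_changed_rows` is hypothetical.

```python
import sqlite3

UPDATE_SQL = "UPDATE tblDest SET ColA = ?, ColB = ?, ColC = ? WHERE ColID = ?"


def update_changed_rows(conn, changed_rows):
    """Run one UPDATE per changed row, binding values by position."""
    for row in changed_rows:
        # Order must follow the placeholders: ColA, ColB, ColC, then ColID
        # (the counterparts of Param_0 through Param_3).
        params = (row["ColA"], row["ColB"], row["ColC"], row["ColID"])
        conn.execute(UPDATE_SQL, params)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblDest (ColID INTEGER NOT NULL, ColA TEXT, ColB TEXT, ColC INTEGER)")
conn.execute("INSERT INTO tblDest VALUES (1, 'C', '2007-01-01 00:02', -2)")

update_changed_rows(
    conn, [{"ColID": 1, "ColA": "B", "ColB": "2007-01-01 00:02", "ColC": -2}]
)
```

Because the key column is the last placeholder, it has to be the last value supplied. Swapping the order would silently write the wrong values or update the wrong row.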

Note the query is executed on a row-by-row basis. For performance with large amounts of data, you will want to employ set-based updates instead.
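
One common shape for a set-based update is to land the changed rows in a staging table and then apply them with a single statement. The following Python sketch shows that idea against an in-memory SQLite database. The staging table name `stgUpdates` and the function name are assumptions, and the correlated-subquery form is used only because it runs on any SQLite version; in T-SQL the same step would typically be an UPDATE with an INNER JOIN to the staging table.

```python
import sqlite3


def stage_and_apply_updates(conn, changed_rows):
    """Stage the changed rows, then apply them with one UPDATE statement."""
    conn.executemany(
        "INSERT INTO stgUpdates (ColID, ColA, ColB, ColC) VALUES (?, ?, ?, ?)",
        [(r["ColID"], r["ColA"], r["ColB"], r["ColC"]) for r in changed_rows],
    )
    # A single statement touches every staged ColID, instead of one
    # round trip per row.
    conn.execute(
        """
        UPDATE tblDest SET
            ColA = (SELECT u.ColA FROM stgUpdates u WHERE u.ColID = tblDest.ColID),
            ColB = (SELECT u.ColB FROM stgUpdates u WHERE u.ColID = tblDest.ColID),
            ColC = (SELECT u.ColC FROM stgUpdates u WHERE u.ColID = tblDest.ColID)
        WHERE ColID IN (SELECT ColID FROM stgUpdates)
        """
    )
    conn.execute("DELETE FROM stgUpdates")  # leave staging empty for the next run


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tblDest (ColID INTEGER NOT NULL, ColA TEXT, ColB TEXT, ColC INTEGER);
    CREATE TABLE stgUpdates (ColID INTEGER NOT NULL, ColA TEXT, ColB TEXT, ColC INTEGER);
    INSERT INTO tblDest VALUES (0, 'A', '2007-01-01 00:01', -1);
    INSERT INTO tblDest VALUES (1, 'C', '2007-01-01 00:02', -2);
    """
)

stage_and_apply_updates(
    conn, [{"ColID": 1, "ColA": "B", "ColB": "2007-01-01 00:02", "ColC": -2}]
)
```

In a package, the staging insert would take the place of the OLE DB Command on the "Changed Rows" output, and the single UPDATE would run after the data flow completes.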

Click the OK button when mapping is completed.

Your Data Flow canvas should look like that pictured below:

If you execute the package with debugging (press F5), the package should succeed and appear as shown here:

Note one row takes the "New Rows" output from the Conditional Split, and one row takes the "Changed Rows" output from the Conditional Split transformation. Although not visible, our third source row doesn't change, and would be sent to the "Unchanged Rows" output - which is simply the default Conditional Split output renamed. Any row that doesn't meet any of the predefined conditions in the Conditional Split is sent to the default output.

That's all! Congratulations - you've built an incremental database load! :)

Get the code! (Free registration required)

:{> Andy

Published Monday, July 09, 2007 3:13 PM by andyleonard

Filed under: Design Pattern, Incremental, SSIS

Comments

	Alberto Ferrari said: Andy, maybe you are interested in taking a look at the TableDifference component I published at http://www.sqlbi.eu. It is an all-in-one and completely free SSIS component that handles these kinds of situations without the need to cache data in the Lookup. Lookups are nice but - in real situations - they may quickly lead to out-of-memory situations (think of a hundred-million-row table... it simply cannot be cached in memory). Beware that - for huge table comparisons - you will need both TableDifference AND the FlowSync component that you can find at the same site. I'll be glad to hear your comments about it. Alberto July 12, 2007 5:21 AM
	andyleonard said: Thanks Alberto! Checking it out now. :{> Andy July 13, 2007 9:30 PM
	David R Buckingham said: Thank you greatly Andy. This couldn't have come at a better time as I just started using Integration Services for the first time on Friday to handle eight different data loads (all for a single client). Four of the data loads are straight appends, but the other four are incremental. This approach is vastly superior to loading the incremental data into a temporary table and then processing it against the destination table. In fact, it proved to be more efficient than both set-based insert/updates or a cursor-based approach. Yes, I tested both approaches prior to implementing yours. Your approach was faster than the set-based insert/updates even though I tested it across the WAN, which surprised me greatly. I also created a script to assist with the creation of the Conditional Split "Changed Rows" condition, which follows (be sure your results aren't being truncated when you have a table with many columns):
--- BEGIN SCRIPT ---
DECLARE @Filter varchar(max)
SET @Filter = ''
-- ((ISNULL()?"":)!=(ISNULL(Dest_)?"":Dest_)) ||
SELECT @Filter = @Filter + '((ISNULL(' + c.[name] + ')?"":' + c.[name] + ')!=(ISNULL(Dest_' + c.[name] + ')?"":Dest_' + c.[name] + ')) || '
FROM sys.tables t
INNER JOIN sys.columns c ON t.[object_id] = c.[object_id]
WHERE SCHEMA_NAME( t.[schema_id] ) = 'GroupHealth' AND t.[name] = 'ConsumerDetail' AND c.[is_identity] = 0 AND c.[is_rowguidcol] = 0
ORDER BY c.[column_id]
SET @Filter = LEFT( @Filter, LEN( @Filter ) - 2 )
SELECT @Filter
--- END SCRIPT ---
Again, thanks greatly. I now have 2 SSIS books on their way to me. I am eager to learn as much as I can. July 17, 2007 3:52 PM
	Bill Mo said: Hello, Andy! Thanks a lot for your incremental process! I'm doing an SSIS project! July 17, 2007 9:47 PM
	david boston said: Thanks this worked a treat for my SSIS project. July 20, 2007 5:01 AM
	andyleonard said: Hi David, Bill, and David, Thanks for the feedback! :{> Andy August 8, 2007 7:14 PM
	saul said: Hi Andy !! Great work... I was scared because of this Incremental load... and you saved my weekend... now I can enjoy it .... :-) September 7, 2007 5:56 PM
	Steve Hall said: Anyone had a problem with the insert and update commands locking each other out? Didn't happen at first but does now. Update gets blocked by the insert and it just hangs. Steve September 18, 2007 1:18 PM
	andyleonard said: Thanks Saul! Steve, are you sure there's not something more happening on the server that's causing this? If this is repeatable, please provide more information and I'll be happy to take a look at it. SQL Server does a fair job of detecting and managing deadlocks when they occur. I haven't personally seen SQL Server "hang" since 1998 - and then it was due to a failing I/O controller. :{> Andy September 27, 2007 6:57 PM
	Bill Mo said: Hi, Andy! I have the same problem as Steve: it blocks. When the bulk insert and update happen, the update gets blocked by the insert and it just hangs! The insert's wait type is ASYNC_NETWORK_IO. October 8, 2007 4:15 AM
	Bobby said: Thx 4 the trick with Fail -> Left Join ! I was thinking how to do it whole day :o) October 18, 2007 1:23 AM
	Michael Ross said: Steve, This most certainly can be the case with larger datasets. In my case, I ran into this issue with large FACT table loads. Either consider dumping the contents of the insert into a temp table or SSIS RAW datafile and complete the insert in a separate dataflow task or modify the isolationlevel of the package. Be warned, make sure you research the IsolationLevel property thoroughly before making such a change. November 26, 2007 12:03 PM
	Michael said: What happens when a field is NULL in the destination or source when determining changed rows? Don't we need special checks to ensure if a destination field is NULL the source should also be? Thus a change has occurred and the record should be updated? December 26, 2007 10:26 AM
	andyleonard said: Hi Michael, Excellent question! This post was intended to cover the principles of Incremental Loads, and not as a demonstration of production-ready code. There are a couple approaches to handling NULLs in the source or destination, each with advantages and disadvantages. In my opinion, the chief consideration is data integrity and the next-to-chief consideration is metadata integrity. A good NULL trap can be tricky because NULL == NULL should never evaluate to True. I know NULL == NULL can evaluate to True with certain settings, but these settings also have side-effects. And then there's maintenance to consider... basically, there's no free lunch. A relatively straightforward method involves identifying a value for the field that the field will never contain (i.e. -1, "(empty)", or even the string "NULL") and using that value as a substitute for NULL. In the SSIS expression language you can write a change-detection expression like: (ISNULL(Dest_ColA) ? -1 : Dest_ColA) != (ISNULL(ColA) ? -1 : ColA) But again, if ColA is ever -1 this will evaluate as a change and fire an update. Why does this matter? Some systems include "number of updated rows" as a validation metric. :{> Andy December 26, 2007 12:50 PM
	Michael said: Hi Andy, Thanks for this great article! Do you have any hints for implementing your design with an Oracle Source. I am attempting to incrementally update from a table with 7 million rows with ~50 fields. The Lookup Task failed when I attempted to use it like you described above due to a Duplicate Key error...cache is full. I googled this and found an article suggesting enabling restrictions and enabling smaller cache amounts. However it is now extremely slow. Do you have any experience/advice on tweaking the lookup task for my environment? Is there value in attempting to port this solution to an Oracle to SQL environment? Is there a way to speed things up/replace the lookup task by using a SQL Execution Task which calls a left outer join? Is there major difference\impact in having multiple primary keys? Thanks Again December 26, 2007 1:47 PM
	Jigs said: Hi Andy, this looks great and works great too, but if there are more records to update then it just hangs while doing the insert and update. What should I do? Is there any workaround by which we can avoid the SSIS package hanging? Please suggest. Thanks, Jigu January 15, 2008 3:36 PM
	andyleonard said: Hi Bill and Jigu, Although I mention set-based updates here I did not demonstrate the principle because I felt the post was already too long - my apologies. I have since written more on Design Patterns. Part 3 of my series on ETL Instrumentation (http://sqlblog.com/blogs/andy_leonard/archive/2007/11/18/ssis-design-pattern-etl-instrumentation-part-3.aspx#SetBasedUpdates) demonstrates set-based updates. I need to dedicate a post to set-based updates. :{> Andy January 16, 2008 7:10 AM
	Jai said: Hi Andy Thanks you did great help to understand data update through SSIS package April 5, 2008 6:16 PM
	Kenneth said: Hi Andy, I have a hard time following your instructions. Can you send me your sample project? Thank You, Kenneth July 29, 2008 1:44 PM
	andyleonard said: Hi Kenneth, Sorry to hear you're having a hard time with my instructions. One of the last instructions is a link at the bottom of the page called "Get the code". It points to this URL:http://vsteamsystemcentral.com/dnn/Demos/IncrementalLoads/tabid/94/Default.aspx. Hope this helps, Andy July 29, 2008 1:59 PM
	EAD said: Not sure posted same question few places….May be you gurus can explain In SSIS Fuzzy grouping objects creates some temp tables and does the Fuzzy logic. I ran the trace to see how it does in one cursor it is taking very long time to process 150000 records. Same executes fine in any other test environments. The cursor is simple and I can post if needed. Any thoughts ? September 11, 2008 8:45 PM
	LNelson said: I have a similar package I am trying to create and this was a big help. The new rows write properly however I am getting an error on the changed rows because the SQL table i am writing to has an auto incremented identity spec column. The changes won't write to the SQL table. If I uncheck "keep identity" it writes new rows instead of updating existing. What am I missing? December 1, 2008 11:38 AM
	FDA said: Thanks a lot of Andy!! Very Helpful! December 17, 2008 3:48 AM
	Rajesh said: Hi Andy.. That's a good alternative for a slowly changing dimension...!! Well done... What if the incremental load is based on more than one column...? And to further increase the complications, what if any of the columns included in the lookup condition change as well....? Last one... what if the row is deleted from the source....? January 6, 2009 3:23 AM
	Ken said: It looks like your package handles new and updated rows. I don't see the code handling rows deleted in the source (assuming there are any). Here are my two cents: in your lookup, you can split out the match and non-match rows. Non-match means a new record, and you can do an insert directly after the lookup. You can eliminate the 'new row' condition in the Conditional Split. However, overall, your sample package is the best sample on the net (as far as I have searched; I love it, honestly). Keep up the great work and keep giving out sample packages. Like most people, I do appreciate your effort. Ken January 7, 2009 8:10 PM
	andyleonard said: Hi Ken, Thanks for your kind words. I believe you're referring to functionality new to the SSIS 2008 Lookup Transformation - there is no Non-Match Rows output buffer in the SSIS 2005 Lookup Transformation. :{> Andy January 7, 2009 9:58 PM
	RVS said: Hi Andy, Thanks a lot for this article. It proved to be a great help for me. I was wondering if you can provide some solution to handle deleted rows from source table using lookup. I need this because I have to keep the historical data in the data warehouse. Thanks in advance, RVS January 21, 2009 3:04 AM
	Charlie Asbornsen said: Andy, thanks for your help and effort. This is definitely more elegant than staging over to one database and then doing ExecuteSQLs to execute incremental loads. January 21, 2009 5:16 PM
	Charlie Asbornsen said: And re ranvijay's question, I would assume that when the row exists in the destination but not the source, the source RowID would show up as null, so you could do that as another split on the conditional. January 21, 2009 5:18 PM
	andyleonard said: Hi RVS and Charlie, RVS, Charlie answered your question before I could get to it! I love this community! I need to write more on this very topic. New features in SQL Server 2008 change this and make the Deletes as simple as New and Updated rows. I didn't mention Deletes in this post because the main focus was to get folks thinking about leveraging the data flow instead of T-SQL-based solutions (Charlie, in regards to your first comment). There's nothing wrong with T-SQL. But a data flow is built to buffer (or "paginate") rows. It bites off small chunks, acts on them, and then takes another bite. This greatly reduces the need to swap to disk - and we all know the impact of disk I/O on SQL Server performance. Charlie is correct. The way to do Deletes is to swap the Source and Destinations in the Correlate / Filter stages. Typically, I stage Deletes and Updates in a staging table near the table to be Deleted / Updated. Immediately after the data flow, I add an Execute SQL Task to perform a correlated (inner joined) update or delete with the target table. I do this because my simplest option inside a data flow is row-based Updates / Deletes using the OLE DB Command transformation. A set-based Update / Delete is a lot faster. I need to write more about that as well... :{> Andy January 21, 2009 5:29 PM
	Charlie Asbornsen said: Andy, Looks like I have some rewriting to do on the next version of the ETL. It's a good thing I enjoy working in SSIS! I'm working on building a data warehouse and BI solution for a government customer, and a lot of their 1970's era upstream data sources don't have ANY kind of data validation. In fact when we first installed in production we found out that they had some code fields in their data tables with a single quote for data! It played merry hob with our insert statements until we figured out what was happening. Then I got to figure out how to do D-SQL whitelisting with VB scripting in SSIS :) Of course since it's the government we'll probably have to wait until 3Q 2010 before we're allowed to upgrade to SQL 2008. We were all gung ho about VS 2008 (which we were allowed to get) but imagine my chagrin when I found out that I couldn't use my beloved BI Studio without SQL 2008... :P So I'll be using this for the next version... and possibly the version after that as well. Thanks a bunch! January 21, 2009 5:41 PM
	Charlie Asbornsen said: Me again. I think I made a mistake. If a row already exists in the destination table and it no longer exists in the source table, I want it deleted (sent to the deletes staging table). However, the lookup limits the row set in memory to items that are already in the source table, so its not really functioning as an outer join. Its perfect for determining inserts and updates, but I need to do something else to do deletes... I'm going to try adding an additional OLE DB source and point that at the same table the lookup is checking... hmm, maybe try the Merge? I'll see what happens and let you know. January 22, 2009 12:41 PM
	Charlie Asbornsen said: Actually I think I need a second pass... grrr. January 22, 2009 12:44 PM
	Charles Asbornsen said: Andy, Please feel free to combine this with the previous reply. What I wound up doing was creating a second data flow after the one that split the inserts and updates out. The deletes flow populated a deleted rows staging table with the deleted row id, which then was joined to the ultimate destination table in a delete command in an Execute SQL task. I wound up reversing the lookup, but used the same technique by using a conditional split on whether or not the new column from the lookup was null, and if it was, the output went to the "deleted records" path, which populated the staging table. The reason I want to actually remove the data from the table as opposed to merely marking it as deleted is because the reason a row would disappear would be because it was a bad reference code in the first place. My big datawarehouse ETL adds new reference codes to the reference tables (which it needs to create in the first place because the source reference codes are held in these five gigantic tables which do not lend themselves to generating NV lists) for unmatched codes in the data tables (remember there's no validation at the source). When the reconciliation stick finally gets swung and the customer replaces the junk code it disappears from my ETL and I remove it from my table. It is different from a code that gets obsoleted; there's a reason to track those, but garbage just needs to be thrown out. Thanks again, I would have been very annoyed with myself if I wound up doing row-based IUDs... January 22, 2009 2:55 PM
	andyleonard said: Hi Charles, I wasn't clear in my earlier response but you figured it out anyway - apologies and kudos. You do need to do the Delete in another Data Flow Task. Excellent work! :{> Andy January 22, 2009 4:15 PM
	Charles Asbornsen said: Andy, Is there a limit to how many comparisons you can make in the Conditional Split Transformation Editor? I have a table with 20 columns, and I'm trying to do 19 comparisons. It's telling me that one of the columns doesn't exist in the input column collection. I can cut the expression and paste it back in and it picks a different column to complain about. Error 0xC0010009... it says the expression cannot be parsed, may contain invalid elements or might not be well formed, and there may also be an out-of-memory error. I've been looking at it for 1/2 an hour and all the columns it is variously complaining about are present in the input column collection, so I suspect it's a memory error. Should I alias the column names to be shorter (ie the problem is in the text box) or is it a metadata problem? I'm going home now but tomorrow I will see if splitting the staging table into 4 tables and splitting the conditions into 4 outputs (to be recombined later by an execute SQL command into the real staging table) does what I need. Thanks! Charlie January 22, 2009 5:54 PM
	RVS said: Hi Andy and Charles, Thank you for your comments. I still have a few doubts about handling deleted rows. I have created a solution to handle all three cases (add, update and delete). I have taken two OLE DB Sources (one with the source data and another with the destination table's data), then SORTED them and MERGED them (with a FULL OUTER join), and finally used a CONDITIONAL SPLIT to filter new, updated and deleted data and used the OLE DB Command to do the required action. I am getting deleted rows by using the full outer join. I am getting the expected result with this solution, but I think it is not efficient because it uses sort, merge etc. I wanted to use the Lookup as suggested by Andy, but the solution which you both have given is not fully clear to me. Would it be possible for you to send me a sketch of the proposed solution or explain it in a bit more detail? Charles, regarding the number of comparisons, I don't think it is limited to 19 or 20 because I have used more than 35 comparisons and that is working fine. Please check that you have handled null columns correctly. Thanks once again, RVS ([email protected]) January 23, 2009 6:57 AM
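When both tables live on the same SQL Server instance, the full outer join RVS builds with Sort and Merge Join can also be expressed in T-SQL. The sketch below is only an illustration under that assumption; the destination name dbo.tblDest is assumed from the article's naming, and the RowAction labels are made up here.

```sql
-- Classify every key as New, Deleted or Changed in one pass.
SELECT
    COALESCE(s.ColID, d.ColID) AS ColID,
    CASE
        WHEN d.ColID IS NULL THEN 'New'      -- in source only
        WHEN s.ColID IS NULL THEN 'Deleted'  -- in destination only
        ELSE 'Changed'                       -- in both, values differ
    END AS RowAction
FROM SSISIncrementalLoad_Source.dbo.tblSource AS s
FULL OUTER JOIN SSISIncrementalLoad_Dest.dbo.tblDest AS d
    ON d.ColID = s.ColID
WHERE d.ColID IS NULL
   OR s.ColID IS NULL
   -- EXCEPT treats two NULLs as equal, so this comparison is NULL-safe.
   OR EXISTS (SELECT s.ColA, s.ColB, s.ColC
              EXCEPT
              SELECT d.ColA, d.ColB, d.ColC);
```

Unchanged rows drop out in the WHERE clause, so only rows that need an action are returned.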
	Charlie Asbornsen said: Doh! Thanks Ranvijay. January 23, 2009 10:01 AM
	Charlie Asbornsen said: Actually what was happening was that since the comparison expression was so long I moved it into WordPad to type it and then copy/pasted into the rather annoyingly non-resizable condition field in the conditional split transformation editor. It turns out it doesn't like that. Maybe there were invisible control characters in the string, so I needed to just bite the bullet and type in the textbox. It works fine now. It would be nice to have a text visualizer for that field. Thanks! January 23, 2009 1:51 PM
	vidhya said: This was an excellent article and Andy's illustration style is great. Thank you June 30, 2009 9:47 AM
	Nostromo said: Great tutorial! I'm new to SSIS and I worked through it without a hitch. Thanks!!! July 10, 2009 10:23 AM
	DVL said: Hello, Many thanks for the step by step guide. It's nice to find a way to get your changed and new records in 2 separate outputs. But how would you get the deleted records? The only solution I found is to look up every PK in the source db table and check if it still exists. If it does it will set the deleted_flag to 1. Do you have any idea how to implement the deleted records in your solution? Mine is in a separate data flow. Greetings August 27, 2009 8:05 AM
	CSu said: Great article! I originally used sort, merge join (with left outer join) and conditional split transforms to perform incremental load. Unfortunately it did not work as expected. Your article has simplified my design and it is now working perfectly. Thanks for sharing. :) October 26, 2009 7:26 AM
	hasan said: Dear Andy, your solution is great but I have a problem. The dimensions are not getting populated with the default data. Does this work with an Excel source? I ask because I have an Excel source. December 29, 2009 7:31 AM
	Mike said: Hiya, Just read the article, confirms my approach to incremental loading on a series of smallish facts. I have used the "slowly changing dimension" element in the past to facilitate the same outcome, i.e. not using type 2s (despite being a fact) - but it is much slower. RVS, re: "I am getting expected result with this solution but I think this is not performance efficient as it is using sort, merge etc"; if the sort(s) are the main problem, you can do the sort on the database and tell SSIS that the set is sorted to avoid using two sort transformations - not sure if that will give you sufficient gains? The Merge Join, as you say, will still not be great within SSIS. Lastly - has anyone any experience of duplicated KEYS in the source table, that do not (yet) exist in the destination? I am performing bulk inserts after the update/insert evaluation. I have a minor concern: if I have a duplicated key in the source data, the FIRST record will correctly INSERT, but does the lookup then add this key to memory, so that when the second record arrives it knows to update? Because, although I do not constrain the destination table, it will cause problems within the data (mini Cartesians - shudder). Do I need to be aware of any settings or the like? I am about to do a test case now - and see what happens... January 24, 2010 5:35 PM
	Mandar said: Hi Andy, I want to load data incrementally from source (MySQL 5.2) to SQL Server 2008, using SSIS 2008, based on modified date. Somehow I am not able to do it as MySQL doesn't support parameters. Need some help on this. -regards, mandar March 15, 2010 6:40 AM
	Ramdas said: Thank you Andy for this tutorial. I am using SSIS 2008, and the Lookup interface has changed a little bit: when you click Edit on the Lookup, the opening screen is laid out differently. March 25, 2010 9:46 AM
	KK said: In my source, the record with ID 3 has duplicated keys, so I want the first record to be inserted and the second record to update the destination table through SSIS. Can anyone help me resolve this problem? When I use SCD type 2 and it reads the target, the ID 3 record is not yet available in the target, so it treats the first record as an insert and the second record the same way. So the record is inserted two times. I don't want that; I want the first ID 3 record inserted and the second ID 3 record applied as an update. Is there any way to resolve this problem?

ID  Name      Date
1   Kiran     1/1/2010 12:00:00 AM
3   Rama      1/2/2010 12:00:00 AM
2   Dubai     1/2/2010 12:00:00 AM
3   Ramkumar  1/2/2010 12:00:00 AM

March 25, 2010 5:11 PM
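One way to sidestep the duplicate-key question Mike and KK raise is to collapse duplicates in the source query, before the rows ever reach the Lookup. In full cache mode the Lookup reads its reference set once, before rows start flowing, so two source rows with the same new key would both look like inserts. The sketch below assumes a hypothetical source table dbo.SourceRows with KK's columns, and assumes the latest Date should win.

```sql
-- Keep one row per ID: the one with the latest Date.
WITH Ranked AS
(
    SELECT
        ID,
        Name,
        [Date],
        ROW_NUMBER() OVER (PARTITION BY ID ORDER BY [Date] DESC) AS rn
    FROM dbo.SourceRows   -- hypothetical table name
)
SELECT ID, Name, [Date]
FROM Ranked
WHERE rn = 1;
```

In KK's sample both ID 3 rows carry the same date, so the ORDER BY would need an extra tie-breaker column (a load sequence or file line number, if one exists) to pick the winner deterministically.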
	Craig said: I need to incrementally load data from Sybase to SQL. There will be several hundred million rows. Will this approach work OK with this scenario? March 30, 2010 10:45 AM
	andyleonard said: Hi Craig, Maybe, but most likely not. This is one design pattern you can start with. I would test this, tweak it, and optimize like crazy to get as much performance out of your server as possible. :{> Andy March 30, 2010 10:52 AM
	jpedroalmeida said: Hy there from Portugal, Andy, i am a starter in SSIS and i found this article very useful and straightforward in explanation with text and images... Thanks a lot!! Cheers April 25, 2010 11:02 AM
	JohnnyReaction said: Hi Andy I amended your script to deal with different datatypes (saves a lot of debugging in the Conditional Split Transformation Editor):

/*
This script assists with the creation of the Conditional Split "Changed Rows" condition
-- be sure your results aren't being truncated when you have a table with many columns
*/
--- BEGIN SCRIPT ---
USE master
GO
DECLARE @Filter varchar(max)
SET @Filter = ''
SELECT @Filter = @Filter + '((ISNULL(' + c.[name] + ')?'
    + CASE WHEN c.system_type_id IN (35,104,167,175,231,239,241) THEN '""'
           WHEN c.system_type_id IN (58,61) THEN '(DT_DBTIMESTAMP)"1900-01-01"'
           ELSE '0' END
    + ':' + c.[name] + ')!=(ISNULL(Dest_' + c.[name] + ')?'
    + CASE WHEN c.system_type_id IN (35,104,167,175,231,239,241) THEN '""'
           WHEN c.system_type_id IN (58,61) THEN '(DT_DBTIMESTAMP)"1900-01-01"'
           ELSE '0' END
    + ':Dest_' + c.[name] + ')) || '
FROM sys.tables t
INNER JOIN sys.columns c ON t.[object_id] = c.[object_id]
WHERE SCHEMA_NAME( t.[schema_id] ) = 'dbo'
AND t.[name] = 'DimUPRTable'
AND c.[is_identity] = 0
AND c.[is_rowguidcol] = 0
ORDER BY c.[column_id]
SET @Filter = LEFT(@Filter, (LEN(@Filter) - 2))
SELECT @Filter

--SELECT
--  c.*
--FROM
--  sys.tables t
--JOIN
--  sys.columns c
--  ON t.[object_id] = c.[object_id]
--WHERE
--  SCHEMA_NAME( t.[schema_id] ) = 'dbo'
--AND t.[name] = 'DimUPRTable'
--AND c.[is_identity] = 0
--AND c.[is_rowguidcol] = 0
--ORDER BY
--c.[column_id]

--SELECT
--   schemas.name AS [Schema]
--  ,tables.name AS [Table]
--  ,columns.name AS [Column]
--  ,CASE WHEN columns.system_type_id = 34 THEN 'byte[]'
--        WHEN columns.system_type_id = 35 THEN 'string'
--        WHEN columns.system_type_id = 36 THEN 'System.Guid'
--        WHEN columns.system_type_id = 48 THEN 'byte'
--        WHEN columns.system_type_id = 52 THEN 'short'
--        WHEN columns.system_type_id = 56 THEN 'int'
--        WHEN columns.system_type_id = 58 THEN 'System.DateTime'
--        WHEN columns.system_type_id = 59 THEN 'float'
--        WHEN columns.system_type_id = 60 THEN 'decimal'
--        WHEN columns.system_type_id = 61 THEN 'System.DateTime'
--        WHEN columns.system_type_id = 62 THEN 'double'
--        WHEN columns.system_type_id = 98 THEN 'object'
--        WHEN columns.system_type_id = 99 THEN 'string'
--        WHEN columns.system_type_id = 104 THEN 'bool'
--        WHEN columns.system_type_id = 106 THEN 'decimal'
--        WHEN columns.system_type_id = 108 THEN 'decimal'
--        WHEN columns.system_type_id = 122 THEN 'decimal'
--        WHEN columns.system_type_id = 127 THEN 'long'
--        WHEN columns.system_type_id = 165 THEN 'byte[]'
--        WHEN columns.system_type_id = 167 THEN 'string'
--        WHEN columns.system_type_id = 173 THEN 'byte[]'
--        WHEN columns.system_type_id = 175 THEN 'string'
--        WHEN columns.system_type_id = 189 THEN 'long'
--        WHEN columns.system_type_id = 231 THEN 'string'
--        WHEN columns.system_type_id = 239 THEN 'string'
--        WHEN columns.system_type_id = 241 THEN 'string'
--   END AS [Type]
--  ,columns.is_nullable AS [Nullable]
--FROM
--  sys.tables tables
--INNER JOIN
--  sys.schemas schemas
--ON (tables.schema_id = schemas.schema_id )
--INNER JOIN
--  sys.columns columns
--ON (columns.object_id = tables.object_id)
--WHERE
--  tables.name <> 'sysdiagrams'
--  AND tables.name <> 'dtproperties'
--ORDER BY
--   [Schema]
--  ,[Table]
--  ,[Column]
--  ,[Type]

July 28, 2010 8:26 AM
	Paul Klotka said: Using T-SQL to do change detection. I would not use a join to detect change because in the where clause you need to handle NULL values. For example if ColA in Source is NULL it doesn't matter what ColA is in the destination, the where clause will return false and not detect the change. To get around this I use a union to detect change. Here is an example.

select ColId, ColA, ColB, ColC from Source
union
select ColId, ColA, ColB, ColC from Dest

This returns a distinct set of rows, including handling NULL values. All that is left is to determine if the ColId appears more than once in the set.

select ColId
from (
    select ColId, ColA, ColB, ColC from Source
    union
    select ColId, ColA, ColB, ColC from Dest
) x
group by ColId
having count(*) > 1

Now I have a list of keys which changed. I can take this list and sort it to use in a merge join in SSIS or I can use it as a subquery to join back to the Source table. See below.

select s.ColId, s.ColA, s.ColB, s.ColC
from Source s
inner join (
    select ColId
    from (
        select ColId, ColA, ColB, ColC from Source
        union
        select ColId, ColA, ColB, ColC from Dest
    ) x
    group by ColId
    having count(*) > 1
) y on s.ColId = y.ColId

July 28, 2010 2:06 PM
	Chhavi said: Thanks for the good explanation and screenshots. I found this website to be extremely helpful and supportive. Please let me know if I can learn something more from you and the rest of the guys visiting this website, so that we can become better in SSIS and SQL Server 2005 or 2008. Please provide us with similar articles so that we can go through them and practice. Thanks again Andy. Long Live Andy :) August 18, 2010 3:59 PM
	AP said: This is excellent Article ! Great job October 11, 2010 10:12 AM
	TheAviator said: Thank you very very much. Nowhere on the net have I found it explained in such detail and so clearly. Thanks again October 22, 2010 11:08 AM
	V said: My requirement is: Update: if a record exists in both tables, compare them and update the value in the destination table if it is different. Insert: if a record doesn't exist in the destination table, add a new record to the destination table. Delete: if a record exists in the destination table but not in the source table, delete the record from the destination table. The above code performs only Insert and Update; it doesn't delete data from the destination table which has been deleted from the source data. I would not like to truncate or delete ALL data from the destination table. Please let me know how I should do this. Basically, it should perform Update, Insert and Delete in one single package (task) December 20, 2010 6:39 PM
	Dpostman said: Probably not the most common scenario, but if your source and destinations are coming from sql server could you just select checksum(*) as a column from your source and destination tables and test it to determine if the row has been updated? I would think it would be a pretty safe alternative when you have a lot of columns. Or has anyone created a hashing formula in an expression? (or would that be too slow to consider?) January 17, 2011 5:16 PM
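Dpostman's checksum idea might be sketched in T-SQL as follows, assuming both tables are reachable from one SQL Server instance and that the destination is named dbo.tblDest (an assumption based on the article's naming). BINARY_CHECKSUM is used here in place of CHECKSUM(*) so the key column stays out of the comparison.

```sql
-- Rows whose non-key columns hash differently in source and destination.
SELECT s.ColID
FROM SSISIncrementalLoad_Source.dbo.tblSource AS s
INNER JOIN SSISIncrementalLoad_Dest.dbo.tblDest AS d
    ON d.ColID = s.ColID
WHERE BINARY_CHECKSUM(s.ColA, s.ColB, s.ColC)
   <> BINARY_CHECKSUM(d.ColA, d.ColB, d.ColC);
```

Checksums are short, so two different rows can produce the same value and a real change can be missed. Where that risk matters, a stronger hash such as HASHBYTES over the concatenated columns, or a plain column-by-column comparison, is the safer choice.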
	dbraver said: The following new feature of 2008 R2 seems to do the same: http://technet.microsoft.com/en-us/library/bb510625.aspx. February 16, 2011 7:28 PM
	Peter Schott said: Merge may be able to handle this, but if you start working with millions of rows, it tends not to perform too well. For smaller data sets, it's pretty effective, but I've seen that command sit for a while as Merge tries to calculate the updates/inserts. April 20, 2011 4:52 PM
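For readers who have not seen it, the MERGE statement dbraver links to (available from SQL Server 2008 onward) can cover insert, update and delete in one statement, which is also what V asks for above. The following is a sketch only, assuming the article's columns, a destination named dbo.tblDest, and both databases on the same instance.

```sql
MERGE SSISIncrementalLoad_Dest.dbo.tblDest AS d
USING SSISIncrementalLoad_Source.dbo.tblSource AS s
    ON d.ColID = s.ColID
-- Update only when at least one non-key column differs (NULL-safe via EXCEPT).
WHEN MATCHED AND EXISTS (SELECT s.ColA, s.ColB, s.ColC
                         EXCEPT
                         SELECT d.ColA, d.ColB, d.ColC)
    THEN UPDATE SET ColA = s.ColA, ColB = s.ColB, ColC = s.ColC
-- Key exists in the source only: new row.
WHEN NOT MATCHED BY TARGET
    THEN INSERT (ColID, ColA, ColB, ColC)
         VALUES (s.ColID, s.ColA, s.ColB, s.ColC)
-- Key exists in the destination only: row was deleted at the source.
WHEN NOT MATCHED BY SOURCE
    THEN DELETE;
```

As Peter notes, this tends to suit smaller sets; with millions of rows it is worth testing against a staged, batched approach before committing to it.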
	Samit Shah said: Hi Andy, It's a great article. Very well explained. But I was wondering: I have around 20-30 tables in MySQL and I have to use an SSIS package to move the data of these tables to SQL Server. Some of the tables have 40-50 million records. I have to do this load very frequently (maybe daily). Is this the best approach for it, or can you suggest a better approach? April 25, 2011 8:44 AM
	andyleonard said: Change Detection is a topic of many design patterns. Here I used a rather brute-force method for detecting updates, mainly to demonstrate the concept. There are more methods for detecting changed rows and I hope to blog about some soon. Thanks for the comments! :{> April 25, 2011 7:33 PM
	Kal said: This works great! Now the issue is, I have more than 100 tables which need an incremental load.. Do i have to build 100 packages? or is there an easy way out? Please Help.. May 12, 2011 6:24 AM
	Thiru-BI said: I just want to load based on a date. If our table has a date column, we capture the maximum date that has been loaded; when the package runs again, it should allow only records with a date greater than that captured date, so that only new records reach the target. Can you please give me some idea how to achieve this? Thanks & Regards, Thirunavukkarasu P May 30, 2011 3:33 AM
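The date-based load Thiru-BI describes is often called a high-water mark. A minimal T-SQL sketch follows, assuming ColB (the datetime column from the article's tblSource) acts as the modified date and that the destination is dbo.tblDest; both are assumptions for illustration.

```sql
DECLARE @LastLoaded datetime;

-- Highest date already present in the destination; fall back to an early
-- date when the destination is still empty.
SELECT @LastLoaded = COALESCE(MAX(d.ColB), '19000101')
FROM SSISIncrementalLoad_Dest.dbo.tblDest AS d;

-- Pull only the source rows stamped after that point.
SELECT s.ColID, s.ColA, s.ColB, s.ColC
FROM SSISIncrementalLoad_Source.dbo.tblSource AS s
WHERE s.ColB > @LastLoaded;
```

In a package, the first query would typically run in an Execute SQL Task that stores the result in a variable, and the second would be the parameterized source query. Rows that share the exact maximum timestamp but arrive after the previous run would be skipped by the strict comparison, so the date column needs enough precision, or the comparison needs a small overlap handled by the change detection downstream.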
	Michael Baumanns said: Hi Kal, I actually have the same problem. Maybe you can use the Slowly Changing Dimension tool. This generates the update and insert commands for you. May 30, 2011 9:50 AM
	Romualdo said: Hi Andy! Thanks a lot for this topic. I'm starting to use SSIS and I have some doubts: a) If my destination database is empty, I get an error and the rows are not inserted. b) What can I do to delete rows in the destination database that do not exist in the source database? c) If a source column is null, the destination is not updated. But I understand that is why you suggest: (ISNULL(Dest_ColA) ? -1 : Dest_ColA) != (ISNULL(ColA) ? -1 : ColA) July 20, 2011 2:10 PM
	Lavanya said: Thank u a lot...nice explanation July 26, 2011 9:00 AM
	J Channin said: Could you please explain further.... "Note the query is executed on a row-by-row basis. For performance with large amounts of data, you will want to employ set-based updates instead." Thank you August 4, 2011 3:03 PM
	D Sharma said: Experts, I have a question. I'm working on loading a very large table with existing data on the order of 150 million records, which keeps growing by 1 million records on a daily basis. A few days back the ETL started failing even after running for 24 hrs. In the DFT, we have a source query pulling 1 million records, which is looked up against the destination table of 150 million records to check for new records. It is failing because the Lookup cannot hold the data for 150 million records. I have tried changing the Lookup to a Merge Join without success. Can you please suggest alternative designs to load the data into the large table successfully? Moreover, there is no way I can reduce the size of the destination table. I already have indexes on all required columns. Hope I'm clear in explaining the scenario. August 17, 2011 10:47 AM
	andyleonard said: Hi D, You can limit the number of rows used by the Lookup by using a SQL query as the Lookup source instead of the entire table. Hope this helps, Andy August 17, 2011 11:11 AM
	D Sharma said: Thanks for the reply Andy. I have already mentioned that it's not possible for me to actually reduce the size of the Lookup query, since I need to check existing rows which can be anywhere in the table. Something came to my mind just now on which I would like expert comments. I'm thinking about splitting the target table into parts with a SQL query and joining the source query against the various parts of the destination table one by one, narrowing the possible new records down to the actual new records. Will try that at work tomorrow, it just came to mind now. I would appreciate it a lot if you can suggest any better alternative. August 17, 2011 12:03 PM
	andyleonard said: Hi D, I recommend identifying the rows in the large table before you reach the lookup, and staging the data you need to return from the lookup - along with the lookup-matching criteria - in another table. Truncate this table prior to loading it. Populate it. Then use it for the lookup operation. Kimball refers to this as "key staging". Hope this helps, :{> August 17, 2011 12:44 PM
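The key-staging steps Andy lists could be sketched in T-SQL like this. The table names dbo.stgIncoming (the day's incoming rows, already landed) and dbo.stgLookupKeys (the small lookup set) are hypothetical, as is the assumption that the destination follows the article's dbo.tblDest layout.

```sql
-- 1. Truncate the key-staging table prior to loading it.
TRUNCATE TABLE dbo.stgLookupKeys;

-- 2. Populate it with only the destination rows whose keys appear in today's
--    incoming data, plus the columns the Lookup has to return.
INSERT INTO dbo.stgLookupKeys (ColID, ColA, ColB, ColC)
SELECT d.ColID, d.ColA, d.ColB, d.ColC
FROM dbo.tblDest AS d
WHERE EXISTS (SELECT 1
              FROM dbo.stgIncoming AS i
              WHERE i.ColID = d.ColID);

-- 3. Point the Lookup at this query instead of the full destination table.
SELECT ColID, ColA, ColB, ColC
FROM dbo.stgLookupKeys;
```

With 1 million incoming rows against 150 million destination rows, the Lookup then caches at most 1 million reference rows, and the join that builds the staging table can use the destination's index on the key.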
	dilip said: Hi Andy, I tried this article in BIDS SSIS 2008 R2, but every time it has to do an update it doesn't update; instead it inserts a new row. As in the example above, when ColID = 1 it needs to update ColA, but instead it inserts a new row with the same ColID = 1, so now there are two rows with ColID = 1, one with ColA = b and the other with ColA = c :( Any idea where I went wrong? I followed the same steps you have given. Thanks, dilip August 19, 2011 3:44 PM
	Neha said: Hi Andy, Very nice article. I had another question though, how do you handle data deletes. I see that the new records are inserted and changed ones are updated. What about the records that were deleted? Thanks Neha September 20, 2011 10:05 PM
	Jessica said: Hi Andy, Awesome posts. I have one question, not sure if this is the right place. I am working on a data warehouse load that comprises multiple SSIS packages and my challenge is to make it rerunnable. Each package calls a stored procedure which is rerunnable. I am trying to add something like a BatchId to each run. Any ideas on how this should be approached? Thanks!!! September 27, 2011 4:23 PM
	andyleonard said: Hi Jessica, I wrote a post recently about designing an SSIS Framework. It can help. There are other potential gotchas with DW loads, so please hit the email link in the upper right of this page. We'll probably do more good taking this offline. :{> September 27, 2011 5:29 PM
	srinivasan M said: I am the beginner for SSIS packages . It is very useful Thanks a lot October 4, 2011 1:12 AM
	indy said: Hi Andy, I followed the article published today - level 4 http://www.sqlservercentral.com/articles/Stairway+Series/76390/. I know you are going to publish an article to take care of deletes as well. But I need to implement this in my project by this week. Do you have the deletes article handy? Also, I need to work with 40+ tables and the SSIS package should refresh the destination database every 8 hours. How do I manage 40+ tables and a data refresh every 8 hours? Can you please suggest a better solution to achieve this? October 12, 2011 1:32 PM
	andyleonard said: Hi Indy, Please email me at [email protected]. Thanks, Andy October 12, 2011 8:01 PM
	Amu said: This OLE DB Command will be slow for lakhs of records. Loading the data into a stage table and updating it outside the data flow using an Execute SQL Task is one option. Is any other option available to improve the performance of the package? November 10, 2011 5:46 AM
	andyleonard said: Hi Amu, You are correct and thank you for pointing this out. I have written another series about SSIS Incremental Loads for SQLServerCentral.com. I cover your suggestion in Step 4 of the Stairway to Integration Services (http://www.sqlservercentral.com/articles/Stairway+Series/76390/). :{> November 10, 2011 9:16 AM
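Amu's suggestion, which also answers J Channin's earlier question about set-based updates, amounts to replacing one UPDATE per row with a single joined UPDATE. A minimal sketch, assuming the Changed Rows output is written to a hypothetical staging table dbo.stgUpdates with the article's columns, and that the destination is dbo.tblDest:

```sql
-- Run from an Execute SQL Task after the data flow has filled dbo.stgUpdates.
UPDATE d
SET d.ColA = s.ColA,
    d.ColB = s.ColB,
    d.ColC = s.ColC
FROM dbo.tblDest AS d
INNER JOIN dbo.stgUpdates AS s
    ON s.ColID = d.ColID;

-- Clear the staging table for the next run.
TRUNCATE TABLE dbo.stgUpdates;
```

The OLE DB Command issues its statement once for every row that reaches it, whereas this form lets the database engine apply all staged changes in one statement.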
	malli said: Is there any chance of this: if a source row is deleted, it needs to take effect in the destination. How can we write a condition for that in the Conditional Split? The above SSIS package is good (I am working with it), but the thing is it can't handle the deleted tables. Can you help me with that? November 10, 2011 12:25 PM
	malli said: Sorry, I mean is it able to handle the deleted rows (not columns or tables, sorry for that)? A deletion in the source table should affect the destination too. By the way, I am using an Oracle database as the source. November 10, 2011 2:53 PM
	andyleonard said: Hi Mali, Check out Step 5 of the Stairway to Integration Services - it talks about Deletes: http://www.sqlservercentral.com/articles/Integration+Services+(SSIS)/76395/ Hope this helps, Andy November 10, 2011 3:17 PM
	Reddy said: Hi Andy, if a column has null values, the Conditional Split says "The expression results must be Boolean for a Conditional Split". My error is: [Conditional Split [127]] Error: The expression "(STATUS != L_STATUS) || (ORDER != L_ORDER)" on "output "Update" (167)" evaluated to NULL, but the "component "Conditional Split" (127)" requires a Boolean results. Modify the error row disposition on the output to treat this result as False (Ignore Failure) or to redirect this row to the error output (Redirect Row). The expression results must be Boolean for a Conditional Split. A NULL expression result is an error. November 11, 2011 10:17 AM
	Malli said: Hi Andy, Thanks a lot for that. Right now I am facing a problem: my records number in the millions, so I am getting an out-of-memory error. Is there any suggestion on that? Thanks Malli November 11, 2011 5:23 PM
	Reddy said: I got it anyway, thanks to the post you published recently. Thanks, Reddy November 11, 2011 5:25 PM
	malli said: Hi Andy The error is Error: The system reports 89 percent memory load. There are 3477643264 bytes of physical memory with 357049128 bytes free. There are 2147352579 bytes of virtual memory with 97837956 bytes free. The paging file has 5452554249 bytes with 1323617112 bytes free. November 11, 2011 5:27 PM
	Hi Andy, said: I am attempting to incrementally update from a table with 170 million rows and 78 fields. The Lookup failed when I attempted to use it like you described above due to a Duplicate Key error...cache is full. However it is now extremely slow. Do you have any advice on tweaking the Lookup for my environment? My source is Oracle. Is there a way to speed things up or replace the Lookup by using an Execute SQL Task which calls a left outer join? Thanks Again November 15, 2011 9:15 PM
	andyleonard said: You should consider using a query in the Lookup Transformation and not selecting the table name from a dropdown. Selecting the table name essentially attempts to load the entire table into RAM before the data flow executes. Limiting the rows and columns returned will shrink the data volume returned. You can also look into key-staging. There's mention of it here (http://msdn.microsoft.com/en-us/library/cc671624.aspx) in the section on Targeted Staging. Another pattern to consider is Range-based lookups (http://blogs.msdn.com/b/mattm/archive/2008/11/25/lookup-pattern-range-lookups.aspx). Hope this helps, Andy November 15, 2011 11:56 PM
	John said: Hi Andy, This article has been very useful as I have built a package that almost works... ;-) It actually works for 4 out of 5 tables. The 5th table has a 5-field combined primary key. So in the Conditional Split -- New Rows, I put the following: Isnull(Dest_field1) && Isnull(Dest_field2) && Isnull(Dest_field3) && Isnull(Dest_field4) && Isnull(Dest_field5), which are the 5 fields that make up the primary key. And in the -- Changed Rows, I put the following: (Field1 == Dest_field1) && (Field2 == Dest_field2) && (Field3 == Dest_field3) && (Field4 == Dest_field4) && (Field5 == Dest_field5) && (TimeStamp != Dest_TimeStamp), which to me is just simply logical. The problem I'm having is that I get an insert primary key violation... which I'm not too sure how to troubleshoot since I only know the basics. What do you think I'm doing wrong? December 21, 2011 10:19 AM
	andyleonard said: Hi John, You want to use the Lookup Transformation to manage mapping your primary key fields. Remember, the Columns tab of the Lookup is akin to setting the JOIN ON clause. If you were joining this table in T-SQL, you would include all five fields. Hope this helps, Andy December 21, 2011 10:39 AM
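Andy's comparison of the Lookup's Columns tab to a JOIN ON clause can be spelled out in T-SQL. In the sketch below the table names dbo.SourceTable and dbo.DestTable are hypothetical, and the field names are taken from John's comment. All five key fields go into the join, and only non-key columns are compared afterwards.

```sql
-- New rows: no destination row matches on the full five-field key.
SELECT s.*
FROM dbo.SourceTable AS s
LEFT JOIN dbo.DestTable AS d
    ON  d.Field1 = s.Field1
    AND d.Field2 = s.Field2
    AND d.Field3 = s.Field3
    AND d.Field4 = s.Field4
    AND d.Field5 = s.Field5
WHERE d.Field1 IS NULL;

-- Changed rows: the key matches, a non-key column differs.
SELECT s.*
FROM dbo.SourceTable AS s
INNER JOIN dbo.DestTable AS d
    ON  d.Field1 = s.Field1
    AND d.Field2 = s.Field2
    AND d.Field3 = s.Field3
    AND d.Field4 = s.Field4
    AND d.Field5 = s.Field5
WHERE s.[TimeStamp] <> d.[TimeStamp];
```

Since the key fields cannot be NULL in the destination, testing one of them for NULL after the left join is enough to identify a new row.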
	John said: Hi again Andy, Thank you for the pointer... It actually helped me understand and fix my problem. I had all columns chosen for the join in the Lookup transformation. Joining just the keys fixed it and helped me understand the Conditional Split conditions. All I need now is to learn more about debugging and error handling, which is still a big blur for me... Thanks again! Great blog! :-) December 21, 2011 12:02 PM
	Kingdom said: Thank you very much AndyLeonard, this is very helpful. Many thanks February 1, 2012 9:15 AM
	Samit Shah said: Hi Andy, Could you please advise how to do an incremental load for 300+ tables, as it would be difficult to create a data flow for every table and I have the requirement to use SSIS along with logging of how many records are inserted/updated. Could you please help me out with the approach to be taken through SSIS? Thanks and Regards, Samit Shah February 3, 2012 4:19 PM
	andyleonard said: Hi Samit, A couple of options come to mind. 1. Use .NET to generate and save the packages. 2. Use BIML (http://agilebi.com/blog/tag/biml/) Hope this helps, Andy February 3, 2012 6:43 PM
	Samit Shah said: Hi Andy Thanks for the help. I was able to create the packages using BIMLScript. Thanks and Regards, Samit Shah February 9, 2012 6:11 PM
	Hui Shi said: Hi Andy, great article. However, I have two questions. 1. When you detect new rows, you left outer join the source table to the target table and find all records where the columns from the right table are null. That's fine, but what if the source and target tables contain a billion records? Is that still applicable? 2. I know the alternative might be extracting max(modified_date) from the target table and getting data from the source where the date is greater than max(modified_date), but what if there are no such audit columns on the source and target tables? Thanks Hui March 19, 2012 2:24 PM
	Peter Schott said: Hui, In those cases, you are probably better off using some sort of max(Created/Modified Date) - with an index! If you're replicating that table, it could potentially be added as a computed column on the replicated side with the index created there. That would let you know what records are new/changed since the last run. Pull those values into some form of staging table and compare against that - it will likely perform better that way than doing a direct join. March 19, 2012 3:10 PM
	Hui Shi said: thanks peter. do you know any way to add audit columns as a computed column on the replicated side? In my case, our source table did not contain such audit columns and in order to add that one, seems I need to write a trigger so that a timestamp will be inserted to each record whenever there is a transaction? Do you have any suggestions? March 19, 2012 3:54 PM
	Peter Schott said: Hui, that's where you start running into issues. If you don't have something in your row that's tracking Created/Updated Dates, you'll need to add it and possibly a trigger as well. Created Date is pretty straightforward to default. Updated Date would need an AFTER UPDATE trigger to populate it. I think it could still be done on your replicated side, but in this case, you're better off adding it to your main source database. You'll likely want it at some point even if you don't think you'll need it. If the data is strictly INSERT, you can get the max PK value (or some other unique value) and run against that. You have the option of Service Broker as well, but that takes some work and probably more triggers to manage. If you're on SQL 2008 Enterprise, CDC may be an option. In our case, we have Created/Updated Dates (some NULL) and were able to create a computed column to show COALESCE(Updated, Created, '') as WarehouseLoadDate. We could then index that and find just the rows that were new/changed. From there we processed as appropriate. March 19, 2012 4:58 PM
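The audit-column setup Peter describes might look roughly like the following. Everything named here is hypothetical: a table dbo.SourceTable with key column ID and nullable CreatedDate and UpdatedDate datetime columns. The sketch leaves out the '' fallback from Peter's COALESCE to keep the computed column a plain datetime expression.

```sql
-- Computed column exposing "last touched" for each row.
ALTER TABLE dbo.SourceTable
    ADD WarehouseLoadDate AS COALESCE(UpdatedDate, CreatedDate);
GO

-- Index it so the incremental extract can seek on the date.
CREATE INDEX IX_SourceTable_WarehouseLoadDate
    ON dbo.SourceTable (WarehouseLoadDate);
GO

-- AFTER UPDATE trigger that stamps UpdatedDate on every modified row.
CREATE TRIGGER dbo.trg_SourceTable_SetUpdatedDate
ON dbo.SourceTable
AFTER UPDATE
AS
BEGIN
    SET NOCOUNT ON;

    UPDATE t
    SET UpdatedDate = GETDATE()
    FROM dbo.SourceTable AS t
    INNER JOIN inserted AS i
        ON i.ID = t.ID;
END
GO
```

The extract can then filter on WarehouseLoadDate greater than the last load time. The trigger adds a second write to every update, which is the overhead Peter alludes to, so it is worth measuring on a busy source table.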
	jim said: Hi Andy, I have a project where I need to load data each time for only 2 months, and every year delete the last month and then reload with the next month. I'm having trouble with that. It would be great if you could help me out with some ideas. Thanks, jim July 16, 2012 2:47 AM
	Eliana said: Hi Andy, Can I replace your source DB with an Excel file? Thanks Eliana July 16, 2012 9:28 PM
	andyleonard said: Hi Eliana, Yep, but there are a limited number of SSIS data types available from Excel (8, if memory serves) and mapping them to SQL Server data types can be "tricky". :{> July 17, 2012 4:57 PM
	Eliana said: Hi Andy, Yep, I get a lot of headaches when I try to import Excel files into a DB using SSIS. Do you know some tricks for that? Now I'm testing your solution, but using 2 tables (1 source and 1 destination table) in the same DB. I imported the Excel file into mytable_source (tmp) and I do a lookup and split from there to update and insert new records into mytable_dest. But I'm still having issues... Mytable_Source has the same structure as Mytable_Dest, even the increment ID, but I want to use other fields to determine whether the records are new or updated. But no records are inserted or changed. Any idea? July 18, 2012 12:36 AM
	andyleonard said: Hi Eliana, You may want to take a look at another series of articles I wrote about Incremental Loads. It provides more detail and screenshots. You can find the series at http://www.sqlservercentral.com/stairway/72494/. Hope this helps, Andy July 18, 2012 12:55 AM
	Eliana said: Hi Andy, I've an issue and I don't know what I'm doing wrong! 1. I have a SOURCE table with records and an empty DEST table. 2. I want to use 2 fields for the lookup, OrderDate and OrderNo. 3. I chose only these in the available lookup columns (join) and I used DEST_ in the Output alias. 4. In the Conditional Split I'm using the following conditions: 4.1 New Rows --> (OrderDate != DEST_OrderDate && OrderNo != DEST_OrderNo) 4.2 Changed Rows --> (OrderDate == DEST_OrderDate && OrderNo == DEST_OrderNo). But when I ran it, no new rows were inserted. What am I doing wrong? Thanks Eliana July 24, 2012 10:05 PM
	andyleonard said: Hi Eliana, You detect new rows by checking for a NULL (using the SSIS IsNull function) on any column returned from the Destination in the Lookup transformation. You use the expression currently in your New Rows detection to detect Changed Rows. Hope this helps, :{> July 25, 2012 10:08 AM
	Eliana said: Thanks for the help, it's working now Regards, Eliana July 25, 2012 7:34 PM
	andyleonard said: Good job, Eliana! :{> July 25, 2012 8:07 PM
	Eliana said: Hi Andy, I hope you can help me with this new issue. I now have a solution that runs perfectly when each part is run separately, but if I run it all together it gets stuck in the validation phase of my second task. My solution has 3 stages: 1. Import the Excel file into my source table (53647 rows). 2. Insert/update my dest table from the source table (using lookup and conditional split). 3. Truncate the source table. The first stage is OK. The second is stuck at "SSIS.Pipeline: Validation phase is beginning", showing yellow in the Data Flow but doing nothing. I set DelayValidation to True on each Data Flow but I guess this is ignored. If I run each Data Flow separately, it runs OK and faster. What do I have to do to improve it? Thanks Eliana July 25, 2012 10:42 PM
	Sarika said: Hello Andy, Your article has been very helpful. Thanks, Sarika July 30, 2012 3:26 PM
	Yogesh M said: Great article... Thanks, sir. October 19, 2012 3:34 AM
	Ron said: Hi Andy, You said: You can limit the number of rows used by the Lookup by using a SQL query as the Lookup source instead of the entire table. Is it possible to dynamically pass a value from the destination table to a WHERE clause in such a query? So I can, for example, find max(ID) in the destination, and only get rows from the source that are greater than that (meaning only new rows)? Thanks. February 11, 2013 8:07 PM
	andyleonard said: Hi Ron, In SSIS 2008, yes you can. The Lookup Query is exposed in the Expressions for the Data Flow containing the lookup transformation. Hope this helps, :{> February 22, 2013 1:09 PM
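	The query Ron describes could look roughly like the T-SQL below. This is a sketch under assumptions, not Andy's implementation: it reuses tblSource from the article, assumes a destination table named tblDest with the same key column, and assumes ColID only ever increases. In a package, the maximum value would typically be read into an SSIS variable first and then spliced into the query text by an expression, as the reply above suggests.

```sql
-- Sketch only: dbo.tblDest and the ever-increasing ColID are assumptions.

-- Highest key already loaded; 0 when the destination is still empty.
DECLARE @MaxColID int;

SELECT @MaxColID = ISNULL(MAX(ColID), 0)
FROM SSISIncrementalLoad_Dest.dbo.tblDest;

-- Only source rows beyond that key are candidates for insertion.
SELECT s.ColID, s.ColA, s.ColB, s.ColC
FROM SSISIncrementalLoad_Source.dbo.tblSource AS s
WHERE s.ColID > @MaxColID;
```

	A filter like this finds new rows only. Rows that were updated in the source but already exist in the destination fall below the cutoff, so changed-row detection still needs the Lookup comparison.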
	srinivas said: Hi friends, I have a small doubt about incremental loading; please clarify. Here we are using an OLE DB Command to update records. Is it possible to update records without using the OLE DB Command? Can an Execute SQL Task at the control flow level be used instead, and how? I wrote the same query in an Execute SQL Task and mapped the parameters to variables. The query is: update table set name=?, sal=? where id=? and I set up the parameter mapping, but it did not update any records. I want to achieve the same result using an Execute SQL Task. Please tell me what steps I need to follow. March 24, 2013 10:27 AM
	rohit said: Thanks. This is not very clear to me; can you provide a video? That would help me a lot. Thanks in advance. With regards, rohit March 25, 2013 5:50 AM
	Daniel Macho said: I recommend you take a look at the Dimension Merge SCD component and the video set on YouTube (especially the 6th one): http://dimensionmergescd.codeplex.com April 3, 2013 10:45 AM
	Akshay said: New to SSIS? Andy Leonard is the name to remember. An article from July 2007 is still being appreciated and lauded, and updated on sqlservercentral to date. Thank you very much, Andy, for sharing your knowledge. April 13, 2013 2:27 PM
	Denis Goch said: Andy, Thank you very much for this contribution. I searched a lot for something like this, and none of the things I found worked like yours. I was trying to import an XML file into SQL Server 2008 and your example worked just perfectly. Thanks again. April 15, 2013 9:38 PM
	Austine said: Thank you so much. I'm a beginner at this, hence my question: can I use the same approach for the incremental load on my fact table? And what if I don't want to use the Conditional Split but just want to insert and update in the destination? April 17, 2013 4:20 PM
	Pratik said: Hi Andy. How can I handle updates and deletes with this SSIS package? Thanks. May 30, 2013 1:03 PM
	Saint said: Thanks Andy, your article gave me a head on on the job. July 1, 2013 2:58 AM

Reposted from: https://www.cnblogs.com/ifreesoft/p/3304649.html

