Dan Linstedt
Beyond this - the performance problems / issues may lie in the database - partitioning tables, dropping / re-creating indexes, striping raid arrays, etc... Without a large enough result set to work with, your average timings will be skewed by other users on the database, processes on the server, or network traffic. This seems to be an ideal test set size for producing mostly accurate averages.
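For example, one simple way to keep those outside factors from skewing the numbers is to time the same test load several times and look at the average and spread, rather than trusting a single run. A minimal sketch in plain Python (not Informatica-specific; run_load is a hypothetical stand-in for whatever kicks off your test load):

    import statistics
    import time

    def time_runs(run_load, repetitions=5):
        # Time the same test load several times so a single noisy run (other
        # users on the database, processes on the server, network traffic)
        # does not skew the picture.
        timings = []
        for _ in range(repetitions):
            start = time.perf_counter()
            run_load()  # hypothetical callable that performs your test load
            timings.append(time.perf_counter() - start)
        return {
            "mean_sec": statistics.mean(timings),
            "median_sec": statistics.median(timings),
            "stdev_sec": statistics.stdev(timings) if len(timings) > 1 else 0.0,
        }

    # Example: time a dummy load five times.
    print(time_runs(lambda: time.sleep(0.1)))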
Try tuning your maps with these steps first. Then move on to tuning the session, and iterate this sequence until you are happy, or cannot achieve better performance by continued effort. If the performance is still not acceptable, then the architecture must be tuned (which can mean changes to what maps are created). In that case, you can contact us - we tune the architecture and the whole system from top to bottom.
KEEP THIS IN MIND: In order to achieve optimal performance, it's always a good idea to strike a balance between the tools, the database, and the hardware resources. Allow each to do what it does best. Varying the architecture can make a huge difference in speed and optimization possibilities.
1. Utilize a database (like Oracle / Sybase / Informix / DB2 etc...) for significant data handling operations (such as sorts, groups, aggregates). In other words, staging tables can be a huge benefit to parallelism of operations. Parallel design - by simple mathematics - nearly always cuts your execution time. Staging tables have many benefits. Please see the staging table discussion in the methodologies section for full details.
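As an illustration of that parallelism math, running independent staging loads side by side means the elapsed time is roughly that of the longest single load, not the sum of all of them. A minimal sketch in plain Python (load_staging_table and the table names are hypothetical placeholders for whatever actually loads each staging table):

    from concurrent.futures import ThreadPoolExecutor, as_completed
    import time

    def load_staging_table(table_name):
        # Placeholder: in practice this would launch a session or a bulk
        # load for one staging table.
        time.sleep(1)
        return table_name

    staging_tables = ["STG_ORDERS", "STG_CUSTOMERS", "STG_PRODUCTS", "STG_ITEMS"]

    # Four equal, independent loads run in parallel finish in roughly the
    # time of the longest single load, instead of the sum of all four.
    with ThreadPoolExecutor(max_workers=len(staging_tables)) as pool:
        futures = [pool.submit(load_staging_table, t) for t in staging_tables]
        for future in as_completed(futures):
            print(future.result(), "finished")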
3. If you can - localize all target tables, stored procedures, functions, views, sequences in the SOURCE database. Again, try not to connect across synonyms. Synonyms (remote database tables) could potentially affect performance by as much as a factor of 3 times or more.
5. Remember that Informatica suggests that each session takes roughly 1 to 1 1/2 CPUs. In keeping with this - Informatica plays well with RDBMS engines on the same machine, but does NOT get along (performance-wise) with ANY other engine (reporting engine, java engine, OLAP engine, java virtual machine, etc...).
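A rough sketch of what that rule of thumb implies for how many sessions to run concurrently on one machine (the CPU reservation for the RDBMS below is an assumption, not part of the original guideline - adjust it for your environment):

    import os

    cpus = os.cpu_count() or 1
    cpus_reserved_for_rdbms = 2    # assumption - size this for your database engine
    cpus_per_session = 1.5         # the rule of thumb quoted above

    max_sessions = max(1, int((cpus - cpus_reserved_for_rdbms) / cpus_per_session))
    print(f"{cpus} CPUs -> roughly {max_sessions} concurrent sessions")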
7. TURN OFF VERBOSE LOGGING. The session log has a tremendous impact on the overall performance of the map. Force an override in the session, setting it to NORMAL logging mode. Unfortunately the logging mechanism is not "parallel" in the internal core; it is embedded directly in the processing itself.
8. Turn off 'collect performance statistics'. This also has an impact - although minimal at times - because it writes a series of performance data to the performance log. Removing this operation reduces reliance on flat file operations. However, it may be necessary to have this turned on DURING your tuning exercise. It can reveal a lot about the speed of the reader and writer threads.
sessions have demonstrated problems with constraint based loads) 2) Move the flat file to local internal disk (if at all possible). Try not to read a file across the network, or from a RAID device. Most RAID arrays are fast, but Informatica seems to top out, where internal disk continues to be much faster. Here - a link will NOT work to increase speed - it must be the full file itself, stored locally.
11. Separate complex maps - try to break the maps out into logical threaded sections of processing. Re-arrange the architecture if necessary to allow for parallel processing. There may be more, smaller components doing individual tasks; however, the throughput will be proportionate to the degree of parallelism that is applied. A discussion on HOW to perform this task is posted on the methodologies page; please see this discussion for further details.
utilize the DBMS for what it was built for: reading/writing/sorting/grouping/filtering data
source feeds, etc... The balancing act is difficult without DBA knowledge. In order to achieve
are best in Informatica. This does not detract from the use of the ETL tool; rather, it enhances it
13. TUNE the DATABASE. Don't be afraid to estimate: small, medium, large, and extra large
throughput for each, turnaround time for load, is it a trickle feed? Give this information to your
expected to be high read/high write, which operations will sort (order by), etc... Moving disks,
script to generate "fake" data for small, medium, large, and extra large data sets (a sketch appears below). Run each of
load size occurs.
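A minimal sketch of such a fake-data generator in plain Python (the column layout and the row counts chosen for small, medium, large, and extra large are assumptions - substitute your own estimates for each feed):

    import csv
    import random
    import string

    # Assumed row counts for the four size classes.
    SIZES = {"small": 10_000, "medium": 100_000, "large": 1_000_000, "xlarge": 5_000_000}

    def fake_row(row_id):
        name = "".join(random.choices(string.ascii_uppercase, k=12))
        amount = round(random.uniform(1, 10_000), 2)
        return [row_id, name, amount]

    def generate(size):
        # Write one CSV per size class so each test load can be run and
        # timed separately.
        path = f"fake_{size}.csv"
        with open(path, "w", newline="") as handle:
            writer = csv.writer(handle)
            writer.writerow(["id", "name", "amount"])
            for row_id in range(SIZES[size]):
                writer.writerow(fake_row(row_id))
        return path

    for size in SIZES:
        print("wrote", generate(size))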
enough disk space could potentially slow down your entire server during processing (in an
Otherwise you may not get a good picture of the space available during operation. Particularly if
JOINER object with heterogeneous sources.
closely to understand how the resources are being utilized, and where the hot spots are. Try to
into EMC's disk storage array - while expensive, it appears to be extremely fast, I've heard (but
16. SESSION SETTINGS. In the session, there is only so much tuning you can do. Balancing
feel for what needs to be set in the session - or what needs to be changed in the database. Read
achieve is: OPTIMAL READ, OPTIMAL THROUGHPUT, OPTIMAL WRITE. Over-tuning
write throughput is governed by your read and transformation speed; likewise, your read
problematic map, is to break it into components for testing: 1) Read Throughput - tune for the reader, see what the settings are, send the write output to a flat file for less contention - Check the
a factor of 64k each shot - ignore the warning above 128k. If the Reader still appears to increase
Session Memory from 12MB to 24MB. If the reader still stabilizes, then you have a slow source,
continues to climb above where it stabilized, make note of the session settings. Check the
attempting to tune the reader here, and don't want the writer threads to slow you down. Change
much the reader slows down, its optimal performance was reached with a flat file(s). This time -
then you've got some basic map tuning to do. Try to merge expression objects, set your lookups
aggregation, or lookups being performed. Etc... If you have a slow writer, change the map to a
of the original map, and break down the copies. Once the "slower" of the N targets is discovered,
etc... There are many database things you can do here.
or Data Warehouse itself. PMServer plays well with RDBMS (relational database management
Servers, Security Servers, application, and Report servers. All of these items should be broken
out to other machines. This is critical to improving performance on the PMServer machine.