TIPS: Informatica

  1. Install the server locally; the domain name can be left blank.

     

  2. Do not install the Metadata Reporter with the Informatica Client or Informatica Server. Metadata Reporter installation is tightly integrated with the installation of a suitable web server.

     

  3. A workflow is a set of instructions that describes how and when to run tasks related to extracting, transforming, and loading data.

     

  4. The Informatica repository resides on a relational database. The repository database tables contain the instructions required to extract, transform, and load data.

     

  5. Mappings: a set of source and target definitions, along with transformations containing the business logic you build into them. These are the instructions the Informatica Server uses to transform and move data.

     

  6. Mapplets: A set of transformations that you can use in multiple mappings.

     

  7. Sessions and workflows store information about how and when the Informatica Server moves data. 

     

  8. A workflow is a set of instructions that describes how and when to run tasks related to extracting, transforming, and loading data. (More in "Getting Started," p. 36.)

     

  9. A session is a type of task that you can put in a workflow. Each session corresponds to a single mapping. (More in "Getting Started," p. 36.)

     

  10. The goal of the design process is to create mappings that depict the flow of data between sources and targets, including changes made to the data before it reaches the targets.        

     

  11. Use the Designer to create source and target database definitions and save them in the repository by importing from the database over an ODBC or native DB connection. Use the Workflow Manager to configure DB connections to the source and target databases and save the connection information in the repository.

     

  12. After you install and configure the Informatica Client and the Informatica Server, you can register the Informatica Server with the repository that you indicated in the Informatica Server Configuration. You must register the Informatica Server before you can start it.

     

  13. Naming conventions: details in "Getting Started," p. 168.

     

  14. Source Qualifier: every mapping includes a Source Qualifier transformation, representing all data read from the source and temporarily stored by the Informatica Server.

     

  15. Difference between the Router and Filter transformations: see "Getting Started," p. 110.

     

  16. The Output transformation (available in the Mapplet Designer only) defines the output of a mapplet.

     

  17. You can use the following expression to calculate the total commissions of employees who exceeded their quarterly quota: SUM( COMMISSION, COMMISSION > QUOTA )
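    The conditional aggregate above can be sketched in plain Python (the employee data is hypothetical, purely for illustration):

```python
# Sketch of SUM(COMMISSION, COMMISSION > QUOTA): sum commissions only
# for rows where the condition holds. Sample data is hypothetical.
employees = [
    {"commission": 500, "quota": 400},  # exceeds quota -> counted
    {"commission": 300, "quota": 400},  # does not -> skipped
    {"commission": 700, "quota": 600},  # exceeds quota -> counted
]

total = sum(e["commission"] for e in employees if e["commission"] > e["quota"])
print(total)  # 1200
```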

     

  18. You can use non-aggregate expressions in group-by ports to modify or replace groups. For example, if you want to replace 'AAA battery' with 'battery' before grouping, you can create a new group-by output port, named CORRECTED_ITEM, using the following expression: IIF( ITEM = 'AAA battery', 'battery', ITEM )

     

  19. When you group values, the Informatica Server produces one row for each group. If you do not group values, the Informatica Server returns one row for all input rows. Note that the Informatica Server typically returns the last row of each group (or the last row received) with the result of the aggregation.
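    A minimal Python sketch of this grouped-versus-ungrouped behavior, using hypothetical (region, amount) rows:

```python
# Sketch: with group-by, one output row per group; without, a single row.
from collections import defaultdict

rows = [("east", 10), ("west", 5), ("east", 7)]

# Grouped: one result row per region
grouped = defaultdict(int)
for region, amount in rows:
    grouped[region] += amount
print(dict(grouped))  # {'east': 17, 'west': 5}

# Ungrouped: one row for all input rows
ungrouped_total = sum(amount for _, amount in rows)
print(ungrouped_total)  # 22
```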

     

  20. If you use a Filter transformation in the mapping, place it before the Aggregator transformation to reduce unnecessary aggregation.

     

  21. To filter out rows containing null values or spaces, use the ISNULL and IS_SPACES functions to test the value of the port. For example, if you want to filter out rows that contain NULLs in the FIRST_NAME port, use the following condition: IIF(ISNULL(FIRST_NAME),FALSE,TRUE)
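    The same filter logic can be sketched in Python, with None standing in for NULL (the sample names are made up):

```python
# Sketch of IIF(ISNULL(FIRST_NAME), FALSE, TRUE) combined with a spaces
# check: drop rows whose FIRST_NAME is NULL (None) or only spaces.
rows = [
    {"first_name": "Ada"},
    {"first_name": None},   # filtered out (NULL)
    {"first_name": "   "},  # filtered out (spaces only)
    {"first_name": "Grace"},
]

def keep(row):
    name = row["first_name"]
    return name is not None and name.strip() != ""

kept = [r["first_name"] for r in rows if keep(r)]
print(kept)  # ['Ada', 'Grace']
```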

     

  22. You cannot use a Joiner transformation in the following situations:

    Both input pipelines originate from the same Source Qualifier transformation.

    Both input pipelines originate from the same Normalizer transformation.

    Both input pipelines originate from the same Joiner transformation.

    Either input pipeline contains an Update Strategy transformation.

    You connect a Sequence Generator transformation directly before the Joiner transformation.

     

  23. You can designate only one Rank port in a Rank transformation.

     

  24. The Designer deletes the default group when you delete the last user-defined group from the list.

     

  25. If you want the same generated value to go to more than one target that receives data from a single preceding transformation, connect the Sequence Generator to that preceding transformation. The Informatica Server then passes unique values to the transformation and routes rows from the transformation to the targets.

     

  26. If you connect the CURRVAL port without connecting the NEXTVAL port, the Informatica Server passes a constant value for each row.

     

  27. If you select Reset for a non-reusable Sequence Generator, the Informatica Server generates values based on the original current value each time it starts the session. Otherwise, the Informatica Server updates the current value to reflect the last-generated value plus one, and then uses the updated value the next time it uses the Sequence Generator.
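    A simplified Python model of the Reset option (an illustrative sketch only, not Informatica's internal implementation; the SequenceGenerator class is hypothetical):

```python
# Sketch: with Reset, each session restarts from the original current
# value; without it, the generator resumes from last value + 1.
class SequenceGenerator:
    def __init__(self, start=1, reset=False):
        self.start = start
        self.reset = reset
        self.current = start

    def run_session(self, n):
        if self.reset:
            self.current = self.start  # restart from the original value
        values = list(range(self.current, self.current + n))
        self.current = values[-1] + 1  # persist last generated value + 1
        return values

resetting = SequenceGenerator(reset=True)
print(resetting.run_session(3), resetting.run_session(3))  # [1, 2, 3] [1, 2, 3]

persisting = SequenceGenerator(reset=False)
print(persisting.run_session(3), persisting.run_session(3))  # [1, 2, 3] [4, 5, 6]
```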

     

  28. The Informatica Server supports two kinds of outer joins:

    Left join - the Informatica Server returns all rows from the table to the left of the join syntax, plus the rows from both tables that meet the join condition.

    Right join - the Informatica Server returns all rows from the table to the right of the join syntax, plus the rows from both tables that meet the join condition.
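    The two outer-join semantics can be sketched in Python on two tiny hypothetical tables keyed by their first column:

```python
# Sketch of left vs right outer join: unmatched rows from the preserved
# side appear with None filling in for the missing side.
left = [(1, "a"), (2, "b")]
right = [(2, "x"), (3, "y")]

def left_join(l, r):
    rmap = dict(r)
    return [(k, v, rmap.get(k)) for k, v in l]

def right_join(l, r):
    lmap = dict(l)
    return [(k, lmap.get(k), v) for k, v in r]

print(left_join(left, right))   # [(1, 'a', None), (2, 'b', 'x')]
print(right_join(left, right))  # [(2, 'b', 'x'), (3, None, 'y')]
```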

     

  29. Each group in an XML definition is analogous to a relational table, and the Designer treats each group within the XML Source Qualifier as a separate source of data.

     

  30. Ports from two different groups in one XML Source Qualifier cannot be linked to ports in the same transformation.

     

  31. The following expression evaluates to NULL: 8 * 10 - NULL
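    NULL propagation through arithmetic can be modeled in Python, with None standing in for NULL (null_safe is a hypothetical helper, not an Informatica function):

```python
import operator

# Sketch: any arithmetic involving NULL yields NULL, as in 8 * 10 - NULL.
def null_safe(op, a, b):
    """Apply op unless either operand is NULL (None)."""
    if a is None or b is None:
        return None
    return op(a, b)

step = null_safe(operator.mul, 8, 10)         # 80
result = null_safe(operator.sub, step, None)  # NULL propagates
print(result)  # None
```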

     

  32. When using an unconnected stored procedure in an expression, you need a method of returning the value of output parameters to a port. You have two options for capturing these values:

    +Assign the output value to a local variable.

    +Assign the output value to the system variable PROC_RESULT.

    By using PROC_RESULT, you assign the value of the return parameter directly to an output port, which can be applied directly to a target. You can also combine these two options by assigning one output parameter as PROC_RESULT, and the other parameters as variables.

     

  33. Informatica optimization:

    combine multiple Source Qualifiers into a single one;

    filter before aggregating, and feed the Aggregator sorted input;

    increase the number of partitions in a pipeline to improve session performance.

     

  34. A partition is a pipeline stage that executes in a single reader, transformation, or writer thread. By default, the Informatica Server defines a single partition in the source pipeline. If you use PowerCenter, you can increase the number of partitions. This increases the number of processing threads, which can improve session performance.

     

  35. Slowly Changing Dimensions (SCD)

    Type 1 SCD: an updated row overwrites the previously existing row, so historical data is lost.

    Type 2 SCD: a new row is added for the updated data, so both current and past records are kept, which agrees with the data-warehousing goal of maintaining history.

    Type 3 SCD: new columns are added to hold the prior values.
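    A rough Python sketch of Type 1 versus Type 2 handling (the table layout and function names are hypothetical):

```python
# Sketch of SCD Type 1 (overwrite in place) vs Type 2 (versioned rows).
def scd_type1(dim, key, new_value):
    """Overwrite the existing row: history is lost."""
    for row in dim:
        if row["key"] == key:
            row["value"] = new_value
    return dim

def scd_type2(dim, key, new_value):
    """Expire the current row and append a new current row: history kept."""
    for row in dim:
        if row["key"] == key and row["current"]:
            row["current"] = False
    dim.append({"key": key, "value": new_value, "current": True})
    return dim

d1 = [{"key": 1, "value": "NY"}]
print(scd_type1(d1, 1, "LA"))  # [{'key': 1, 'value': 'LA'}]

d2 = [{"key": 1, "value": "NY", "current": True}]
print(scd_type2(d2, 1, "LA"))  # old row expired, plus a new current row
```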

     

  36. By definition, an active transformation is one that can change the number of rows that pass through it. The Union transformation is considered active because it combines rows from multiple input pipelines into a single output pipeline; note that it does not remove duplicates (its behavior is that of UNION ALL).

     

  37. To eliminate duplicate rows, use a Sorter transformation with the Select Distinct option, or use an Aggregator transformation and group by all ports.
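    The Sorter-with-Select-Distinct behavior amounts to sort-then-deduplicate, which can be sketched in Python:

```python
# Sketch: sort the rows, then keep only the first of each run of equal
# rows, mimicking a Sorter transformation with Select Distinct enabled.
rows = [("a", 1), ("b", 2), ("a", 1), ("c", 3), ("b", 2)]

distinct = []
for row in sorted(rows):
    if not distinct or distinct[-1] != row:
        distinct.append(row)
print(distinct)  # [('a', 1), ('b', 2), ('c', 3)]
```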

     

  38. The Partitioning Option increases PowerCenter's performance through parallel data processing. It provides a thread-based architecture and automatic data partitioning that optimize parallel processing in multiprocessor and grid-based hardware environments.

     

  39. You can override the SQL query in the Workflow Manager. For example:

    select * from tab_name where rownum<=1000
    minus
    select * from tab_name where rownum<=500;

     

  40. The pmcmd command

    pmcmd can control workflow scheduling, stop the Informatica ETL Server, and so on. The usage formats are as follows.

    Usage: help [command]

    Usage: version

    Usage: pingserver

    Usage: getserverproperties

    Usage: shutdownserver <-complete|-stop|-abort>

    Usage: getserverdetails [-all|-running|-scheduled]

    Usage: getrunningsessionsdetails

    Usage: startworkflow [<-folder|-f> folder]

    [<-startfrom> taskInstancePath [<-recovery>]]

    [<-paramfile> paramfile] [-wait|-nowait] workflow

    Example: pmcmd startworkflow -s scottyxd:4001 -u Administrator -p Administrator -f tdbu wf_ods_employee

    Usage: stopworkflow [<-folder|-f> folder] [-wait|-nowait] workflow

    Usage: abortworkflow [<-folder|-f> folder] [-wait|-nowait] workflow

    Usage: waitworkflow [<-folder|-f> folder] workflow

    Usage: resumeworkflow [<-folder|-f> folder] [-wait|-nowait] [<-recovery>]

    workflow

    Usage: scheduleworkflow [<-folder|-f> folder] workflow

    Usage: unscheduleworkflow [<-folder|-f> folder] workflow

    Usage: getworkflowdetails [<-folder|-f> folder] workflow

    Usage: starttask [<-folder|-f> folder] <-workflow|-w> workflow

    [<-paramfile> paramfile] [-wait|-nowait] [<-recovery>]

    taskInstancePath

    Usage: stoptask [<-folder|-f> folder] <-workflow|-w> workflow [-wait|-nowait]

    taskInstancePath

    Usage: aborttask [<-folder|-f> folder] <-workflow|-w> workflow [-wait|-nowait]

    taskInstancePath

    Usage: waittask [<-folder|-f> folder] <-workflow|-w> workflow taskInstancePath

    Usage: resumeworklet [<-folder|-f> folder] <-workflow|-w> workflow

    [-wait|-nowait] [<-recovery>] taskInstancePath

    Usage: gettaskdetails [<-folder|-f> folder] <-workflow|-w> workflow

    taskInstancePath

    Usage: getsessionstatistics [<-folder|-f> folder] <-workflow|-w> workflow

    taskInstancePath

    Usage: connect <-serveraddr|-s> [host:]portno

    <<-user|-u> username|<-uservar|-uv> userEnvVar>

    <<-password|-p> password|<-passwordvar|-pv> passwordEnvVar>

     

    Usage: disconnect

    Usage: setwait

    Usage: setnowait

    Usage: unsetfolder

    Usage: setfolder folder

    Usage: showsettings

    Usage: exit

    (Stop: queries against the source databases stop immediately, but transformation and loading of data already in the buffer continue. Abort: same as Stop, but buffered data is allowed at most 60 seconds to finish.)

     

    pmcmd startworkflow -uv USERNAME -pv PASSWORD -s SALES:6258 -f east -w wSalesAvg -paramfile '/$PMRootDir/myfile.txt'

     

  41. A workflow can include many tasks (types: Command, Email, Session).

     

  42. Compared with each other, a connected Lookup can return multiple values while an unconnected Lookup returns a single value. A connected Lookup sits in the same pipeline as the source and supports dynamic caching; an unconnected Lookup does not, but it is useful in special cases, for example when the output of one lookup feeds the input of another.

     

  43. Available buffer blocks = 0.9 * (DTM buffer size / buffer block size) * number of partitions.

    Blocks required = (number of sources + number of targets) * 2.
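    A worked example of the two formulas, with hypothetical sizes:

```python
# Worked example of the buffer-block arithmetic (all sizes hypothetical).
dtm_buffer_size = 12_000_000  # bytes
block_size = 64_000           # bytes
partitions = 2

available_blocks = 0.9 * (dtm_buffer_size / block_size) * partitions
print(available_blocks)  # 337.5

# Blocks required: (sources + targets) * 2
sources, targets = 3, 2
required_blocks = (sources + targets) * 2
print(required_blocks)  # 10
```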

     

  44. Difference between Source Qualifier and Joiner

    A Source Qualifier can join two or more tables, provided the tables have a relationship and reside in the same database.

    A Joiner transformation can join n (n > 1) tables from the same or different databases.

     

  45. Data warehouse Top down

    ODS-->ETL-->Data warehouse-->Data mart-->OLAP

    Data warehouse Bottom up

    ODS-->ETL-->Data mart-->Data warehouse-->OLAP

     

  46. A junk dimension is used to constrain queries based on text and flag values.

    Sometimes a few attributes are discarded from the major dimensions; those discarded attributes are collected into one place, and that collection is called a junk dimension.

  47. Data load modes:

  • Normal Load: writes information to the database log file, so recovery is possible if needed. When the source is a text file being loaded into a table, use normal load; otherwise the session will fail.
  • Bulk Mode: does not write information to the database log file, so no recovery is possible. In exchange, bulk load is considerably faster than normal load.
  1. Reusable transformations can be used in multiple mappings. When you need to incorporate such a transformation into a mapping, you add an instance of it to the mapping. If you later change the definition of the transformation, all instances inherit the changes. Because each instance of a reusable transformation is a pointer to that transformation, you can change the transformation in the Transformation Developer and its instances automatically reflect the changes. This feature can save a great deal of work.

     

  2. The Aggregator stores data in the aggregate cache until it completes its aggregate calculations. When you run a session that uses an Aggregator transformation, the Informatica Server creates index and data caches in memory to process the transformation. If the Informatica Server requires more space, it stores overflow values in cache files.

     

  3. For each Joiner transformation, the Informatica Server reads all the master rows before it reads the first detail row; once the detail rows arrive, it produces output rows as it reads each one.

     

  4. Within a session: when you configure a session, you can instruct the Informatica Server either to treat all rows in the same way (for example, treat all rows as inserts) or to use instructions coded into the session mapping to flag rows for different database operations.

    Within a mapping: you use the Update Strategy transformation to flag rows for insert, delete, update, or reject.

     

  5. A mapplet consists of a set of transformations and is reusable; a reusable transformation is a single transformation that can be reused. Variables or parameters created in a mapplet cannot be used in another mapping or mapplet, whereas variables created in a reusable transformation can be used in any other mapping or mapplet. You cannot include source definitions in reusable transformations, but you can add sources to a mapplet. The transformation logic is hidden inside a mapplet, but transparent in a reusable transformation. You cannot use COBOL Source Qualifier, Joiner, or Normalizer transformations in a mapplet, although you can make any of them a reusable transformation.

     

  6. A grouping of sessions is known as a batch. There are two types:

    Sequential: runs sessions one after the other.

    Concurrent: runs sessions at the same time.
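    The two batch types can be sketched with Python threads (the session names and run_session function are stand-ins, not an Informatica API):

```python
# Sketch: a sequential batch runs sessions one by one; a concurrent
# batch starts them all at once and waits for completion.
import threading

def run_session(name, log):
    log.append(name)

sessions = ["s1", "s2", "s3"]

# Sequential batch: one after the other, in order
seq_log = []
for s in sessions:
    run_session(s, seq_log)
print(seq_log)  # ['s1', 's2', 's3']

# Concurrent batch: all at once (completion order is not guaranteed)
conc_log = []
threads = [threading.Thread(target=run_session, args=(s, conc_log)) for s in sessions]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(conc_log))  # ['s1', 's2', 's3']
```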

     

  7. In a Stored Procedure transformation, the procedure is compiled and executed in a relational data source, and you need a database connection to import the stored procedure into your mapping. In an External Procedure transformation, the procedure or function executes outside the data source; you build it as a DLL to access it from your mapping, and no database connection is needed.

     

  8. A standalone session is a session that is not nested in a batch. If a standalone session fails, you can run recovery using a menu command or pmcmd. These options are not available for batched sessions.

    To recover sessions using the menu:

    1. In the Server Manager, highlight the session you want to recover.

    2. Select Server Requests-Stop from the menu.

    3. With the failed session highlighted, select Server Requests-Start Session in Recovery Mode from the menu.

    To recover sessions using pmcmd:

    1. From the command line, stop the session.

    2. From the command line, start recovery.

     

  9. A dimension value stored in the fact table is called a degenerate dimension; it does not have its own dimension table.

     

  10. Run the session in timestamp mode and the session log will not automatically overwrite the current session log. Alternatively, use $PMSessionLogCount to specify the number of session-log runs to save.
