OLAP 大表和小表并行hash join


一个表50MB 



一个表10GB 





50M表做驱动表,放在PGA里



这时候慢在对对 10g 的全表扫描





对10个G扫描块 需要开并行





我有这样一个算法 



一个进程 读 50mb 



8进程 来 扫描 10gb 



一个 进程扫描 1.25gb 





50MB 都分发到 8个进程



超大表和小表之间做HASH JOIN,一般会启用用并行,ORACLE在并行HASH JOIN的时候会用到很多技术,比如 HASH HASH, 或者BROADCAST



对于超大表和小表做HASH JOIN,一定要让小表进行广播(Broadcast),通常情况下CBO会选择正确,但是如果统计信息不准,或者基数计算错误CBO选择了 HASH HASH join,这个时



候就很慢,观察现象就是它在做direct path write temp,这个时候就可以用HINT PQ_DISTRIBUTE 进行调整



PQ_DISTRIBUTE(驱动表 None, Broadcast) 如果外层表很小(HASH_AJ),

这个时候可以用 PQ_DISTRIBUTE(驱动表 Broadcast,None)





下面就是一个具体的例子, F 是一个超大表 T 是一个小表 



SQL&get; explain plan for select /*+ parallel(f 8) parallel(t 8) use_hash(t,f) full(f) full(t) PQ_DISTRIBUTE(f HASH, HASH) */ *

  2    from crs_data_fct f

  3    JOIN crs_time_perd_fdim t ON t.TIME_PERD_ID = f.TIME_PERD_ID;



Explained.



Elapsed: 00:00:00.83

SQL&get; select * from table(dbms_xplan.display);



PLAN_TABLE_OUTPUT

----------------------------------------------------------------------------------------------------------------------------------------------------

Plan hash value: 353396990



---------------------------------------------------------------------------------------------------------------------------------------------------

| Id  | Operation                       | Name               | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |    TQ  |IN-OUT| PQ Distrib |

---------------------------------------------------------------------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT                |                    |   239M|   696G|  8371  (20)| 00:01:58 |       |       |        |      |            |

|   1 |  PX COORDINATOR                 |                    |       |       |            |          |       |       |        |      |            |

|   2 |   PX SEND QC (RANDOM)           | :TQ10002           |   239M|   696G|  8371  (20)| 00:01:58 |       |       |  Q1,02 | P-&get;S | QC (RAND)  |

|*  3 |    HASH JOIN BUFFERED           |                    |   239M|   696G|  8371  (20)| 00:01:58 |       |       |  Q1,02 | PCWP |            |

|   4 |     PX RECEIVE                  |                    | 15808 |  1636K|    10   (0)| 00:00:01 |       |       |  Q1,02 | PCWP |            |

|   5 |      PX SEND HASH               | :TQ10000           | 15808 |  1636K|    10   (0)| 00:00:01 |       |       |  Q1,00 | P-&get;P | HASH       |

|   6 |       PX BLOCK ITERATOR         |                    | 15808 |  1636K|    10   (0)| 00:00:01 |       |       |  Q1,00 | PCWC |            |

|   7 |        TABLE ACCESS STORAGE FULL| CRS_TIME_PERD_FDIM | 15808 |  1636K|    10   (0)| 00:00:01 |       |       |  Q1,00 | PCWP |            |

|   8 |     PX RECEIVE                  |                    |   239M|   673G|  8267  (19)| 00:01:56 |       |       |  Q1,02 | PCWP |            |

|   9 |      PX SEND HASH               | :TQ10001           |   239M|   673G|  8267  (19)| 00:01:56 |       |       |  Q1,01 | P-&get;P | HASH       |

|  10 |       PX PARTITION LIST ALL     |                    |   239M|   673G|  8267  (19)| 00:01:56 |     1 |   951 |  Q1,01 | PCWC |            |

|  11 |        TABLE ACCESS STORAGE FULL| CRS_DATA_FCT       |   239M|   673G|  8267  (19)| 00:01:56 |     1 |   951 |  Q1,01 | PCWP |            |

---------------------------------------------------------------------------------------------------------------------------------------------------



Predicate Information (identified by operation id):

---------------------------------------------------



   3 - access("T"."TIME_PERD_ID"="F"."TIME_PERD_ID")



Note

-----

   - dynamic sampling used for this statement (level=6)



27 rows selected.



SQL&get; explain plan for select /*+ parallel(f,8) parallel(t 8) use_hash(t,f) full(f) full(t) PQ_DISTRIBUTE(t None, Broadcast) */ *

  2    from crs_data_fct f

  3    JOIN crs_time_perd_fdim t ON t.TIME_PERD_ID = f.TIME_PERD_ID;



Explained.



SQL&get; select * from table(dbms_xplan.display);



PLAN_TABLE_OUTPUT

------------------------------------------------------------------------------------------------------------------------------------------------------

Plan hash value: 271674260



---------------------------------------------------------------------------------------------------------------------------------------------------

| Id  | Operation                       | Name               | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |    TQ  |IN-OUT| PQ Distrib |

---------------------------------------------------------------------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT                |                    |   239M|   696G|  8371  (20)| 00:01:58 |       |       |        |      |            |

|   1 |  PX COORDINATOR                 |                    |       |       |            |          |       |       |        |      |            |

|   2 |   PX SEND QC (RANDOM)           | :TQ10001           |   239M|   696G|  8371  (20)| 00:01:58 |       |       |  Q1,01 | P-&get;S | QC (RAND)  |

|*  3 |    HASH JOIN                    |                    |   239M|   696G|  8371  (20)| 00:01:58 |       |       |  Q1,01 | PCWP |            |

|   4 |     PX RECEIVE                  |                    | 15808 |  1636K|    10   (0)| 00:00:01 |       |       |  Q1,01 | PCWP |            |

|   5 |      PX SEND BROADCAST          | :TQ10000           | 15808 |  1636K|    10   (0)| 00:00:01 |       |       |  Q1,00 | P-&get;P | BROADCAST  |

|   6 |       PX BLOCK ITERATOR         |                    | 15808 |  1636K|    10   (0)| 00:00:01 |       |       |  Q1,00 | PCWC |            |

|   7 |        TABLE ACCESS STORAGE FULL| CRS_TIME_PERD_FDIM | 15808 |  1636K|    10   (0)| 00:00:01 |       |       |  Q1,00 | PCWP |            |

|   8 |     PX PARTITION LIST ALL       |                    |   239M|   673G|  8267  (19)| 00:01:56 |     1 |   951 |  Q1,01 | PCWC |            |

|   9 |      TABLE ACCESS STORAGE FULL  | CRS_DATA_FCT       |   239M|   673G|  8267  (19)| 00:01:56 |     1 |   951 |  Q1,01 | PCWP |            |

---------------------------------------------------------------------------------------------------------------------------------------------------



Predicate Information (identified by operation id):

---------------------------------------------------



   3 - access("T"."TIME_PERD_ID"="F"."TIME_PERD_ID")



Note

-----

   - dynamic sampling used for this statement (level=6)



25 rows selected.



hints:PQ_DISTRIBUTE(小表 None, Broadcast)








你可能感兴趣的:(hash join)