Experience from WWPRT
1. Use the results of an uncorrelated inner query to reduce the join scale
Scenario:
Matched products: the product ids differ, but offeringname, geography and variantname are the same.
Suppose that we have more than 1,000,000 products in the DB,
and we need to find all matched products.
I. The slow version, which takes more than 11 minutes to complete
select ov1.id, ov2.id, ov1.offeringname, vm1.variantname, vm1.geo
from wwprt.offering_variant ov1
inner join wwprt.variant v1
    on v1.id = ov1.variantid and v1.hasmap = 'Y'
inner join wwprt.variant_map vm1
    on vm1.variantid = v1.id
inner join wwprt.offering_variant ov2
    on ov2.offeringname = ov1.offeringname and ov2.id <> ov1.id
inner join wwprt.variant v2
    on v2.id = ov2.variantid and v2.hasmap = 'Y'
inner join wwprt.variant_map vm2
    on vm2.variantid = v2.id
    and vm2.variantname = vm1.variantname
    and vm2.geo = vm1.geo
order by ov1.offeringname, vm1.variantname, vm1.geo, ov1.id, ov2.id
WITH UR
Performance analysis:
a. ov1 goes through all products.
b. ov2 goes through all products.
c. All products are joined with all products, and the matched products are then picked out of the join results.
Key: joining all products with all products produces a huge number of intermediate join results.
II. The fast version, which takes only about 35 seconds
Idea: quickly pre-qualify the offering variants using the results of an uncorrelated inner query.
select ov1.id as ov1_id, ov2.id as ov2_id, ov1.offeringname, vm1.variantname, vm1.geo
from wwprt.offering_variant ov1
inner join wwprt.variant v1
    on v1.id = ov1.variantid and v1.hasmap = 'Y'
inner join wwprt.variant_map vm1
    on vm1.variantid = v1.id
inner join wwprt.offering_variant ov2
    on ov2.offeringname = ov1.offeringname and ov2.id <> ov1.id
inner join wwprt.variant v2
    on v2.id = ov2.variantid and v2.hasmap = 'Y'
inner join wwprt.variant_map vm2
    on vm2.variantid = v2.id
    and vm2.variantname = vm1.variantname
    and vm2.geo = vm1.geo
where (ov1.offeringname, vm1.variantname, vm1.geo) in (
    select ov3.offeringname, vm3.variantname, vm3.geo
    from wwprt.offering_variant ov3
    inner join wwprt.variant v3
        on v3.id = ov3.variantid and v3.hasmap = 'Y'
    inner join wwprt.variant_map vm3
        on vm3.variantid = v3.id
    group by ov3.offeringname, vm3.variantname, vm3.geo
    having count(*) > 1
)
order by ov1.offeringname, vm1.variantname, vm1.geo, ov1_id, ov2_id
with UR;
Performance analysis:
a. The scale of ov1 is reduced: ov1 only goes through the rows that match the results of the inner subquery:
select ov3.offeringname, vm3.variantname, vm3.geo
from wwprt.offering_variant ov3
inner join wwprt.variant v3
    on v3.id = ov3.variantid and v3.hasmap = 'Y'
inner join wwprt.variant_map vm3
    on vm3.variantid = v3.id
group by ov3.offeringname, vm3.variantname, vm3.geo
having count(*) > 1
b. ov2 still goes through all products.
c. Only a very small subset of products is joined with all products, so there is only a very small set of join results to process.
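The pre-filtering idea can be tried on a toy data set. The following is a minimal sketch, not the WWPRT implementation: it assumes Python's built-in sqlite3 module (SQLite 3.15 or later, for the row-value IN) instead of DB2, drops the wwprt schema prefix and WITH UR, and uses made-up rows. It only checks that the filtered query returns the same matched pairs as the unfiltered one; it says nothing about timings.

```python
import sqlite3

# Toy schema and rows (hypothetical data, only the columns used by the queries).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE variant (id INTEGER PRIMARY KEY, hasmap TEXT);
    CREATE TABLE variant_map (variantid INTEGER, variantname TEXT, geo TEXT);
    CREATE TABLE offering_variant (id INTEGER PRIMARY KEY, offeringname TEXT, variantid INTEGER);

    INSERT INTO variant VALUES (1, 'Y'), (2, 'Y'), (3, 'Y'), (4, 'N');
    INSERT INTO variant_map VALUES
        (1, 'base', 'US'), (2, 'base', 'US'), (3, 'pro', 'EMEA'), (4, 'base', 'US');
    INSERT INTO offering_variant VALUES
        (10, 'OfferA', 1), (11, 'OfferA', 2), (12, 'OfferA', 3),
        (13, 'OfferB', 1), (14, 'OfferA', 4);
""")

# The self-join shared by both versions.
JOINS = """
    SELECT ov1.id AS ov1_id, ov2.id AS ov2_id,
           ov1.offeringname, vm1.variantname, vm1.geo
    FROM offering_variant ov1
    JOIN variant v1 ON v1.id = ov1.variantid AND v1.hasmap = 'Y'
    JOIN variant_map vm1 ON vm1.variantid = v1.id
    JOIN offering_variant ov2
      ON ov2.offeringname = ov1.offeringname AND ov2.id <> ov1.id
    JOIN variant v2 ON v2.id = ov2.variantid AND v2.hasmap = 'Y'
    JOIN variant_map vm2
      ON vm2.variantid = v2.id
     AND vm2.variantname = vm1.variantname
     AND vm2.geo = vm1.geo
"""

# Uncorrelated inner query: only the (offeringname, variantname, geo)
# combinations that occur more than once can produce a matched pair.
PREFILTER = """
    WHERE (ov1.offeringname, vm1.variantname, vm1.geo) IN (
        SELECT ov3.offeringname, vm3.variantname, vm3.geo
        FROM offering_variant ov3
        JOIN variant v3 ON v3.id = ov3.variantid AND v3.hasmap = 'Y'
        JOIN variant_map vm3 ON vm3.variantid = v3.id
        GROUP BY ov3.offeringname, vm3.variantname, vm3.geo
        HAVING COUNT(*) > 1
    )
"""

ORDER = " ORDER BY ov1.offeringname, vm1.variantname, vm1.geo, ov1_id, ov2_id"

unfiltered = conn.execute(JOINS + ORDER).fetchall()
filtered = conn.execute(JOINS + PREFILTER + ORDER).fetchall()

print(unfiltered == filtered)
for row in filtered:
    print(row)
```

Here products 10 and 11 share offeringname, variantname and geo, so they are the only matched pair. Product 14 is ignored because its variant has hasmap = 'N'.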
2. Use a left join instead of NOT EXISTS
Reason:
Tim: Use of 'NOT EXISTS' is discouraged. It is usually several times faster to implement with an outer join. The NOT EXISTS sub query is executed at a row level so that, internal to the DB, it is issuing this query once for each row to be evaluated.
I. The SQL with NOT EXISTS
Explanation: before inserting into PRODUCT_CTRY_JOIN_CN, we must skip the rows whose product id and country already exist there; otherwise the insert fails with a unique constraint error.
insert into WWPRT.PRODUCT_CTRY_JOIN_CN (PRODUCTID, COUNTRY, ANNDOCNO, DELETED, SYSOWNER)
select D.PROCESSINGPRODID, prodcty.country, prodcty.anndocno, prodcty.deleted, prodcty.sysowner
from wwprt.product_ctry_join_cn prodcty
inner join DUPLICATED_PRODUCT_MSG D
    on prodcty.productid = D.PRODUCTID and prodcty.country = D.country
where
    -- Avoid the unique constraint error if prodid and country already exist
    not exists (
        select 1
        from wwprt.product_ctry_join_cn iprodcty
        where iprodcty.productid = D.PROCESSINGPRODID
          and iprodcty.country = D.country
    )
II. The SQL rewritten with a left join
insert into WWPRT.PRODUCT_CTRY_JOIN_CN (PRODUCTID, COUNTRY, ANNDOCNO, DELETED, SYSOWNER)
select D.PROCESSINGPRODID, prodcty1.country, prodcty1.anndocno, prodcty1.deleted, prodcty1.sysowner
from wwprt.product_ctry_join_cn prodcty1
inner join DUPLICATED_PRODUCT_MSG D
    on prodcty1.productid = D.PRODUCTID and prodcty1.country = D.country
-- Avoid the unique constraint error if prodid and country already exist
left join wwprt.product_ctry_join_cn prodcty2
    on prodcty2.productid = D.PROCESSINGPRODID and prodcty2.country = D.country
where prodcty2.productid is null
  and prodcty2.country is null
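To check that the left join form inserts the same rows as the NOT EXISTS form, both can be run against identical toy data. This is a minimal sketch under assumptions: it uses Python's built-in sqlite3 module rather than DB2, the table definitions and rows are made up, and the unique key is assumed to be (productid, country). It checks equivalence only; which form is faster depends on the database optimizer and should be measured on the real system.

```python
import sqlite3

# Hypothetical schema; (productid, country) is assumed to be the unique key.
SCHEMA = """
    CREATE TABLE product_ctry_join_cn (
        productid INTEGER, country TEXT, anndocno TEXT,
        deleted TEXT, sysowner TEXT,
        PRIMARY KEY (productid, country)
    );
    CREATE TABLE duplicated_product_msg (
        productid INTEGER, processingprodid INTEGER, country TEXT
    );
    INSERT INTO product_ctry_join_cn VALUES
        (1, 'US', 'A1', 'N', 'sys'),
        (1, 'CN', 'A2', 'N', 'sys'),
        (100, 'US', 'A9', 'N', 'sys');
    -- Product 1 is a duplicate of product 100 in both countries.
    INSERT INTO duplicated_product_msg VALUES (1, 100, 'US'), (1, 100, 'CN');
"""

NOT_EXISTS_INSERT = """
    INSERT INTO product_ctry_join_cn (productid, country, anndocno, deleted, sysowner)
    SELECT d.processingprodid, p.country, p.anndocno, p.deleted, p.sysowner
    FROM product_ctry_join_cn p
    JOIN duplicated_product_msg d
      ON p.productid = d.productid AND p.country = d.country
    WHERE NOT EXISTS (
        SELECT 1 FROM product_ctry_join_cn i
        WHERE i.productid = d.processingprodid AND i.country = d.country
    )
"""

LEFT_JOIN_INSERT = """
    INSERT INTO product_ctry_join_cn (productid, country, anndocno, deleted, sysowner)
    SELECT d.processingprodid, p1.country, p1.anndocno, p1.deleted, p1.sysowner
    FROM product_ctry_join_cn p1
    JOIN duplicated_product_msg d
      ON p1.productid = d.productid AND p1.country = d.country
    LEFT JOIN product_ctry_join_cn p2
      ON p2.productid = d.processingprodid AND p2.country = d.country
    WHERE p2.productid IS NULL
"""


def run(insert_sql):
    """Load the toy data into a fresh database, run one insert, return all rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute(insert_sql)
    rows = conn.execute(
        "SELECT productid, country, anndocno FROM product_ctry_join_cn "
        "ORDER BY productid, country"
    ).fetchall()
    conn.close()
    return rows


rows_not_exists = run(NOT_EXISTS_INSERT)
rows_left_join = run(LEFT_JOIN_INSERT)
print(rows_not_exists == rows_left_join)
print(rows_left_join)
```

In this data set (100, 'US') already exists, so only the 'CN' row is copied to product 100 and neither insert hits the unique constraint.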