pg 提升子链接源码分析

pg 上拉子链接源码分析

此分析基于 version 13.8, 在pg 中把子查询分为两类,一类是from 中的子查询; 一类是where 中的子查询,叫做子链接(sublink)。

下面用sublink(子链接)指代where中的子查询, subquery(子查询) 指代from 中的子查询。

1. pull_up_sublinks

void pull_up_sublinks(PlannerInfo *root)
  • 被 subquery_planner 调用;也可能被pull_up_simple_subquery 调用,用于在 pull_up_subqueries 时处理子查询中的sublink。
  • 上拉子链接的入口
  • 调用 pull_up_sublinks_jointree_recurse 来递归的上拉子链接, 传入parse->jointree,返回处理后的jointree
  • 设置 root->parse->jointree 为处理后的 jointree (不为 FromExpr节点类型 则构建为FromExpr节点)

1.1 pull_up_sublinks_jointree_recurse

/*
 * Recurse through jointree nodes for pull_up_sublinks()
 *
 * In addition to returning the possibly-modified jointree node, we return
 * a relids set of the contained rels into *relids.
 */
static Node *
pull_up_sublinks_jointree_recurse(PlannerInfo *root, Node *jtnode,
								  Relids *relids)
{
	if (jtnode == NULL)
	{
		*relids = NULL;
	}
	else if (IsA(jtnode, RangeTblRef))
	{
		int			varno = ((RangeTblRef *) jtnode)->rtindex;

		*relids = bms_make_singleton(varno);
		/* jtnode is returned unmodified */
	}
	else if (IsA(jtnode, FromExpr))
	{
		FromExpr   *f = (FromExpr *) jtnode;
		List	   *newfromlist = NIL;
		Relids		frelids = NULL;
		FromExpr   *newf;
		Node	   *jtlink;
		ListCell   *l;

		/* First, recurse to process children and collect their relids */
		foreach(l, f->fromlist)
		{
			Node	   *newchild;
			Relids		childrelids;

			newchild = pull_up_sublinks_jointree_recurse(root,
														 lfirst(l),
														 &childrelids);
			newfromlist = lappend(newfromlist, newchild);
			frelids = bms_join(frelids, childrelids);
		}
		/* Build the replacement FromExpr; no quals yet */
		newf = makeFromExpr(newfromlist, NULL);
		/* Set up a link representing the rebuilt jointree */
		jtlink = (Node *) newf;
		/* Now process qual --- all children are available for use */
		newf->quals = pull_up_sublinks_qual_recurse(root, f->quals,
													&jtlink, frelids,
													NULL, NULL);

		/*
		 * Note that the result will be either newf, or a stack of JoinExprs
		 * with newf at the base.  We rely on subsequent optimization steps to
		 * flatten this and rearrange the joins as needed.
		 *
		 * Although we could include the pulled-up subqueries in the returned
		 * relids, there's no need since upper quals couldn't refer to their
		 * outputs anyway.
		 */
		*relids = frelids;
		jtnode = jtlink;
	}
	else if (IsA(jtnode, JoinExpr))
	{
		JoinExpr   *j;
		Relids		leftrelids;
		Relids		rightrelids;
		Node	   *jtlink;

		/*
		 * Make a modifiable copy of join node, but don't bother copying its
		 * subnodes (yet).
		 */
		j = (JoinExpr *) palloc(sizeof(JoinExpr));
		memcpy(j, jtnode, sizeof(JoinExpr));
		jtlink = (Node *) j;

		/* Recurse to process children and collect their relids */
		j->larg = pull_up_sublinks_jointree_recurse(root, j->larg,
													&leftrelids);
		j->rarg = pull_up_sublinks_jointree_recurse(root, j->rarg,
													&rightrelids);

		/*
		 * Now process qual, showing appropriate child relids as available,
		 * and attach any pulled-up jointree items at the right place. In the
		 * inner-join case we put new JoinExprs above the existing one (much
		 * as for a FromExpr-style join).  In outer-join cases the new
		 * JoinExprs must go into the nullable side of the outer join. The
		 * point of the available_rels machinations is to ensure that we only
		 * pull up quals for which that's okay.
		 *
		 * We don't expect to see any pre-existing JOIN_SEMI or JOIN_ANTI
		 * nodes here.
		 */
		switch (j->jointype)
		{
			case JOIN_INNER:
				j->quals = pull_up_sublinks_qual_recurse(root, j->quals,
														 &jtlink,
														 bms_union(leftrelids,
																   rightrelids),
														 NULL, NULL);
				break;
			case JOIN_LEFT:
				j->quals = pull_up_sublinks_qual_recurse(root, j->quals,
														 &j->rarg,
														 rightrelids,
														 NULL, NULL);
				break;
			case JOIN_FULL:
				/* can't do anything with full-join quals */
				break;
			case JOIN_RIGHT:
				j->quals = pull_up_sublinks_qual_recurse(root, j->quals,
														 &j->larg,
														 leftrelids,
														 NULL, NULL);
				break;
			default:
				elog(ERROR, "unrecognized join type: %d",
					 (int) j->jointype);
				break;
		}

		/*
		 * Although we could include the pulled-up subqueries in the returned
		 * relids, there's no need since upper quals couldn't refer to their
		 * outputs anyway.  But we *do* need to include the join's own rtindex
		 * because we haven't yet collapsed join alias variables, so upper
		 * levels would mistakenly think they couldn't use references to this
		 * join.
		 */
		*relids = bms_join(leftrelids, rightrelids);
		if (j->rtindex)
			*relids = bms_add_member(*relids, j->rtindex);
		jtnode = jtlink;
	}
	else
		elog(ERROR, "unrecognized node type: %d",
			 (int) nodeTag(jtnode));
	return jtnode;
}
  • 根据节点类型进行处理,返回修改后的新 jtnode 及 以此jtnode作为root涉及的 relids。
  • 第一次调用,节点类型必为 FromExpr。
  • 节点类型为 RangeTblRef – 叶子节点
    • 获取relids(bitmap), 返回
    • 可能为subquery 表, 这里不处理子查询中的sublink。 在调用pull_up_subqueries (subquery_planner)时处理
  • 节点类型为 FromExpr f=(FromExpr *) jtnode
    • 遍历 f->fromlist, 对于fromlist 中的每个成员递归调用 pull_up_sublinks_jointree_recurse处理(一直处理到叶子节点 RangeTblRef 才返回), 返回处理后的成员。根据处理后的结构构建 newfromlist 和 frelids (涉及的关系集合)
    • 根据 newfromlist 创建新的 FromExpr(newf)。
    • 调用 pull_up_sublinks_qual_recurse处理约束 f->quals(包含子链接,即实际在pull_up_sublinks_qual_recurse 中处理子链接) , 返回新的quals,设置到 newf->quals。
    • *relids = frelids; 返回涉及的relids
    • jtnode = jtlink; 返回新的jtnode
  • 节点类型为 JoinExpr j=jtnode actually copy
    • 对于join的左右子树(j->larg,j->rarg)递归调用 pull_up_sublinks_jointree_recurse处理, 并赋为新值。
    • 根据 j->jointype 处理 join 的关联条件(约束),都调用 pull_up_sublinks_qual_recurse, 传参不同
      • JOIN_INNER 内连接:传入 jtlink, bms_union(leftrelids,rightrelids)
      • JOIN_LEFT 左连接:传入 j->rarg,rightrelids, 只处理右表(nullable)相关子链接,并上拉子链接到rarg
      • JOIN_FULL 全连接, 不处理
      • JOIN_RIGHT 右连接:传入 j->larg,leftrelids, 只处理左表(nullable)相关子链接, 并上拉子链接到larg
    • 构造 relids(leftrelids, rightrelids,j->rtindex)返回
    • 返回新jtnode

1.2 pull_up_sublinks_qual_recurse

/*
 * Recurse through top-level qual nodes for pull_up_sublinks()
 *
 * jtlink1 points to the link in the jointree where any new JoinExprs should
 * be inserted if they reference available_rels1 (i.e., available_rels1
 * denotes the relations present underneath jtlink1).  Optionally, jtlink2 can
 * point to a second link where new JoinExprs should be inserted if they
 * reference available_rels2 (pass NULL for both those arguments if not used).
 * Note that SubLinks referencing both sets of variables cannot be optimized.
 * If we find multiple pull-up-able SubLinks, they'll get stacked onto jtlink1
 * and/or jtlink2 in the order we encounter them.  We rely on subsequent
 * optimization to rearrange the stack if appropriate.
 *
 * Returns the replacement qual node, or NULL if the qual should be removed.
 */
static Node *
pull_up_sublinks_qual_recurse(PlannerInfo *root, Node *node,
							  Node **jtlink1, Relids available_rels1,
							  Node **jtlink2, Relids available_rels2)
{
	if (node == NULL)
		return NULL;
	if (IsA(node, SubLink))
	{
		SubLink    *sublink = (SubLink *) node;
		JoinExpr   *j;
		Relids		child_rels;

		/* Is it a convertible ANY or EXISTS clause? */
		if (sublink->subLinkType == ANY_SUBLINK)
		{
			if ((j = convert_ANY_sublink_to_join(root, sublink,
												 available_rels1)) != NULL)
			{
				/* Yes; insert the new join node into the join tree */
				j->larg = *jtlink1;
				*jtlink1 = (Node *) j;
				/* Recursively process pulled-up jointree nodes */
				j->rarg = pull_up_sublinks_jointree_recurse(root,
															j->rarg,
															&child_rels);

				/*
				 * Now recursively process the pulled-up quals.  Any inserted
				 * joins can get stacked onto either j->larg or j->rarg,
				 * depending on which rels they reference.
				 */
				j->quals = pull_up_sublinks_qual_recurse(root,
														 j->quals,
														 &j->larg,
														 available_rels1,
														 &j->rarg,
														 child_rels);
				/* Return NULL representing constant TRUE */
				return NULL;
			}
			if (available_rels2 != NULL &&
				(j = convert_ANY_sublink_to_join(root, sublink,
												 available_rels2)) != NULL)
			{
				/* Yes; insert the new join node into the join tree */
				j->larg = *jtlink2;
				*jtlink2 = (Node *) j;
				/* Recursively process pulled-up jointree nodes */
				j->rarg = pull_up_sublinks_jointree_recurse(root,
															j->rarg,
															&child_rels);

				/*
				 * Now recursively process the pulled-up quals.  Any inserted
				 * joins can get stacked onto either j->larg or j->rarg,
				 * depending on which rels they reference.
				 */
				j->quals = pull_up_sublinks_qual_recurse(root,
														 j->quals,
														 &j->larg,
														 available_rels2,
														 &j->rarg,
														 child_rels);
				/* Return NULL representing constant TRUE */
				return NULL;
			}
		}
		else if (sublink->subLinkType == EXISTS_SUBLINK)
		{
			if ((j = convert_EXISTS_sublink_to_join(root, sublink, false,
													available_rels1)) != NULL)
			{
				/* Yes; insert the new join node into the join tree */
				j->larg = *jtlink1;
				*jtlink1 = (Node *) j;
				/* Recursively process pulled-up jointree nodes */
				j->rarg = pull_up_sublinks_jointree_recurse(root,
															j->rarg,
															&child_rels);

				/*
				 * Now recursively process the pulled-up quals.  Any inserted
				 * joins can get stacked onto either j->larg or j->rarg,
				 * depending on which rels they reference.
				 */
				j->quals = pull_up_sublinks_qual_recurse(root,
														 j->quals,
														 &j->larg,
														 available_rels1,
														 &j->rarg,
														 child_rels);
				/* Return NULL representing constant TRUE */
				return NULL;
			}
			if (available_rels2 != NULL &&
				(j = convert_EXISTS_sublink_to_join(root, sublink, false,
													available_rels2)) != NULL)
			{
				/* Yes; insert the new join node into the join tree */
				j->larg = *jtlink2;
				*jtlink2 = (Node *) j;
				/* Recursively process pulled-up jointree nodes */
				j->rarg = pull_up_sublinks_jointree_recurse(root,
															j->rarg,
															&child_rels);

				/*
				 * Now recursively process the pulled-up quals.  Any inserted
				 * joins can get stacked onto either j->larg or j->rarg,
				 * depending on which rels they reference.
				 */
				j->quals = pull_up_sublinks_qual_recurse(root,
														 j->quals,
														 &j->larg,
														 available_rels2,
														 &j->rarg,
														 child_rels);
				/* Return NULL representing constant TRUE */
				return NULL;
			}
		}
		/* Else return it unmodified */
		return node;
	}
	if (is_notclause(node))
	{
		/* If the immediate argument of NOT is EXISTS, try to convert */
		SubLink    *sublink = (SubLink *) get_notclausearg((Expr *) node);
		JoinExpr   *j;
		Relids		child_rels;
		
		if (sublink && IsA(sublink, SubLink))
		{
			if (sublink->subLinkType == EXISTS_SUBLINK)
			{
				if ((j = convert_EXISTS_sublink_to_join(root, sublink, true,
														available_rels1)) != NULL)
				{
					/* Yes; insert the new join node into the join tree */
					j->larg = *jtlink1;
					*jtlink1 = (Node *) j;
					/* Recursively process pulled-up jointree nodes */
					j->rarg = pull_up_sublinks_jointree_recurse(root,
																j->rarg,
																&child_rels);

					/*
					 * Now recursively process the pulled-up quals.  Because
					 * we are underneath a NOT, we can't pull up sublinks that
					 * reference the left-hand stuff, but it's still okay to
					 * pull up sublinks referencing j->rarg.
					 */
					j->quals = pull_up_sublinks_qual_recurse(root,
															 j->quals,
															 &j->rarg,
															 child_rels,
															 NULL, NULL);
					/* Return NULL representing constant TRUE */
					return NULL;
				}
				if (available_rels2 != NULL &&
					(j = convert_EXISTS_sublink_to_join(root, sublink, true,
														available_rels2)) != NULL)
				{
					/* Yes; insert the new join node into the join tree */
					j->larg = *jtlink2;
					*jtlink2 = (Node *) j;
					/* Recursively process pulled-up jointree nodes */
					j->rarg = pull_up_sublinks_jointree_recurse(root,
																j->rarg,
																&child_rels);

					/*
					 * Now recursively process the pulled-up quals.  Because
					 * we are underneath a NOT, we can't pull up sublinks that
					 * reference the left-hand stuff, but it's still okay to
					 * pull up sublinks referencing j->rarg.
					 */
					j->quals = pull_up_sublinks_qual_recurse(root,
															 j->quals,
															 &j->rarg,
															 child_rels,
															 NULL, NULL);
					/* Return NULL representing constant TRUE */
					return NULL;
				}
			}
		}
		/* Else return it unmodified */
		return node;
	}
	if (is_andclause(node))
	{
		/* Recurse into AND clause */
		List	   *newclauses = NIL;
		ListCell   *l;

		foreach(l, ((BoolExpr *) node)->args)
		{
			Node	   *oldclause = (Node *) lfirst(l);
			Node	   *newclause;

			newclause = pull_up_sublinks_qual_recurse(root,
													  oldclause,
													  jtlink1,
													  available_rels1,
													  jtlink2,
													  available_rels2);
			if (newclause)
				newclauses = lappend(newclauses, newclause);
		}
		/* We might have got back fewer clauses than we started with */
		if (newclauses == NIL)
			return NULL;
		else if (list_length(newclauses) == 1)
			return (Node *) linitial(newclauses);
		else
			return (Node *) make_andclause(newclauses);
	}
	/* Stop if not an AND */
	return node;
}
  • 分为三块来处理:1. sublink 2. not clause ; 3. and clause

  • 节点类型为 SubLink pull up any/some/in/exists

    • 只对 ANY_SUBLINK(包括any, some, in) 和 EXISTS_SUBLINK 类型进行处理

    • sublink->subLinkType 类型为 ANY_SUBLINK,则依次对available_rels1 和 available_rels2 调用 convert_ANY_sublink_to_join,若成功则构造 JoinExpr并返回。此时把 sublink 作为一个subquery给 pull up 为 join 的右表, 左表为入参的 jtlink1(fromexpr or JoinExpr)

    • 新JoinExpr的构造(以available_rels1为例):

      • convert_ANY_sublink_to_join 中会创建新的 JoinExpr, 并设置rarg 为sublink转化的subquery, quals设置为对于的约束。
      • 把 jtlink1 作为 larg , 并复制给jtlink1(替换jtlink1)。然后继续对上拉的j->rarg 调用pull_up_sublinks_jointree_recurse,尝试上拉rarg中的子链接。最后对上拉的quals,尝试继续上拉其中的表,与larg 或rarg 关联,构成新的join关系。
    • 若 sublink->subLinkType 为 EXISTS_SUBLINK, 则是调用 convert_EXISTS_sublink_to_join,直接上拉sublink 中的jointree( from clause), 其他逻辑同ANY_SUBLINK

  • 节点为 BoolExpr && NOT_EXPR is_notclause pull up not exists

    • 若为 not sublink 且type 为 EXISTS_SUBLINK,则进行处理。即只支持对not exists 进行pullup
    • 处理逻辑类似上面,只是调用convert_EXISTS_sublink_to_join时设置under_not 参数为true(表明使用antijoin),对上拉的quals调用pull_up_sublinks_qual_recurse 尝试上拉时,只支持上拉后与 rargs join。
  • 节点为 BoolExpr && AND_EXPR is_andclause

    • 遍历((BoolExpr *) node)->args, 对每个条件调用pull_up_sublinks_qual_recurse 处理。

1.2.1 convert_ANY_sublink_to_join

/*
 * convert_ANY_sublink_to_join: try to convert an ANY SubLink to a join
 *
 * The caller has found an ANY SubLink at the top level of one of the query's
 * qual clauses, but has not checked the properties of the SubLink further.
 * Decide whether it is appropriate to process this SubLink in join style.
 * If so, form a JoinExpr and return it.  Return NULL if the SubLink cannot
 * be converted to a join.
 *
 * The only non-obvious input parameter is available_rels: this is the set
 * of query rels that can safely be referenced in the sublink expression.
 * (We must restrict this to avoid changing the semantics when a sublink
 * is present in an outer join's ON qual.)  The conversion must fail if
 * the converted qual would reference any but these parent-query relids.
 *
 * On success, the returned JoinExpr has larg = NULL and rarg = the jointree
 * item representing the pulled-up subquery.  The caller must set larg to
 * represent the relation(s) on the lefthand side of the new join, and insert
 * the JoinExpr into the upper query's jointree at an appropriate place
 * (typically, where the lefthand relation(s) had been).  Note that the
 * passed-in SubLink must also be removed from its original position in the
 * query quals, since the quals of the returned JoinExpr replace it.
 * (Notionally, we replace the SubLink with a constant TRUE, then elide the
 * redundant constant from the qual.)
 *
 * On success, the caller is also responsible for recursively applying
 * pull_up_sublinks processing to the rarg and quals of the returned JoinExpr.
 * (On failure, there is no need to do anything, since pull_up_sublinks will
 * be applied when we recursively plan the sub-select.)
 *
 * Side effects of a successful conversion include adding the SubLink's
 * subselect to the query's rangetable, so that it can be referenced in
 * the JoinExpr's rarg.
 */
JoinExpr *
convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink,
							Relids available_rels)
{
	JoinExpr   *result;
	Query	   *parse = root->parse;
	Query	   *subselect = (Query *) sublink->subselect;
	Relids		upper_varnos;
	int			rtindex;
	ParseNamespaceItem *nsitem;
	RangeTblEntry *rte;
	RangeTblRef *rtr;
	List	   *subquery_vars;
	Node	   *quals;
	ParseState *pstate;

	Assert(sublink->subLinkType == ANY_SUBLINK);

	/*
	 * The sub-select must not refer to any Vars of the parent query. (Vars of
	 * higher levels should be okay, though.)
	 */
	if (contain_vars_of_level((Node *) subselect, 1))
		return NULL;

	/*
	 * The test expression must contain some Vars of the parent query, else
	 * it's not gonna be a join.  (Note that it won't have Vars referring to
	 * the subquery, rather Params.)
	 */
	upper_varnos = pull_varnos(root, sublink->testexpr);
	if (bms_is_empty(upper_varnos))
		return NULL;

	/*
	 * However, it can't refer to anything outside available_rels.
	 */
	if (!bms_is_subset(upper_varnos, available_rels))
		return NULL;

	/*
	 * The combining operators and left-hand expressions mustn't be volatile.
	 */
	if (contain_volatile_functions(sublink->testexpr))
		return NULL;

	/* Create a dummy ParseState for addRangeTableEntryForSubquery */
	pstate = make_parsestate(NULL);

	/*
	 * Okay, pull up the sub-select into upper range table.
	 *
	 * We rely here on the assumption that the outer query has no references
	 * to the inner (necessarily true, other than the Vars that we build
	 * below). Therefore this is a lot easier than what pull_up_subqueries has
	 * to go through.
	 */
	nsitem = addRangeTableEntryForSubquery(pstate,
										   subselect,
										   makeAlias("ANY_subquery", NIL),
										   false,
										   false);
	rte = nsitem->p_rte;
	parse->rtable = lappend(parse->rtable, rte);
	rtindex = list_length(parse->rtable);

	/*
	 * Form a RangeTblRef for the pulled-up sub-select.
	 */
	rtr = makeNode(RangeTblRef);
	rtr->rtindex = rtindex;

	/*
	 * Build a list of Vars representing the subselect outputs.
	 */
	subquery_vars = generate_subquery_vars(root,
										   subselect->targetList,
										   rtindex);

	/*
	 * Build the new join's qual expression, replacing Params with these Vars.
	 */
	quals = convert_testexpr(root, sublink->testexpr, subquery_vars);

	/*
	 * And finally, build the JoinExpr node.
	 */
	result = makeNode(JoinExpr);
	result->jointype = JOIN_SEMI;
	result->isNatural = false;
	result->larg = NULL;		/* caller must fill this in */
	result->rarg = (Node *) rtr;
	result->usingClause = NIL;
	result->quals = quals;
	result->alias = NULL;
	result->rtindex = 0;		/* we don't need an RTE for it */


	return result;
}
  • 转为 JOIN_SEMI
  • 把整个 subquery 作为rargs 提升,而不是提升subquery 中的表,提升后为表和subquery 进行semijoin。subquery 可以通过pull_up_subquery 继续提升。
    • 通过调用addRangeTableEntryForSubquery生成RangeTblEntry
  • ANY 类型子链接如果是相关子链接,即子查询中引用 了父查询的列属性,则不能提升。
  • ANY 类型子链接 sublink->testexpr(Opexpr arg1(left) arg2)没有引用上一层的列,不能提升,这种情况下上层的表和子链接中的表不能构成连接关系。if (bms_is_empty(upper_varnos))
  • 只能与指定的表 构成连接关系 , 即 sublink->testexpr 只能包含 available_rels 中的关系 if (!bms_is_subset(upper_varnos, available_rels))
  • sublink->testexpr 没有列属性也不能提升
  • sublink->testexpr 中不能包含volatile functions
  • 返回设置了rarg(新创建的RangeTblRef,指向新创建的rte(subquery)),quals(sublink->testexpr转换),jointype的 JoinExpr 对象

1.2.2 convert_EXISTS_sublink_to_join

/*
 * convert_EXISTS_sublink_to_join: try to convert an EXISTS SubLink to a join
 *
 * The API of this function is identical to convert_ANY_sublink_to_join's,
 * except that we also support the case where the caller has found NOT EXISTS,
 * so we need an additional input parameter "under_not".
 */
JoinExpr *
convert_EXISTS_sublink_to_join(PlannerInfo *root, SubLink *sublink,
							   bool under_not, Relids available_rels)
{
	JoinExpr   *result;
	Query	   *parse = root->parse;
	Query	   *subselect = (Query *) sublink->subselect;
	Node	   *whereClause;
	int			rtoffset;
	int			varno;
	Relids		clause_varnos;
	Relids		upper_varnos;

	Assert(sublink->subLinkType == EXISTS_SUBLINK);

	/*
	 * Can't flatten if it contains WITH.  (We could arrange to pull up the
	 * WITH into the parent query's cteList, but that risks changing the
	 * semantics, since a WITH ought to be executed once per associated query
	 * call.)  Note that convert_ANY_sublink_to_join doesn't have to reject
	 * this case, since it just produces a subquery RTE that doesn't have to
	 * get flattened into the parent query.
	 */
	if (subselect->cteList)
		return NULL;

	/*
	 * Copy the subquery so we can modify it safely (see comments in
	 * make_subplan).
	 */
	subselect = copyObject(subselect);

	/*
	 * See if the subquery can be simplified based on the knowledge that it's
	 * being used in EXISTS().  If we aren't able to get rid of its
	 * targetlist, we have to fail, because the pullup operation leaves us
	 * with noplace to evaluate the targetlist.
	 */
	if (!simplify_EXISTS_query(root, subselect))
		return NULL;

	/*
	 * Separate out the WHERE clause.  (We could theoretically also remove
	 * top-level plain JOIN/ON clauses, but it's probably not worth the
	 * trouble.)
	 */
	whereClause = subselect->jointree->quals;
	subselect->jointree->quals = NULL;

	/*
	 * The rest of the sub-select must not refer to any Vars of the parent
	 * query.  (Vars of higher levels should be okay, though.)
	 */
	if (contain_vars_of_level((Node *) subselect, 1))
		return NULL;

	/*
	 * On the other hand, the WHERE clause must contain some Vars of the
	 * parent query, else it's not gonna be a join.
	 */
	if (!contain_vars_of_level(whereClause, 1))
		return NULL;

	/*
	 * We don't risk optimizing if the WHERE clause is volatile, either.
	 */
	if (contain_volatile_functions(whereClause))
		return NULL;

	/*
	 * The subquery must have a nonempty jointree, but we can make it so.
	 */
	replace_empty_jointree(subselect);

	/*
	 * Prepare to pull up the sub-select into top range table.
	 *
	 * We rely here on the assumption that the outer query has no references
	 * to the inner (necessarily true). Therefore this is a lot easier than
	 * what pull_up_subqueries has to go through.
	 *
	 * In fact, it's even easier than what convert_ANY_sublink_to_join has to
	 * do.  The machinations of simplify_EXISTS_query ensured that there is
	 * nothing interesting in the subquery except an rtable and jointree, and
	 * even the jointree FromExpr no longer has quals.  So we can just append
	 * the rtable to our own and use the FromExpr in our jointree. But first,
	 * adjust all level-zero varnos in the subquery to account for the rtable
	 * merger.
	 */
	rtoffset = list_length(parse->rtable);
	OffsetVarNodes((Node *) subselect, rtoffset, 0);
	OffsetVarNodes(whereClause, rtoffset, 0);

	/*
	 * Upper-level vars in subquery will now be one level closer to their
	 * parent than before; in particular, anything that had been level 1
	 * becomes level zero.
	 */
	IncrementVarSublevelsUp((Node *) subselect, -1, 1);
	IncrementVarSublevelsUp(whereClause, -1, 1);

	/*
	 * Now that the WHERE clause is adjusted to match the parent query
	 * environment, we can easily identify all the level-zero rels it uses.
	 * The ones <= rtoffset belong to the upper query; the ones > rtoffset do
	 * not.
	 */
	clause_varnos = pull_varnos(root, whereClause);
	upper_varnos = NULL;
	while ((varno = bms_first_member(clause_varnos)) >= 0)
	{
		if (varno <= rtoffset)
			upper_varnos = bms_add_member(upper_varnos, varno);
	}
	bms_free(clause_varnos);
	Assert(!bms_is_empty(upper_varnos));

	/*
	 * Now that we've got the set of upper-level varnos, we can make the last
	 * check: only available_rels can be referenced.
	 */
	if (!bms_is_subset(upper_varnos, available_rels))
		return NULL;

	/* Now we can attach the modified subquery rtable to the parent */
	parse->rtable = list_concat(parse->rtable, subselect->rtable);

	/*
	 * And finally, build the JoinExpr node.
	 */
	result = makeNode(JoinExpr);
	result->jointype = under_not ? JOIN_ANTI : JOIN_SEMI;
	result->isNatural = false;
	result->larg = NULL;		/* caller must fill this in */
	/* flatten out the FromExpr node if it's useless */
	if (list_length(subselect->jointree->fromlist) == 1)
		result->rarg = (Node *) linitial(subselect->jointree->fromlist);
	else
		result->rarg = (Node *) subselect->jointree;
	result->usingClause = NIL;
	result->quals = whereClause;
	result->alias = NULL;
	result->rtindex = 0;		/* we don't need an RTE for it */

	/*LightDB add, 2022/2/9, for S202201136600, semijoin hints*/
	result->qb_name = ((Query *) sublink->subselect)->qb_name;
	
	return result;
}
  • JOIN_ANTI(not exists) 或JOIN_SEMI

  • 直接提升subquery 中的表, 提升后是表之间join

    	if (list_length(subselect->jointree->fromlist) == 1)
    		result->rarg = (Node *) linitial(subselect->jointree->fromlist);
    	else
    		result->rarg = (Node *) subselect->jointree;
    
  • 不能包含CTE

  • simplify_EXISTS_query(其他地方也会使用, modify cte 仍需判断)

    • 不能含有set operation(UNION/INTERSECT/EXCEPT), agg, group sets(group by grouping sets((brand,segment),(brand),(segment),())), 窗口函数, set-returning function(返回多行), modify ctes(有insert/update/delete), having 条件, offset (limit), rowmark(for update/share)

    • 对于limit 不为常量不支持, 为常量但不大于0也不支持pull up, 其他情况会把limit 优化掉(直接去掉,不影响语义),然后就可以pull up了.

    • 其他优化:

        query->targetList = NIL;
      	query->groupClause = NIL;
      	query->windowClause = NIL;
      	query->distinctClause = NIL;
      	query->sortClause = NIL;
      	query->hasDistinctOn = false;
      
  • 去除了whereclause(subselect->jointree->quals)的子链接中不能引用外层对象,The rest of the sub-select must not refer to any Vars of the parent query. (Vars of higher levels should be okay, though.)

  • whereclause(subselect->jointree->quals) 需要引用外层对象,即需要为相关子链接。非相关子链接不能pull up

  • whereclause(subselect->jointree->quals)中不能包含volatile 函数

  • 检测引用的上层的对象都在 available_rels 中, 不在则不支持提升。Now that we’ve got the set of upper-level varnos, we can make the last check: only available_rels can be referenced.

  • 返回设置了rarg(subselect->jointree->fromlist| subselect->jointree),quals(subselect->jointree->quals),jointype的 JoinExpr 对象

2. 总结

pg 支持对any 类型(包括in, any, some)的子链接提升,把子链接中的查询提升到from中作为一个子查询与表构成semijoin。此子查询可以被 pull up subquery 流程继续提升。

提升条件:

  • 子链接的查询中不能引用父查询中表的列属性 any (select * from t2 where )

    chuhx@postgres=# explain select * from t1 a where key1=any (select key1 from t2 where a.key1=t2.key1);
                                      QUERY PLAN                                   
    -------------------------------------------------------------------------------
     Seq Scan on t1 a  (cost=0.00..2671.15 rows=1130 width=8)
       Filter: (SubPlan 1)
       SubPlan 1
         ->  Index Only Scan using t2_pkey on t2  (cost=0.15..2.17 rows=1 width=4)
               Index Cond: (key1 = a.key1)
    (5 rows)
    
    chuhx@postgres=# 
    
    
  • 子链接的条件需要引用父查询中的表的列属性 t1.key1 = any

  • 子链接的条件引用的表需要符合要求available_rels,比如左连接,子链接的条件只能使用右表的字段才能提升

    chuhx@postgres=# explain select * from t1 a left join t3 c on a.key1=c.key1 and a.key2=any (select key1 from t2);
                                     QUERY PLAN                                  
    -----------------------------------------------------------------------------
     Merge Left Join  (cost=196.91..694.38 rows=12769 width=16)
       Merge Cond: (a.key1 = c.key1)
       Join Filter: (hashed SubPlan 1)
       ->  Index Scan using key1_i on t1 a  (cost=0.15..45.06 rows=2260 width=8)
       ->  Sort  (cost=158.51..164.16 rows=2260 width=8)
             Sort Key: c.key1
             ->  Seq Scan on t3 c  (cost=0.00..32.60 rows=2260 width=8)
       SubPlan 1
         ->  Seq Scan on t2  (cost=0.00..32.60 rows=2260 width=4)
    (9 rows)
    
    chuhx@postgres=# explain select * from t1 a left join t3 c on a.key1=c.key1 and c.key2=any (select key1 from t2);
                                      QUERY PLAN                                  
    ------------------------------------------------------------------------------
     Merge Left Join  (cost=225.45..659.07 rows=25538 width=16)
       Merge Cond: (a.key1 = c.key1)
       ->  Index Scan using key1_i on t1 a  (cost=0.15..45.06 rows=2260 width=8)
       ->  Sort  (cost=225.30..230.95 rows=2260 width=8)
             Sort Key: c.key1
             ->  Hash Join  (cost=60.85..99.39 rows=2260 width=8)
                   Hash Cond: (c.key2 = t2.key1)
                   ->  Seq Scan on t3 c  (cost=0.00..32.60 rows=2260 width=8)
                   ->  Hash  (cost=32.60..32.60 rows=2260 width=4)
                         ->  Seq Scan on t2  (cost=0.00..32.60 rows=2260 width=4)
    (10 rows)
    
  • 子链接的条件中不能使用volatile 函数, 如下使用了random

    chuhx@postgres=# explain select * from t1 a where key1+random()=any (select key1 from t2);
                              QUERY PLAN                          
    --------------------------------------------------------------
     Seq Scan on t1 a  (cost=38.25..99.10 rows=1130 width=8)
       Filter: (hashed SubPlan 1)
       SubPlan 1
         ->  Seq Scan on t2  (cost=0.00..32.60 rows=2260 width=4)
    (4 rows)
    
    chuhx@postgres=# explain select * from t1 a where key1=any (select key1 from t2);
                                QUERY PLAN                            
    ------------------------------------------------------------------
     Hash Join  (cost=60.85..99.39 rows=2260 width=8)
       Hash Cond: (a.key1 = t2.key1)
       ->  Seq Scan on t1 a  (cost=0.00..32.60 rows=2260 width=8)
       ->  Hash  (cost=32.60..32.60 rows=2260 width=4)
             ->  Seq Scan on t2  (cost=0.00..32.60 rows=2260 width=4)
    (5 rows)
    
    chuhx@postgres=# 
    

pg 支持对exists/not exists 类型的子链接提升,把子链接中的表提升到from中与表构成semijoin/antijoin。

  • 不能包含CTE

  • 不能含有set operation(UNION/INTERSECT/EXCEPT), agg, group sets(group by grouping sets((brand,segment),(brand),(segment),())), 窗口函数, set-returning function(返回多行), modify ctes(有insert/update/delete), having 条件, offset (limit), rowmark(for update/share)

  • 对于limit 不为常量不支持, 为常量但不大于0也不支持pull up, 其他情况会把limit 优化掉(直接去掉,不影响语义)

    chuhx@postgres=# explain select * from t1 a where exists (select * from t2 where a.key1=t2.key1 limit 1);
                                QUERY PLAN                            
    ------------------------------------------------------------------
     Hash Join  (cost=60.85..99.39 rows=2260 width=8)
       Hash Cond: (a.key1 = t2.key1)
       ->  Seq Scan on t1 a  (cost=0.00..32.60 rows=2260 width=8)
       ->  Hash  (cost=32.60..32.60 rows=2260 width=4)
             ->  Seq Scan on t2  (cost=0.00..32.60 rows=2260 width=4)
    (5 rows)
    
    chuhx@postgres=# explain select * from t1 a where exists (select * from t2 where a.key1=t2.key1 limit -1);
                                       QUERY PLAN                                   
    --------------------------------------------------------------------------------
     Seq Scan on t1 a  (cost=0.00..4942.45 rows=1130 width=8)
       Filter: (SubPlan 1)
       SubPlan 1
         ->  Limit  (cost=0.15..2.17 rows=1 width=8)
               ->  Index Scan using t2_pkey on t2  (cost=0.15..2.17 rows=1 width=8)
                     Index Cond: (key1 = a.key1)
    (6 rows)
    
  • 去除了whereclause(subselect->jointree->quals)的子链接中不能引用外层对象

    chuhx@postgres=# explain select * from t1 a where exists (select * from (select * from t2 where t2.key2=a.key1) x where a.key1=x.key1);
                                    QUERY PLAN                                
    --------------------------------------------------------------------------
     Seq Scan on t1 a  (cost=0.00..4948.10 rows=1130 width=8)
       Filter: (SubPlan 1)
       SubPlan 1
         ->  Index Scan using t2_pkey on t2  (cost=0.15..2.18 rows=1 width=0)
               Index Cond: (key1 = a.key1)
               Filter: (key2 = a.key1)
    (6 rows)
    
  • whereclause(subselect->jointree->quals) 需要引用外层对象,即需要为相关子链接

    chuhx@postgres=# explain select * from t1 a where exists (select * from t2 where key1>10);
                              QUERY PLAN                          
    --------------------------------------------------------------
     Result  (cost=0.05..32.65 rows=2260 width=8)
       One-Time Filter: $0
       InitPlan 1 (returns $0)
         ->  Seq Scan on t2  (cost=0.00..38.25 rows=753 width=0)
               Filter: (key1 > 10)
       ->  Seq Scan on t1 a  (cost=0.05..32.65 rows=2260 width=8)
    (6 rows)
    
  • whereclause(subselect->jointree->quals)中不能包含volatile 函数

  • 检测引用的上层的对象都在 available_rels 中, 不在则不支持提升

你可能感兴趣的:(lightdb,数据库,postgresql,sublink)