cuichao1900

PostgreSQL Source Code Walkthrough (70) - Query Statements #55 (make_one_rel function #20-...

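This section gives an overview of hash_inner_and_outer, which the dynamic-programming join search (standard_join_search) reaches through join_search_one_level->make_join_rel->populate_joinrel_with_paths->add_paths_to_joinrel. The function tries to build hash join access paths.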

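1. Data structures

Cost-related parameters
Note: the values actually used come from the server configuration file, not from the constants defined here!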

 typedef double Cost; /* execution cost (in page-access units) */

 /* defaults for costsize.c's Cost parameters */
 /* NB: cost-estimation code should use the variables, not these constants! */
 /* If you change these, update backend/utils/misc/postgresql.sample.conf */
 #define DEFAULT_SEQ_PAGE_COST  1.0       // cost of fetching one page sequentially
 #define DEFAULT_RANDOM_PAGE_COST  4.0      // cost of fetching one page at random
 #define DEFAULT_CPU_TUPLE_COST  0.01     // CPU cost of processing one tuple
 #define DEFAULT_CPU_INDEX_TUPLE_COST 0.005   // CPU cost of processing one index tuple
 #define DEFAULT_CPU_OPERATOR_COST  0.0025    // CPU cost of executing one operator or function
 #define DEFAULT_PARALLEL_TUPLE_COST 0.1    // parallel execution: cost of passing one tuple from one worker to another
 #define DEFAULT_PARALLEL_SETUP_COST  1000.0  // cost of setting up the parallel execution environment
 
 #define DEFAULT_EFFECTIVE_CACHE_SIZE  524288    /* covered in an earlier section; measured in pages */

 double      seq_page_cost = DEFAULT_SEQ_PAGE_COST;
 double      random_page_cost = DEFAULT_RANDOM_PAGE_COST;
 double      cpu_tuple_cost = DEFAULT_CPU_TUPLE_COST;
 double      cpu_index_tuple_cost = DEFAULT_CPU_INDEX_TUPLE_COST;
 double      cpu_operator_cost = DEFAULT_CPU_OPERATOR_COST;
 double      parallel_tuple_cost = DEFAULT_PARALLEL_TUPLE_COST;
 double      parallel_setup_cost = DEFAULT_PARALLEL_SETUP_COST;
 
 int         effective_cache_size = DEFAULT_EFFECTIVE_CACHE_SIZE;
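 Cost        disable_cost = 1.0e10; // 1 followed by 10 zeros: a huge cost that makes the planner avoid a disabled path whenever an alternative exists
 
 int         max_parallel_workers_per_gather = 2; // maximum number of workers per Gather node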
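
To make the role of these parameters concrete, here is a minimal sketch of how two of them combine for a plain sequential scan. It assumes the simplest case (no filter clauses, no parallelism); `toy_seqscan_cost` is a hypothetical helper written for this article, not a PostgreSQL function, and the page count used in the note below is inferred rather than taken from the catalog.

```c
#include <assert.h>

/* Default cost parameters, copied from the listing above. */
#define DEFAULT_SEQ_PAGE_COST   1.0
#define DEFAULT_CPU_TUPLE_COST  0.01

typedef double Cost;

/*
 * Simplified sequential-scan estimate: one seq_page_cost per page read plus
 * one cpu_tuple_cost per tuple processed.  Qual evaluation, parallelism and
 * per-tablespace page costs are deliberately left out (assumption).
 */
static Cost
toy_seqscan_cost(double pages, double tuples,
                 double seq_page_cost, double cpu_tuple_cost)
{
    assert(pages >= 0 && tuples >= 0);
    return seq_page_cost * pages + cpu_tuple_cost * tuples;
}
```

With an assumed 64 pages and 10000 tuples this yields 64 * 1.0 + 10000 * 0.01 = 164, which agrees with the `Seq Scan on public.t_dwxx` cost in the plan shown in the trace section. The 64-page figure is back-computed from that plan, not measured.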

2. Source code walkthrough

The hash join algorithm can be sketched in pseudocode as follows:
Step 1: build the hash table from the smaller input
FOR small_table_row IN (SELECT * FROM small_table)
LOOP
    slot := HASH(small_table_row.join_key);
    INSERT_HASH_TABLE(slot, small_table_row);
END LOOP;

Step 2: probe the hash table with each row of the larger input
FOR large_table_row IN (SELECT * FROM large_table)
LOOP
    slot := HASH(large_table_row.join_key);
    small_table_row := LOOKUP_HASH_TABLE(slot, large_table_row.join_key);
    IF small_table_row FOUND THEN
        output small_table_row + large_table_row;
    END IF;
END LOOP;
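
The two steps can also be written as a small C sketch. This is a toy in-memory version under stated assumptions: integer join keys, a fixed bucket count, chained buckets, and no batching to disk. All names (`Row`, `HashTable`, `toy_hash`, `toy_build`, `toy_probe`) are hypothetical and unrelated to the executor's real hash join code. Unlike the pseudocode, the probe walks the whole bucket chain so that duplicate keys on the build side produce one output row each.

```c
#include <assert.h>
#include <stddef.h>

#define NBUCKETS 8              /* toy bucket count, must be a power of two */

typedef struct Row
{
    int         key;            /* join key */
    int         payload;        /* any other column */
    struct Row *next;           /* bucket chain, used on the build side only */
} Row;

typedef struct HashTable
{
    Row        *buckets[NBUCKETS];
} HashTable;

/* Toy hash function: multiplicative hash reduced to a bucket number. */
static unsigned
toy_hash(int key)
{
    return ((unsigned) key * 2654435761u) & (NBUCKETS - 1);
}

/* Step 1: build phase, insert every row of the smaller input. */
static void
toy_build(HashTable *ht, Row *small, size_t nsmall)
{
    assert(ht != NULL);
    for (size_t i = 0; i < NBUCKETS; i++)
        ht->buckets[i] = NULL;
    for (size_t i = 0; i < nsmall; i++)
    {
        unsigned    slot = toy_hash(small[i].key);

        small[i].next = ht->buckets[slot];
        ht->buckets[slot] = &small[i];
    }
}

/*
 * Step 2: probe phase.  For each row of the larger input, walk the matching
 * bucket chain and count every build row with an equal key.  Returns the
 * number of joined rows; a real executor would emit them instead.
 */
static size_t
toy_probe(const HashTable *ht, const Row *large, size_t nlarge)
{
    size_t      matches = 0;

    assert(ht != NULL);
    for (size_t i = 0; i < nlarge; i++)
    {
        unsigned    slot = toy_hash(large[i].key);

        for (const Row *r = ht->buckets[slot]; r != NULL; r = r->next)
            if (r->key == large[i].key)     /* recheck: buckets can collide */
                matches++;
    }
    return matches;
}
```

The key comparison inside the probe loop matters: two different keys may land in the same bucket, so the hash value alone never proves a match.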

hash_inner_and_outer
This function creates hash join access paths.


//------------------------------------------------ hash_inner_and_outer

/*
 * hash_inner_and_outer
 *    Create hashjoin join paths by explicitly hashing both the outer and
 *    inner keys of each available hash clause.
 *
 * 'joinrel' is the join relation
 * 'outerrel' is the outer join relation
 * 'innerrel' is the inner join relation
 * 'jointype' is the type of join to do
 * 'extra' contains additional input values
 */
static void
hash_inner_and_outer(PlannerInfo *root,
                     RelOptInfo *joinrel,
                     RelOptInfo *outerrel,
                     RelOptInfo *innerrel,
                     JoinType jointype,
                     JoinPathExtraData *extra)
{
    JoinType    save_jointype = jointype;
    bool        isouterjoin = IS_OUTER_JOIN(jointype);
    List       *hashclauses;
    ListCell   *l;

    /*
     * We need to build only one hashclauses list for any given pair of outer
     * and inner relations; all of the hashable clauses will be used as keys.
     *
     * Scan the join's restrictinfo list to find hashjoinable clauses that are
     * usable with this pair of sub-relations.
     */
    hashclauses = NIL;
    foreach(l, extra->restrictlist)
    {
        RestrictInfo *restrictinfo = (RestrictInfo *) lfirst(l);

        /*
         * If processing an outer join, only use its own join clauses for
         * hashing.  For inner joins we need not be so picky.
         */
        if (isouterjoin && RINFO_IS_PUSHED_DOWN(restrictinfo, joinrel->relids))
            continue;

        if (!restrictinfo->can_join ||
            restrictinfo->hashjoinoperator == InvalidOid)
            continue;           /* not hashjoinable */

        /*
         * Check if clause has the form "outer op inner" or "inner op outer".
         */
        if (!clause_sides_match_join(restrictinfo, outerrel, innerrel))
            continue;           /* no good for these input relations */

        hashclauses = lappend(hashclauses, restrictinfo);   /* add to the hash clause list */
    }

    /* If we found any usable hashclauses, make paths */
    /* (without a usable hash clause no hash join path can be built) */
    if (hashclauses)
    {
        /*
         * We consider both the cheapest-total-cost and cheapest-startup-cost
         * outer paths.  There's no need to consider any but the
         * cheapest-total-cost inner path, however.
         */
        Path       *cheapest_startup_outer = outerrel->cheapest_startup_path;
        Path       *cheapest_total_outer = outerrel->cheapest_total_path;
        Path       *cheapest_total_inner = innerrel->cheapest_total_path;

        /*
         * If either cheapest-total path is parameterized by the other rel, we
         * can't use a hashjoin.  (There's no use looking for alternative
         * input paths, since these should already be the least-parameterized
         * available paths.)
         */
        if (PATH_PARAM_BY_REL(cheapest_total_outer, innerrel) ||
            PATH_PARAM_BY_REL(cheapest_total_inner, outerrel))
            return;             /* no hash join for this pair */

        /* Unique-ify if need be; we ignore parameterized possibilities */
        if (jointype == JOIN_UNIQUE_OUTER)
        {
            cheapest_total_outer = (Path *)
                create_unique_path(root, outerrel,
                                   cheapest_total_outer, extra->sjinfo);
            Assert(cheapest_total_outer);
            jointype = JOIN_INNER;
            try_hashjoin_path(root,
                              joinrel,
                              cheapest_total_outer,
                              cheapest_total_inner,
                              hashclauses,
                              jointype,
                              extra);
            /* no possibility of cheap startup here */
        }
        else if (jointype == JOIN_UNIQUE_INNER)
        {
            cheapest_total_inner = (Path *)
                create_unique_path(root, innerrel,
                                   cheapest_total_inner, extra->sjinfo);
            Assert(cheapest_total_inner);
            jointype = JOIN_INNER;
            try_hashjoin_path(root,
                              joinrel,
                              cheapest_total_outer,
                              cheapest_total_inner,
                              hashclauses,
                              jointype,
                              extra);
            if (cheapest_startup_outer != NULL &&
                cheapest_startup_outer != cheapest_total_outer)
                try_hashjoin_path(root,
                                  joinrel,
                                  cheapest_startup_outer,
                                  cheapest_total_inner,
                                  hashclauses,
                                  jointype,
                                  extra);
        }
        else                    /* all other join types */
        {
            /*
             * For other jointypes, we consider the cheapest startup outer
             * together with the cheapest total inner, and then consider
             * pairings of cheapest-total paths including parameterized ones.
             * There is no use in generating parameterized paths on the basis
             * of possibly cheap startup cost, so this is sufficient.
             */
            ListCell   *lc1;
            ListCell   *lc2;

            if (cheapest_startup_outer != NULL) /* cheapest-startup outer path */
                try_hashjoin_path(root,
                                  joinrel,
                                  cheapest_startup_outer,
                                  cheapest_total_inner,
                                  hashclauses,
                                  jointype,
                                  extra);   /* try a hash join path */

            foreach(lc1, outerrel->cheapest_parameterized_paths)    /* outer rel's cheapest parameterized paths */
            {
                Path       *outerpath = (Path *) lfirst(lc1);

                /*
                 * We cannot use an outer path that is parameterized by the
                 * inner rel.
                 */
                if (PATH_PARAM_BY_REL(outerpath, innerrel))
                    continue;

                foreach(lc2, innerrel->cheapest_parameterized_paths)    /* inner rel's cheapest parameterized paths */
                {
                    Path       *innerpath = (Path *) lfirst(lc2);

                    /*
                     * We cannot use an inner path that is parameterized by
                     * the outer rel, either.
                     */
                    if (PATH_PARAM_BY_REL(innerpath, outerrel))
                        continue;

                    if (outerpath == cheapest_startup_outer &&
                        innerpath == cheapest_total_inner)
                        continue;   /* already tried it */

                    try_hashjoin_path(root,
                                      joinrel,
                                      outerpath,
                                      innerpath,
                                      hashclauses,
                                      jointype,
                                      extra);   /* try a hash join path */
                }
            }
        }

        /*
         * If the joinrel is parallel-safe, we may be able to consider a
         * partial hash join.  However, we can't handle JOIN_UNIQUE_OUTER,
         * because the outer path will be partial, and therefore we won't be
         * able to properly guarantee uniqueness.  Similarly, we can't handle
         * JOIN_FULL and JOIN_RIGHT, because they can produce false null
         * extended rows.  Also, the resulting path must not be parameterized.
         * We would be able to support JOIN_FULL and JOIN_RIGHT for Parallel
         * Hash, since in that case we're back to a single hash table with a
         * single set of match bits for each batch, but that will require
         * figuring out a deadlock-free way to wait for the probe to finish.
         */
        if (joinrel->consider_parallel &&
            save_jointype != JOIN_UNIQUE_OUTER &&
            save_jointype != JOIN_FULL &&
            save_jointype != JOIN_RIGHT &&
            outerrel->partial_pathlist != NIL &&
            bms_is_empty(joinrel->lateral_relids))
        {
            Path       *cheapest_partial_outer;
            Path       *cheapest_partial_inner = NULL;
            Path       *cheapest_safe_inner = NULL;

            cheapest_partial_outer =
                (Path *) linitial(outerrel->partial_pathlist);

            /*
             * Can we use a partial inner plan too, so that we can build a
             * shared hash table in parallel?
             */
            if (innerrel->partial_pathlist != NIL && enable_parallel_hash)
            {
                cheapest_partial_inner =
                    (Path *) linitial(innerrel->partial_pathlist);
                try_partial_hashjoin_path(root, joinrel,
                                          cheapest_partial_outer,
                                          cheapest_partial_inner,
                                          hashclauses, jointype, extra,
                                          true /* parallel_hash */ );
            }

            /*
             * Normally, given that the joinrel is parallel-safe, the cheapest
             * total inner path will also be parallel-safe, but if not, we'll
             * have to search for the cheapest safe, unparameterized inner
             * path.  If doing JOIN_UNIQUE_INNER, we can't use any alternative
             * inner path.
             */
            if (cheapest_total_inner->parallel_safe)
                cheapest_safe_inner = cheapest_total_inner;
            else if (save_jointype != JOIN_UNIQUE_INNER)
                cheapest_safe_inner =
                    get_cheapest_parallel_safe_total_inner(innerrel->pathlist);

            if (cheapest_safe_inner != NULL)
                try_partial_hashjoin_path(root, joinrel,
                                          cheapest_partial_outer,
                                          cheapest_safe_inner,
                                          hashclauses, jointype, extra,
                                          false /* parallel_hash */ );
        }
    }
}
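
The clause-selection loop at the top of hash_inner_and_outer can be modelled in isolation. The sketch below is an assumption-laden toy: `ToyClause` is a hypothetical, flattened stand-in for the few RestrictInfo properties the loop reads, and each boolean replaces a macro or function call from the real code (named in the comments). It only shows the order and effect of the three filters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the RestrictInfo properties the loop reads. */
typedef struct ToyClause
{
    bool        pushed_down;        /* stands in for RINFO_IS_PUSHED_DOWN() */
    bool        can_join;           /* clause is usable as a join clause */
    unsigned    hashjoinoperator;   /* 0 plays the role of InvalidOid */
    bool        sides_match;        /* stands in for clause_sides_match_join() */
} ToyClause;

/* Same three tests, in the same order, as the foreach loop above. */
static bool
toy_is_hash_clause(const ToyClause *c, bool isouterjoin)
{
    assert(c != NULL);
    if (isouterjoin && c->pushed_down)
        return false;           /* outer join: only its own join clauses */
    if (!c->can_join || c->hashjoinoperator == 0)
        return false;           /* not hashjoinable */
    if (!c->sides_match)
        return false;           /* not "outer op inner" or "inner op outer" */
    return true;
}

/* Count how many clauses would end up in the hashclauses list. */
static size_t
toy_count_hash_clauses(const ToyClause *clauses, size_t n, bool isouterjoin)
{
    size_t      kept = 0;

    for (size_t i = 0; i < n; i++)
        if (toy_is_hash_clause(&clauses[i], isouterjoin))
            kept++;
    return kept;
}
```

A count of zero corresponds to the `if (hashclauses)` test failing, in which case the function returns without proposing any hash join path.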
 

//----------------------------- try_hashjoin_path

 /*
  * try_hashjoin_path
  *    Consider a hash join path; if it appears useful, push it into
  *    the joinrel's pathlist via add_path().
  */
 static void
 try_hashjoin_path(PlannerInfo *root,
                   RelOptInfo *joinrel,
                   Path *outer_path,
                   Path *inner_path,
                   List *hashclauses,
                   JoinType jointype,
                   JoinPathExtraData *extra)
 {
     Relids      required_outer;
     JoinCostWorkspace workspace;
 
     /*
      * Check to see if proposed path is still parameterized, and reject if the
      * parameterization wouldn't be sensible.
      */
     required_outer = calc_non_nestloop_required_outer(outer_path,
                                                       inner_path);
     if (required_outer &&
         !bms_overlap(required_outer, extra->param_source_rels))
     {
         /* Waste no memory when we reject a path here */
         bms_free(required_outer);
         return;
     }
 
     /*
      * See comments in try_nestloop_path().  Also note that hashjoin paths
      * never have any output pathkeys, per comments in create_hashjoin_path.
      */
     initial_cost_hashjoin(root, &workspace, jointype, hashclauses,
                           outer_path, inner_path, extra, false);   /* preliminary cost estimate */
 
     if (add_path_precheck(joinrel,
                           workspace.startup_cost, workspace.total_cost,
                           NIL, required_outer))    /* cheap pre-check */
     {
         add_path(joinrel, (Path *)
                  create_hashjoin_path(root,
                                       joinrel,
                                       jointype,
                                       &workspace,
                                       extra,
                                       outer_path,
                                       inner_path,
                                       false,    /* parallel_hash */
                                       extra->restrictlist,
                                       required_outer,
                                       hashclauses));   /* build the hash join path and add it */
     }
     else
     {
         /* Waste no memory when we reject a path here */
         bms_free(required_outer);
     }
 }
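
The first test in try_hashjoin_path is easier to follow with relation sets written as plain bit masks. The sketch below makes two assumptions that the listing does not show: that calc_non_nestloop_required_outer returns the union of what the two input paths still require, and that a 32-bit mask is enough for the example. `ToyRelids` and `toy_param_is_sensible` are hypothetical names; the real code uses Bitmapset and bms_overlap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t ToyRelids;     /* bit i set means relation i is a member */

/*
 * Toy version of the parameterization test: the join path is rejected when
 * it still needs outer relations (required_outer is non-empty) but none of
 * them is a useful parameter source for this join.
 */
static bool
toy_param_is_sensible(ToyRelids outer_relids, ToyRelids outer_required,
                      ToyRelids inner_relids, ToyRelids inner_required,
                      ToyRelids param_source_rels)
{
    ToyRelids   required_outer;

    /* hash_inner_and_outer already skipped inputs parameterized by each other */
    assert((outer_required & inner_relids) == 0);
    assert((inner_required & outer_relids) == 0);

    /* Assumed behaviour: the join needs whatever either input still needs. */
    required_outer = outer_required | inner_required;

    if (required_outer != 0 && (required_outer & param_source_rels) == 0)
        return false;           /* parameterized, but not usefully so */
    return true;                /* unparameterized, or sensibly parameterized */
}
```

In the traced query both input paths have `param_info = 0x0`, so the required set is empty and the check passes trivially, which is consistent with `required_outer=0x0` in the gdb output for create_hashjoin_path.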
 

//------------------ create_hashjoin_path

 /*
  * create_hashjoin_path
  *    Creates a pathnode corresponding to a hash join between two relations.
  *
  * 'joinrel' is the join relation
  * 'jointype' is the type of join required
  * 'workspace' is the result from initial_cost_hashjoin
  * 'extra' contains various information about the join
  * 'outer_path' is the cheapest outer path
  * 'inner_path' is the cheapest inner path
  * 'parallel_hash' to select Parallel Hash of inner path (shared hash table)
  * 'restrict_clauses' are the RestrictInfo nodes to apply at the join
  * 'required_outer' is the set of required outer rels
  * 'hashclauses' are the RestrictInfo nodes to use as hash clauses
  *      (this should be a subset of the restrict_clauses list)
  */
 HashPath *
 create_hashjoin_path(PlannerInfo *root,
                      RelOptInfo *joinrel,
                      JoinType jointype,
                      JoinCostWorkspace *workspace,
                      JoinPathExtraData *extra,
                      Path *outer_path,
                      Path *inner_path,
                      bool parallel_hash,
                      List *restrict_clauses,
                      Relids required_outer,
                      List *hashclauses)
 {
     HashPath   *pathnode = makeNode(HashPath);
 
     pathnode->jpath.path.pathtype = T_HashJoin;
     pathnode->jpath.path.parent = joinrel;
     pathnode->jpath.path.pathtarget = joinrel->reltarget;
     pathnode->jpath.path.param_info =
         get_joinrel_parampathinfo(root,
                                   joinrel,
                                   outer_path,
                                   inner_path,
                                   extra->sjinfo,
                                   required_outer,
                                   &restrict_clauses);
     pathnode->jpath.path.parallel_aware =
         joinrel->consider_parallel && parallel_hash;
     pathnode->jpath.path.parallel_safe = joinrel->consider_parallel &&
         outer_path->parallel_safe && inner_path->parallel_safe;
     /* This is a foolish way to estimate parallel_workers, but for now... */
     pathnode->jpath.path.parallel_workers = outer_path->parallel_workers;
 
     /*
      * A hashjoin never has pathkeys, since its output ordering is
      * unpredictable due to possible batching.  XXX If the inner relation is
      * small enough, we could instruct the executor that it must not batch,
      * and then we could assume that the output inherits the outer relation's
      * ordering, which might save a sort step.  However there is considerable
      * downside if our estimate of the inner relation size is badly off. For
      * the moment we don't risk it.  (Note also that if we wanted to take this
      * seriously, joinpath.c would have to consider many more paths for the
      * outer rel than it does now.)
      */
     pathnode->jpath.path.pathkeys = NIL;
     pathnode->jpath.jointype = jointype;
     pathnode->jpath.inner_unique = extra->inner_unique;
     pathnode->jpath.outerjoinpath = outer_path;
     pathnode->jpath.innerjoinpath = inner_path;
     pathnode->jpath.joinrestrictinfo = restrict_clauses;
     pathnode->path_hashclauses = hashclauses;
     /* final_cost_hashjoin will fill in pathnode->num_batches */
 
     final_cost_hashjoin(root, pathnode, workspace, extra);   /* final cost estimate */
 
     return pathnode;
 }
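
The three parallel-related assignments in create_hashjoin_path can be checked with a small stand-alone sketch. `ToyParallelPath` is a hypothetical subset of Path that keeps only the fields involved; the logic mirrors the assignments in the listing above and nothing else.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of Path: only the parallel-related fields. */
typedef struct ToyParallelPath
{
    bool        parallel_aware;
    bool        parallel_safe;
    int         parallel_workers;
} ToyParallelPath;

/* Derive the join path's parallel fields the way create_hashjoin_path does. */
static ToyParallelPath
toy_hashjoin_parallel_flags(bool consider_parallel, bool parallel_hash,
                            const ToyParallelPath *outer,
                            const ToyParallelPath *inner)
{
    ToyParallelPath join;

    assert(outer != NULL && inner != NULL);
    /* parallel-aware only for Parallel Hash on a parallel-capable joinrel */
    join.parallel_aware = consider_parallel && parallel_hash;
    /* parallel-safe only if the joinrel and both inputs are */
    join.parallel_safe = consider_parallel &&
        outer->parallel_safe && inner->parallel_safe;
    /* worker count is simply inherited from the outer path */
    join.parallel_workers = outer->parallel_workers;
    return join;
}
```

Assuming `consider_parallel` is true for the traced joinrel (the trace does not print it), two parallel-safe inputs with zero workers and `parallel_hash=false` give `parallel_aware = false, parallel_safe = true, parallel_workers = 0`, the same values gdb prints for the resulting HashPath.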

3. Trace analysis

The test script is as follows:

testdb=# explain verbose select dw.*,grjf.grbh,grjf.xm,grjf.ny,grjf.je 
testdb-# from t_dwxx dw,lateral (select gr.grbh,gr.xm,jf.ny,jf.je 
testdb(#                         from t_grxx gr inner join t_jfxx jf 
testdb(#                                        on gr.dwbh = dw.dwbh 
testdb(#                                           and gr.grbh = jf.grbh) grjf
testdb-# order by dw.dwbh;
                                           QUERY PLAN                                            
-------------------------------------------------------------------------------------------------
 Sort  (cost=20070.93..20320.93 rows=100000 width=47)
   Output: dw.dwmc, dw.dwbh, dw.dwdz, gr.grbh, gr.xm, jf.ny, jf.je
   Sort Key: dw.dwbh
   ->  Hash Join  (cost=3754.00..8689.61 rows=100000 width=47)
         Output: dw.dwmc, dw.dwbh, dw.dwdz, gr.grbh, gr.xm, jf.ny, jf.je
         Inner Unique: true
         Hash Cond: ((gr.dwbh)::text = (dw.dwbh)::text)
         ->  Hash Join  (cost=3465.00..8138.00 rows=100000 width=31)
               Output: gr.grbh, gr.xm, gr.dwbh, jf.ny, jf.je
               Hash Cond: ((jf.grbh)::text = (gr.grbh)::text)
               ->  Seq Scan on public.t_jfxx jf  (cost=0.00..1637.00 rows=100000 width=20)
                     Output: jf.ny, jf.je, jf.grbh
               ->  Hash  (cost=1726.00..1726.00 rows=100000 width=16)
                     Output: gr.grbh, gr.xm, gr.dwbh
                     ->  Seq Scan on public.t_grxx gr  (cost=0.00..1726.00 rows=100000 width=16)
                           Output: gr.grbh, gr.xm, gr.dwbh
         ->  Hash  (cost=164.00..164.00 rows=10000 width=20)
               Output: dw.dwmc, dw.dwbh, dw.dwdz
               ->  Seq Scan on public.t_dwxx dw  (cost=0.00..164.00 rows=10000 width=20)
                     Output: dw.dwmc, dw.dwbh, dw.dwdz
(20 rows)

Start gdb and set a breakpoint to trace the call:

(gdb) b hash_inner_and_outer
Breakpoint 1 at 0x7b066b: file joinpath.c, line 1684.
(gdb) c
Continuing.

Breakpoint 1, hash_inner_and_outer (root=0x2676078, joinrel=0x26d2bc0, outerrel=0x26814e0, innerrel=0x2682a10, 
    jointype=JOIN_INNER, extra=0x7ffd6ea6b9d0) at joinpath.c:1684
1684        JoinType    save_jointype = jointype;

The join type is JOIN_INNER:

(gdb) p jointype
$1 = JOIN_INNER

The join is between RTE 1 and RTE 3 (t_dwxx and t_grxx): the relids word is 10, binary 1010, so bits 1 and 3 are set.

(gdb) p *joinrel->relids->words
$3 = 10
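
Reading a relids word by eye gets error-prone for larger joins, so a tiny decoder helps. The sketch below assumes a single 64-bit word (the actual word width depends on the build) and uses a hypothetical helper name, `toy_word_members`; it is not part of PostgreSQL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Collect the positions of the set bits in one bitmap word, lowest first.
 * Returns how many members were written to out[] (at most max_out).
 */
static size_t
toy_word_members(uint64_t word, int *out, size_t max_out)
{
    size_t      n = 0;

    assert(out != NULL);
    for (int bit = 0; bit < 64 && n < max_out; bit++)
        if (word & ((uint64_t) 1 << bit))
            out[n++] = bit;
    return n;
}
```

Passing the value 10 printed by gdb yields the members 1 and 3, matching the two range table entries named above.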

Start iterating over the join clauses to collect the hash clauses:

1697        foreach(l, extra->restrictlist)
(gdb) 
1699            RestrictInfo *restrictinfo = (RestrictInfo *) lfirst(l);

One hash clause is found: t_dwxx.dwbh = t_grxx.dwbh

(gdb) 
1697        foreach(l, extra->restrictlist)
(gdb) 
1722        if (hashclauses)
(gdb) p *hashclauses
$4 = {type = T_List, length = 1, head = 0x26d4068, tail = 0x26d4068}

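Fetch the cheapest-startup outer path, the cheapest-total outer path, and the cheapest-total inner path.
These are, respectively, a sequential scan of the outer relation, the same sequential scan of the outer relation, and a sequential scan of the inner relation.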

(gdb) n
1729            Path       *cheapest_startup_outer = outerrel->cheapest_startup_path;
(gdb) 
1730            Path       *cheapest_total_outer = outerrel->cheapest_total_path;
(gdb) 
1731            Path       *cheapest_total_inner = innerrel->cheapest_total_path;
(gdb) p *cheapest_startup_outer
$5 = {type = T_Path, pathtype = T_SeqScan, parent = 0x26814e0, pathtarget = 0x2681718, param_info = 0x0, 
  parallel_aware = false, parallel_safe = true, parallel_workers = 0, rows = 10000, startup_cost = 0, total_cost = 164, 
  pathkeys = 0x0}
(gdb) p *cheapest_total_outer
$6 = {type = T_Path, pathtype = T_SeqScan, parent = 0x26814e0, pathtarget = 0x2681718, param_info = 0x0, 
  parallel_aware = false, parallel_safe = true, parallel_workers = 0, rows = 10000, startup_cost = 0, total_cost = 164, 
  pathkeys = 0x0}
(gdb) p *cheapest_total_inner
$7 = {type = T_Path, pathtype = T_SeqScan, parent = 0x2682a10, pathtarget = 0x2682c48, param_info = 0x0, 
  parallel_aware = false, parallel_safe = true, parallel_workers = 0, rows = 100000, startup_cost = 0, total_cost = 1726, 
  pathkeys = 0x0}

If the cheapest-startup outer path is not NULL, try a hash join:

(gdb) n
1740                PATH_PARAM_BY_REL(cheapest_total_inner, outerrel))
(gdb) 
1739            if (PATH_PARAM_BY_REL(cheapest_total_outer, innerrel) ||
(gdb) 
1744            if (jointype == JOIN_UNIQUE_OUTER)
(gdb) 
1760            else if (jointype == JOIN_UNIQUE_INNER)
(gdb) 
1796                if (cheapest_startup_outer != NULL)
(gdb) 
1797                    try_hashjoin_path(root,

Step into try_hashjoin_path:

(gdb) step
try_hashjoin_path (root=0x2676078, joinrel=0x26d2bc0, outer_path=0x26853b8, inner_path=0x26cf610, hashclauses=0x26d4090, 
    jointype=JOIN_INNER, extra=0x7ffd6ea6b9d0) at joinpath.c:737
737     required_outer = calc_non_nestloop_required_outer(outer_path,

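try_hashjoin_path->preliminary cost estimate (the tiny values such as 6.95e-310 appear to be uninitialized workspace fields that hash join costing does not use)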

...
751     initial_cost_hashjoin(root, &workspace, jointype, hashclauses,
(gdb) p workspace
$9 = {startup_cost = 3465, total_cost = 4261, run_cost = 796, inner_run_cost = 0, 
  inner_rescan_run_cost = 6.9528109284473596e-310, outer_rows = 3.7882102964330281e-317, 
  inner_rows = 2.0115578425988515e-316, outer_skip_rows = 2.0115578425988515e-316, 
  inner_skip_rows = 6.9528109284331305e-310, numbuckets = 131072, numbatches = 2, inner_rows_total = 100000}

try_hashjoin_path->step into create_hashjoin_path

(gdb) n
759                  create_hashjoin_path(root,
(gdb) step
create_hashjoin_path (root=0x2676078, joinrel=0x26d2bc0, jointype=JOIN_INNER, workspace=0x7ffd6ea6b850, 
    extra=0x7ffd6ea6b9d0, outer_path=0x26853b8, inner_path=0x26cf610, parallel_hash=false, restrict_clauses=0x26d3098, 
    required_outer=0x0, hashclauses=0x26d4090) at pathnode.c:2330
2330        HashPath   *pathnode = makeNode(HashPath);

try_hashjoin_path->create_hashjoin_path->compute the final cost and return

(gdb) 
2370        final_cost_hashjoin(root, pathnode, workspace, extra);
(gdb) 
2372        return pathnode;
(gdb) 
2373    }
(gdb) p *pathnode
$10 = {jpath = {path = {type = T_HashPath, pathtype = T_HashJoin, parent = 0x26d2bc0, pathtarget = 0x26d2df8, 
      param_info = 0x0, parallel_aware = false, parallel_safe = true, parallel_workers = 0, rows = 100000, 
      startup_cost = 3465, total_cost = 5386, pathkeys = 0x0}, jointype = JOIN_INNER, inner_unique = false, 
    outerjoinpath = 0x26853b8, innerjoinpath = 0x26cf610, joinrestrictinfo = 0x26d3098}, path_hashclauses = 0x26d4090, 
  num_batches = 2, inner_rows_total = 100000}

try_hashjoin_path->add the path

(gdb) n
try_hashjoin_path (root=0x2676078, joinrel=0x26d2bc0, outer_path=0x26853b8, inner_path=0x26cf610, hashclauses=0x26d4090, 
    jointype=JOIN_INNER, extra=0x7ffd6ea6b9d0) at joinpath.c:758
758         add_path(joinrel, (Path *)
(gdb) 
776 }
(gdb)

Back in hash_inner_and_outer, the loop continues:

(gdb) 
hash_inner_and_outer (root=0x2676078, joinrel=0x26d2bc0, outerrel=0x26814e0, innerrel=0x2682a10, jointype=JOIN_INNER, 
    extra=0x7ffd6ea6b9d0) at joinpath.c:1805
1805                foreach(lc1, outerrel->cheapest_parameterized_paths)

The function call finishes:

1904    }
(gdb) 
add_paths_to_joinrel (root=0x2676078, joinrel=0x26d2bc0, outerrel=0x26814e0, innerrel=0x2682a10, jointype=JOIN_INNER, 
    sjinfo=0x7ffd6ea6bac0, restrictlist=0x26d3098) at joinpath.c:315
315     if (joinrel->fdwroutine &&
(gdb) p *joinrel->pathlist
$11 = {type = T_List, length = 2, head = 0x26d4160, tail = 0x26d3e30}

Inspect the joinrel's path list:

(gdb) p *(Node *)joinrel->pathlist->head->data.ptr_value
$12 = {type = T_HashPath}
(gdb) p *(Node *)joinrel->pathlist->head->next->data.ptr_value
$13 = {type = T_MergePath}
(gdb) p *(HashPath *)joinrel->pathlist->head->data.ptr_value
$14 = {jpath = {path = {type = T_HashPath, pathtype = T_HashJoin, parent = 0x26d2bc0, pathtarget = 0x26d2df8, 
      param_info = 0x0, parallel_aware = false, parallel_safe = true, parallel_workers = 0, rows = 100000, 
      startup_cost = 3465, total_cost = 5386, pathkeys = 0x0}, jointype = JOIN_INNER, inner_unique = false, 
    outerjoinpath = 0x26853b8, innerjoinpath = 0x26cf610, joinrestrictinfo = 0x26d3098}, path_hashclauses = 0x26d4090, 
  num_batches = 2, inner_rows_total = 100000}
(gdb) p *(MergePath *)joinrel->pathlist->head->next->data.ptr_value
$15 = {jpath = {path = {type = T_MergePath, pathtype = T_MergeJoin, parent = 0x26d2bc0, pathtarget = 0x26d2df8, 
      param_info = 0x0, parallel_aware = false, parallel_safe = true, parallel_workers = 0, rows = 100000, 
      startup_cost = 10035.66023721841, total_cost = 11955.396048959938, pathkeys = 0x2685928}, jointype = JOIN_INNER, 
    inner_unique = false, outerjoinpath = 0x26ce070, innerjoinpath = 0x26cf610, joinrestrictinfo = 0x26d3098}, 
  path_mergeclauses = 0x26d3eb8, outersortkeys = 0x0, innersortkeys = 0x26d3f18, skip_mark_restore = false, 
  materialize_inner = false}
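
The path list keeps both the hash path and the much more expensive merge path. One plausible reading is that the merge path survives because it carries pathkeys (`pathkeys = 0x2685928`) while the hash path has none. The sketch below illustrates that idea with a deliberately simplified dominance test; it is an assumption-based toy, not add_path itself, which also weighs parameterization, row counts and parallel safety and compares costs with a fuzz factor. `ToyCostedPath` and `toy_dominates` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reduction of a Path to the fields compared here. */
typedef struct ToyCostedPath
{
    double      startup_cost;
    double      total_cost;
    bool        has_pathkeys;   /* true if the output has a useful sort order */
} ToyCostedPath;

/*
 * Simplified dominance test: a dominates b only if it is no more expensive
 * on both costs and its ordering is at least as useful as b's.
 */
static bool
toy_dominates(const ToyCostedPath *a, const ToyCostedPath *b)
{
    assert(a != NULL && b != NULL);
    if (a->startup_cost > b->startup_cost || a->total_cost > b->total_cost)
        return false;
    if (b->has_pathkeys && !a->has_pathkeys)
        return false;           /* b offers an ordering that a lacks */
    return true;
}
```

Feeding in the traced values (hash path 3465 / 5386 without pathkeys, merge path about 10035.66 / 11955.40 with pathkeys), neither path dominates the other under this rule, so both would stay in the list.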

DONE!
The functions initial_cost_hashjoin and final_cost_hashjoin are covered in the next section.

4. References

allpaths.c
cost.h
costsize.c
PG Document:Query Planning

From "ITPUB Blog", link: http://blog.itpub.net/6906/viewspace-2374828/. If you repost this article, please credit the source; otherwise legal liability may be pursued.

数据库第八次作业--备份和索引倪旻萱数据库
一、备份与恢复作业：创库,建表：CREATEDATABASEbooksDB;usebooksDB;CREATETABLEbooks(bk_idINTNOTNULLPRIMARYKEY,bk_titleVARCHAR(50)NOTNULL,copyrightYEARNOTNULL);CREATETABLEauthors(auth_idINTNOTNULLPRIMARYKEY,auth_nameVAR
PostgreSQL常用命令与工具指南 Mr.小海 Linux 服务器 postgresql 数据库算法架构网络协议 linux 运维开发
文章目录PostgreSQL常用命令与工具指南简介1.连接与基本操作连接数据库环境变量设置（避免密码输入）常用元命令2.数据库与表管理数据库操作创建数据库删除数据库修改数据库属性表操作创建表修改表结构删除表索引管理创建索引删除索引3.数据操作(CRUD)插入数据查询数据更新数据删除数据事务控制4.账号与权限管理角色/用户操作创建角色修改角色删除角色权限控制授予权限撤销权限查看权限5.常用函数字符串
人脸识别：AI 如何精准 “认人”？田园Coder 人工智能科普人工智能科普
1.人脸识别的基本原理：从“看到脸”到“认出人”1.1什么是人脸识别技术人脸识别是基于人的面部特征信息进行身份认证的生物识别技术。它通过摄像头采集人脸图像，利用AI算法提取面部特征（如眼距、鼻梁高度、下颌轮廓等），再与数据库中的模板比对，最终判断“是否为同一个人”。与指纹识别、虹膜识别等生物识别技术相比，人脸识别的优势在于“非接触性”（无需触碰设备）和“自然性”（符合人类习惯，如刷脸支付无需额外操
redis-缓存三剑客（缓存击穿，缓存穿透，缓存雪崩） hzx790688184 redis redis
redis-缓存击穿，缓存穿透，缓存雪崩缓存三剑客（缓存击穿，缓存穿透，缓存雪崩）缓存击穿请求一个不存在的数据时，请求到数据库，数据库不存在该数据，会导致每次请求都会到数据库缓存穿透当热点key过期时，突然大量请求访问，直接访问到数据库缓存雪崩大批量的key同时失效，或redis宕机，导致大量的请求直接访问数据库缓存三剑客（缓存击穿，缓存穿透，缓存雪崩）缓存击穿请求一个不存在的数据时，请求到数据库
【数据结构与算法-Day 4】从O(1)到O(n²)，全面掌握空间复杂度分析吴师兄大模型数据结构与算法数据结构与算法 python 时间复杂度大模型人工智能数据结构深度学习
Langchain系列文章目录01-玩转LangChain：从模型调用到Prompt模板与输出解析的完整指南02-玩转LangChainMemory模块：四种记忆类型详解及应用场景全覆盖03-全面掌握LangChain：从核心链条构建到动态任务分配的实战指南04-玩转LangChain：从文档加载到高效问答系统构建的全程实战05-玩转LangChain：深度评估问答系统的三种高效方法（示例生成、手
缓存三剑客解决方案爱学习的小熊猫_ 缓存 redis
缓存三剑客解决方案1.缓存雪崩定义：大量缓存数据在同一时间点集体失效，导致所有请求直接穿透到数据库，引发数据库瞬时高负载甚至崩溃。解决方案：设置过期随机值，避免大量缓存同时失效。//缓存雪崩防护：随机过期时间+双层缓存//设置随机过期时间（基础时间+随机偏移）Randomrandom=newRandom();longexpire=baseExpire+random.nextInt(5*60*100
PHPStorm携手ThinkPHP8：开启高效开发之旅奔跑吧邓邓子项目攻略 phpstorm ThinkPHP ThinkPHP8 php开发
目录一、前期准备1.1开发环境搭建1.2配置Xdebug二、PHPStorm集成ThinkPHP82.1导入ThinkPHP8项目2.2配置PHP解释器2.3配置服务器三、ThinkPHP8项目开发基础3.1项目结构剖析3.2控制器与方法创建3.3视图渲染与数据传递四、数据库操作与模型定义4.1数据库配置4.2模型定义与使用4.3数据库迁移与种子五、高级开发技巧与优化5.1路由优化与管理5.2中间
秒杀系统设计思路先生zeng
昨天遇到这个问题，发现自己临时总结的不是很好，所以现在想重新整理一下思路。分析一下问题:类似淘宝那种做秒杀系统活动，你是如何设计的？场景分析:1.需到达某个时刻才可以开始秒杀(某个时刻之前需要控制拒绝请求)。2.一瞬间大量的请求到后台，服务器，数据库，缓存都会扛不住。(前端拦截、削峰，限流)3.满足条件才可以进行秒杀(最先过滤这些不满足条件的)4.防止恶意刷单请求，网站攻击(SQL注入，CSRF)
七、Zabbix — Proxy分布式监控胖胖不胖、《Zabbix速学即学即用》zabbix 分布式服务器运维监控
目录配置Zabbix-proxy代理1.安装代理2.安装并配置数据库（proxy不能与zabbix-server共享数据库）3.发送zabbix-server源码包中初始化脚本到proxy主机并导入数据库4.修改代理配置文件5.web页面添加并配置代理Zabbix-agent客户端配置1.修改配置文件2.web页面修改，把这些主机修改为通过代理获取数据减少zabbix-server压力便于多地设备
分享100个最新免费的高匿HTTP代理IP mcj8089 代理IP 代理服务器匿名代理免费代理IP 最新代理IP
推荐两个代理IP网站： 1. 全网代理IP：http://proxy.goubanjia.com/ 2. 敲代码免费IP：http://ip.qiaodm.com/ 120.198.243.130:80,中国/广东省 58.251.78.71:8088,中国/广东省 183.207.228.22:83,中国/