yangdelong

EPI rendering implemented with Cg

http://www.cs.cornell.edu/~kb/projects/epigpu/

Abstract: The render cache and the edge-and-point image (EPI) are alternative point-based rendering techniques that combine interactive performance with expensive, high quality shading for complex scenes. They use sparse sampling and intelligent reconstruction to enable fast framerates and to decouple shading from the display update.

We present a hybrid CPU/GPU multi-pass system that accelerates these techniques by utilizing programmable graphics processing units (GPUs) to achieve better framerates while freeing the CPU for other uses such as high-quality shading (including global illumination). Because the render cache and EPI differ from the traditional graphics pipeline in interesting ways, we encountered several challenges in using the GPU effectively. We discuss our optimizations to achieve good performance, limitations with the current generation hardware, as well as possibilities for future improvements.

The original diagram shows the data flow on the GPU: squares represent textures and rectangles represent VBOs. The colored boxes are the shaders, grouped by the stage in which they are used: point processing (green), edge finding and rasterization (blue), and image filters (red). The Cg source code of each shader follows.

Point processing

Age & penalize

/*
==========================================================================================
 Cg Acceleration Research

 Edgar Velázquez Armendáriz - edgar [at] graphics [dot] cornell [dot] edu
------------------------------------------------------------------------------------------
 ageMain.cg

 Fragment program to age the points and just copy the current imageID.
==========================================================================================
*/

/**
 * fp40: # 17 instructions, 1 R-regs, 1 H-regs
 */
void ageMain(in half2 pos                       : WPOS,
                uniform samplerRECT current     : TEXUNIT0,
                out half4 OUT                   : COLOR) {
                
    // STATIC DATA
    const static half INCREMENT = 1;        // [0,255]
                
    // Gets the image id and age
    OUT = texRECT(current, pos);


    // New points are not penalised
    if (OUT.r > 0) {

        // Penalization for points that were not projected to the image.
        // The three channels are compared at once. If every element of the comparison is 0,
        // all channels are filled with ones, which is an invalid imageID. Using De Morgan:
        //
        // if (A) -> Penalize
        // A = (x eq 0) && (y eq 0) && (z eq 0)
        // !A = !(x eq 0) || !(y eq 0) || !(z eq 0)
        // !A = x neq 0 || y neq 0 || z neq 0 <-- This is the Cg function any()
        // !(!A) = A                          <-- So A is !any()
        if ( !any(OUT.gba < half3(1, 1, 127/255.0)) ) {
            OUT.r += 8/255.0;
            //OUT.r = 0;
        }

        // Color change penalty, using the flag
        if (OUT.a >= 128/255.0) {
            //OUT.r += 28/255.0;
            OUT.r = 1;
        }

    }

    // Finally age the point. Because this is an RGBA8 buffer, values are saturated for free
    OUT.r += INCREMENT/255.0;
                
}
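
As a CPU-side cross-check of the aging rules in ageMain, the following Python sketch models the shader on 8-bit channel values. It is an illustration rather than part of the original shader set: the name age_point is hypothetical, and the sketch assumes the RGBA8 channels map exactly to the integers 0..255, ignoring half-precision rounding.

```python
def age_point(r, g, b, a, increment=1):
    """Model of ageMain on 8-bit values (r = age, gba = imageID plus flag bit).

    Assumption: channel value v/255 in the shader corresponds to integer v here.
    """
    if r > 0:  # new points (age 0) are not penalised
        # !any(gba < (1, 1, 127/255)) is true only when g and b are 255 and a >= 127
        if g == 255 and b == 255 and a >= 127:
            r += 8
        # Color-change flag stored in the top bit of alpha
        if a >= 128:
            r = 255
    # The RGBA8 render target saturates the final value
    return min(255, r + increment), g, b, a


if __name__ == "__main__":
    print(age_point(10, 1, 2, 3))        # valid imageID: plain aging
    print(age_point(10, 255, 255, 127))  # invalid imageID: extra penalty
```

The saturation is applied once at the end because the shader works on unclamped half values and only the write to the RGBA8 buffer clamps them.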

Point projection

/* Vertex information to be transferred */
struct vertexInfo {
    float4 pos                  : POSITION;
    half2 uv                    : TEXCOORD0;
    half4 color                 : COLOR0;
    half4 colorSec              : COLOR1;
    half2 subPixel;
};


// vp40: # 18 instructions, 2 R-regs
void vertMain( uniform float4x4 ModelViewProj : state.matrix.mvp,
                uniform half2 c,                // Vector <width/2, height/2> for subpixel transformation
                in vertexInfo IN,
                out vertexInfo OUT ) {

    // Transformed position of the vertex into clip coordinates
    OUT.pos = mul(ModelViewProj, IN.pos);

    // Force points onto the clipping plane, so that the background points are not
    // deleted by the depth cull
    OUT.pos.z = clamp(OUT.pos.z, -1e38, OUT.pos.w * (0.999999523162841796875));

    // Just copy the input color, and the secondary color which contains the encoded pointID
    OUT.color    = IN.color;
    OUT.colorSec = IN.colorSec;

    // The rectangle texture coordinates encode the index of the vertex into the
    // original array of data, in such a way that all its original data is read from
    // the packed texture
    OUT.uv = IN.uv;

    // Get the pixel mapping, integer and fractional part (characteristic and mantissa)
    const half2 gamma = c * OUT.pos.xy / OUT.pos.w + c;

    // The fractional part is stored in a varying parameter:
    // Each mantissa will give me the information about subpixel location
    OUT.subPixel    = frac(gamma);
    OUT.subPixel.y = 1 - OUT.subPixel.y;   // Computed with the origin at the bottom-left corner; it must
                                            // be at the upper-left corner.

}

// fp40: # 15 instructions, 1 R-regs, 1 H-regs
void fragMain( uniform samplerRECT packData    : TEXUNIT0,
                in vertexInfo IN,
                in float3 pos                  : WPOS,
                out float depth                 : DEPTH,
                out half4 outputs[3]            : COLOR0    ) {

    // UPDATE TO REFLECT CHANGES IN ENVIRONMENT
    const half MAX_AGE = 255;      // [0,255]
    const half FLAGS    = 64;       // [0,255]      // FLAGS == 0x40

    // After the texture operation, do some math to cover the latency
    half4 idAgePacked = texRECT(packData, IN.uv);

    // Subpixel info
    half2 subPixV   = floor(half2(4,4) * IN.subPixel);
    half subPix     = 1/255.0h * (4*subPixV.y + subPixV.x);     // subPix (not scaled) is in the range [0, 15]

    // The secondary color contains the pointID, with the LSB in R and the MSB in B, so I just
    // need to copy that information
    half4 idVertex = half4(FLAGS/255.0, IN.colorSec.rgb);

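    // If this is an invalid point because of its age, it should be discarded.
    // NOTE: with the discard commented out, this write is overwritten by the
    // assignment to outputs[0] below.
    if (idAgePacked.r >= MAX_AGE/255.0h) { outputs[0] = half4(1,0,0, subPix); }//discard; }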

    // Normal color, it also copies the subpixel info into the final RGBA texture,
    // ready to be attached to the EPI-GPU.
    outputs[0] = half4(IN.color.rgb, subPix);

    // The final output contains the Flags in R and the pointID split across GBA
    outputs[1] = idVertex;

    // In the third render buffer, I will store:
    //   r - Priority
    //   g - seqnum
    //   b - flags/seqnum info
    //   a - 0 as flag -> invalid pixels have a = 1
    half priority = saturate(idAgePacked.r/2.0 - 8/255.0);      // priority = max(0,age-16) / 2
    outputs[2] = half4(priority, IN.color.a, 1/255.0h * FLAGS + IN.color.a, 0);

}
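
The subpixel and priority arithmetic of vertMain and fragMain can be sketched on the CPU as follows. This is a hedged illustration: the function names are hypothetical, the inputs are assumed to be normalized device coordinates in [-1, 1], and half-precision effects are ignored.

```python
import math


def subpixel_index(ndc_x, ndc_y, width, height):
    """Return the 4x4 subpixel cell index, mirroring vertMain + fragMain."""
    cx, cy = width / 2.0, height / 2.0           # the uniform c = <width/2, height/2>
    gx = cx * ndc_x + cx                         # gamma = c * ndc + c
    gy = cy * ndc_y + cy
    fx = gx - math.floor(gx)                     # frac(gamma.x)
    fy = 1.0 - (gy - math.floor(gy))             # flipped so the origin is at the top
    sx = math.floor(4 * fx)                      # subPixV = floor(4 * subPixel)
    sy = math.floor(4 * fy)
    return 4 * sy + sx                           # value stored (divided by 255) in alpha


def priority_from_age(age8):
    """8-bit view of saturate(age/2 - 8/255): max(0, age - 16) / 2."""
    return max(0.0, min(255.0, age8 / 2.0 - 8.0))


if __name__ == "__main__":
    print(subpixel_index(0.125, 0.25, 4, 4))
    print(priority_from_age(100))
```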

Predicted projection

/**
* Vertex program to just transform a vertex. This shader is meant
* to work in an environment where depth test is disabled
*
* vp40: # 5 instructions, 1 R-regs
*/
void vertSimple(    uniform float4x4 ModelViewProj : state.matrix.mvp,
                    in float4 IN                   : POSITION,
                    out float4 OUT                  : POSITION )
{

    // Transformed position of the vertex into clip coordinates, after transforming!
    OUT = mul(ModelViewProj, IN);
    OUT.y *= -1.0; // Flips the image on y
    OUT.z = 0;      // Avoid one DP4 instruction, because I just do not care about Z!
}

/**
* Just draw white pixels!
*
* fp40: # 1 instructions, 0 R-regs, 0 H-regs
*/
half3 fragSimple() : COLOR {

    return half3(1,1,1);
}

Set Image ID

/**
* Simple vertex output/input structure
*/
struct vertSimpleData {
    float4 pos      : POSITION;
    half4 col      : COLOR;
};

/**
* Fragment program to copy the imageID as color. IMPORTANT: Although the 24 bit image Id
* was passed as RGB color, it has to be written into the GBA channels, because the R
* channel contains the age, so swizzle will be used
*
* fp40: # 1 instructions, 0 R-regs, 0 H-regs
*/
void copyColorFrag( in half4 IN        : COLOR,
                    out half4 OUT       : COLOR )
{

    OUT.gba = IN.rgb;
}

// This shader receives the PointID encoded in the x,y position, so it has to be decoded
// and then transformed into homogeneous clip space
//
// vp40:    # 29 instructions, 2 R-regs (MIMD branching)
//          # 23 instructions, 2 R-regs (regular code)
void colorCopyVert( uniform float LEN,          // The length of the point cloud texture
                    in vertSimpleData IN,
                    out vertSimpleData OUT )
{

    // The original layout of the pointid_flags that was read as vertexes is
    //
    // R - flags
    // G - LSB of pointid
    // B
    // A - MSB

    // A pixel with no point ID and flags = 0x10 means that no point was mapped there. That
    // Translates into an incoming vertex (16,0)
    if ( any(IN.pos.xy != float2(16,0)) ) {

        // First I reconstruct the index
        float2 tmp = float2(1/256.0, 256.0) * IN.pos.xy;

        // This is interesting: the data written to the pointid_flags texture was meant to be
        // unsigned bytes holding the flags and the pointid. However, the vertices are
        // interpreted as SIGNED shorts, so any number above 0x7FFF is interpreted as a negative
        // number. The y part is not a problem, because its range will never get that high
        // until there are around 8 million points. The LSB runs into this often, so to convert
        // that byte to the format I need, I just add 0xFF to the integer part of the result,
        // only for the negative numbers.
        // I am using floor and 256. This is valid because every number has a flags field, so
        // the division of IN.pos.x by 256 will always have a fractional part, moving all the
        // results one unit down. This way the instruction count is reduced from 23 to 20.
        float index = floor(tmp.x) + tmp.y + (tmp.x < 0 ? 256 : 0);

        // DEBUG!
        //index = IN.col.r * 255.0f + IN.col.g*255.0*256.0 + IN.col.b*255.0*256.0*256.0;

        // I need the fractional and integer part. I can get that info
        // in one Cg instruction. The fractional part is stored in x, and
        // the integral part will be in y
        float2 intFrac;
        intFrac.x = modf(index/LEN, intFrac.y);

        // The regions without points will have an index equal to zero. However, in the real
        // implementation the points used are 32 and above, so writing trash data to point 0 will not
        // be a problem, and it is one less test for this shader

        // Calculates the homogeneous xy coordinates
        intFrac = (1/LEN - 1.0).xx + float2(2.0, 2/LEN) * intFrac;

        // Just copy the position results. z is always 0 and w 1. And by now
        // intFrac contains number in the range [-1, 1] for valid values
        OUT.pos = float4(intFrac, 0, 1);

        // Copies the output color
        OUT.col = IN.col;

    }
    else {

        OUT.pos = float4(-2,-2,-2,1);
    }

}
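
The index reconstruction in colorCopyVert can be checked on the CPU with the sketch below. The helper names are hypothetical, and the packing step assumes that the four RGBA bytes (flags, LSB, middle byte, MSB) are reinterpreted as two little-endian signed shorts; the shader comments describe the signed interpretation but not the byte order.

```python
import math
import struct


def pack_as_shorts(flags, point_id):
    """Reinterpret the RGBA8 bytes (flags, LSB, mid, MSB) as two signed shorts.

    Assumption: little-endian byte order, so x = (R, G) and y = (B, A).
    """
    g = point_id & 0xFF
    b = (point_id >> 8) & 0xFF
    a = (point_id >> 16) & 0xFF
    return struct.unpack("<2h", struct.pack("<4B", flags, g, b, a))


def decode_index(x, y):
    """Mirror of: floor(tmp.x) + tmp.y + (tmp.x < 0 ? 256 : 0)."""
    tx = x / 256.0
    ty = 256.0 * y
    return math.floor(tx) + ty + (256 if tx < 0 else 0)


def index_to_clip(index, length):
    """Map a linear index to the clip-space center of its texel in a length x length grid."""
    frac_part, int_part = math.modf(index / length)   # fractional part, integral part
    x = (1.0 / length - 1.0) + 2.0 * frac_part
    y = (1.0 / length - 1.0) + (2.0 / length) * int_part
    return x, y


if __name__ == "__main__":
    x, y = pack_as_shorts(0x40, 0x0123F0)   # LSB 0xF0 makes x negative
    print(x, y, decode_index(x, y))
```

An LSB of 0x80 or above is what pushes the first short past 0x7FFF, which is the case the shader corrects by adding 256.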

Depth Cull

// The output of the depth vertex shader
struct depthVertexInfo {
    float4 pos                  : POSITION;
    half4 texCoords[5];        // To hold all the interpolated texture coordinates
};

/**
* Vertex shader for the depth filter that performs the multiple texture
* coordinates interpolation in advance.
*
* vp40: # 15 instructions, 2 R-regs
*/
void depthVertMain( uniform float4x4 ModelViewProj : state.matrix.mvp,
                in half2 uv                        : TEXCOORD0,
                in float4 pos                      : POSITION,
                out depthVertexInfo OUT) {

    // Transformed position of the vertex into clip coordinates
    OUT.pos = mul(ModelViewProj, pos);

    static const half offset    = 1.0h;     // Using TEXTURE_RECTANGLE, coords are not normalized
    static const half3 offsetV = half3(offset, 0, -offset);

    // Interpolate!
    OUT.texCoords[0].xy = uv + offsetV.zx;
    OUT.texCoords[0].zw = uv + offsetV.yx;
    OUT.texCoords[1].xy = uv + offsetV.xx;
    OUT.texCoords[1].zw = uv + offsetV.zy;
    OUT.texCoords[2].xy = uv;
    OUT.texCoords[2].zw = uv + offsetV.xy;
    OUT.texCoords[3].xy = uv + offsetV.zz;
    OUT.texCoords[3].zw = uv + offsetV.yz;
    OUT.texCoords[4].xy = uv + offsetV.xz;

}

/**
* Fragment shader to perform the depth culling. It uses a custom vertex shader to precalculate
* all the texture coordinates, instead of making uv + half2(offset, -offset), just fetch them.
*
* fp40: # 38 instructions, 2 R-regs, 2 H-regs
*/
void depthMain( uniform samplerRECT depthTex,
                uniform float4 zTransform,      // The four factors for scaling the projected z-values
                in depthVertexInfo IN,
                out float depth                 : DEPTH,
                out half4 outputs[3]            : COLOR0 ) {

    // Now compute the 3x3 depth filter, first getting the average;
    float z;
    float4 alfa;
    float4 beta;


    // Instead of making a whole if/else block, I use this small instruction which
    // compiles into a shader without real branches. It is faster and reduces
    // instruction count by 4 compared with the whole branch version.
    //
    // --> WITHOUT THIS, the depth is also blurred on the very same pass!
    //if (z == 1) { discard; }


    // Lookup the values
    alfa.x = texRECT( depthTex, IN.texCoords[0].xy ).r;     // uv + (-s, s)
    alfa.y = texRECT( depthTex, IN.texCoords[0].zw ).r;     // uv + ( 0, s)
    alfa.z = texRECT( depthTex, IN.texCoords[1].xy ).r;     // uv + ( s, s)

    alfa.w = texRECT( depthTex, IN.texCoords[1].zw ).r;     // uv + (-s, 0)
    z      = texRECT( depthTex, IN.texCoords[2].xy ).r;     // uv
    beta.x = texRECT( depthTex, IN.texCoords[2].zw ).r;     // uv + ( s, 0)

    beta.y = texRECT( depthTex, IN.texCoords[3].xy ).r;     // uv + (-s, -s)
    beta.z = texRECT( depthTex, IN.texCoords[3].zw ).r;     // uv + ( 0, -s)
    beta.w = texRECT( depthTex, IN.texCoords[4].xy ).r;     // uv + ( s, -s)


    // Invalid values have a depth of 1, so perform a trick to get rid of them
    half4 alfaF = alfa < 1.0h.xxxx ? 1.0h.xxxx : 0.0h.xxxx;
    half4 betaF = beta < 1.0h.xxxx ? 1.0h.xxxx : 0.0h.xxxx;
    //half4 alfaF = !step(1.0h.xxxx, alfa);
    //half4 betaF = !step(1.0h.xxxx, beta);

    // Use those factors to get the proper values
    //alfa = (alfaF != 0.0h.xxxx) ? alfa : 0.0f.xxxx;
    //beta = (betaF != 0.0h.xxxx) ? beta : 0.0f.xxxx;
    alfa *= alfaF;
    beta *= betaF;

    // To make a fast add of all values, construct a matrix with 4 rows, 4 columns
    // The first two rows will have the 8 surrounding depth values, and the other
    // two have the element count. This way, all the sums are performed in the
    // same operation, and it is faster than make the explicit sums for all 18 values.
    float4x4 values = float4x4(alfa, beta, alfaF, betaF);
    float4   sumPart = mul(values, 1.0f.xxxx);

    // The sum of values will be in x, the number of elements in y
    float2 sumCount = float2(z + sumPart.x, 1 + sumPart.z) + sumPart.yw;

    // To perform the depth cull, the boundaries of the test must be
    // calculated, because the z values read from the depth buffer
    // do not map linearly to the model's depth.
    float average       = sumCount.x / sumCount.y;

    // Offset
    float2 vecTmp = (average.xx * zTransform.xy) + zTransform.zw;
    float boundary = vecTmp.x / vecTmp.y;


    // Fragments whose depth lies beyond the boundary are culled
    if (z > boundary) {

        // Clear each buffer to its corresponding clear color. It is
        // faster to clear all them to the same color, with a single
        // instruction, but this is the logic of the application

        outputs[0] = half4(0,0,0,0);
        outputs[1] = half4(16/255.0,0,0,0);     // Clear with invalid point flag
        outputs[2] = half4(0,0,0,0);

        //z = 1;
    }
    else {
        //z = boundary;
        outputs[0] = 1.0h.xxxx;
        outputs[1] = 1.0h.xxxx;
        outputs[2] = 1.0h.xxxx;
    }
    z = boundary;


    // Depth test must be enabled for the depth texture to be written.
    // In order to erase the previous pixels, and at the same time allow writing
    // to the color buffer, the DepthTest function must be GL_ALWAYS.
    // This way, both the color and the depth are written

    // If the depth is not the last thing written, everything gets messed up
    depth = z;

}
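
The core of depthMain, averaging the valid depths in a 3x3 window and comparing the center against a transformed boundary, can be sketched on the CPU as below. The function name and the list-based window are hypothetical; z_transform stands in for the zTransform uniform, whose actual values are set by the application and are not given here.

```python
def depth_cull(window, z_transform):
    """Return (culled, boundary) for the center of a 3x3 depth window.

    window: 3x3 nested list of depths, center at window[1][1].
    z_transform: (x, y, z, w) as in the zTransform uniform.
    """
    center = window[1][1]
    neighbors = [window[r][c] for r in range(3) for c in range(3) if (r, c) != (1, 1)]
    # Invalid samples have depth 1 and are excluded from both the sum and the count
    valid = [d for d in neighbors if d < 1.0]
    # The center is always included, as in float2(z + ..., 1 + ...)
    average = (center + sum(valid)) / (1 + len(valid))
    zx, zy, zz, zw = z_transform
    boundary = (average * zx + zz) / (average * zy + zw)
    return center > boundary, boundary


if __name__ == "__main__":
    window = [[1.0, 1.0, 1.0],
              [0.25, 0.5, 1.0],
              [1.0, 1.0, 1.0]]
    print(depth_cull(window, (1.0, 0.0, 0.0, 1.0)))
```

With the identity-like transform (1, 0, 0, 1) the boundary equals the plain average, which makes the behavior easy to verify by hand.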

Edge finding and rasterization

Silhouette edge finder

// # 25 instructions, 3 R-regs, 2 H-regs
half3 main(half2 uv : WPOS,
            uniform float3 eye,
            uniform float crease,           // Index of the first crease edge
            uniform float border,           // Index of the first border edge
            uniform float totalCount,       // Total number of edges to be tested
            uniform float texLen,           // Width of the texture, used to linearize the position
            uniform samplerRECT sV0,
            //uniform samplerRECT sV1,
            uniform samplerRECT sN0,
            uniform samplerRECT sN1) : COLOR
{

    // For each edge, recovers its vertex and normals from floating point textures
    float3 v0 = texRECT(sV0, uv).rgb;
    //float3 v1 = texRECT(sV1, uv).rgb;
    float3 n0 = texRECT(sN0, uv).rgb;
    float3 n1 = texRECT(sN1, uv).rgb;

    // Calculate the index of the current fragment given its position.
    // The values of WPOS are {0.5, 1.5, 2.5, ...}, so this computes
    // a linearization of the position
    half2 uvAux = uv - half2(0.5, 0.5);
    float index = (uvAux.x + uvAux.y*texLen);

    // The result color to be written
    half3 res = half3(0, 0, 0);

    // Calculates the vector from V0 to the eye
    float3 p = eye - v0;


    // I will only calculate the appropriate edges
    if (index < totalCount) {

        float2 dot01;               // x-dot0, y-dot1
        dot01.x = dot(p, n0);
        dot01.y = dot(p, n1);

        const float2 sign01 = sign(dot01);

        // Regular and Crease edges require dot1 calculation
        if (index < border) {

            res = ((index < crease) ? (sign01.x != sign01.y) :          // Regular edge test
                    (sign01.x > 0 || sign01.y > 0)).rrr;                // Crease edge test
        }
        else {
            res = (sign01.x > 0).rrr;                                   // Border edge test
        }

    }


    // Finally, return the color
    return res;
}
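
The three edge tests in the silhouette finder can be restated as a small CPU function. This is a sketch under stated assumptions: the function and helper names are hypothetical, vectors are plain tuples, and the edge list is assumed to be ordered as regular, crease, then border edges, as the crease and border uniforms suggest.

```python
def sign(v):
    """Return -1, 0 or 1, like the Cg sign() function."""
    return (v > 0) - (v < 0)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def is_silhouette(index, eye, v0, n0, n1, crease, border, total_count):
    """Classify one edge; indices are [0, crease) regular, [crease, border) crease, rest border."""
    if index >= total_count:
        return False                      # padding texels beyond the edge list
    p = tuple(e - v for e, v in zip(eye, v0))
    s0, s1 = sign(dot(p, n0)), sign(dot(p, n1))
    if index < border:
        if index < crease:
            return s0 != s1               # regular edge: the two faces disagree
        return s0 > 0 or s1 > 0           # crease edge: either face is front-facing
    return s0 > 0                         # border edge: its single face is front-facing


if __name__ == "__main__":
    eye, v0 = (0, 0, 5), (0, 0, 0)
    print(is_silhouette(0, eye, v0, (0, 0, 1), (0, 0, -1), 10, 20, 30))
```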

Raster Edges

// vp40: # 40 instructions, 3 R-regs
// orig: # 66 instructions, 4 R-regs
struct VertexOutput {
    float4 position : POSITION;
    float4 edgeVertices : TEXCOORD1;
};

VertexOutput EdgeRastersVP(float4 position : POSITION,
                           float3 edgeVO : TEXCOORD0,
                           const uniform float width,
                           const uniform float height,
                           const uniform float4x4 modelViewProjMatrix) {

    VertexOutput output;
    float2 iVertex0, iVertex1, direction;
    float4 tVertex0, tVertex1;

    // transform each vertex into homogeneous clip-space
    tVertex0 = mul(modelViewProjMatrix, position);
    tVertex1 = mul(modelViewProjMatrix, float4(edgeVO.xyz, 1.0f));

    // transform each vertex into image space

    // IMPROVEMENT 2: 1 vectorization, new vectors (41 ins)
    float4 iVertexTmp = ((( float4(tVertex0.xy, tVertex1.xy) / float4(tVertex0.ww, tVertex1.ww) ) * 0.5)
        + 0.5f.xxxx) * float4(width, height, width, height);
    iVertex0 = iVertexTmp.xy;
    iVertex1 = iVertexTmp.zw;

    // IMPROVEMENT 1: 2 vectorizations, no new vectors (49 ins)
    //iVertex0 = (((tVertex0 / tVertex0.ww) * 0.5) + 0.5f.xx) * float2(width, height);
    //iVertex1 = (((tVertex1 / tVertex1.ww) * 0.5) + 0.5f.xx) * float2(width, height);

    // ORIGINAL: One by one (66 ins)
    //iVertex0.x = ((tVertex0.x / tVertex0.w) * 0.5 + 0.5) * width;
    //iVertex0.y = ((tVertex0.y / tVertex0.w) * 0.5 + 0.5) * height;
    //iVertex1.x = ((tVertex1.x / tVertex1.w) * 0.5 + 0.5) * width;
    //iVertex1.y = ((tVertex1.y / tVertex1.w) * 0.5 + 0.5) * height;


    direction = normalize(iVertex1 - iVertex0) * 2;

    // these are small edges
    bool p = (floor(iVertex0.x) == floor(iVertex1.x) && floor(iVertex0.y) == floor(iVertex1.y));

    // transform vertex back to homogeneous clip-space

    // IMPROVEMENT 3: vectorize (40 ins)
    tVertex0.xy = ((iVertex0 - direction) / float2(width, height) - 0.5f.xx) * tVertex0.ww / 0.5f.xx;

    // ORIGINAL: 41 ins after improvement 2
    //tVertex0.x = ((iVertex0.x - direction.x) / width - 0.5) * tVertex0.w / 0.5;
    //tVertex0.y = ((iVertex0.y - direction.y) / height - 0.5) * tVertex0.w / 0.5;


    output.position = tVertex0;
    //output.position.z -= 0.15*output.position.z;

    // cull out small edges
    if(p)
        output.position.x = -1e38;

    // order the edges so that the slope is the same for both vertices of an edge
    // (so that it is passed correctly into the fragment program after interpolation)
    if(((iVertex0.x == iVertex1.x) && (iVertex0.y < iVertex1.y)) || (iVertex0.x < iVertex1.x))
        output.edgeVertices = float4(iVertex0.xy, iVertex1.xy);
    else
        output.edgeVertices = float4(iVertex1.xy, iVertex0.xy);

    return output;
}

// fp40: # 23 instructions, 3 R-regs, 2 H-regs
// orig: # 45 instructions, 2 R-regs, 1 H-regs
half3 EdgeRastersFP( in float3 position : WPOS,
                     in float4 edgeVertices : TEXCOORD1,
                     uniform samplerRECT depthImage,
                     uniform float BIAS) : COLOR0
{

    // vertices for the edge
    float2 edgeVertex0 = edgeVertices.xy;
    float2 edgeVertex1 = edgeVertices.zw;

    // find the bounding positions of the pixel

    // IMPROVEMENT 1: Vectorize the offsets
    // No instruction count change
    float4 lrtb = position.xxyy + float4(-0.5, 0.5, -0.5, 0.5); // left, right, top, bottom
    float left = lrtb.x;
    float right = lrtb.y;
    float top = lrtb.z;
    float bottom = lrtb.w;

    /*
    float left = position.x - 0.5;
    float right = position.x + 0.5;
    float top = position.y - 0.5;
    float bottom = position.y + 0.5;
    */


    // parametrize the line, to P0 + t * direction
    // tLeft, tTop - variables for parametric equations
    float2 edgeDirection = edgeVertex1 - edgeVertex0;

    // IMPROVEMENT 2: After Improvement 1, vectorize the computation and the test
    // Instruction count: from 44 to 23

    float2 intersectionXY;
    float2 tLeftTop;
    float2 xyLocation;
    half2 pXY;

    tLeftTop = (lrtb.xw - edgeVertex0) / edgeDirection;

    intersectionXY = edgeDirection * tLeftTop.yx + edgeVertex0;
    pXY = (!((intersectionXY > lrtb.yw) || (intersectionXY < lrtb.xz) || (tLeftTop.yx < 0.0f.xx) || (tLeftTop.yx > 1.0f.xx)));
    xyLocation = exp2(floor((intersectionXY - lrtb.xz) * 7)) * pXY;

    return float3(xyLocation, 0) / 255.0;

}
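
To make the vectorized intersection test in EdgeRastersFP easier to follow, the sketch below computes the same two codes with scalar arithmetic. It is an illustration only: the function names are hypothetical, and edges parallel to a pixel side are skipped explicitly here to avoid a division by zero, whereas the shader relies on the GPU's floating-point behavior in that case.

```python
import math


def _code(t, hit, lo, hi):
    """Return 2^k for a valid crossing, k = floor(7 * offset along the side), else 0."""
    if t < 0.0 or t > 1.0 or hit < lo or hit > hi:
        return 0
    return 2 ** math.floor((hit - lo) * 7)


def encode_crossings(px, py, v0, v1):
    """Return the (r, g) byte values written for the pixel centered at (px, py).

    r encodes where the edge crosses the horizontal line y = py + 0.5,
    g encodes where it crosses the vertical line x = px - 0.5.
    """
    left, right = px - 0.5, px + 0.5
    top, bottom = py - 0.5, py + 0.5
    dx, dy = v1[0] - v0[0], v1[1] - v0[1]
    r = g = 0
    if dy != 0:
        t = (bottom - v0[1]) / dy                 # tLeftTop.y
        r = _code(t, v0[0] + dx * t, left, right)
    if dx != 0:
        t = (left - v0[0]) / dx                   # tLeftTop.x
        g = _code(t, v0[1] + dy * t, top, bottom)
    return r, g


if __name__ == "__main__":
    # Horizontal edge through the middle of the pixel centered at (2.5, 2.5)
    print(encode_crossings(2.5, 2.5, (1.0, 2.5), (4.0, 2.5)))
```

Each code is a single set bit, so a byte can identify which of eight positions along the pixel side was crossed.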

Image filters

Pixel Classify & Point Cull

// # 82 instructions, 3 R-regs, 2 H-regs, no if-code
// # 84 instructions, 2 R-regs, 3 H-regs, if code fp40
// This unified shader performs both the pixel classify and the point cull, using MRT!
//   OUT[0] = PixelClass
//   OUT[1] = PointCull
//
// Original:
//   -PixelClass:   # 94 instructions, 3 R-regs, 1 H-regs
//   -PointCull:    # 4 instructions, 2 R-regs, 0 H-regs
void PixelClassPointCull(   in half2 pixelPos                               : WPOS,
                            const uniform samplerRECT edgeIntersections,
                            const uniform samplerRECT subPixelLocations,
                            const uniform samplerRECT bitExtract,
                            const uniform samplerRECT emptyOrder,
                            out half3 OUT[2]                                : COLOR0)
{
    const static float BOTTOM = 0;
    const static float RIGHT = 1;
    const static float TOP    = 2;
    const static float LEFT   = 3;


    // Save all intersections in a single half4 vector
    //   - Top:    x
    //   - Bottom: y
    //   - Left:   z
    //   - Right:  w
    half4 Intersections;

    // Extract all 4 intersections around the pixel.
    Intersections.xz = texRECT(edgeIntersections, pixelPos).rg;                                                 // Top-Left
    Intersections.y = texRECT(edgeIntersections, half2(pixelPos.x, pixelPos.y - 1)).r; // Bottom
    Intersections.w = texRECT(edgeIntersections, half2(pixelPos.x + 1, pixelPos.y)).g; // Right

    const half4 colorSample = texRECT(subPixelLocations, pixelPos);

    // Scale by 255 so that the intersections are in [0, 8]. For all intersections, at once
    Intersections = round(Intersections * 255.0);

    // intersection information
    half intersectionCount = dot( step(0.001953125h.xxxx, Intersections), 1.0h.xxxx );

    if (intersectionCount == 2) {

        // Construct a 5-bit mask with the following layout.
        // The 2 MSBs indicate the side: left = 3, top = 2, right = 1, bottom = 0
        float4 edgeIntersection;

        // Top intersection
        if(Intersections.x > 0) {
            edgeIntersection.z = texRECT(bitExtract, float2(Intersections.x, 1)).r;
            edgeIntersection.w = TOP;
            //intersectionCount++;
        }

        // Bottom intersection
        if(Intersections.y > 0) {
            edgeIntersection.xy = edgeIntersection.zw;
            edgeIntersection.z = texRECT(bitExtract, float2(Intersections.y, 1)).r;
            edgeIntersection.w = BOTTOM;
            //intersectionCount++;
        }

        // Left intersection
        if(Intersections.z > 0) {
            edgeIntersection.xy = edgeIntersection.zw;
            edgeIntersection.z = texRECT(bitExtract, float2(Intersections.z, 1)).r;
            edgeIntersection.w = LEFT;
            //intersectionCount++;
        }

        // Right intersection
        if(Intersections.w > 0) {
            edgeIntersection.xy = edgeIntersection.zw;
            edgeIntersection.z = texRECT(bitExtract, float2(Intersections.w, 1)).r;
            edgeIntersection.w = RIGHT;
            //intersectionCount++;
        }

        half3 tColor = float3(0, 0, 255);
        //half4 colorSample = texRECT(subPixelLocations, pixelPos);
        half2 subPixelMask = round(colorSample.ba * 255);

        if(subPixelMask.x == 0)
            subPixelMask.y = 16;

        // Do edge ordering
        //float emptyIndex =    edgeIntersection.y * 256 + edgeIntersection.w * 64 +
        //                  edgeIntersection.x * 8 + edgeIntersection.z;
        float emptyIndex = dot(edgeIntersection, half4(8,256,1,64));
        float t = texRECT(emptyOrder, float2(emptyIndex, subPixelMask.y)).r;

        if(t >= 2)
            tColor.b = 0;

        if(t == 1 || t == 3) {
            tColor.rg = edgeIntersection.zw * float2(8,4) + edgeIntersection.xy;
        }
        else {
            tColor.rg = edgeIntersection.xy * float2(8,4) + edgeIntersection.zw;
        }

        // Pixel class info
        OUT[0] = tColor / 255.0;

    }
    else {
        OUT[0] = half3( (intersectionCount > 2 ? half2(1, 15/255.0) : half2(0,0) ),1);
    }

    // Now write the point cull
    OUT[1] = half3(colorSample.rg, colorSample.b * OUT[0].b);
}
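
The lookup index built by dot(edgeIntersection, half4(8,256,1,64)) is a bit-packed value, which the following sketch makes explicit. The function names are hypothetical, and the sketch assumes that the bitExtract lookup returns a position in 0..7 and that sides are in 0..3, so the four fields do not overlap.

```python
def empty_index(first_pos, first_side, second_pos, second_side):
    """Mirror of dot((x, y, z, w), (8, 256, 1, 64)) with x, z positions and y, w sides."""
    return first_side * 256 + second_side * 64 + first_pos * 8 + second_pos


def unpack_empty_index(index):
    """Inverse of empty_index, valid under the field ranges assumed above."""
    first_pos = (index >> 3) & 7
    first_side = (index >> 8) & 3
    second_pos = index & 7
    second_side = (index >> 6) & 3
    return first_pos, first_side, second_pos, second_side


if __name__ == "__main__":
    idx = empty_index(5, 2, 3, 1)     # e.g. top side at position 5, right side at position 3
    print(idx, unpack_empty_index(idx))
```

Under these assumptions the index needs 10 bits, so the emptyOrder table would be addressed with at most 1024 columns.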

Reachability

struct vertexInfo {
    float4 pos              : POSITION;
    half4 texCoords[8];    // To hold all the interpolated texture coordinates
};

/**
* Vertex shader for the neighbor reach, interpolates coordinates
*
* # 8 instructions, 1 R-regs
*/
void NeighborReachVert(
                uniform float4x4 ModelViewProj      : state.matrix.mvp,
                in half2 uv                        : TEXCOORD0,
                in float4 pos                      : POSITION,
                out vertexInfo OUT)
{
    // Transformed position of the vertex into clip coordinates
    OUT.pos = mul(ModelViewProj, pos);

    // Using TEXTURE_RECTANGLE, coords are not normalized

    // Interpolate
    OUT.texCoords[0].xy = uv + half2(-1, 0);        // -1, 0
    OUT.texCoords[0].zw = uv + half2( 1, 0);        // 1, 0
    OUT.texCoords[1].xy = uv + half2( 0, 1);        // 0, 1

}

/**
* Vertex shader for the reachability that performs the multiple texture
* coordinates interpolation in advance.
*
* # 19 instructions, 2 R-regs
*/
void ReachabilityVert(
                uniform float4x4 ModelViewProj      : state.matrix.mvp,
                in half2 uv                        : TEXCOORD0,
                in float4 pos                      : POSITION,
                out vertexInfo OUT)
{

    // Transformed position of the vertex into clip coordinates
    OUT.pos = mul(ModelViewProj, pos);

    // Using TEXTURE_RECTANGLE, coords are not normalized

    // Interpolate!
    OUT.texCoords[0].xy = uv + half2(-2, 0);        // -2, 0
    OUT.texCoords[0].zw = uv + half2(-1, 0);        // -1, 0
    OUT.texCoords[1].xy = uv;                       // 0, 0
    OUT.texCoords[1].zw = uv + half2( 1, 0);        // 1, 0
    OUT.texCoords[2].xy = uv + half2( 2, 0);        // 2, 0

    OUT.texCoords[2].zw = uv + half2(-2, 1);        // -2, 1
    OUT.texCoords[3].xy = uv + half2(-1, 1);        // -1, 1
    OUT.texCoords[3].zw = uv + half2( 0, 1);        // 0, 1
    OUT.texCoords[4].xy = uv + half2( 1, 1);        // 1, 1
    OUT.texCoords[4].zw = uv + half2( 2, 1);        // 2, 1

    OUT.texCoords[5].xy = uv + half2(-2, 2);        // -2, 2
    OUT.texCoords[5].zw = uv + half2(-1, 2);        // -1, 2
    OUT.texCoords[6].xy = uv + half2( 0, 2);        // 0, 2
    OUT.texCoords[6].zw = uv + half2( 1, 2);        // 1, 2
    OUT.texCoords[7].xy = uv + half2( 2, 2);        // 2, 2

}

/**
* Vertex shader for the reachability copy that performs the multiple texture
* coordinates interpolation in advance.
*
* # 17 instructions, 2 R-regs
*/
void CopyReachabilityVert(
                uniform float4x4 ModelViewProj      : state.matrix.mvp,
                in half2 uv                        : TEXCOORD0,
                in float4 pos                      : POSITION,
                out vertexInfo OUT)
{

    // Transformed position of the vertex into clip coordinates
    OUT.pos = mul(ModelViewProj, pos);

    // Using TEXTURE_RECTANGLE, coords are not normalized

    // Interpolate!
    OUT.texCoords[0].xy = uv + half2(-2,-2);        // -2,-2
    OUT.texCoords[0].zw = uv + half2(-1,-2);        // -1,-2
    OUT.texCoords[1].xy = uv + half2( 0,-2);        // 0,-2
    OUT.texCoords[1].zw = uv + half2( 1,-2);        // 1,-2
    OUT.texCoords[2].xy = uv + half2( 2,-2);        // 2,-2

    OUT.texCoords[2].zw = uv + half2(-2,-1);        // -2,-1
    OUT.texCoords[3].xy = uv + half2(-1,-1);        // -1,-1
    OUT.texCoords[3].zw = uv + half2( 0,-1);        // 0,-1
    OUT.texCoords[4].xy = uv + half2( 1,-1);        // 1,-1
    OUT.texCoords[4].zw = uv + half2( 2,-1);        // 2,-1

}
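
Both vertex shaders above walk a 5-wide window row by row and store two 2D texture coordinates in each 4-component interpolator (.xy first, then .zw). The following CPU-side C sketch is not part of the original shader source; the names window_offset, interpolator_slot, Offset and Slot are hypothetical and only restate that layout so the comments in the fragment programs are easier to check.

```c
/* Hypothetical helper types, for illustration only. */
typedef struct { int dx, dy; } Offset;
typedef struct { int reg; int use_zw; } Slot;

/* k-th offset of a 5-wide window scanned row by row.
 * first_dy is the row the scan starts at: 0 for the first shader,
 * -2 for CopyReachabilityVert. */
Offset window_offset(int k, int first_dy) {
    Offset o = { k % 5 - 2, first_dy + k / 5 };
    return o;
}

/* Two 2D coordinates share one interpolator: even k goes to .xy,
 * odd k goes to .zw of texCoords[k / 2]. */
Slot interpolator_slot(int k) {
    Slot s = { k / 2, k % 2 };
    return s;
}
```

For example, k = 5 with first_dy = 0 gives the offset (-2, 1) in texCoords[2].zw, which matches the assignment in the first shader.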

// New:      # 19 instructions, 2 R-regs, 2 H-regs
// Original: # 28 instructions, 1 R-regs, 2 H-regs
half4 NeighborReach(in half2 pos : WPOS,
    in vertexInfo IN,
    const uniform samplerRECT pixelClass,
    const uniform samplerRECT neighborTableLR,
    const uniform samplerRECT neighborTableRL,
    const uniform samplerRECT neighborTableVER) : COLOR {

    half4 outColor;
    half4 olrb;         // origin, left, right, bottom

    olrb.x = texRECT(pixelClass, pos).g;
    olrb.y = texRECT(pixelClass, IN.texCoords[0].xy).g;     // -1, 0
    olrb.z = texRECT(pixelClass, IN.texCoords[0].zw).g;     // 1, 0
    olrb.w = texRECT(pixelClass, IN.texCoords[1].xy).g;     // 0, 1

    olrb = round(olrb * 255);

    outColor.x = texRECT(neighborTableRL, olrb.yx).x; // half2(left, origin)
    outColor.y = texRECT(neighborTableLR, olrb.xz).x; // half2(origin, right)
    outColor.z = texRECT(neighborTableVER, olrb.wx).x; // half2(bottom, origin)
    outColor.w = step(14.5, olrb.x);    // origin
    return outColor / half4(255.0h.xxx, 1);
}
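
The shaders repeatedly multiply a sampled value by 255 and round it, then divide by 255 before returning. A minimal C sketch of that round trip follows, assuming the textures are 8-bit unsigned normalized (the source suggests this through the factor 255 but does not state the format). The function names are hypothetical.

```c
/* Value as a shader would sample it from an 8-bit normalized channel. */
float to_normalized(int byte_value) {
    return (float)byte_value / 255.0f;
}

/* Recover the stored integer: scale by 255 and round to nearest.
 * Adding 0.5 and truncating is a valid rounding for non-negative input. */
int from_normalized(float sampled) {
    return (int)(sampled * 255.0f + 0.5f);
}
```

The rounding step matters because the normalized value is not exactly representable for most bytes, so a plain truncation could land one below the intended table index.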

// New:      # 140 instructions, 13 R-regs, 2 H-regs
// Original: # 165 instructions, 10 R-regs, 3 H-regs
half4 Reachability(in half2 pos : WPOS,
    in vertexInfo IN,
    const uniform samplerRECT neighborTable,
    const uniform samplerRECT pixelClass,
    const uniform samplerRECT orTable,
    const uniform samplerRECT chainTable) : COLOR
{

    half4 color = half4(0, 0, 0, 0);

    //half reachability[15];
    half3 reachability013;
    half3 reachability456;
    half4 reachability789A;
    half4 reachabilityBCDE;

    half2 argument;

    const half3 nr00 = round(texRECT(neighborTable, IN.texCoords[0].xy).rgb * 255);     // -2, 0
    const half3 nr01 = round(texRECT(neighborTable, IN.texCoords[0].zw).rgb * 255);     // -1, 0
    const half4 nr02 = round(texRECT(neighborTable, IN.texCoords[1].xy) * 255);         // 0, 0
    const half3 nr03 = round(texRECT(neighborTable, IN.texCoords[1].zw).rgb * 255);     // 1, 0
    const half3 nr04 = round(texRECT(neighborTable, IN.texCoords[2].xy).rgb * 255);     // 2, 0

    const half3 nr05 = round(texRECT(neighborTable, IN.texCoords[2].zw).rgb * 255);     // -2, 1
    const half3 nr06 = round(texRECT(neighborTable, IN.texCoords[3].xy).rgb * 255);     // -1, 1
    const half3 nr07 = round(texRECT(neighborTable, IN.texCoords[3].zw).rgb * 255);     // 0, 1
    const half3 nr08 = round(texRECT(neighborTable, IN.texCoords[4].xy).rgb * 255);     // 1, 1
    const half3 nr09 = round(texRECT(neighborTable, IN.texCoords[4].zw).rgb * 255);     // 2, 1

    const half3 nr10 = round(texRECT(neighborTable, IN.texCoords[5].xy).rgb * 255);     // -2, 2
    const half3 nr11 = round(texRECT(neighborTable, IN.texCoords[5].zw).rgb * 255);     // -1, 2
    const half3 nr12 = round(texRECT(neighborTable, IN.texCoords[6].xy).rgb * 255);     // 0, 2
    const half3 nr13 = round(texRECT(neighborTable, IN.texCoords[6].zw).rgb * 255);     // 1, 2
    const half3 nr14 = round(texRECT(neighborTable, IN.texCoords[7].xy).rgb * 255);     // 2, 2

    // ROW 0
    reachability013.y = nr02.r;
    reachability013.x = texRECT(chainTable, half2(nr02.r, nr01.r)).x;

    reachability013.z = nr02.g;
    reachability456.x = texRECT(chainTable, half2(nr02.g, nr03.g)).x;

    // To mask latency
    color.g += dot(step(8.0h.xxx, reachability013), half3(4,8,16));


    // ROW 1
    reachability789A.x = nr02.b;

    argument.x = texRECT(chainTable, half2(nr02.r, nr01.b)).x;
    argument.y = texRECT(chainTable, half2(nr02.b, nr07.r)).x;
    reachability456.z = texRECT(orTable, argument).x;

    argument.x = texRECT(chainTable, half2(reachability013.x, nr00.b)).x;
    argument.y = texRECT(chainTable, half2(reachability456.z, nr06.r)).x;
    reachability456.y = texRECT(orTable, argument).x;

    // To mask latency
    color.g += dot(step(8.0h.xxx, reachability456), half3(32,64,128));

    argument.x = texRECT(chainTable, half2(nr02.g, nr03.b)).x;
    argument.y = texRECT(chainTable, half2(nr02.b, nr07.g)).x;
    reachability789A.y = texRECT(orTable, argument).x;

    argument.x = texRECT(chainTable, half2(reachability456.x, nr04.b)).x;
    argument.y = texRECT(chainTable, half2(reachability789A.y, nr08.g)).x;
    reachability789A.z = texRECT(orTable, argument).x;

    // ROW 2
    reachabilityBCDE.y = texRECT(chainTable, float2(nr02.b, nr07.b)).x;

    argument.x = texRECT(chainTable, half2(reachability456.z, nr06.b)).x;
    argument.y = texRECT(chainTable, half2(reachabilityBCDE.y, nr12.r)).x;
    reachabilityBCDE.x = texRECT(orTable, argument).x;

    argument.x = texRECT(chainTable, half2(reachability456.y, nr05.b)).x;
    argument.y = texRECT(chainTable, half2(reachabilityBCDE.x, nr11.r)).x;
    reachability789A.w = texRECT(orTable, argument).x;

    // To mask latency
    color.b += dot(step(8.0h.xxxx, reachability789A), half4(1,2,4,8));

    argument.x = texRECT(chainTable, half2(reachability789A.y, nr08.b)).x;
    argument.y = texRECT(chainTable, half2(reachabilityBCDE.y, nr12.g)).x;
    reachabilityBCDE.z = texRECT(orTable, argument).x;

    argument.x = texRECT(chainTable, half2(reachability789A.z, nr09.b)).x;
    argument.y = texRECT(chainTable, half2(reachabilityBCDE.z, nr13.g)).x;
    reachabilityBCDE.w = texRECT(orTable, argument).x;

    // To mask latency
    color.b += dot(step(8.0h.xxxx, reachabilityBCDE), half4(16,32,64,128));

    color.a = nr02.a;

    return color / 255.0;
}
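
Reachability writes its 14 results as bit flags: each dot(step(8.0, values), weights) call turns a group of table values into a run of bits inside one colour channel. The C sketch below restates only that packing step. The threshold 8.0 is copied from the shader; what the values in chainTable and orTable mean is not shown in the source, so the sketch makes no assumption about them. pack_flags and step_f are hypothetical names.

```c
/* Cg's step(edge, x): 1.0 when x >= edge, else 0.0. */
static float step_f(float edge, float x) {
    return x >= edge ? 1.0f : 0.0f;
}

/* Pack n values into one byte: value i sets bit (first_bit + i)
 * when it reaches the threshold. This equals dot(step(threshold, v), w)
 * with w = (2^first_bit, 2^(first_bit+1), ...). */
int pack_flags(const float *values, int n, int first_bit, float threshold) {
    int packed = 0;
    for (int i = 0; i < n; ++i) {
        packed += (int)step_f(threshold, values[i]) << (first_bit + i);
    }
    return packed;
}
```

In the shader the green channel receives two groups (first_bit 2 and 5) and the blue channel two more (first_bit 0 and 4), which leaves green bits 0 and 1 free for CopyReachability to fill in.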

// New:         # 66 instructions, 2 R-regs, 4 H-regs
//              # 70 instructions, 2 R-regs, 5 H-regs - with if branch
// Original:    # 114 instructions, 3 R-regs, 4 H-regs
half3 CopyReachability(in half2 pos : WPOS,
                       in vertexInfo IN,
                       const uniform samplerRECT pixelClass,
                       const uniform samplerRECT reachability) : COLOR
{

    //half4 neighbor;
    half4 outColor = round(texRECT(reachability, pos) * 255);

    // Parallelize

    // First block
    half4 neighborA;
    neighborA.x = texRECT(reachability, IN.texCoords[0].xy).b;
    neighborA.y = texRECT(reachability, IN.texCoords[0].zw).b;
    neighborA.z = texRECT(reachability, IN.texCoords[1].xy).b;
    neighborA.w = texRECT(reachability, IN.texCoords[1].zw).b;

    // Multiply and round
    neighborA = round(neighborA * 255);

    // First fmod operations
    neighborA.yzw = fmod(neighborA.yzw, half3(128, 128, 32));

    // One extra fmod
    neighborA.z = fmod(neighborA.z, 64);

    // Values at once
    neighborA = step(half4(128, 64, 32, 16), neighborA);
    outColor.r += dot(neighborA, half4(1,2,4,8));

    // Second block
    half4 neighborB;
    neighborB.x = texRECT(reachability, IN.texCoords[2].xy).b;
    neighborB.y = texRECT(reachability, IN.texCoords[2].zw).b;
    neighborB.z = texRECT(reachability, IN.texCoords[3].xy).b;
    neighborB.w = texRECT(reachability, IN.texCoords[3].zw).b;

    // Multiply and round
    neighborB = round(neighborB * 255);

    // Fmod operations
    neighborB = fmod(neighborB, half4(16, 8, 4, 2));

    // Values at once
    neighborB = step(half4(8, 4, 2, 0.9), neighborB);
    outColor.r += dot(neighborB, half4(16,32,64,128));

    // (round(tex*255)) < 128 ? 0 : 1
    half4 neighbor;
    neighbor = round(texRECT(reachability, IN.texCoords[4].xy) * 255);      // 1,-1
    outColor.g += step(128, neighbor.g);

    // (round(tex*255)) mod 128 < 64 ? 0 : 2
    neighbor = round(texRECT(reachability, IN.texCoords[4].zw) * 255);      // 2,-1
    neighbor.g -= 128 * step(128, neighbor.g);
    outColor.g += step(64, neighbor.g) * 2;

    if(outColor.a > 0) {
        outColor.rgb = half3(255, 255, 255);
    }

    return outColor.rgb / 255.0;
}
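
CopyReachability has no integer bit operations available, so it tests single bits with fmod followed by step: fmod removes every bit above the one of interest, and the comparison checks whether that bit survives. The C sketch below restates the idiom and is not the author's code. fmod_f and bit_via_fmod are hypothetical names, and fmod_f is a small stand-in that is only valid for the non-negative values used here.

```c
/* Remainder of x / y for non-negative x and positive y
 * (truncating division, same result as fmod in that range). */
static float fmod_f(float x, float y) {
    return x - y * (float)(int)(x / y);
}

/* Returns bit `bit` of an integer-valued float in [0, 255].
 * fmod by 2^(bit+1) strips the higher bits; comparing against 2^bit
 * (Cg's step) tests the remaining top bit. */
int bit_via_fmod(float value, int bit) {
    float pow2 = (float)(1 << bit);
    return fmod_f(value, 2.0f * pow2) >= pow2 ? 1 : 0;
}
```

The shader reads each bit from a neighbour rather than from the pixel itself: the neighbour at offset (dx, dy) already stores whether it can reach (-dx, -dy), which is the current pixel, so the upper half of the 5x5 window can be filled without recomputing the chains.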

Interpolation

// This version: # 214 instructions, 25 R-regs, 9 H-regs
// Original:     # 325 instructions, 2 R-regs, 9 H-regs
// Time:         6.3 ms
half4 Interpolation5(in half2 pos : WPOS,
    const uniform samplerRECT reachability,
    const uniform samplerRECT colorImage,
    const uniform samplerRECT prioritySeqnum,
    const uniform samplerRECT priorityTable) : COLOR
{
    half modulator, centerModulator, weight = 0;
    half3 selfColor;
    half3 averageColor = float3(0, 0, 0);
    half3 reach = round(texRECT(reachability, pos).rgb * 255);
    half3 color = float3(0, 0, 0);

    half3 rNeighbors[8];
    half3 gNeighbors[8];
    half3 bNeighbors[8];

    half3 rWeights[8];
    half3 gWeights[8];
    half3 bWeights[8];

    rWeights[0] = 1h;
    rWeights[1] = 1h;
    rWeights[2] = 1h;
    rWeights[3] = 1h;
    rWeights[4] = 1h;

    rWeights[5] = 1h;
    rWeights[6] = 4h;
    rWeights[7] = 8h;
    gWeights[0] = 4h;
    gWeights[1] = 1h;

    gWeights[2] = 1h;
    gWeights[3] = 8h;
    // (0, 0): the center sample is accumulated separately below with weight 32
    gWeights[4] = 8h;
    gWeights[5] = 1h;

    gWeights[6] = 1h;
    gWeights[7] = 4h;
    bWeights[0] = 8h;
    bWeights[1] = 4h;
    bWeights[2] = 1h;

    bWeights[3] = 1h;
    bWeights[4] = 1h;
    bWeights[5] = 1h;
    bWeights[6] = 1h;
    bWeights[7] = 1h;

    rNeighbors[0] = texRECT(colorImage, pos.xy + half2(-2, -2)).rgb;
    rNeighbors[1] = texRECT(colorImage, pos.xy + half2(-1, -2)).rgb;
    rNeighbors[2] = texRECT(colorImage, pos.xy + half2( 0, -2)).rgb;
    rNeighbors[3] = texRECT(colorImage, pos.xy + half2( 1, -2)).rgb;
    rNeighbors[4] = texRECT(colorImage, pos.xy + half2( 2, -2)).rgb;
    rNeighbors[5] = texRECT(colorImage, pos.xy + half2(-2, -1)).rgb;
    rNeighbors[6] = texRECT(colorImage, pos.xy + half2(-1, -1)).rgb;
    rNeighbors[7] = texRECT(colorImage, pos.xy + half2( 0, -1)).rgb;

    gNeighbors[0] = texRECT(colorImage, pos.xy + half2( 1, -1)).rgb;
    gNeighbors[1] = texRECT(colorImage, pos.xy + half2( 2, -1)).rgb;
    gNeighbors[2] = texRECT(colorImage, pos.xy + half2(-2, 0)).rgb;
    gNeighbors[3] = texRECT(colorImage, pos.xy + half2(-1, 0)).rgb;
    gNeighbors[4] = texRECT(colorImage, pos.xy + half2( 1, 0)).rgb;
    gNeighbors[5] = texRECT(colorImage, pos.xy + half2( 2, 0)).rgb;
    gNeighbors[6] = texRECT(colorImage, pos.xy + half2(-2, 1)).rgb;
    gNeighbors[7] = texRECT(colorImage, pos.xy + half2(-1, 1)).rgb;

    bNeighbors[0] = texRECT(colorImage, pos.xy + half2( 0, 1)).rgb;
    bNeighbors[1] = texRECT(colorImage, pos.xy + half2( 1, 1)).rgb;
    bNeighbors[2] = texRECT(colorImage, pos.xy + half2( 2, 1)).rgb;
    bNeighbors[3] = texRECT(colorImage, pos.xy + half2(-2, 2)).rgb;
    bNeighbors[4] = texRECT(colorImage, pos.xy + half2(-1, 2)).rgb;
    bNeighbors[5] = texRECT(colorImage, pos.xy + half2( 0, 2)).rgb;
    bNeighbors[6] = texRECT(colorImage, pos.xy + half2( 1, 2)).rgb;
    bNeighbors[7] = texRECT(colorImage, pos.xy + half2( 2, 2)).rgb;

    // Data for reach.r: 2x4 fmod, 2x4 step operations with those results
    const half4 reachRfmod1 = fmod(reach.rrrr, half4(2,4,8,16));
    const half4 reachRfmod2 = fmod(reach.rrrr, half4(32,64,128,256));
    const half4 reachRstep1 = step(half4(1,2,4,8),      reachRfmod1);
    const half4 reachRstep2 = step(half4(16,32,64,128), reachRfmod2);

    // Data for reach.g: 2x4 fmod, 2x4 step operations with those results
    const half4 reachGfmod1 = fmod(reach.gggg, half4(2,4,8,16));
    const half4 reachGfmod2 = fmod(reach.gggg, half4(32,64,128,256));
    const half4 reachGstep1 = step(half4(1,2,4,8),      reachGfmod1);
    const half4 reachGstep2 = step(half4(16,32,64,128), reachGfmod2);

    // Data for reach.b: 2x4 fmod, 2x4 step operations with those results
    const half4 reachBfmod1 = fmod(reach.bbbb, half4(2,4,8,16));
    const half4 reachBfmod2 = fmod(reach.bbbb, half4(32,64,128,256));
    const half4 reachBstep1 = step(half4(1,2,4,8),      reachBfmod1);
    const half4 reachBstep2 = step(half4(16,32,64,128), reachBfmod2);

    // Data for the rNeighbors.b
    const half4 rNeighborsStep1 = step(0.0001.xxxx,
        half4(rNeighbors[0].b, rNeighbors[1].b, rNeighbors[2].b, rNeighbors[3].b));
    const half4 rNeighborsStep2 = step(0.0001.xxxx,
        half4(rNeighbors[4].b, rNeighbors[5].b, rNeighbors[6].b, rNeighbors[7].b));

    // Data for the gNeighbors.b
    const half4 gNeighborsStep1 = step(0.0001.xxxx,
        half4(gNeighbors[0].b, gNeighbors[1].b, gNeighbors[2].b, gNeighbors[3].b));
    const half4 gNeighborsStep2 = step(0.0001.xxxx,
        half4(gNeighbors[4].b, gNeighbors[5].b, gNeighbors[6].b, gNeighbors[7].b));

    // Data for the bNeighbors.b
    const half4 bNeighborsStep1 = step(0.0001.xxxx,
        half4(bNeighbors[0].b, bNeighbors[1].b, bNeighbors[2].b, bNeighbors[3].b));
    const half4 bNeighborsStep2 = step(0.0001.xxxx,
        half4(bNeighbors[4].b, bNeighbors[5].b, bNeighbors[6].b, bNeighbors[7].b));

    // R - modulators
    const half4 rModulator1 = rNeighborsStep1 * reachRstep1;
    const half4 rModulator2 = rNeighborsStep2 * reachRstep2;

    // G - modulators
    const half4 gModulator1 = gNeighborsStep1 * reachGstep1;
    const half4 gModulator2 = gNeighborsStep2 * reachGstep2;

    // B - modulators
    const half4 bModulator1 = bNeighborsStep1 * reachBstep1;
    const half4 bModulator2 = bNeighborsStep2 * reachBstep2;

    // ****** ROW 0 ******

    modulator = rModulator1.x;
    averageColor += modulator * rNeighbors[0] * rWeights[0];
    weight += modulator * rWeights[0].x;

    modulator = rModulator1.y;
    averageColor += modulator * rNeighbors[1] * rWeights[1];
    weight += modulator * rWeights[1].x;

    modulator = rModulator1.z;
    averageColor += modulator * rNeighbors[2] * rWeights[2];
    weight += modulator * rWeights[2].x;

    modulator = rModulator1.w;
    averageColor += modulator * rNeighbors[3] * rWeights[3];
    weight += modulator * rWeights[3].x;

    modulator = rModulator2.x;
    averageColor += modulator * rNeighbors[4] * rWeights[4];
    weight += modulator * rWeights[4].x;


    // ****** ROW 1 ******

    modulator = rModulator2.y;
    averageColor += modulator * rNeighbors[5] * rWeights[5];
    weight += modulator * rWeights[5].x;

    modulator = rModulator2.z;
    averageColor += modulator * rNeighbors[6] * rWeights[6];
    weight += modulator * rWeights[6].x;

    modulator = rModulator2.w;
    averageColor += modulator * rNeighbors[7] * rWeights[7];
    weight += modulator * rWeights[7].x;

    modulator = gModulator1.x;
    averageColor += modulator * gNeighbors[0] * gWeights[0];
    weight += modulator * gWeights[0].x;

    modulator = gModulator1.y;
    averageColor += modulator * gNeighbors[1] * gWeights[1];
    weight += modulator * gWeights[1].x;


    // ****** ROW 2 ******

    modulator = gModulator1.z;
    averageColor += modulator * gNeighbors[2] * gWeights[2];
    weight += modulator * gWeights[2].x;

    modulator = gModulator1.w;
    averageColor += modulator * gNeighbors[3] * gWeights[3];
    weight += modulator * gWeights[3].x;


    selfColor = texRECT(colorImage, pos).rgb;
    centerModulator = step(0.0001, selfColor.b);
    averageColor += centerModulator * selfColor * 32;
    weight += centerModulator * 32;


    modulator = gModulator2.x;
    averageColor += modulator * gNeighbors[4] * gWeights[4];
    weight += modulator * gWeights[4].x;

    modulator = gModulator2.y;
    averageColor += modulator * gNeighbors[5] * gWeights[5];
    weight += modulator * gWeights[5].x;


    // ****** ROW 3 ******

    modulator = gModulator2.z;
    averageColor += modulator * gNeighbors[6] * gWeights[6];
    weight += modulator * gWeights[6].x;

    modulator = gModulator2.w;
    averageColor += modulator * gNeighbors[7] * gWeights[7];
    weight += modulator * gWeights[7].x;

    modulator = bModulator1.x;
    averageColor += modulator * bNeighbors[0] * bWeights[0];
    weight += modulator * bWeights[0].x;

    modulator = bModulator1.y;
    averageColor += modulator * bNeighbors[1] * bWeights[1];
    weight += modulator * bWeights[1].x;

    modulator = bModulator1.z;
    averageColor += modulator * bNeighbors[2] * bWeights[2];
    weight += modulator * bWeights[2].x;


    // ****** ROW 4 ******

    modulator = bModulator1.w;
    averageColor += modulator * bNeighbors[3] * bWeights[3];
    weight += modulator * bWeights[3].x;

    modulator = bModulator2.x;
    averageColor += modulator * bNeighbors[4] * bWeights[4];
    weight += modulator * bWeights[4].x;

    modulator = bModulator2.y;
    averageColor += modulator * bNeighbors[5] * bWeights[5];
    weight += modulator * bWeights[5].x;

    modulator = bModulator2.z;
    averageColor += modulator * bNeighbors[6] * bWeights[6];
    weight += modulator * bWeights[6].x;

    modulator = bModulator2.w;
    averageColor += modulator * bNeighbors[7] * bWeights[7];
    weight += modulator * bWeights[7].x;

    // Discards pixels without samples in the 5x5 neighborhood
    if (weight < 1) discard;


    half4 outColor;

    outColor.rgb = averageColor / weight;
    outColor.a = saturate(weight / 255.0 + centerModulator);

    // Priority calculation
    const half pWeight = outColor.a;
    half priority;

    // If this pixel has no valid sample of its own, look its priority up in
    // the table; otherwise keep its previously established priority value
    if (pWeight > 64/255.0) {       // The value was already normalized!
        priority = texRECT(prioritySeqnum, pos).r;
    }
    else {
        priority = texRECT(priorityTable, half2(pWeight * 255 + 0.5, 0.5)).r;
    }
    outColor.a = priority;

    return outColor;
}
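
Interpolation5 is a masked weighted average over a 5x5 window: a neighbour contributes only if it holds a sample and is flagged as reachable. The C sketch below shows the same computation for a single channel (the shader does it per RGB component). The kernel values are the ones assigned to rWeights, gWeights and bWeights plus the centre weight of 32; masked_average and KERNEL are hypothetical names.

```c
/* 5x5 kernel in row-major order, as assigned in the shader (center = 32). */
static const float KERNEL[25] = {
    1, 1,  1, 1, 1,
    1, 4,  8, 4, 1,
    1, 8, 32, 8, 1,
    1, 4,  8, 4, 1,
    1, 1,  1, 1, 1
};

/* valid[i] is 1 when sample i exists and is reachable, else 0.
 * Returns the total weight; writes the average to *out only when the
 * weight is at least 1 (the shader discards the pixel otherwise). */
float masked_average(const float samples[25], const int valid[25], float *out) {
    float sum = 0.0f, weight = 0.0f;
    for (int i = 0; i < 25; ++i) {
        float m = valid[i] ? KERNEL[i] : 0.0f;
        sum += m * samples[i];
        weight += m;
    }
    if (weight >= 1.0f) {
        *out = sum / weight;
    }
    return weight;
}
```

Without the centre sample the total weight can be at most 64, which is why the shader compares the normalized weight against 64/255 to tell whether the pixel had a sample of its own.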

Anti-aliasing

// This: # 23 instructions, 3 R-regs, 1 H-regs
// Orig: # 33 instructions, 3 R-regs, 1 H-regs

half4 AntiAliasing(
    in half2 pixelPos : WPOS,
    const uniform samplerRECT pixelClass,
    const uniform samplerRECT color,
    const uniform samplerRECT neighborWeightTable) : COLOR
{
    int dx, dy;
    half4 selfColor;
    half2 edge;
    half3 neighborColor;
    half2 bitmask;
    half neighbor, weight;

    // Will not be needed until later, mask latencies
    selfColor = texRECT(color, pixelPos);

    edge = round(texRECT(pixelClass, pixelPos).rg * half2(255, 255*64));
    bitmask = half2(edge.r + edge.g, 0);

    half2 neighborWeight = texRECT(neighborWeightTable, bitmask).rg;
    neighbor = neighborWeight.r;
    weight = neighborWeight.g;

    half2 d;    // x=dx, y=dy
    d.x = modf(neighbor/4.0, d.y);
    d = half2(-1, 1) + half2(4, -1) * d;
    const half2 neighborCoord = pixelPos + d;

    neighborColor = texRECT(color, neighborCoord).rgb;

    half4 outColor;

    outColor.rgb = lerp(neighborColor, selfColor.rgb, weight);
    outColor.a = selfColor.a;

    return outColor;
}
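
AntiAliasing turns the neighbour code from neighborWeightTable into a pixel offset with a single modf call, then blends with lerp. The C sketch below mirrors only that arithmetic. Which codes the table actually stores is not shown in the source, so treating the code as a small non-negative integer is an assumption; decode_neighbor, lerp_f and Offset2 are hypothetical names.

```c
typedef struct { float dx, dy; } Offset2;

/* Mirrors: d.x = modf(n / 4.0, d.y); d = (-1, 1) + (4, -1) * d;
 * Assumes n is a non-negative integer-valued code. */
Offset2 decode_neighbor(float n) {
    float whole = (float)(int)(n / 4.0f);  /* integer part, what modf stores */
    float frac  = n / 4.0f - whole;        /* fractional part, what modf returns */
    Offset2 d = { -1.0f + 4.0f * frac, 1.0f - whole };
    return d;
}

/* Cg's lerp(a, b, t) = a + t * (b - a). */
float lerp_f(float a, float b, float t) {
    return a + t * (b - a);
}
```

Under this reading the code is split as n = 4 * row + column, with column 0 mapping to dx = -1 and row 0 mapping to dy = 1. A weight of 1 in lerp(neighborColor, selfColor, weight) leaves the pixel unchanged.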
