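The Road to Optimising ECC Calculation

Preamble: in today's embedded systems class the lecturer covered Linux file systems, focusing on how Linux performs ECC checking and error correction for NAND flash. I followed the whole lecture closely and came away genuinely impressed by the skill of the code's author.

That evening I downloaded the latest Linux kernel and found the article the author wrote himself (path: "linux-3.13.5\Documentation\mtd\nand_ecc.txt"). It is reproduced below: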

Introduction
============



Having looked at the linux mtd/nand driver and more specifically at nand_ecc.c
I felt there was room for optimisation. I bashed the code for a few hours
performing tricks like table lookup, removing superfluous code etc.

After that the speed was increased by 35-40%.

Still I was not too happy as I felt there was additional room for improvement.



Bad! I was hooked.

I decided to annotate my steps in this file. Perhaps it is useful to someone

or someone learns something from it.





The problem
===========



NAND flash (at least SLC one) typically has sectors of 256 bytes.

However NAND flash is not extremely reliable so some error detection

(and sometimes correction) is needed.



This is done by means of a Hamming code. I'll try to explain it in
layman's terms (and apologies to all the pros in the field in case I do
not use the right terminology, my coding theory class was almost 30
years ago, and I must admit it was not one of my favourites).



As I said before, the ecc calculation is performed on sectors of 256
bytes. This is done by calculating several parity bits over the rows and
columns. The parity used is even parity, which means that the parity bit = 1
if the data over which the parity is calculated contains an odd number of
1 bits, and the parity bit = 0 if it contains an even number of 1 bits.
So the total number of 1 bits in the data over which the parity is
calculated plus the parity bit is even (see wikipedia if you can't follow this).
Parity is often calculated by means of an exclusive or operation,
sometimes also referred to as xor. In C the operator for xor is ^.
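As a small illustration of the xor remark, the parity of a single byte can be computed by folding the byte onto itself. This is only a sketch; the name parity8 is made up for this example and does not come from the kernel sources.

```c
#include <assert.h>

/* Hypothetical helper: returns 1 if v has an odd number of 1 bits, else 0. */
unsigned parity8(unsigned v)
{
    assert(v <= 0xffu);   /* only byte values are expected */
    v ^= v >> 4;          /* fold the high nibble onto the low nibble */
    v ^= v >> 2;          /* fold the remaining 4 bits onto 2 bits */
    v ^= v >> 1;          /* fold the remaining 2 bits onto 1 bit */
    return v & 1u;
}
```

With even parity the stored parity bit is exactly this value, so the data plus its parity bit always hold an even number of 1 bits.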



Back to ecc.

Let's give a small figure:



byte   0:  bit7 bit6 bit5 bit4 bit3 bit2 bit1 bit0   rp0 rp2 rp4 ... rp14
byte   1:  bit7 bit6 bit5 bit4 bit3 bit2 bit1 bit0   rp1 rp2 rp4 ... rp14
byte   2:  bit7 bit6 bit5 bit4 bit3 bit2 bit1 bit0   rp0 rp3 rp4 ... rp14
byte   3:  bit7 bit6 bit5 bit4 bit3 bit2 bit1 bit0   rp1 rp3 rp4 ... rp14
byte   4:  bit7 bit6 bit5 bit4 bit3 bit2 bit1 bit0   rp0 rp2 rp5 ... rp14
....
byte 254:  bit7 bit6 bit5 bit4 bit3 bit2 bit1 bit0   rp0 rp3 rp5 ... rp15
byte 255:  bit7 bit6 bit5 bit4 bit3 bit2 bit1 bit0   rp1 rp3 rp5 ... rp15
           cp1  cp0  cp1  cp0  cp1  cp0  cp1  cp0
           cp3  cp3  cp2  cp2  cp3  cp3  cp2  cp2
           cp5  cp5  cp5  cp5  cp4  cp4  cp4  cp4



This figure represents a sector of 256 bytes.

cp is my abbreviation for column parity, rp for row parity.



Let's start to explain column parity.

cp0 is the parity that belongs to all bit0, bit2, bit4, bit6.

So the number of 1 bits among all bit0, bit2, bit4 and bit6 values plus cp0 itself is even.
Similarly cp1 is the parity over all bit1, bit3, bit5 and bit7.

cp2 is the parity over bit0, bit1, bit4 and bit5

cp3 is the parity over bit2, bit3, bit6 and bit7.

cp4 is the parity over bit0, bit1, bit2 and bit3.

cp5 is the parity over bit4, bit5, bit6 and bit7.

Note that each of cp0 .. cp5 is exactly one bit.



Row parity actually works almost the same.

rp0 is the parity of all even bytes (0, 2, 4, 6, ... 252, 254)

rp1 is the parity of all odd bytes (1, 3, 5, 7, ..., 253, 255)

rp2 is the parity of all bytes 0, 1, 4, 5, 8, 9, ...

(so handle two bytes, then skip 2 bytes).

rp3 covers the half rp2 does not cover (bytes 2, 3, 6, 7, 10, 11, ...)
for rp4 the rule is cover 4 bytes, skip 4 bytes, cover 4 bytes, skip 4 etc.
so rp4 calculates parity over bytes 0, 1, 2, 3, 8, 9, 10, 11, 16, ...
and rp5 covers the other half, so bytes 4, 5, 6, 7, 12, 13, 14, 15, 20, ...

The story now becomes quite boring. I guess you get the idea.

rp6 covers 8 bytes then skips 8 etc

rp7 skips 8 bytes then covers 8 etc

rp8 covers 16 bytes then skips 16 etc

rp9 skips 16 bytes then covers 16 etc

rp10 covers 32 bytes then skips 32 etc

rp11 skips 32 bytes then covers 32 etc

rp12 covers 64 bytes then skips 64 etc

rp13 skips 64 bytes then covers 64 etc

rp14 covers 128 bytes then skips 128

rp15 skips 128 bytes then covers 128
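The cover/skip pattern boils down to one rule: bit k of the byte index decides whether a byte counts towards rp(2k) or rp(2k+1). A minimal sketch of that rule follows; the function name is hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: for the byte at position byte_index (0..255), return
 * the number of the row parity bit it contributes to at level k (0..7).
 * Level 0 is the rp0/rp1 pair, level 7 is the rp14/rp15 pair. */
unsigned row_parity_number(unsigned byte_index, unsigned k)
{
    assert(byte_index < 256u && k < 8u);
    return 2u * k + ((byte_index >> k) & 1u);
}
```

For byte 254 this yields rp0 at level 0 and rp15 at level 7, which matches the last rows of the figure.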



In the end the parity bits are grouped together in three bytes as

follows:

ECC    Bit 7 Bit 6 Bit 5 Bit 4 Bit 3 Bit 2 Bit 1 Bit 0
ECC 0   rp07  rp06  rp05  rp04  rp03  rp02  rp01  rp00
ECC 1   rp15  rp14  rp13  rp12  rp11  rp10  rp09  rp08
ECC 2   cp5   cp4   cp3   cp2   cp1   cp0      1     1



I discovered after writing this that ST application note AN1823
(http://www.st.com/stonline/) gives a much nicer picture (but they use
line parity as the term where I use row parity).

Oh well, I'm graphically challenged, so suffer with me for a moment :-)

And I could not reuse the ST picture anyway for copyright reasons.





Attempt 0
=========



Implementing the parity calculation is pretty simple.

In C pseudocode:

for (i = 0; i < 256; i++)

{

    if (i & 0x01)

       rp1 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp1;

    else

       rp0 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp0;

    if (i & 0x02)

       rp3 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp3;

    else

       rp2 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp2;

    if (i & 0x04)

      rp5 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp5;

    else

      rp4 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp4;

    if (i & 0x08)

      rp7 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp7;

    else

      rp6 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp6;

    if (i & 0x10)

      rp9 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp9;

    else

      rp8 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp8;

    if (i & 0x20)

      rp11 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp11;

    else

      rp10 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp10;

    if (i & 0x40)

      rp13 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp13;

    else

      rp12 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp12;

    if (i & 0x80)

      rp15 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp15;

    else

      rp14 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp14;

    cp0 = bit6 ^ bit4 ^ bit2 ^ bit0 ^ cp0;

    cp1 = bit7 ^ bit5 ^ bit3 ^ bit1 ^ cp1;

    cp2 = bit5 ^ bit4 ^ bit1 ^ bit0 ^ cp2;

    cp3 = bit7 ^ bit6 ^ bit3 ^ bit2 ^ cp3;

    cp4 = bit3 ^ bit2 ^ bit1 ^ bit0 ^ cp4;

    cp5 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ cp5;

}
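For checking faster versions against a known-good result, a deliberately slow but runnable counterpart of the pseudocode can be handy. The sketch below is not taken from the kernel: the name ecc_reference is an assumption, and it assumes the three-byte layout shown earlier plus a final inversion of all three bytes, so that erased flash (all 0xff) yields 0xff 0xff 0xff.

```c
#include <assert.h>

/* Bit-by-bit reference ECC over a 256 byte sector (illustrative only). */
void ecc_reference(const unsigned char *buf, unsigned char code[3])
{
    unsigned rp[16] = { 0 };
    unsigned cp[6] = { 0 };
    unsigned i, b, k;

    assert(buf && code);
    for (i = 0; i < 256; i++) {
        for (b = 0; b < 8; b++) {
            unsigned bit = (buf[i] >> b) & 1u;

            /* bit k of the byte index selects rp(2k) or rp(2k+1) */
            for (k = 0; k < 8; k++)
                rp[2 * k + ((i >> k) & 1u)] ^= bit;
            /* bit k of the bit position selects cp(2k) or cp(2k+1) */
            for (k = 0; k < 3; k++)
                cp[2 * k + ((b >> k) & 1u)] ^= bit;
        }
    }

    /* pack: code[0] = rp7..rp0, code[1] = rp15..rp8, code[2] = cp5..cp0 */
    code[0] = code[1] = code[2] = 0;
    for (k = 0; k < 8; k++) {
        code[0] |= (unsigned char)(rp[k] << k);
        code[1] |= (unsigned char)(rp[k + 8] << k);
    }
    for (k = 0; k < 6; k++)
        code[2] |= (unsigned char)(cp[k] << (k + 2));

    /* invert, which also sets the two unused low bits of code[2] to 1 */
    for (k = 0; k < 3; k++)
        code[k] = (unsigned char)~code[k];
}
```

A sector that is all zero except for bit 0 of byte 0 flips every even-numbered rp bit and cp0, cp2 and cp4, which gives 0xaa 0xaa 0xab after inversion.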





Analysis 0
==========



C does have bitwise operators but not really operators to do the above

efficiently (and most hardware has no such instructions either).

Therefore without implementing this it was clear that the code above was

not going to bring me a Nobel prize :-)



Fortunately the exclusive or operation is commutative and associative, so we can combine

the values in any order. So instead of calculating all the bits

individually, let us try to rearrange things.

For the column parity this is easy. We can just xor the bytes and in the

end filter out the relevant bits. This is pretty nice as it will bring

all cp calculation out of the loop.
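The column parity shortcut can be sketched as follows: xor all bytes into a single byte first, then mask out the bit positions each cp bit covers and take the parity of what is left. The function names are hypothetical, and returning cpN in bit N is a packing chosen for this example only.

```c
#include <assert.h>
#include <stddef.h>

/* parity (0 or 1) of the low 8 bits of v */
static unsigned parity_of(unsigned v)
{
    v ^= v >> 4;
    v ^= v >> 2;
    v ^= v >> 1;
    return v & 1u;
}

/* Returns cp0..cp5 in bits 0..5 of the result. */
unsigned column_parity(const unsigned char *buf, size_t len)
{
    /* bit positions covered by cp0, cp1, cp2, cp3, cp4, cp5 */
    static const unsigned char mask[6] = { 0x55, 0xaa, 0x33, 0xcc, 0x0f, 0xf0 };
    unsigned par = 0, result = 0, k;
    size_t i;

    assert(buf != NULL);
    for (i = 0; i < len; i++)
        par ^= buf[i];            /* one xor per byte inside the loop */
    for (k = 0; k < 6; k++)       /* bit filtering happens once, afterwards */
        result |= parity_of(par & mask[k]) << k;
    return result;
}
```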



Similarly we can first xor the bytes for the various rows.

This leads to:





Attempt 1
=========



const char parity[256] = {

    0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0,

    1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1,

    1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1,

    0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0,

    1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1,

    0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0,

    0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0,

    1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1,

    1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1,

    0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0,

    0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0,

    1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1,

    0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0,

    1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1,

    1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1,

    0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0

};



void ecc1(const unsigned char *buf, unsigned char *code)

{

    int i;

    const unsigned char *bp = buf;

    unsigned char cur;

    unsigned char rp0, rp1, rp2, rp3, rp4, rp5, rp6, rp7;

    unsigned char rp8, rp9, rp10, rp11, rp12, rp13, rp14, rp15;

    unsigned char par;



    par = 0;

    rp0 = 0; rp1 = 0; rp2 = 0; rp3 = 0;

    rp4 = 0; rp5 = 0; rp6 = 0; rp7 = 0;

    rp8 = 0; rp9 = 0; rp10 = 0; rp11 = 0;

    rp12 = 0; rp13 = 0; rp14 = 0; rp15 = 0;



    for (i = 0; i < 256; i++)

    {

        cur = *bp++;

        par ^= cur;

        if (i & 0x01) rp1 ^= cur; else rp0 ^= cur;

        if (i & 0x02) rp3 ^= cur; else rp2 ^= cur;

        if (i & 0x04) rp5 ^= cur; else rp4 ^= cur;

        if (i & 0x08) rp7 ^= cur; else rp6 ^= cur;

        if (i & 0x10) rp9 ^= cur; else rp8 ^= cur;

        if (i & 0x20) rp11 ^= cur; else rp10 ^= cur;

        if (i & 0x40) rp13 ^= cur; else rp12 ^= cur;

        if (i & 0x80) rp15 ^= cur; else rp14 ^= cur;

    }

    code[0] =

        (parity[rp7] << 7) |

        (parity[rp6] << 6) |

        (parity[rp5] << 5) |

        (parity[rp4] << 4) |

        (parity[rp3] << 3) |

        (parity[rp2] << 2) |

        (parity[rp1] << 1) |

        (parity[rp0]);

    code[1] =

        (parity[rp15] << 7) |

        (parity[rp14] << 6) |

        (parity[rp13] << 5) |

        (parity[rp12] << 4) |

        (parity[rp11] << 3) |

        (parity[rp10] << 2) |

        (parity[rp9]  << 1) |

        (parity[rp8]);

    code[2] =

        (parity[par & 0xf0] << 7) |

        (parity[par & 0x0f] << 6) |

        (parity[par & 0xcc] << 5) |

        (parity[par & 0x33] << 4) |

        (parity[par & 0xaa] << 3) |

        (parity[par & 0x55] << 2);

    code[0] = ~code[0];

    code[1] = ~code[1];

    code[2] = ~code[2];

}



Still pretty straightforward. The last three invert statements are there to

give a checksum of 0xff 0xff 0xff for an empty flash. In an empty flash

all data is 0xff, so the checksum then matches.



I also introduced the parity lookup. I expected this to be the fastest

way to calculate the parity, but I will investigate alternatives later

on.
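The 256-entry table does not have to be typed in by hand. As a sketch, it could also be generated at start-up from the recurrence parity(i) = parity(i >> 1) ^ (i & 1). The function name below is hypothetical, and the hard-coded table remains the version the text measures.

```c
#include <assert.h>

/* Fill table[i] with 1 if i has an odd number of 1 bits, else 0. */
void build_parity_table(unsigned char table[256])
{
    unsigned i;

    assert(table);
    table[0] = 0;
    for (i = 1; i < 256; i++)
        table[i] = (unsigned char)(table[i >> 1] ^ (i & 1u));
}
```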





Analysis 1
==========



The code works, but is not terribly efficient. On my system it took

almost 4 times as much time as the linux driver code. But hey, if it was

*that* easy this would have been done long before.

No pain, no gain.



Fortunately there is plenty of room for improvement.



In step 1 we moved from bit-wise calculation to byte-wise calculation.

However in C we can also use a 32 bit integer type (unsigned long where
long is 32 bits wide, or uint32_t from <stdint.h> to be portable) and
virtually every modern microprocessor supports 32 bit operations, so why
not try to write our code in such a way that we process data in 32 bit chunks?



Of course this means some modification as the row parity is byte by

byte. A quick analysis:

for the column parity we use the par variable. When extending to 32 bits

we can in the end easily calculate rp0 and rp1 from it.

(because par now consists of 4 bytes, contributing to rp1, rp0, rp1, rp0

respectively)

also rp2 and rp3 can be easily retrieved from par as rp3 covers the

first two bytes and rp2 the last two bytes.



Note that of course now the loop is executed only 64 times (256/4).

And note that care must be taken with respect to byte ordering. The way bytes are

ordered in a long is machine dependent, and might affect us.

Anyway, if there is an issue: this code is developed on x86 (to be

precise: a DELL PC with a D920 Intel CPU)



And of course the performance might depend on alignment, but I expect

that the I/O buffers in the nand driver are aligned properly (and

otherwise that should be fixed to get maximum performance).
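Both the byte ordering and the alignment concern can be sidestepped at some cost in speed. The sketch below assembles each 32 bit word from individual bytes, so it depends neither on the host byte order nor on buffer alignment, and it shows the shift-and-xor folding that reduces a 32 bit accumulator back to one byte. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Read 4 bytes so that p[0] ends up in bits 0..7 on any host. */
uint32_t load_le32(const unsigned char *p)
{
    assert(p);
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Xor the four bytes of v together into a single byte. */
unsigned char fold32(uint32_t v)
{
    v ^= v >> 16;   /* upper half onto lower half */
    v ^= v >> 8;    /* second byte onto first byte */
    return (unsigned char)(v & 0xffu);
}
```

With this ordering the upper 16 bits of a word always hold buffer bytes 2 and 3 and the lower 16 bits hold bytes 0 and 1, whatever the CPU.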



Let's give it a try...





Attempt 2
=========



#include <stdint.h>

extern const char parity[256];

void ecc2(const unsigned char *buf, unsigned char *code)

{

    int i;

    /* uint32_t rather than unsigned long: the loop below assumes 4 bytes

       per word, and long is 8 bytes on most 64 bit systems */

    const uint32_t *bp = (const uint32_t *)buf;

    uint32_t cur;

    uint32_t rp0, rp1, rp2, rp3, rp4, rp5, rp6, rp7;

    uint32_t rp8, rp9, rp10, rp11, rp12, rp13, rp14, rp15;

    uint32_t par;



    par = 0;

    rp0 = 0; rp1 = 0; rp2 = 0; rp3 = 0;

    rp4 = 0; rp5 = 0; rp6 = 0; rp7 = 0;

    rp8 = 0; rp9 = 0; rp10 = 0; rp11 = 0;

    rp12 = 0; rp13 = 0; rp14 = 0; rp15 = 0;



    for (i = 0; i < 64; i++)

    {

        cur = *bp++;

        par ^= cur;

        if (i & 0x01) rp5 ^= cur; else rp4 ^= cur;

        if (i & 0x02) rp7 ^= cur; else rp6 ^= cur;

        if (i & 0x04) rp9 ^= cur; else rp8 ^= cur;

        if (i & 0x08) rp11 ^= cur; else rp10 ^= cur;

        if (i & 0x10) rp13 ^= cur; else rp12 ^= cur;

        if (i & 0x20) rp15 ^= cur; else rp14 ^= cur;

    }

    /*

       we need to adapt the code generation for the fact that rp vars are now

       32 bits wide; also the column parity calculation needs to be changed.

       we'll bring rp4 to 15 back to single byte entities by shifting and

       xoring

    */

    rp4 ^= (rp4 >> 16); rp4 ^= (rp4 >> 8); rp4 &= 0xff;

    rp5 ^= (rp5 >> 16); rp5 ^= (rp5 >> 8); rp5 &= 0xff;

    rp6 ^= (rp6 >> 16); rp6 ^= (rp6 >> 8); rp6 &= 0xff;

    rp7 ^= (rp7 >> 16); rp7 ^= (rp7 >> 8); rp7 &= 0xff;

    rp8 ^= (rp8 >> 16); rp8 ^= (rp8 >> 8); rp8 &= 0xff;

    rp9 ^= (rp9 >> 16); rp9 ^= (rp9 >> 8); rp9 &= 0xff;

    rp10 ^= (rp10 >> 16); rp10 ^= (rp10 >> 8); rp10 &= 0xff;

    rp11 ^= (rp11 >> 16); rp11 ^= (rp11 >> 8); rp11 &= 0xff;

    rp12 ^= (rp12 >> 16); rp12 ^= (rp12 >> 8); rp12 &= 0xff;

    rp13 ^= (rp13 >> 16); rp13 ^= (rp13 >> 8); rp13 &= 0xff;

    rp14 ^= (rp14 >> 16); rp14 ^= (rp14 >> 8); rp14 &= 0xff;

    rp15 ^= (rp15 >> 16); rp15 ^= (rp15 >> 8); rp15 &= 0xff;

    rp3 = (par >> 16); rp3 ^= (rp3 >> 8); rp3 &= 0xff;

    rp2 = par & 0xffff; rp2 ^= (rp2 >> 8); rp2 &= 0xff;

    par ^= (par >> 16);

    rp1 = (par >> 8); rp1 &= 0xff;

    rp0 = (par & 0xff);

    par ^= (par >> 8); par &= 0xff;



    code[0] =

        (parity[rp7] << 7) |

        (parity[rp6] << 6) |

        (parity[rp5] << 5) |

        (parity[rp4] << 4) |

        (parity[rp3] << 3) |

        (parity[rp2] << 2) |

        (parity[rp1] << 1) |

        (parity[rp0]);

    code[1] =

        (parity[rp15] << 7) |

        (parity[rp14] << 6) |

        (parity[rp13] << 5) |

        (parity[rp12] << 4) |

        (parity[rp11] << 3) |

        (parity[rp10] << 2) |

        (parity[rp9]  << 1) |

        (parity[rp8]);

    code[2] =

        (parity[par & 0xf0] << 7) |

        (parity[par & 0x0f] << 6) |

        (parity[par & 0xcc] << 5) |

        (parity[par & 0x33] << 4) |

        (parity[par & 0xaa] << 3) |

        (parity[par & 0x55] << 2);

    code[0] = ~code[0];

    code[1] = ~code[1];

    code[2] = ~code[2];

}



The parity array is not shown any more. Note also that for these

examples I kinda deviated from my regular programming style by allowing

multiple statements on a line, not using { } in then and else blocks

with only a single statement and by using operators like ^=





Analysis 2
==========



The code (of course) works, and hurray: we are a little bit faster than

the linux driver code (about 15%). But wait, don't cheer too quickly.

There is more to be gained.

If we look at e.g. rp14 and rp15 we see that we either xor our data with

rp14 or with rp15. However we also have par which goes over all data.

This means there is no need to calculate rp14 as it can be calculated from

rp15 through rp14 = par ^ rp15;

(or if desired we can avoid calculating rp15 and calculate it from

rp14).  That is why some places refer to inverse parity.

Of course the same thing holds for rp4/5, rp6/7, rp8/9, rp10/11 and rp12/13.

Effectively this means we can eliminate the else clause from the if

statements. Also we can optimise the calculation in the end a little bit

by going from long to byte first. Actually we can even avoid the table

lookups.
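The identity rp14 = par ^ rp15 is easy to check on a small scale. The sketch below (hypothetical function name, byte-wise for brevity) accumulates only the bytes whose index has a given bit set and derives the other half from the xor of everything.

```c
#include <assert.h>
#include <stddef.h>

/* Accumulate only the "odd" half at address bit `bit`; derive the "even"
 * half from the xor of all bytes instead of accumulating it separately. */
void split_halves(const unsigned char *buf, size_t len, unsigned bit,
                  unsigned char *even_half, unsigned char *odd_half)
{
    unsigned char par = 0, odd = 0;
    size_t i;

    assert(buf && even_half && odd_half);
    for (i = 0; i < len; i++) {
        par ^= buf[i];
        if ((i >> bit) & 1u)
            odd ^= buf[i];
    }
    *odd_half = odd;
    *even_half = (unsigned char)(par ^ odd);   /* e.g. rp14 = par ^ rp15 */
}
```

Fewer statements in the loop do not automatically mean faster code, so a change like this still has to be measured.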



Attempt 3
=========



Replaced:

        if (i & 0x01) rp5 ^= cur; else rp4 ^= cur;

        if (i & 0x02) rp7 ^= cur; else rp6 ^= cur;

        if (i & 0x04) rp9 ^= cur; else rp8 ^= cur;

        if (i & 0x08) rp11 ^= cur; else rp10 ^= cur;

        if (i & 0x10) rp13 ^= cur; else rp12 ^= cur;

        if (i & 0x20) rp15 ^= cur; else rp14 ^= cur;

with

        if (i & 0x01) rp5 ^= cur;

        if (i & 0x02) rp7 ^= cur;

        if (i & 0x04) rp9 ^= cur;

        if (i & 0x08) rp11 ^= cur;

        if (i & 0x10) rp13 ^= cur;

        if (i & 0x20) rp15 ^= cur;



and outside the loop added:

    rp4  = par ^ rp5;

    rp6  = par ^ rp7;

    rp8  = par ^ rp9;

    rp10  = par ^ rp11;

    rp12  = par ^ rp13;

    rp14  = par ^ rp15;



And after that the code takes about 30% more time, although the number of

statements is reduced. This is also reflected in the assembly code.





Analysis 3
==========



Very weird. Guess it has to do with caching or instruction parallelism
or so. I also tried on an eeePC (Celeron, clocked at 900 MHz). Interesting
observation was that this one is only 30% slower (according to time)
executing the code than my 3 GHz D920 processor.



Well, it was expected not to be easy so maybe instead move to a

different track: let's move back to the code from attempt 2 and do some

loop unrolling. This will eliminate a few if statements. I'll try

different amounts of unrolling to see what works best.





Attempt 4
=========



Unrolled the loop 1, 2, 3 and 4 times.

For 4 the code starts with:



    for (i = 0; i < 4; i++)

    {

        cur = *bp++;

        par ^= cur;

        rp4 ^= cur;

        rp6 ^= cur;

        rp8 ^= cur;

        rp10 ^= cur;

        if (i & 0x1) rp13 ^= cur; else rp12 ^= cur;

        if (i & 0x2) rp15 ^= cur; else rp14 ^= cur;

        cur = *bp++;

        par ^= cur;

        rp5 ^= cur;

        rp6 ^= cur;

        ...





Analysis 4
==========



Unrolling once gains about 15%

Unrolling twice keeps the gain at about 15%

Unrolling three times gives a gain of 30% compared to attempt 2.

Unrolling four times gives a marginal improvement compared to unrolling

three times.



I decided to proceed with a four time unrolled loop anyway. It was my gut

feeling that in the next steps I would obtain additional gain from it.



The next step was triggered by the fact that par contains the xor of all

bytes and rp4 and rp5 each contain the xor of half of the bytes.

So in effect par = rp4 ^ rp5. But as xor is commutative we can also say

that rp5 = par ^ rp4. So no need to keep both rp4 and rp5 around. We can

eliminate rp5 (or rp4, but I already foresaw another optimisation).

The same holds for rp6/7, rp8/9, rp10/11, rp12/13 and rp14/15.





Attempt 5
=========



Effectively all odd-numbered rp assignments in the loop were removed.

This included the else clause of the if statements.

Of course after the loop we need to correct things by adding code like:

    rp5 = par ^ rp4;

Also the initial assignments (rp5 = 0; etc) could be removed.

Along the line I also removed the initialisation of rp0/1/2/3.





Analysis 5
==========



Measurements showed this was a good move. The run-time roughly halved

compared with attempt 4 with 4 times unrolled, and we only require 1/3rd

of the processor time compared to the current code in the linux kernel.



However, still I thought there was more. I didn't like all the if

statements. Why not keep a running parity and only keep the last if
statement? Time for yet another version!





Attempt 6
=========



The code within the for loop was changed to:



    for (i = 0; i < 4; i++)

    {

        cur = *bp++; tmppar  = cur; rp4 ^= cur;

        cur = *bp++; tmppar ^= cur; rp6 ^= tmppar;

        cur = *bp++; tmppar ^= cur; rp4 ^= cur;

        cur = *bp++; tmppar ^= cur; rp8 ^= tmppar;



        cur = *bp++; tmppar ^= cur; rp4 ^= cur; rp6 ^= cur;

        cur = *bp++; tmppar ^= cur; rp6 ^= cur;

        cur = *bp++; tmppar ^= cur; rp4 ^= cur;

        cur = *bp++; tmppar ^= cur; rp10 ^= tmppar;



        cur = *bp++; tmppar ^= cur; rp4 ^= cur; rp6 ^= cur; rp8 ^= cur;

        cur = *bp++; tmppar ^= cur; rp6 ^= cur; rp8 ^= cur;

        cur = *bp++; tmppar ^= cur; rp4 ^= cur; rp8 ^= cur;

        cur = *bp++; tmppar ^= cur; rp8 ^= cur;



        cur = *bp++; tmppar ^= cur; rp4 ^= cur; rp6 ^= cur;

        cur = *bp++; tmppar ^= cur; rp6 ^= cur;

        cur = *bp++; tmppar ^= cur; rp4 ^= cur;

        cur = *bp++; tmppar ^= cur;



        par ^= tmppar;

        if ((i & 0x1) == 0) rp12 ^= tmppar;

        if ((i & 0x2) == 0) rp14 ^= tmppar;

    }



As you can see tmppar is used to accumulate the parity within a for
iteration. In the last 3 statements it is added to par and, if needed,
to rp12 and rp14.

While making the changes I also found that I could exploit that tmppar
contains the running parity for this iteration. So instead of having:
rp4 ^= cur; rp6 ^= cur;
I removed the rp6 ^= cur; statement and did rp6 ^= tmppar; on the next
statement. A similar change was done for rp8 and rp10.
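The text shows only the loop. A complete function built around it could look like the sketch below, which reuses the attempt 6 loop body unchanged and adds one possible tail. Everything outside the loop is an assumption: the names ecc6_sketch, next_word and par32 are hypothetical, words are assembled byte by byte to stay independent of host byte order, and the parity table is replaced by a folding helper to keep the example self-contained.

```c
#include <assert.h>
#include <stdint.h>

/* Read the next 4 bytes with buffer byte 0 in bits 0..7, then advance. */
static uint32_t next_word(const unsigned char **bp)
{
    const unsigned char *p = *bp;

    *bp += 4;
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Parity (0 or 1) over all 32 bits of v. */
static unsigned par32(uint32_t v)
{
    v ^= v >> 16; v ^= v >> 8; v ^= v >> 4; v ^= v >> 2; v ^= v >> 1;
    return (unsigned)(v & 1u);
}

void ecc6_sketch(const unsigned char *buf, unsigned char code[3])
{
    const unsigned char *bp = buf;
    uint32_t cur, tmppar, par = 0;
    uint32_t rp4 = 0, rp6 = 0, rp8 = 0, rp10 = 0, rp12 = 0, rp14 = 0;
    unsigned r[16], tot, k;
    int i;

    assert(buf && code);
    for (i = 0; i < 4; i++) {
        cur = next_word(&bp); tmppar  = cur; rp4 ^= cur;
        cur = next_word(&bp); tmppar ^= cur; rp6 ^= tmppar;
        cur = next_word(&bp); tmppar ^= cur; rp4 ^= cur;
        cur = next_word(&bp); tmppar ^= cur; rp8 ^= tmppar;

        cur = next_word(&bp); tmppar ^= cur; rp4 ^= cur; rp6 ^= cur;
        cur = next_word(&bp); tmppar ^= cur; rp6 ^= cur;
        cur = next_word(&bp); tmppar ^= cur; rp4 ^= cur;
        cur = next_word(&bp); tmppar ^= cur; rp10 ^= tmppar;

        cur = next_word(&bp); tmppar ^= cur; rp4 ^= cur; rp6 ^= cur; rp8 ^= cur;
        cur = next_word(&bp); tmppar ^= cur; rp6 ^= cur; rp8 ^= cur;
        cur = next_word(&bp); tmppar ^= cur; rp4 ^= cur; rp8 ^= cur;
        cur = next_word(&bp); tmppar ^= cur; rp8 ^= cur;

        cur = next_word(&bp); tmppar ^= cur; rp4 ^= cur; rp6 ^= cur;
        cur = next_word(&bp); tmppar ^= cur; rp6 ^= cur;
        cur = next_word(&bp); tmppar ^= cur; rp4 ^= cur;
        cur = next_word(&bp); tmppar ^= cur;

        par ^= tmppar;
        if ((i & 0x1) == 0) rp12 ^= tmppar;
        if ((i & 0x2) == 0) rp14 ^= tmppar;
    }

    /* even-numbered row parities; each odd one is its partner xor the total */
    tot   = par32(par);
    r[0]  = par32(par & 0x00ff00ffu);   /* bytes 0 and 2 of every word */
    r[2]  = par32(par & 0x0000ffffu);   /* bytes 0 and 1 of every word */
    r[4]  = par32(rp4);  r[6]  = par32(rp6);  r[8]  = par32(rp8);
    r[10] = par32(rp10); r[12] = par32(rp12); r[14] = par32(rp14);
    for (k = 0; k < 16; k += 2)
        r[k + 1] = r[k] ^ tot;

    code[0] = code[1] = 0;
    for (k = 0; k < 8; k++) {
        code[0] |= (unsigned char)(r[k] << k);
        code[1] |= (unsigned char)(r[k + 8] << k);
    }
    code[2] = (unsigned char)(
        (par32(par & 0xf0f0f0f0u) << 7) | (par32(par & 0x0f0f0f0fu) << 6) |
        (par32(par & 0xccccccccu) << 5) | (par32(par & 0x33333333u) << 4) |
        (par32(par & 0xaaaaaaaau) << 3) | (par32(par & 0x55555555u) << 2));
    for (k = 0; k < 3; k++)
        code[k] = (unsigned char)~code[k];
}
```

Assembling words byte by byte is slower than the pointer cast used in the text, so this variant is meant for checking results rather than for benchmarking.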





Analysis 6
==========



Measuring this code again showed big gain. When executing the original

linux code 1 million times, this took about 1 second on my system.

(using time to measure the performance). After this iteration I was back

to 0.075 sec. Actually I had to decide to start measuring over 10

million iterations in order not to lose too much accuracy. This one

definitely seemed to be the jackpot!



There is a little bit more room for improvement though. There are three

places with statements:

rp4 ^= cur; rp6 ^= cur;

It seems more efficient to also maintain a variable rp4_6 in the for
loop; this eliminates 3 statements per loop. Of course after the loop we
need to correct by adding:
    rp4 ^= rp4_6;
    rp6 ^= rp4_6;

Furthermore there are 4 sequential assignments to rp8. This can be

encoded slightly more efficiently by saving tmppar before those 4 lines

and later do rp8 = rp8 ^ tmppar ^ notrp8;

(where notrp8 is the value of rp8 before those 4 lines).

Again a use of the commutative property of xor.

Time for a new test!





Attempt 7
=========



The new code now looks like:



    for (i = 0; i < 4; i++)

    {

        cur = *bp++; tmppar  = cur; rp4 ^= cur;

        cur = *bp++; tmppar ^= cur; rp6 ^= tmppar;

        cur = *bp++; tmppar ^= cur; rp4 ^= cur;

        cur = *bp++; tmppar ^= cur; rp8 ^= tmppar;



        cur = *bp++; tmppar ^= cur; rp4_6 ^= cur;

        cur = *bp++; tmppar ^= cur; rp6 ^= cur;

        cur = *bp++; tmppar ^= cur; rp4 ^= cur;

        cur = *bp++; tmppar ^= cur; rp10 ^= tmppar;



        notrp8 = tmppar;

        cur = *bp++; tmppar ^= cur; rp4_6 ^= cur;

        cur = *bp++; tmppar ^= cur; rp6 ^= cur;

        cur = *bp++; tmppar ^= cur; rp4 ^= cur;

        cur = *bp++; tmppar ^= cur;

        rp8 = rp8 ^ tmppar ^ notrp8;



        cur = *bp++; tmppar ^= cur; rp4_6 ^= cur;

        cur = *bp++; tmppar ^= cur; rp6 ^= cur;

        cur = *bp++; tmppar ^= cur; rp4 ^= cur;

        cur = *bp++; tmppar ^= cur;



        par ^= tmppar;

        if ((i & 0x1) == 0) rp12 ^= tmppar;

        if ((i & 0x2) == 0) rp14 ^= tmppar;

    }

    rp4 ^= rp4_6;

    rp6 ^= rp4_6;





Not a big change, but every penny counts :-)





Analysis 7
==========



Actually this made things worse. Not very much, but I don't want to move

into the wrong direction. Maybe something to investigate later. Could

have to do with caching again.



Guess that is what there is to win within the loop. Maybe unrolling one

more time will help. I'll keep the optimisations from 7 for now.





Attempt 8
=========



Unrolled the loop one more time.





Analysis 8
==========



This makes things worse. Let's stick with attempt 6 and continue from there.

Although it seems that the code within the loop cannot be optimised

further there is still room to optimize the generation of the ecc codes.

We can simply calculate the total parity. If this is 0 then rp4 = rp5

etc. If the parity is 1, then rp4 = !rp5;

But if rp4 = rp5 we do not need rp5 etc. We can just write the even bits

in the result byte and then do something like

    code[0] |= (code[0] << 1);

Let's test this.





Attempt 9
=========



Changed the code but again this slightly degrades performance. Tried all
kinds of other things, like having dedicated parity arrays to avoid the
shift after parity[rp7] << 7; No gain.
Changed the lookup using the parity array to use shift operators, e.g.
replaced parity[rp7] << 7 with:
rp7 ^= (rp7 << 4);
rp7 ^= (rp7 << 2);
rp7 ^= (rp7 << 1);
rp7 &= 0x80;
No gain.
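Written out as a function, the shift variant looks roughly like this. It is a sketch with a hypothetical name: it folds the parity of the low 8 bits into bit 7, which is where parity[rp7] << 7 would have put it.

```c
#include <assert.h>

/* Returns 0x80 if v has an odd number of 1 bits, else 0. */
unsigned parity_in_bit7(unsigned v)
{
    assert(v <= 0xffu);
    v ^= v << 4;    /* bits 7..4 now hold b7^b3, b6^b2, b5^b1, b4^b0 */
    v ^= v << 2;    /* bits 7..6 now each hold the xor of four input bits */
    v ^= v << 1;    /* bit 7 now holds the xor of all eight input bits */
    return v & 0x80u;
}
```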



The only marginal change was inverting the parity bits, so we can remove

the last three invert statements.



Ah well, pity this does not deliver more. Then again 10 million

iterations using the linux driver code takes between 13 and 13.5

seconds, whereas my code now takes about 0.73 seconds for those 10

million iterations. So basically I've improved the performance by a

factor 18 on my system. Not that bad. Of course on different hardware

you will get different results. No warranties!



But of course there is no such thing as a free lunch. The codesize almost

tripled (from 562 bytes to 1434 bytes). Then again, it is not that much.





Correcting errors
=================



For correcting errors I again used the ST application note as a starter,

but I also peeked at the existing code.

The algorithm itself is pretty straightforward. Just xor the given and
the calculated ecc. If all bytes are 0 there is no problem. If 11 bits
are 1 we have one correctable bit error. If exactly 1 bit is 1, we have an
error in the given ecc code.

It proved to be fastest to do some table lookups. Performance gain

introduced by this is about a factor 2 on my system when a repair had to

be done, and 1% or so if no repair had to be done.

Code size increased from 330 bytes to 686 bytes for this function.

(gcc 4.2, -O3)
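The correction step can be sketched from the ECC layout given earlier. This is not the kernel's function: the name, the return codes and the bit-gathering helper are assumptions, and plain bit operations stand in for the table lookups the text mentions. A single flipped data bit changes exactly one bit in each rp and cp pair (11 bits in total), and the odd-numbered bits of the xor then spell out the byte and bit address.

```c
#include <assert.h>

enum { ECC_OK = 0, ECC_DATA_FIXED = 1, ECC_CODE_ERROR = 2,
       ECC_UNCORRECTABLE = -1 };

/* Gather bits 1, 3, 5, 7 of s into bits 0..3. */
static unsigned odd_bits(unsigned s)
{
    return ((s >> 1) & 1u) | ((s >> 2) & 2u) | ((s >> 3) & 4u) | ((s >> 4) & 8u);
}

int ecc_correct_sketch(unsigned char *buf, const unsigned char stored[3],
                       const unsigned char calc[3])
{
    unsigned s0, s1, s2, all, ones = 0;

    assert(buf && stored && calc);
    s0 = (unsigned)(stored[0] ^ calc[0]);
    s1 = (unsigned)(stored[1] ^ calc[1]);
    s2 = (unsigned)(stored[2] ^ calc[2]);
    all = s0 | (s1 << 8) | (s2 << 16);

    if (all == 0)
        return ECC_OK;

    /* one data bit flipped: in every rp/cp pair exactly one bit differs */
    if (((s0 ^ (s0 >> 1)) & 0x55u) == 0x55u &&
        ((s1 ^ (s1 >> 1)) & 0x55u) == 0x55u &&
        ((s2 ^ (s2 >> 1)) & 0x54u) == 0x54u) {
        unsigned byte_addr = odd_bits(s0) | (odd_bits(s1) << 4);
        unsigned bit_addr = odd_bits(s2) >> 1;   /* cp1, cp3, cp5 */

        buf[byte_addr] ^= (unsigned char)(1u << bit_addr);
        return ECC_DATA_FIXED;
    }

    /* exactly one differing bit: the stored ecc itself is damaged */
    for (; all != 0; all &= all - 1)
        ones++;
    return ones == 1 ? ECC_CODE_ERROR : ECC_UNCORRECTABLE;
}
```

For an all-zero sector with stored ecc 0xff 0xff 0xff, flipping bit 4 of byte 70 makes the calculated ecc 0x96 0x9a 0x6b; the xor is 0x69 0x65 0x94, whose odd bits decode to byte 70 and bit 4.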





Conclusion
==========



The gain when calculating the ecc is tremendous. On my development hardware
a speedup of a factor of 18 for ecc calculation was achieved. On a test on an
embedded system with a MIPS core a factor 7 was obtained.
On a test with a Linksys NSLU2 (ARMv5TE processor) the speedup was a factor
5 (big endian mode, gcc 4.1.2, -O3)
For correction not much gain could be obtained (as bitflips are rare). Then
again there are also far fewer cycles spent there.



It seems there is not much more gain possible in this, at least when

programmed in C. Of course it might be possible to squeeze something more

out of it with an assembler program, but due to pipeline behaviour etc

this is very tricky (at least for intel hw).



Author: Frans Meulenbroeks

Copyright (C) 2008 Koninklijke Philips Electronics NV.

加密套件以及ECDH 追梦-北极星 Wifi linux
1、加密套件：密码，算法以及安全设置http://wemedia.ifeng.com/30498593/wemedia.shtml2、ECDH密钥协商算法ECDH密钥协商算法-OrcHome3、ecdh的原理https://www.cnblogs.com/fishou/p/4206451.htmlECDH:ECC算法用途比RSA还猛，不仅可以加解密、签名验证。还可以与DH结合使用，用于密钥磋商，这
Python的加密与解密_pyarmor解码 2401_84584583 程序员 python 网络安全
随着信息化和数字化社会的发展，人们对信息安全和保密的重要性认识不断提高，于是在1997年，美国国家标准局公布实施了“美国数据加密标准（DES）”，民间力量开始全面介入密码学的研究和应用中，采用的加密算法有DES、RSA、SHA等。随着对加密强度需求的不断提高，近期又出现了AES、ECC等。使用密码学可以达到以下目的：保密性：防止用户的标识或数据被读取。数据完整性：防止数据被更改。身份验证：确保数据
对称加密和非对称加密算法分类，国密算法分类。铁锤2号各种小问题小技巧
对称加密算法对称加密算法加密和解密使用的是同一个密钥。常用的对称加密算法包括：DES、3DES、AES、RC4、RC5、RC6。非对称加密算法指加密和解密使用不同密钥的加密算法，也称为公私钥加密。假设两个用户要加密交换数据，双方交换公钥，使用时一方用对方的公钥加密，另一方即可用自己的私钥解密。常见的非对称加密算法：RSA、DSA（数字签名用）、ECC（移动设备用）、Diffie-Hellman散列
python 的sm2 生成密钥的方法，gmssl里没有提供密钥生成 CissSimkey python 算法机器学习
"""Author:tangleiDateTime:2024-11#importrandom#random不安全所以替换为secrets中的算法#选择素域，设置椭圆曲线参数"""importsecretsclassSM2_Key():default_ecc_table={'n':'FFFFFFFEFFFFFFFFFFFFFFFFFFFFFFFF7203DF6B21C6052B53BBF40939D
双算法SSL证书：满足等保、密评要求的安全利器运维
什么是双算法SSL证书？双算法SSL证书就是一种既能用国际上的加密方法（比如RSA、ECC），也能用中国特有的加密技术（比如SM2、SM3、SM4）的SSL证书。它有以下几个显著特点：合规又国际化：既满足国内的安全规定，也符合国际标准，可以和其他国家的系统无缝对接。安全且高效：结合两种加密方式的优点，根据不同情况选择最合适的加密手段，既保证了安全性，也提高了效率。广泛的兼容性：这种证书可以根据环境
双算法SSL证书/双证书 httpsssl证书
双算法SSL证书或双证书，在网络安全领域，特别是在SSL/TLS协议中，指的是同时支持国际通用加密算法和国家商用密码（简称“国密”）算法的SSL证书。以下是对双算法SSL证书/双证书的详细解释：一、定义与特点1.定义：双算法SSL证书是指同时支持国际加密算法（如RSA、ECC等）和国密算法（如SM2、SM3、SM4等）的SSL证书。2.特点：合规性与国际化：同时满足国内法规要求和国际标准，确保与国
密评改造应该选用什么样的SSL证书 https
密评，即商用密码应用安全性评估，是指对采用商用密码技术、产品和服务的信息系统密码应用的合规性、正确性和有效性进行评估。密评改造则是针对现有信息系统不符合密评要求的部分进行调整、升级和完善的过程。一、密评改造应该选用SSL证书的类型：1.国密算法：密评改造专用SSL证书优先采用SM2、SM3、SM4等国产密码算法，同时兼容RSA、DSA或ECC等国际认可的加密算法，以确保数据传输的安全性。2.国产品
一篇文章讲清楚ECC和S/4HANA的主要区别 syounger SAP学习制造
今天给大家分享一下SAPECC和SAPS/4HANA的主要区别，开始比较之前，让我们先回顾一下这两款SAP的重量级ERP产品。什么是SAPECC？SAPECC全称是SAPERP中央组件(ERPCentralComponent）。SAPECC采用模块化设计，可高度定制。我们可以配置SAPECC以满足业务各个领域的需求，从财务到后勤。什么是SAPS/4HANA？SAPS/4HANA是SAP最新一代的E
SM2 加密工具和密钥对生成 TechCraft maven java
在本文中，我们将探讨两个用于SM2加密的实用工具：Sm2Utils和Sm2KeyPairUtil。这两个工具可以帮助您生成SM2加密密钥对、使用SM2算法进行加密和解密。1.SM2简介SM2国密SM2算法是中国国家密码管理局（CNCA）发布的一种非对称加密算法。它采用椭圆曲线密码体系（EllipticCurveCryptography，ECC）进行密钥交换、数字签名和公钥加密等操作。SM2算法和R
密码学之椭圆曲线（ECC）零度° 密码学密码学 python
1.椭圆曲线加密ECC概述1.1ECC定义与原理椭圆曲线密码学（ECC）是一种基于椭圆曲线数学的公钥密码体系，它利用了椭圆曲线上的点构成的阿贝尔群和相应的离散对数问题来实现加密和数字签名。ECC的安全性依赖于椭圆曲线离散对数问题（ECDLP）的难解性。在ECC中，首先需要选择一个椭圆曲线和一个基点，然后生成密钥对。私钥是一个随机整数，而公钥是这个随机整数与基点的标量乘积。ECC的加密过程包括选择一
SD NAND Flash简介！深圳市雷龙发展有限公司 nor flash nand SD NAND SD卡 TF卡
SDNANDFlash是一种特殊形式的NANDFlash，其内部有包含一个SD控制器及NANDFlash。他的特点主要有封装小，使用方便的特点。目前市面上的SDNANDFlash的容量主要有1Gb，2Gb，4Gb等。封装形式是LGA-8。对于使用者来说，可以把它单纯的看做是一个SD（TF）卡，存储一些数据，图片或音频。也可以把它作为功能更强大的NANDFlash，免去您程序上做ECC校验及坏块管理
Dell R730 2U服务器实践1：开机管理 skywalk8163 软硬件调试服务器
新入手一台DellR7302U服务器，用来做FreeBSD下的编译工作和Ubuntu下简单的AI学习和调试。服务器配置：CPU：E52680V4×214核心内存：DDR4ECC16G×22133MHz网卡：双千双万Intel(R)2PX540/2PI350rNDC硬盘：SSD446.63GB×3RAID：ERCH730Mini(嵌入式)组raid5（实际原系统组了raid0）单电源服务器已经装好了
硬件除法器原理_[ECC&RSA]除法器 weixin_39834788 硬件除法器原理
“在ECC和RSA算法硬件实现(Barrett约减和Montgomery约减)中，需要提前计算某些参数，会应用到除法器。”01—传统除法器传统除法器的设计非常单纯：一、先取除数和被除数的正负关系，然后正值化被除数。传统除法器因为需要递减的关系，所以除数就取负值的补码，方便操作。二、被除数递减与除数，每一次的递减，商数递增。三、直到被除数小于除数，递减过程剩下的是余数。四、输出的结果根据除数和被除数
