Original link: http://harttle.com/2013/11/30/computer-organization-and-design.html
Moore’s law
Over the history of computing hardware, the number of transistors on integrated circuits doubles approximately every two years.
Compiler
A program that translates high-level language statements into assembly language statements.
Assembler
A program that translates a symbolic version of instructions into the binary version.
High-level programming language
A portable language that is composed of words and algebraic notation that can be translated by a compiler into assembly language.
Assembly language
A symbolic representation of machine instructions.
Machine language
A binary representation of machine instructions.
5 components of a computer
Input, output, memory, datapath, and control; the last two are sometimes combined and called the processor.
Instruction set architecture
One key interface between the levels of abstraction is the instruction set architecture: the interface between the hardware and low-level software.
Pitfall: Expecting the improvement of one aspect of a computer to increase overall performance by an amount proportional to the size of the improvement.
Amdahl’s law
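Stated as a formula (the same form reappears in the multiprocessor section below):

$$\text{Execution time after improvement} = \frac{\text{Execution time affected by improvement}}{\text{Amount of improvement}} + \text{Execution time unaffected}$$

With made-up numbers: if 80 s of a 100 s program is affected and that part is made 10 times faster, the new time is $80/10 + 20 = 28$ s, a 3.6x overall speedup rather than 10x.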
Pitfall: Using a subset of the performance equation as a performance metric.
For example:
Instructions per program are not considered.
Execution time is the only valid and unimpeachable measure of performance.
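The full equation shows why no single factor is a safe metric on its own:

$$\text{CPU time} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}}$$

Quoting only CPI or only the clock rate drops the instruction count, which is exactly the subset problem this pitfall warns about.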
Common goal of computer designers: to find a language that makes it easy to build the hardware and the compiler while maximizing performance and minimizing cost and power.
Stored-program concept
The idea that instructions and data of many types can be stored in memory as numbers, leading to the stored-program computer.
Each category of MIPS instructions is associated with constructs that appear in programming languages:
Unconditional jumps: procedure calls and returns, and case/switch statements.
Requiring every instruction to have exactly three operands, no more and no less, conforms to the philosophy of keeping the hardware simple.
The number of registers is limited to 32. A large number of registers may increase the clock cycle time simply because it takes electronic signals longer when they must travel farther.
Big-endian and Little-endian
Big-endian: computers use the address of the leftmost byte (the "big end") as the word address.
Little-endian: computers use the address of the rightmost byte (the "little end") as the word address.
MIPS is in the big-endian camp.
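A small C check (my own sketch, not from the book) that reports which camp the machine running it belongs to:

```
#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint32_t word = 0x01020304;
    uint8_t *first_byte = (uint8_t *)&word;  /* the byte stored at the word's address */

    /* Big-endian machines keep the "big end" (0x01) at the word address;
       little-endian machines keep the "little end" (0x04) there. */
    printf("%s-endian\n", *first_byte == 0x01 ? "big" : "little");
    return 0;
}
```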
Avoid the load instruction when one operand of an arithmetic instruction is a constant.
Example:
```
addi $s3, $s3, 4
# instead of
lw   $t0, AddrConst4($s1)
add  $s3, $s3, $t0
```
One’s complement
A notation that represents the most negative value by 10…000 and the most positive value by 01…11, leaving an equal number of negatives and positives but ending up with two 0s.
The term is also used to mean the inversion of every bit.
Two's complement
$-x = \bar{x} + 1$; leading 0s mean positive, and leading 1s mean negative.
Two's complement gets its name from the rule that the unsigned sum of an n-bit number and its negative is $2^n$.
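A quick C illustration (my own) of both facts, the $\bar{x} + 1$ rule and the single extra negative value:

```
#include <stdio.h>
#include <stdint.h>

int main(void) {
    int32_t x = 42;
    int32_t neg = ~x + 1;                      /* invert every bit, then add 1 */
    printf("%d %d\n", -x, neg);                /* both print -42 */

    /* Leading 0s mean positive, leading 1s mean negative; 10...000 is the
       most negative value and has no positive counterpart. */
    printf("%d %d\n", INT32_MIN, INT32_MAX);   /* -2147483648 2147483647 */
    return 0;
}
```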
The compromise chosen by the MIPS designers is to keep all instructions the same length, thereby requiring different kinds of instruction formats for different kinds of instructions.
The simplicity of the “equipment”: MIPS doesn't include branch on less than (`blt` is a pseudo-instruction) because it's too complicated; either it would stretch the clock cycle time or it would take extra clock cycles per instruction. Two faster instructions (`slt` and `beq`) are more useful.
Spilling registers
The process of putting less commonly used variables (or those needed later) into memory.
Data-transfer instruction
A command that moves data between memory and registers.
Sign extension
Copy the sign bit repeatedly to fill the rest of the bits on the left.
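A C sketch (mine) of sign-extending a 16-bit halfword to 32 bits, the same extension a MIPS `lh` performs when loading a halfword:

```
#include <stdio.h>
#include <stdint.h>

/* Copy bit 15 of the halfword into bits 16..31 of the result. */
static int32_t sign_extend16(uint16_t half) {
    if (half & 0x8000)                         /* sign bit set: fill the left with 1s */
        return (int32_t)(half | 0xFFFF0000u);
    return (int32_t)half;                      /* sign bit clear: fill the left with 0s */
}

int main(void) {
    printf("%d\n", sign_extend16(0xFFFC));     /* -4 */
    printf("%d\n", sign_extend16(0x0004));     /*  4 */
    return 0;
}
```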
Alignment restriction
MIPS tries to keep the stack aligned to word addresses, allowing the program to always use `lw` and `sw` to access the stack.
A C string variable or an array of bytes packs 4 bytes per word, and a Java variable or array of shorts packs 2 halfwords per word.
`$sp` is the stack pointer, which is adjusted by one word for each register that is saved or restored.
preserved by caller
`$t0-$t9`: ten temporary registers, `$a0-$a3`: four argument registers, `$v0-$v1`: two return value registers.
Not preserved by the callee on a procedure call; the caller must save them if it needs them afterward.
preserved by callee
`$s0-$s7`: eight saved registers, `$ra`: the return address, `$sp`: the stack pointer, and the stack above `$sp`.
Must be preserved across a procedure call. If used, the callee saves and restores them.
Iteration, rather than recursion, can significantly improve performance by removing the overhead associated with procedure calls.
`$gp`: the global pointer, reserved to point to the static area.
`$fp`: a value denoting the location of the saved registers and local variables for a given procedure.
The frame pointer is convenient because all references to variables in the stack within a procedure will have the same offset.
More than 4 parameters: the remaining parameters are stored on the stack just above the frame pointer `$fp`.
Pointers vs. arrays: array code computes the address from the index each time (index multiplied by 4), while pointer code simply adds 4 to the pointer directly.
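The same point in C (a sketch along the lines of the classic array-clear example):

```
#include <stdio.h>

/* Array version: each iteration recomputes the address as a + i*4. */
void clear_array(int a[], int n) {
    for (int i = 0; i < n; i++)
        a[i] = 0;
}

/* Pointer version: the address itself is bumped by 4 (one int on MIPS) each time. */
void clear_pointer(int *a, int n) {
    for (int *p = a; p < a + n; p++)
        *p = 0;
}

int main(void) {
    int buf[8];
    clear_array(buf, 8);
    clear_pointer(buf, 8);
    printf("%d\n", buf[0]);   /* 0 */
    return 0;
}
```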
`$zero` always equals 0.
`$at` is reserved by the assembler to handle large constants in immediate instructions and addresses in load/store instructions.
`$k0`, `$k1` are reserved for the operating system to handle exception procedures. They retain the address of the instruction that caused the exception (which can be obtained from the EPC register), so that when the exception procedure exits and restores all registers, the program can jump back to the normal flow.
Basic block
A basic block is a sequence of instructions without branches, except possibly at the end, and without branch targets or branch labels, except possibly at the beginning.
One of the early phases of compilation is breaking the program into basic blocks.
Branch instructions use a 16-bit address field, which by itself would mean no program could be larger than $2^{16}$.
However, the destination of a branch is likely to be close to the branch, so with PC-relative addressing the sum allows the program to be as large as $2^{32}$.
At the same time, the `L1` in `beq $s0, $s1, L1` could be farther away than the 16-bit field allows, in which case MIPS breaks it into 2 instructions:

```
    bne $s0, $s1, L2
    j   L1
L2:
```
### Jump instruction
Jump instructions use a 26-bit address field. MIPS stretches the distance by having it refer to the number of *words* to the next instruction instead of the number of *bytes*. The address thus becomes 28 bits, and MIPS fills in the upper 4 bits from the PC to get a 32-bit address (**pseudodirect addressing**).
> The loader and linker must be careful to avoid placing a program across an address boundary of 256 MB, which is $2^{28}$ bytes.
### Summary
* Immediate addressing: `addi $rt, $rs, imm`
* Register addressing: `jr $ra`
* Base addressing: `lw $rt, addr`
* PC-relative addressing: `bne $s0, $s1, Label`
* Pseudodirect addressing: `j Label`
## Parallelism and Synchronization
**Data Race**
Two memory accesses form a data race if they are from different threads to the same location, at least one is a write, and they occur one after another.
One typical operation for building synchronization operations is the *atomic exchange* or *atomic swap*; the challenge is that it requires both a memory read and a write in a single, uninterruptible instruction.
MIPS contains a pair of instructions called `load linked` and `store conditional`. These are used in sequence: if the content of the memory location specified by the `load linked` is changed before the `store conditional` to the same address, then the store fails.
> Although it was presented for multi-processor sync, atomic exchange is also useful for the OS in dealing with multiple processes in a single processor. To ensure nothing interferes in a single processor, the `store conditional` also fails if the processor does a context switch between the two instructions.
> Store conditional will fail after either another attempted store or any exception; in particular, only register-register instructions can safely be permitted between the two instructions, otherwise it is possible to create deadlock situations.
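As a sketch of how atomic exchange is used for synchronization (written with C11 `<stdatomic.h>` rather than a hand-coded `ll`/`sc` loop; a compiler targeting MIPS would typically lower the exchange to exactly such a loop):

```
#include <stdatomic.h>

static atomic_int lock = 0;                 /* 0 = free, 1 = held */

void acquire(void) {
    /* Atomically swap in 1; if the old value was already 1, someone else
       holds the lock, so keep spinning until the exchange returns 0. */
    while (atomic_exchange(&lock, 1) == 1)
        ;
}

void release(void) {
    atomic_store(&lock, 0);                 /* hand the lock back */
}
```

The whole point of `ll`/`sc` is to let this read-modify-write pair behave as one uninterruptible operation.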
## Translating & Starting
### Compiler
The compiler transforms the C program to *assembly language program*, a symbolic form of what the machine understands.
**Assembly Language**
A symbolic language that can be translated into binary machine language.
### Assembler
The assembler translates the *assembly language program* into a machine language program, called an object file.
> Pseudoinstructions give MIPS a richer set of assembly language instructions than those implemented by the hardware. The only cost is reserving one register, `$at`.
### Linker
**Linker**, also called *link editor*, is a systems program that combines independently assembled machine language programs and resolves all undefined labels into an executable file. There are 3 steps for the linker:
1. Place code and data modules symbolically in memory.
2. Determine the addresses of data and instruction labels.
3. Patch both the internal and external references.
> The linker allows compiling and assembling each procedure independently, particularly library routines, avoiding reprocessing the whole program every time a small routine changes.
> Typically, the *executable file* has the same format as an object file, except that it contains no unresolved references.
### Loader
A systems program that places an object program in main memory so that it is ready to execute.
### Dynamically linked libraries
Static libraries have a few cons:
* The library routines become part of the executable. The program needs to be recompiled when a library is updated.
* It loads all library routines that are called anywhere in the executable, even if they are never executed.
Lazy procedure linkage of DLLs:
The first time the library is called, the program calls the dummy entry and follows the indirect jump. It points to code that puts a number in a register to identify the desired library and jumps to the dynamic linker/loader. The linker/loader finds the library, remaps it, and changes the address in the indirect jump location to point to the library.
### Starting a Java Program
Rather than compile to the assembly language of a target computer, Java is compiled first to instructions that are easy to interpret: the *Java bytecode* instruction set. An interpreter called the *Java Virtual Machine* executes Java bytecodes.
Pro: portability; Con: lower performance. *Just In Time compilers (JIT)* typically find the "hot" methods and compile them into the native instruction set, saving the result for the next run.
> Typically, the unoptimized C program is faster than the interpreted Java code. Using the JIT compiler makes Java faster than the unoptimized C and slightly slower than highest optimized C code.
## Fallacies and Pitfalls
> Fallacy: More powerful instructions mean higher performance.
Complex instructions consume more time.
> Fallacy: Write in assembly language to obtain the highest performance.
This battle between compilers and assembly language coders is one situation in which humans are losing ground. Today's C compilers generally ignore register hints made by programmers.
> Fallacy: The importance of commercial binary compatibility means successful instruction sets don't change.
While backwards binary compatibility is sacrosanct, the x86 architecture has grown dramatically.
> Pitfall: Forgetting that sequential word addresses in machines with byte addressing do not differ by one.
The address of the next word can be found by incrementing the address in a register by the size of a word (4 bytes), not by one.
> Pitfall: Using a pointer to an automatic variable outside its defining procedure.
The memory that contains an array that is local to that procedure will be reused as soon as the procedure returns.
# Arithmetic for Computers
## Addition and Subtraction
1. Add (`add`), add immediate (`addi`), and subtract (`sub`) cause exceptions on overflow.
MIPS detects overflow with an *exception* (or *interrupt*), which is an unscheduled procedure call. The address of the current instruction is saved and the computer jumps to a predefined address to invoke the appropriate routine for that exception.
> MIPS uses *exception program counter* (EPC) to contain the address of the instruction that causes the exception. The instruction *move from system control* (`mfc0`) is used to copy EPC into a general-purpose register.
2. Add unsigned (`addu`), add immediate unsigned (`addiu`), and subtract unsigned (`subu`) do not cause exceptions on overflow.
Programmers can still trap overflow: when overflow occurs, the sign bit of the result is set incorrectly. Comparing it with the sign bits of the operands determines whether overflow happened (adding two operands with the same sign should never yield a result with the opposite sign).
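A C sketch (mine) of that sign-bit check:

```
#include <stdio.h>
#include <stdint.h>

/* After a wrap-around (addu-style) addition, signed overflow happened only if
   both operands had the same sign and the result's sign differs from theirs. */
static int add_overflows(int32_t a, int32_t b) {
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b;
    uint32_t sum = ua + ub;                              /* wraps instead of trapping */
    int same_sign   = ((ua ^ ub)  & 0x80000000u) == 0;   /* operands agree on the sign bit */
    int result_flip = ((ua ^ sum) & 0x80000000u) != 0;   /* result's sign differs from theirs */
    return same_sign && result_flip;
}

int main(void) {
    printf("%d\n", add_overflows(2147483647, 1));        /* 1: positive + positive gave a negative */
    printf("%d\n", add_overflows(123, -456));            /* 0: differing signs never overflow */
    return 0;
}
```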
> SIMD (single instruction, multiple data): by partitioning the carry chains within a 64-bit adder, a processor can perform simultaneous operations on short vectors of eight 8-bit operands, four 16-bit operands, etc. Vectors of 8-bit data often appear in multimedia routines.
## Multiplication
multiplicand * multiplier = product
### Sequential Version of the Multiplication
![sequential-multiply](/assets/img/blog/multi1.png)
![sequential-multiply-illu](/assets/img/blog/multi-illu.png)
Refined version:
* Init: put the multiplier in the right 32 bits of the product register.
* Cycle:
    1. if the last bit of the product register is 1, add the multiplicand to the left 32 bits
    2. shift the product register right by 1 bit
* Final: the product register contains the 64-bit product
![refined](/assets/img/blog/multi2.png)
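A C model of the refined loop above (my own sketch; the 33rd carry bit of the add is kept explicitly, as the hardware does):

```
#include <stdio.h>
#include <stdint.h>

static uint64_t multiply(uint32_t multiplicand, uint32_t multiplier) {
    uint32_t hi = 0, lo = multiplier;      /* product register: multiplier starts in the low half */
    for (int step = 0; step < 32; step++) {
        uint32_t carry = 0;
        if (lo & 1) {                      /* last bit of the product register is 1 */
            uint64_t sum = (uint64_t)hi + multiplicand;   /* 33-bit add into the high half */
            hi = (uint32_t)sum;
            carry = (uint32_t)(sum >> 32);
        }
        /* Shift the whole {carry, hi, lo} register right by one bit. */
        lo = (lo >> 1) | (hi << 31);
        hi = (hi >> 1) | (carry << 31);
    }
    return ((uint64_t)hi << 32) | lo;
}

int main(void) {
    printf("%llu\n", (unsigned long long)multiply(0xFFFFFFFFu, 0xFFFFFFFFu));
    /* 18446744065119617025 = (2^32 - 1)^2 */
    return 0;
}
```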
### Faster Multiplication
A way to organize these 32 additions is in a parallel tree:
![parallel](/assets/img/blog/multi3.png)
### Multiply in MIPS
The registers `Hi` and `Lo` contain the 64-bit product. `mflo` fetches the lower 32 bits of the product; `mfhi` fetches `Hi`, which can be tested to check for overflow.
## Division
Dividend = Quotient * Divisor + Remainder
### Division Algorithm
![divide](/assets/img/blog/divide1.png)
![divide-illu](/assets/img/blog/divide-illu.png)
Improved version:
* Init: put the dividend in the right 32 bits of the remainder register.
* Cycle:
    1. subtract the divisor from the left 32 bits of the remainder register (restore by adding it back if the result is negative)
    2. shift the remainder register left
    3. set the last bit as the new quotient bit
* Final: the left 32 bits contain the remainder, the right 32 bits contain the quotient.
![divede-improved](/assets/img/blog/divide2.png)
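A C model of the shift-and-subtract (restoring) loop (my own sketch; it keeps the quotient in a separate variable instead of the right half of one 64-bit register, but the steps are the same):

```
#include <stdio.h>
#include <stdint.h>

static void divide(uint32_t dividend, uint32_t divisor,
                   uint32_t *quotient, uint32_t *remainder) {
    uint64_t rem = 0;                              /* the "left half": running remainder */
    uint32_t quo = 0;                              /* the "right half": quotient bits */
    for (int step = 31; step >= 0; step--) {
        rem = (rem << 1) | ((dividend >> step) & 1);   /* shift the next dividend bit in */
        if (rem >= divisor) {                      /* subtraction would not go negative */
            rem -= divisor;
            quo |= 1u << step;                     /* new quotient bit is 1 */
        }                                          /* otherwise "restore": leave rem alone, bit stays 0 */
    }
    *quotient = quo;
    *remainder = (uint32_t)rem;
}

int main(void) {
    uint32_t q, r;
    divide(1000000007u, 97u, &q, &r);
    printf("%u %u\n", q, r);                       /* 10309278 41 */
    return 0;
}
```

Division by zero is not checked here, matching the note below that MIPS hardware leaves that to software.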
### Faster Division
**SRT division**: try to guess several quotient bits per step, using a table lookup based on the upper bits of the dividend and remainder. The key is guessing the value to subtract.
### Divide in MIPS
`Hi` contains the remainder, and `Lo` contains the quotient after the divide instruction completes.
> MIPS divide instructions ignore overflow. MIPS software must check the divisor to discover division by 0 as well as overflow.
## Floating Point
**scientific notation** A notation that renders numbers with a single digit to the left of the decimal point.
**normalized** A number in floating-point notation that has no leading 0s.
**fraction** The value, generally between 0 and 1, placed in the fraction field.
**exponent** In the numerical representation system of floating-point arithmetic, the value that is placed in the exponent field.
**overflow** The exponent is too large to be represented in the exponent field.
**floating point** Computer arithmetic that represents numbers in which the binary point is not fixed.
In general, floating-point numbers are of the form: $(-1)^S \times F \times 2^E$
MIPS float: sign (1 bit) + exponent (8 bits) + fraction (23 bits)
MIPS double: sign (1 bit) + exponent (11 bits) + fraction (52 bits)
IEEE 754 uses a bias of 127 for single precision, and makes the leading 1 implicit. Since 0 has no leading 1, it is given the reserved exponent 0 so that the hardware won't attach a leading 1.
Thus 00...00 represents 0; the representation of the rest are in the following form:
$(-1)^S \times (1 + \text{Fraction}) \times 2^{(\text{Exponent} - \text{Bias})}$
> The exponent is placed to the left of the fraction, and a bias is used, so that floating-point numbers can be compared conveniently, roughly as if they were integers.
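A small C decode (my own sketch) of the single-precision fields for $-0.75 = (-1)^1 \times 1.5 \times 2^{-1}$:

```
#include <stdio.h>
#include <string.h>
#include <stdint.h>

int main(void) {
    float f = -0.75f;
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);                /* reinterpret the 32 bits of the float */

    unsigned sign     = bits >> 31;                /* 1 bit */
    unsigned exponent = (bits >> 23) & 0xFF;       /* 8 bits, biased by 127 */
    unsigned fraction = bits & 0x7FFFFF;           /* 23 bits, with an implicit leading 1 */

    printf("sign=%u exponent=%u (unbiased %d) fraction=0x%06X\n",
           sign, exponent, (int)exponent - 127, fraction);
    /* prints: sign=1 exponent=126 (unbiased -1) fraction=0x400000 */
    return 0;
}
```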
# The Processor
## Introduction
An abstract view of the implementation of the MIPS subset showing the major functional units and the major connections between them:
![abstract-view](/assets/img/blog/mips-abs.png)
The basic implementation of the MIPS subset, including the necessary multiplexors and control lines:
![mips-basic](/assets/img/blog/mips-basic.png)
**asserted**
The signal is logically high or true.
**deasserted**
The signal is logically low or false.
**Clocking methodology**
The approach used to determine when data is valid and stable relative to the clock.
**Edge-triggered clocking**
A clocking scheme in which all state changes occur on a clock edge.
**control signal**
A signal used for multiplexor selection or for directing the operation of a functional unit; contrasts with a *data signal*, which contains information that is operated on by a functional unit.
> The state element is changed only when the write control signal is asserted and a clock edge occurs.
## A Simple Implementation Scheme
*ALUOp* indicates whether the operation to be performed should be add for loads and stores, subtract for beq, or determined by the operation encoded in the funct field.
**Multiple levels of decoding**
The main control unit generates the ALUOp bits, which then are used as input to the ALU control that generates the actual signals to control the ALU unit.
The datapath with all necessary multiplexors and all control lines identified:
![mips-all](/assets/img/blog/mips-all.png)
**Single-cycle implementation**
Also called single clock cycle implementation. An implementation in which an instruction is executed in one clock cycle.
> The clock cycle is determined by the longest possible path in the processor.
## An Overview of Pipelining
Assuming ideal conditions:
Time between instructions (pipelined) = Time between instructions (nonpipelined) / Number of pipe stages
> Pipelining improves performance by increasing instruction throughput, as opposed to decreasing the execution time of an individual instruction, also called latency.
**Structural Hazards**
When a planned instruction cannot execute in the proper clock cycle because the hardware does not support the combination of instructions that are set to execute.
**Data Hazards**
Data hazards occur when the pipeline must be stalled because one step must wait for another to complete.
**Control Hazards**
Arising from the need to make a decision based on the results of one instruction while others are executing.
## Pipelined Datapath and Control
The datapath is separated into 5 pieces:
1. IF: Instruction fetch
2. ID: Instruction decode and register file read
3. EX: Execution or address calculation
4. MEM: Data memory access
5. WB: Write back
We can divide the control lines into 5 groups according to the pipeline stage:
1. IF: Signals to read instruction memory and write PC are always asserted, nothing to control.
2. ID: As in the previous state, the same thing happens at every clock cycle, no optional control lines to set.
3. EX: The signals to be set are RegDst, ALUOp, and ALUSrc. The signals select the Result register, the ALU operation, and either Read data 2 or a sign-extended immediate for the ALU.
4. MEM: The control lines set are: Branch, MemRead, and MemWrite. These signals are set by the branch equal, load, and store instructions.
5. WB: MemtoReg, which decides between sending the ALU result or the memory value to the register file; RegWrite, which writes the chosen value.
**Multiple-clock-cycle pipeline diagram** gives overviews of pipelining situations:
![multiple-clock-cycle-pipeline](/assets/img/blog/multiple-clock-cycle-pipeline.png)
**Single-clock-cycle diagram** represents a vertical slice through a set of multiple-clock-cycle diagrams, showing the usage of the datapath by each of the instructions in the pipeline at the designated clock cycle.
![single-clock-cycle-pipeline](/assets/img/blog/single-clock-cycle-pipeline.png)
## Data Hazards
### Forwarding
When the result required by the current instruction is computed during the EX stage of the previous instruction (or the instruction before that), forwarding is used. See the two pairs of hazard conditions:
1. EX/MEM.RegisterRd = ID/EX.RegisterRs
2. EX/MEM.RegisterRd = ID/EX.RegisterRt
3. MEM/WB.RegisterRd = ID/EX.RegisterRs
4. MEM/WB.RegisterRd = ID/EX.RegisterRt
The *forwarding unit* forwards values from the EX/MEM and MEM/WB pipeline registers to the ALU inputs when the conditions above occur.
> Note that the value in EX/MEM is newer than the one in MEM/WB, so MEM/WB is forwarded only when the EX/MEM condition does not apply.
### Stalls
When the result required by the current instruction is computed during MEM stage of the previous instruction, we must stall the pipeline.
The *hazard detection unit* operates during the ID stage, checking for load instructions and inserting a stall between the load and its use. The single condition is:
```
if (ID/EX.MemRead and
    ((ID/EX.RegisterRt = IF/ID.RegisterRs) or
     (ID/EX.RegisterRt = IF/ID.RegisterRt)))
        stall the pipeline
```
By identifying the hazard in the ID stage, we can insert a bubble into the pipeline by changing the EX, MEM and WB control fields of the ID/EX pipeline register to 0.
Actually, only the signals RegWrite and MemWrite need to be 0, while the other control signals can be don't cares.
Discarding instructions means we must be able to flush instructions in the IF, ID, and EX stages of the pipeline.
Moving the branch decision up requires 2 actions to occur earlier: computing the branch target address and evaluating the branch decision.
Dynamic branch prediction
Prediction of branches at runtime using runtime information.
Dynamic prediction buffer
Also called a branch history table, a small memory that is indexed by the lower portion of the address of the branch instruction and that contains one or more bits indicating whether the branch was recently taken or not.
The simple 1-bit prediction scheme has a performance shortcoming: even if a branch is almost always taken, we can predict incorrectly twice, rather than once, when it is not taken.
In a 2-bit scheme, a prediction must be wrong twice before it is changed.
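A C sketch (mine) of a 2-bit prediction buffer; the table size and indexing are illustrative, not anything from the book:

```
#include <stdio.h>
#include <stdint.h>

#define TABLE_SIZE 1024          /* indexed by the lower bits of the branch address */

/* 2-bit saturating counters: 0,1 predict not taken; 2,3 predict taken. */
static uint8_t counters[TABLE_SIZE];

static int predict(uint32_t branch_pc) {
    return counters[(branch_pc >> 2) % TABLE_SIZE] >= 2;
}

static void update(uint32_t branch_pc, int taken) {
    uint8_t *c = &counters[(branch_pc >> 2) % TABLE_SIZE];
    if (taken  && *c < 3) (*c)++;
    if (!taken && *c > 0) (*c)--;
}

int main(void) {
    uint32_t pc = 0x00400048;    /* a made-up branch address */
    int mispredicts = 0;
    /* A loop branch taken 9 times then not taken once, repeated 10 times:
       after warm-up, only the final iteration of each pass is mispredicted. */
    for (int pass = 0; pass < 10; pass++)
        for (int i = 0; i < 10; i++) {
            int taken = (i < 9);
            if (predict(pc) != taken) mispredicts++;
            update(pc, taken);
        }
    printf("%d mispredictions out of 100 branches\n", mispredicts);
    return 0;
}
```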
branch delay slot
The slot directly after a delayed branch instruction, which in the MIPS architecture is filled by an instruction that does not affect the branch.
The delayed branch is a simple solution to control hazards in a 5-stage pipeline. With longer pipelines, superscalar execution, and dynamic branch prediction, it is now redundant.
branch target buffer
A structure that caches the destination PC or destination instruction for a branch. It’s usually organized as a cache with tags, making it more costly than a simple prediction buffer.
correlating predictor
A branch predictor that combines local behavior of a particular branch and global information about the behavior of some recent number of executed branches.
tournament branch predictor
A branch predictor with multiple predictions for each branch and a selection mechanism that chooses which predictor to enable for a given branch.
An exception refers to an unexpected change arising within the processor, while an interrupt refers to one coming from outside.
MIPS saves the address of the offending instruction in the exception program counter (EPC), and the Cause register holds a field that indicates the reason for the exception.
Vectored interrupt
An interrupt for which the address to which control is transferred is determined by the cause of the exception.
Exception Procedure
Imprecise interrupt
Also called imprecise exceptions: interrupts or exceptions in pipelined computers that are not associated with the exact instruction that was the cause of the interrupt or exception.
Precise interrupt
Also called a precise exception: an interrupt or exception that is always associated with the correct instruction in pipelined computers.
Instruction-level parallelism (ILP)
The parallelism among instructions.
multiple issue
A scheme whereby multiple instructions are launched in one clock cycle.
static multiple issue
An approach to implementing a multiple-issue processor where many decisions are made by the compiler before execution.
issue slots
The positions from which instructions could issue in a given clock cycle.
Very Long Instruction Word (VLIW)
A style of instruction set architecture that launches many operations that are defined to be independent in a single wide instruction, typically with many separate opcode fields.
A simple 2-issue MIPS processor: one of the instructions can be an integer ALU operation or branch and the other can be a load or store.
dynamic multiple issue
An approach to implementing a multiple-issue processor where many decisions are made during execution by the processor.
superscalar
An advanced pipelining technique that enables the processor to execute more than one instruction per clock cycle by selecting them during execution.
dynamic pipeline scheduling
Hardware support for reordering the order of instruction execution so as to avoid stalls.
register renaming
The renaming of registers by the compiler or hardware to remove antidependences.
speculation
An approach whereby the compiler or processor guesses the outcome of an instruction to remove it as a dependence in executing other instructions.
In the case of speculation in software, the compiler usually inserts additional instructions that check the accuracy of the speculation and provide a fix-up routine to use when speculation is incorrect.
In hardware speculation, the processor usually buffers the speculative results until it knows they are no longer speculative.
Principle of locality
Programs access a relatively small portion of their address space at any instant of time (temporal locality: recently accessed items are likely to be accessed again soon; spatial locality: items near those accessed recently are likely to be accessed soon).
Memory hierarchy
A structure that uses multiple levels of memories; as the distance from the processor increases, the size of the memories and the access time both increase.
Main memory is implemented from DRAM, levels closer to the processor use SRAM, and the largest and slowest level is usually magnetic disk.
direct mapped cache
A cache structure in which each memory location is mapped to exactly one location in the cache.
Cache index = (Block address) modulo (Number of blocks in the cache)
A valid bit is used to indicate whether an entry contains a valid address. Tag contains the address information required to identify whether the associated block in the hierarchy corresponds to a requested word.
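A C sketch of the address breakdown (the block size and block count here are my own example numbers):

```
#include <stdio.h>
#include <stdint.h>

#define BLOCK_BYTES 16u          /* assumed block size: 16 bytes -> 4 offset bits */
#define NUM_BLOCKS  256u         /* assumed cache size: 256 blocks -> 8 index bits */

int main(void) {
    uint32_t addr = 0x1234ABCD;

    uint32_t block_addr = addr / BLOCK_BYTES;       /* drop the byte offset within the block */
    uint32_t index      = block_addr % NUM_BLOCKS;  /* (block address) modulo (blocks in cache) */
    uint32_t tag        = block_addr / NUM_BLOCKS;  /* what the stored tag must match */

    printf("index=%u tag=0x%X\n", (unsigned)index, (unsigned)tag);
    /* A lookup hits when valid[index] is set and tags[index] equals tag. */
    return 0;
}
```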
Larger blocks exploit spatial locality to lower the miss rate, but the miss rate may eventually go up if the block size becomes a significant fraction of the cache size, because the number of blocks held in the cache becomes small and blocks get ejected before many of their words are accessed. Larger blocks also raise the miss penalty, which can be mitigated by:
early restart: Upon a miss, resume execution as soon as the requested word of the block is returned, rather than waiting for the entire block.
Cache Miss : A request for data from the cache that cannot be filled because the data is not present in the cache.
Out-of-order processors can allow execution of instructions while waiting for a cache miss, while in-order processors stall on a cache miss.
Write Through
Always write the data into both the memory and the cache.
Write Buffer
A queue that holds data while the data is waiting to be written to memory, so the processor can continue executing; it reduces stalls and improves performance.
Write Back
When a write occurs, the new value is written only into the cache; the modified block is written to the lower level of the hierarchy when it is replaced.
Split Cache
A scheme in which a level of the memory hierarchy is composed of two independent caches that operate in parallel with each other, with one handling instructions and one handling data.
A combined cache with a total size equal to the sum of the two split caches will usually have a better hit rate.
Memory-stall clock cycles = Instructions / Program * Misses / Instruction * Miss penalty
AMAT(Average Memory Access Time) = Time for a hit + Miss rate * Miss penalty
fully associative
A cache structure in which a block can be placed in any location in the cache.
set associative
A cache that has a fixed number of locations where each block can be placed.
Set index = (Block number) modulo (Number of sets in the cache)
Increasing the degree of associativity usually decreases the miss rate, at the cost of a potential increase in the hit time.
least recently used(LRU)
A replacement scheme in which the block replaced is the one that has been unused for the longest time.
The designs of a primary and a secondary cache are significantly different: the primary cache focuses on minimizing hit time, while the secondary cache focuses on reducing the miss rate so as to lower the penalty of long memory access times.
Global Miss Rate
The fraction of references that miss in all levels of a multilevel cache.
Local Miss Rate
The fraction of references to one level of a cache that miss; used in multilevel hierarchies.
Performance in out-of-order processors
Memory-stall cycles / Instruction = Misses / Instruction * (Total miss latency - Overlapped miss latency)
autotuning: Considering block size and number of caches, some numerical libraries parameterize their algorithms and then search the parameter space at runtime to find the best combination for a particular computer.
Virtual memory implements the translation of a program's address space to physical addresses.
Major motivations: allowing efficient and safe sharing of memory among multiple programs, and removing the programming burden of a small, limited amount of main memory.
A virtual memory block is called a page, and a virtual memory miss is called a page fault.
Address translation
Also called address mapping, the process by which a virtual address is mapped to an address used to access memory.
Several decisions in designing virtual memory systems are driven by the huge cost of a miss: pages should be large enough to amortize the long access time, fully associative placement of pages is used to reduce the fault rate, page faults are handled in software, and write-back (rather than write-through) is used.
segmentation : A variable-size address mapping scheme in which an address consists of 2 parts: a segment number, which is mapped to a physical address, and a segment offset.
page table
The table contains the virtual to physical address translations in a virtual memory system. The table, which is stored in memory, is typically indexed by the virtual page number; each entry in the table contains the physical page number for that virtual page if the page is currently in memory.
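A single-level lookup in C (my own sketch; the page size, table size, and entry layout are assumptions for illustration):

```
#include <stdio.h>
#include <stdint.h>

#define PAGE_SIZE 4096u
#define NUM_PAGES 1024u

typedef struct {
    uint32_t valid;              /* page is currently in physical memory */
    uint32_t physical_page;      /* physical page number for this virtual page */
} pte_t;

static pte_t page_table[NUM_PAGES];

/* Returns 1 on success; 0 stands for the page fault the OS would have to handle. */
static int translate(uint32_t vaddr, uint32_t *paddr) {
    uint32_t vpn    = vaddr / PAGE_SIZE;   /* virtual page number indexes the table */
    uint32_t offset = vaddr % PAGE_SIZE;   /* the page offset passes through unchanged */
    if (vpn >= NUM_PAGES || !page_table[vpn].valid)
        return 0;
    *paddr = page_table[vpn].physical_page * PAGE_SIZE + offset;
    return 1;
}

int main(void) {
    page_table[3].valid = 1;
    page_table[3].physical_page = 42;      /* map virtual page 3 to physical page 42 */

    uint32_t paddr;
    if (translate(3 * PAGE_SIZE + 0x123, &paddr))
        printf("0x%X\n", (unsigned)paddr); /* 42 * 4096 + 0x123 = 0x2A123 */
    return 0;
}
```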
reference bit
Also called the use bit; it is set whenever a page is accessed. It provides a simple way to approximate LRU (least recently used) replacement.
dirty bit
The dirty bit is set when any word in a page is written; it indicates whether the page needs to be written out to disk before its memory can be given to another page.
A modified page is often called a dirty page.
Techniques used to reduce page table storage: keep a limit register so the table grows only as needed, use multiple levels of page tables, page the page tables themselves, or use hashing (an inverted page table).
translation-lookaside buffer(TLB)
A cache that keeps track of recently used address mappings to avoid an access to the page table, making address translation fast.
The hardware maintains an index that indicates the recommended entry to replace; it is chosen randomly to keep the hardware simple.
virtual addressed cache
A cache that is accessed with a virtual address rather than a physical address.
Aliasing occurs when there are two virtual addresses for the same page. This ambiguity would allow one program to write the data without the other program being aware that the data had changed.
A common compromise between physical addressing and virtual addressing is a cache that is virtually indexed using just the page offset portion of the address (which is effectively physical), but uses physical tags. There is no alias problem in this case.
The hardware must provide at least 3 basic capabilities for protection: support at least two modes (user and supervisor), make a portion of the processor state readable but not writable by a user process (such as the page table pointer and the TLB), and provide mechanisms to cross between the modes (system call and return from exception).
Sharing information across processes: the operating system modifies the page table of the accessing process; the write access bit can be used to restrict the sharing to just reads.
Context switch
Changing the internal state of the processor to allow a different process to use the processor; it requires saving the state needed to return to the currently executing process.
All misses are classified into one of 3 categories (the three Cs): compulsory misses, capacity misses, and conflict misses.
Upon a page fault, the OS: finds the page's location on disk, chooses a physical page to replace (writing it back to disk if it is dirty), and reads the referenced page from disk into the chosen page.
Since TLB misses are much more frequent than page faults, the OS loads the TLB from the page table without examining the entry and restarts the instruction. If the entry is invalid, another exception occurs and the OS recognizes the page fault.
When an exception first occurs, the processor sets a bit that disables all other exceptions.
unmapped
A portion of the address space that cannot have page faults.
The OS places exception entry point code and the exception stack in unmapped memory.
thrashing and working set
Continuously swapping pages between memory and disk is called thrashing; the set of popular pages is called the working set.
prefetching
A technique in which data blocks needed in the future are brought into the cache early by the use of special instructions that specify the address of the block.
Pitfall: Forgetting to account for byte addressing or the cache block size in simulating a cache.
Pitfall: Ignoring memory system behavior when writing programs or when generating code in a compiler.
Pitfall: Using average memory access time to evaluate the memory hierarchy of an out-of-order processor.
An out-of-order processor continues to execute instructions during a cache miss, and may even sustain more cache misses, so a simple average memory access time does not reflect the actual stalls.
Pitfall: Extending an address space by adding segments on top of an unsegmented address space.
This would cause addressing problems.
Dependability is the quality of delivered service such that reliance can justifiably be placed on this service.
Reliability is a measure of the continuous service accomplishment(or, equivalently, of the time to failure) from a reference point.
Mean time to failure (MTTF) is a reliability measure. The annual failure rate (AFR) is the percentage of devices that would be expected to fail in a year for a given MTTF. Service interruption is measured as mean time to repair (MTTR). Mean time between failures (MTBF) is simply the sum of MTTF and MTTR.
Availability is a measure of service accomplishment with respect to the alternation between the two states of accomplishment and interruption.
Availability = MTTF / (MTTF + MTTR) = MTTF / MTBF
3 ways to improve MTTF: fault avoidance, fault tolerance, and fault forecasting.
nonvolatile
Storage device where data retains its value even when power is removed.
track
One of thousands of concentric circles that make up the surface of a magnetic disk.
sector
One of the segments that make up a track on a magnetic disk; a sector is the smallest amount of information that is read or written on a disk.
Originally, all tracks had the same number of sectors and hence the same number of bits. With the introduction of zone bit recording (ZBR), disk drives changed to a varying number of sectors per track, thus increasing the drive capacity.
cylinder
Cylinder refers to all the tracks under the heads at a given point on all surfaces.
Seek time
The time for the process of positioning a read/write head over the proper track on a disk.
Rotational latency
Also called rotational delay , the time required for the desired sector of a disk to rotate under the read/write head; usually assumed to be half the rotation time.
Transfer time
Transfer time is the time to transfer a block of bits.
Disk controller usually handles the detailed control of the disk and the transfer between the disk and the memory. Controller time is the overhead the controller imposes in performing an I/O access.
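A worked example with made-up but typical numbers (not figures from the book): for a 10,000 RPM disk with a 6 ms average seek, a 50 MB/sec transfer rate, and 0.2 ms of controller overhead, reading a 4 KB sector takes about

$$6\,\text{ms} + \frac{0.5\ \text{rotation}}{10000/60\ \text{rotations/sec}} + \frac{4\,\text{KB}}{50\,\text{MB/sec}} + 0.2\,\text{ms} \approx 6 + 3 + 0.08 + 0.2 = 9.3\,\text{ms}$$

Seek and rotational latency dominate, which is why rescheduling accesses (discussed under the fallacies below) pays off.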
NOR flash: the storage cell is similar to a standard NOR gate. Typically used for BIOS memory.
NAND flash: offers greater storage density, but memory can only be read and written in blocks, as the wiring needed for random accesses was removed. Typically used for USB keys.
wear leveling
To cope with bits wearing out, most NAND flash products include a controller to spread writes by remapping blocks that have been written many times to less-trodden blocks.
processor-memory bus
A bus that connects processor and memory and that is short, generally high speed, and matched to the memory system so as to maximize memory-processor bandwidth.
I/O bus
By contrast, I/O buses can be lengthy, can have many types of devices connected to them, and often have a wide range in the data bandwidth of the devices connected to them.
backplane bus
A bus that is designed to allow processors, memory, and I/O devices to coexist on a single bus.
I/O buses do not typically interface directly to the memory but use either a processor-memory or a backplane bus to connect to memory.
I/O transaction
A sequence of operations over the interconnect that includes a request and may include a response, either of which may carry data. A transaction is initiated by a single request and may take many individual bus operations.
Synchronous bus
A bus that includes a clock in the control lines and a fixed protocol for communicating that is relative to the clock.
Asynchronous interconnect
Uses a handshaking protocol for coordinating usage rather than a clock; can accommodate a wide variety of devices of differing speeds.
handshaking protocol
A series of steps used to coordinate asynchronous bus transfers in which the sender and receiver proceed to the next step only when both parties agree that the current step has been completed.
north bridge
The memory controller hub chip next to the processor.
south bridge
The chip connected to the north bridge, which is the I/O controller hub.
memory-mapped I/O
An I/O scheme in which portions of address space are assigned to I/O devices, and reads and writes to those addresses are interpreted as commands to the I/O device.
transaction processing
A type of application that involves handling small, short operations (called transactions) that typically require both I/O and computation. Transaction processing applications typically have both response time and throughput requirements.
I/O Rate
Performance measure of I/Os per unit time, such as reads per second.
data rate
Performance measure of bytes per unit time, such as GB/sec.
General approaches:
redundant arrays of inexpensive disks (RAID)
An organization of disks that uses an array of small and inexpensive disks so as to increase both performance and reliability.
RAID 0: Simply spreading data over multiple disks, called striping.
RAID 1 (mirroring): Writing identical data to multiple disks to increase data availability.
RAID 2 borrows an error detection and correction scheme most often used for memories.
protection group is the group of data disks or blocks that share a common check disk or block. The cost of higher availability can be reduced to 1/n, where n is the number of disks in a protection group.
RAID 4: The parity is stored as blocks and associated with a set of data blocks.
RAID 5: To fix the parity-write bottleneck, the parity information can be spread throughout all the disks so that there is no single bottleneck for writes.
RAID 6: When a single failure correction is not sufficient, parity can be generalized to a second calculation over the data and another check disk of information. The second check block allows recovery from a second failure.
Fallacy: The rated mean time to failure of disks is almost 140 years, so disks practically never fail.
MTTF is calculated by putting thousands of disks in a room, running them for a few months, and counting the number that fail.
The annual failure rate (AFR) is a more useful measure.
Fallacy: A GB/sec interconnect can transfer 1 GB of data in 1 second.
Fallacy: Operating systems are the best place to schedule disk accesses.
Since the disk knows the actual mapping of the logical addresses onto the physical geometry of sectors, tracks, and surfaces, it can reduce the rotational and seek latencies by rescheduling.
Pitfall: Using the peak transfer rate of a portion of the I/O system to make performance projections or performance comparisons.
The peak performance is based on unrealistic assumptions about the system or is unattainable because of other system limitations.
multiprocessor
A computer system with at least two processors. This is in contrast to a uniprocessor , which has one.
job-level parallelism or process-level parallelism
Utilizing multiple processors by running independent programs simultaneously.
parallel processing program
A single program that runs on multiple processors simultaneously.
cluster
A set of computers connected over a local area network (LAN) that functions as a single large message-passing multiprocessor.
multicore microprocessor
A microprocessor containing multiple processors (“cores”) in a single integrated circuit.
According to Amdahl's law:
Execution time after improvement = Execution time affected by improvement / Amount of improvement + Execution time unaffected
Thus, to get a large speedup on a multiprocessor, the fraction of the program that must run sequentially has to be very small.
Strong scaling
Speed-up achieved on a multiprocessor without increasing the size of the problem.
Weak scaling
Speed-up achieved on a multiprocessor while increasing the size of the problem proportionally to the increase in the number of processors.
shared memory multiprocessor (SMP)
A parallel processor with a single address space, implying implicit communication with loads and stores.
Single address space multiprocessors come in two styles: uniform memory access (UMA) and nonuniform memory access (NUMA).
Data sharing:
synchronization
The process of coordinating the behavior of two or more processes, which may be running on different processors.
lock
A synchronization device that allows access to data to only one processor at a time.
There were several attempts to build high-performance computers based on high-performance message-passing networks, but they were all much more expensive than clusters built from LANs.
A weakness of separate memories for user memory turns into a strength in system availability.
Lower cost, high availability, improved power efficiency, and rapid, incremental expandability make clusters attractive to service providers for the World Wide Web.
hardware multithreading
Increasing the utilization of a processor by switching to another thread when one thread is stalled. To permit this, we must duplicate the independent state; for example, each thread would have a separate copy of the register file and the PC.
There are two main approaches to hardware multithreading.
Fine-grained multithreading: A version of hardware multithreading that switches between threads after every instruction.
It hides the throughput losses that arise from both short and long stalls, but it slows down the execution of the individual threads.
Coarse-grained multithreading: A version of hardware multithreading that switches between threads only after significant events, such as a cache miss.
It relieves the need to have thread switching be essentially free and is much less likely to slow down the execution of an individual thread, but it is limited in its ability to overcome throughput losses, especially from shorter stalls, since a thread switch requires the pipeline to be emptied or frozen (pipeline start-up cost).
Simultaneous multithreading (SMT): A version of multithreading that lowers the cost of multithreading by utilizing the resources needed for a multiple-issue, dynamically scheduled microarchitecture.