From Wikipedia, the free encyclopedia
Jump to: navigation, search
The Time Stamp Counter is a 64-bit register present on all x86 processors since the Pentium. It counts the number of ticks since reset. Instruction RDTSC returns the TSC in EDX:EAX. Its opcode is 0F 31.[1] Pentium competitors such as the Cyrix 6x86 did not always have a TSC and may consider RDTSC an illegal instruction. Cyrix included a Time Stamp Counter in their MII.
The time stamp counter has, until recently, been an excellent high-resolution, low-overhead way of getting CPU timing information. With the advent of multi-core/hyperthreaded CPUs, systems with multiple CPUs, and "hibernating" operating systems, the TSC cannot be relied on to provide accurate results - unless great care is taken to correct the possible flaws: rate of tick and whether all cores (processors) have identical values in their time-keeping registers. There is no promise that the timestamp counters of multiple CPUs on a single motherboard will be synchronized. In such cases, programmers can only get reliable results by locking their code to a single CPU. Even then, the CPU speed may change due to power-saving measures taken by the OS or BIOS, or the system may be hibernated and later resumed (resetting the time stamp counter). In those latter cases, to stay relevant, the counter must be recalibrated periodically (according to the time resolution your application requires).
Reliance on the time stamp counter also reduces portability, as other processors may not have a similar feature. Recent Intel processors include a constant rate TSC (identified by the constant_tsc flag in Linux's /proc/cpuinfo). With these processors the TSC reads at the processor's maximum rate regardless of the actual CPU running rate. While this makes time keeping more consistent, it can skew benchmarks, where a certain amount of spin-up time is spent at a lower clock rate before the OS switches the processor to the higher rate. This has the effect of making things seem like they require more processor cycles than they normally would.
Under Windows platforms, Microsoft strongly discourages using the TSC for high-resolution timing for exactly these reasons, providing instead the Windows APIs QueryPerformanceCounter and QueryPerformanceFrequency.[2] Even when using these functions, Microsoft recommends the code to be locked to a single CPU. Under Linux, similar functionality is provided by reading the value of CLOCK_MONOTONIC clock using POSIX clock_gettime function.
Starting with the Pentium Pro, Intel processors have supported out-of-order execution, where instructions are not necessarily performed in the order they appear in the executable. This can cause RDTSC to be executed later than expected, producing a misleading cycle count.[3] This problem can be solved by executing a serializing instruction, such as CPUID, to force every preceding instruction to complete before allowing the program to continue or by using RDTSCP instruction, which is a serializing variant of the RDTSC instruction.
------------------------------------------------------------------------------------------
1. 多核的系统中, 如果多个核之间没有同步, 会导致 TSC 误差. 所以如过想在多核系统中使用TSC, 必须先同步
2. 在 Windows 用 QueryPerformanceCounter 和 QueryPerformanceFrequency,Linux 下用 POSIX 的 clock_gettime 函数,以 CLOCK_MONOTONIC 参数调用。 这个时钟频率的典型值是 10MHz 左右,比 CPU 主频低得多。主板上这个时钟是稳定的(物理因素导致的漂移除外),不随 CPU 主频变化而变化。当然主板型号不同,频率可能有所不同
3. The time stamp counter becomes invalid if a thread jumps between different CPU cores. You may have to fix the thread to a specific CPU core during time measurements to avoid this. (In Windows, use SetThreadAffinityMask).