NVIDIA CUDA 3.0 Updates

- Section 1.2
    - Updated figure


- Section 2.5
    - Mentioned the Fermi architecture


- Section 3.1
    - Heavily rewritten to clarify binary, ptx, application, C++ compatibility
    - __noinline__ behaves differently for compute capability 2.0 and higher


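Section 3.1.1 describes the nvcc compilation workflow in detail: how .cu files (CUDA programs) are compiled into object files, and how the C/C++ host portions are handed off to the host C or C++ compiler.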

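Section 3.1.2 covers binary code and explains what the code option means. For example, a label of 1.3 indicates that the binary targets compute capability 1.3 and runs only on that hardware or on later minor revisions of the same major version; binary compatibility is not guaranteed across major revisions.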

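Section 3.1.3 briefly notes that PTX instructions can generally be executed on any device, but that some instructions are supported only on devices of higher compute capability.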

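Section 3.1.4 explains how binaries and PTX code of different versions behave on future hardware. The guide recommends shipping PTX code, which is translated automatically at run time and can therefore pick up newer hardware features. As I understand it, the reason is that current hardware may expand a single PTX instruction into several native instructions when it has no native equivalent; once future hardware can execute that instruction directly, the generated code changes, which is why the guide calls this out.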
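
The compatibility rules in 3.1.2 through 3.1.4 can be sketched with a small program. This is only an illustration and is not taken from the guide: the kernel name `scale` and the file name `scale.cu` are hypothetical, and the build line in the comment is one plausible way to embed both binary code and PTX. The `__CUDA_ARCH__` macro is defined by nvcc while compiling device code, so a kernel can select a code path per target.

```cuda
// Hypothetical file: scale.cu
// One plausible build line (replace XX with the target compute capability):
//   nvcc -gencode arch=compute_XX,code=sm_XX \
//        -gencode arch=compute_XX,code=compute_XX scale.cu -o scale
// The first -gencode embeds binary code for sm_XX; the second embeds PTX,
// which the driver can JIT-compile for devices newer than sm_XX.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void scale(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 200
    // Path compiled only for compute capability 2.0 and higher.
    // Code specific to those devices would go here.
    data[i] = data[i] * factor;
#else
    // Fallback path for older targets (same arithmetic in this sketch).
    data[i] = data[i] * factor;
#endif
}

int main()
{
    const int n = 8;
    float h_data[n];
    for (int i = 0; i < n; ++i) h_data[i] = (float)i;

    float *d_data = NULL;
    if (cudaMalloc((void **)&d_data, n * sizeof(float)) != cudaSuccess) {
        printf("cudaMalloc failed\n");
        return 1;
    }
    cudaMemcpy(d_data, h_data, n * sizeof(float), cudaMemcpyHostToDevice);

    scale<<<1, n>>>(d_data, 2.0f, n);

    // A launch fails here if the executable contains neither a compatible
    // binary nor PTX that can be JIT-compiled for the installed device.
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        printf("launch failed: %s\n", cudaGetErrorString(err));
        cudaFree(d_data);
        return 1;
    }

    cudaMemcpy(h_data, d_data, n * sizeof(float), cudaMemcpyDeviceToHost);
    for (int i = 0; i < n; ++i) printf("%.1f ", h_data[i]);
    printf("\n");

    cudaFree(d_data);
    return 0;
}
```

Note that current toolkits no longer accept the sm_1x targets mentioned above, so the XX placeholder has to be a compute capability that the installed nvcc supports.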



- Section 3.2
    - Clarified that a CUDA context is created under the hood when initializing
      the runtime and therefore CUDA resources are only valid in the context of
      the host thread that initialized the runtime
    - Updated graphics interoperability sections to new API

This clarifies that every CUDA runtime resource lives in the context in which it was created. As the guide discusses later, one host thread controls one GPU.
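
A minimal sketch of the "one host thread per GPU" pattern follows. It assumes a toolkit whose host compiler supports `std::thread` (far newer than CUDA 3.0), and the names `fill` and `worker` are hypothetical. Every resource is created, used, and freed inside the host thread that selected the device, which is the discipline the CUDA 3.0 text requires; later CUDA releases relaxed the rule that resources are valid only in the creating host thread.

```cuda
#include <cstdio>
#include <thread>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: writes the same value into every element.
__global__ void fill(int *out, int value, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = value;
}

// Hypothetical worker: one host thread drives exactly one device.
static void worker(int device)
{
    // Selecting the device binds this host thread to it. The runtime
    // creates the context implicitly on the first call that needs one.
    if (cudaSetDevice(device) != cudaSuccess) return;

    const int n = 256;
    int *d_buf = NULL;
    if (cudaMalloc((void **)&d_buf, n * sizeof(int)) != cudaSuccess) return;

    fill<<<1, n>>>(d_buf, device, n);

    int first = -1;
    cudaMemcpy(&first, d_buf, sizeof(int), cudaMemcpyDeviceToHost);
    printf("device %d wrote %d\n", device, first);

    // Free the allocation in the same thread that created it.
    cudaFree(d_buf);
}

int main()
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        printf("no CUDA device found\n");
        return 1;
    }

    std::vector<std::thread> threads;
    for (int d = 0; d < count; ++d) threads.emplace_back(worker, d);
    for (size_t t = 0; t < threads.size(); ++t) threads[t].join();
    return 0;
}
```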

- Section 3.2.1
    - Mentioned 40-bit address space for devices of compute capability 2.0


- Section
    - Mentioned atomics to mapped page-locked memory
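
The mapped page-locked memory item can be illustrated as follows. This is a sketch under assumptions: the kernel name `countHits` is hypothetical, error handling is mostly omitted, and `cudaDeviceSynchronize()` is the later name for what CUDA 3.0 called `cudaThreadSynchronize()`. The atomic operations are atomic among device threads; they are not atomic with respect to concurrent host accesses, so the host reads the counter only after synchronizing.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: every thread increments a counter that lives in
// mapped page-locked host memory.
__global__ void countHits(int *counter)
{
    atomicAdd(counter, 1);
}

int main()
{
    // Must be requested before the context is created so that
    // page-locked host allocations can be mapped into the device.
    cudaSetDeviceFlags(cudaDeviceMapHost);

    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    if (!prop.canMapHostMemory) {
        printf("device 0 cannot map host memory\n");
        return 1;
    }

    int *h_counter = NULL;
    if (cudaHostAlloc((void **)&h_counter, sizeof(int),
                      cudaHostAllocMapped) != cudaSuccess) {
        printf("cudaHostAlloc failed\n");
        return 1;
    }
    *h_counter = 0;

    // Device-side alias of the same host allocation.
    int *d_counter = NULL;
    cudaHostGetDevicePointer((void **)&d_counter, h_counter, 0);

    // 4 blocks of 64 threads; each thread adds 1 to the counter.
    countHits<<<4, 64>>>(d_counter);
    cudaDeviceSynchronize();

    printf("counter = %d\n", *h_counter);

    cudaFreeHost(h_counter);
    return 0;
}
```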


- Section 3.2.6
    - Added concurrent kernel execution and concurrent data transfer for devices
      of compute capability 2.0
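
Concurrent kernel execution and concurrent data transfer are both expressed through streams. The sketch below issues an independent copy, kernel, and copy back in each of two streams; the kernel name `addOne` and the sizes are hypothetical, and error checking is omitted for brevity. Whether the work actually overlaps depends on the device and driver, so this shows only how the work is submitted, not a guaranteed speedup.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: adds 1 to every element.
__global__ void addOne(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

int main()
{
    const int kStreams = 2;
    const int n = 1 << 16;
    const size_t bytes = n * sizeof(float);

    cudaStream_t stream[kStreams];
    float *h_buf[kStreams];
    float *d_buf[kStreams];

    for (int s = 0; s < kStreams; ++s) {
        cudaStreamCreate(&stream[s]);
        // Asynchronous copies require page-locked host memory.
        cudaMallocHost((void **)&h_buf[s], bytes);
        cudaMalloc((void **)&d_buf[s], bytes);
        for (int i = 0; i < n; ++i) h_buf[s][i] = (float)s;
    }

    // Operations within one stream run in order; operations in different
    // streams are allowed to overlap if the device supports it.
    for (int s = 0; s < kStreams; ++s) {
        cudaMemcpyAsync(d_buf[s], h_buf[s], bytes,
                        cudaMemcpyHostToDevice, stream[s]);
        addOne<<<(n + 255) / 256, 256, 0, stream[s]>>>(d_buf[s], n);
        cudaMemcpyAsync(h_buf[s], d_buf[s], bytes,
                        cudaMemcpyDeviceToHost, stream[s]);
    }

    for (int s = 0; s < kStreams; ++s) {
        cudaStreamSynchronize(stream[s]);
        printf("stream %d: first element = %.1f\n", s, h_buf[s][0]);
        cudaFreeHost(h_buf[s]);
        cudaFree(d_buf[s]);
        cudaStreamDestroy(stream[s]);
    }
    return 0;
}
```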


- Section 3.3
    - Updated graphics interoperability sections to new API

- New Section 3.4 about interoperability between runtime and driver APIs
- Chapters 4 and 5 mostly rewritten with additional information
- Part of appendix A moved to new appendix G with additional information
- Section B.1.4
    - Mentioned that kernel parameters are passed via constant memory for
      devices of compute capability 2.0
- Section B.6
    - Added new functions __syncthreads_count(), __syncthreads_and(), and
- Section B.10
    - Mentioned atomics to mapped page-locked memory
- Section B.11
    - Added new function __ballot()
- New Section B.12 on profiler counter function
- New Section B.14 on launch bounds
- Section C.1.1
    - Updated error for some functions
    - Updated based on FMAD being fused for compute capability 2.0
- Section C.1.2
    - atomicAdd works with single-precision floating-point numbers for devices
      of compute capability 2.0
    - Updated error for some functions
- Section C.2.1
    - Added new functions
- Section C.2.2
    - Added new functions
- New Section D.6 about classes with non virtual member functions for devices
  of compute capability 2.0
- New appendix E for nvcc specifics (moved __noinline__, #pragma unroll to this
  appendix and added __restrict__)
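
Several of the device-side additions listed above can be combined in one small kernel. This is a sketch, not code from the guide: the kernel name `sumPositive` and the test data are hypothetical. It uses `__launch_bounds__` (B.14), `__restrict__` (appendix E), `__syncthreads_count()` (B.6), and single-precision `atomicAdd()` (C.1.2), all of which require compute capability 2.0 or higher where the list says so. `__ballot()` is left out on purpose, because later CUDA releases deprecated it in favor of `__ballot_sync()`.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define BLOCK 256

// __launch_bounds__ tells the compiler the kernel is never launched with
// more than BLOCK threads per block. __restrict__ promises that the three
// pointers do not alias each other.
__global__ void __launch_bounds__(BLOCK)
sumPositive(const float * __restrict__ in, float * __restrict__ total,
            int * __restrict__ positivesPerBlock, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    // Out-of-range threads contribute 0 but still reach the barrier.
    int isPositive = (i < n) && (in[i] > 0.0f);

    // Barrier that also returns how many threads in the block passed a
    // non-zero predicate.
    int count = __syncthreads_count(isPositive);
    if (threadIdx.x == 0) positivesPerBlock[blockIdx.x] = count;

    // Single-precision atomicAdd to global memory.
    if (isPositive) atomicAdd(total, in[i]);
}

int main()
{
    const int n = 1000;
    const int blocks = (n + BLOCK - 1) / BLOCK;

    // Even indices hold +1, odd indices hold -1.
    float h_in[n];
    for (int i = 0; i < n; ++i) h_in[i] = (i % 2 == 0) ? 1.0f : -1.0f;

    float *d_in = NULL, *d_total = NULL;
    int *d_counts = NULL;
    cudaMalloc((void **)&d_in, n * sizeof(float));
    cudaMalloc((void **)&d_total, sizeof(float));
    cudaMalloc((void **)&d_counts, blocks * sizeof(int));
    cudaMemcpy(d_in, h_in, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemset(d_total, 0, sizeof(float));

    sumPositive<<<blocks, BLOCK>>>(d_in, d_total, d_counts, n);

    float h_total = 0.0f;
    int h_counts[blocks];
    cudaMemcpy(&h_total, d_total, sizeof(float), cudaMemcpyDeviceToHost);
    cudaMemcpy(h_counts, d_counts, blocks * sizeof(int),
               cudaMemcpyDeviceToHost);

    int positives = 0;
    for (int b = 0; b < blocks; ++b) positives += h_counts[b];
    printf("positives = %d, total = %.1f\n", positives, h_total);

    cudaFree(d_in);
    cudaFree(d_total);
    cudaFree(d_counts);
    return 0;
}
```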



