Friday, October 16, 2009

Nvidia's Next Generation: Fermi - key architectural highlights

Third Generation Streaming Multiprocessor (SM)
  • 32 CUDA cores per SM, 4x over GT200
  • 8x the peak double precision floating point performance over GT200
  • Dual Warp Scheduler simultaneously schedules and dispatches instructions from two independent warps
  • 64 KB of RAM with a configurable partitioning of shared memory and L1 cache

Second Generation Parallel Thread Execution ISA
  • Unified Address Space with Full C++ Support
  • Optimized for OpenCL and DirectCompute
  • Full IEEE 754-2008 32-bit and 64-bit precision
  • Full 32-bit integer path with 64-bit extensions
  • Memory access instructions to support transition to 64-bit addressing
  • Improved Performance through Predication

Improved Memory Subsystem
  • NVIDIA Parallel DataCache™ hierarchy with Configurable L1 and Unified L2 Caches
  • First GPU with ECC memory support
  • Greatly improved atomic memory operation performance

NVIDIA GigaThread™ Engine
  • 10x faster application context switching
  • Concurrent kernel execution
  • Out of Order thread block execution
  • Dual overlapped memory transfer engines

More information: http://www.nvidia.com/content/PDF/fermi_white_papers/NVIDIA_Fermi_Compute_Architecture_Whitepaper.pdf

3 comments:

jo said...

Does this mean OpenCL double precision will be supported finally (through cl_khr_fp64 extension)?

Wendell Rodrigues said...

I don't know. What do we see if we run QueryDevice on NVIDIA's GTX280? There is no reference to fp64. With CUDA this works, but I don't know about OpenCL.

jo said...

Confirmed on the NVIDIA forums: they are working on enabling the fp64 extension for the 2XX series.
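
For anyone who wants to check this on their own card: OpenCL reports optional features such as cl_khr_fp64 in a space-separated extension string, which can be read with clGetDeviceInfo and CL_DEVICE_EXTENSIONS. Below is a minimal C++ sketch of testing that string for a given extension. The helper name has_extension is hypothetical (it is not part of OpenCL), and the actual clGetDeviceInfo call is left out so the snippet compiles without an OpenCL runtime.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper (not an OpenCL API): returns true if `name` appears as
// a whole token in a space-separated extension string, which is the format
// OpenCL uses for CL_DEVICE_EXTENSIONS.
bool has_extension(const std::string& extensions, const std::string& name) {
    assert(!name.empty());  // an empty extension name is a caller error

    std::istringstream tokens(extensions);
    std::string token;
    while (tokens >> token) {
        // Compare whole tokens so that a longer name merely starting with
        // `name` does not count as a match.
        if (token == name) {
            return true;
        }
    }
    return false;
}
```

In a real program the first argument would be the string returned by clGetDeviceInfo for the device in question, and has_extension(ext, "cl_khr_fp64") would tell you whether double precision is exposed through OpenCL on that device and driver.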