GPU/OpenCL Modeling: Nvidia's Next Generation: Fermi

Friday, October 16, 2009

Nvidia's Next Generation: Fermi - key architectural highlights

Third Generation Streaming Multiprocessor (SM)
32 CUDA cores per SM, 4x over GT200
8x the peak double precision floating point performance over GT200
Dual Warp Scheduler simultaneously schedules and dispatches instructions from two independent warps
64 KB of RAM with a configurable partitioning of shared memory and L1 cache

Second Generation Parallel Thread Execution ISA
Unified Address Space with Full C++ Support
Optimized for OpenCL and DirectCompute
Full IEEE 754-2008 32-bit and 64-bit precision
Full 32-bit integer path with 64-bit extensions
Memory access instructions to support transition to 64-bit addressing
Improved Performance through Predication

Improved Memory Subsystem
NVIDIA Parallel DataCache™ hierarchy with Configurable L1 and Unified L2 Caches
First GPU with ECC memory support
Greatly improved atomic memory operation performance

NVIDIA GigaThread™ Engine
10x faster application context switching
Concurrent kernel execution
Out of Order thread block execution
Dual overlapped memory transfer engines

More information: http://www.nvidia.com/content/PDF/fermi_white_papers/NVIDIA_Fermi_Compute_Architecture_Whitepaper.pdf

3 comments:

jo said...: Does this mean OpenCL double precision will be supported finally (through cl_khr_fp64 extension)?; October 18, 2009 at 5:46 PM
Wendell Rodrigues said...: I don't know. What do we see if we run a device query on NVIDIA's GTX 280? Is there no mention of fp64? With CUDA it works, but I don't know about OpenCL.; October 23, 2009 at 5:41 PM
jo said...: Confirmed on the NVIDIA forums: they are working on enabling the fp64 extension for the 2XX series.; October 23, 2009 at 11:41 PM
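
The question in the comments comes down to whether a device lists cl_khr_fp64 among its extensions. The following is a minimal sketch of that check in C, not taken from any NVIDIA sample. It assumes the extension list is already available as a space-separated string, which an OpenCL program would normally obtain from clGetDeviceInfo with CL_DEVICE_EXTENSIONS. The helper name has_extension is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: returns true if `name` appears as a whole,
 * space-separated token in `extensions`.
 *
 * Assumption: `extensions` is the string a program would read with
 * clGetDeviceInfo(device, CL_DEVICE_EXTENSIONS, ...). No OpenCL call
 * is made here, so the sketch compiles without an OpenCL SDK.
 */
bool has_extension(const char *extensions, const char *name)
{
    size_t len = strlen(name);
    if (len == 0) {
        return false;
    }

    const char *p = extensions;
    while ((p = strstr(p, name)) != NULL) {
        /* A match counts only if it is bounded by spaces or string ends. */
        bool starts_token = (p == extensions) || (p[-1] == ' ');
        bool ends_token = (p[len] == '\0') || (p[len] == ' ');
        if (starts_token && ends_token) {
            return true;
        }
        p++; /* keep searching after a partial match */
    }
    return false;
}
```

Calling has_extension(ext_string, "cl_khr_fp64") on the string reported by a device indicates whether double precision is exposed through OpenCL on that device. Matching whole tokens rather than substrings avoids a false positive from a longer extension name that merely contains the same text.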
