Third Generation Streaming Multiprocessor (SM)
- 32 CUDA cores per SM, 4x over GT200
- 8x the peak double precision floating point performance over GT200
- Dual Warp Scheduler simultaneously schedules and dispatches instructions from two independent warps
- 64 KB of RAM with a configurable partitioning of shared memory and L1 cache
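To make the configurable 64 KB concrete: on Fermi, each SM's memory can be split as 48 KB shared memory + 16 KB L1 cache, or as 16 KB shared memory + 48 KB L1 cache. The C++ sketch below is only a host-side model of that choice, not CUDA API code. The names `CachePreference`, `partition_for`, and `blocks_limited_by_shared` are made up for illustration, and the model deliberately ignores every other limit on resident blocks (registers, threads per SM, the per-SM block cap).

```cpp
#include <cstddef>

// Hypothetical names: a small model of the two Fermi shared/L1 splits.
enum class CachePreference { PreferShared, PreferL1 };

struct Partition {
    std::size_t shared_bytes;  // bytes usable as shared memory
    std::size_t l1_bytes;      // bytes used as L1 cache
};

constexpr std::size_t kKiB = 1024;

// Maps a preference to one of the two 64 KB splits (48/16 or 16/48).
constexpr Partition partition_for(CachePreference pref) {
    return pref == CachePreference::PreferShared
               ? Partition{48 * kKiB, 16 * kKiB}
               : Partition{16 * kKiB, 48 * kKiB};
}

// Upper bound on resident blocks per SM from shared memory alone.
// Returns 0 as a sentinel when a block uses no shared memory, meaning
// this resource imposes no limit in the model.
constexpr std::size_t blocks_limited_by_shared(std::size_t shared_bytes_per_block,
                                               CachePreference pref) {
    return shared_bytes_per_block == 0
               ? 0
               : partition_for(pref).shared_bytes / shared_bytes_per_block;
}
```

For a block that uses 12 KB of shared memory, this model allows 4 resident blocks with the shared-heavy split and only 1 with the L1-heavy split, which is why the partitioning is worth tuning per kernel.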
Second Generation Parallel Thread Execution ISA
- Unified Address Space with Full C++ Support
- Optimized for OpenCL and DirectCompute
- Full IEEE 754-2008 32-bit and 64-bit precision
- Full 32-bit integer path with 64-bit extensions
- Memory access instructions to support transition to 64-bit addressing
- Improved Performance through Predication
Improved Memory Subsystem
- NVIDIA Parallel DataCache™ hierarchy with Configurable L1 and Unified L2 Caches
- First GPU with ECC memory support
- Greatly improved atomic memory operation performance
NVIDIA GigaThread™ Engine
- 10x faster application context switching
- Concurrent kernel execution
- Out of Order thread block execution
- Dual overlapped memory transfer engines
More information: http://www.nvidia.com/content/PDF/fermi_white_papers/NVIDIA_Fermi_Compute_Architecture_Whitepaper.pdf
Friday, October 16, 2009
Nvidia's Next Generation: Fermi - key architectural highlights
3 comments:
Does this mean OpenCL double precision will finally be supported (through the cl_khr_fp64 extension)?
I don't know. What do we see if we run a device query on NVIDIA's GTX 280? Is there no mention of fp64? With CUDA it works, but I don't know about OpenCL.
Confirmed on the NVIDIA forums: they are working on enabling the fp64 extension for the 2XX series.
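For anyone who wants to check this on their own card: OpenCL reports a device's extensions as a single space-separated string, obtained by calling `clGetDeviceInfo` with `CL_DEVICE_EXTENSIONS`. The sketch below covers only the string check, so it runs without an OpenCL installation. `has_extension` is a hypothetical helper name, and the strings in the usage note are made-up samples, not real output from a GTX 280.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: returns true if `name` appears as a whole
// space-separated token in `extensions`. Comparing whole tokens avoids
// a false match on a longer extension name that merely contains `name`.
bool has_extension(const std::string& extensions, const std::string& name) {
    std::istringstream tokens(extensions);
    std::string token;
    while (tokens >> token) {
        if (token == name) {
            return true;
        }
    }
    return false;
}
```

With an illustrative string such as `"cl_khr_byte_addressable_store cl_khr_fp64"`, `has_extension(ext, "cl_khr_fp64")` returns true; if the token is absent from the list, it returns false and double precision kernels should not be assumed to work through OpenCL on that device.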