Sunday, March 25, 2012

NVIDIA GeForce GTX 680 Kepler 2 GB

Introduction



How time flies. The last time we had a bleeding-edge graphics card from NVIDIA to put through a trial by fire was exactly a year ago, with the launch of the GeForce GTX 590. Quite a bit has changed since then. Rival AMD is just about done launching its Radeon HD 7000 series "Southern Islands" GPU family, and NVIDIA's answer has been sorely missed for the past three months or so. Well, it's finally here: our GeForce GTX 680 Kepler review.

Both evolutionary and revolutionary changes have gone into making the GeForce GTX 680. It is evolutionary in that it is designed as an upgrade over its immediate predecessor (and not something two generations behind), while revolutionary new features make sure the GTX 680 is a worthy upgrade. The evolutionary part, of course, is that NVIDIA wants to reclaim the title of having the fastest GPU out there; the revolutionary part is a rich feature set that supposedly delivers never-before-seen energy efficiency while still meeting those steep design goals.



Product Positioning


Speaking of positioning, NVIDIA is gunning for nothing short of the performance crown for single-GPU graphics cards with the GeForce GTX 680, and it wants to get there without compromising on energy efficiency. In the past, we have seen both NVIDIA and AMD throw energy efficiency to the wind and come up with power-guzzling chips in a blind pursuit of performance leadership. That's not the case with the GeForce GTX 680.




See what we mean? To be fair to the HD 7970, it did impress us with its performance-per-Watt figures. What NVIDIA is looking to do is raise the bar on energy efficiency. Accomplishing that with the Kepler-based GTX 680 while also going after performance leadership is a very tough ask, and takes some very gritty engineering.

One revolutionary change that allows the GeForce GTX 680 to aim high is an extremely smart self-tuning logic that fine-tunes clock speeds and voltages on the fly, with zero user intervention, to yield the best possible combination of performance and efficiency for a given load scenario. The GTX 680 thus replaces the notion of a fixed load clock speed with dynamic clock speeds. Think of it as a GPU take on Intel's Turbo Boost technology, which works in conjunction with SpeedStep to produce the best performance-per-Watt on CPUs that feature it.



GeForce GTX 680 Market Segment Analysis

| | Radeon HD 7870 | GeForce GTX 580 | Radeon HD 7950 | Radeon HD 7970 | GeForce GTX 680 | Radeon HD 6990 | GeForce GTX 590 |
|---|---|---|---|---|---|---|---|
| Shader Units | 1280 | 512 | 1792 | 2048 | 1536 | 2x 1536 | 2x 512 |
| ROPs | 32 | 48 | 32 | 32 | 32 | 2x 32 | 2x 48 |
| Graphics Processor | Pitcairn | GF110 | Tahiti | Tahiti | GK104 | 2x Cayman | 2x GF110 |
| Transistors | 2800M | 3000M | 4310M | 4310M | 3540M | 2x 2640M | 2x 3000M |
| Memory Size | 2048 MB | 1536 MB | 3072 MB | 3072 MB | 2048 MB | 2x 2048 MB | 2x 1536 MB |
| Memory Bus Width | 256 bit | 384 bit | 384 bit | 384 bit | 256 bit | 2x 256 bit | 2x 384 bit |
| Core Clock | 1000 MHz | 772 MHz | 800 MHz | 925 MHz | 1006 MHz+ | 830 MHz | 607 MHz |
| Memory Clock | 1200 MHz | 1002 MHz | 1250 MHz | 1375 MHz | 1502 MHz | 1250 MHz | 855 MHz |
| Price | $360 | $390 | $450 | $550 | $499 | $700 | $750 |
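
The memory clock and bus width figures in the table can be combined into a rough peak memory bandwidth for each single-GPU card. The sketch below is only an illustration: the function name `peak_bandwidth_gb_s` is made up for this example, and it assumes (the article does not say so) that GDDR5 transfers four data words per pin for every cycle of the memory clock as listed.

```python
# Memory clock (MHz) and bus width (bits) copied from the table above.
# Only the single-GPU cards are listed here.
CARDS = {
    "Radeon HD 7870": (1200, 256),
    "GeForce GTX 580": (1002, 384),
    "Radeon HD 7950": (1250, 384),
    "Radeon HD 7970": (1375, 384),
    "GeForce GTX 680": (1502, 256),
}

# Assumption (not stated in the article): GDDR5 moves four data words per
# pin for every cycle of the memory clock as listed in the table.
GDDR5_TRANSFERS_PER_CLOCK = 4


def peak_bandwidth_gb_s(memory_clock_mhz, bus_width_bits):
    """Theoretical peak memory bandwidth in GB/s (1 GB = 10**9 bytes)."""
    transfers_per_second = memory_clock_mhz * 1e6 * GDDR5_TRANSFERS_PER_CLOCK
    bytes_per_transfer = bus_width_bits / 8
    return transfers_per_second * bytes_per_transfer / 1e9


if __name__ == "__main__":
    for name, (clock_mhz, width_bits) in CARDS.items():
        print(f"{name}: {peak_bandwidth_gb_s(clock_mhz, width_bits):.1f} GB/s")
```

Under that assumption, the GTX 680's narrower 256-bit bus is offset by its higher memory clock, which puts it at roughly the same theoretical bandwidth as the 384-bit GTX 580.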

Architecture



  

At the heart of the GeForce GTX 680 is the GeForce Kepler architecture. Its design goals are to raise performance and energy efficiency over the previous-generation "Fermi" architecture. GeForce Kepler more or less maintains the basic component hierarchy of GeForce Fermi, which emphasizes a fast, highly parallelized component layout. Think of the hierarchy as a bento box. At the topmost level are the PCI-Express Gen 3.0 host interface, a 256-bit wide GDDR5 memory interface, and a highly tweaked NVIDIA GigaThread Engine, which moves processed and unprocessed data between the host and memory interfaces.

  
Downstream of the GigaThread Engine are four Graphics Processing Clusters (GPCs). Each GPC is a self-contained GPU subunit, since it has nearly every component an independent GPU does. The GPC has one shared resource and two dedicated resources. The shared resource is the Raster Engine, which handles high-level raster operations such as edge setup and Z-cull. The dedicated resources are two next-generation Streaming Multiprocessor-X (SMX) units. A large chunk of the architectural improvements has gone into perfecting this component. The SMX is the GPU's number-crunching resource. It is highly parallelized, to handle the kind of computing loads that tomorrow's 3D applications demand.

 
The SMX, in turn, has shared and dedicated resources. The shared resource is the next-generation PolyMorph 2.0 Engine, which handles low-level raster operations such as vertex fetch, tessellation, viewport transformation, attribute setup, etc. The dedicated components are where the number-crunching action happens. Four Warp Schedulers marshal instructions and data among 192 CUDA cores. This is a six-fold increase over the per-multiprocessor core count of the GF110, and four-fold over that of the GF114. There are 16 texture memory units per SMX, which are cached. The Warp Schedulers are backed by a more efficient software pre-decode algorithm that reduces the number of steps needed to issue instructions. Essentially, at shader compile time, the shader compiler (a component of NVIDIA's driver) evaluates the instruction stream, reorders instructions as needed, and attaches additional information to the instructions for the hardware to use. It can do so because it has a complete view of the shader code.

 
NVIDIA also introduced what it calls Bindless Textures. In the classical GPU model, to reference a texture, the GPU has to allocate it a slot in a fixed-size binding table, which limits the number of textures a shader can access at a given time. That limit was 128 with the previous-generation Fermi architecture. The Kepler architecture removes this binding step: a shader can reference textures directly in memory, without needing a conventional binding table. So the number of textures a shader can reference is practically unlimited, or 1 million if you want to talk in figures. This makes rendering scenes as complex as the one in the photo above a breeze, because it can be done in fewer passes.
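
The difference between the two models can be illustrated with a toy sketch. This is not NVIDIA's driver or any real graphics API: `BindingTable`, `BindlessPool` and `passes_needed` are hypothetical names invented for this example, and the idea that every extra batch of 128 textures costs one more pass is a deliberate simplification. Only the 128-slot limit comes from the text above.

```python
import math

MAX_BOUND_TEXTURES = 128  # Fermi-era binding-table limit quoted in the article


class BindingTable:
    """Classical model: a shader only sees textures bound to a fixed slot."""

    def __init__(self, slots=MAX_BOUND_TEXTURES):
        self.slots = [None] * slots

    def bind(self, slot, texture):
        # Binding outside the fixed-size table is simply not possible.
        if not 0 <= slot < len(self.slots):
            raise IndexError(f"slot {slot} is outside a table of {len(self.slots)}")
        self.slots[slot] = texture

    def sample(self, slot):
        return self.slots[slot]


class BindlessPool:
    """Bindless model: each texture gets a handle the shader uses directly."""

    def __init__(self):
        self._textures = {}
        self._next_handle = 0

    def create(self, texture):
        handle = self._next_handle
        self._textures[handle] = texture
        self._next_handle += 1
        return handle

    def sample(self, handle):
        return self._textures[handle]


def passes_needed(texture_count, table_slots=MAX_BOUND_TEXTURES):
    """Simplified: one pass per full or partial table of bound textures."""
    return math.ceil(texture_count / table_slots)


if __name__ == "__main__":
    pool = BindlessPool()
    handles = [pool.create(f"texture_{i}") for i in range(1000)]
    print("bindless sample:", pool.sample(handles[999]))
    print("passes with a 128-slot table:", passes_needed(1000))
```

In this toy model, a scene with 1000 textures has to be split across several passes when only 128 can be bound at once, while the handle-based pool can reach all of them in a single pass.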

To sum it all up, the GeForce Kepler 104 (GK104) GPU has 192 CUDA cores per SMX, 384 per GPC, and 1536 in all. It has 128 Texture Memory Units (TMUs) in all (16 per SMX, 32 per GPC), and 32 Raster Operations Processors (ROPs) in all. At several levels, transactions between the various components are cached to avoid wasting clock cycles, which in turn translates to energy efficiency.
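
The totals follow directly from the hierarchy described above (four GPCs, two SMX units per GPC). A minimal sketch of that arithmetic, with constant and function names chosen purely for illustration:

```python
# Hierarchy figures taken from the architecture description above.
GPCS = 4                   # Graphics Processing Clusters per GPU
SMX_PER_GPC = 2            # SMX units per GPC
CUDA_CORES_PER_SMX = 192
TMUS_PER_SMX = 16


def unit_totals():
    """Multiply the per-SMX counts up through the GPC and GPU levels."""
    smx_total = GPCS * SMX_PER_GPC
    return {
        "smx": smx_total,
        "cuda_cores_per_gpc": SMX_PER_GPC * CUDA_CORES_PER_SMX,
        "cuda_cores": smx_total * CUDA_CORES_PER_SMX,
        "tmus_per_gpc": SMX_PER_GPC * TMUS_PER_SMX,
        "tmus": smx_total * TMUS_PER_SMX,
    }


if __name__ == "__main__":
    for name, count in unit_totals().items():
        print(f"{name}: {count}")
```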

 
NVIDIA also introduced a new anti-aliasing (AA) algorithm called TXAA. A few new AA algorithms have already been introduced in the recent past, such as FXAA, SMAA and SRAA, which raised the bar on quality at a lower performance impact. TXAA seeks to raise it even further, with image quality comparable to high levels of MSAA at the performance penalty of 4x MSAA. TXAA is a hybrid of hardware multi-sampling, temporal AA, and a customized AA resolve. It has two levels: TXAA1 and TXAA2. The former offers image quality comparable to 8x MSAA with the performance penalty of 2x MSAA; the latter offers image quality beyond 8x MSAA with a performance penalty comparable to 4x MSAA. Since lower MSAA levels are practically "free" on today's GPUs, TXAA will wipe the floor with the competition in terms of image quality, but there's a catch: applications have to be customized to take advantage of TXAA. This is where NVIDIA's developer-relations muscle should kick in. We expect a fairly decent proliferation of TXAA among upcoming games.

NVIDIA has also added an FXAA option to the driver control panel, which enables it in all games without the need for any integration from game developers.

 
The last of the three big features is Adaptive V-Sync. The feature improves on traditional V-Sync by dynamically adjusting the frame limiter to ensure smoother gameplay. Traditional V-Sync only sends a frame to the screen at a full screen refresh. This means that if a frame arrives late, because the GPU took longer than one refresh interval to render it, it has to wait for the next refresh before it can be displayed, effectively halving the frame rate to 30 FPS on a 60 Hz display. If rendering a frame takes longer than two full refreshes, the frame rate drops even further, to 20 FPS. These frame-rate jumps are very noticeable during gaming because they are so large.
Adaptive V-Sync makes the transition between the dropped frame rate and the synchronized frame rate smoother, alleviating lag. It achieves this by dynamically adjusting the value that V-Sync takes into account when limiting frame rates. I did some testing of this feature and found it to work as advertised. Of course, this does not completely eliminate frame-rate differences, but it makes them less noticeable.
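
The 30 FPS and 20 FPS steps of traditional V-Sync described above are plain arithmetic: a finished frame has to wait for the next refresh, so each frame occupies a whole number of refresh intervals. The sketch below models only that traditional behavior, assuming a 60 Hz display; `vsync_fps` and `uncapped_fps` are names made up for this example, and nothing here represents NVIDIA's Adaptive V-Sync implementation.

```python
import math


def uncapped_fps(render_time_ms):
    """Frame rate the GPU could deliver with no synchronization at all."""
    return 1000.0 / render_time_ms


def vsync_fps(render_time_ms, refresh_hz=60.0):
    """Frame rate under traditional V-Sync for a constant render time.

    Each frame is shown at the first refresh after it is finished, so it
    effectively occupies a whole number of refresh intervals.
    """
    refresh_interval_ms = 1000.0 / refresh_hz
    intervals = max(1, math.ceil(render_time_ms / refresh_interval_ms))
    return refresh_hz / intervals


if __name__ == "__main__":
    # 16 ms fits in one 60 Hz refresh, 17 ms just misses it,
    # and 34 ms just misses the second refresh as well.
    for render_ms in (16.0, 17.0, 34.0):
        print(
            f"{render_ms:.0f} ms per frame: "
            f"{uncapped_fps(render_ms):.1f} FPS uncapped, "
            f"{vsync_fps(render_ms):.1f} FPS with V-Sync"
        )
```

A frame that takes 17 ms instead of 16 ms costs only a few FPS without synchronization, but with traditional V-Sync it halves the displayed frame rate, which is the abrupt drop that Adaptive V-Sync is meant to soften.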

The Card


 

NVIDIA's reference design cooler follows the company's black and green cooler theme. The board looks immediately familiar, following the design of great products such as the GTX 580.




The card requires two slots in your system.




Display connectivity options include one dual-link DVI port (with analog VGA), one dual-link DVI port (digital only), one full-size HDMI port and one full-size DisplayPort. You may use all the outputs at the same time, thanks to the new display output logic that NVIDIA introduced with Kepler. This also makes triple-screen multi-monitor gaming easy, since you now need only one card.



An HDMI sound device is included in the GPU, too. It is HDMI 1.4a compatible, which includes HD audio and support for 4K and Blu-ray 3D movies. The DisplayPort output is version 1.2, which enables the use of hubs and Multi-Stream Transport.




You may combine up to four GeForce GTX 680 cards from any vendor in a multi-GPU SLI configuration for higher framerates or better image quality settings.


 

Pictured above are photos of the front and back, showing the disassembled board. High-res versions are also available (front, back). If you choose to use these images for voltmods etc., please include a link back to this site or let us post your article.



Source & Readmore article: Techpowerup.com


