How time flies. The last time we had a bleeding-edge graphics card from NVIDIA to put through a trial by fire was exactly a year ago, with the launch of the GeForce GTX 590. Quite a bit has changed since then. Rival AMD is just about done with the launch of its Radeon HD 7000 series "Southern Islands" GPU family, and NVIDIA's answer has been eagerly awaited for the past three months or so. Well, it's finally here: our GeForce GTX 680 Kepler review.
Both evolutionary and revolutionary changes have gone into making the GeForce GTX 680. It's evolutionary in that it's designed as an upgrade over its immediate predecessor (and not something two generations behind), while revolutionary new features make sure the GTX 680 is a worthy upgrade. The evolutionary goal is that NVIDIA wants to reclaim the title of having the fastest GPU out there; the revolutionary part is a vibrant feature set that supposedly contributes to never-before-seen energy-efficiency levels, helping accomplish those steep design goals.
Product Positioning
Speaking of positioning, NVIDIA is gunning for nothing short of the performance crown for single-GPU graphics cards with the GeForce GTX 680, and it wants to do so without having to compromise on energy efficiency. In the past, we have seen both NVIDIA and AMD throw energy efficiency to the wind and come up with power-guzzling chips in a blind pursuit of performance leadership. That's not the case with the GeForce GTX 680.
To be fair to the HD 7970, it did impress us with its performance-per-Watt figures. What NVIDIA is looking to do is raise the bar on energy efficiency further. To accomplish that with the Kepler-based GTX 680, and go on to seek performance leadership at the same time, is a very tough ask that takes some gritty engineering.
One revolutionary change that allows the GeForce GTX 680 to aim high is an extremely smart self-tuning logic that fine-tunes clock speeds and voltages on the fly, with zero user intervention, to yield the best possible combination of performance and efficiency for a given load scenario. The GTX 680 hence replaces the notion of a fixed load clock speed with dynamic clock speeds. Think of it as a GPU take on Intel's Turbo Boost technology, which works in conjunction with SpeedStep to produce the best performance-per-Watt for CPUs that feature it.
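To make the idea concrete, here is a toy sketch of such a control loop. Everything below is illustrative: the names, thresholds, and step size are our own placeholders, not NVIDIA's actual boost logic, which samples power draw, temperature, and utilization in hardware many times per second.

```cpp
#include <algorithm>

// Toy model of a dynamic-clock control loop. All names and numbers are
// hypothetical stand-ins for what the hardware/firmware actually does.
struct BoostState {
    int clockMHz;                  // current core clock
};

constexpr int    BASE_CLOCK = 1006; // advertised base clock of the GTX 680
constexpr int    MAX_BOOST  = 1110; // highest boost bin (illustrative)
constexpr int    STEP_MHZ   = 13;   // size of one boost bin (illustrative)
constexpr double POWER_TARGET_W = 195.0; // board power target (illustrative)

// Called periodically: raise the clock while there is power headroom,
// back off as soon as the board exceeds its power target.
void updateBoost(BoostState &s, double measuredPowerW)
{
    if (measuredPowerW < POWER_TARGET_W * 0.95)
        s.clockMHz = std::min(s.clockMHz + STEP_MHZ, MAX_BOOST);
    else if (measuredPowerW > POWER_TARGET_W)
        s.clockMHz = std::max(s.clockMHz - STEP_MHZ, BASE_CLOCK);
}
```

The design point to notice is that the loop never drops below the base clock under normal loads, so the advertised speed acts as a floor while light workloads get "free" extra clocks from unused power headroom.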
| | Radeon HD 7870 | GeForce GTX 580 | Radeon HD 7950 | Radeon HD 7970 | GeForce GTX 680 | Radeon HD 6990 | GeForce GTX 590 |
|---|---|---|---|---|---|---|---|
| Shader Units | 1280 | 512 | 1792 | 2048 | 1536 | 2x 1536 | 2x 512 |
| ROPs | 32 | 48 | 32 | 32 | 32 | 2x 32 | 2x 48 |
| Graphics Processor | Pitcairn | GF110 | Tahiti | Tahiti | GK104 | 2x Cayman | 2x GF110 |
| Transistors | 2800M | 3000M | 4310M | 4310M | 3540M | 2x 2640M | 2x 3000M |
| Memory Size | 2048 MB | 1536 MB | 3072 MB | 3072 MB | 2048 MB | 2x 2048 MB | 2x 1536 MB |
| Memory Bus Width | 256-bit | 384-bit | 384-bit | 384-bit | 256-bit | 2x 256-bit | 2x 384-bit |
| Core Clock | 1000 MHz | 772 MHz | 800 MHz | 925 MHz | 1006 MHz+ | 830 MHz | 607 MHz |
| Memory Clock | 1200 MHz | 1002 MHz | 1250 MHz | 1375 MHz | 1502 MHz | 1250 MHz | 855 MHz |
| Price | $360 | $390 | $450 | $550 | $499 | $700 | $750 |
Architecture
At the heart of the GeForce GTX 680 is the Kepler architecture. Its design goals are to raise performance and energy efficiency over the previous-generation "Fermi" architecture. Kepler more or less maintains the basic component hierarchy of Fermi, which emphasizes a fast, highly parallelized component load-out. Think of the hierarchy as a Bento container. At the topmost level are the PCI-Express Gen 3.0 host interface, a 256-bit wide GDDR5 memory interface, and a highly-tweaked NVIDIA GigaThread Engine, which transacts processed and unprocessed data between the host and memory interfaces.
Downstream of the GigaThread Engine are four Graphics Processing Clusters (GPCs). Each GPC is a self-contained GPU subunit, since it has nearly every component an independent GPU does. The GPC has one shared resource and two dedicated resources. The shared resource is the Raster Engine, which handles high-level raster operations such as edge setup and Z-cull. The dedicated resources are the two next-generation Streaming Multiprocessor-X (SMX) units; a large chunk of the architectural improvements have gone into perfecting this component. The SMX is the GPU's number-crunching resource. It is highly parallelized, to handle the kind of computing loads that tomorrow's 3D applications demand.
The SMX further has shared and dedicated resources. The next-generation PolyMorph 2.0 Engine handles low-level geometry operations, such as vertex fetch, tessellation, viewport transformation, and attribute setup. The dedicated components are where the number-crunching action happens: four Warp Schedulers marshal instructions and data between 192 CUDA cores. That per-SM core count is a six-fold increase over the GF110, and four-fold over the GF114. There are 16 texture memory units per SMX, which are cached. The Warp Schedulers are backed by a more efficient software pre-decode scheme that reduces the number of steps needed to issue instructions. Essentially, at shader compile time the shader compiler, which is a component of NVIDIA's driver, evaluates the instruction stream, reorders instructions as needed, and provides extra scheduling hints by attaching additional information to instructions. It can do so because it has a complete view of the shader code.
NVIDIA also introduced what it calls Bindless Textures. In the classical GPU model, to reference a texture, the GPU has to allocate it a slot in a fixed-size binding table, which limits the number of textures a shader can access at a given time; that limit was 128 with the previous-generation Fermi architecture. The Kepler architecture removes this binding step: a shader can reference textures directly in memory, without needing a conventional binding table. The number of textures a shader can reference is hence practically unlimited, or about one million if you want a figure. This makes rendering highly complex scenes a breeze, because they can be drawn with fewer passes.
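On the compute side, this bindless model is exposed through CUDA's texture objects (introduced with CUDA 5.0 for Kepler-class hardware): a texture becomes an ordinary handle that a kernel receives as a parameter, instead of an entry in a fixed binding table. Below is a minimal, self-contained sketch of that API; the dimensions and sampling settings are arbitrary, and the data upload is omitted for brevity.

```cuda
#include <cuda_runtime.h>

// The kernel receives the texture as a plain handle -- no binding slot,
// no fixed-size table. Any number of such handles can be passed around.
__global__ void sampleKernel(cudaTextureObject_t tex, float *out, int w, int h)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < w && y < h)
        out[y * w + x] = tex2D<float>(tex, x + 0.5f, y + 0.5f);
}

int main()
{
    const int w = 256, h = 256;

    // Allocate a CUDA array to back the texture (data upload omitted).
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    cudaArray_t arr;
    cudaMallocArray(&arr, &desc, w, h);

    // Describe the resource and the sampling parameters.
    cudaResourceDesc resDesc = {};
    resDesc.resType = cudaResourceTypeArray;
    resDesc.res.array.array = arr;

    cudaTextureDesc texDesc = {};
    texDesc.addressMode[0] = cudaAddressModeClamp;
    texDesc.addressMode[1] = cudaAddressModeClamp;
    texDesc.filterMode = cudaFilterModePoint;
    texDesc.readMode = cudaReadModeElementType;

    // Create the texture object: a handle, not a slot in a binding table.
    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &resDesc, &texDesc, nullptr);

    float *out;
    cudaMalloc(&out, w * h * sizeof(float));
    dim3 block(16, 16), grid((w + 15) / 16, (h + 15) / 16);
    sampleKernel<<<grid, block>>>(tex, out, w, h);
    cudaDeviceSynchronize();

    cudaDestroyTextureObject(tex);
    cudaFreeArray(arr);
    cudaFree(out);
    return 0;
}
```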
To sum it all up, the GK104 GPU has 192 CUDA cores per SMX, 384 per GPC, and 1536 in all. It has 128 Texture Memory Units (TMUs) in all (16 per SMX, 32 per GPC), and 32 Raster Operations Processors (ROPs). At several levels, transactions between the various components are cached to avoid wasting clock cycles, which in turn translates to energy efficiency.
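The totals follow directly from the hierarchy described above: four GPCs, each carrying two SMX units. A quick back-of-the-envelope check:

```cpp
#include <cstdio>

int main()
{
    const int gpcs = 4, smxPerGpc = 2;
    const int coresPerSmx = 192, tmusPerSmx = 16;

    printf("CUDA cores per GPC: %d\n", smxPerGpc * coresPerSmx);        // 384
    printf("CUDA cores total:   %d\n", gpcs * smxPerGpc * coresPerSmx); // 1536
    printf("TMUs total:         %d\n", gpcs * smxPerGpc * tmusPerSmx);  // 128
    return 0;
}
```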
NVIDIA also introduced a new anti-aliasing (AA) algorithm called TXAA. A few new AA algorithms have been introduced in the recent past, such as FXAA, SMAA, and SRAA, which raised the bar for quality at a lower performance impact. TXAA seeks to raise it even further, with image quality comparable to high levels of MSAA at the performance penalty of 4x MSAA. TXAA is a hybrid between hardware multi-sampling, temporal AA, and a customized AA resolve. It has two levels: TXAA1 and TXAA2. The former offers image quality comparable to 8x MSAA with the performance penalty of 2x MSAA; the latter offers image quality beyond 8x MSAA with a performance penalty comparable to 4x MSAA. Since lower MSAA levels are practically "free" with today's GPUs, TXAA should wipe the floor with the competition in terms of image quality, but there's a catch: applications have to be customized to take advantage of TXAA. This is where NVIDIA's developer-relations muscle should kick in. We expect a fairly decent proliferation of TXAA among upcoming games.
NVIDIA has also added an FXAA option to the driver control panel, which enables it in all games without the need for any integration from game developers.
The last of the three big features is Adaptive V-Sync. The feature improves on traditional V-Sync by dynamically adjusting the frame limiter to ensure smoother gameplay. Traditional V-Sync only flips completed frames to the screen on a full screen refresh. This means that if a frame arrives late, because the GPU took longer to render it, it has to wait for the next refresh before it can be displayed, effectively reducing the frame rate on a 60 Hz display to 30 FPS. If rendering a frame takes longer than two full refreshes, the frame rate drops further, to 20 FPS. These frame-rate swings are very noticeable during gaming because they are so large.
What Adaptive V-Sync does is make the transition between dropped and synchronized frame rates smooth, alleviating lag. It achieves this by dynamically adjusting the limit that V-Sync imposes on the frame rate. I did some testing of this feature and found it to work as advertised. Of course it does not completely eliminate frame-rate differences, but it makes them less noticeable.
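In essence, the decision reduces to a per-frame check against the refresh budget. The sketch below is our own simplification with hypothetical names; the real logic lives inside the driver.

```cpp
// Toy model of the adaptive decision. On a 60 Hz display the frame
// budget is about 16.7 ms.
constexpr double REFRESH_INTERVAL_MS = 1000.0 / 60.0;

// Classic V-Sync makes a late frame wait for the next refresh, so the
// rate snaps from 60 to 30 to 20 FPS. Adaptive V-Sync instead disables
// the wait for frames that miss the budget, letting them display
// immediately so the frame rate degrades smoothly instead of halving.
bool shouldSyncThisFrame(double lastFrameTimeMs)
{
    // Fast enough to hit the refresh? Keep V-Sync on to avoid tearing.
    // Too slow? Skip the wait for this frame.
    return lastFrameTimeMs <= REFRESH_INTERVAL_MS;
}
```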
The Card
NVIDIA's reference-design cooler follows the company's black and green theme. The board looks immediately familiar, following the design of great products such as the GTX 580.
The card requires two slots in your system.
Display connectivity options include one dual-link DVI port (with analog VGA), one dual-link DVI port (digital only), one full-size HDMI port, and one full-size DisplayPort. You may use all the outputs at the same time, thanks to the new display output logic that NVIDIA introduced with Kepler. This also makes triple-monitor gaming easy, since you need only one card.
An HDMI sound device is included in the GPU, too. It is HDMI 1.4a compatible, which includes HD audio and support for 4K resolution and Blu-ray 3D movies. The DisplayPort output is version 1.2, which enables the use of hubs and Multi-Stream Transport.
You may combine up to four GeForce GTX 680 cards from any vendor in a multi-GPU SLI configuration for higher framerates or better image quality settings.
Pictured above are photos of the front and back of the disassembled board. High-res versions are also available (front, back). If you choose to use these images for voltmods etc., please include a link back to this site or let us post your article.
Source & full article: Techpowerup.com