Video cards: getting to know the Pascal GP104 lineup

2016 is already coming to an end, but its contribution to the gaming industry will stay with us for a long time. Firstly, video cards from the red camp received an unexpectedly successful update in the mid-price range, and secondly, NVIDIA once again proved that it does not hold 70% of the market for nothing. The Maxwells were good, and the GTX 970 was rightly considered one of the best cards for the money, but Pascal is a different matter entirely.

The new generation of hardware, in the form of the GTX 1080 and 1070, has literally buried last year's systems and the flagship used-hardware market, while the "junior" GTX 1060 and 1050 have consolidated that success in the more affordable segments. Owners of the GTX 980 Ti and assorted Titans are crying bitter tears: their uber-guns, bought for many thousands of rubles, lost 50% of their value and 100% of their bragging rights overnight. NVIDIA itself claims that the 1080 is faster than last year's Titan X, the 1070 easily "stomps" the 980 Ti, and the relatively budget 1060 will hurt the owners of everything else.

Where this high performance really comes from, what to do about it all on the eve of the holidays and sudden financial windfalls, and what exactly to treat yourself to, you can find out in this long and slightly boring article.

You can love Nvidia or... not love it, but only a visitor from an alternate universe would deny that it is currently the leader in video card engineering. Since AMD's Vega has not been announced yet, we have not seen the flagship RX cards on Polaris, and the R9 Fury, with its 4 GB of experimental memory, can hardly be considered a promising card (VR and 4K want a bit more than it has), we have what we have. While the 1080 Ti and the hypothetical RX 490, RX Fury and RX 580 are just rumors and expectations, we have time to sort through the current NVIDIA lineup and see what the company has achieved in recent years.

The mess and the origins of Pascal

NVIDIA regularly gives people reasons to dislike it. The story of the GTX 970 and its "3.5 GB of memory", the "NVIDIA, Fuck you!" from Linus Torvalds, the utter chaos in the desktop graphics lineups, the refusal to work with the free and far more widespread FreeSync in favor of its own proprietary system... In short, there are plenty of reasons. One of the most annoying things for me personally is what happened with the last two generations of video cards. Roughly speaking, "modern" GPUs date back to the era of DX10 support. And if you look for the "grandfather" of today's 10-series, the beginning of the modern architecture lies around the 400 series of video accelerators and the Fermi architecture. It was there that the idea of a "block" design built from so-called "CUDA cores", in NVIDIA's terminology, first appeared.

Fermi

If the 8000, 9000 and 200 series cards were the first steps in mastering the very concept of a "modern architecture" with unified shader processors (like AMD, yes), then the 400 series was already as close as it gets to what we see in some GTX 1070. Yes, Fermi still carried a small legacy crutch from previous generations: the shader block ran at twice the frequency of the core responsible for geometry calculations, but the overall picture of a GTX 480 is not that different from a 780: SM multiprocessors are grouped into clusters, the clusters communicate with the memory controllers through a shared cache, and the results are output by a rasterization block common to the cluster:


Block diagram of the GF100 processor used in the GTX 480.

The 500 series was still the same Fermi, slightly improved "inside" and with fewer manufacturing defects, so the top solutions received 512 CUDA cores instead of the previous generation's 480. Visually, the block diagrams look like twins:


The GF110 is the heart of the GTX 580.

Frequencies were raised here and there and the chip itself was tweaked slightly, but there was no revolution: the same 40 nm process and 1.5 GB of video memory on a 384-bit bus.

Kepler

With the arrival of the Kepler architecture, a lot changed. We can say that it was this generation that gave NVIDIA video cards the development vector that led to today's models. Not only the GPU architecture changed, but also the way new hardware was developed inside NVIDIA. Where Fermi focused on finding a solution that would deliver high performance, Kepler bet on energy efficiency, sensible use of resources, high frequencies and making it easy to optimize game engines for the capabilities of a high-performance architecture.

Serious changes were made to the GPU design: the basis was not the "flagship" GF100/GF110, but the "budget" GF104/GF114 used in one of the most popular cards of the time, the GTX 460.


The overall processor architecture was simplified by using only two large blocks with four unified shader multiprocessor modules each. The layout of the new flagships looked something like this:


GK104 installed in GTX 680.

As you can see, each compute unit has put on serious weight compared to the previous architecture and is now called SMX. Compare the structure of the block with what is shown above in the Fermi section.


The SMX multiprocessor of the GK104 GPU

The 600 series had no video cards based on a full-fledged processor containing six blocks of compute modules; the flagship was the GTX 680 with the GK104, and the only thing cooler was the "two-headed" 690, which simply had two such processors laid out on one board with all the necessary wiring and memory. A year later the flagship GTX 680, with minor changes, became the GTX 770, and the crown of Kepler's evolution was the cards based on the GK110 die: the GTX Titan and Titan Z, the 780 Ti and the plain 780. Inside, it was still the same 28 nanometers; the only qualitative improvement (which did NOT make it into consumer GK110-based cards) was double-precision performance.

Maxwell

The first video card based on the Maxwell architecture was... the NVIDIA GTX 750 Ti. A little later came its cut-down versions, the GTX 750 and 745 (the latter supplied only as an OEM solution), and at the time of their appearance these lower-end cards really shook up the market for inexpensive video accelerators. The new architecture was trialled on the GM107 chip: a tiny piece of the future flagships with their huge heatsinks and frightening prices. It looked something like this:


Yes, there is only one compute cluster here, but look how much more complex it is than its predecessor; compare for yourself:


Instead of the large SMX block that served as the basic "building brick", the GPU is built from new, more compact SMM blocks. Kepler's basic compute units were good, but suffered from poor utilization, plain instruction starvation: the scheduler could not spread instructions across such a large number of execution units. The Pentium 4 had roughly the same problem: the hardware sat idle, and a branch-prediction error was very costly. In Maxwell, each compute module was divided into four parts, each with its own instruction buffer and its own warp scheduler (a warp being a group of threads executing the same operation). As a result, efficiency went up, the GPUs themselves became more flexible than their predecessors and, most importantly, the new architecture was worked out at little cost on a fairly simple die. History moves in a spiral, heh.
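To make the term concrete, here is a minimal Python sketch of what a "warp" means in this context: one decoded instruction applied to a whole group of 32 threads at once, so each partition's scheduler only has to pick which ready warp issues next. The lane count is the real warp size; everything else is a toy illustration of mine, not NVIDIA hardware behavior.

```python
WARP_SIZE = 32  # warp size on NVIDIA GPUs

def issue(instruction, warp_operands):
    """Apply ONE decoded instruction to all 32 lanes of a warp (SIMT execution)."""
    op = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}[instruction]
    return [op(a, b) for a, b in warp_operands]  # one instruction, 32 results

# Each of the four partitions of a Maxwell/Pascal multiprocessor keeps its own
# buffer of such warps and its own scheduler picking a ready one every cycle.
warp = [(lane, 2) for lane in range(WARP_SIZE)]
print(issue("mul", warp)[:4])  # -> [0, 2, 4, 6]
```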

Mobile solutions benefited the most from these innovations: die area grew by about a quarter, while the number of multiprocessor execution units nearly doubled. As luck would have it, it was the 700 and 800 series that created the biggest mess in the naming scheme. Within the 700 series alone there were cards based on Kepler, Maxwell and even Fermi! That is why the desktop Maxwells, to get away from the hodgepodge of previous generations, received a separate 900 series, from which the GTX 9xxM mobile cards later spun off.

Pascal - logical development of the Maxwell architecture

What was laid down in Kepler and continued in the Maxwell generation remained in Pascal: the first consumer video cards are based on the not-so-large GP104 chip, which consists of four graphics processing clusters. The full-size, six-cluster GP102 went into the expensive semi-professional GPU sold under the TITAN X brand. However, even the "cut-down" 1080 flies so fast that past generations feel sick.

Performance improvement

The foundation of foundations

Maxwell became the foundation of the new architecture; the block diagrams of comparable processors (GM204 and GP104) look almost identical, the main difference being the number of multiprocessors packed into each cluster. Kepler (the 700 generation) had two large SMX multiprocessors per cluster; in Maxwell each was split into four parts with the necessary supporting logic (and the name changed to SMM). In Pascal, two more were added to the existing eight per block, making ten, and the abbreviation was shuffled once again: single multiprocessors are now simply called SM.


Otherwise, the diagrams look completely alike. In truth, there are even more changes inside.

Engine of progress

There are indecently many changes inside the multiprocessor block. To avoid the truly boring details of what was reworked, how it was optimized and how things used to be, I will describe the changes very briefly; some of you are yawning already.

First of all, Pascal fixed up the part responsible for the geometry of the picture. This matters for multi-monitor configurations and VR headsets: with proper support from the game engine (and that support is coming soon thanks to NVIDIA's efforts), the video card can calculate the geometry once and produce several projections of it, one for each screen. This significantly reduces the load in VR, not only for triangle work (where the gain is a straight twofold), but also for the pixel pipeline.

A typical 980 Ti will process the geometry twice (once per eye) and then texture and post-process each of the two images, handling a total of about 4.2 million points, of which only about 70% will actually be used; the rest will be clipped or fall into areas that are simply never shown to either eye.

The 1080 will process the geometry once, and pixels that do not end up in the final image simply will not be computed.
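To show why one geometry pass for several projections saves so much work, here is a rough numpy sketch of the idea described above. The matrices, vertex counts and function names are placeholders of my own, not NVIDIA's Simultaneous Multi-Projection implementation.

```python
import numpy as np

def process_geometry(vertices_obj, model_view):
    """The expensive part (animation, world/view transform): done ONCE per frame."""
    return vertices_obj @ model_view.T

def project(vertices_view, projection):
    """The cheap per-viewport part: one extra matrix multiply per eye/screen."""
    clip = vertices_view @ projection.T
    return clip[:, :3] / clip[:, 3:4]  # perspective divide

vertices = np.random.rand(10_000, 4)
vertices[:, 3] = 1.0                              # homogeneous coordinates
model_view = np.eye(4)                            # placeholder transform
left_eye, right_eye = np.eye(4), np.eye(4)        # placeholder projection matrices

shared = process_geometry(vertices, model_view)              # geometry pass, once
views = [project(shared, p) for p in (left_eye, right_eye)]  # reused for each eye
print(len(views), views[0].shape)                 # 2 projections from one pass
```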


With the pixel side of things, it is actually even better. Memory bandwidth can only be increased on two fronts (higher frequency and more bits per clock), both cost money, and the GPU's "hunger" for memory grows more pronounced every year thanks to rising resolutions and the spread of VR, so all that remains is to improve the "free" ways of raising effective bandwidth. If you cannot widen the bus or raise the frequency, you have to compress the data. Previous generations already had hardware compression, but Pascal takes it to a new level. Again, we will skip the boring math and take a ready-made example from NVIDIA. On the left is Maxwell, on the right Pascal; the pixels whose color component was compressed losslessly are filled in pink.


Instead of transferring raw 8x8-pixel tiles, the memory stores an "average" color plus a matrix of deviations from it; such data takes from ½ to ⅛ of the original volume. In real workloads, the load on the memory subsystem dropped by 10 to 30%, depending on the number of gradients and the uniformity of fills in complex scenes.
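Here is a toy Python model of the scheme just described: a tile stored as an "average" (anchor) color plus small per-pixel deviations. The bit widths and the fallback behavior are my assumptions for illustration; the real hardware formats are not public.

```python
import numpy as np

def compress_tile(tile):
    """Store an 8x8 single-channel tile as anchor color + per-pixel deltas."""
    anchor = int(tile.mean())
    deltas = tile.astype(int) - anchor
    span = int(max(abs(deltas.min()), abs(deltas.max())))
    delta_bits = max(1, span.bit_length() + 1)     # sign bit included (assumption)
    raw_bits = tile.size * 8                       # 8 bits per pixel uncompressed
    packed_bits = 8 + tile.size * delta_bits       # anchor + packed deltas
    return raw_bits / packed_bits                  # ratio < 1 means the hardware
                                                   # would just store the tile raw

rng = np.random.default_rng(0)
smooth = 120 + rng.integers(-3, 4, size=(8, 8))    # gentle gradient: compresses well
noisy = rng.integers(0, 256, size=(8, 8))          # noise: does not compress
print(f"smooth tile {compress_tile(smooth):.1f}:1, noisy tile {compress_tile(noisy):.1f}:1")
```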


Even that seemed not enough to the engineers, so the flagship card (the GTX 1080) got memory with higher bandwidth: GDDR5X transfers twice as many data bits (not instructions) per clock and delivers more than 10 Gbit/s per pin at peak. Transferring data at such a crazy speed required a completely new memory layout on the board, and overall memory efficiency grew by 60-70% compared to the previous generation's flagships.
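A quick back-of-the-envelope check of those numbers, assuming the commonly quoted GTX 1080 configuration (10 Gbit/s per pin on a 256-bit bus):

```python
bus_width_bits = 256        # GP104 memory bus
per_pin_gbit_s = 10         # GDDR5X effective data rate per pin
bandwidth_gb_s = bus_width_bits * per_pin_gbit_s / 8
print(bandwidth_gb_s)       # -> 320.0 GB/s, the figure quoted in the specs below
```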

Reduce delays and downtime

Video cards have long been doing more than just graphics. Physics is often tied to animation frames and is wonderfully parallel, which means it is far more efficient to compute on the GPU. But the biggest source of problems lately has been the VR industry. Many game engines, development methodologies and a pile of other graphics technologies were simply not designed for VR; the case where the camera moves or the user's head changes position in the middle of rendering a frame was simply never handled. If you leave things as they are, the desynchronization between the video stream and your movements causes bouts of motion sickness and simply breaks immersion, which means "wrong" frames have to be thrown away after rendering and the work started over. And that adds new delays before the picture reaches the display, which does nothing good for performance.

Pascal addresses this with dynamic load balancing and asynchronous interrupts: execution units can now either interrupt the current task (saving intermediate results to cache) to handle something more urgent, or simply discard the half-drawn frame and start a new one, significantly cutting the latency of image formation. The main beneficiaries are of course VR and games, but the technology also helps general-purpose compute: particle collision simulation saw a 10-20% performance gain.
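As a purely illustrative sketch of the scheduling idea (not the actual hardware preemption mechanism), here is a toy Python scheduler in which a long frame-render job yields at checkpoints and an urgent "timewarp" task jumps the queue:

```python
import heapq

def render_frame(frame_id, tiles=6):
    for tile in range(tiles):
        yield f"frame {frame_id}: tile {tile}"   # checkpoint: state saved, preemptible here

def timewarp(frame_id):
    yield f"frame {frame_id}: reprojected for the latest head pose"

URGENT, NORMAL = 0, 1
queue = [(NORMAL, 0, render_frame(1))]
step = 0
while queue:
    prio, order, task = heapq.heappop(queue)
    try:
        print(next(task))
    except StopIteration:
        continue                                  # task finished, drop it
    heapq.heappush(queue, (prio, order, task))    # otherwise put it back
    step += 1
    if step == 3:                                 # the VR runtime suddenly needs a timewarp
        heapq.heappush(queue, (URGENT, step, timewarp(1)))
```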

Boost 3.0

NVIDIA video cards gained automatic overclocking a long time ago, back in the 700 generation on the Kepler architecture. Maxwell improved it, but it was still, to put it mildly, so-so: yes, the card ran a bit faster as long as the thermal budget allowed, and the extra 20-30 MHz on the core and 50-100 on the memory, wired in at the factory, gave a boost, but a small one. It worked like this:


Even if there was headroom in GPU temperature, performance did not increase. With Pascal, the engineers shook up this dusty swamp. Boost 3.0 works on three fronts: temperature analysis, clock speed increases and on-chip voltage increases. Now every last drop is squeezed out of the GPU: the standard NVIDIA drivers do not do this, but vendor software lets you build a profiling curve in one click, one that accounts for the quality of your specific card sample.

EVGA was one of the first in this field, its Precision XOC utility has an NVIDIA certified scanner that sequentially goes through the entire range of temperatures, frequencies and voltages, achieving maximum performance in all modes.

Add a new process node, fast memory, all sorts of optimizations and a lower chip TDP, and the result is simply indecent. From a "base" 1500 MHz, the GTX 1060 can be pushed past 2000 MHz if you get a good sample and the vendor does not skimp on cooling.

Improving the quality of the picture and perception of the game world

Performance has improved on all fronts, but there are some areas that have seen no qualitative change for years: the quality of the displayed image. And this is not about graphical effects, which are the game developers' department, but about what exactly we see on the monitor and how the game looks to the end user.

Fast vertical sync

Pascal's most important feature here is the triple buffer for frame output, which provides both ultra-low rendering latency and vertical synchronization. One buffer holds the image being displayed, another holds the last completed frame, and the current frame is drawn into the third. Goodbye horizontal stripes and tearing, hello high performance. There are none of the delays that classic V-Sync introduces (nothing holds the video card back, so it always renders at the highest possible frame rate), and only fully formed frames are sent to the monitor. I think I will write a separate big post after the new year about V-Sync, G-Sync, FreeSync and this new fast sync algorithm from NVIDIA; there are too many details.
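A simplified Python model of that three-buffer scheme, written from the description above rather than from NVIDIA's driver code:

```python
class FastSyncBuffers:
    def __init__(self):
        self.front = 0          # being scanned out to the monitor
        self.last_complete = 1  # most recent fully rendered frame
        self.back = 2           # currently being rendered into

    def frame_rendered(self):
        # GPU finished a frame: it becomes "last complete", the old one is reused.
        self.last_complete, self.back = self.back, self.last_complete

    def vblank(self):
        # Monitor refresh: always flip to a *complete* frame, never a torn one.
        self.front, self.last_complete = self.last_complete, self.front
        return self.front

buffers = FastSyncBuffers()
for _ in range(3):              # the GPU runs ahead of the display without stalling
    buffers.frame_rendered()
print("scanned out buffer:", buffers.vblank())
```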

Normal screenshots

No, the screenshots we have now are simply a disgrace. Almost all games use a pile of technologies to make the picture in motion amazing and breathtaking, and screenshots have become a real nightmare: instead of the stunningly realistic picture built from animation and special effects that exploit the quirks of human vision, you get some angular who-knows-what with strange colors and an utterly lifeless image.

The new NVIDIA Ansel technology solves the screenshot problem. Yes, it requires game developers to integrate special code, but the actual work is minimal and the payoff is huge. Ansel can pause the game, hand camera control over to you, and then it is all down to your creativity. You can simply take a shot without the GUI and from your favorite angle.


You can render an existing scene in ultra-high resolution, shoot 360-degree panoramas, stitch them into a plane, or leave them in three-dimensional form for viewing in a VR helmet. Take a photo with 16 bits per channel, save it as a kind of RAW file, and then play with exposure, white balance and other settings so that the screenshots become attractive again. We expect tons of cool content from game fans in a year or two.

Video sound processing

The new NVIDIA GameWorks libraries add plenty of features for developers. They are mainly aimed at VR, speeding up various computations and improving image quality, but one feature is particularly interesting and worth mentioning. VRWorks Audio takes sound to a fundamentally new level: instead of computing it with banal averaging formulas based on distance and obstacle thickness, it performs a full trace of the audio signal, with all the reflections from the environment, reverberation and absorption in different materials. NVIDIA has a good video example of how the technology works:


Best watched with headphones.

Purely theoretically, nothing prevents such a simulation from running on Maxwell, but the optimizations for asynchronous execution and the new interrupt system built into Pascal let these calculations run without hitting the frame rate much.

Pascal in total

In fact there are even more changes, and many go so deep into the architecture that each would deserve a huge article of its own. The key innovations are the improved design of the chips themselves, low-level optimization of geometry processing and asynchronous operation with full interrupt handling, a pile of features tailored to high resolutions and VR and, of course, insane frequencies that past generations of video cards could not dream of. Two years ago the 780 Ti barely crossed the 1 GHz threshold; today a 1080 in some cases runs at two, and the credit goes not only to the move from 28 nm to 16 or 14 nm: many things are optimized at the lowest level, from the design of the transistors to their layout and wiring inside the chip itself.

For each individual case

The NVIDIA 10-series lineup turned out genuinely balanced and covers all gaming use cases quite densely, from "I just play strategy games and Diablo" to "I want top games in 4K". The game tests were chosen using one simple principle: cover as broad a range as possible with the smallest possible set of tests. BF1 is a great example of good optimization and lets us compare DX11 vs DX12 performance under identical conditions. DOOM was chosen for the same reason, only to compare OpenGL and Vulkan. The third "Witcher" acts here as a so-so-optimized game whose maximum graphics settings can bring any flagship to its knees purely by virtue of crappy code. It uses classic DX11, which is time-tested, thoroughly tuned in drivers and familiar to game developers. Overwatch stands in for all the "tournament" games with well-optimized code; it is mostly interesting for how high the average FPS gets in a game that is not very demanding graphically and is tuned to run on the "average" config found around the world.

I will give some general comments right away: Vulkan is very voracious in terms of video memory, for it this characteristic is one of the main indicators, and you will see this thesis reflected in benchmarks. DX12 on AMD cards behaves much better than on NVIDIA, if the "green" ones show an average FPS drawdown on new APIs, then the "red" ones, on the contrary, show an increase.

Junior division

GTX 1050

The junior NVIDIA card (without the Ti suffix) is not as interesting as its charged-up Ti sister. Its destiny is to be a gaming solution for MOBAs, strategy games, tournament shooters and other titles where detail and image quality interest almost no one, while a stable frame rate for minimal money is just what the doctor ordered.


None of the charts list the core frequency, because it varies from sample to sample: a 1050 without supplementary power may not overclock at all, while its sister with a 6-pin connector will easily hit a conditional 1.9 GHz. The most popular options are shown in terms of power and length; you can always find a card with a different board or different cooling that does not fit the listed "standards".

DOOM 2016 (1080p, ULTRA): OpenGL - 68 FPS, Vulkan - 55 FPS;
The Witcher 3: Wild Hunt (1080p, MAX, HairWorks Off): DX11 - 38 FPS;
Battlefield 1 (1080p, ULTRA): DX11 - 49 FPS, DX12 - 40 FPS;
Overwatch (1080p, ULTRA): DX11 - 93 FPS;

The GTX 1050 uses the GP107 graphics processor, inherited from the older card with a slight trimming of functional blocks. 2 GB of video memory will not let you run wild, but for e-sports disciplines and playing some tanks it is perfect, since prices for the junior card start at 9.5 thousand rubles. No supplementary power is required; the card only needs 75 watts from the motherboard via the PCI-Express slot. True, in this price segment there is also the AMD Radeon RX 460, which with the same 2 GB of memory is cheaper and barely inferior, and for roughly the same money you can get the RX 460 in a 4 GB version. Not that the extra memory helps it much, but it is some kind of reserve for the future. The choice of vendor is not that important: take whatever is available and does not empty your pocket by an extra thousand rubles, which is better spent on the cherished Ti suffix.

GTX 1050 Ti

Around 10 thousand for a regular 1050 is not bad, but for the charged-up (or full-fledged, call it what you like) version they ask a bit more (on average 1-1.5 thousand extra), and its internals are far more interesting. By the way, the whole 1050 series is not made by cutting down or binning "big" chips unfit for the 1060, but as a fully independent product. It has a smaller process node (14 nm) and a different fab (the dies are grown at Samsung's factory), and there are extremely interesting samples with supplementary power: the thermal budget and base consumption are still the same 75 W, but the overclocking potential and the ability to go beyond the permitted limits are completely different.


If you still play at Full HD (1920x1080), do not plan an upgrade and the rest of your hardware is three to five years old, this is a great way to boost performance in games at little expense. Focus on ASUS and MSI solutions with the extra 6-pin power connector; the Gigabyte options are not bad either, but the price is less encouraging.

DOOM 2016 (1080p, ULTRA): OpenGL - 83 FPS, Vulkan - 78 FPS;
The Witcher 3: Wild Hunt (1080p, MAX, HairWorks Off): DX11 - 44 FPS;
Battlefield 1 (1080p, ULTRA): DX11 - 58 FPS, DX12 - 50 FPS;
Overwatch (1080p, ULTRA): DX11 - 104 FPS.

Middle division

The 60-line cards have long been considered the best choice for those who do not want to spend a lot of money yet want to play at high graphics settings in everything released over the next couple of years. It started with the GTX 260, which came in two versions (a simpler one with 192 stream processors and a fatter one with 216 "stones"), continued through the 400, 500 and 700 generations, and now NVIDIA has once again hit an almost perfect combination of price and quality. Two "middle" versions are again available: the GTX 1060 with 3 GB and with 6 GB of video memory differ not only in the amount of available RAM but also in performance.

GTX 1060 3GB

The queen of e-sports. A reasonable price, amazing performance for Full HD (and e-sports rarely uses higher resolutions: results matter more than eye candy), and a sensible amount of memory (3 GB, mind you, is what the flagship GTX 780 Ti had two years ago, at an indecent price). In performance terms, the junior 1060 easily crushes last year's GTX 970 with its memorable 3.5 GB of memory and drags the super-flagship 780 Ti of the year before by the ears.


DOOM 2016 (1080p, ULTRA): OpenGL - 117 FPS, Vulkan - 87 FPS;
The Witcher 3: Wild Hunt (1080p, MAX, HairWorks Off): DX11 - 70 FPS;
Battlefield 1 (1080p, ULTRA): DX11 - 92 FPS, DX12 - 85 FPS;
Overwatch (1080p, ULTRA): DX11 - 93 FPS.

Here the absolute favorite in price-to-output terms is the MSI version. Good frequencies, a quiet cooling system and sane dimensions. And they ask next to nothing for it, around 15 thousand rubles.

GTX 1060 6GB

The 6 GB version is the budget ticket to VR and high resolutions. It will not starve for memory, is a little faster in every test, and will confidently outperform the GTX 980 wherever last year's card runs out of its 4 GB of video memory.


DOOM 2016 (1080p, ULTRA): OpenGL - 117 FPS, Vulkan - 121 FPS;
The Witcher 3: Wild Hunt (1080p, MAX, HairWorks Off): DX11 - 73 FPS;
Battlefield 1 (1080p, ULTRA): DX11 - 94 FPS, DX12 - 90 FPS;
Overwatch (1080p, ULTRA): DX11 - 166 FPS.

I would like to once again note the behavior of video cards when using the Vulkan API. 1050 with 2 GB of memory - FPS drawdown. 1050 Ti with 4 GB - almost on par. 1060 3 GB - drawdown. 1060 6 GB - growth of results. The trend, I think, is clear: Vulkan needs 4+ GB of video memory.

The trouble is that both 1060s are not small video cards. It seems that the heat pack is reasonable, and the board there is really small, but many vendors decided to simply unify the cooling system between 1080, 1070 and 1060. Someone has video cards 2 slots high, but 28+ centimeters long, someone made them shorter, but thicker (2.5 slots). Choose carefully.

Unfortunately, the extra 3 GB of video memory and the unlocked compute unit will cost you roughly 5-6 thousand rubles on top of the 3 GB version. Here Palit has the most interesting options for price and quality. ASUS released monstrous 28-cm cooling systems shared across the 1080, 1070 and 1060, and such a card will not fit just anywhere; the versions without factory overclock cost almost as much while delivering less, and MSI asks more for its relatively compact cards than competitors of roughly the same build quality and factory overclock.

Major League

Playing with all the money in the world is difficult in 2016. Yes, the 1080 is insanely cool, but perfectionists and hardware geeks know that NVIDIA is HIDING the existence of the super-flagship 1080 Ti, which should be incredibly cool. The first specs are already leaking online, and it is clear the greens are waiting for the red-and-whites to make a move: some kind of uber-gun that the new king of 3D graphics, the great and mighty GTX 1080 Ti, can instantly put in its place. For now, we have what we have.

GTX 1070

Last year's adventures of the mega-popular GTX 970 and its not-quite-honest 4 gigabytes of memory were actively dissected and chewed over all across the Internet. That did not stop it from becoming the most popular gaming graphics card in the world. As the calendar year draws to a close, it still holds first place in the Steam Hardware & Software Survey. Understandably so: the combination of price and performance was simply perfect. And if you missed last year's upgrade and the 1060 does not seem badass enough, the GTX 1070 is your choice.

The card chews through 2560x1440 and 3840x2160 with a bang. The Boost 3.0 overclocking system throws more wood on the fire as GPU load rises (that is, in the heaviest scenes, when FPS sags under the onslaught of special effects), pushing the GPU to a mind-blowing 2100+ MHz. The memory easily gains 15-18% of effective frequency over the factory values. A monster of a thing.


Attention, all tests are carried out in 2.5k (2560x1440):

DOOM 2016 (1440p, ULTRA): OpenGL - 91 FPS, Vulkan - 78 FPS;
The Witcher 3: Wild Hunt (1440p, MAX, HairWorks Off): DX11 - 73 FPS;
Battlefield 1 (1440p, ULTRA): DX11 - 91 FPS, DX12 - 83 FPS;
Overwatch (1440p, ULTRA): DX11 - 142 FPS.

Of course, neither this card nor the 1080 can hold ultra settings at 4K without ever dipping below 60 frames per second, but you can play at conditional "high" settings, disabling or slightly lowering the most voracious options at full resolution, and in real-world performance the card easily beats even last year's 980 Ti, which cost nearly twice as much. Gigabyte has the most interesting option: they managed to cram a full-fledged 1070 into an ITX-size card, thanks to the modest thermal budget and energy-efficient design. Prices start at 29-30 thousand rubles for the tasty options.

GTX 1080

Yes, the flagship carries no Ti suffix. Yes, it does not use the largest GPU NVIDIA has. Yes, there is no fancy HBM2 memory here, and the card does not look like a Death Star or, at the very least, an Imperial Star Destroyer. And yes, it is the coolest gaming graphics card around right now. It single-handedly runs DOOM at 5K3K resolution and 60 fps on ultra settings. Every new game bows to it, and for the next year or two it will have no problems: until the new technologies baked into Pascal become widespread and game engines learn to load the available resources efficiently... Yes, in a couple of years we will say, "Look at the GTX 1260; a couple of years ago you needed a flagship to play at those settings," but for now the best of the best is available before the new year at a very reasonable price.


Attention, all tests are carried out in 4k (3840x2160):

DOOM 2016 (2160p, ULTRA): OpenGL - 54 FPS, Vulkan - 78 FPS;
The Witcher 3: Wild Hunt (2160p, MAX, HairWorks Off): DX11 - 55 FPS;
Battlefield 1 (2160p, ULTRA): DX11 - 65 FPS, DX12 - 59 FPS;
Overwatch (2160p, ULTRA): DX11 - 93 FPS.

It remains only to decide: do you need it, or can you save the money and take the 1070? There is not much difference between playing on "ultra" or "high" settings, since modern engines draw a great picture at high resolution even on medium settings: after all, this is not some blurry console that cannot muster the performance for honest 4K and a stable 60 fps.

If we discard the cheapest options, the best combination of price and quality will again be Palit's GameRock version (around 43-45 thousand rubles): yes, the cooler is "thick" at 2.5 slots, but the card is shorter than its competitors, and a pair of 1080s is rarely installed anyway. SLI is slowly dying, and even the life-giving injection of high-speed bridges does not help it much. The ASUS ROG option is not bad if you have lots of expansion cards installed and do not want to block extra slots: their card is exactly 2 slots thick, but it needs 29 centimeters of clearance from the back panel to the drive cage. I wonder whether Gigabyte will manage to release this monster in ITX format?

Results

The new NVIDIA cards have simply buried the used hardware market. Only the GTX 970 survives on it, and it can be snagged for 10-12 thousand rubles. Potential buyers of used 7970s and R9 280s often have nowhere to put them and nothing to feed them with, and many options on the secondary market are simply dead ends, no good as a cheap upgrade for the next couple of years: there is too little memory and the new technologies are not supported. The beauty of the new generation is that even games not optimized for them run much more briskly than on the veteran GPUs of past years, and it is hard to imagine what will happen in a year, when game engines learn to use the full power of the new technologies.

GTX 1050 and 1050Ti

Alas, I cannot recommend buying the cheapest Pascal. The RX 460 usually sells for a thousand or two less, and if your budget is so tight that you are buying a card with your last money, then the Radeon is objectively the more interesting investment. On the other hand, the 1050 is a little faster, and if prices for the two cards in your city are about the same, take it.

The 1050 Ti, in turn, is a great option for those who value story and gameplay more than bells and whistles and realistic nose hair. It has no 2 GB video memory bottleneck and will not "run dry" in a year. If you can put the money toward it, do so. The Witcher at high settings, GTA V, DOOM, BF1 - no problem. Yes, you will have to give up a few extras like extra-long shadows, complex tessellation or the "expensive" self-shadowing with limited ray tracing, but in the heat of battle you will forget about these prettinesses after ten minutes of play, and a stable 50-60 frames per second gives far more immersion than nervous jumps from 25 to 40 with the settings on "maximum".

If you have some Radeon 7850, GTX 760 or lower, that is, a card with 2 GB of video memory or less, you can safely upgrade.

GTX 1060

The junior 1060 will please those for whom 100 FPS matters more than graphical bells and whistles. At the same time it lets you comfortably play every released game at Full HD with high or maximum settings and a stable 60 frames per second, and its price is far below everything that sits above it. The older 1060 with 6 gigabytes of memory is an uncompromising solution for Full HD with a performance margin for a year or two, a way to get acquainted with VR, and an entirely acceptable candidate for playing at high resolutions on medium settings.

There is no point swapping a GTX 970 for a GTX 1060; it will hold out for another year. But the tired 960, 770, 780, R9 280X and older units can safely be upgraded to the 1060.

Top segment: GTX 1070 and 1080

The 1070 is unlikely to become as popular as the GTX 970 (most users, after all, are locked into a two-year upgrade cycle), but in price and quality it is certainly a worthy continuation of the 70 line. It simply grinds through games at the mainstream 1080p, handles 2560x1440 with ease, withstands the ordeal of unoptimized 21:9, and is quite capable of 4K, albeit not at maximum settings.


Yes, SLI can be like that too.

We say goodbye to every 780 Ti, R9 390X and other last year's 980s, especially if we want to play in high definition. And, yes, this is the best option for those who like to build a hell of a box in Mini-ITX format and scare guests with 4k games on a 60-70 inch TV that run on a computer the size of a coffee maker.

According to recently surfaced unofficial information, the Pascal GPU family could become one of NVIDIA's most complete lineups in recent years. In just a few months the company has introduced four Pascal-based GPUs and does not intend to stop there. According to the company's head, far from all Pascal chips, let alone actual products, have been presented. Apparently, new announcements await us in the near future.

NVIDIA Pascal: eight products in four months

Since April of this year, NVIDIA has introduced four Pascal-based chips: GP100 with 16 GB of HBM2 memory, GP102 with GDDR5X support, GP104 and GP106. In the same period the company announced eight products based on these GPUs (not counting various special editions of the products below, or specialized devices such as the DGX-1): GeForce GTX 1080/1070 (GP104), GeForce GTX 1060 (GP106), TITAN X (GP102 + 12 GB GDDR5X), Quadro P5000 (GP104GL + 16 GB GDDR5X), Quadro P6000 (GP102GL + 24 GB GDDR5X), and Tesla P100 SXM and Tesla P100 PCIe (both based on GP100 + 16 GB HBM2).

While four GPUs and eight products in four months is a remarkable accomplishment, it's noticeable that the company hasn't introduced a single new notebook solution, nor a single new graphics card under $250. According to the head of NVIDIA, the company is preparing new GPUs based on Pascal, they already exist in silicon, but they will enter the market only after some time.

NVIDIA: All Pascals are ready, but not all are presented

"We have designed, verified and started production of all GPUs based on the Pascal architecture," said Jen-Hsun Huang, chief executive of NVIDIA, during a conference call with investors and financial analysts. "However, we have not yet introduced all of these GPUs."

New configurations

However, it is not so much the GP107, GP108 and GP102 internals that interest gamers and performance enthusiasts as the fact that each Pascal chip will exist in at least two basic configurations (in terms of the PCIe ID used by the NVIDIA driver). This opens up opportunities for creating a host of new products based on the GP100, GP102, GP104 and GP106 chips.

So, the GP104 exists in GP104-A and GP104-B configurations, as well as in versions with acceleration enabled for professional applications, GP104GL-A and GP104GL-B. We do not know what exactly the letters "A" and "B" stand for, but we can assume that "A" denotes the chip in its maximum configuration. So GP104-A could correspond to the GeForce GTX 1080 and GP104-B to the GeForce GTX 1070.

Considering that the GP102 and GP106 chips also exist in two configurations each (at least, this is what the AIDA64 database and NVIDIA drivers indicate), while there is only one product based on each of them (the GeForce GTX 1060 and the TITAN X), we may well expect new solutions based on them to appear. Whether these cards will be faster or slower than the existing ones, time will tell. In any case, the GP102 can scale both "up" (to 3840 stream processors) and "down". And, of course, one cannot rule out the hypothetical appearance of a third GP102-C configuration, should NVIDIA need it.

One way or another, it is obvious that NVIDIA plans to expand the family of graphics cards based on Pascal. Although the immediate plans should clearly include mobile and mainstream GPUs, it is very likely that we will see new solutions for high-performance gaming PCs in the future.

GP104 GPU reference specifications

Chip code name: GP104
Production technology: 16 nm FinFET
Number of transistors: 7.2 billion
Core area: 314 mm²
Architecture: Pascal
DirectX hardware support: DirectX 12
Memory bus: 256-bit
Core frequency: 1607 (1733) MHz
Computing blocks: 20 Streaming Multiprocessors comprising 2560 IEEE 754-2008 floating-point scalar ALUs
Texturing blocks: 160 texture addressing and filtering units with support for FP16 and FP32 components in textures and trilinear and anisotropic filtering for all texture formats
Monitor support:

GeForce GTX 1080 Reference Graphics Card Specifications

Core frequency: 1607 (1733) MHz
Number of CUDA cores: 2560
Number of texture units: 160
Number of blending (ROP) units: 64
Effective memory frequency: 10000 (4×2500) MHz
Memory type: GDDR5X
Memory bus: 256-bit
Memory size: 8 GB
Memory bandwidth: 320 GB/s
Compute performance (FP32): about 9 teraflops
Theoretical peak fill rate: 103 gigapixels/s
Theoretical texture sampling rate: 257 gigatexels/s
Bus: PCI Express 3.0
Connectors: one Dual-Link DVI, one HDMI, three DisplayPort
Typical power consumption: up to 180 W
Supplementary power: one 8-pin connector
Slots occupied in a system case: 2
Recommended price: $599-699 (USA), 54,990 RUB (Russia)

The new GeForce GTX 1080 model received a logical name for the first solution in the new GeForce series: it differs from its direct predecessor only by the generation number. The newcomer not only replaces the top solutions in the company's current lineup, but for a while also became the flagship of the new series, until the Titan X on an even more powerful GPU was released. Below it in the hierarchy sits the already-announced GeForce GTX 1070, based on a cut-down version of the GP104 chip, which we will examine below.

Nvidia's suggested prices for the new card are $599 and $699 for the regular and Founders Edition versions (see below), respectively, which is a pretty good deal considering that the GTX 1080 is ahead of not only the GTX 980 Ti, but also the Titan X. Today the newcomer is, without question, the fastest single-chip video card on the market, and at the same time it is cheaper than the most powerful cards of the previous generation. So far the GeForce GTX 1080 essentially has no competitor from AMD, so Nvidia was able to set a price that suits it.

The video card in question is based on the GP104 chip with a 256-bit memory bus, but the new GDDR5X memory type runs at a very high effective frequency of 10 GHz, giving a peak bandwidth of 320 GB/s, almost on par with the GTX 980 Ti and its 384-bit bus. The amount of memory on a card with such a bus could be 4 or 8 GB, but it would be foolish to fit the smaller amount to such a powerful solution in today's conditions, so the GTX 1080 got 8 GB, and that is enough to run any 3D application at any quality settings for several years to come.

The GeForce GTX 1080 PCB, understandably, differs quite a lot from the company's previous boards. The new card's typical power consumption is 180 W, slightly higher than the GTX 980 but noticeably lower than that of the less economical Titan X and GTX 980 Ti. The reference board has the usual set of connectors for display outputs: one Dual-Link DVI, one HDMI and three DisplayPort.

Founders Edition reference design

Even with the announcement of the GeForce GTX 1080 in early May, a special edition of the video card called Founders Edition was announced, which has a higher price than regular video cards from the company's partners. In fact, this edition is the reference design of the card and cooling system, and it is produced by Nvidia itself. You can have different attitudes towards such options for video cards, but the reference design developed by the company's engineers and manufactured using high-quality components has its fans.

But whether people will pay several thousand rubles more for a video card from Nvidia itself is a question only practice can answer. In any case, at first it will be the reference cards from Nvidia that go on sale at the higher price, and there will not be much to choose from (this happens with every launch), but the reference GeForce GTX 1080 is different in that it is planned to be sold in this form throughout its entire life cycle, right up to the release of next-generation solutions.

Nvidia believes this edition has its merits even over the best efforts of its partners. For example, the two-slot cooler design makes it easy to build both gaming PCs in relatively small form factors and multi-card video systems around this powerful card (even though the company does not recommend three- and four-way configurations). The GeForce GTX 1080 Founders Edition has certain advantages in the form of an efficient cooler with a vapor chamber and a fan that exhausts heated air out of the case; this is the first such Nvidia solution that consumes less than 250 watts of power.

Compared with the company's previous reference designs, the power circuitry has been upgraded from four phases to five. Nvidia also mentions the improved components the new product is built on; electrical noise has been reduced as well, improving voltage stability and overclocking potential. As a result of all the improvements, the power efficiency of the reference board is 6% higher than the GeForce GTX 980's.

And to stand apart from the "ordinary" GeForce GTX 1080 models visually as well, an unusual "chopped" shroud design was developed for the Founders Edition. It probably also complicated the shape of the vapor chamber and heatsink (see photo), which may have been one of the reasons for the extra $100 for this special edition. We repeat that at the start of sales buyers will not have much choice, but later it will be possible to pick either a custom design from one of the company's partners or one made by Nvidia itself.

New generation of Pascal graphics architecture

The GeForce GTX 1080 video card is the company's first solution based on the GP104 chip, which belongs to the new generation of Nvidia's Pascal graphics architecture. Although the new architecture is based on the solutions worked out in Maxwell, it also has important functional differences, which we will write about later. The main change from a global point of view was the new technological process, according to which the new graphics processor was made.

The use of the 16 nm FinFET process in manufacturing the GP104 GPUs at the Taiwanese company TSMC's factories made it possible to significantly raise the chip's complexity while keeping its area and cost relatively low. Compare the transistor counts and areas of the GP104 and GM204: they are close in area (the new chip is even physically a bit smaller), but the Pascal chip has noticeably more transistors and, accordingly, more execution units, including ones providing new functionality.

From an architectural point of view, the first gaming Pascal is very similar to comparable Maxwell solutions, although there are some differences. Like Maxwell, Pascal processors will come in different configurations of Graphics Processing Clusters (GPCs), Streaming Multiprocessors (SMs) and memory controllers. The SM is a highly parallel multiprocessor that schedules and runs warps (groups of 32 threads) on the CUDA cores and other execution units of the multiprocessor. You can find detailed information about the design of all these blocks in our reviews of previous Nvidia solutions.

Each SM multiprocessor is paired with a PolyMorph Engine, which handles texture sampling, tessellation, transformation, vertex attribute setup and perspective correction. Unlike in the company's previous solutions, the PolyMorph Engine in the GP104 also contains the new Simultaneous Multi-Projection block, which we will discuss below. The combination of one SM multiprocessor with one PolyMorph Engine is traditionally called a TPC, Texture Processor Cluster, in Nvidia parlance.

In total, the GP104 chip in the GeForce GTX 1080 contains four GPC clusters and 20 SM multiprocessors, as well as eight memory controllers combined with 64 ROPs. Each GPC cluster has a dedicated rasterization engine and includes five SMs. Each multiprocessor, in turn, consists of 128 CUDA cores, 256 KB register file, 96 KB shared memory, 48 KB L1 cache, and eight TMU texture units. That is, in total, GP104 contains 2560 CUDA cores and 160 TMU units.

Also, the graphics processor on which the GeForce GTX 1080 is based contains eight 32-bit (as opposed to the 64-bit previously used) memory controllers, which gives us a final 256-bit memory bus. Eight ROPs and 256 KB of L2 cache are tied to each of the memory controllers. That is, in total, the GP104 chip contains 64 ROPs and 2048 KB of L2 cache.
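A quick sanity check of the block counts listed above, as configured in the GTX 1080:

```python
gpcs, sms_per_gpc = 4, 5
cuda_per_sm, tmus_per_sm = 128, 8
mem_controllers, rops_per_mc, l2_per_mc_kb = 8, 8, 256

print("CUDA cores:", gpcs * sms_per_gpc * cuda_per_sm)       # 2560
print("TMUs      :", gpcs * sms_per_gpc * tmus_per_sm)       # 160
print("Bus width :", mem_controllers * 32, "bit")            # 256-bit
print("ROPs      :", mem_controllers * rops_per_mc)          # 64
print("L2 cache  :", mem_controllers * l2_per_mc_kb, "KB")   # 2048 KB
```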

Thanks to architectural optimizations and the new process technology, the first gaming Pascal has become the most energy-efficient GPU ever. Both the advanced 16 nm FinFET process and the architectural optimizations made in Pascal relative to Maxwell contribute to this. Nvidia was able to raise the clock speed even more than it expected when moving to the new node: the GP104 runs at a higher frequency than a hypothetical GM204 manufactured on 16 nm would. To achieve this, Nvidia's engineers had to carefully check and optimize all the bottlenecks in previous solutions that prevented overclocking past a certain threshold. As a result, the new GeForce GTX 1080 runs at over 40% higher clock speeds than the GeForce GTX 980. But that is not the whole story of the GPU clock changes.

GPU Boost 3.0 Technology

As we well know from previous Nvidia graphics cards, they use GPU Boost hardware technology in their GPUs, designed to increase the operating clock speed of the GPU in modes where it has not yet reached its power consumption and thermal limits. Over the years, this algorithm has undergone many changes, and the third generation of this technology is already used in the Pascal architecture video chip - GPU Boost 3.0, the main innovation of which is a finer setting of turbo frequencies, depending on voltage.

If you remember the principle of operation of previous versions of the technology, then the difference between the base frequency (the guaranteed minimum frequency value below which the GPU does not fall, at least in games) and the turbo frequency was fixed. That is, the turbo frequency has always been a certain number of megahertz above the base. GPU Boost 3.0 introduced the ability to set turbo frequency offsets for each voltage separately. The easiest way to understand this is with an illustration:

On the left is the second version of GPU Boost, on the right the third, introduced with Pascal. A fixed gap between base and turbo frequency did not let the GPU show its full capabilities: in some cases GPUs of previous generations could have run faster at a given voltage, but the fixed turbo ceiling did not allow it. In GPU Boost 3.0 that limitation is gone, and the turbo frequency can be set for each individual voltage value, squeezing every last drop out of the GPU.
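A small Python sketch of the difference, with made-up voltage and frequency numbers: Boost 2.0-style behavior adds one fixed offset to every point, while Boost 3.0-style behavior keeps a separate offset per voltage point, the kind of table a per-card scan would produce.

```python
# Illustrative voltage (V) -> frequency (MHz) curve; the values are invented.
base_curve = {0.80: 1607, 0.90: 1733, 1.00: 1860, 1.05: 1911}

def boost2(curve, fixed_offset_mhz=50):
    """One fixed offset applied to the whole curve (the old behavior)."""
    return {v: f + fixed_offset_mhz for v, f in curve.items()}

def boost3(curve, per_voltage_offsets):
    """A separate offset for every voltage point (the Boost 3.0 idea)."""
    # Offsets like these would come from a per-card scan such as the one
    # described in the scanner section below.
    return {v: f + per_voltage_offsets[v] for v, f in curve.items()}

offsets_found_by_scan = {0.80: 90, 0.90: 75, 1.00: 60, 1.05: 25}
print(boost2(base_curve))
print(boost3(base_curve, offsets_found_by_scan))
```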

Handy utilities are required to manage overclocking and set the turbo frequency curve. Nvidia itself does not do this, but helps its partners create such utilities to facilitate overclocking (within reasonable limits, of course). For example, the new functionality of GPU Boost 3.0 has already been revealed in EVGA Precision XOC, which includes a dedicated overclocking scanner that automatically finds and sets the non-linear difference between base frequency and turbo frequency at different voltages by running a built-in performance and stability test. As a result, the user gets a turbo frequency curve that perfectly matches the capabilities of a particular chip. Which, moreover, can be modified as you like in manual mode.

As you can see in the screenshot of the utility, in addition to information about the GPU and the system, there are also settings for overclocking: Power Target (defines typical power consumption during overclocking, as a percentage of the standard), GPU Temp Target (maximum allowed core temperature), GPU Clock Offset (exceeding the base frequency for all voltage values), Memory Offset (exceeding the frequency of video memory over the default value), Overvoltage (additional opportunity to increase the voltage).

The Precision XOC utility includes three overclocking modes: Basic, Linear and Manual. In the basic mode you can set a single overclock value (a fixed turbo frequency) above the base one, as was the case with previous GPUs. Linear mode lets you set a frequency ramp from the minimum to the maximum voltage value of the GPU. And in manual mode you can set unique GPU frequency values for each voltage point on the curve.

The utility also includes a special scanner for automatic overclocking. You can either set your own frequency levels or let the Precision XOC utility scan the GPU at all voltages and find the most stable frequencies for each point on the voltage and frequency curve completely automatically. During the scanning process, Precision XOC incrementally increases the frequency of the GPU and checks its operation for stability or artifacts, building an ideal frequency and voltage curve that will be unique to each specific chip.

This scanner can be customized to your own requirements by setting the time interval to test each voltage value, the minimum and maximum frequency to be tested, and its step. It is clear that in order to achieve stable results, it would be better to set a small step and a decent duration of testing. During testing, unstable operation of the video driver and the system may be observed, but if the scanner does not freeze, it will restore operation and continue to find the optimal frequencies.
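The scanning procedure itself can be sketched in a few lines of Python; the "stability test" below is a random stand-in for a real workload, and the step size and limits are invented, not Precision XOC's actual values.

```python
import random

def is_stable(voltage, freq_mhz):
    """Fake per-chip silicon limit standing in for a real stress test."""
    limit = 1800 + (voltage - 0.8) * 1200 + random.uniform(-20, 20)
    return freq_mhz < limit

def scan_point(voltage, start_mhz=1600, step_mhz=13):
    freq = start_mhz
    while is_stable(voltage, freq + step_mhz):
        freq += step_mhz          # keep raising the clock while the test passes
    return freq                   # last frequency that was stable at this voltage

curve = {v: scan_point(v) for v in (0.80, 0.90, 1.00, 1.05)}
print(curve)   # per-voltage maximum stable frequencies for this particular "chip"
```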

New type of video memory GDDR5X and improved compression

So, the power of the GPU has grown significantly, and the memory bus has remained only 256-bit - will the memory bandwidth limit the overall performance and what can be done about it? It seems that the promising second-generation HBM is still too expensive to manufacture, so other options had to be looked for. Ever since the introduction of GDDR5 memory in 2009, Nvidia engineers have been exploring the possibilities of using new types of memory. As a result, developments have come to the introduction of a new memory standard GDDR5X - the most complex and advanced standard to date, giving a transfer rate of 10 Gbps.

Nvidia gives an interesting example of just how fast this is. Only 100 picoseconds elapse between transmitted bits - during this time, a beam of light will travel a distance of only one inch (about 2.5 cm). And when using GDDR5X memory, the data-receiving circuits have to choose the value of the transmitted bit in less than half of this time before the next one is sent - this is just so you understand what modern technology has come to.
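The arithmetic behind that illustration, for the curious (the light-travel figure below is for vacuum, which lands in the same ballpark as the roughly one inch quoted above):

```python
data_rate_hz = 10e9                           # 10 Gbit/s per pin
bit_time_s = 1 / data_rate_hz                 # 1e-10 s = 100 ps per bit
light_speed_m_s = 3e8
print(bit_time_s * 1e12, "ps per bit")                               # -> 100.0
print(light_speed_m_s * bit_time_s * 100, "cm of light travel")      # -> ~3.0 cm in vacuum
```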

Achieving this speed required the development of a new I/O system architecture that required several years of joint development with memory chip manufacturers. In addition to the increased data transfer rate, energy efficiency has also increased - GDDR5X memory chips use a lower voltage of 1.35 V and are manufactured using new technologies, which gives the same power consumption at a 43% higher frequency.

The company's engineers had to rework the data transmission lines between the GPU core and memory chips, paying more attention to preventing signal loss and signal degradation all the way from memory to GPU and back. So, in the illustration above, the captured signal is shown as a large symmetrical "eye", which indicates good optimization of the entire circuit and the relative ease of capturing data from the signal. Moreover, the changes described above have led not only to the possibility of using GDDR5X at 10 GHz, but also should help to get a high memory bandwidth on future products using the more familiar GDDR5 memory.

Well, we got more than 40% increase in memory bandwidth from the use of the new memory. But isn't that enough? To further increase memory bandwidth efficiency, Nvidia continued to improve the advanced data compression introduced in previous architectures. The memory subsystem in the GeForce GTX 1080 uses improved and several new lossless data compression techniques designed to reduce bandwidth requirements - already the fourth generation of on-chip compression.

Algorithms for data compression in memory bring several positive aspects at once. Compression reduces the amount of data written to memory, the same applies to data transferred from video memory to the second level cache, which improves the efficiency of using the L2 cache, since a compressed tile (a block of several framebuffer pixels) has a smaller size than an uncompressed one. It also reduces the amount of data sent between different points, like the TMU texture module and the framebuffer.

The data compression pipeline in the GPU uses several algorithms, chosen according to how "compressible" the data is - the best available algorithm is selected for each block. One of the most important is delta color compression. This method encodes data as the difference between neighboring values rather than the values themselves: the GPU computes the color differences between the pixels of a block (tile) and stores the block as a base color for the whole tile plus the per-pixel differences from it. For graphics data this approach usually works well, since colors within small tiles often differ little from pixel to pixel.
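To make the idea concrete, here is a deliberately simplified Python sketch of per-tile delta encoding; it illustrates the principle only and is not Nvidia's actual hardware algorithm - the tile format and bit-count heuristic are invented for the example.

```python
# Simplified illustration of per-tile delta color compression: store one anchor
# color per tile plus per-pixel differences. If all deltas fit in a few bits,
# the tile compresses well; otherwise it is stored raw.

def compress_tile(tile):
    """tile: list of (r, g, b) tuples for one block of framebuffer pixels."""
    anchor = tile[0]
    deltas = [tuple(c - a for c, a in zip(px, anchor)) for px in tile]
    max_delta = max(abs(d) for px in deltas for d in px)
    # Bits per channel needed to encode the largest delta (one extra for the sign).
    bits = max(1, max_delta.bit_length() + 1)
    if bits < 8:                       # worth compressing
        return {"anchor": anchor, "deltas": deltas, "bits_per_channel": bits}
    return {"raw": tile}               # deltas too large: store uncompressed

# Example: a near-uniform 2x2 tile compresses to an anchor plus tiny deltas.
tile = [(120, 130, 140), (121, 130, 139), (119, 131, 140), (120, 129, 141)]
print(compress_tile(tile))
```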

The GP104 GPU in the GeForce GTX 1080 supports more compression algorithms than previous Maxwell chips. The 2:1 compression algorithm has become more efficient, and two new modes have been added to it: a 4:1 mode, suitable for cases where the difference in pixel color values within a block is very small, and an 8:1 mode, which combines constant 4:1 compression of 2x2 pixel blocks with 2x delta compression between blocks. When a tile cannot be compressed at all, it is simply written uncompressed.

However, in reality, the latter happens very infrequently. This can be seen from the example screenshots from the game Project CARS, which Nvidia cited to illustrate the increased compression ratio in Pascal. In the illustrations, those frame buffer tiles that could be compressed by the GPU are shaded in magenta, and those that can not be compressed without loss remained with the original color (top - Maxwell, bottom - Pascal).

As you can see, the new compression algorithms in GP104 really work much better than in Maxwell. Although the old architecture was also able to compress most of the tiles in the scene, a lot of grass and trees around the edges, as well as car parts, are not subject to legacy compression algorithms. But with the inclusion of new techniques in Pascal, a very small number of image areas remained uncompressed - improved efficiency is evident.

As a result of improvements in data compression, the GeForce GTX 1080 is able to significantly reduce the amount of data sent per frame. In numbers, improved compression saves an additional 20% of effective memory bandwidth. In addition to the more than 40% increase in memory bandwidth of the GeForce GTX 1080 relative to the GTX 980 from using GDDR5X memory, all together this gives about a 70% increase in effective memory bandwidth compared to the previous generation model.
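The arithmetic behind the ~70% figure is multiplicative rather than additive; a quick check, using the published 224 GB/s and 320 GB/s bandwidth figures of the GTX 980 and GTX 1080:

```python
# The ~70% figure combines the raw bandwidth gain from GDDR5X with the
# effective gain from improved compression.
gtx980_bw = 224          # GB/s (7 GHz GDDR5 on a 256-bit bus)
gtx1080_bw = 320         # GB/s (10 GHz GDDR5X on a 256-bit bus)

raw_gain = gtx1080_bw / gtx980_bw        # ~1.43, the "more than 40%" above
compression_gain = 1.20                  # ~20% saved by the 4th-gen compression
effective_gain = raw_gain * compression_gain
print(f"effective bandwidth gain: {effective_gain:.2f}x")   # ~1.71x, i.e. about +70%
```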

Support for Async Compute

Most modern games use complex computations in addition to graphics. For example, physics simulation can run not before or after the graphics work but concurrently with it, since the two are not dependent on each other within the same frame. Other examples are post-processing of already rendered frames and audio processing, which can also be performed in parallel with rendering.

Another clear example of this functionality is the Asynchronous Time Warp technique used in VR systems, which adjusts the output frame to the player's head movement right before scan-out, interrupting the rendering of the next frame. Such asynchronous loading of the GPU increases the utilization of its execution units.

These workloads create two new GPU usage scenarios. The first of these includes overlapping loads, since many types of tasks do not fully use the capabilities of GPUs, and some resources are idle. In such cases, you can simply run two different tasks on the same GPU, separating its execution units to get more efficient use - for example, PhysX effects that run in conjunction with the 3D rendering of the frame.

To improve the performance of this scenario, the Pascal architecture introduced dynamic load balancing. In the previous Maxwell architecture, overlapping workloads were implemented as a static distribution of GPU resources between graphics and compute. This approach is effective provided that the balance between the two workloads roughly corresponds to the division of resources and the tasks run equally in time. If non-graphical computations take longer than graphical ones, and both are waiting for the completion of the common work, then part of the GPU will be idle for the remaining time, which will cause a decrease in overall performance and nullify all the benefits. Hardware dynamic load balancing, on the other hand, allows you to use the freed up GPU resources as soon as they become available - for understanding, we will give an illustration.

There are also time-critical tasks, and this is the second scenario for asynchronous computing. For example, the asynchronous time warp algorithm in VR must complete before scan-out, or the frame will be discarded. In such a case, the GPU must support very fast task interruption and switching, so that a less critical task can be pulled off the GPU and its resources freed for critical ones - this is called preemption.

A single render command from a game engine can contain hundreds of draw calls, each draw call in turn contains hundreds of rendered triangles, each containing hundreds of pixels to be calculated and drawn. The traditional GPU approach uses only high-level task interruption, and the graphics pipeline has to wait for all that work to complete before switching tasks, resulting in very high latency.

To fix this, the Pascal architecture first introduced the ability to interrupt a task at the pixel level - Pixel Level Preemption. Pascal GPU execution units can constantly monitor the progress of rendering tasks, and when an interrupt is requested, they can stop execution, saving the context for later completion by quickly switching to another task.

Thread-level interruption and switching for compute workloads works similarly to pixel-level interruption for graphics. Compute workloads consist of multiple grids, each containing multiple threads. When an interrupt request is received, the threads running on a multiprocessor finish their execution, other units save their state so they can resume from the same point later, and the GPU switches to another task. The entire switch takes less than 100 microseconds after the running threads exit.

For gaming workloads, the combination of pixel-level interrupts for graphics, and thread-level interrupts for compute tasks gives Pascal architecture GPUs the ability to quickly switch between tasks with minimal time loss. And for computing tasks on CUDA, it is also possible to interrupt with minimal granularity - at the instruction level. In this mode, all threads stop execution at once, immediately switching to another task. This approach requires saving more information about the state of all registers of each thread, but in some cases of non-graphical calculations it is quite justified.

Fast interruption and task switching for graphics and compute were added to the Pascal architecture so that work can be preempted at the level of individual pixels and instructions rather than only at the boundaries of whole draw calls, as was the case with Maxwell and Kepler. These technologies improve the asynchronous execution of different GPU workloads and the responsiveness when several tasks run simultaneously. At its event, Nvidia showed a demonstration of asynchronous computing using physics effects as an example: without asynchronous computing, performance was around 77-79 FPS, and with these features enabled the frame rate rose to 93-94 FPS.

We have already mentioned one use of this functionality in games: asynchronous time warp in VR. The illustration shows how this technique works with traditional preemption and with fast preemption. In the first case, the asynchronous time warp is scheduled as late as possible but still before the display update begins; yet the work has to be handed to the GPU several milliseconds earlier, because without fast preemption there is no way to start it at exactly the right moment, and the GPU sits idle for some time.

With precise interruption at the pixel and thread level (shown on the right), the moment of interruption can be determined much more accurately, and asynchronous time warp can be started much later with confidence that it will finish before the display update begins. The GPU that sat idle in the first case can instead be loaded with additional graphics work.

Simultaneous Multi-Projection Technology

The new GP104 GPU adds support for a new Simultaneous Multi-Projection (SMP) technology that allows the GPU to render data more efficiently on modern display systems. SMP allows the video chip to simultaneously display data in several projections, which required the introduction of a new hardware block in the GPU as part of the PolyMorph engine at the end of the geometric pipeline before the rasterization block. This block is responsible for working with multiple projections for a single geometry stream.

The multi-projection engine processes geometry simultaneously for 16 pre-configured projections sharing a common projection center (camera); the projections can be independently rotated or tilted. Since each geometric primitive can appear in several projections at once, the SMP engine provides this functionality, allowing the application to instruct the video chip to replicate geometry up to 32 times (16 projections for each of two projection centers) without additional processing.

The whole processing process is hardware accelerated, and since multiprojection works after the geometry engine, it does not need to repeat all the stages of geometry processing several times. The saved resources are important when rendering speed is limited by geometry processing performance, like tessellation, when the same geometric work is performed several times for each projection. Accordingly, in the peak case, multi-projection can reduce the need for geometry processing by up to 32 times.

But why is all this necessary? There are several good examples where multi-projection technology can be useful. For example, a multi-monitor system of three displays mounted at an angle to each other close enough to the user (surround configuration). In a typical situation, the scene is rendered in one projection, which leads to geometric distortions and incorrect geometry rendering. The correct way is three different projections for each of the monitors, according to the angle at which they are located.

With a video card on a chip with Pascal architecture, this can be done in one geometry pass, specifying three different projections, each for a different monitor. And the user, thus, will be able to change the angle at which the monitors are located to each other not only physically, but also virtually - by rotating the projections for the side monitors in order to get the correct perspective in the 3D scene with a noticeably wider viewing angle (FOV). True, there is a limitation here - for such support, the application must be able to render the scene with a wide FOV and use special SMP API calls to set it. That is, you can’t do this in every game, you need special support.

In any case, the days of a single projection on a single flat monitor are over, there are now many multi-monitor configurations and curved displays that can also use this technology. Not to mention virtual reality systems that use special lenses between the screens and the user's eyes, which require new techniques for projecting a 3D image into a 2D image. Many of these technologies and techniques are still in early development, the main thing is that older GPUs cannot effectively use more than one planar projection. They require multiple rendering passes, multiple processing of the same geometry, and so on.

Maxwell chips had limited multi-projection support to help increase efficiency, but Pascal's SMP can do much more. Maxwell could rotate a projection by 90 degrees for cube mapping or render to different projection resolutions, but this was only useful in a limited range of applications like VXGI.

Other possibilities for using SMP include rendering at different resolutions and single-pass stereo rendering. For example, rendering at different resolutions (Multi-Res Shading) can be used in games to optimize performance. When applied, a higher resolution is used in the center of the frame, and at the periphery it is reduced to obtain a faster rendering speed.

Single-pass stereo rendering is used in VR; it has already been added to the VRWorks package and uses the multi-projection feature to reduce the amount of geometry work required for VR rendering. When it is used, the GeForce GTX 1080 GPU processes the scene geometry only once, generating two projections at the same time, one for each eye, which halves the geometry load on the GPU and also reduces the overhead from the driver and the OS.

An even more advanced technique for improving the efficiency of VR rendering is Lens Matched Shading, which uses multiple projections to simulate the geometric distortions required in VR rendering. This method uses multi-projection to render a 3D scene onto a surface that approximates the lens-adjusted surface when rendered for VR headset output, avoiding many extra pixels on the periphery that would be discarded. The easiest way to understand the essence of the method is by illustration - four slightly expanded projections are used in front of each eye (in Pascal, you can use 16 projections for each eye - to more accurately simulate a curved lens) instead of one:

This approach can bring substantial performance savings. For example, a typical Oculus Rift image is 1.1 megapixels per eye, but because of the difference in projections, the source image rendered for it is 2.1 megapixels - 86% more than necessary! Multi-projection, as implemented in the Pascal architecture, reduces the rendered image to 1.4 megapixels, a 1.5x saving in pixel processing speed, and it saves memory bandwidth as well.
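Reproducing those numbers (the overdraw percentage comes out slightly above the quoted 86% because the megapixel figures are rounded):

```python
# Reproducing the Oculus Rift numbers quoted above.
displayed_mp = 1.1        # megapixels actually shown per eye
naive_render_mp = 2.1     # what a single flat projection has to render before warping
lms_render_mp = 1.4       # with Lens Matched Shading / multi-projection

print(f"overdraw of the naive approach: {naive_render_mp / displayed_mp - 1:.0%}")
# ~91% with these rounded figures; Nvidia quotes 86%
print(f"pixel-work saving with SMP: {naive_render_mp / lms_render_mp:.1f}x")   # ~1.5x
```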

And along with a twofold saving in geometry processing speed due to single-pass stereo rendering, the GeForce GTX 1080 graphics processor is able to provide a significant increase in VR rendering performance, which is very demanding on geometry processing speed, and even more so on pixel processing.

Improvements in video output and processing blocks

In addition to performance and new functionality related to 3D rendering, it is necessary to maintain a good level of image output, as well as video decoding and encoding. And the first Pascal architecture graphics processor did not disappoint - it supports all modern standards in this sense, including the hardware decoding of the HEVC format, which is necessary for viewing 4K videos on a PC. Also, future owners of GeForce GTX 1080 graphics cards will soon be able to enjoy streaming 4K video from Netflix and other providers on their systems.

In terms of display output, the GeForce GTX 1080 has support for HDMI 2.0b with HDCP 2.2 as well as DisplayPort. So far, the DP 1.2 version has been certified, but the GPU is ready for certification for newer versions of the standard: DP 1.3 Ready and DP 1.4 Ready. The latter allows 4K screens to be displayed at 120Hz, and 5K and 8K displays at 60Hz using a pair of DisplayPort 1.3 cables. If for the GTX 980 the maximum supported resolution was 5120x3200 at 60Hz, then for the new GTX 1080 model it has grown to 7680x4320 at the same 60Hz. The reference GeForce GTX 1080 has three DisplayPort outputs, one HDMI 2.0b and one digital Dual-Link DVI.

The new model of the Nvidia video card also received an improved block for decoding and encoding video data. Thus, the GP104 chip complies with the high standards of PlayReady 3.0 (SL3000) for streaming video playback, which allows you to be sure that playing high-quality content from well-known providers such as Netflix will be of the highest quality and energy efficient. Details on support for various video formats during encoding and decoding are given in the table, the new product is clearly better than previous solutions:

But an even more interesting novelty is support for so-called High Dynamic Range (HDR) displays, which are about to become widespread. HDR TVs are already on sale in 2016 (with four million of them expected to be sold within a year), and monitors should follow next year. HDR is the biggest breakthrough in display technology in years: it delivers twice as many color tones (75% of the visible spectrum versus 33% for today's standard RGB gamut), brighter panels (1000 nits) with higher contrast (10000:1) and rich, saturated colors.

The emergence of the ability to play content with a greater difference in brightness and richer and more saturated colors will bring the image on the screen closer to reality, the black color will become deeper, the bright light will dazzle, just like in the real world. Accordingly, users will see more detail in bright and dark areas of images compared to standard monitors and TVs.

To support HDR displays, the GeForce GTX 1080 has everything needed: 12-bit color output, support for the BT.2020 and SMPTE 2084 standards, and 10/12-bit HDR output in 4K resolution over HDMI 2.0b, as was already the case with Maxwell. On top of that, Pascal adds decoding of the HEVC format in 4K at 60 Hz with 10- or 12-bit color, which is used for HDR video, as well as encoding of the same format with the same parameters, but only at 10-bit, for HDR video recording or streaming. The novelty is also ready for DisplayPort 1.4 standardization for transmitting HDR data over this connector.

By the way, HDR video encoding may be needed in the future in order to transfer such data from a home PC to a SHIELD game console that can play 10-bit HEVC. That is, the user will be able to broadcast the game from a PC in HDR format. Wait, where can I get games with such support? Nvidia is constantly working with game developers to implement this support, giving them everything they need (driver support, code samples, etc.) to correctly render HDR images that are compatible with existing displays.

At the time of the GeForce GTX 1080 launch, support for HDR output had been announced for games such as Obduction, The Witness, Lawbreakers, Rise of the Tomb Raider, Paragon, The Talos Principle and Shadow Warrior 2, and this list is expected to grow in the near future.

Changes to multi-chip SLI rendering

There were also some changes related to the proprietary SLI multi-chip rendering technology, although no one expected this. SLI is used by PC gaming enthusiasts either to push performance to the extreme by running the most powerful single-chip graphics cards in tandem, or to get very high frame rates with a pair of mid-range cards that are sometimes cheaper than one top-end model (a controversial decision, but people do it). With 4K monitors, players have almost no option other than installing a pair of video cards, since even top models often cannot deliver comfortable gameplay at maximum settings in such conditions.

One of the important components of Nvidia SLI are bridges that connect video cards into a common video subsystem and serve to organize a digital channel for data transfer between them. GeForce graphics cards have traditionally featured dual SLI connectors, which served to connect between two or four graphics cards in 3-Way and 4-Way SLI configurations. Each of the video cards had to be connected to each, since all the GPUs sent the frames they rendered to the main GPU, which is why two interfaces were needed on each of the boards.

Starting with the GeForce GTX 1080, all Nvidia graphics cards based on the Pascal architecture have two SLI interfaces linked together to increase the performance of data transfer between graphics cards, and this new dual-channel SLI mode improves performance and comfort when displaying visual information on very high-resolution displays or multi-monitor systems.

This mode also required new bridges, called SLI HB. They link a pair of GeForce GTX 1080 cards over both SLI channels at once, although the new video cards remain compatible with older bridges. For resolutions of 1920×1080 and 2560×1440 pixels at a 60 Hz refresh rate, standard bridges can be used, but in more demanding modes (4K, 5K and multi-monitor systems) only the new bridges deliver the best results in terms of frame pacing; the old ones will still work, just somewhat worse.

Also, when SLI HB bridges are used, the GeForce GTX 1080 data interface runs at 650 MHz, versus 400 MHz for conventional SLI bridges on older GPUs. Moreover, for some of the older rigid bridges, the higher data transfer rate is also available with Pascal-architecture chips. With the data transfer rate between GPUs raised through the doubled SLI interface running at an increased frequency, frames are also delivered to the screen more smoothly than with previous solutions:

It should also be noted that support for multi-chip rendering in DirectX 12 is somewhat different from what was customary before. In the latest version of the graphics API, Microsoft has made many changes related to the operation of such video systems. There are two multi-GPU options available to software developers in DX12: Multi Display Adapter (MDA) and Linked Display Adapter (LDA) modes.

Moreover, the LDA mode has two forms: Implicit LDA (which Nvidia uses for SLI) and Explicit LDA (when the game developer takes on the task of managing multi-chip rendering). The MDA and Explicit LDA modes were implemented in DirectX 12 precisely to give game developers more freedom and options when using multi-chip video systems. The difference between the modes is clearly visible in the following table:

In LDA mode, the memory of each GPU can be connected to the memory of another and displayed as a large total volume, of course, with all the performance limitations when the data is taken from "foreign" memory. In MDA mode, each GPU's memory works separately, and different GPUs cannot directly access data from another GPU's memory. LDA mode is designed for multi-chip systems of similar performance, while MDA mode is less restrictive and can work together with discrete and integrated GPUs or discrete solutions with chips from different manufacturers. But this mode also requires more attention and work from developers when programming collaboration so that GPUs can communicate with each other.

By default, an SLI system based on the GeForce GTX 1080 supports only two GPUs, and three- and four-GPU configurations are officially deprecated, since modern games find it increasingly difficult to extract performance gains from a third and fourth GPU. For example, many games are limited by the system's CPU when driving multi-chip video systems, and new games increasingly use temporal techniques that rely on data from previous frames, with which the efficient operation of several GPUs at once is simply impossible.

However, operation in other (non-SLI) multi-chip configurations remains possible, such as the MDA or LDA Explicit modes in DirectX 12, or a two-chip SLI system with a dedicated third GPU for PhysX effects. But what about records in benchmarks - is Nvidia really abandoning them altogether? No, of course not, but since such systems are in demand by only a handful of users worldwide, a special Enthusiast Key was devised for these ultra-enthusiasts; it can be downloaded from the Nvidia website and unlocks this feature. To do this, you first obtain a unique GPU ID by running a special application, then request the Enthusiast Key on the website and, after downloading it, install the key in the system, thereby unlocking 3-Way and 4-Way SLI configurations.

Fast Sync technology

Some changes have also taken place in synchronization technologies for displaying information on the screen. Looking ahead, there is nothing new in G-Sync, and Adaptive Sync is still not supported. But Nvidia decided to improve output smoothness and synchronization for games that run so fast that the frame rate noticeably exceeds the monitor's refresh rate. This is especially important for games that demand minimal latency and fast response, such as multiplayer battles and competitive titles.

Fast Sync is a new alternative to vertical sync that does not have visual artifacts such as tearing in the image and is not tied to a fixed refresh rate, which increases latency. What is the problem with vertical sync in games like Counter-Strike: Global Offensive? This game on powerful modern GPUs runs at several hundred frames per second, and the player has a choice whether to enable v-sync or not.

In multiplayer games, users most often chase minimal latency and turn VSync off, getting clearly visible tearing, which is extremely unpleasant even at high frame rates. With VSync on, however, the player experiences a significant increase in the delay between his actions and the image on the screen, as the graphics pipeline slows down to the monitor's refresh rate.

This is how a traditional pipeline works. But Nvidia decided to separate the process of rendering and displaying the image on the screen using Fast Sync technology. This allows the part of the GPU that renders frames at full speed to continue to operate at maximum efficiency by storing those frames in a special temporary Last Rendered Buffer.

This method allows you to change the display method and take the best from the VSync On and VSync Off modes, getting low latency, but without image artifacts. With Fast Sync, there is no frame flow control, the game engine runs in sync-off mode and is not told to wait to draw the next one, so latencies are almost as low as VSync Off mode. But since Fast Sync independently selects a buffer for displaying on the screen and displays the entire frame, there are no picture breaks either.

Fast Sync uses three different buffers, the first two of which work similar to double buffering in a classic pipeline. Primary buffer (Front Buffer - FB) is a buffer, information from which is displayed on the display, a fully rendered frame. The back buffer (Back Buffer - BB) is the buffer that receives information when rendering.

When using vertical sync in high frame rate conditions, the game waits until the refresh interval is reached in order to swap the primary buffer with the secondary buffer to display the image of a single frame on the screen. This slows things down, and adding more buffers like traditional triple buffering will only add to the delay.

With Fast Sync, a third Last Rendered Buffer (LRB) is added, which stores the frames that have just been completed in the back buffer. The name of the buffer speaks for itself: it contains a copy of the last fully rendered frame. When the moment comes to update the primary buffer, the LRB is transferred to it in its entirety, not in parts, as would happen from the back buffer with vertical sync disabled. Since copying data between buffers is inefficient, they are simply swapped (renamed, if that is easier to picture), and the new buffer-swapping logic introduced in GP104 manages this process.
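The buffer logic described above can be modeled schematically in a few lines of Python; this is a conceptual sketch following the FB/BB/LRB naming from the text, not driver code:

```python
# Schematic model of the Fast Sync buffer logic (FB / BB / LRB). The renderer
# never waits, and the display always scans out the most recent whole frame.

class FastSyncBuffers:
    def __init__(self):
        self.front = None           # FB: frame currently being scanned out
        self.last_rendered = None   # LRB: most recent fully rendered frame
        self.back = None            # BB: frame being rendered right now

    def render_frame(self, frame):
        """Called by the game/GPU at full speed, unthrottled by the display."""
        self.back = frame
        # On completion, BB and LRB are swapped (renamed), never copied.
        self.back, self.last_rendered = self.last_rendered, self.back

    def vblank(self):
        """Called once per display refresh: promote the newest complete frame."""
        if self.last_rendered is not None:
            self.front, self.last_rendered = self.last_rendered, self.front
        return self.front           # what reaches the screen: always a whole frame

buffers = FastSyncBuffers()
for i in range(5):                  # the game renders 5 frames between two refreshes
    buffers.render_frame(f"frame {i}")
print(buffers.vblank())             # -> "frame 4": the latest frame, no tearing
```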

In practice, the inclusion of a new synchronization method Fast Sync still provides a slightly larger delay compared to completely disabled vertical synchronization - an average of 8 ms more, but it displays frames on the monitor in its entirety, without unpleasant artifacts on the screen that tear the image. The new method can be enabled from the Nvidia control panel graphics settings in the vertical sync control section. However, the default value remains application control, and enabling Fast Sync in all 3D applications is simply not required, it is better to choose this method specifically for games with high FPS.

Virtual reality technology Nvidia VRWorks

We've touched on the hot topic of VR more than once in this article, but so far it has mostly been about raising frame rates and ensuring low latency, which are critical for VR. All of this is very important, and progress is indeed being made, but VR games still look nowhere near as impressive as the best "regular" modern 3D games. This is not only because leading game developers have not yet seriously engaged with VR, but also because VR is far more demanding on frame rate, which rules out many of the usual techniques in such games.

To reduce the quality gap between VR games and regular ones, Nvidia decided to release a whole package of related VRWorks technologies, which includes a large number of APIs, libraries, engines and techniques that can significantly improve both the quality and the performance of VR applications. How does this relate to the announcement of the first gaming solution on Pascal? Very simply: some of these technologies are built into it to help raise performance and quality, and we have already written about them.

Although VRWorks covers more than just graphics, let's start there. The VRWorks Graphics set includes the technologies mentioned earlier, such as Lens Matched Shading, which uses the multi-projection feature introduced in the GeForce GTX 1080; the new product delivers a 1.5-2x performance gain relative to solutions without such support. We also mentioned other technologies, such as MultiRes Shading, designed to render the center of the frame and its periphery at different resolutions.

But much more unexpected was the announcement of VRWorks Audio technology, designed for high-quality calculation of sound data in 3D scenes, which is especially important in virtual reality systems. In conventional engines, the positioning of sound sources in a virtual environment is calculated quite correctly, if the enemy shoots from the right, then the sound is louder from this side of the audio system, and such a calculation is not too demanding on computing power.

But in reality, sounds go not only towards the player, but in all directions and bounce off various materials, similar to how light rays bounce. And in reality, we hear these reflections, although not as clearly as direct sound waves. These indirect sound reflections are usually simulated by special reverb effects, but this is a very primitive approach to the task.

VRWorks Audio uses sound wave rendering similar to ray tracing in rendering, where the path of light rays is traced to multiple reflections from objects in a virtual scene. VRWorks Audio also simulates the propagation of sound waves in the environment when direct and reflected waves are tracked, depending on their angle of incidence and the properties of reflective materials. In its work, VRWorks Audio uses the high-performance Nvidia OptiX ray tracing engine known for graphics tasks. OptiX can be used for a variety of tasks, such as indirect lighting calculation and lightmapping, and now also for sound wave tracing in VRWorks Audio.

Nvidia has built accurate sound wave calculation into its VR Funhouse demo, which uses several thousand rays and calculates up to 12 reflections from objects. And in order to learn the advantages of the technology using a clear example, we suggest you watch a video about the operation of the technology in Russian:

It is important that Nvidia's approach differs from traditional sound engines, including the hardware-accelerated method from the main competitor using a special block in the GPU. All of these methods provide only accurate positioning of sound sources, but do not calculate the reflections of sound waves from objects in a 3D scene, although they can simulate this using the reverb effect. However, the use of ray tracing technology can be much more realistic, since only such an approach will provide an accurate imitation of various sounds, taking into account the size, shape and materials of objects in the scene. It is difficult to say whether such computational accuracy is required for a typical player, but we can say for sure: in VR, it can add to users the very realism that is still lacking in conventional games.

Well, it remains for us to tell only about the VR SLI technology, which works in both OpenGL and DirectX. Its principle is extremely simple: a two-GPU video system in a VR application will work in such a way that each eye is allocated a separate GPU, as opposed to the AFR rendering familiar to SLI configurations. This greatly improves the overall performance, which is so important for virtual reality systems. Theoretically, more GPUs can be used, but their number must be even.

This approach was needed because AFR is poorly suited to VR: with AFR, the first GPU draws an even frame for both eyes and the second an odd one, which does nothing to reduce the latency that is critical for virtual reality, even though the frame rate is quite high. With VR SLI, the work on each frame is instead split between two GPUs: one renders the part of the frame for the left eye, the other for the right, and the two halves are then combined into a whole.

Splitting work like this between a pair of GPUs brings about a 2x increase in performance, resulting in higher frame rates and lower latency compared to systems based on a single graphics card. True, the use of VR SLI requires special support from the application in order to use this scaling method. But VR SLI technology is already built into VR demo apps like Valve's The Lab and ILMxLAB's Trials on Tatooine, and that's just the beginning - Nvidia promises other apps to come soon, as well as bringing the technology to Unreal Engine 4, Unity, and Max Play.

Ansel Game Screenshot Platform

One of the most interesting software announcements was the release of a technology for capturing high-quality screenshots in games, named after the famous photographer Ansel Adams. Games have long been not just games but also a place for creative people to apply their skills: some modify game scripts, some release high-quality texture packs, and some make beautiful screenshots.

Nvidia decided to help the latter by introducing a new platform for creating (namely, creating, because this is not such an easy process) high-quality shots from games. They believe that Ansel can help create a new kind of contemporary art. After all, there are already quite a few artists who spend most of their lives on the PC, creating beautiful screenshots from games, and they still did not have a convenient tool for this.

Ansel allows you to not only capture an image in the game, but change it as the creator needs. Using this technology, you can move the camera around the scene, rotate and tilt it in any direction in order to obtain the desired composition of the frame. For example, in games like first-person shooters, you can only move the player, you can’t really change anything else, so all the screenshots are pretty monotonous. With a free camera in Ansel, you can go far beyond the gaming camera by choosing the angle you need for a good picture, or even capture a full 360-degree stereo image from the required point, and in high resolution for later viewing in a VR helmet.

Ansel works quite simply - with the help of a special library from Nvidia, this platform is embedded in the game code. To do this, its developer only needs to add a small piece of code to his project to allow the Nvidia video driver to intercept buffer and shader data. There is very little work to be done, bringing Ansel into the game takes less than one day to implement. So, the inclusion of this feature in The Witness took about 40 lines of code, and in The Witcher 3 - about 150 lines of code.

Ansel will come with an open development package - SDK. The main thing is that the user receives with him a standard set of settings that allow him to change the position and angle of the camera, add effects, etc. The Ansel platform works like this: it pauses the game, turns on the free camera and allows you to change the frame to the desired view by recording the result in the form of a regular screenshot, a 360-degree image, a stereo pair, or just a panorama of high resolution.

The only caveat is that not all games will support every feature of the Ansel screenshot platform. Some developers, for one reason or another, do not want a completely free camera in their games - for example, because cheaters could abuse it. Or they want to restrict changes to the viewing angle for the same reason, so that no one gains an unfair advantage. Or simply so that users do not see unsightly sprites in the background. All of these are perfectly reasonable wishes on the part of game creators.

One of the most interesting features of Ansel is the creation of screenshots of simply enormous resolution. It does not matter that the game supports, say, resolutions only up to 4K while the user's monitor is Full HD: the screenshot platform can capture a far higher-quality image, limited mostly by the size and speed of your storage. The platform easily captures screenshots of up to 4.5 gigapixels, stitched together from 3600 pieces!
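A rough sense of scale, assuming equal tiles and uncompressed 24-bit output (the actual tile size and file format are not specified):

```python
# Rough scale of the "4.5 gigapixels from 3600 pieces" figure, assuming equal tiles.
total_pixels = 4.5e9
tiles = 3600
per_tile_mp = total_pixels / tiles / 1e6
raw_size_gb = total_pixels * 3 / 1e9          # 24-bit RGB, before any compression

print(f"each tile: ~{per_tile_mp:.2f} MP (about one ordinary frame render)")  # ~1.25 MP
print(f"uncompressed 24-bit size: ~{raw_size_gb:.1f} GB")                     # ~13.5 GB
```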

It is clear that in such pictures you can see all the details, up to the text on the newspapers lying in the distance, if such a level of detail is in principle provided for in the game - Ansel can also control the level of detail, setting the maximum level to get the best picture quality. But you can still enable supersampling. All this allows you to create images from games that you can safely print on large banners and be calm about their quality.

Interestingly, a special hardware-accelerated code based on CUDA is used to stitch large images. After all, no video card can render a multi-gigapixel image in its entirety, but it can do it in pieces, which you just need to combine later, taking into account the possible difference in lighting, color, and so on.

After stitching such panoramas, a special post-processing is used for the entire frame, also accelerated on the GPU. And to capture images in a higher dynamic range, you can use a special image format - EXR, an open standard from Industrial Light and Magic, the color values ​​​​in each channel of which are recorded in 16-bit floating point format (FP16).

This format allows you to change the brightness and dynamic range of the image in post-processing, bringing it to the desired for each specific display in the same way as it is done with RAW formats from cameras. And for the subsequent use of post-processing filters in image processing programs, this format is very useful, since it contains much more data than the usual image formats.

But the Ansel platform itself contains a lot of post-processing filters, which is especially important because it has access not only to the final image, but also to all the buffers used by the game when rendering, which can be used for very interesting effects, like depth of field. To do this, Ansel has a special post-processing API, and any of the effects can be included in the game with support for this platform.

Ansel post-filters include: color curves, color space, transformation, desaturation, brightness/contrast, film grain, bloom, lens flare, anamorphic glare, distortion, heat haze, fisheye, chromatic aberration, tone mapping, lens dirt, light shafts, vignette, gamma correction, convolution, sharpening, edge detection, blur, sepia, denoise, FXAA and others.

As for the appearance of Ansel support in games, then we will have to wait a bit until the developers implement and test it. But Nvidia promises that such support will soon appear in such well-known games as The Division, The Witness, Lawbreakers, The Witcher 3, Paragon, Fortnite, Obduction, No Man's Sky, Unreal Tournament and others.

The new 16nm FinFET process and architectural optimizations have allowed the GeForce GTX 1080, based on the GP104 GPU, to reach high clock speeds of 1.6-1.7 GHz even in reference form, and the new generation of GPU Boost technology keeps it running at the highest possible frequencies in games. Together with the increased number of execution units, these improvements make it not only the highest-performing single-chip graphics card of all time, but also the most energy-efficient solution on the market.

The GeForce GTX 1080 is the first graphics card to use the new GDDR5X memory, a new generation of high-speed chips achieving very high data rates. On the GeForce GTX 1080, this memory operates at an effective frequency of 10 GHz. Combined with improved framebuffer compression algorithms, this gives a 1.7x increase in effective memory bandwidth over its direct predecessor, the GeForce GTX 980.

Nvidia wisely decided not to release a radically new architecture on a completely new process technology for itself, so as not to encounter unnecessary problems during development and production. Instead, they seriously improved the already good and very efficient Maxwell architecture by adding some features. As a result, everything is fine with the production of new GPUs, and in the case of the GeForce GTX 1080 model, engineers have achieved a very high frequency potential - in overclocked versions from partners, the GPU frequency is expected up to 2 GHz! Such an impressive frequency became a reality thanks to the perfect technical process and painstaking work of Nvidia engineers in the development of the Pascal GPU.

And while Pascal is a direct follower of Maxwell, and these graphics architectures are fundamentally not too different from each other, Nvidia has introduced many changes and improvements, including display capabilities, video encoding and decoding engine, improved asynchronous execution of various types of calculations on the GPU, made changes to multi-chip rendering and introduced a new synchronization method, Fast Sync.

It is impossible not to highlight the Simultaneous Multi-Projection technology, which helps improve performance in virtual reality systems, renders scenes more correctly on multi-monitor setups, and enables new performance optimization techniques. VR applications will see the greatest speed boost once they support multi-projection, which halves the GPU's geometry workload and cuts per-pixel work by a factor of one and a half.

Among the purely software changes, the platform for creating screenshots in games called Ansel stands out - it will be interesting to try it in practice not only for those who play a lot, but also for those who are simply interested in high-quality 3D graphics. The novelty allows you to advance the art of creating and retouching screenshots to a new level. Well, such packages for game developers as GameWorks and VRWorks, Nvidia just continues to improve step by step - so, in the latter, an interesting possibility of high-quality sound calculation has appeared, taking into account numerous reflections of sound waves using hardware ray tracing.

In general, in the form of the Nvidia GeForce GTX 1080 video card, a real leader entered the market, having all the necessary qualities for this: high performance and wide functionality, as well as support for new features and algorithms. Early adopters of this graphics card will be able to experience many of these benefits right away, while other features of the solution will be revealed a little later, when there is widespread support from the software. The main thing is that the GeForce GTX 1080 turned out to be very fast and efficient, and, as we really hope, Nvidia engineers managed to fix some of the problem areas (the same asynchronous calculations).

Graphics accelerator GeForce GTX 1070

Chip code name: GP104
Production technology: 16nm FinFET
Number of transistors: 7.2 billion
Core area: 314 mm²
Architecture: Unified, with an array of common processors for stream processing of numerous types of data: vertices, pixels, etc.
DirectX hardware support: DirectX 12, with support for Feature Level 12_1
Memory bus: 256-bit, eight independent 32-bit memory controllers supporting GDDR5 and GDDR5X memory
GPU frequency: 1506 (1683) MHz
Computing blocks: 15 active (out of 20 in the chip) streaming multiprocessors, including 1920 (out of 2560) scalar ALUs for floating point calculations within the framework of the IEEE 754-2008 standard
Texturing blocks: 120 active (out of 160 in the chip) texture addressing and filtering units with support for FP16 and FP32 components in textures and support for trilinear and anisotropic filtering for all texture formats
Raster Operations Units (ROPs): 8 wide ROPs (64 pixels) with support for various anti-aliasing modes, including programmable and with FP16 or FP32 frame buffer format. Blocks consist of an array of configurable ALUs and are responsible for depth generation and comparison, multisampling and blending
Monitor support: Integrated support for up to four monitors connected via Dual Link DVI, HDMI 2.0b and DisplayPort 1.2 (1.3/1.4 Ready)

GeForce GTX 1070 Reference Graphics Specifications
Core frequency: 1506 (1683) MHz
Number of universal processors: 1920
Number of texture blocks: 120
Number of blending blocks: 64
Effective memory frequency: 8000 (4×2000) MHz
Memory type: GDDR5
Memory bus: 256-bit
Memory size: 8 GB
Memory bandwidth: 256 GB/s
Computing performance (FP32): about 6.5 teraflops
Theoretical maximum fill rate: 96 gigapixels/s
Theoretical texture sampling rate: 181 gigatexels/s
Bus: PCI Express 3.0
Connectors: One Dual Link DVI, one HDMI and three DisplayPort
Power consumption: up to 150 W
Supplementary power: One 8-pin connector
Number of slots occupied in the system chassis: 2
Recommended price: $379-449 (US), 34,990 rubles (Russia)
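The headline throughput figures in the table follow directly from the unit counts and clocks; in the check below, FP32 uses the boost clock while the fill and texel rates use the base clock, which matches the quoted values:

```python
# Deriving the GTX 1070 throughput figures from the configuration above.
base_clock_ghz = 1.506
boost_clock_ghz = 1.683
cuda_cores = 1920
tmus = 120
rops = 64
mem_effective_ghz = 8.0          # GDDR5
bus_width_bits = 256

fp32_tflops = cuda_cores * 2 * boost_clock_ghz / 1000     # 2 FLOPs per core per clock (FMA)
fill_rate_gpix = rops * base_clock_ghz                    # gigapixels/s
texel_rate_gtex = tmus * base_clock_ghz                   # gigatexels/s
bandwidth_gbs = mem_effective_ghz * bus_width_bits / 8    # GB/s

print(f"FP32: ~{fp32_tflops:.1f} TFLOPS")               # ~6.5
print(f"fill rate: ~{fill_rate_gpix:.0f} Gpix/s")       # ~96
print(f"texturing: ~{texel_rate_gtex:.0f} Gtex/s")      # ~181
print(f"memory bandwidth: {bandwidth_gbs:.0f} GB/s")    # 256
```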

The GeForce GTX 1070 also received a logical name, matching the equivalent solution in the previous GeForce series: it differs from that of its direct predecessor, the GeForce GTX 970, only in the generation digit. The new card sits one step below the current top solution, the GeForce GTX 1080, which became the temporary flagship of the new series until even more powerful GPU solutions are released.

The recommended prices for the new video card are $379 and $449 for the regular versions from Nvidia's partners and the Founders Edition, respectively. Compared to the top model this is a very good deal, given that the GTX 1070 trails it by about 25% at worst. At the time of its announcement and release, the GTX 1070 is the fastest solution in its class. Like the GeForce GTX 1080, the GTX 1070 has no direct competitors from AMD and can only be compared with the Radeon R9 390X and Fury.

For the GeForce GTX 1070 modification of GP104, Nvidia decided to keep the full 256-bit memory bus, although it uses not the new GDDR5X memory but the very fast GDDR5, running at a high effective frequency of 8 GHz. The amount of memory installed on a card with such a bus can be 4 or 8 GB, and to ensure maximum performance at high settings and rendering resolutions the GeForce GTX 1070 was equipped with 8 GB of video memory, like its older sister. This is enough to run any 3D application at maximum quality settings for several years.

GeForce GTX 1070 Founders Edition

With the announcement of the GeForce GTX 1080 in early May, a special edition of the video card called Founders Edition was announced, which has a higher price than regular video cards from the company's partners. The same applies to the novelty. In this article, we will again talk about a special edition of the GeForce GTX 1070 video card called Founders Edition. As in the case of the older model, Nvidia decided to release this version of the manufacturer's reference video card at a higher price. They claim that many gamers and enthusiasts who buy expensive top-end graphics cards want a product with an appropriate "premium" look and feel.

Accordingly, it is for such users that the GeForce GTX 1070 Founders Edition will be released: it is designed and manufactured by Nvidia engineers from premium materials and components, such as an aluminum shroud and a low-profile back plate that covers the rear of the PCB and is quite popular among enthusiasts.

As you can see from the photos of the board, the GeForce GTX 1070 Founders Edition inherited exactly the same industrial design from the reference version of the GeForce GTX 1080 Founders Edition. Both models use a radial fan that blows heated air out, which is very useful in both small cases and multi-chip SLI configurations with limited physical space. By blowing heated air out instead of circulating it inside the case, you can reduce thermal stress, improve overclocking results, and extend the life of system components.

Under the cover of the reference cooling system GeForce GTX 1070 hides a specially shaped aluminum radiator with three built-in copper heat pipes that remove heat from the GPU itself. The heat dissipated by the heat pipes is then dissipated by an aluminum heatsink. Well, the low-profile metal plate on the back of the board is also designed to provide better thermal performance. It also features a retractable section for better airflow between multiple graphics cards in SLI configurations.

As for the board's power system, the GeForce GTX 1070 Founders Edition has a four-phase power system optimized for a stable power supply. Nvidia claims that the use of special components in the GTX 1070 Founders Edition improves power efficiency, stability, and reliability over the GeForce GTX 970, delivering better overclocking performance. In the company's own tests, the GeForce GTX 1070 GPUs easily surpassed 1.9 GHz, which is close to the results of the older GTX 1080 model.

The Nvidia GeForce GTX 1070 graphics card goes on retail sale on June 10th. The recommended prices for the GeForce GTX 1070 Founders Edition and partner solutions differ, and that is the main question around this special edition. If Nvidia's partners sell their GeForce GTX 1070 cards starting at $379 (in the US market), Nvidia's reference-design Founders Edition costs a full $449. Are there many enthusiasts willing to overpay for the, let's be honest, dubious advantages of the reference version? Time will tell, but we believe the reference board is mainly interesting as an option available at the very start of sales; later on, the point of buying it (and at a premium, no less!) drops to zero.

It remains to add that the printed circuit board of the reference GeForce GTX 1070 is similar to that of the older card, and both differ in design from the company's previous boards. The typical power consumption of the new product is 150 W, almost 20% lower than that of the GTX 1080 and close to the power consumption of the previous-generation GeForce GTX 970. The Nvidia reference board carries the familiar set of connectors for image output devices: one Dual-Link DVI, one HDMI and three DisplayPort. There is also support for the new versions of HDMI and DisplayPort, which we wrote about above in the GTX 1080 review.

Architectural changes

The GeForce GTX 1070 is based on the GP104 chip, the first of a new generation of Nvidia's Pascal graphics architecture. This architecture was based on the solutions developed back in Maxwell, but it also has some functional differences, which we wrote about in detail above - in the part devoted to the top GeForce GTX 1080 video card.

The main change of the new architecture was the technological process by which all new GPUs will be executed. The use of the 16 nm FinFET manufacturing process in the production of GP104 made it possible to significantly increase the complexity of the chip while maintaining a relatively low area and cost, and the very first chip of the Pascal architecture has a significantly larger number of execution units, including those providing new functionality, compared to Maxwell chips of similar positioning.

The GP104 video chip is similar in design to the corresponding Maxwell architecture solutions, and you can find detailed information about the design of modern GPUs in our reviews of previous Nvidia solutions. Like previous GPUs, the chips of the new architecture come in different configurations of Graphics Processing Clusters (GPC), Streaming Multiprocessors (SM) and memory controllers, and the GeForce GTX 1070 has already seen some changes: part of the chip is locked and inactive (highlighted in grey):

Although the GP104 GPU contains four GPC clusters and 20 SM multiprocessors, for the GeForce GTX 1070 it comes in a cut-down configuration with one GPC cluster disabled in hardware. Since each GPC cluster has a dedicated rasterization engine and includes five SMs, and each multiprocessor consists of 128 CUDA cores and eight TMU texture units, this version of GP104 has 1920 of the chip's 2560 CUDA cores and 120 of its 160 physical texture units active.

The graphics processor at the heart of the GeForce GTX 1070 contains eight 32-bit memory controllers, giving a total 256-bit memory bus - exactly as in the older GTX 1080 model. The memory subsystem was not cut down, so as to provide sufficiently high bandwidth given the use of GDDR5 memory in the GeForce GTX 1070. Each memory controller is paired with eight ROPs and 256 KB of L2 cache, so in this modification the GP104 chip also contains 64 ROPs and 2048 KB of L2 cache.
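For reference, the unit counts above follow directly from the cut-down configuration:

```python
# Unit counts of the cut-down GP104 in the GTX 1070, from the configuration described above.
active_sms = 15                     # 3 GPC clusters x 5 SMs (one GPC disabled)
cuda_cores = active_sms * 128       # 1920 of the chip's 2560
tmus = active_sms * 8               # 120 of 160
mem_controllers = 8                 # full 256-bit bus kept
rops = mem_controllers * 8          # 64
l2_kb = mem_controllers * 256       # 2048 KB

print(cuda_cores, tmus, rops, l2_kb)   # 1920 120 64 2048
```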

Thanks to architectural optimizations and a new process technology, the GP104 GPU has become the most energy efficient GPU to date. Nvidia engineers were able to increase the clock speed more than they expected when moving to a new process, for which they had to work hard, carefully checking and optimizing all the bottlenecks of previous solutions that did not allow them to work at a higher frequency. Accordingly, the GeForce GTX 1070 also operates at a very high frequency, more than 40% higher than the reference value for the GeForce GTX 970.

Since the GeForce GTX 1070 is, in essence, just a slightly slower GTX 1080 with GDDR5 memory, it supports absolutely all the technologies described in the previous section. For details on the Pascal architecture and the technologies it supports - the improved video output and processing units, Async Compute support, Simultaneous Multi-Projection, the changes to SLI multi-chip rendering and the new Fast Sync synchronization method - it is worth reading the section on the GTX 1080.

High-performance GDDR5 memory and its efficient use

We wrote above about the changes in the memory subsystem of the GP104 GPU on which the GeForce GTX 1080 and GTX 1070 are based: its memory controllers support both the new GDDR5X video memory, described in detail in the GTX 1080 review, and the good old GDDR5 that we have known for several years.

To avoid losing too much memory bandwidth in the younger GTX 1070 compared to the older GTX 1080, all eight 32-bit memory controllers were left active, giving the full 256-bit video memory interface. In addition, the card was equipped with the fastest GDDR5 memory available on the market, with an effective operating frequency of 8 GHz. That provides 256 GB/s of memory bandwidth, versus 320 GB/s for the older solution - and the computing capabilities were cut by roughly the same proportion, so the balance is maintained.

Keep in mind that while peak theoretical bandwidth is important for GPU performance, you need to pay attention to its efficiency as well. During the rendering process, many different bottlenecks can limit the overall performance, preventing the use of all available memory bandwidth. To minimize these bottlenecks, GPUs use special lossless data compression to improve the efficiency of data reads and writes.

The Pascal architecture introduces the fourth generation of delta compression of framebuffer data, which allows the GPU to use the available video memory bus more efficiently. The memory subsystem of the GeForce GTX 1070 and GTX 1080 uses improved older techniques and several new lossless data compression techniques designed to reduce bandwidth requirements. This reduces the amount of data written to memory, improves L2 cache efficiency, and reduces the amount of data sent between different units of the GPU, such as between the texture units and the framebuffer.

GPU Boost 3.0 and overclocking features

Most of Nvidia's partners have already announced factory-overclocked solutions based on the GeForce GTX 1080 and GTX 1070. Many of the video card manufacturers also provide special overclocking utilities that expose the new functionality of GPU Boost 3.0 technology. One example is EVGA Precision XOC, which includes an automatic scanner for building the voltage-to-frequency curve: for each voltage point it runs a stability test and finds the stable frequency at which the GPU delivers a performance gain. This curve can also be edited manually.

We know GPU Boost technology well from previous Nvidia graphics cards. It is a hardware feature designed to raise the operating clock speed of the GPU in modes where the limits of power consumption and heat dissipation have not yet been reached. In Pascal GPUs the algorithm has undergone several changes, the main one being finer control of the turbo frequency depending on voltage.

If previously the difference between the base frequency and the turbo frequency was fixed, then in GPU Boost 3.0 it became possible to set turbo frequency offsets for each voltage point separately. Now the turbo frequency can be set for each individual voltage value, which allows you to squeeze the full overclocking potential out of the GPU. We wrote about this feature in detail in the GeForce GTX 1080 review, and the EVGA Precision XOC and MSI Afterburner utilities can be used to take advantage of it.
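To illustrate the difference, here is a toy model of the two approaches; the voltage and frequency values below are purely illustrative and do not describe any real card:

    # Illustrative voltage-to-frequency curve (V -> MHz); values are made up.
    base_curve = {0.80: 1500, 0.90: 1650, 1.00: 1800, 1.05: 1900}

    # Pre-Pascal behaviour: one fixed offset shifts the whole curve.
    fixed_offset_curve = {v: f + 100 for v, f in base_curve.items()}

    # GPU Boost 3.0 behaviour: an individual offset per voltage point,
    # e.g. the result of a per-point stability scan in a tool like EVGA Precision XOC.
    per_point_offsets = {0.80: 130, 0.90: 110, 1.00: 90, 1.05: 60}
    boost3_curve = {v: f + per_point_offsets[v] for v, f in base_curve.items()}

    print(fixed_offset_curve)
    print(boost3_curve)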

Since some details of the overclocking methodology have changed with the release of cards supporting GPU Boost 3.0, Nvidia had to provide additional guidance in its overclocking instructions for the new products. There are different overclocking techniques with different variables affecting the final result; for each particular system a particular method may suit better, but the basics are always roughly the same.

Many overclockers use the Unigine Heaven 4.0 benchmark to check system stability: it loads the GPU well, has flexible settings, and can be run in windowed mode alongside an overclocking and monitoring utility such as EVGA Precision or MSI Afterburner. However, such a check is enough only for initial estimates; to firmly confirm the stability of an overclock, it must be verified in several games, because different games stress different functional units of the GPU: math, texture, geometry. The Heaven 4.0 benchmark is also convenient because it has a looped mode, in which it is easy to change overclocking settings on the fly, and a built-in benchmark for measuring the speed gain.

Nvidia advises running the Heaven 4.0 and EVGA Precision XOC windows together when overclocking the new GeForce GTX 1080 and GTX 1070 graphics cards. The first step is to raise the fan speed. For serious overclocking you can set it straight to 100%, which makes the card very loud but cools the GPU and the other components as much as possible, keeping the temperature as low as possible and preventing throttling (the reduction of frequencies when the GPU temperature rises above a certain value).

Next, set the power target (Power Target) to the maximum as well. This setting gives the GPU as much power as possible by raising the power consumption limit and the GPU temperature target (GPU Temp Target). For some purposes the second value can be decoupled from the Power Target and adjusted individually - for example, to keep the chip cooler.

The next step is to increase the GPU Clock Offset value - it determines how much higher the turbo frequency will be during operation. This value raises the frequency for all voltage points and results in better performance. As usual when overclocking, you need to check stability while increasing the GPU frequency in small steps of 10 to 50 MHz, until you notice a hang, a driver or application error, or even visual artifacts. When that limit is reached, step the frequency back down and verify stability and performance once more.
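The step-and-test procedure boils down to a simple search loop. A minimal sketch, assuming two hypothetical helpers - set_gpu_offset() to apply the offset and run_stability_test() to run Heaven or a game and report success (neither is a real API):

    # Stepwise search for the highest stable GPU clock offset, as described above.
    def find_stable_offset(set_gpu_offset, run_stability_test,
                           step_mhz=25, max_offset_mhz=300):
        stable = 0
        offset = 0
        while offset <= max_offset_mhz:
            set_gpu_offset(offset)
            if not run_stability_test():   # hang, driver error or artifacts
                break
            stable = offset                # remember the last offset that passed
            offset += step_mhz
        set_gpu_offset(stable)             # step back to the last known-good value
        return stable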

In addition to the GPU frequency, you can also increase the video memory frequency (Memory Clock Offset), which is especially important in the case of the GeForce GTX 1070 equipped with GDDR5 memory, which usually overclocks well. The process in the case of the memory frequency exactly repeats what is done when finding a stable GPU frequency, the only difference is that the steps can be made larger - add 50-100 MHz to the base frequency at once.

In addition to the above, you can also raise the Overvoltage limit, because a higher GPU frequency is often only achievable at increased voltage, which stabilizes parts of the GPU that would otherwise become unstable. The potential downside of raising this value is the risk of damaging the video chip and accelerating its failure, so voltage increases must be used with extreme caution.

Overclocking enthusiasts use slightly different techniques and change the parameters in different orders. For example, some overclockers first search for stable GPU and memory frequencies separately, so that the two do not interfere with each other, and only then test the combined overclock of both the video chip and the memory chips - but these are minor details of an individual approach.

Judging by opinions on forums and in article comments, some users did not like the new GPU Boost 3.0 behaviour, where the GPU frequency first rises very high, often above the turbo frequency, but then, under the influence of rising GPU temperature or power consumption exceeding the set limit, drops to much lower values. This is simply how the updated algorithm works; you need to get used to the new dynamically changing GPU frequency, and it has no negative consequences.

The GeForce GTX 1070 is the second model, after the GTX 1080, in Nvidia's new line of GPUs based on the Pascal family. The new 16 nm FinFET manufacturing process and architectural optimizations allow this graphics card to reach high clock speeds, which is also helped by the new generation of GPU Boost technology. Even though the number of functional units in the form of stream processors and texture modules has been reduced, it remains sufficient for the GTX 1070 to be one of the best-value and most energy-efficient solutions.

Installing GDDR5 memory on the junior of the two GP104-based cards, rather than the new GDDR5X that distinguishes the GTX 1080, does not prevent it from achieving high performance. First, Nvidia decided not to cut the memory bus of the GeForce GTX 1070, and second, it was given the fastest available GDDR5 memory with an effective data rate of 8 GHz, only slightly below the 10 GHz of the GDDR5X used in the older model. Together with the improved delta compression algorithms, the effective memory bandwidth of the GPU is higher than that of the comparable previous-generation model, the GeForce GTX 970.

The GeForce GTX 1070 is attractive because it offers very high performance and support for new features and algorithms at a much lower price than the older model announced a little earlier. If only a few enthusiasts can afford a GTX 1080 at 55,000 rubles, then a much larger circle of potential buyers will be able to pay 35,000 rubles for a solution that is only about a quarter slower but has exactly the same capabilities. It is this combination of relatively low price and high performance that made the GeForce GTX 1070 perhaps the best-value purchase at the time of its release.

Graphics accelerator GeForce GTX 1060

Parameter: Value
Chip code name: GP106
Production technology: 16 nm FinFET
Number of transistors: 4.4 billion
Core area: 200 mm²
Architecture: unified, with an array of common processors for stream processing of numerous types of data: vertices, pixels, etc.
DirectX hardware support: DirectX 12, with support for Feature Level 12_1
Memory bus: 192-bit, six independent 32-bit memory controllers supporting GDDR5 memory
GPU frequency: 1506 (1708) MHz
Computing blocks: 10 streaming multiprocessors, including 1280 scalar ALUs for floating-point calculations within the IEEE 754-2008 standard
Texturing blocks: 80 texture addressing and filtering units with support for FP16 and FP32 components in textures and trilinear and anisotropic filtering for all texture formats
Raster Operations Units (ROPs): 6 wide ROP blocks (48 pixels) with support for various anti-aliasing modes, including programmable modes and FP16 or FP32 frame buffer formats. The blocks consist of an array of configurable ALUs and are responsible for depth generation and comparison, multisampling and blending
Monitor support: integrated support for up to four monitors connected via Dual Link DVI, HDMI 2.0b and DisplayPort 1.2 (1.3/1.4 Ready)

GeForce GTX 1060 Reference Graphics Specifications
Parameter: Value
Core frequency: 1506 (1708) MHz
Number of universal processors: 1280
Number of texture units: 80
Number of blending units: 48
Effective memory frequency: 8000 (4×2000) MHz
Memory type: GDDR5
Memory bus: 192-bit
Memory size: 6 GB
Memory bandwidth: 192 GB/s
Computing performance (FP32): about 4 teraflops
Theoretical maximum fill rate: 72 gigapixels/s
Theoretical texture sampling rate: 121 gigatexels/s
Bus: PCI Express 3.0
Connectors: one Dual Link DVI, one HDMI and three DisplayPort
Typical power consumption: 120 W
Supplementary power: one 6-pin connector
Number of slots occupied in the system chassis: 2
Recommended price: $249 ($299) in the US and 18,990 rubles in Russia

The GeForce GTX 1060 received a name similar to that of the corresponding solution of the previous GeForce series, differing from its direct predecessor, the GeForce GTX 960, only in the first digit of the generation. In the company's current lineup, the new card sits one step below the previously released GeForce GTX 1070, the mid-range solution of the new series.

The recommended prices for Nvidia's new video card are $249 and $299 for the regular partner versions and the special Founder's Edition, respectively. Compared to the two older models, this is a very favorable price: the new GTX 1060, although slower than the top-end boards, is nowhere near as much slower as it is cheaper. At the time of the announcement, the newcomer was clearly the best-performing solution in its class and one of the best offers in this price range.

This Pascal-family model came out to counter the fresh offering of rival AMD, which had released the Radeon RX 480 a little earlier. You can compare the new Nvidia card with it, although not quite head-to-head, since they still differ noticeably in price. The GeForce GTX 1060 is more expensive ($249-299 versus $199-229), but it is also clearly faster than its competitor.

The GP106 graphics processor has a 192-bit memory bus, so the amount of memory installed on a card with such a bus can be 3 or 6 GB. The smaller value is frankly not enough in modern conditions, and many games, even at Full HD resolution, will run into a shortage of video memory, which seriously affects the smoothness of rendering. To ensure maximum performance at high settings, the GeForce GTX 1060 was equipped with 6 GB of video memory, which is enough to run any 3D application with any quality settings. Moreover, today there is practically no difference between 6 and 8 GB, and such a solution saves some money.
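Why a 192-bit bus implies exactly 3 or 6 GB: each 32-bit controller drives one GDDR5 chip, and a quick sketch with the common 4 Gbit and 8 Gbit chip densities (an assumption about the chips used, not a statement from Nvidia) gives the two options:

    # Possible memory sizes on a 192-bit bus: one GDDR5 chip per 32-bit controller.
    controllers = 192 // 32                 # 6 controllers
    print(controllers * 4 / 8)              # 3.0 GB with 4 Gbit chips
    print(controllers * 8 / 8)              # 6.0 GB with 8 Gbit chips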

The typical power consumption of the new product is 120 W, which is 20% less than that of the GTX 1070 and equal to the power consumption of the previous-generation GeForce GTX 960, a card with much lower performance and capabilities. The reference board has the usual set of connectors for image output: one Dual-Link DVI, one HDMI and three DisplayPort. Moreover, it adds support for the new versions of HDMI and DisplayPort that we wrote about in the GTX 1080 review.

The reference GeForce GTX 1060 board is 9.8 inches (25 cm) long. Among the differences from the older models, we note separately that the GeForce GTX 1060 does not support SLI multi-chip rendering and has no dedicated connector for it. Since the board consumes less power than the older models, a single 6-pin PCI-E connector was installed for additional power.

GeForce GTX 1060 video cards appeared on the market on the day of the announcement in the form of products from the company's partners: Asus, EVGA, Gainward, Gigabyte, Innovision 3D, MSI, Palit, Zotac. A special GeForce GTX 1060 Founder's Edition, produced by Nvidia itself, is being released in limited quantities, sold at $299 exclusively through the Nvidia website, and will not be officially offered in Russia. The Founder's Edition is distinguished by high-quality materials and components, including an aluminum shroud, an efficient cooling system, low-resistance power circuitry and specially designed voltage regulators.

Architectural changes

The GeForce GTX 1060 is based on a completely new graphics processor, the GP106, which is functionally no different from the firstborn of the Pascal architecture, the GP104 chip on which the GeForce GTX 1080 and GTX 1070 models described above are based. This architecture builds on solutions worked out back in Maxwell, but it also has some functional differences, which we described in detail earlier.

The GP106 video chip is similar in design to the top-end Pascal chip and to comparable Maxwell solutions, and you can find detailed information about the design of modern GPUs in our reviews of previous Nvidia products. Like previous GPUs, chips of the new architecture come in different configurations of Graphics Processing Clusters (GPC), Streaming Multiprocessors (SM) and memory controllers:

The GP106 graphics processor incorporates two GPC clusters with a total of 10 streaming multiprocessors (SM) - exactly half of GP104. As in the older GPU, each multiprocessor contains 128 cores, 8 texture units (TMUs), 256 KB of register file, 96 KB of shared memory and 48 KB of L1 cache. As a result, the GeForce GTX 1060 contains a total of 1280 compute cores and 80 texture units, half as many as the GTX 1080.

But the memory subsystem of the GeForce GTX 1060 was not halved relative to the top solution: it contains six 32-bit memory controllers, giving a 192-bit memory bus. With the effective GDDR5 frequency of the GeForce GTX 1060 at 8 GHz, bandwidth reaches 192 GB/s, which is quite good for a solution in this price segment, especially considering how efficiently Pascal uses it. Each memory controller is paired with eight ROPs and 256 KB of L2 cache, so in total the full version of the GP106 GPU contains 48 ROPs and 1536 KB of L2 cache.
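The same per-controller arithmetic used for GP104 yields all of the GP106 figures above:

    # GP106 memory subsystem totals derived from its six 32-bit controllers.
    controllers = 6
    print(controllers * 32)              # 192-bit bus
    print(controllers * 32 * 8 / 8)      # 192 GB/s at 8 Gbit/s effective per pin
    print(controllers * 8)               # 48 ROPs
    print(controllers * 256)             # 1536 KB of L2 cache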

To reduce memory bandwidth requirements and use the available bandwidth more efficiently, the Pascal architecture further improves lossless on-chip data compression, which compresses data in buffers and thereby gains efficiency and performance. In particular, new delta compression modes with 4:1 and 8:1 ratios have been added to the chips of the new family, providing an additional 20% of effective memory bandwidth compared to the previous Maxwell-family solutions.

The base frequency of the new GPU is 1506 MHz - the frequency should never fall below this mark. The typical turbo clock (Boost Clock) is much higher, 1708 MHz, which is the average of the actual frequency the GeForce GTX 1060 chip runs at across a wide range of games and 3D applications. The actual Boost frequency depends on the game and the test conditions.

Like the rest of the Pascal family, the GeForce GTX 1060 not only operates at a high clock speed, providing high performance, but also has a decent margin for overclocking. The first experiments indicate the possibility of reaching frequencies of the order of 2 GHz. It is not surprising that the company's partners are also preparing factory overclocked versions of the GTX 1060 video card.

So, the main enabler of the new generation is the 16 nm FinFET process: its use in the production of GP106 made it possible to significantly increase the complexity of the chip while keeping the area relatively low at 200 mm², so this Pascal chip has noticeably more execution units than a similarly positioned Maxwell chip produced on the 28 nm process.

If the GM206 (GTX 960), with an area of 227 mm², had 3 billion transistors, 1024 ALUs, 64 TMUs, 32 ROPs and a 128-bit bus, then the new GPU packs 4.4 billion transistors, 1280 ALUs, 80 TMUs and 48 ROPs with a 192-bit bus into 200 mm². And it does so at almost one and a half times the frequency: 1506 (1708) versus 1126 (1178) MHz, with the same 120 W power consumption. As a result, the GP106 GPU has become one of the most energy-efficient GPUs, alongside the GP104.
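The gain from the process shrink is easy to put into numbers using only the figures quoted above:

    # Transistor density and clock comparison: GM206 (28 nm) vs GP106 (16 nm FinFET).
    print(3.0e9 / 227 / 1e6)   # ~13.2 million transistors per mm² (GM206)
    print(4.4e9 / 200 / 1e6)   # ~22.0 million transistors per mm² (GP106)
    print(1708 / 1178)         # ~1.45x higher boost clock at the same 120 W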

New Nvidia Technologies

One of the most interesting technologies supported by the GeForce GTX 1060 and the other Pascal-family solutions is Nvidia Simultaneous Multi-Projection. We already wrote about this technology in the GeForce GTX 1080 review; it enables several new techniques for optimizing rendering, in particular projecting a VR image for both eyes at once, significantly increasing the efficiency of the GPU in virtual reality.

To support SMP, all Pascal GPUs have a special engine located in the PolyMorph Engine at the end of the geometry pipeline, before the rasterizer. With it, the GPU can simultaneously project a geometric primitive onto several viewports from a single point, and these projections can be stereo (that is, up to 16 projections per eye, or 32 in total, are supported simultaneously). This capability allows Pascal GPUs to accurately reproduce a curved surface for VR rendering and to display correctly on multi-monitor systems.

It is important that Simultaneous Multi-Projection technology is already being integrated into popular game engines (Unreal Engine and Unity) and games; to date, support for the technology has been announced for more than 30 games in development, including such well-known projects as Unreal Tournament, Poolnation VR, Everest VR, Obduction, Adr1ft and Raw Data. Interestingly, although Unreal Tournament is not a VR game, it uses SMP to achieve better visuals and performance.

Another long-awaited technology is Nvidia Ansel, a powerful tool for creating in-game screenshots. It lets you capture unusual, very high-quality screenshots with previously inaccessible features, save them at very high resolution, enhance them with various effects, and share your creations. Ansel allows you to literally compose a screenshot the way an artist would: place a camera with any parameters anywhere in the scene, apply powerful post-filters to the image, or even take a 360-degree shot for viewing in a virtual reality headset.

Nvidia has standardized the integration of Ansel into games, and doing so is as easy as adding a few lines of code. There is no need to wait for this feature to appear in future games: you can try Ansel right now in Mirror's Edge: Catalyst, and a little later it will become available in The Witcher 3: Wild Hunt. In addition, many Ansel-enabled projects are in development, including Fortnite, Paragon and Unreal Tournament, Obduction, The Witness, Lawbreakers, Tom Clancy's The Division, No Man's Sky, and more.

The new GeForce GTX 1060 also supports the Nvidia VRWorks toolkit, which helps developers create impressive virtual reality projects. The package includes many utilities and tools, including VRWorks Audio, which performs very accurate calculation of sound-wave reflections from scene objects using GPU ray tracing. The package also includes VR integration of PhysX effects to ensure physically correct behavior of objects in the scene.

One of the most exciting VR games to take advantage of VRWorks is VR Funhouse, Nvidia's own VR game, available for free on Valve's Steam service. It is built on Unreal Engine 4 (Epic Games) and runs on GeForce GTX 1080, 1070 and 1060 graphics cards paired with HTC Vive VR headsets. Moreover, the source code of the game will be made publicly available, allowing other developers to reuse its ideas and code in their own VR attractions. Take our word for it: this is one of the most impressive demonstrations of what virtual reality can do.

Thanks in part to SMP and VRWorks, the GeForce GTX 1060 delivers performance quite sufficient for entry-level virtual reality; the GPU meets the minimum required hardware level, including for SteamVR, making it one of the most sensible purchases for systems with official VR support.

Since the GeForce GTX 1060 is based on the GP106 chip, which is functionally in no way inferior to the GP104 that underpins the older models, it supports absolutely all the technologies described above.

The GeForce GTX 1060 is the third model in Nvidia's new line of Pascal-based graphics processors. The new 16 nm FinFET process and architectural optimizations allow all the new cards to reach high clock speeds and to fit more functional units - stream processors, texture modules and others - into the GPU than previous-generation chips. That is why the GTX 1060 has become one of the best-value and most energy-efficient solutions in its class, and overall.

It is especially important that the GeForce GTX 1060 offers sufficiently high performance and support for new features and algorithms at a much lower price than the older GP104-based solutions. The GP106 chip used in the new model delivers best-in-class performance and power efficiency. The GeForce GTX 1060 is designed for, and perfectly suited to, all modern games at high and maximum graphics settings at 1920x1080, even with full-screen anti-aliasing enabled by one of the various methods (FXAA, MFAA or MSAA).

And for those who want even more performance or ultra-high-resolution displays, Nvidia has the top-end GeForce GTX 1070 and GTX 1080 graphics cards, which are also quite good in terms of performance and power efficiency. Still, the combination of low price and sufficient performance sets the GeForce GTX 1060 apart from the older solutions. Compared to the competing Radeon RX 480, Nvidia's solution is slightly faster with a smaller, less complex GPU, and has noticeably better power efficiency. True, it sells for a bit more, so each card occupies its own niche.

We now move on to another feature of the GeForce GTX 1080 that made it the first of its kind: support for GDDR5X memory. In this respect the GTX 1080 will be the only product on the market for some time, since the GeForce GTX 1070 is already known to come with standard GDDR5 chips. Combined with the new color compression algorithms (more on that later), the high memory bandwidth allows GP104 to manage its computing resources more effectively than products based on the GM204 and GM200 chips could.

JEDEC released the final specifications of the new standard only in January of this year, and the only manufacturer of GDDR5X at the moment is Micron. 3DNews did not have a separate article on this technology, so we will briefly describe the innovations that GDDR5X brings in this review.

The GDDR5X protocol has much in common with GDDR5 (although the chips differ electrically and physically), unlike HBM memory, which is a fundamentally different type whose coexistence with a GDDR5(X) interface in a single GPU is practically impossible. That is why the new memory is called GDDR5X and not, say, GDDR6.

One of the key differences between GDDR5X and GDDR5 is the ability to transfer four bits of data per signal cycle (QDR - Quad Data Rate), as opposed to two bits (DDR - Double Data Rate) in all previous modifications of DDR SDRAM memory. The physical frequencies of the memory cores and of the data transfer interface remain in roughly the same range as those of GDDR5 chips.

To keep the faster chips fed with data, GDDR5X increases the data prefetch from 8n to 16n. With the 32-bit interface of an individual chip, this means the controller fetches not 32 but 64 bytes per memory access cycle. As a result, the interface bandwidth reaches 10-14 Gb/s per pin at a CK (command clock) frequency of 1250-1750 MHz - this is the frequency shown by monitoring and overclocking utilities such as GPU-Z. For now those are the figures written into the standard, but in the future Micron plans to push it to 16 Gb/s.
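As the numbers above imply, the per-pin rate is simply eight times the command clock for GDDR5X (four transfers per WCK cycle, with WCK at twice CK), versus four times CK for plain GDDR5. A quick sketch of that arithmetic:

    # Per-pin data rate from the command clock (CK), in Gbit/s.
    def per_pin_rate_gbit_s(ck_mhz, bits_per_ck_cycle):
        return ck_mhz * bits_per_ck_cycle / 1000

    print(per_pin_rate_gbit_s(1250, 8))   # 10 Gb/s - GDDR5X lower bound
    print(per_pin_rate_gbit_s(1750, 8))   # 14 Gb/s - GDDR5X upper bound
    print(per_pin_rate_gbit_s(1750, 4))   # 7 Gb/s  - GDDR5 at the same CK (as on GTX 980 Ti)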

The next advantage of GDDR5X is increased chip density - from 8 to 16 Gbit. The GeForce GTX 1080 comes with eight 8 Gbit chips, but in the future graphics card makers will be able to double the amount of RAM as more capacious chips become available. Like GDDR5, GDDR5X allows two chips per 32-bit controller in so-called clamshell mode, which makes it possible to address 32 GB of memory on the 256-bit GP104 bus. In addition, besides capacities that are powers of two, the GDDR5X standard describes 6 and 12 Gbit chips, which will allow the total on-board memory to be varied more finely - for example, a card with a 384-bit memory bus could carry a total of 9 GB.
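The capacity arithmetic behind those configurations is straightforward; a small sketch:

    # Board memory capacity in GB for a given bus width and chip density.
    def board_capacity_gb(bus_bits, gbit_per_chip, clamshell=False):
        chips = (bus_bits // 32) * (2 if clamshell else 1)   # one (or two) chips per 32-bit controller
        return chips * gbit_per_chip / 8

    print(board_capacity_gb(256, 8))                   # 8 GB  - GeForce GTX 1080
    print(board_capacity_gb(256, 16, clamshell=True))  # 32 GB - 256-bit bus, clamshell, 16 Gbit chips
    print(board_capacity_gb(384, 6))                   # 9 GB  - 384-bit bus with 6 Gbit chips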

Contrary to the expectations raised by the first public information about GDDR5X, the power consumption of the new memory type is comparable to that of GDDR5 or only slightly higher. To compensate for the higher power at high bandwidths, the standard's authors lowered the core supply voltage from the 1.5 V standard for GDDR5 to 1.35 V. In addition, the standard makes temperature-based frequency control of the chips mandatory. It is not yet known how much the new memory really depends on the quality of heat removal, but it is possible that we will now more often see cooling systems that serve not only the GPU but also the memory chips, whereas makers of GDDR5-based cards mostly ignore this possibility.

One might get the impression that the transition from GDDR5 to GDDR5X was easy for NVIDIA thanks to the relatedness of the two technologies; besides, the GeForce GTX 1080 uses the lowest data rate defined by the standard - 10 Gb/s per pin. Nevertheless, the practical implementation of the new interface involved a number of engineering difficulties: transferring data at such high frequencies required careful design of the data bus topology on the board to minimize interference and signal attenuation in the conductors.

The resulting 256-bit bus bandwidth of the GeForce GTX 1080 is 320 GB/s, which is not much less than the 336 GB/s of the GeForce GTX 980 Ti (TITAN X) with its 384-bit GDDR5 bus at 7 Gb/s per pin.

Now the PolyMorph Engine can create up to 16 projections (viewports) at once, oriented arbitrarily and focused on one or two points shifted along the horizontal axis relative to each other. These transformations are performed entirely in hardware and cause no performance penalty in themselves.

This technology has two fairly predictable applications. The first is VR headsets: thanks to the two projection centers, Pascal can create a stereo image in a single geometry pass (though this applies only to geometry - the GPU still has to do twice the rasterization work for the two frames).

In addition, SMP makes it possible to compensate at the geometry level for the image distortion introduced by the headset lenses. To do this, the image for each eye is built from four separate projections, which are then stitched into a plane by a post-processing filter. This not only yields a geometrically accurate final image but also removes the need to process roughly a third of the pixels, which would otherwise be discarded during the final correction of a standard flat projection for the curvature of the lenses.

The only VR optimization Maxwell had was that the peripheral areas of the image, which are compressed most heavily for output through the lenses, could be rendered at a lower resolution, which gave a bandwidth saving of only 10-15%.

The next area where SMP is in demand is multi-monitor configurations. Without SMP, the image across several joined displays is a single plane from the GPU's point of view and looks geometrically correct only if the screens in front of the viewer are arranged in a line; an angled arrangement no longer looks right, as if you had simply bent a large photo in several places. Not to mention that in any case the viewer sees a flat picture rather than a window into a virtual world: if you turn your head toward a side screen, the objects on it remain stretched, because the virtual camera is still looking at the central point.

With SMP, the video card driver can take into account the physical placement of the screens and project the image for each of them through its own viewport, which functionally brings a multi-monitor setup much closer to a true "window".

In short, the purpose of triple buffering here is to decouple the rendering of new frames in the GPU pipeline from scanout of the frame buffer: the graphics card is allowed to produce new frames at an arbitrarily high rate, writing them into two alternating frame buffers. The contents of the most recent frame are then copied into a third buffer at a rate matching the screen refresh rate, from where the monitor picks it up without tearing. As a result, the frame that reaches the screen at the start of each scanout always contains the latest information the GPU has produced.
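As a rough illustration of the scheme just described, here is a toy model (purely conceptual, not how the driver is actually implemented): the render side writes into two alternating back buffers at full speed, and at every display refresh the most recently completed frame is handed to scanout:

    # Conceptual model of Fast Sync style triple buffering.
    class FastSyncModel:
        def __init__(self):
            self.back_buffers = [None, None]   # two alternating render targets
            self.write_index = 0
            self.last_completed = None         # newest fully rendered frame
            self.scanout = None                # what the monitor currently reads

        def render_frame(self, frame):
            # GPU side: runs as fast as it can, never waits for the display.
            self.back_buffers[self.write_index] = frame
            self.last_completed = frame
            self.write_index ^= 1              # switch to the other back buffer

        def vblank(self):
            # Display side: at each refresh, scan out the newest complete frame.
            self.scanout = self.last_completed
            return self.scanout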

Triple buffering is most useful on monitors with a refresh rate of 50-60 Hz. At 120-144 Hz, as we already wrote in the G-Sync article, enabling vertical synchronization adds only a small amount of latency in the first place, but Fast Sync reduces it to a minimum.

If you are wondering how Fast Sync relates to G-Sync (and to AMD's FreeSync counterpart - a purely theoretical question, since NVIDIA supports only its own variant): G-Sync reduces latency when the GPU does not manage to produce a new frame by the time scanout starts, while Fast Sync, on the contrary, reduces latency when the frame rate in the rendering pipeline is higher than the screen refresh rate. In addition, the two technologies can work together.

GeForce GTX 1080 Founder's Edition: design

This pompous name now denotes the reference version of the GeForce GTX 1080. Starting with the GeForce GTX 690, NVIDIA has paid a lot of attention to the form in which its new products enter the market. The reference samples of modern GeForce cards are a far cry from their nondescript predecessors with their relatively inefficient and noisy coolers.

The GeForce GTX 1080 Founder's Edition incorporates the best design features of Kepler and Maxwell graphics cards: an aluminum turbine shroud, a cooler impeller made from a low-noise material, and a massive aluminum frame that adds rigidity to the structure and removes heat from the RAM chips.


The GTX 1080 combines two components that have periodically appeared on and disappeared from NVIDIA reference cards: a GPU heatsink with a vapor chamber and a back plate. The latter can be partially removed without a screwdriver to provide airflow to the cooler of an adjacent video card in SLI mode.

Besides its representative function, a reference card is needed so that board partners can buy it from NVIDIA and meet demand until devices of their own design on the same GPU are ready. This time, however, NVIDIA plans to keep the reference version on sale throughout the model's life and to distribute it, among other channels, through its official website. That is how NVIDIA justifies the price of the GTX 1080 FE, $100 above the $599 recommended for everyone else. After all, the Founder's Edition does not look or feel like a cheap product.

At the same time, the card runs at reference frequencies, below which, as usual, no manufacturer of custom-design boards will go. Nor is there any binning of GPUs for the GTX 1080 FE by overclocking potential, so among the whole mass of GeForce GTX 1080 implementations there may well be more expensive ones. But for a while the Founder's Edition will be the predominant, and even the only, version of the flagship Pascal, which automatically pushes its retail price $100 above NVIDIA's "recommendation".

The GeForce GTX 1080 Ti features 11 GB of GDDR5X memory, a 1583 MHz GPU (overclockable to 2000 MHz with the stock cooling), 11 GHz effective (QDR) memory, and 35% better performance than the GeForce GTX 1080 - all at a reduced price of $699.

The new graphics card displaces the GeForce GTX 1080 from the position of the flagship in the GeForce line and becomes the fastest graphics card that exists today, as well as the most powerful card on the Pascal architecture.

The Most Powerful NVIDIA GeForce GTX 1080 Ti Gaming Card

The NVIDIA GeForce GTX 1080 Ti is a gamer's dream: with it you can finally enjoy the latest AAA games, play in high-resolution virtual reality headsets, and revel in the clarity and accuracy of the graphics.

The GTX 1080 Ti was designed as the first graphics card fully capable of 4K gaming. It is equipped with the newest and most technologically advanced hardware, which no other video card can boast of today.

Here is NVIDIA's official pitch for the GeForce GTX 1080 Ti:

“It's time for something new. The one that is 35% faster than the GTX 1080. The one that is faster than the Titan X. Let's call it the ultimate…

Year by year, video games have become more and more beautiful, so we are introducing a next-generation top product so that you can enjoy next-generation games.”

Jen-Hsun Huang

Specifications NVIDIA GeForce GTX 1080 Ti

NVIDIA has not skimped on the internals of its new super-powerful video card.

It is equipped with the same Pascal GP102 GPU as the Titan X (Pascal), but outperforms the latter in almost every respect.

The processor packs 12 billion transistors and has six graphics processing clusters; two of its streaming multiprocessors are disabled, leaving a total of 28 multiprocessors of 128 cores each.

Thus, the GeForce GTX 1080 Ti has 3584 CUDA cores, 224 texture mapping units and 88 ROPs (the units responsible for z-buffering, anti-aliasing and writing the final image into the video memory frame buffer).
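These counts follow from the 28 active multiprocessors and the eight-ROPs-per-controller figure used throughout this article; a quick check:

    # GeForce GTX 1080 Ti unit counts derived from 28 active SMs and a 352-bit bus.
    active_sms = 28
    print(active_sms * 128)    # 3584 CUDA cores
    print(active_sms * 8)      # 224 texture units
    print((352 // 32) * 8)     # 88 ROPs: eleven 32-bit controllers, eight ROPs each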

Clock speeds range from the 1582 MHz boost frequency up to around 2 GHz when overclocked. The Pascal architecture was designed with overclocking in mind, both on the reference design and, even more aggressively, on non-reference models.

The GeForce GTX 1080 Ti also carries 11 GB of GDDR5X memory on a 352-bit bus; the flagship uses the fastest G5X chips available to date.

With the new compression system and tiled caching, the effective bandwidth of the GTX 1080 Ti can reach up to 1200 GB/s, which, according to NVIDIA, is superior to AMD's HBM2 technology.

Specification NVIDIA GeForce GTX 1080 Ti:

Characteristics: GTX Titan X (Pascal) / GTX 1080 Ti / GTX 1080
Process technology: 16 nm / 16 nm / 16 nm
Transistors: 12 billion / 12 billion / 7.2 billion
Die area: 471 mm² / 471 mm² / 314 mm²
Memory: 12 GB GDDR5X / 11 GB GDDR5X / 8 GB GDDR5X
Memory speed: 10 Gb/s / 11 Gb/s / 10 Gb/s
Memory interface: 384-bit / 352-bit / 256-bit
Bandwidth: 480 GB/s / 484 GB/s / 320 GB/s
CUDA cores: 3584 / 3584 / 2560
Base frequency: 1417 MHz / 1480 MHz / 1607 MHz
Boost frequency: 1530 MHz / 1583 MHz / 1733 MHz
Computing power: 11 teraflops / 11.5 teraflops / 9 teraflops
Thermal design power: 250 W / 250 W / 180 W
Price: $1200 / $699 / $499

Cooling NVIDIA GeForce GTX 1080 Ti

The GeForce GTX 1080 Ti Founders Edition features a new airflow design that cools the board better and runs quieter than previous designs, allowing the card to be overclocked further and reach even greater speeds. Power delivery is handled by a 7-phase design with 14 high-efficiency dualFET transistors, which also helps keep temperatures down.

The GeForce GTX 1080 Ti comes with the latest NVTTM cooler design, which introduces a new vapor chamber with twice the cooling area of the Titan X (Pascal). This new thermal design helps achieve optimal cooling and lets GPU Boost 3.0 push the GPU beyond its rated specification.

NVIDIA GeForce GTX 1080 Ti is an overclocker's dream

So what do we do with all this video card power? The answer is obvious - overclock it to the limit. During the event, NVIDIA demonstrated the outstanding overclocking potential of the GTX 1080 Ti: they managed to reach a GPU frequency of 2.03 GHz at a locked 60 FPS.


