Radeon HD 7000 Revealed: AMD to Mix GCN with VLIW4 & VLIW5 Architectures
Starting next week, AMD will hold Tech Days in several destinations around the globe, such as London and Paris, during which the company will present the 28nm Radeon HD 7000 series.
There are a lot of rumors flying around the web, some of them spun by AMD itself to sow confusion, as the Radeon HD 7000 series is going to mix the existing VLIW4 and VLIW5 architectures with "Graphics Core Next" (GCN), introduced during June's Fusion Development Summit held in Bellevue, WA.
Radeon HD 7000 Series with the old VLIW4 and VLIW5 Architectures
A couple of years ago, AMD and sympathetic media outlets were all over NVIDIA for mixing different GPU architectures within the same product line. Then, with the Radeon HD 6000 series, all of a sudden nobody questioned why AMD mixed two distinct GPU architectures within a single series (the new VLIW4 architecture powered only three high-end parts). With the Radeon HD 7000 series, the situation is set to become even more complicated, with AMD mixing no fewer than three distinct GPU architectures within a single generation of products.
Given the recent cancellation of the 28nm Krishna and Wichita APUs, AMD will rebrand the graphics in the Brazos 2.0 APU platform as the Radeon HD 7200 and 7300 series; a rebranded AMD E-Series APU, for instance, will be powered by a Radeon HD 7200 or 7300 GPU (all based on the Evergreen architecture, i.e. VLIW5).
The higher-end Trinity APU, heir to the successful Llano A-Series APU, will be powered by a Devastator GPU core based on the contemporary "Northern Islands" VLIW4 architecture, featuring product names such as Radeon HD 7450(D), 7550(D) and so on.
When it comes to discrete parts, those codenamed Cape Verde (HD 7500, 7600, and 7700) and Pitcairn (HD 7800) are all based on the VLIW4 architecture. The "Graphics Core Next" architecture is reserved for the 7900 series alone. Desktop parts carry Southern Islands codenames, while mobile parts are codenamed after parts of London (read: Cape Verde becomes Lombok, Pitcairn becomes Thames, etc.).
If you compare the VLIW4-based HD 6900 and the upcoming HD 7800 series, there isn't much difference between the two. According to our sources, the HD 7800 "Pitcairn" is a 28nm die shrink of the popular HD 6900 "Cayman" GPU with minor performance adjustments. This will bring considerable compute power into the price-sensitive $199-$249 bracket, and we expect it to cause plenty of headaches for NVIDIA in that respect.
Welcome, Graphics Core Next: Powering Tahiti and New Zealand (HD 7900)
AMD spoke about Graphics Core Next (GCN) quite openly, a move we can only commend them for. During his keynote session in June, Eric Demers of AMD explained the reasoning behind the move to GCN: compute is graphics, graphics is compute. There is no doubt that the future of GPUs lies in enhanced compute capabilities, and we already hear from game developers who are using the computational power of the GPU to create detail inside their games instead of shipping gigabytes and gigabytes of textures.
The new GCN architecture brings numerous innovations, and we see x86 virtual memory as perhaps the most important of them. While GPU manufacturers have promised functional virtual memory for ages, this is the first time we're seeing a working implementation. This is not a marketing gimmick: the IOMMU is a fully functional GPU feature, supporting page faults and over-allocation, and even accepting 64-bit x86 memory pointers for 100% compatibility with 64-bit CPUs. Virtual memory is going to be a large part of next-gen Fusion APUs (2013) and FireStream GPGPU cards (2012), and we can only commend the effort made in making this possible.
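To make the pointer-sharing idea concrete, here is a minimal, purely illustrative Python sketch - not AMD's API and not real driver code, with the "kernel" standing in for GPU work - of the practical difference between the classic copy-in/copy-out model and a shared-virtual-memory model:

```python
import numpy as np

# Classic copy-based model: the buffer must be uploaded into a separate
# device allocation, processed there, then downloaded back.
host_buf = np.arange(8, dtype=np.float64)
device_buf = host_buf.copy()            # explicit upload
device_buf *= 2.0                       # "kernel" runs on the copy
host_buf[:] = device_buf                # explicit download

# Shared-virtual-memory model: the "kernel" receives the very buffer
# (conceptually, the same 64-bit pointer) the CPU already owns - no staging
# copies, and the hardware can page-fault memory in on demand.
def kernel(buf):
    buf *= 2.0                          # operates in place on shared memory

shared_buf = np.arange(8, dtype=np.float64)
kernel(shared_buf)
assert np.array_equal(host_buf, shared_buf)   # both paths yield the same result
```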
All of this required expanding the GPU's memory controller by two additional 64-bit channels, for a grand total of 384 bits - identical to the GeForce GTX 580, for example. However, AMD's memory timings are much more aggressive than NVIDIA's conservative ones, so expect memory clocks to remain higher on AMD GPUs.
A rumor recently exploded that the HD 7900 series would come with Rambus XDR2 memory. Given that AMD has its own memory development team and was the driving force behind the creation of the GDDR3, GDDR4 and GDDR5 memory standards, we were skeptical of those rumors.
Bear in mind that going Rambus is not an easy decision, as a lot of engineers inside AMD flat-out refuse to even consider the idea of using Rambus products due to the company's litigious behavior. However, our sources tell us that AMD is frustrated that the DRAM industry didn't make good on the very large investment on AMD's part in creating two GDDR5 memory standards: single-ended (S.E. GDDR5) and differential GDDR5. Thus, the company applied pressure on the memory industry by floating XDR2 as a bridge between GDDR5 and the future memory standard. The production Tahiti part will utilize GDDR5 memory, though.
Is AMD going to continue investing in future memory standards? We would say yes, but with all the changes that have happened, the company just might take the executive decision to utilize commercially available memory technologies rather than spend time and money on future iterations of GDDR memory. After all, AMD recently reshuffled its memory design task force. In any case, differential GDDR5 promises very interesting bandwidth figures, and those figures are something AMD wants to utilize "as soon as possible".
AMD is pushing forward with its Fusion System Architecture (FSA), and the goals of that architecture will take some time to implement - we won't see a full implementation before 2014. However, Southern Islands brings several key features that AMD lacked compared to NVIDIA's Fermi and the upcoming Kepler architectures.
The GPU itself replaces the old SIMD arrays with MIMD-capable Compute Units (CUs), which bring support for C++ in the same way NVIDIA did with Fermi, but AMD goes beyond Fermi's capabilities with the aforementioned IOMMU. There is also a link between CPU and GPU power management, which should reduce power consumption (currently, a single action by the GPU will wake up the CPU, even for something as simple as a screen refresh).
As you can see in the image above, a single CU block consists of one scalar unit and 64 vector units, fed through multiple layers of cache. Overall, each Compute Unit comes with 16KB of L1 data cache and 64KB of LDS (i.e. scratch) memory, with an additional 48KB of cache shared between every four CU blocks. Each CU also connects to 64KB of dedicated L2 cache.
With Tahiti packing 32 Compute Units in its maximum configuration, a 32-CU GPU with 2048 processing cores features almost 5MB of on-die memory: 512KB of L1 data cache, 384KB of shared L1 cache, 2MB of LDS and 2MB of L2 cache. This is a record amount of cache for a GPU so far, and you can expect the trend to continue.
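A quick back-of-the-envelope check of those totals, assuming the per-CU figures quoted above (16KB L1 data, 64KB LDS and 64KB L2 per CU, plus 48KB shared per group of four CUs):

```python
# Sanity check of the quoted on-die memory figures for a 32-CU Tahiti.
cus = 32
l1_data   = cus * 16          # 512 KB of L1 data cache
lds       = cus * 64          # 2048 KB = 2 MB of LDS
shared_l1 = (cus // 4) * 48   # 384 KB of shared L1 (one 48KB block per 4 CUs)
l2        = cus * 64          # 2048 KB = 2 MB of L2 cache

total_kb = l1_data + lds + shared_l1 + l2
print(total_kb, "KB =", total_kb / 1024, "MB")   # 4992 KB = 4.875 MB, i.e. "almost 5MB"
```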
AMD adopted a smart compute approach: Graphics Core Next is a true MIMD (Multiple Instruction, Multiple Data) architecture. With the new design, the company opted for "fat and rich" processing cores that occupy more die space but can handle more data. AMD cites loading the CU with multiple command streams, instead of the conventional GPU workload model of "fire a billion instructions off and wait until they all complete". A single Compute Unit can handle 64 FMAD (Fused Multiply-Add) operations or 40 SMT (Simultaneous Multi-Threading) waves. Wondering how many MIMD instruction streams GCN can take? Four threads. Four-thread MIMD or 64-wide SIMD - your call. As Eric explained, Southern Islands is a "MIMD architecture with a SIMD array".
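For readers unfamiliar with the jargon, here is a purely conceptual Python sketch of the difference - it is not GCN's instruction set, just an illustration of one instruction fanned across a 64-wide wave versus a handful of independent instruction streams:

```python
import numpy as np

# SIMD: one instruction (here a fused multiply-add) applied to an entire
# 64-lane wavefront at once - a single instruction stream, 64 data elements.
a, b, c = (np.random.rand(64) for _ in range(3))
wavefront_result = a * b + c

# MIMD: a small number of independent instruction streams, each running its
# own operation on its own data (the article quotes four such streams per CU).
streams = [lambda x: x + 1.0,
           lambda x: x * 2.0,
           lambda x: x - 3.0,
           lambda x: x / 4.0]
mimd_results = [op(np.random.rand(64)) for op in streams]
```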
These Compute Units are paired with conventional fixed-function hardware. AMD tried the non-fixed-function route with the R600 in 2007 (Radeon HD 2000 series), and after that experiment the company saw no value in avoiding fixed-function hardware. Thus, Southern Islands will continue to carry up to 64 fixed Raster Operations (ROP) units, Z units, up to 128 Texture Mapping Units, FSAA logic, etc.
Tahiti becomes HD 7950 and 7970, New Zealand becomes HD 7990
Now that we've been properly introduced to the GPU core, the time has come to pay more attention to the lineup itself. Given that the memory bus was extended to 384 bits, i.e. the same as the GeForce GTX 580, 3GB of GDDR5 is used across the board, and we would not exclude a 1.5GB or even 896MB "7930" part appearing as the number of partially functional GPUs increases.
AMD kept the unified clock concept, and given that the Radeon HD 7970 is based on the fully configured "Tahiti XT" GPU, its 2048 cores (32 Compute Units) operate at a 1GHz clock. The 3GB of GDDR5 memory operates in quad-data-rate mode, i.e. 1.375GHz QDR ("effective 5.5GHz"). This results in record video memory bandwidth for a single GPU: 264GB/s.
The HD 7950 is based on "Tahiti Pro" and packs 30 Compute Units, for 1920 cores operating at 900MHz. The number of ROPs drops to 60, while texture units naturally fall to 120 (as every CU connects to two ROPs and four TMUs). Our sources did not disclose whether the memory controller remains 384-bit or is cut down to 256-bit, but the memory clock was decreased to 1.25GHz, i.e. the same clock as previous-generation models. Should the 384-bit controller stay, that clock should be good for 240GB/s of bandwidth.
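Both bandwidth figures follow from simple arithmetic: bus width in bytes multiplied by the effective transfer rate. A quick check, assuming the quad-data-rate multiplier and bus widths quoted above (the helper function below is ours, purely for illustration):

```python
# Memory bandwidth = (bus width in bytes) * (effective transfer rate in GT/s).
def bandwidth_gb_s(bus_bits, mem_clock_ghz, transfers_per_clock=4):
    return (bus_bits / 8) * mem_clock_ghz * transfers_per_clock

print(bandwidth_gb_s(384, 1.375))  # HD 7970: 48 B * 5.5 GT/s = 264.0 GB/s
print(bandwidth_gb_s(384, 1.25))   # HD 7950 (if 384-bit): 48 B * 5.0 GT/s = 240.0 GB/s
```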
Both products are expected to be released at CES 2012 in Las Vegas, NV, occupying the $349-449 price bracket. Those additional gigabytes of memory (and processing cores) will certainly cost a lot of $$$.
As for the dual-GPU "New Zealand", its 6GB of GDDR5 is expected to be clocked at the same level as the HD 6990/7970, meaning you will be getting full performance out of the dual-GPU part.
Unlike the HD 7950 and HD 7970, the Radeon HD 7990 will debut in March 2012, and the target price is the same as the launch price of its predecessor: $699.