Quinton McLeod
Oh god, show me real evidence, not just someone saying something.
http://en.wikipedia.org/wiki/SDRAM_latency
The wiki article seems to disagree with you.
This guy explains it the best:
http://www.techspot.com/community/t...-between-ddr3-memory-and-gddr5-memory.186408/
The principal differences are:
DDR3 runs at a higher voltage than GDDR5 (typically 1.25-1.65V versus ~1V)
DDR3 uses a 64-bit memory controller per channel (so, a 128-bit bus for dual channel, 256-bit for quad channel), whereas GDDR5 is paired with controllers of a nominal 32-bit (16-bit each for input and output). But whereas the CPU's memory controller is 64-bit per channel, a GPU can utilise any number of 32-bit I/Os (at the cost of die size) depending upon application (2 for a 64-bit bus, 4 for 128-bit, 6 for 192-bit, 8 for 256-bit, 12 for 384-bit, etc.). The GDDR5 setup also allows for doubled or asymmetric memory configurations. Normally (using this generation of cards as an example) GDDR5 memory uses a 2Gbit memory chip for each 32-bit I/O (i.e. for a 256-bit bus / 2GB card: 8 x 32-bit I/Os, each connected by a circuit to a 2Gbit IC = 8 x 2Gbit = 16Gbit = 2GB), but GDDR5 can also operate in what is known as clamshell mode, where each 32-bit I/O, instead of being connected to one IC, is split between two (one on each side of the PCB), allowing the memory capacity to be doubled. Mixing the arrangement of 32-bit memory controllers, memory IC density, and memory circuit splitting allows for asymmetric configurations (192-bit, 2GB VRAM for example); see the worked example after this list.
Physically, a GDDR5 controller/IC doubles the I/O of DDR3 - with DDR3, an I/O handles an input (a write to memory) or an output (a read from memory), but not both on the same cycle; GDDR5 handles input and output on the same cycle.
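To make that bus-width arithmetic concrete, here's a minimal sketch in Python using the example figures above (2Gbit ICs, 32-bit I/Os); the function name and parameters are just illustrative, not anything from a real tool:

# Sketch of the VRAM capacity arithmetic described above (2Gbit ICs, 32-bit I/Os).
def vram_capacity_gb(bus_width_bits, chip_density_gbit=2, clamshell=False):
    controllers = bus_width_bits // 32               # one 32-bit I/O per controller
    chips = controllers * (2 if clamshell else 1)    # clamshell splits each I/O across two ICs
    return chips * chip_density_gbit / 8             # 8 Gbit = 1 GB

print(vram_capacity_gb(256))                  # 8 x 2Gbit = 16Gbit -> 2.0 GB
print(vram_capacity_gb(256, clamshell=True))  # clamshell mode doubles it -> 4.0 GB
print(vram_capacity_gb(192))                  # 6 x 2Gbit -> 1.5 GB (mixing densities gives the asymmetric 192-bit/2GB case)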
The memory is also fundamentally set up for the application it serves:
System memory (DDR3) benefits from low latency (tight timings) at the expense of bandwidth; GDDR5's case is the opposite. Timings for GDDR5 would seem unbelievably slow in relation to DDR3, but the speed of VRAM is blazing fast in comparison with desktop RAM. This has resulted from the relative workloads that a CPU and GPU undertake. Latency isn't much of an issue with GPUs, since their parallel nature allows them to move on to other calculations when latency cycles cause a stall in the current workload/thread. The performance of a graphics card, for instance, is greatly affected (as a percentage) by altering the internal bandwidth, yet altering the external bandwidth (the PCI-Express bus, say lowering from x16 to x8 or x4 lanes) has a minimal effect. This is because there is a great deal of I/O (textures, for example) that gets swapped in and out of VRAM continuously; the nature of a GPU is many parallel computations, whereas a CPU computes in a basically linear way.
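To put rough numbers on the bandwidth side (representative speeds I'm assuming here, not figures from this thread): peak bandwidth is just the bus width in bytes multiplied by the effective transfer rate.

# Peak bandwidth = (bus width in bytes) x (effective transfers per second).
def peak_gb_per_s(bus_width_bits, data_rate_mtps):
    return bus_width_bits / 8 * data_rate_mtps / 1000

print(peak_gb_per_s(128, 1600))   # DDR3-1600, dual channel (2 x 64-bit): 25.6 GB/s
print(peak_gb_per_s(256, 6000))   # GDDR5 at 6 Gbps effective on a 256-bit bus: 192 GB/s
# For comparison, PCI-E 3.0 x16 tops out around ~16 GB/s, which is why throttling
# the external bus hurts far less than cutting internal VRAM bandwidth.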
It goes back to the nature of graphics rendering and how the polygons are drawn. Sorry if I'm teaching my grandmother to suck eggs, but it might be a little easier if I outline the graphics pipeline. I'll mark the video memory transactions inline as (vRAM input/output); everything before the GPU front end is system RAM territory (this would probably be better as a flow chart, but nvm).
On the software side you have your game (or app) ↔ API (DirectX/OpenGL) ↔ User Mode Driver / ICD ↔ Kernel Mode Driver (KMD) + CPU command buffer → loading textures to vRAM → GPU Front End (Input Assembler).
Up until this point you're basically dealing with the CPU and system RAM: executing and monitoring game code, creating resources, compiling shaders, issuing draw calls, and allocating access to the graphics hardware (since you likely have more than just the game needing resources). From here, the workload becomes hugely more parallel and moves to the graphics card. The video memory now holds the textures and the compiled shaders that the game + API + drivers have loaded. These are fed into the first few stages of the pipeline as and where needed by each of the following shaders, as the code is transformed from points (co-ordinates) and lines into polygons and their lighting:
Input Assembler (vRAM input) → Vertex Shader (vRAM input) → Hull Shader / Tessellation Control Shader (vRAM input, if tessellation is used) → Domain Shader (vRAM input) → Geometry Shader (vRAM input)
At this point, the stream output can move all or part of the render back into memory to be re-worked. Depending on what is called for, the output can be fed back to any part of the previous shader pipeline (basically a loop) or held in memory buffers. Once the computations are completed, they move on to rasterization (turning the 3D image into pixels):
Rasterizer → Pixel Shader* (vRAM input and output) → Output Merger (tasked with producing the final screen image, and requires vRAM input and output)
* The compute shaders (if they exist on the card) are tasked with post-processing (ambient occlusion, film grain, global illumination, motion blur, depth of field, etc.), A.I. routines, physics, and a lot of custom algorithms depending on the app. They also run via the pixel shader stage and can use that shader's access to vRAM for input and output.
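Written out as plain data (Python again, with the stage names as used above), the vRAM traffic pattern through the pipeline looks roughly like this:

# Stages as listed above, with the vRAM traffic noted against each one.
PIPELINE = [
    ("Input Assembler",                     "vRAM read"),
    ("Vertex Shader",                       "vRAM read"),
    ("Hull / Tessellation Control Shader",  "vRAM read (if tessellation is used)"),
    ("Domain Shader",                       "vRAM read"),
    ("Geometry Shader",                     "vRAM read / stream output back to buffers"),
    ("Rasterizer",                          "-"),
    ("Pixel Shader (+ compute shaders)",    "vRAM read and write"),
    ("Output Merger",                       "vRAM read and write"),
]

for stage, traffic in PIPELINE:
    print(f"{stage:<38} {traffic}")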
So basically, the parallel nature of graphics calls for input and output from vRAM at many points, covering many concurrent streams of data. Some of that vRAM is also subdivided into memory buffers and caches to save data that would otherwise have to be re-computed for following frames. All this swapping of data calls for high bandwidth, but latency can be lax (saving power demand), as any stall in one thread is generally lost in the sheer number of threads queued at any given time.
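As a back-of-envelope illustration of that latency hiding (made-up round numbers, purely to show the shape of the trade-off):

# If a thread does ~10 ns of useful work between VRAM accesses and each access
# takes ~400 ns, an execution unit needs roughly latency / work threads in
# flight so it always has something else to run while one thread stalls.
memory_latency_ns  = 400   # assumed VRAM access latency
work_per_access_ns = 10    # assumed useful work between accesses

print(memory_latency_ns / work_per_access_ns)   # -> 40 threads per execution unit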
As I noted previously, GDDR5 allows a write and a read to/from memory every clock cycle, whereas DDR3 is limited to a read or a write per cycle, which reduces bandwidth. Graphics DDR also allows for multiple memory controllers to cope with the I/O functions.