tusharngf
Member
NVIDIA GeForce RTX 4070 will be the next-generation high-end gaming graphics card, offering the latest graphics architecture based on Ada Lovelace GPUs. The graphics card will be replacing the RTX 3070, a very popular gaming graphics card in the $500-$600 US segment.
RTX 4070 series graphics cards will be designed around the $500 US segment, a high-end price range that still offers plenty of performance. Put simply, the RTX 4090 series will be aimed at users who want the best of the best regardless of cost, while the RTX 4080 series is aimed at users who want the best gaming performance at the best possible price. The RTX 4070 will be the sweet spot for high-end gaming, offering a buttery smooth 2K gaming experience.
The previous GeForce RTX 3070 was touted as a huge improvement over the RTX 2070 and was said to be faster than the RTX 2080 Ti, but it ended up mostly on par with the Turing flagship; only the RTX 3070 Ti exceeded it. It looks like the RTX 4070 will be placed in a similar position: it might offer graphics performance on par with or close to the RTX 3080 Ti, with a 'Ti' variant pulling further ahead.
NVIDIA's AD104 'Ada Lovelace' GPU - The Next-Gen Powerhouse
Starting with the GPU configuration, the NVIDIA GeForce RTX 4070 series graphics cards are said to utilize the AD104 GPU core. The GPU is said to measure around 300mm² and will utilize the TSMC 4N process node, which is an optimized version of TSMC's 5nm (N5) node designed for the green team.
The NVIDIA Ada Lovelace AD104 GPU is expected to feature up to 5 GPCs (Graphics Processing Clusters), one fewer than the GA104 GPU. Each GPC will consist of 6 TPCs, each with 2 SMs, which is the same configuration as the existing chip. Each SM (Streaming Multiprocessor) will house four sub-cores, which is also the same as the GA102 GPU. What's changed is the FP32 and INT32 core configuration. Each SM will include 128 FP32 units, but the combined FP32+INT32 units will go up to 192. This is because the FP32 units don't share the same sub-core as the INT32 units: the 128 FP32 cores are separate from the 64 INT32 cores.
So in total, each sub-core will consist of 32 FP32 plus 16 INT32 units for a total of 48 units. Each SM will have 128 FP32 units plus 64 INT32 units for a total of 192 units. And since there are 60 SMs in total (12 per GPC), we are looking at 7,680 FP32 units and 3,840 INT32 units for a total of 11,520 cores. Each SM will also include two warp schedulers (32 threads/clk) for 64 warps per SM. This is a 50% increase in cores (FP32+INT32) and a 33% increase in warps/threads vs the GA102 GPU.
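The unit counts above follow from multiplying out the rumored hierarchy (GPC, TPC, SM, sub-core). A minimal Python sketch of that arithmetic is shown here; the constant and function names are made up for illustration, and the figures are the unconfirmed ones quoted in the article.

```python
# Rumored AD104 layout as quoted in the article (unconfirmed figures).
GPCS = 5                 # Graphics Processing Clusters
TPCS_PER_GPC = 6         # Texture Processing Clusters per GPC
SMS_PER_TPC = 2          # Streaming Multiprocessors per TPC
SUB_CORES_PER_SM = 4
FP32_PER_SUB_CORE = 32
INT32_PER_SUB_CORE = 16


def ad104_unit_counts():
    """Multiply out the hierarchy to get per-SM and whole-chip unit counts."""
    sms = GPCS * TPCS_PER_GPC * SMS_PER_TPC
    fp32_per_sm = SUB_CORES_PER_SM * FP32_PER_SUB_CORE
    int32_per_sm = SUB_CORES_PER_SM * INT32_PER_SUB_CORE
    return {
        "sms": sms,
        "fp32_per_sm": fp32_per_sm,
        "int32_per_sm": int32_per_sm,
        "fp32_total": sms * fp32_per_sm,
        "int32_total": sms * int32_per_sm,
        "combined_total": sms * (fp32_per_sm + int32_per_sm),
    }


if __name__ == "__main__":
    for name, value in ad104_unit_counts().items():
        print(f"{name}: {value}")
```

Running it reproduces the 60 SMs, 7,680 FP32 units, 3,840 INT32 units and 11,520 combined units quoted above.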
NVIDIA AD103 'Ada Lovelace' Gaming GPU 'SM' Block Diagram (Image Credits: Kopite7kimi):
- 5 GPCs vs 6 GPCs on GA104
- +25% Cores vs GA104 GPU
- 50% More L1 Cache (Versus Ampere GA104)
- Twice the L2 Cache (Versus Ampere GA104)
- +66% ROPs (Versus Ampere GA104)
- 4th Gen Tensor & 3rd Gen RT Cores
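The core and ROP percentages in this list can be checked against the counts in the preliminary spec table (7,680 vs 6,144 CUDA cores and 160 vs 96 ROPs for the two 'Ti' parts). The sketch below is only a sanity check on those rumored numbers; `percent_change` is a helper name chosen for this example.

```python
def percent_change(new: float, old: float) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100


if __name__ == "__main__":
    # Rumored AD104 figures vs the full GA104 (RTX 3070 Ti) from the spec table.
    print(f"GPCs:       {percent_change(5, 6):+.1f}%")
    print(f"CUDA cores: {percent_change(7680, 6144):+.1f}%")
    print(f"ROPs:       {percent_change(160, 96):+.1f}%")
```

The ROP figure works out to roughly +66.7%, which the list truncates to +66%.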
NVIDIA GeForce RTX 4070 Series Preliminary Specs:
Graphics Card Name | NVIDIA GeForce RTX 4070 Ti | NVIDIA GeForce RTX 4070 | NVIDIA GeForce RTX 3070 Ti | NVIDIA GeForce RTX 3070 |
---|---|---|---|---|
GPU Name | AD104-400? | AD104-300? | Ampere GA104-400 | Ampere GA104-300 |
Process Node | TSMC 4N | TSMC 4N | Samsung 8nm | Samsung 8nm |
Die Size | ~300mm² | ~300mm² | 395.2mm² | 395.2mm² |
Transistors | TBD | TBD | 17.4 Billion | 17.4 Billion |
CUDA Cores | ~7680 | ~7040 | 6144 | 5888 |
TMUs / ROPs | TBD / 160 | TBD / 144 | 192 / 96 | 184 / 96 |
Tensor / RT Cores | TBD / TBD | TBD / TBD | 192 / 48 | 184 / 46 |
Base Clock | TBD | TBD | 1575 MHz | 1500 MHz |
Boost Clock | TBD | TBD | 1770 MHz | 1730 MHz |
FP32 Compute | ~38 TFLOPs | ~36 TFLOPs | 22 TFLOPs | 20 TFLOPs |
RT TFLOPs | TBD | TBD | 42 TFLOPs | 40 TFLOPs |
Tensor-TOPs | TBD | TBD | 174 TOPs | 163 TOPs |
Memory Capacity | 12 GB GDDR6X? | 12 GB GDDR6 | 8 GB GDDR6X | 8 GB GDDR6 |
Memory Bus | 192-bit | 192-bit | 256-bit | 256-bit |
Memory Speed | 21 Gbps | 18 Gbps | 19 Gbps | 14 Gbps |
Bandwidth | 504 GB/s | 432 GB/s | 608 GB/s | 448 GB/s |
TGP | ~330W | ~300W | 290W | 220W |
Price (MSRP / FE) | $599 US? | $499 US? | $599 US | $499 US |
Launch (Availability) | 2022 | 2022 | 10th June 2021 | 29th October 2020 |
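The bandwidth row in the table follows from the memory bus width and the per-pin memory speed: bus width in bits times speed in Gbps, divided by 8 bits per byte, gives GB/s. A small Python sketch of that calculation is below, using the bus widths and speeds from the table; the function name is an assumption made for this example.

```python
def memory_bandwidth_gb_s(bus_width_bits: int, speed_gbps: float) -> float:
    """Peak memory bandwidth in GB/s from bus width (bits) and per-pin speed (Gbps)."""
    return bus_width_bits * speed_gbps / 8  # 8 bits per byte


if __name__ == "__main__":
    # (bus width in bits, memory speed in Gbps) as listed in the spec table.
    cards = {
        "RTX 4070 Ti (rumored)": (192, 21),
        "RTX 4070 (rumored)": (192, 18),
        "RTX 3070 Ti": (256, 19),
        "RTX 3070": (256, 14),
    }
    for name, (bus, speed) in cards.items():
        print(f"{name}: {memory_bandwidth_gb_s(bus, speed):.0f} GB/s")
```

This gives 504, 432, 608 and 448 GB/s respectively, so the narrower 192-bit bus on the rumored cards is partly offset by faster memory.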
Just for comparison's sake:
- NVIDIA GeForce RTX 4090 Ti: ~103 TFLOPs (FP32) (Assuming 2.8 GHz clock)
- NVIDIA GeForce RTX 4090: ~90 TFLOPs (FP32) (Assuming 2.8 GHz clock)
- NVIDIA GeForce RTX 4080: ~50 TFLOPs (FP32) (Assuming 2.5 GHz clock)
- NVIDIA GeForce RTX 3090 Ti: 40 TFLOPs (FP32) (1.86 GHz Boost clock)
- NVIDIA GeForce RTX 4070 Ti: ~38 TFLOPs (FP32) (Assuming 2.5 GHz clock)
- NVIDIA GeForce RTX 4070: ~36 TFLOPs (FP32) (Assuming 2.5 GHz clock)
- NVIDIA GeForce RTX 3090: 36 TFLOPs (FP32) (1.69 GHz Boost clock)
- NVIDIA GeForce RTX 3080: 30 TFLOPs (FP32) (1.71 GHz Boost clock)
- NVIDIA GeForce RTX 3070 Ti: 22 TFLOPs (FP32) (1.77 GHz Boost clock)
- NVIDIA GeForce RTX 3070: 20 TFLOPs (FP32) (1.72 GHz Boost clock)
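The FP32 figures in this list can be approximated as CUDA cores times 2 operations per clock (a fused multiply-add counts as two) times the clock speed. The sketch below applies that to the 4070-class and 3070-class entries, taking core counts from the spec table and clocks from the list; the 2.5 GHz clocks are the article's assumption, and the function name is made up for illustration.

```python
def fp32_tflops(cuda_cores: int, clock_ghz: float) -> float:
    """Peak FP32 throughput in TFLOPs: cores x 2 FLOPs per clock (FMA) x clock."""
    return cuda_cores * 2 * clock_ghz / 1000


if __name__ == "__main__":
    # (CUDA cores, clock in GHz); the 40-series values are rumored, not confirmed.
    cards = {
        "RTX 4070 Ti (rumored)": (7680, 2.5),
        "RTX 4070 (rumored)": (7040, 2.5),
        "RTX 3070 Ti": (6144, 1.77),
        "RTX 3070": (5888, 1.72),
    }
    for name, (cores, clock) in cards.items():
        print(f"{name}: {fp32_tflops(cores, clock):.1f} TFLOPs")
```

With these inputs the RTX 4070 Ti comes out at about 38.4 TFLOPs, matching the list, while the RTX 4070 comes out at about 35.2 TFLOPs, slightly under the ~36 quoted, so that figure may assume a somewhat higher clock or core count.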
Full article: https://wccftech.com/roundup/nvidia-geforce-rtx-4070/