As far as I can see, the FP reg file is cut in half and the FMA area is also reduced somewhat, so this might indeed be a 128-bit implementation of...
The perf scaling between the SKUs definitely underlines how AMD and NV have been implementing geometry processing. The RDNA cards are grouped in...
I think NV varies the rate of geometry processing per SM over different SKUs, both for TDP reasons and for product segmentation. For example,...
Actually, there's a similar proposition by Agner Fog for a hybrid CISC/RISC forward-compatible ISA: https://www.forwardcom.info/
For the LOLs, here is a 1080Ti @ 2037/13000 MHz : [IMG]
V2 had a fixed 4MB for the frame buffer. That would fit two 16-bit frames (standard double-buffering) and a 32-bit depth-buffer, all at 800*600...
The Hawaii GPU doesn't support frame-buffer compression, which could be one of the reasons for the poor performance scaling in this case.
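A quick sanity check of that arithmetic, as a minimal sketch. The assumptions here are mine and not from the post: 4MB is taken as 4 * 1024 * 1024 bytes, and the buffers are tightly packed with no padding or alignment overhead. The helper name `buffer_bytes` is just for illustration.

```python
# Assumptions (not from the original post): 1 MB = 1024 * 1024 bytes,
# and every buffer is tightly packed with no padding or alignment.
WIDTH, HEIGHT = 800, 600
FRAME_BUFFER_BYTES = 4 * 1024 * 1024


def buffer_bytes(width, height, bits_per_pixel):
    """Size in bytes of one tightly packed buffer (hypothetical helper)."""
    return width * height * bits_per_pixel // 8


# Two 16-bit color buffers (front + back) for standard double-buffering.
color = 2 * buffer_bytes(WIDTH, HEIGHT, 16)
# One 32-bit depth buffer at the same resolution.
depth = buffer_bytes(WIDTH, HEIGHT, 32)

total = color + depth
fits = total <= FRAME_BUFFER_BYTES
print(total, FRAME_BUFFER_BYTES, fits)
```

Under those assumptions the three buffers take 3,840,000 bytes against 4,194,304 available, so the configuration fits with a little room to spare.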
TSMC should probably consider researching the various EDRAM technologies to resolve the memory scaling issues, particularly for large cache...
AMD still maintains a 2:1 ratio for the depth-buffer sampling rate? I thought they moved to 4:1 years ago.
Mr. Fritz: got the first shot (as usual), but not quite flawlessly: [MEDIA] It looks like the place for what would have been used to double the...
All Zen architectures were already capable of dual ADD and MUL rates for the SIMD/FMA instructions. With Zen3 and the major re-shuffling of the...
Full ISA performance dump: http://users.atw.hu/instlatx64/AuthenticAMD/AuthenticAMD0A20F10_K19_Vermeer_InstLatX64.txt Surprising to see AMD have...
The general layout is correct, but the fine logic details are glossed over.
It all boils down to balancing the die budget between compute and memory resources. With RDNA2, AMD put the brakes on chasing the raw FLOPS numbers...
More like a PCI-E SERDES. Larger die-shot here: https://images.anandtech.com/doci/16202/Die-Shot_Color-Front.jpg
The 128MB cache array was quite well spread along the die edges, judging from the die-shot mockup.
Why not? Intel already did a similar thing with Knights Landing and MCDRAM.
Kepler: hold my underfed 192 FMA lanes...
Fermi will happen. :p
At this rate of Tensor logic investment, what are the chances that at some point in the future Nvidia will just fold all arithmetic ALUs in just...
This type of rumor (co-processors, EDRAM, etc.) has been a recurring phenomenon ever since G80. It's an easy click-bait wishful fantasy.
This is a mock-up die shot. Nvidia has been using similar presentation sketches since Tegra, but they bear no relation to the actual poly-silicon design....
Nvidia has tended to increase the number of multi-processors per GPC since Kepler (or keep the GPC unit count hard capped at six), so GA102's...
So, this new storage API specifically exploits the NVMe performance features? Looks like SATA SSDs will not benefit from this.
Heatpipes rely on capillary fluid transfer, so their efficiency is mostly unaffected by the mounting orientation. Given the heat density...
Does that mean the RT units in RDNA2 also share a common data path with the TMUs, i.e. blocking each other? Turing's RT core apparently sits on...
4x larger VGPR file?
Double the L/S units, but still just one TMU quad. Probably that SM layout is not conclusive for the consumer parts, particularly regarding the...
So, the current crop of A100 GPUs disables one of the eight GPCs to gain yields (plus a few extra SMs), that's why MIG virtualization is limited to...
So, can Nvidia just add another TMU quad per SM in Ampere to increase the intersection test rate, or will it be limited by cache/memory data...
In general, the engine uses forward rendering with a tile-based occlusion pre-pass for the lighting. My guess is that if the engine is using async...
That's an early alpha version with an older engine and a different API -- not exactly a meaningful comparison. Here is another bench run with 780Ti at...
https://www.techspot.com/article/2001-doom-eternal-older-gpu-test/ Another Doom Eternal bench with low-end and older generation GPUs. The Kepler...
The game is definitely VRAM hungry. On max settings, 1440p mode and no res scaling, my 1080Ti GPU utilization hovers around the 40% mark with...
The storage subsystem has been stagnating for the longest time, so naturally the emphasis for the next generation is there. Adding more RAM beyond...
[IMG] The CCX dimensions suggest a hefty cut down of the L3 cache size, compared to the chiplet version of Zen2.
Or actually fit better with the limited R&D resources, preventing the development of complicated all-in-one architecture. :-|
Turing still has to carry the rather large dead weight of all the Tensor logic, besides the few sprinkles for RT acceleration.
It performs Multiply-Additions across different vector formats.
Not really: [IMG]
Could the low INT32 throughput in both GCN and Navi contribute to the performance deficit here, on top of the memory access issues?
My GTX 1080Ti scored 9281 points at 1080p Ultra settings.
Wow, Intel's Embree library really drags hard on the Bulldozer architecture.
[IMG] Source: https://www.tomshardware.com/news/amd-zen-3-zen-4-epyc-rome-milan-genoa-architecture-microarchitecture,40561.html
Yes, the server SKUs will probably be the exclusive recipients for quad-way SMT for the time being. Database workloads and large scale VMs...
Zen2 architecture already hints at such a move: unified AGU scheduler, wider load/store pipe, double the micro-op cache, etc. If AMD keeps...
It will work just fine. Samsung offers a free application just for that kind of migration on their site.
Since this benchmark is built with Intel's Fortran compiler, I see it still underutilizes AMD's architectures. My 6-core Broadwell-E scores 12.432...
No performance changes, only feature updates like higher tier conservative rasterization support.
The test with 0% culling was hitting the setup pipelines full time, so in this case the limit was not in geometry processing. Older generations of...