I don't know how you get a 5700XT type of performance
We won't get that in raw performance. It's not the way to go in a small box.
I don't know how you get a 5700XT type of performance
Pro -> up to 175W
XBX -> up to 200W
Had a strange idea.
If intersection is handled within the TMU, can it easily handle things like alpha textures in one/the same pass as the intersection? (i.e. give an answer as to whether the intersection was in an opaque part of the triangle or not.)
Well, a Radeon 5700XT is 9.5 TFLOPs and can draw up to 240 Watts or so. That pretty much gives you some estimation of where the new consoles will land, unless they break the mold and ramp up power consumption.

In typical AMD fashion for their discrete cards, the 5700XT is clocked and volted way beyond its optimum efficiency curve.
He could even achieve an average 85W consumption at 1500MHz. That's 43% of the power consumption at 80% of the performance.

If we remove 4 CUs for redundancy and clock slightly lower for better yields, say -200 MHz, where would we end up?
40 CUs at 1500MHz consuming 85W is 7.68 TFLOPs.
At 1700MHz, the card is consuming 105W average and that's 8.7 TFLOPs.
Assuming ~20W for 256bit GDDR6, we have Navi 10 at 1700MHz doing 102 GFLOPs/W.
So by increasing the CU count to say 48 and keeping the clocks at 1700MHz, you get 10.44 TFLOPs, and that would consume around 102W on the GPU portion of the SoC / chiplet.
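For reference, the figures in this post (and the similar ones further down) all come from the same simple arithmetic: CUs x 64 ALUs x 2 FLOPs per clock, with GPU-only power taken as card power minus roughly 20W for the 256-bit GDDR6. A quick sketch in plain Python; the 64/2 factors are standard RDNA assumptions, the 20W memory figure is the poster's own:

```python
# TFLOPs = CUs x 64 shader ALUs per CU x 2 FLOPs per ALU per clock (FMA) x clock
def tflops(cus: int, clock_mhz: float) -> float:
    return cus * 64 * 2 * clock_mhz * 1e6 / 1e12

for cus, mhz in [(40, 1500), (40, 1700), (44, 1500), (44, 1700), (48, 1700)]:
    print(f"{cus} CUs @ {mhz} MHz -> {tflops(cus, mhz):.2f} TFLOPs")
# 40 @ 1500 -> 7.68, 40 @ 1700 -> 8.70, 44 @ 1500 -> 8.45,
# 44 @ 1700 -> 9.57, 48 @ 1700 -> 10.44

# Efficiency figure quoted above: ~105W card power minus ~20W of GDDR6 leaves ~85W
# for the GPU itself, so 8.70 TFLOPs / 85W is roughly 102 GFLOPs per watt.
print(f"{tflops(40, 1700) * 1000 / 85:.0f} GFLOPs/W")
```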
Still think fehu's 7 TFLOPs are the pinnacle of what the current tech can achieve?
New theory about Shawn Layden: he could be a double agent, like the very successful Agent Phil Harrison before him. His mission: quit Sony, get hired by Microsoft and feed false information to Spencer about PS5 hardware, software and strategy.
...
Still think fehu's 7 TFLOPs are the pinnacle of what the current tech can achieve?
I've watched a lot of thermal graphs, and under heavy load the X1X stays under 200W, most often sitting between 160-175W. If there is a 200W peak, it's <0.001% of the time, or a specific X1X that was tuned to require more power.
Where did you get that information from? The info always seems pretty light on the dev kit specs. Same chip is accurate, but I've never seen detailed info in terms of CPU/GPU frequency. Even psdevwiki lacks that info.
Plus these aren't final hardware kits. Earlier PS4 dev kits used 8 Bulldozer cores, which would not have given an accurate look at the final hardware other than core count.
If we remove 4 CUs for redundancy and clock slightly lower for better yields, say -200 MHz, where would we end up?
nvm: about
7.68 TF: 40 CUs @ 1500 MHz
8.448 TF: 44 CUs @ 1500 MHz
9.5 TF: 44 CUs @ 1700 MHz
Hmm, yeah, not sure where it's going to land, since the consoles are running some form of RDNA 2 variant.
Let's make one thing clear.
The current Navi + Zen 2 + RT combo on the 7nm process cannot provide a 10+ TF console. That is just pie in the sky.
The only way we are seeing 10 TF consoles is if MS and Sony are using TSMC's 7nm+ process with a hypothetical RDNA 2 chip that delivers 15-20% higher TFLOPs per watt.
If not, I can easily see a chip clocked at 1.7xGHz with 36 active CUs, hardware RT and a 3.2GHz Zen 2 CPU.
This would be in line with what we got last time around, except with a much more competitive CPU instead of Jaguar cores. Remember, people expected ~3 TF GPUs last time around and the weaker one turned out to be 1.3 TF.
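Plugging this scenario into the same CUs x 64 x 2 x clock arithmetic used earlier in the thread (reading "1.7xGHz" as roughly 1.7-1.8GHz, which is an assumption):

```python
# 36 active CUs with the standard RDNA 64 ALUs x 2 FLOPs/clock assumption
for ghz in (1.7, 1.8):
    print(f"36 CUs @ {ghz} GHz -> {36 * 64 * 2 * ghz / 1000:.2f} TFLOPs")
# -> 7.83 and 8.29 TFLOPs, i.e. the sub-10 TF console this post is arguing for
```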
They had to work off paper napkins and beer mats. Cerny's busy crafting the chips by hand under a microscope.
If intersection is handled within the TMU, can it easily handle things like alpha textures in one/the same pass as the intersection? (i.e. give an answer as to whether the intersection was in an opaque part of the triangle or not.)

Geometry is flagged as either opaque or non-opaque when you build the BVH structures - if it's not opaque (or the submitted ray carries the non-opaque flag), then the Any-Hit shader is executed in the shader unit. So the TMU just checks for the flags, but cannot alpha test without a command from the shader.
As I understand it, the TMU is meant to step through the BVH. The actual intersection is done in compute? I think...

The AMD patent claims that intersections are tested in fixed function hardware, but shader code controls the execution and can be used for testing non-standard BVH structures. But they also claim that specific implementations may include compute units alongside the fixed function blocks.
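To make the division of labour described in these two posts concrete, here is a toy model (plain Python, every name and structure invented for illustration, not AMD's or DXR's actual interface): the traversal loop and the alpha test live on the shader side, the ray/triangle test itself is a fixed-function call, and the only thing checked at intersection time is the opaque flag baked into the BVH. Ray-level flags and the actual tree walk are omitted for brevity.

```python
from dataclasses import dataclass

@dataclass
class Triangle:
    tri_id: int
    opaque: bool          # opaque flag baked into the BVH at build time
    alpha_at_hit: float   # stand-in for sampling the alpha texture at the hit point

def fixed_function_intersect(node, ray):
    """Stand-in for the TMU/fixed-function block: returns a hit distance or None."""
    # In hardware this would be the ray/box or ray/triangle test; here it is faked.
    return node["t"] if node["hit"] else None

def any_hit_shader(tri, alpha_cutoff=0.5):
    """Runs in the shader units, and only for non-opaque geometry (the alpha test)."""
    return tri.alpha_at_hit >= alpha_cutoff   # True = accept the hit, False = ignore it

def trace(ray, leaf_nodes):
    closest = None
    for node in leaf_nodes:                      # shader-controlled traversal loop
        t = fixed_function_intersect(node, ray)  # intersection tested in fixed function
        if t is None:
            continue
        tri = node["tri"]
        if not tri.opaque:                       # flag check only; no alpha test here
            if not any_hit_shader(tri):          # extra round trip through the shader
                continue
        if closest is None or t < closest[0]:
            closest = (t, tri.tri_id)
    return closest

# Example: an alpha-tested leaf (e.g. foliage) in front of an opaque triangle.
leaves = [
    {"hit": True, "t": 2.0, "tri": Triangle(0, opaque=False, alpha_at_hit=0.1)},  # cut out
    {"hit": True, "t": 5.0, "tri": Triangle(1, opaque=True,  alpha_at_hit=1.0)},
]
print(trace(ray=None, leaf_nodes=leaves))  # -> (5.0, 1): the transparent texel is skipped
```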
So by increasing the CU count to say 48 and keeping the clocks at 1700MHz, you get 10.44 TFLOPs, and that would consume around 102W on the GPU portion of the SoC / chiplet.

That's a lot for an APU part - if that 10 TF figure is true, I'd think they've implemented some improvements to the architecture.
If we remove 4 CUs for redundancy and clock slightly lower for better yields.

Why are you assuming they need 4 CUs for redundancy, and/or that Navi 10 doesn't already have a number of redundant CUs?
Say -200 MHz and where would we end up?

Isn't 200MHz for yields a huge clock variation? Both Navi and Turing cards are apparently going for 100-130MHz variations between their higher and lower binned models, and those are already pushing very high clocks.
I never said that it was...
That's a lot for an APU part - if that 10 TF figure is true, I'd think they've implemented some improvements to the architecture.

What's a lot? 100W for the GPU part?
Why are you assuming they need 4 CUs for redundancy, and/or that Navi 10 doesn't already have a number of redundant CUs?

For the current consoles as is, they have 1 redundant CU per shader engine (4 shader engines for the mid-gen consoles). PS4 and Xbox One had 2 shader engines, therefore 2 redundant CUs. This is critical because there is only one spec, and it would be costly to throw away lots of chips.

Devkits use all the CUs - no redundancy. I suppose, depending on how far along the silicon is, the non-perfect chips are used for retail.

As for the PC space, as I understand it, it's all done through binning. The best, fully intact chips get the full CU count and the highest clock speeds; the crappiest chips have the most CUs shut off and lower speeds, and that's how the whole lineup is built. So I don't believe there are purposely redundant CUs in the PC space.

Isn't 200MHz for yields a huge clock variation? Both Navi and Turing cards are apparently going for 100-130MHz variations between their higher and lower binned models, and those are already pushing very high clocks.

Once again, you've got to cater to the lowest common denominator here. You want to sell the absolute baseline at an acceptable price/performance, since there is no binning strategy.
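The single-spec argument in the last few posts is easy to see with a toy yield model (the defect rate, die size and sample count below are all invented purely for illustration): a console die either meets the one shipping spec or gets thrown away, while a PC lineup can sell weaker dies as lower-CU SKUs.

```python
import random

TOTAL_CUS = 44            # hypothetical die layout, not a confirmed console spec
CU_DEFECT_RATE = 0.02     # assumed independent per-CU defect probability

def good_cus():
    """Number of working CUs on one simulated die."""
    return sum(random.random() > CU_DEFECT_RATE for _ in range(TOTAL_CUS))

def console_yield(active_cus, samples=20_000):
    """Fraction of dies usable when every console must enable exactly active_cus CUs."""
    return sum(good_cus() >= active_cus for _ in range(samples)) / samples

def pc_binning_yield(sku_cutoffs=(40, 36), samples=20_000):
    """Fraction of dies sellable when weaker dies drop into lower-CU SKUs."""
    return sum(good_cus() >= min(sku_cutoffs) for _ in range(samples)) / samples

random.seed(0)
print(f"console, all 44 CUs needed : {console_yield(44):.1%}")  # no spares -> lots of scrap
print(f"console, 40 of 44 CUs used : {console_yield(40):.1%}")  # 4 spare CUs -> near 100%
print(f"PC lineup with 40/36 CU SKUs: {pc_binning_yield():.1%}")
```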
Once again, you've got to cater to the lowest common denominator here. You want to sell the absolute baseline at an acceptable price/performance, since there is no binning strategy.

Yes you do, but I'm still asking where you took the 200MHz delta from.
Considering that you can't salvage a console SoC like you can a GPU or CPU (unless they go with a two-tiered console launch), you either need to build in redundancy or clock the chips fairly low to maximize yields. Possibly even both.

I'm aware of console SoCs traditionally having to implement redundancy and lower-than-desktop clocks to maximize yields.