Baseless Next Generation Rumors with no Technical Merits [post E3 2019, pre GDC 2020] [XBSX, PS5]

Had a strange idea.
If intersection is handled within the TMU, can it easily handle things like alpha textures in the same pass as the intersection? (i.e. report whether the intersection landed on an opaque part of the triangle or not.)

As I understand it, the TMU is meant to step through the BVH. The actual intersection is done in compute? I think...
Anyways, once you have a texture in cache, checking the opacity of the texels for a given intersection point is trivial. What makes alpha textures a problem for ray tracing is that a typical game scene has hundreds of different high-res alpha textures in use at a time, and they don't all fit in your cache at once. It's retrieving them from memory that makes this slow, and it will be slow for whoever does it, be it a custom RT core or a reworked texture unit.
 
Well, a Radeon 5700XT is about 9.5 TFLOPs and can draw up to 240 watts or so. That pretty much gives you an estimate of where the new consoles will land, unless they break the mold and ramp up power consumption.
In typical AMD fashion for their discrete cards, the 5700XT is clocked and volted way beyond its optimum efficiency curve.

People are getting 155W of total consumption with a quick undervolt:

And others are getting 130W with undervolt and limiting the clock to 1800MHz:
https://www.techpowerup.com/forums/threads/testing-undervolting-with-my-5700xt.258313/

He could even achieve an average 85W consumption at 1500MHz. That's 43% of the power consumption at 80% of the performance.
40 CUs at 1500MHz consuming 85W is 7.68 TFLOPs.
At 1700MHz, the card is consuming 105W average and that's 8.7 TFLOPs.

Assuming ~20W for the 256-bit GDDR6, we have Navi 10 at 1700MHz doing 102 GFLOPs/W.

So by increasing the CU count to say 48 and keeping the clocks at 1700MHz, you get 10.44 TFLOPs, and that would consume around 102W on the GPU portion of the SoC / chiplet.
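If anyone wants to sanity-check the arithmetic, here's a minimal sketch using the usual peak-FP32 formula (CUs x 64 ALUs x 2 FLOPs per clock x clock) and the same ~20W GDDR6 guess as above; `navi_tflops` is just a throwaway helper, nothing official:

```python
# Minimal sanity check of the figures above. Peak FP32 = CUs * 64 ALUs * 2 FLOPs/clock * clock.

def navi_tflops(cus: int, clock_mhz: int) -> float:
    return cus * 64 * 2 * clock_mhz / 1e6

tf_40 = navi_tflops(40, 1700)               # 8.70 TFLOPs
gpu_watts = 105 - 20                        # measured card draw minus the ~20W GDDR6 guess
gflops_per_watt = tf_40 * 1000 / gpu_watts  # ~102 GFLOPs/W

tf_48 = navi_tflops(48, 1700)               # 10.44 TFLOPs
print(f"{gflops_per_watt:.0f} GFLOPs/W; 48 CUs @ 1700 MHz = {tf_48:.2f} TFLOPs "
      f"(~{tf_48 * 1000 / gflops_per_watt:.0f}W for the GPU portion)")
# -> 102 GFLOPs/W; 48 CUs @ 1700 MHz = 10.44 TFLOPs (~102W for the GPU portion)
```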




Still think fehu's 7 TFLOPs are the pinnacle of what the current tech can achieve?
 
If we remove 4 CUs for redundancy and clock slightly lower for better yields, say -200 MHz, where would we end up?
nvm, roughly:
7.68 TF (40 CUs @ 1500 MHz)
8.448 TF (44 CUs @ 1500 MHz)
9.5 TF (44 CUs @ 1700 MHz)
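Quick check of those three figures with the same peak-FP32 formula as in the earlier sketch:

```python
# Peak FP32 = CUs * 64 ALUs * 2 FLOPs/clock * clock, for the three configs above.
for cus, mhz in [(40, 1500), (44, 1500), (44, 1700)]:
    print(f"{cus} CUs @ {mhz} MHz -> {cus * 64 * 2 * mhz / 1e6:.3f} TFLOPs")
# 40 CUs @ 1500 MHz -> 7.680 TFLOPs
# 44 CUs @ 1500 MHz -> 8.448 TFLOPs
# 44 CUs @ 1700 MHz -> 9.574 TFLOPs
```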

Hmm, yeah, not sure where it's going to land since the consoles are running some form of RDNA 2 variant.
 
New theory about Shawn Layden: he could be a double agent, like the very successful Agent Phil Harrison before him. His mission: quit Sony, get hired by Microsoft, and feed false information to Spencer about PS5 hardware, software and strategy.

Well Xbox VP Mike Ybarra has left Microsoft along with Mixer co-founders Matt Salsamendi & James Boehm. Plus they're hiring another games manager for 1st party too. Sounds like a few openings available.

Tommy McClain
 
I've watched a lot of thermal graphs, and under heavy load the X1X will stay under 200W, most often sitting between 160-175W. If there is a 200W peak, it's <0.001% of the time, or a specific X1X that was tuned to require more power.

Yah, most of what I read about power consumption had the PS4 Pro at around 160W and the One X around 180W; I'm assuming averages.
 
Where did you get that information from? The info always seems pretty light on the dev kit specs. "Same chip" is accurate, but I've never seen detailed info in terms of CPU/GPU frequency. Even psdevwiki lacks that info.

Plus these aren't final hardware kits. Earlier PS4 dev kits used 8 Bulldozer cores, which would not have given an accurate look at the final hardware other than core count.

It is just based on the timeline of what Sony did with the Pro and PS4. With the Pro they had a final-spec* dev kit available by at least December 2015, with, I guess, third-party rollout after GDC 2016.

With the PS4, yes, they had an early PC-based kit, but I don't know if any devs outside Sony/middleware teams got them? Maybe EA/Ubi for launch games? In January 2013 what looked like something close to a final dev kit was posted on GAF.

This new PS5 "V" dev kit looks very far along / near final to me. I don't doubt there will still be revisions to come, but outside of clock speeds I'm not sure this dev kit will change much in the RAM/APU department. I guess in less than 6 months the retail console will have to start being made, so there's not that long left for devs to "adjust to its capabilities", to quote Mark Cerny from the first Wired article.

*From the leaked dev documents PDF.

I'd love more detailed info on dev kits but understand this is one thing pretty much off-limits to be shared.
 
If we remove 4 CUs for redundancy and clock slightly lower for better yields, say -200 MHz, where would we end up?
nvm, roughly:
7.68 TF (40 CUs @ 1500 MHz)
8.448 TF (44 CUs @ 1500 MHz)
9.5 TF (44 CUs @ 1700 MHz)

Hmm, yeah, not sure where it's going to land since the consoles are running some form of RDNA 2 variant.

Probably RX 5700 (non-XT) raw performance, with the addition of ray tracing hardware and some other stuff.
 
Let's make one thing clear.

The current Navi + Zen 2 + RT combo on the 7nm process cannot provide a 10+TF console. That is just pie in the sky.

The only way we are seeing 10TF consoles is if MS and Sony are using TSMC's 7nm+ process with a hypothetical RDNA2 chip that delivers 15-20% more TFLOPs per watt.

If not, I can easily see a chip clocked at 1.7x GHz with 36 active CUs, hardware RT and a 3.2GHz Zen 2 CPU.

This would be in line with what we got last time around, except with a much more competitive CPU instead of Jaguar cores. Remember, people expected ~3TF GPUs last time around and the weaker one turned out to be 1.3TF.
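For scale, the same peak-FP32 formula puts that 36-CU scenario roughly here (1.7 and 1.8 GHz are just example clocks for "1.7x GHz"):

```python
# Ballpark for a 36-CU chip, same CUs * 64 * 2 * clock formula as before.
for ghz in (1.7, 1.8):
    print(f"36 CUs @ {ghz} GHz -> {36 * 64 * 2 * ghz / 1000:.2f} TFLOPs")
# 36 CUs @ 1.7 GHz -> 7.83 TFLOPs
# 36 CUs @ 1.8 GHz -> 8.29 TFLOPs
```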
 

Someone with educated and realistic expectations... :p Seriously, what are people thinking? Someone that saw the thing running called it 'unimpressive specs, but nice graphics'. We're 6 months away, if even that, and then these things go into production; we can't expect more than the best AMD has to offer right now.

On top of that, perhaps, like a leaker claimed, it will be an RDNA1/2 hybrid, not a full RDNA2 variant. That would also explain why they are going with 'just a Zen 2' and not Zen 3. Perhaps a custom Navi with RT logic from Navi 2, or something like that.

https://forum.beyond3d.com/threads/...0-ps5-navi-hybrid-xbox-navi-pure-spawn.61231/
 
Pretty sure MS and Sony are not constrained by PC GPU generations, and they can, and will (as seen before), pick and choose which extensions to the GPU core they want from AMD.

This chip will also hit the clock sweet spot, and voltage will almost certainly be lower than the PC equivalent. But the budget for the GPU (with RT hardware) is ~120-130W max at full load, so there is very little wiggle room to get a full Navi XT in there at that wattage.
 
If intersection is handled within the TMU, can it easily handle things like alpha textures in the same pass as the intersection? (i.e. report whether the intersection landed on an opaque part of the triangle or not.)
Geometry is flagged as either opaque or non-opaque when you build the BVH structures - if it's not opaque (or the submitted ray carries a non-opaque flag), then an any-hit shader is executed in the shader unit. So the TMU just checks the flags, but cannot alpha-test without a command from the shader.

As I understand it, the TMU is meant to step through the BVH. The actual intersection is done in compute? I think...
The AMD patent claims that intersections are tested in fixed-function hardware, but shader code controls the execution and can be used for testing non-standard BVH structures. They also claim that specific implementations may include compute units alongside the fixed-function blocks.
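Purely to illustrate the split described in these two answers, here is a toy sketch; it is not AMD's design and every name in it is made up. The "shader" owns the traversal loop and the stack, `intersect` stands in for the fixed-function box/primitive tester, and a non-opaque leaf hit has to pass an any-hit callback (think alpha test) before it can be accepted:

```python
# Illustrative toy only: shader-style code drives traversal, a fixed-function-style
# helper does the intersection tests, and non-opaque hits go through an any-hit callback.
from dataclasses import dataclass, field
from typing import Callable, List, Optional

@dataclass
class Node:
    lo: float                     # 1D "bounding box" keeps the toy short
    hi: float
    opaque: bool = True
    children: List["Node"] = field(default_factory=list)

def intersect(origin: float, direction: float, n: Node) -> Optional[float]:
    """Stand-in for the fixed-function tester: entry distance, or None on a miss."""
    t0, t1 = sorted(((n.lo - origin) / direction, (n.hi - origin) / direction))
    return t0 if t1 >= max(t0, 0.0) else None

def trace(origin: float, direction: float, root: Node,
          any_hit: Callable[[Node, float], bool]) -> Optional[float]:
    """Shader-controlled traversal loop: closest accepted hit wins."""
    stack, closest = [root], None
    while stack:
        node = stack.pop()
        t = intersect(origin, direction, node)       # the "TMU" does the test
        if t is None:
            continue
        if node.children:                            # interior node: keep walking
            stack.extend(node.children)
        elif node.opaque or any_hit(node, t):        # leaf: accept, or ask any-hit
            if closest is None or t < closest:
                closest = t
    return closest

# The front leaf is non-opaque and the any-hit callback rejects it (as an alpha
# test might), so the ray reports the opaque leaf behind it instead.
scene = Node(0.0, 10.0, children=[Node(2.0, 3.0, opaque=False), Node(5.0, 6.0)])
print(trace(-1.0, 1.0, scene, any_hit=lambda node, t: False))   # -> 6.0
```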

So by increasing the CU count to say 48 and keeping the clocks at 1700MHz, you get 10.44 TFLOPs, and that would consume around 102W on the GPU portion of the SoC / chiplet.
That's a lot for an APU part - if that 10 TF figure is true, I'd think they've implemented some improvements to the architecture.
 
If we remove 4 CUs for redundancy and clock slightly lower for better yields.
Why are you assuming they need 4 CUs for redundancy, and/or that Navi 10 doesn't already have a number of redundant CUs?

Say -200 MHz, where would we end up?
Isn't 200MHz for yields a huge clock variation? Both Navi and Turing cards are apparently going for 100-130MHz variations between their higher and lower binned models, and those are already pushing very high clocks.


I never said that it was...

I know, sorry. It was more of an open-ended question towards a plural audience. "You" has different words for singular and plural in my native language, and sometimes I make these mistakes.


That's a lot for an APU part - if that 10 TF figure is true, I'd think they've implemented some improvements to the architecture.
What's a lot? 100W for the GPU part?
Do you think the Xbox One X or the PS4 Pro have less power dedicated to the GPU?
 
Why are you assuming they need 4 CUs for redundancy, and/or that Navi 10 doesn't already have a number of redundant CUs?
For the current consoles as they are: they have 1 redundant CU per shader engine (4 shader engines for the mid-gen refreshes). PS4 and Xbox One had 2 shader engines, therefore 2 redundant CUs. This is critical because there is only 1 spec, and it would be costly to throw away lots of chips.
Devkits use all the CUs - no redundancy. I suppose, depending on how far along the silicon is, the non-perfect chips are used for retail.

As for the PC space, as I understand it, it's all done through binning. The best chips, with perfect silicon, get full CU counts and the highest clock speeds. The crappiest chips have the most CUs shut off and run at lower speeds, and that's how the whole lineup is built. So I don't believe there should be purposefully redundant CUs in the PC space.
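For reference, the die vs. enabled CU counts that were publicly reported for the current consoles line up with that one-spare-CU-per-shader-engine pattern (shader engine counts are the ones quoted above):

```python
# Publicly reported die vs. enabled CU counts for the current consoles.
consoles = {
    # name: (CUs on die, CUs enabled, shader engines)
    "PS4":        (20, 18, 2),
    "Xbox One":   (14, 12, 2),
    "PS4 Pro":    (40, 36, 4),
    "Xbox One X": (44, 40, 4),
}
for name, (on_die, enabled, engines) in consoles.items():
    spares = on_die - enabled
    print(f"{name}: {on_die} CUs on die, {enabled} enabled "
          f"-> {spares // engines} spare per shader engine")
```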

Isn't 200MHz for yields a huge clock variation? Both Navi and Turing cards are apparently going for 100-130MHz variations between their higher and lower binned models, and those are already pushing very high clocks.
Once again, you've got to cater to the lowest common denominator here. You want to sell the absolute baseline for acceptable price/performance since there is no binning strategy.
 
For the current consoles as they are: they have 1 redundant CU per shader engine (4 shader engines for the mid-gen refreshes). PS4 and Xbox One had 2 shader engines, therefore 2 redundant CUs. This is critical because there is only 1 spec, and it would be costly to throw away lots of chips.
Devkits use all the CUs - no redundancy. I suppose, depending on how far along the silicon is, the non-perfect chips are used for retail.

As for the PC space, as I understand it, it's all done through binning. The best chips, with perfect silicon, get full CU counts and the highest clock speeds. The crappiest chips have the most CUs shut off and run at lower speeds, and that's how the whole lineup is built. So I don't believe there should be purposefully redundant CUs in the PC space.

We wouldn't know if Vega 20 had more than 60 CUs if it wasn't for the Instinct MI60.
Likewise, until we get a die shot (e.g. from Fritz), or unless someone asked them that question directly and they answered, we don't really know for sure if Navi 10 actually has 48 CUs total.
AMD could be preparing a RX5800 XT with 48 CUs to go together with 16+GT/s GDDR6 when those chips become available, for example.

Once again, you've got to cater to the lowest common denominator here. You want to sell the absolute baseline for acceptable price/performance since there is no binning strategy.
Yes you do, but I'm still asking where you got the 200MHz delta from.
And why are you taking 200MHz off 1700MHz? That seems random too. Most Navi 10 GPUs seem to reach 2GHz with undervolting. Why aren't you taking 200MHz off 2GHz to cover the lowest common denominator?


Considering that you can't salvage a console SOC like you can a GPU or CPU (unless they go with a 2 tiered console launch), you either need to build in redundancy or clock the chips fairly low to maximize yields. Possibly even both.
I'm aware of console SoCs traditionally having to implement redundancy and lower-than-desktop clocks to maximize yields.

What I don't know is why iroboto:
1 - assumed these 4 CUs would need to be taken out of the 48 CUs total I proposed, since what I did was calculate power consumption, and the disabled CUs wouldn't consume any power.
Can't the SoC or GPU chiplet be designed with 52 CUs and then have 4 CUs disabled?
Besides, Navi now uses dual-CUs, so if they want redundancy for every shader engine, wouldn't they actually need to write off 8 CUs (which is now a huge amount of transistors / die area)?
Maybe they don't want to implement redundancy in the same way this time.


2 - decided that 1700MHz was some sort of baseline for 7nm Navi (which it isn't, because all Navi 10 chips so far clock way above that), and that 200MHz would need to be taken off said baseline.
Back in 2012 the highest-clocked Pitcairn and Bonaire cards ran at 1GHz. Then Liverpool had its GPU clocked at 800MHz and Durango at 853MHz.
The highest-clocked Navi cards seem to be able to sustain over 1850MHz, or 1950MHz if we take the Anniversary Edition into consideration. So why are we assuming AMD needs to take 200MHz off 1700MHz?
 