Middle Generation Console Upgrade Discussion [Scorpio, 4Pro]

Status
Not open for further replies.
The entire SoC could be considered a graphics processor, and legitimately so.

I disagree. When someone says GPU there's a reasonable expectation that they mean GPU. It's not like this was a live off-the-cuff presentation. It was a pre-recorded and edited video.

And anything from 5.5 TF up could be rounded to 6 TF without comeback, given that no decimal places are given.

I also strongly disagree with this. That big of a spread would generate a huge backlash and rightfully so. In fact, anything short of 6.0 even (okay, maybe 5.95 would be ok) would set off a shit-storm and the further they are from that number the bigger the storm. I don't think they would continue to repeat that number, without ever clarifying or adding precision to it, unless that was the number they were targeting. Establishing that expectation when you *know* you are going to underdeliver would be insane.
 
I need to go back and reread the Xbox architects' DF interview, but I seem to recall them saying CU count doesn't scale linearly. They also made a big deal about clock frequency, saying the CPU was more important for frame rate. If my recollection is correct I'd be very surprised if Scorpio utilises Jaguar. They talked a lot about balance back then, and a 6TF GPU coupled with Jaguar doesn't come across as balanced IMO...
Ah yes The Balance! I forgot about that. Thematically Zen would support the whole balance thing ;-) Since it was said to be 8 cores that would mean 8 Zen cores which seems very ambitious. I would think that reworking the memory and memory controller and doing whatever they are going to do about the ESRAM is going to be pretty ambitious.


Sent from my Nexus 5X using Tapatalk
 
My guess is that the bandwidth they are able to achieve with GDDR5 at >320 GB/s will allow them to emulate the ESRAM (somewhere around 200 GB/s peak?) for backwards compatibility, and I don't think ESRAM will factor into any future designs. Again, I need to go back through that article.
 
Well, that is bandwidth of course, which is fine if that is what the ESRAM was being used for, but if latency and/or bidirectional data flow mattered then that would complicate things.

I don't know how many non-bandwidth uses there were, so I don't know how much of an issue it will be.
 
That was in contrast to increasing clocks which boosted everything from geometry to fillrate.
AMD has traditionally been very strong in CU performance (raw compute flops), but lacking on the fixed function side (esp. geometry processing). Fury X was a good example of scaling up the CU count and the memory bandwidth alone. It wasn't enough. Polaris on the other hand mostly improved the fixed function side, and shows nice gains compared to older AMD GPUs with similar CU count.

For example the old Radeon 7970 GE beats the new RX 470 in fill rate, flops and bandwidth, but clearly loses in games:
http://hwbench.com/vgas/radeon-rx-470-vs-radeon-hd-7970-ghz-edition
As long as the GDDR bandwidth is at least somewhat higher and the latency is not significantly worse, you could just reserve 32 MB of GDDR and map it to virtual address space to mock ESRAM. Of course legacy titles would still do silly things like needlessly copy things in/out of this 32 MB region, but this shouldn't be a problem since we are talking about running legacy titles designed for Xbox One on a ~4x more powerful GPU. Of course new titles developed with both platforms in mind would use different code paths on each console.
 
It's possible that they don't need to be as fast at every single possible memory operation.

If they're vastly faster overall at ROPerations (do you see what I did there?) then individual operations having slightly higher latency might not matter.
 
I've been comparing the 480 to the 390X recently as they have almost exactly the same flops, and noticed that the 480 is between roughly as fast at 900p, through to 10~20% slower at 4K.

The 480 has 67% of the theoretical BW, and 60% of the theoretical fill rate. While the 480 has colour compression to somewhat offset the BW deficit, I'm not aware of anything that can offset its apparently low fill.

I think the 480 is hurt by its ROPs. Would this be a fair assessment, or am I missing something?
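
For anyone who wants to check the 67% and 60% figures, here is a rough sketch of the arithmetic. The bus widths, memory data rates and core clocks are my assumptions from the public reference specs, not numbers from this thread, and the fill rate simply assumes one pixel per ROP per clock.

```python
# Theoretical bandwidth and fill rate, RX 480 vs R9 390X.
# ASSUMPTIONS (not from the thread): bus widths, memory data rates and
# core clocks below are taken from the public reference specs.

def bandwidth_gbs(bus_bits, data_rate_gbps):
    """Peak memory bandwidth in GB/s: bus width in bytes times per-pin data rate."""
    return bus_bits / 8 * data_rate_gbps

def fill_rate_gpix(rops, clock_mhz):
    """Peak pixel fill rate in Gpixels/s, assuming one pixel per ROP per clock."""
    return rops * clock_mhz / 1000

rx480 = {"bw": bandwidth_gbs(256, 8.0), "fill": fill_rate_gpix(32, 1266)}
r9_390x = {"bw": bandwidth_gbs(512, 6.0), "fill": fill_rate_gpix(64, 1050)}

bw_ratio = rx480["bw"] / r9_390x["bw"]
fill_ratio = rx480["fill"] / r9_390x["fill"]

print(f"RX 480 bandwidth: {rx480['bw']:.0f} GB/s, 390X: {r9_390x['bw']:.0f} GB/s")
print(f"bandwidth ratio: {bw_ratio:.0%}, fill rate ratio: {fill_ratio:.0%}")
```

These are peak numbers only, so colour compression and real-world clocks are not reflected in them.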
 
As long as the GDDR bandwidth is at least somewhat higher and the latency is not significantly worse, you could just reserve 32 MB of GDDR and map it to virtual address space to mock ESRAM. Of course legacy titles would still do silly things like needlessly copy things in/out of this 32 MB region, but this shouldn't be a problem since we are talking about running legacy titles designed for Xbox One on a ~4x more powerful GPU. Of course new titles developed with both platforms in mind would use different code paths on each console.
It would seemingly bring the latency back into the same range that GPUs already handle with expanded resources to compensate. (maybe ~2x ESRAM?)
Depending on the residency and update policy for the resources, Scorpio might have some fastpath detection for pages being copied back and forth that could be resolved to an attribute update.

Somewhat more bandwidth may or may not be sufficient, since the theoretically optimal nearly 1:1 read/write utilization of the ESRAM is one of the worst cases for DRAM utilization. It might resolve to being generally sufficient with unwanted drops in bandwidth available for any kind of activity, the new bus contention for the ESRAM allocation aside. Perhaps better compression and larger caches to reduce misses and queue up enough of the remaining accesses can be used to get something DRAM friendly. Having additional channels could provide some additional concurrency and spread out conflicts.
 
Radeon 390 and 390X have 64 ROPs. RX 480 is in the same performance segment, but only has 32 ROPs. More ROPs and more bandwidth would definitely help.
 
Polaris on the other hand mostly improved the fixed function side, and shows nice gains compared to older AMD GPUs with similar CU count.

For example the old Radeon 7970 GE beats the new RX 470 in fill rate, flops and bandwidth, but clearly loses in games:
http://hwbench.com/vgas/radeon-rx-470-vs-radeon-hd-7970-ghz-edition

Not really seeing these gains, to be honest. Yes, the compression helps with bandwidth, but in reality the RX 470 easily has the 7970 beat in flops. Your link just has the figure for the 470 calculated from the unrealistically low 926 MHz base clock, when in reality it operates much higher than that. The boost clock for the 470 is 1206 MHz, and when you factor that into the results, you'll see that it doesn't look so great against the more than 4 year old GHz Edition...
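
A quick sketch of the flops arithmetic, for reference. The shader counts (2048 on both cards) and the 7970 GHz Edition's 1050 MHz boost clock are assumptions from the public spec sheets; only the 926 MHz and 1206 MHz figures come from the post above.

```python
# Theoretical single-precision throughput at different clocks.
# ASSUMPTIONS (not from the thread): 2048 shaders on both cards and a
# 1050 MHz boost clock for the HD 7970 GHz Edition.

def tflops(shaders, clock_mhz):
    """Peak TFLOPS, counting 2 FLOPs per shader per clock (one fused multiply-add)."""
    return 2 * shaders * clock_mhz / 1e6

rx470_base = tflops(2048, 926)    # figure the linked comparison seems to use
rx470_boost = tflops(2048, 1206)  # figure at the advertised boost clock
hd7970_ge = tflops(2048, 1050)

print(f"RX 470 @ base:  {rx470_base:.2f} TF")
print(f"RX 470 @ boost: {rx470_boost:.2f} TF")
print(f"7970 GE:        {hd7970_ge:.2f} TF")
```

Which clock the 470 actually sustains in games sits somewhere between the two, so treat both numbers as bounds rather than measurements.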
 
Computerbase made a comparison between the R9 280X (Tahiti), R9 380X (Tonga) and RX 470 (Polaris) with identical shader clocks and nearly the same bandwidth (although for Tahiti the increased latency might be a performance hit).
https://translate.google.de/translate?sl=de&tl=en&js=y&prev=_t&hl=de&ie=UTF-8&u=https://www.computerbase.de/2016-08/amd-radeon-polaris-architektur-performance/&edit-text=

On average Tonga was 10% faster than Tahiti and Polaris 10 was 18% faster.
Since every game has different bottlenecks, the results can sometimes be quite dramatic:
from only 3% vs. Tahiti in The Talos Principle to 41% in The Witcher 3.
 
Thanks for the link, that's an interesting test!

I do have to say though that the way they have the test setup, it puts a super heavy emphasis on the effectiveness of the memory compression. When Tonga launched it was pretty much clear that the new compression tech enabled AMD to match the old Tahiti 384bit interface with a 256bit interface. In essence that's where the actual effective bandwidth was seemingly about the same.

In that Computerbase test, if I'm reading it right they have overclocked the Tonga memory by about 15%, downclocked the Tahiti by 23% and with Polaris having more advanced compression tech, I kind of have to ask: Are we measuring anything else besides the memory performance here? Tahiti is being crippled by this downclock. It would have been interesting to see a test with emphasis put on taking the memory speed out of the equation...

It does seem clear though that there are some situations, like The Witcher 3 and other geometry/tessellation heavy titles, where AMD has without question improved the performance of their GPU tech over the years, but the gains still aren't large imo when you check the performance in a vast group of games and eliminate other bottlenecks.
 
Obviously they are measuring the whole "package" against each other.
Having nearly the same raw specs means that any difference comes down to the improvements made:
the delta color compression, larger/better L2$, 4 geometry engines vs. 2, better geometry engines themselves and other optimizations.
Since we can't test each improvement in isolation, I like this comparison because we at least have one baseline and can see what they are worth overall.

Another test might be interesting with regard to bandwidth: back in the very old days Computerbase downclocked the 7970's memory to 2,000 MHz so that it matched the bandwidth of the GTX 680 (192 GB/s vs. the original 264 GB/s).
The result was 7% less performance for 27% less bandwidth (on average, of course).
https://translate.google.de/transla...de/2012-04/bericht-nvidia-geforce-gtx-680/18/
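
The bandwidth numbers in that test can be sanity-checked with a small sketch. I'm assuming the 7970's 384-bit bus and a stock effective rate of 5,500 MT/s, and that the quoted 2,000 MHz corresponds to 4,000 MT/s effective; none of those three values is stated in the post itself.

```python
# Bandwidth of the HD 7970 at stock and downclocked memory.
# ASSUMPTIONS (not from the thread): 384-bit bus, 5500 MT/s stock effective
# rate, and 2000 MHz memory clock meaning 4000 MT/s effective.

def bandwidth_gbs(bus_bits, effective_mtps):
    """Peak bandwidth in GB/s from bus width (bits) and effective transfer rate (MT/s)."""
    return bus_bits / 8 * effective_mtps / 1000

stock = bandwidth_gbs(384, 5500)
downclocked = bandwidth_gbs(384, 4000)

bw_loss = 1 - downclocked / stock
perf_loss = 0.07  # average performance drop quoted in the post

print(f"stock: {stock:.0f} GB/s, downclocked: {downclocked:.0f} GB/s")
print(f"bandwidth lost: {bw_loss:.0%}")
print(f"performance lost per unit of bandwidth lost: {perf_loss / bw_loss:.2f}")
```

The last line is just the ratio of the two quoted percentages, i.e. roughly a quarter of the bandwidth cut showed up as lost performance in that particular test.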
 
It's interesting to see how far ahead of the Gen 3 Tonga Gen 4 Polaris is, while the 480 is so far behind the Gen 2 390X.

Compare the results Locuza posted to these: https://www.techpowerup.com/reviews/AMD/RX_480/20.html

As the 480 is effectively AMD's headline product until well into next year, I wish they had given Polaris the bus and the ROPs to outperform the 390X (and close in on the Fury [non-X]) in the way the technology is almost certainly capable of doing.

On the bright side, Scorpio looks likely to have a 384-bit bus, and hopefully MS have taken a page from the PS4's book and will cram Scorpio with 64 ROPs. If so, I expect to see Scorpio coming in ahead of the 480, the 390X, the 1060 and 980, and perhaps even the Fury [non-X].
 
haha...anyway...I think Scorpio will perform similar to gtx 980ti in games.

How would that be possible?

If Scorpio features 6TF and 320GB/s then in comparison to the RX 480 it's theoretically 16% faster in the core and has 25% more bandwidth; however, that bandwidth has to be shared with the CPU. Since theoreticals don't fully translate into performance increases, it's doubtful that Scorpio would be more than about 15-20% faster than the RX 480.
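
A rough sketch of where those percentages come from. The RX 480 figures (2304 shaders, 1120 MHz base and 1266 MHz boost clocks, 256 GB/s) are assumptions from the public reference specs, and the Scorpio numbers are only the announced targets. Under those assumptions the 16% core figure appears to match the 480's base clock rather than its boost clock.

```python
# Scorpio's announced targets vs RX 480 theoreticals.
# ASSUMPTIONS (not from the thread): RX 480 shader count, clocks and
# bandwidth are taken from the public reference specs.

def tflops(shaders, clock_mhz):
    """Peak TFLOPS, counting 2 FLOPs per shader per clock (one fused multiply-add)."""
    return 2 * shaders * clock_mhz / 1e6

SCORPIO_TF = 6.0      # announced target
SCORPIO_BW = 320.0    # GB/s, announced target
RX480_BW = 256.0      # GB/s, assumed reference spec

rx480_base_tf = tflops(2304, 1120)
rx480_boost_tf = tflops(2304, 1266)

core_gain_vs_base = SCORPIO_TF / rx480_base_tf - 1
core_gain_vs_boost = SCORPIO_TF / rx480_boost_tf - 1
bw_gain = SCORPIO_BW / RX480_BW - 1

print(f"core vs 480 base clock:  +{core_gain_vs_base:.0%}")
print(f"core vs 480 boost clock: +{core_gain_vs_boost:.0%}")
print(f"bandwidth:               +{bw_gain:.0%}")
```

If the 480 holds closer to its boost clock in practice, the theoretical core advantage shrinks accordingly, which only strengthens the point about not expecting a huge gap.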

The 980Ti averages 33% faster than the RX 480 in modern games so we should expect it to perform a little better than Scorpio but nothing Earth shattering.

https://www.techpowerup.com/reviews/AMD/RX_480/24.html
 