Current Generation Hardware Speculation with a Technical Spin [post GDC 2020] [XBSX, PS5]

"We knew that many inference algorithms need only 8-bit and 4-bit integer positions for weights and the math operations involving those weights comprise the bulk of the performance overhead for those algorithms," says Andrew Goossen. "So we added special hardware support for this specific scenario. The result is that Series X offers 49 TOPS for 8-bit integer operations and 97 TOPS for 4-bit integer operations. Note that the weights are integers, so those are TOPS and not TFLOPs. The net result is that Series X offers unparalleled intelligence for machine learning."

https://www.eurogamer.net/articles/digitalfoundry-2020-inside-xbox-series-x-full-specs
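Those figures line up with the published CU count and clock if the dot4/dot8 modes simply run at 4x and 8x the FP32 FMA rate. A quick sanity check (the 52 CUs and 1.825 GHz come from the same DF article; the arithmetic is my own back-of-the-envelope, not an official breakdown):

```cpp
// Back-of-the-envelope check of the quoted TOPS figures.
// 52 CUs x 64 ALUs x 2 ops per FMA x 1.825 GHz = ~12.15 TFLOPS FP32;
// int8 dot4 runs at 4x that rate, int4 dot8 at 8x.
#include <cstdio>

int main() {
    const double cus = 52, alus_per_cu = 64, clock_ghz = 1.825;
    const double fp32_tflops = cus * alus_per_cu * 2 * clock_ghz / 1000.0;
    std::printf("FP32 : %.2f TFLOPS\n", fp32_tflops);           // ~12.15
    std::printf("INT8 : %.1f TOPS (dot4)\n", fp32_tflops * 4);  // ~48.6 -> "49"
    std::printf("INT4 : %.1f TOPS (dot8)\n", fp32_tflops * 8);  // ~97.2 -> "97"
}
```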
 
"We knew that many inference algorithms need only 8-bit and 4-bit integer positions for weights and the math operations involving those weights comprise the bulk of the performance overhead for those algorithms," says Andrew Goossen. "So we added special hardware support for this specific scenario. The result is that Series X offers 49 TOPS for 8-bit integer operations and 97 TOPS for 4-bit integer operations. Note that the weights are integers, so those are TOPS and not TFLOPs. The net result is that Series X offers unparalleled intelligence for machine learning."

https://www.eurogamer.net/articles/digitalfoundry-2020-inside-xbox-series-x-full-specs
Some perspective on how a prosumer card, the 2080, compares:
[attached image]
 
I'm not understanding. What does fp64 have to do with int4 and int8?
I think he quoted the wrong part and meant to quote this:
"Some variants of the dual compute unit expose additional mixed-precision dot-product modes in the ALUs, primarily for accelerating machine learning inference. A mixed-precision FMA dot2 will compute two half-precision multiplications and then add the results to a single-precision accumulator. For even greater throughput, some ALUs will support 8-bit integer dot4 operations and 4-bit dot8 operations, all of which use 32-bit accumulators to avoid any overflows.
 
Speaking of audio chips, did devs not take advantage of SHAPE on Xb1? We haven't heard any developer experience programming for the tensilica cores. Maybe it was automatically handled by the OS?
 

It's what powers spatial audio, I believe, and prior to that was dedicated to Kinect 2.
 
As per the sampler feedback video above from the DX team, Claire Andrews says "it is a GPU hardware feature," and it is neatly printed on the slide.

About 20 seconds into that video.

WRT the whole phraseology from the Sony PS5 presentation - I also have trouble following the logic of how they went from having trouble maintaining 2.0 GHz (GPU) / 3.0 GHz (CPU) under their old fixed-clock paradigm to 2.23 and 3.5 with power transferring between the parts. What loads were they testing that made 2.0/3.0 hard to reach, and what loads are they testing where 2.23 and 3.5 happen "most of the time"? Wouldn't it make sense that loads that made 2.0 and 3.0 hard to maintain under a fixed power budget would have the same effect when power transfers?

As I have stated previously in this thread, the reason they had trouble reaching 2.0/3.0 is that they were using their old way of selecting clock speeds.
If they wanted a consistent power supply and cooling, they would need to clock the GPU and CPU much lower than what they could otherwise mostly run at, just in case.

Mark Cerny gave examples of what types of workloads are power-hungry for the GPU (the HZD map screen with its low triangle count) and the CPU (AVX workloads). They wanted a solution that runs at the highest possible frequency during normal use, with outlier/extra power-hungry workloads lowering the frequency to stay within the power budget.
This way they don't have to increase the fan and PSU size because of an outlier scenario.
AMD SmartShift was only mentioned as being used in the specific scenario where the CPU is not using up its allotted power budget, allowing that power to be transferred to the GPU.
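To make that distinction concrete, here's a toy model of the idea as I understood the presentation: the clock comes from a power estimate of the current workload (activity counters), not from a temperature sensor, so the same work produces the same clocks on every console regardless of ambient conditions. All names and numbers below are invented for illustration; this is not the actual implementation:

```cpp
// Toy model of deterministic, workload-driven clocking (numbers made up).
// Power is estimated from chip activity and the frequency is trimmed only when
// that estimate would exceed a fixed budget; temperature never enters into it.
#include <cstdio>

double gpu_clock_ghz(double activity /* 0..1, from activity counters */) {
    const double max_clock  = 2.23;   // GHz cap
    const double peak_power = 250.0;  // W if fully active at max clock (made up)
    const double budget     = 200.0;  // fixed SoC power budget (made up)
    auto power_at = [&](double f) {   // crude ~f^3 scaling, voltage tracking frequency
        return activity * peak_power * (f * f * f) / (max_clock * max_clock * max_clock);
    };
    double f = max_clock;
    while (f > 0.5 && power_at(f) > budget)
        f -= 0.01;                    // a couple of percent of clock buys ~10% power
    return f;
}

int main() {
    std::printf("typical frame:      %.2f GHz\n", gpu_clock_ghz(0.70)); // stays at cap
    std::printf("map-screen outlier: %.2f GHz\n", gpu_clock_ghz(1.00)); // dips slightly
}
```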
 
It's been a while. What is the rule on link posting? @Shifty Geezer

A very interesting article was posted today regarding the technical capabilities/differences of the I/O and RAM subsystems of the two consoles.

Please let me know how I can post the reference.
 
Generally links are fine as long as they are not pointing to illegal content. So links to documents that are under an NDA and have been obtained illegally are generally frowned upon.
 
That would be absolutely horrible for early adopters, especially if a devkit offered better performance than retail units, so the devs think they're offering a great experience but a large subset of their consumers suffer.
What happens when the vents aren't cleaned regularly, or you live somewhere unusually warm? Will your gaming experience get worse over time? I was skeptical of a 2 GHz GPU, and I'm even more curious about building a cooling system around a specific thermal range and adjusting clocks relative to each other to maintain an optimal aggregate heat output from the CPU plus GPU.
 
Is this credible? Others have said the OS takes its memory from the slower-bandwidth modules, leaving the 10 GB of faster memory and 3.5 GB of the slower pool for games, but this guy says that only 7.5 GB of the fast memory is available for games from the start, or that the bandwidth is averaged?

The SX has 2.5 GB reserved for system functions and we don't know how much the PS5 reserves for that similar functionality but it doesn't matter - the Xbox SX either has only 7.5 GB of interleaved memory operating at 560 GB/s for game utilisation before it has to start "lowering" the effective bandwidth of the memory below that of the PS5... or the SX has an averaged mixed memory bandwidth that is always below that of the baseline PS4. Either option puts the SX at a disadvantage to the PS5 for more memory intensive games and the latter puts it at a disadvantage all of the time.
 
Maybe some of the fastest RAM is reserved for the OS. But the full 2.5GB loaded into the fastest pool? Smells like BS.
 
That immediately makes it not credible if they can't even get that simple information correct. MS has already stated that the OS reservation is in the "slow" portion of memory.

Regards,
SB
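For anyone following along, the split being argued over falls straight out of the published Series X memory layout (a 320-bit bus, 14 Gbps GDDR6, six 2 GB chips plus four 1 GB chips); the snippet below just rederives the numbers from the DF article, with the OS reservation placed in the slow pool as MS has stated:

```cpp
// Series X memory pools, derived from the published layout.
#include <cstdio>

int main() {
    const double gbps_per_pin = 14.0, bits_per_chip = 32.0;

    // The first GB of all ten chips interleaves across the full bus -> "fast" 10 GB.
    double fast_bw = 10 * bits_per_chip * gbps_per_pin / 8;  // 560 GB/s
    // The second GB exists only on the six 2 GB chips -> "slow" 6 GB.
    double slow_bw = 6 * bits_per_chip * gbps_per_pin / 8;   // 336 GB/s

    std::printf("fast pool: 10 GB @ %.0f GB/s\n", fast_bw);
    std::printf("slow pool:  6 GB @ %.0f GB/s\n", slow_bw);
    // With the 2.5 GB OS reservation in the slow pool, games get
    // 10 GB fast + 3.5 GB slow = 13.5 GB, not 7.5 GB of the fast pool.
}
```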
 
Maybe some of the fastest RAM is reserved for the OS. But the full 2.5GB loaded into the fastest pool? Smells like BS.

https://www.resetera.com/threads/pl...ve-ot-secret-agent-cerny.175780/post-30372096

This is the answer from a dev on Era. I was thinking the same, but it could help with bandwidth. Probably not the full OS... The functionality that is used very often could end up in this part of memory so that it doesn't take up too much bandwidth. It would help to have the higher bandwidth there.
 

That doesn't make sense, as even when occupying the slower portion of memory, that bandwidth is still significantly higher than what a desktop CPU has access to.

There's really no reason to use any of the "fast" memory allocation for the OS.

Regards,
SB
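For scale, using standard JEDEC DDR4 rates rather than anything console-specific, the comparison looks roughly like this:

```cpp
// Even the "slow" 336 GB/s pool is several times what a desktop CPU gets
// from dual-channel DDR4.
#include <cstdio>

int main() {
    // dual-channel DDR4-3200: 2 channels x 64 bits x 3200 MT/s / 8 bits per byte
    double ddr4 = 2 * 64 * 3200.0 / 8 / 1000;   // ~51.2 GB/s
    double slow_pool = 336.0;                   // GB/s, Series X slow pool
    std::printf("dual-channel DDR4-3200: %.1f GB/s\n", ddr4);
    std::printf("Series X slow pool:     %.0f GB/s (%.1fx)\n",
                slow_pool, slow_pool / ddr4);
}
```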
 