AMD: Speculation, Rumors, and Discussion (Archive)

Of course, if the 480 didn't exist, I would have thought that this 36 CU part corresponded to an older GCN architecture with 36 CUs. Is there any correspondence with an older GCN GPU? What do you think: could the Neo still be using a 7000/Hawaii-series GPU on 28nm?
Obviously not; the PS4 was already GCN2. The point was that the CU count is just a coincidence: Neo has a GCN3 or GCN4 iGPU, and the CU count being the same as Polaris is just a coincidence.
 
So the PS and Xbox will both use the same or a very similar GPU? Does that also mean the Xbox version will be running at a higher frequency?

And btw, people are selling the 480 4GB for 300 dollars on eBay... and some are actually buying them. What a time to be alive...
 
So the PS and Xbox will both use the same or a very similar GPU? Does that also mean the Xbox version will be running at a higher frequency?
The PS4 and XB1 did. PSNeo and Scorpio probably won't. Scorpio is coming late enough to be Vega-based (gfx IP level 9, while Polaris is gfx IP level 8.x like Tonga and Fiji). If I had to guess from the 6 TFLOPS number, assuming it's just GPU FLOPS, I'd put my money on 48 CUs @ 1 GHz.
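
For reference, here's the back-of-the-envelope math behind that guess, as a quick Python sketch. The 48 CU @ 1 GHz Scorpio figure is the guess from the post above; the 36 CU / ~911 MHz Neo figures are the rumored numbers floating around this thread, not confirmed specs.

    # Peak FP32 throughput for GCN: 64 lanes per CU, 2 FLOPs per lane
    # per cycle (one fused multiply-add), times the clock in GHz.
    def gcn_tflops(cus, clock_ghz):
        return cus * 64 * 2 * clock_ghz / 1000

    print(gcn_tflops(48, 1.0))    # 6.144 -> the rumored 6 TFLOPS Scorpio figure
    print(gcn_tflops(36, 0.911))  # ~4.2  -> Neo's rumored 36 CUs at ~911 MHz

So 48 CUs at 1 GHz lands almost exactly on 6 TFLOPS, which is why that guess fits the leaked number so neatly.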
 
The PS4 and XB1 did. PSNeo and Scorpio probably won't. Scorpio is coming late enough to be Vega-based (gfx IP level 9, while Polaris is gfx IP level 8.x like Tonga and Fiji). If I had to guess from the 6 TFLOPS number, assuming it's just GPU FLOPS, I'd put my money on 48 CUs @ 1 GHz.
What are the differences between the two levels?

Indeed, I'd expect most of the performance difference to come from more CUs.
 
So the PS and Xbox will both use the same or a very similar GPU? Does that also mean the Xbox version will be running at a higher frequency?

Alternatively, they could just run with more CUs instead of, or in addition to, a higher clock speed. It's highly unlikely that the CUs in an Xbox console will be clocked significantly higher than the CUs in a PS console. Look at the PS4/XBO as examples: the PS4 has more compute power, but its CUs are clocked about the same as the XBO's.
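
For what it's worth, the current generation bears this out; the gap comes almost entirely from CU count, not clocks. A quick sketch using the public PS4/XBO specs:

    # Same peak-FLOPS arithmetic as above: CUs x 64 lanes x 2 FLOPs x clock (GHz)
    ps4_tf = 18 * 64 * 2 * 0.800 / 1000   # 18 CUs @ 800 MHz -> ~1.84 TFLOPS
    xbo_tf = 12 * 64 * 2 * 0.853 / 1000   # 12 CUs @ 853 MHz -> ~1.31 TFLOPS
    print(round(ps4_tf / xbo_tf, 2))      # ~1.41x, despite near-identical clocks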

Regards,
SB
 
Obviously not; the PS4 was already GCN2. The point was that the CU count is just a coincidence: Neo has a GCN3 or GCN4 iGPU, and the CU count being the same as Polaris is just a coincidence.

Code written for Neo's GPU is not interchangeable with what was written for the PS4's GPU. Other slides indicate new instructions with Neo as well.
However, that might also mean that whatever Neo has, due to being backwards-compatible, is not a complete match, since the confirmed ISA match between GCN3 and GCN4 leaves some conflicts with GCN2, unless the PS4's variation on GCN2 was different enough to leave room. The items Polaris brings at a physical level would benefit Neo, and might very well be necessary to get to a FinFET GCN. Another fun coincidence is that Neo's apparent clock is one of the Polaris power states.
 
Some interesting reading for people that don't frequent the console forums.

http://www.eurogamer.net/articles/digitalfoundry-2016-doom-tech-interview

There's some good stuff in there about how they use async compute. I don't want to dig through past posts to try to find the specific ones, but I remember some people feeling like Dx12/Vulkan and async compute would mean more work and thus less effort put into art, game design, etc.

Digital Foundry: What are your thoughts on adopting Vulkan/DX12 as primary APIs for triple-A game development? Is it still too early?

Axel Gneiting: I would advise anybody to start as soon as possible. There is definitely a learning curve, but the benefits are obvious. Vulkan actually has pretty decent tools support with RenderDoc already and the debugging layers are really useful by now. The big benefit of Vulkan is that shader compiler, debug layers and RenderDoc are all open source. Additionally, it has full support for Windows 7, so there is no downside in OS support either compared to DX12.

Tiago Sousa: From a different perspective, I think it will be interesting to see the result of a game entirely taking advantage by design of any of the new APIs - since no game has yet. I'm expecting to see a relatively big jump in the amount of geometry detail on-screen with things like dynamic shadows. One other aspect that is overlooked is that the lower CPU overhead will allow art teams to work more efficiently - I'm predicting a welcome productivity boost on that side.

Contrary to those people's opinions, iD expects there to be a significant productivity boost from Dx12/Vulkan, at least for the art teams working on the games. And right now, art is one of the limiting factors on how quickly a game can be finished, i.e. art and art assets take up the most development time.

Interestingly, they are also expecting to see a "relatively big jump in the amount of geometry detail" from the new APIs once developers start using Vulkan and Dx12 as the primary development API, which no one, including iD, has done yet.

Oh and I'd love to see a site test this.

Axel Gneiting: We are using all seven available cores on both consoles and in some frames almost the entire CPU time is used up. The CPU side rendering and command buffer generation code is very parallel. I suspect the Vulkan version of the game will run fine on a reasonably fast dual-core system. OpenGL takes up an entire core while Vulkan allows us to share it with other work.
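
To make the "very parallel command buffer generation" remark concrete, here is a minimal, hypothetical sketch of the general pattern (in Python for brevity, and emphatically not id's actual code): draw calls are partitioned across worker threads, each thread records into its own command list with no shared mutable state, and the lists are then stitched together and submitted in a fixed order on one thread.

    import threading

    def record_chunk(chunk, results, idx):
        # Each worker writes only into its own slot, so no locks are needed;
        # this independence is what makes the recording embarrassingly parallel.
        results[idx] = [("draw", d) for d in chunk]

    draw_calls = list(range(1000))   # stand-ins for real draw calls
    num_workers = 7                  # both consoles expose seven cores to games
    size = -(-len(draw_calls) // num_workers)   # ceiling division
    chunks = [draw_calls[i*size:(i+1)*size] for i in range(num_workers)]
    results = [None] * num_workers

    workers = [threading.Thread(target=record_chunk, args=(c, results, i))
               for i, c in enumerate(chunks)]
    for w in workers: w.start()
    for w in workers: w.join()

    # Submission stays on one thread, in a deterministic order.
    command_buffer = [cmd for chunk in results for cmd in chunk]
    print(len(command_buffer))       # 1000

In a real Vulkan renderer each worker thread would typically own its own VkCommandPool, since command pools must only be used from one thread at a time.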

Regards,
SB
 
Code written for Neo's GPU is not interchangeable with what was written for the PS4's GPU. Other slides indicate new instructions with Neo as well.
However, that might also mean that whatever Neo has, due to being backwards-compatible, is not a complete match, since the confirmed ISA match between GCN3 and GCN4 leaves some conflicts with GCN2, unless the PS4's variation on GCN2 was different enough to leave room. The items Polaris brings at a physical level would benefit Neo, and might very well be necessary to get to a FinFET GCN. Another fun coincidence is that Neo's apparent clock is one of the Polaris power states.
Could GCN3/4 be made "GCN2-compatible" with something as simple as microcode updates? I know it sounds like a hassle, but what if Neo just switches between two microcode updates depending on whether you're running an unpatched PS4 game or a Neo-patched game?
 
Jean Geffroy said:
Our post-processing and tone-mapping for instance run in parallel with a significant part of the graphics work
Looks like inter-frame async to me (it also explains the always-on Vsync with Vulkan, which is free since there is already extra inter-frame buffering for async). I wonder how much FPS async brings to the table at 4K, because a few per cent of framerate gain might not be worth ~20 milliseconds of added input latency (on a Fury X, if the average framerate is 50).
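
The ~20 ms figure follows directly from the frametime at 50 fps. A quick sanity check of the trade-off being described (the 5% uplift below is a hypothetical figure, roughly in line with the gains reported later in the thread):

    avg_fps = 50
    frame_time_ms = 1000 / avg_fps          # 20 ms per frame at 50 fps
    # Buffering one extra frame (inter-frame async) adds one frametime of latency:
    print(frame_time_ms)                    # 20.0 ms of added input lag
    # versus the time saved per frame by, say, a 5% async framerate gain:
    saved_ms = 1000 / avg_fps - 1000 / (avg_fps * 1.05)
    print(round(saved_ms, 2))               # ~0.95 ms shaved off each frame

So under these numbers, the latency cost of an extra buffered frame dwarfs the per-frame time saved by the async speed-up, which is the poster's point.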
 
Could GCN3/4 be made "GCN2-compatible" with something as simple as microcode updates? I know it sounds like a hassle, but what if Neo just switches between two microcode updates depending on whether you're running an unpatched PS4 game or a Neo-patched game?
I'd think it's possible. The ACE/HWS could easily emulate prior generations with microcode. The actual ALUs would be another matter, but so long as they weren't deprecating instructions or reducing/removing fixed-function hardware, I don't see a problem. I haven't seen any technical details, but Tonga/Fiji/Polaris seem to share the same scheduling hardware as well, with some new features backported. Curious to know if the prefetching in Polaris was backported, since the ISA didn't actually change. In theory that was a scalar processor feature that was implemented without updating anything. That could be the "improved" drivers they released earlier in the year.

Would they have released Polaris without all the hardware features enabled yet? Might explain the lack of a whitepaper or specifics about the processors.

Looks like inter-frame async to me (it also explains the always-on Vsync with Vulkan, which is free since there is already extra inter-frame buffering for async). I wonder how much FPS async brings to the table at 4K, because a few per cent of framerate gain might not be worth ~20 milliseconds of added input latency (on a Fury X, if the average framerate is 50).
If I understood their technique correctly, that latency would be largely irrelevant, as they are reprojecting similarly to async timewarp in VR. The effect of hiding AFR latency with mGPU might be interesting as well. Not sure I've heard anyone discuss just how much latency ATW could bury. That might also play into the Vsync differences between AMD and Nvidia we're seeing.
 
Looks like inter-frame async to me (it also explains the always-on Vsync with Vulkan, which is free since there is already extra inter-frame buffering for async). I wonder how much FPS async brings to the table at 4K, because a few per cent of framerate gain might not be worth ~20 milliseconds of added input latency (on a Fury X, if the average framerate is 50).

This is actually not true (or it's a bug on Eurogamer's side), because I can disable Vsync or set Adaptive Vsync just fine on my Fury X under Vulkan (with both the latest WHQL and beta drivers).
 
This is actually not true (or it's a bug on Eurogamer's side), because I can disable Vsync or set Adaptive Vsync just fine on my Fury X under Vulkan (with both the latest WHQL and beta drivers).
Have you checked whether turning off Vsync has any actual effect? I'm asking because there are quite a few people saying that turning off Vsync with Vulkan doesn't actually turn it off.
BTW, could you please test a few locations with Temporal AA / SMAA to check the async gains? It would be nice to know some numbers for Async On (TAA) vs Async Off (SMAA), especially at higher resolutions.
 
Have you checked whether turning off Vsync has any actual effect? I'm asking because there are quite a few people saying that turning off Vsync with Vulkan doesn't actually turn it off.
BTW, could you please test a few locations with Temporal AA / SMAA to check the async gains? It would be nice to know some numbers for Async On (TAA) vs Async Off (SMAA), especially at higher resolutions.
When turning off Vsync I get screen tearing and an unlocked FPS... so I guess it is indeed off :).
Regarding async, I unfortunately won't have access to my desktop PC until Monday, so I can't really tell you. But from my quick testing at 1440p, Async On gives me roughly 5% to 12%, depending on the scene/action.
 
Have you checked whether turning off Vsync has any actual effect? I'm asking because there are quite a few people saying that turning off Vsync with Vulkan doesn't actually turn it off.
BTW, could you please test a few locations with Temporal AA / SMAA to check the async gains? It would be nice to know some numbers for Async On (TAA) vs Async Off (SMAA), especially at higher resolutions.
Let's put it this way: if turning it off had no effect, you would be limited to whatever refresh rate your monitor has, and from many benchmarks we know that's not the case.
 
Could GCN3/4 be made "GCN2-compatible" with something as simple as microcode updates? I know it sounds like a hassle, but what if Neo just switches between two microcode updates depending on whether you're running an unpatched PS4 game or a Neo-patched game?

For some time, x86 CPUs have had a microcode engine and the ability to trap at the instruction level, which allows them to work around bugs that might turn up in some corner case. However, at least for CPUs, it's usually not a performance improvement to do this.
The microcode updates I have seen discussed for GCN are updates to the microcode programs used by the front end processors to handle queue and system commands for the GPU, which are separate from the GCN ISA and not relevant to the CUs.

I'd think it's possible. The ACE/HWS could easily emulate prior generations with microcode. The actual ALUs would be another matter, but so long as they weren't deprecating instructions or reducing/removing fixed-function hardware, I don't see a problem.
There have been instructions dropped since GCN2, and there are cases where the freed encodings have been used by the new instructions for GCN3/4.
Other encodings have changed, such as scalar memory operations. Those doubled in length with the addition of scalar memory writes, so no amount of finagling is going to make GCN3 fit into GCN2.
If there isn't some kind of microcode engine that has gone unmentioned, either in the shared instruction fetch block for the CU group or at the CU level, some other solution would have to be found, like a dual-mode decode block or a modest update to the PS4's ISA that sneaks some additional functionality into the few places with encoding room.

Maybe things done in the physical implementation for the purposes of power management and whatnot might have made it in. They would seemingly be necessary to get Neo's power consumption down with its expanded resource count.
There are some items that do follow Polaris, if the slides are accurate. Neo's enhanced mode activates additional shader engines to have 36 CUs, which means the same decision that was made for Polaris exists in Neo. What would be different is that the ROP count math changes. Polaris gets to 32 ROPs with all its engines active, but the Neo slides seem to indicate that it can get to 32 with some of its shader engines shut off.
 
If there isn't some kind of microcode engine that has gone unmentioned, either in the shared instruction fetch block for the CU group or at the CU level, some other solution would have to be found, like a dual-mode decode block or a modest update to the PS4's ISA that sneaks some additional functionality into the few places with encoding room.
It wouldn't have been the first time that this was solved by a simple shader rewrite. I don't know the precise software architecture, but it's safe to assume that there is at least some kind of hardware abstraction / driver layer, right?
 