I believe the unnamed person from AMD you refer to is misinformed, or possibly being deliberately deceptive. There are two pools of (probably) SRAM on the die too, and very likely other hardware as well, meant to facilitate emulation of the ancient GameCube/Wii chipset. There's the CPU bus/northbridge interface as well, the DSP, ARM CPU integration and so on.
And from my point of view, changing SPU count is a customization, and this was ruled out by the AMD rep. The only customization that I believe Nintendo did was adding the Renesas eDRAM to the GPU logic. For me, it's either a vanilla HD4650 or HD5550, with no Nintendo secret sauce at all. And the tessellation abilities point more to the latter option.
Yes, and unfortunately they don't all really elaborate on their opinion (or do they?).

wario said: some devs have said it's on par, some have said it's even weaker.
The requirement is also backward compatibility with Hollywood/Flipper. How does a vanilla HD4650 or HD5550 achieve that? I think Iwata Asks mentions something about Hollywood/Flipper features *extending* the design and actually contributing.
I love it.
[edit] Okay, perhaps that is slightly too trollish, but why can't PS2->Xbox logic be applied to 360/PS3->Wii U logic? I understand the magnitude of the transitions is different, but I don't think that changes the underlying point.
Well, I guess we can all agree with that. Nintendo's problem is that they promised a console capable of delivering next-gen quality but didn't show anything like that, which led to no interest from core gamers and thus bad sales. So either they flat out lied, or the system runs PS360 ports without major optimization just below par and exceeds that by a large margin when its resources are used properly. I don't think anyone believes the latter; there is no proof for that whatsoever. I just hope that Nintendo did, for their own sake.

But I still don't think Nintendo would modify the base GPU architecture that much, and maybe that's what that AMD guy was talking about: no new secret Nintendo instructions, and no SPU/TMU balance completely different from the available R7xx parts. But I also don't think that changes Wii U's position as somewhat better than PS360 and way inferior to PS4/XB1.
Then again, I believe the PS2 didn't use the most sophisticated form of shading, while the Xbox and GameCube probably already looked better just by turning on simple Gouraud and/or Phong shading.

rangers said: Probably because Xbox usually had the slightly superior ports to PS2 in every case. Even the "lazy" ports looked better on Xbox in almost every case.
With Wii U we see that's not the case; it has a spotty record on ports so far.
Only in certain GPGPU workloads (as discussed in threads on the subject of eDRAM/ESRAM). GPUs are highly developed to manage high-latency RAM access when rendering graphics - it's not a problem that the GPU vendors shrugged their shoulders at and said, "high latency, huh? Whatever."

Question for the specialist: the GPU's on-die eDRAM reduces latencies compared to GDDR5 stuff, right? Would it be possible that due to this reduced latency, the SIMD cores stall a lot less?
Until Microsoft or AMD discloses more about how the eSRAM interfaces with the GPU memory subsystems, we don't know how much faster it would be. That it should be faster to some degree is the current assumption, but until we know how it works and how it fits into the overall system, we are making assumptions beyond what is publicly available.

Question for the specialist: the GPU's on-die eDRAM reduces latencies compared to GDDR5 stuff, right? Would it be possible that due to this reduced latency, the SIMD cores stall a lot less? And would that reduce the need for running a multitude of threads per core? And wouldn't that reduce SRAM register bank requirements compared to AMD's mainstream designs?
Ah, I think my question is answered with that, thanks.

3dilettante said: The GDDR5 memory itself contributes measurable latency, but in terms of the latency the GPU memory subsystem has, it is a small fraction of the total. If the path to the eSRAM is mostly the same as for general memory traffic, the overall reduction in stalls won't be enough to significantly reduce the need for multiple threads.
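To put rough numbers on that latency-hiding point, here is a minimal back-of-the-envelope sketch in Python. Every figure in it (the 400- and 250-cycle round trips, the 8 cycles of ALU work per dependent fetch) is an assumption chosen purely for illustration, not a measured value for Latte or any specific Radeon:

# Napkin math: how many wavefronts must a SIMD keep in flight to cover fetch latency?
# All numbers are illustrative assumptions, not measurements of any real GPU.

ALU_CYCLES_PER_FETCH = 8  # assumed independent ALU work a wavefront can issue between dependent fetches

def wavefronts_needed(fetch_latency_cycles):
    # While one wavefront waits on its fetch, the other wavefronts must fill the issue slots.
    return -(-fetch_latency_cycles // ALU_CYCLES_PER_FETCH)  # ceiling division

print(wavefronts_needed(400))  # assumed round trip to external DRAM through the memory subsystem -> 50
print(wavefronts_needed(250))  # assumed round trip if an on-die pool shaves ~150 cycles off that  -> 32

Even with the latency cut, the SIMD still wants dozens of wavefronts in flight, which is the "won't be enough to significantly reduce the need for multiple threads" point in the quote above.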
With a few exceptions like D3 and HL2, aren't you mixing up PS2 and Xbox? The PS2 had a 70+% market share and a whole bunch of exclusive games. Given the large differences in hardware and in market share, I highly doubt that devs worked on Xbox first and then shoehorned it onto PS2. The other way around is far more likely.
PS2 hardware is probably much more "maxed out" than Xbox or GC hardware. I remember devs on this board commenting about the kind of crazy stuff some people did with the PS2 to make it do things it was never really designed to do.
It needs at least two threads before it can utilize all its issue cycles. If there is only one thread, half the cycles on a SIMD do nothing. For GCN, with its 4 SIMDs and round-robin issue, the minimum is four. This is before all other considerations.

@3dilettante:
So do I understand you correctly: does a VLIW SIMD core need 2 threads to mask its own latencies? Thinking a bit further, compared to a 4650, the Wii U's eDRAM may have at least twice the bandwidth available. So texture fetch times will be much smaller, and fewer threads are needed to fill that gap. Does this mean that, in theory, 40 SPs per SIMD core would be possible with only 16 SRAM banks per core, as it would need less SRAM to hold thread-bound registers?
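For what it's worth, the issue-cycle floor can be restated as trivial arithmetic. The architectural figures (a VLIW SIMD needs two wavefronts to fill its issue cycles; a GCN CU with 4 round-robin SIMDs needs four) come from 3dilettante's post above; wrapping them in a formula is just an illustration:

# Minimum wavefronts needed just to keep the issue slots busy, before memory latency enters the picture.
# A VLIW SIMD alternates between two wavefronts, so a single wavefront leaves half the issue cycles idle.
# A GCN CU feeds its 4 SIMDs round-robin, so it needs at least one runnable wavefront per SIMD.

def min_wavefronts(simds_per_scheduler, wavefronts_interleaved_per_simd):
    return simds_per_scheduler * wavefronts_interleaved_per_simd

print("VLIW SIMD:", min_wavefronts(1, 2))  # -> 2
print("GCN CU:   ", min_wavefronts(4, 1))  # -> 4

On top of that floor come however many extra wavefronts are needed to cover fetch latency, as in the earlier sketch.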
I'm not sure what the implementation is, or why latency benchmarks show the VLIW GPUs as having a latency graph that's a flat line at the worst-case level. This may be a combination of a memory subsystem that very heavily trades latency for high bandwidth utilization and some kind of very long pipelining for texture fetches.

@3dilettante
OK, I had to read it 3 times before I got it, but I think I got it. You mention that there are 180 cycles of latency in the case of a *hit*. Does this mean there is a queue in between? I mean, I don't suppose the TMU's data bus is kept busy for that long when it accesses its cache, or do I get that wrong?
The VLIW GPUs have a more centralized scheduling and control setup, which GCN distributed amongst the CUs. Some of the hardware was probably there in the VLIW designs, just unexposed and physically distant from the SIMDs.

My misconception was that I always assumed the SIMDs were able to fetch instructions, texels etc. themselves. But instead it's the scheduling layer above that manages those things and feeds the SIMDs with 'simplified' instructions such as MADD, though introducing more latency. And as long as the number of cycles spent on fetching data in the case of a cache miss isn't huge in comparison to the other latencies, the eDRAM won't help much. Correct?
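That "won't help much" logic is easy to sanity-check with the same kind of napkin math, by splitting the total fetch latency into a fixed pipeline/queueing part and a part that depends on where the data actually lives. The 180-cycle hit figure is the one quoted earlier in the thread; the +150/+40 miss penalties and the 8-cycle ALU:fetch ratio are assumptions for illustration only:

# If most of the texture-path latency is fixed pipelining and queueing, a faster memory pool only trims the tail.
# The split between the fixed part and the memory-dependent part below is assumed, not measured.

FIXED_PIPELINE_CYCLES = 180   # best-case (cache hit) latency figure mentioned earlier in the thread
ALU_CYCLES_PER_FETCH = 8      # same assumed ALU:fetch ratio as in the earlier sketch

def wavefronts(total_latency_cycles):
    return -(-total_latency_cycles // ALU_CYCLES_PER_FETCH)  # ceiling division

print(wavefronts(FIXED_PIPELINE_CYCLES + 150))  # miss served from external DRAM (assumed +150) -> 42
print(wavefronts(FIXED_PIPELINE_CYCLES + 40))   # miss served from on-die eDRAM (assumed +40)   -> 28
print(wavefronts(FIXED_PIPELINE_CYCLES))        # even a cache hit still wants ~23 wavefronts in flight

The large fixed term dominates either way, so as long as eDRAM traffic goes through the same texture path, the threading requirements barely move.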
No, you're not. But 3dilettante was exclusively talking about texture and vertex fetches (i.e. GATHER and VFETCH ops) and missed a third kind of data access - LDS. Such access is done from within ALU clauses, without the interference of the scheduler. Or, put in other words, VLIW SIMDs *can* make use of low-latency memory pools.

My misconception was that I always assumed the SIMDs were able to fetch instructions, texels etc. themselves. But instead it's the scheduling layer above that manages those things and feeds the SIMDs with 'simplified' instructions such as MADD, though introducing more latency. And as long as the number of cycles spent on fetching data in the case of a cache miss isn't huge in comparison to the other latencies, the eDRAM won't help much. Correct? Or am I raging like a complete idiot now?
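The LDS point is the flip side of that: an operand that stays inside the ALU clause never pays the long texture-path latency in the first place. Continuing the same illustrative numbers, and treating the ~30-cycle LDS figure as nothing more than an order-of-magnitude assumption:

# LDS is read from within an ALU clause, bypassing the texture pipeline entirely.
# The LDS latency below is an order-of-magnitude assumption, not a published figure.

ALU_CYCLES_PER_FETCH = 8  # same assumed ALU:fetch ratio as before

def wavefronts(latency_cycles):
    return -(-latency_cycles // ALU_CYCLES_PER_FETCH)  # ceiling division

print(wavefronts(180))  # operand pulled through the texture path, best case -> ~23 wavefronts
print(wavefronts(30))   # operand staged in LDS (assumed ~30 cycles)         -> ~4 wavefronts

So a low-latency on-chip pool can pay off, but only for data a shader explicitly stages into LDS, not for ordinary texture traffic routed through the normal fetch path.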