Wii U hardware discussion and investigation *rename

1920x1080 @FP16 w/o AA = 16.6MB
1280x720 @FP16 w/o AA = 7.4MB
1280x720 @FP16 2xAA = 14.7MB
1280x720 @FP16 4xAA = 29.5MB
854x480 (est. uTab res.) @FP16 w/o AA = 3.3MB (1 uTab this is!)

1080p @FP16 + 4 uTabs would be around 30MB just for the frame buffers.
If they really support only 1 uTab it would be 20MB; 720p with 2xAA and 4 uTabs would already need 27.9MB @FP16. So with 30MB they could emulate the complete Wii 1T-SRAM (24+2+1 = 27MB) and still support up to 4 uTabs with 1080p @FP16 on the TV screen.
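
For reference, a minimal sketch of the arithmetic behind the figures above, assuming 8 bytes per sample (FP16 RGBA) and decimal megabytes; depth/stencil and any resolve buffers are ignored, as in the list.

```python
# Rough framebuffer-size arithmetic for the figures above.
# Assumes 8 bytes per sample (FP16 RGBA = 4 x 16 bit) and decimal MB (10^6 bytes);
# depth/stencil buffers are deliberately ignored, as in the post.

BYTES_PER_SAMPLE = 8  # FP16 RGBA

def fb_mb(width, height, msaa=1):
    """Colour buffer size in MB for one render target at the given MSAA level."""
    return width * height * msaa * BYTES_PER_SAMPLE / 1e6

print(fb_mb(1920, 1080))         # ~16.6 MB  (1080p, no AA)
print(fb_mb(1280, 720))          # ~7.4 MB   (720p, no AA)
print(fb_mb(1280, 720, msaa=2))  # ~14.7 MB  (720p, 2xAA)
print(fb_mb(1280, 720, msaa=4))  # ~29.5 MB  (720p, 4xAA)
print(fb_mb(854, 480))           # ~3.3 MB   (one estimated uTab screen)

# 1080p @FP16 on the TV plus four uTab buffers: ~16.6 + 4 * 3.3 ≈ 30 MB
print(fb_mb(1920, 1080) + 4 * fb_mb(854, 480))
```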
 
To me, rumors about an APU make a lot more sense than what I read earlier.
I still wonder about the amount of edram embedded inside the chip, mostly for cost reasons.
In a forward renderer you need ~15 MB for your frame buffer @1080p; that's quite a lot, half of what's available to a POWER7. That takes room, and at low clocks it makes sense to use eDRAM for a shared L2 too.
As most engines are moving to deferred rendering the requirements are actually higher, so I don't believe that N will take that road and match the eDRAM amount for 1080p rendering. The best they could do is just above 10MB, enough to fit a lightweight G-buffer like the one used in Crysis 2 @ 720p. That's still a lot, and it would surprise me if the amount were lower than that, unless N asked ATI to implement some tiling support as in Xenos; the extra processing power would allow hiding the cost of the extra passes.
So for me 16 MB of eDRAM sounds optimistic (12 for the GPU, 4 for CPU cache).
I'm more comfortable with 10MB (2 for the CPU, 8 for the GPU), even though that means even a light G-buffer would not fit within the eDRAM pool.
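
To put a number on the "just above 10MB" figure, here is a back-of-the-envelope G-buffer budget. The layout (two 32-bit colour targets plus a 32-bit depth buffer) is only an assumed lightweight configuration for illustration, not Crysis 2's documented format.

```python
# Back-of-the-envelope G-buffer budget, to see what "just above 10MB at 720p" implies.
# The layout (two 32-bit colour targets plus a 32-bit depth buffer) is an assumption.

def gbuffer_mb(width, height, targets, bytes_per_pixel=4):
    """Total size in MB of `targets` render targets at `bytes_per_pixel` each."""
    return width * height * targets * bytes_per_pixel / 1e6

# 720p, 2 colour targets + depth -> ~11 MB ("just above 10MB")
print(gbuffer_mb(1280, 720, targets=3))

# The same layout at 1080p -> ~25 MB, which is why matching the eDRAM amount
# to 1080p deferred rendering looks expensive.
print(gbuffer_mb(1920, 1080, targets=3))
```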
 
So I wonder what optimizations N may have asked of ATI.
I think one problem, especially with increasing eDRAM size, is the transfer back to main memory.
Having a fat G-buffer might also stall just because of reading it again. I could imagine that N requested the ability to sample from eDRAM, like it was possible on GC and Wii. Otherwise you'd need quite fast main memory to keep up, and having 2x the speed is probably 4x the cost -> it might make sense to have more eDRAM with better GPU access and save that money on the RAM side.
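
As a rough feel for that transfer cost, here is a sketch of the main-memory traffic generated if the G-buffer has to be copied out of eDRAM and then read back each frame; the buffer size and frame rate are assumed round numbers.

```python
# Rough feel for what "transfer back to memory" costs if the G-buffer cannot be
# sampled in place. Buffer size and frame rate are assumed round numbers.

def copy_out_gb_per_s(buffer_mb, fps=60, reads_per_frame=1):
    """GB/s of main-memory traffic for writing the buffer out once per frame
    and then reading it back `reads_per_frame` times."""
    return buffer_mb * fps * (1 + reads_per_frame) / 1000

# A ~15 MB fat G-buffer at 60 fps, read back once during the lighting pass:
print(copy_out_gb_per_s(15))                      # ~1.8 GB/s for the round trip

# Read back three times (lighting, post-processing, etc.):
print(copy_out_gb_per_s(15, reads_per_frame=3))   # ~3.6 GB/s
# That's a noticeable slice of a narrow main-memory bus, on top of textures,
# vertices and CPU traffic.
```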

I didn't read anything about texture sampling from eDRAM on Wii U; does anyone have a source?
 
So for me 16 MB of eDRAM sounds optimistic (12 for the GPU, 4 for CPU cache).
I'm more comfortable with 10MB (2 for the CPU, 8 for the GPU).

30MB of 1T-SRAM-Q @45nm (33.6mm²) would only be slightly bigger than the 3MB of 1T-SRAM @90nm in the Wii (26.4mm²), so I wouldn't say 16MB is overly optimistic for CPU and GPU, quite the contrary from my perspective.
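
A quick sanity check on those quoted figures, treating area scaling as ideal and splitting the density gain cleanly between the node shrink and the denser 1T-SRAM-Q cell (both simplifications):

```python
# Quick sanity check on the 1T-SRAM figures quoted above. Ideal area scaling and a
# clean split between node shrink and cell type are simplifying assumptions.

wii_mb, wii_area = 3, 26.4       # 3 MB 1T-SRAM @ 90nm, ~26.4 mm^2 (quoted above)
wiiu_mb, wiiu_area = 30, 33.6    # 30 MB 1T-SRAM-Q @ 45nm, ~33.6 mm^2 (quoted above)

capacity_ratio = wiiu_mb / wii_mb            # 10x the capacity
area_ratio = wiiu_area / wii_area            # ~1.27x the area
density_gain = capacity_ratio / area_ratio   # ~7.9x bits per mm^2

node_scaling = (90 / 45) ** 2                # ~4x from the 90nm -> 45nm shrink (ideal)
cell_gain = density_gain / node_scaling      # ~2x left over for the denser "Q" cell

print(capacity_ratio, round(area_ratio, 2), round(density_gain, 1), round(cell_gain, 1))
```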
 
I didn't read anything about texture sampling from eDRAM on Wii U; does anyone have a source?

I thought about that too, but I think textures today are just too big to have a dedicated texture cache (one 1k texture is already ~3MB, a 2k one over 12MB).
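
The arithmetic behind those figures, with compressed-format sizes added for contrast; the ~3MB figure corresponds to uncompressed 24-bit texels, and the DXT numbers use the standard 4 and 8 bits per texel block rates.

```python
# Size arithmetic behind "one 1k texture is already ~3MB": uncompressed 24-bit texels.
# Compressed figures use the standard DXT block rates and are added for contrast.

def texture_mb(size, bytes_per_texel, with_mips=False):
    """Size in MB of a square texture; a full mip chain adds roughly 1/3."""
    mb = size * size * bytes_per_texel / 1e6
    return mb * 4 / 3 if with_mips else mb

print(texture_mb(1024, 3))     # ~3.1 MB   uncompressed 24-bit, matches the post
print(texture_mb(2048, 3))     # ~12.6 MB  uncompressed 24-bit
print(texture_mb(1024, 0.5))   # ~0.5 MB   DXT1 (4 bits per texel)
print(texture_mb(2048, 1))     # ~4.2 MB   DXT5 (8 bits per texel)
```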
 
I thought about that too, but I think textures today are just too big to have a dedicated texture cache (one 1k texture is already ~3MB, a 2k one over 12MB).

Texture caches don't work like that. They leverage spatial locality, not temporal.
 
If it is an APU, I expect the eDRAM to be neither dedicated cache nor framebuffer, but simply another "MEM1". An embedded high speed memory pool developers can use however they see fit.
 
I do not like EDRAM in any console design.

Though for Wii U, already not going for the highest performance, I guess it doesn't matter as much.
 
Texture caches don't work like that. They leverage spatial locality, not temporal.

Sorry for OT, but could you explain this further?

If it is an APU, I expect the eDRAM to be neither dedicated cache nor framebuffer, but simply another "MEM1". An embedded high speed memory pool developers can use however they see fit.

I expect it to be fully flexible too, if it is in fact around 30MB.
 
Sorry for OT, but could you explain this further?

Tex caches are primarily intended as a stream-through medium (this leverages spatial locality: samples are "near" each other, so fetching a block means you'll feed more than the initial generating access), not a store-and-reuse deposit (this would imply temporal reuse, hence you'd need to have the whole texture cached, or a large chunk of it, for reuse in subsequent cycles). Even ignoring that, nobody caches entire datasets but rather blocks of them (lots of data around), so one shouldn't think about a cache in an "I have X MB of cache, if my entire dataset is larger I'm screwed, can't use it" way. For example, the L1 tex-cache in something like Cypress is 8KB, and it's quite useful!
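
A toy illustration of that point: a direct-mapped cache far smaller than the texture still gets a near-perfect hit rate when the accesses are spatially coherent, and almost none when they are random. The cache model and access patterns below are deliberately simplistic assumptions, not a model of any real GPU.

```python
# Toy spatial-locality demo: an 8 KB direct-mapped cache over a ~4 MB texture.
# Coherent accesses (neighbouring texels, as during rasterization) hit almost always;
# random accesses to the same texture almost never do.

import random

TEX_SIZE = 1024      # 1k x 1k texture, 4 bytes per texel -> ~4 MB of data
LINE_BYTES = 64      # assumed cache line size
NUM_LINES = 128      # 128 x 64 B = 8 KB cache, direct-mapped

def hit_rate(addresses):
    cache = [None] * NUM_LINES
    hits = 0
    for addr in addresses:
        line = addr // LINE_BYTES
        slot = line % NUM_LINES
        if cache[slot] == line:
            hits += 1
        else:
            cache[slot] = line
    return hits / len(addresses)

def texel_addr(x, y):
    return (y * TEX_SIZE + x) * 4   # 4 bytes per texel, simple linear layout

# Spatially coherent pattern: walk across a region, 2x2 bilinear footprint per sample.
coherent = []
for y in range(64):
    for x in range(256):
        for dy in (0, 1):
            for dx in (0, 1):
                coherent.append(texel_addr((x + dx) % TEX_SIZE, (y + dy) % TEX_SIZE))

# Incoherent pattern: the same number of accesses, but to random texels.
rng = random.Random(0)
incoherent = [texel_addr(rng.randrange(TEX_SIZE), rng.randrange(TEX_SIZE))
              for _ in range(len(coherent))]

print("coherent hit rate:  ", round(hit_rate(coherent), 3))    # close to 1.0
print("incoherent hit rate:", round(hit_rate(incoherent), 3))  # close to 0.0
```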
 
I do not like EDRAM in any console design.

Though for Wii U, already not going for the highest performance, I guess it doesn't matter as much.

I like it!

In the last two generations the only consoles without some kind of embedded video memory were the Xbox and the PS3. Both had cost and performance issues because of a lack of cheap, fast video memory. Both also ended up being the least successful systems of their respective generations in $$ terms.

Embedded video memory has previously allowed for higher performance within the same range of manufacturing costs, so it's great. It's only not been used recently when someone was desperately trying to force an Nvidia chip into a console in a hurry!
 
The Xbox was stuck with a 128-bit memory bus and 4 memory chips for its entire life. No chance of cost reduction even as the PS2 evaporated down to nothing, and the Xbox was still hobbled by low bandwidth.

The PS3 is stuck with a 128-bit memory bus and 4 memory chips just for its GPU (never mind the XDR main memory pool) for its entire life. And it's still hobbled by low bandwidth.

Ditching embedded video memory means spending far, far more on a fat bus that you'll be stuck paying for forever. It's a false economy, and one that will also result in worse performance.
 
Ditching embedded video memory means spending far, far more on a fat bus that you'll be stuck paying for forever. It's a false economy, and one that will also result in worse performance.

But the Xbox had higher performance than the PS2. And you can argue that PS3's performance issues are centred on the RSX core rather than the memory model it uses.
 
[...] And you can argue that PS3's performance issues are centred on the RSX core rather than the memory model it uses.

Go tell that to John Carmack ;)

function is correct; eDRAM could benefit immensely from a possible die shrink of the SoC to 32 or even 22nm. It just makes sense in the long term.
 
Go tell that to John Carmack ;)

function is correct; eDRAM could benefit immensely from a possible die shrink of the SoC to 32 or even 22nm. It just makes sense in the long term.

I'm not questioning that. It's the assertion that lack of eDRAM caused worse performance in the Xbox and PS3 that I'm questioning. As I said, the Xbox was already the highest performing console available, so we can't really say the lack of eDRAM hurt it from that standpoint.

And as far as PS3 is concerned, how well would it have performed if x% of the RSX die had been given over to eDRAM as opposed to the shaders etc... that already aren't enough for it to compete with Xenos?

Cost wise there are definitely benefits, but overall performance might be debatable. For example, how many more shader arrays could Xenos have had if it didn't have that eDRAM daughter die?
 
But the Xbox had higher performance than the PS2. And you can argue that PS3's performance issues are centred on the RSX core rather than the memory model it uses.

Whether the Xbox had higher performance than the PS2 is a separate issue from whether the Xbox design was a cost-effective one. The cost per MB per second of the Xbox design would have been rubbish by the end of its short life, and there were framebuffer-bandwidth-related areas where the PS2 trashed the Xbox despite the huge cost of the Xbox setup.

A better question than whether the Xbox was more powerful than the PS2 would be to ask whether the Xbox could have achieved Xbox-level performance without its 128-bit main memory bus, and I'll bet it could (though maybe not in a hurriedly designed Nvidia-based system). With unlimited framebuffer bandwidth and less contention between the CPU and GPU you might even have made something faster.

The PS3 also has issues with framebuffer bandwidth, which is notable because the 4 memory chips and the 128-bit bus to them on the GPU package are likely to be more expensive than the single 'eDRAM and ROP' daughter die on the Xbox 360 GPU package. [Edit] Although I have absolutely no evidence to back this up and so could rightly be described as pulling this suggestion out of my ass. What's the bus between Xenos and the daughter die? 512-bit x 500MHz, one bit per lane per cycle? [/Edit]
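
Taking the figures in that question at face value, the link bandwidth works out as follows; the 128-bit GDDR3 comparison point is an assumption of roughly the 360's main-memory setup.

```python
# Bus bandwidth under the assumptions in the question above (512 lanes, 500 MHz,
# one bit per lane per cycle), compared with a conventional 128-bit GDDR3 interface.

def bus_gb_per_s(width_bits, clock_mhz, bits_per_lane_per_cycle=1):
    """Raw bandwidth in GB/s for a parallel bus."""
    return width_bits * clock_mhz * 1e6 * bits_per_lane_per_cycle / 8 / 1e9

# Xenos -> daughter die, as guessed in the post:
print(bus_gb_per_s(512, 500))       # 32 GB/s

# A 128-bit GDDR3 interface at 700 MHz (DDR, so 2 bits per lane per cycle),
# roughly the 360's main-memory bus, for comparison:
print(bus_gb_per_s(128, 700, 2))    # ~22.4 GB/s
```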

Nintendo saw their way past these issues from the GC onwards, and the Wii U seems like it'll offer a similarly smart set of choices.

Cost wise there are definitely benefits, but overall performance might be debatable. For example, how many more shader arrays could Xenos have had if it didn't have that eDRAM daughter die?

Cost and performance are inextricably linked though. Saving money on one thing allows you to spend more on another, and costs which you know will scale down well over time allow you to be more aggressive with a machine's capabilities at launch.

Without the eDRAM daughter die, Xenos would have needed a 256-bit memory bus and twice as many memory chips for most of the life of the platform (still using 8 when the 360 switched to 4). Any additional performance would have come at a terribly high price. How high I don't know, but you could probably count it in buckets full of million dollar tears at EAD.
 
The FinancialSuccess = f(eDRAM) relationship is both OT and mind-blowing. Please let it go and return to the topic at hand, thank you.
 
Ditching embedded video memory means spending far, far more on a fat bus that you'll be stuck paying for forever. It's a false economy, and one that will also result in worse performance.

The eDRAM needs an I/O as well. It's not insignificant either.

eDRAM could benefit immensely from a possible die shrink of the SoC to 32 or even 22nm. It just makes sense in the long term.

Transitioning to the most advanced node is a slow process because it's initially expensive, especially for larger designs. This couldn't be more true for eDRAM, due to the way it is manufactured.


^^^^^^^^ We shouldn't be going astray with discussing the success of a console based on its power level.
 