Xbox One (Durango) Technical hardware investigation

Status
Not open for further replies.
At least we know the GPU won't be bandwidth-starved. It also seems very straightforward for devs to work with.
 
Well, according to ERP, developers are encouraged to use DDR3 for the framebuffer... I lean more toward the theory that the ESRAM is a big L3 cache for GPGPU ops.

Do GPGPU ops really depend on latency so much that the normal GPU caches can't handle them effectively? For pure bandwidth reasons, the difference between 68 GB/s and 102 GB/s doesn't sound all that relevant to me.
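As a sanity check on the two bandwidth figures being thrown around, here is a rough back-of-envelope sketch. The bus widths and clocks are the rumored Durango specs (256-bit DDR3-2133, a 1024-bit ESRAM port at the 800 MHz GPU clock), not confirmed numbers:

```python
# Back-of-envelope check on the 68 GB/s and 102 GB/s figures quoted above.
# Bus widths and clocks are the *rumored* Durango specs, not confirmed.

def bandwidth_gbs(bus_bits, transfers_per_sec):
    """Peak bandwidth in GB/s for a given bus width and transfer rate."""
    return bus_bits / 8 * transfers_per_sec / 1e9

# DDR3-2133 on a 256-bit bus
ddr3 = bandwidth_gbs(256, 2133e6)      # ~68.3 GB/s

# ESRAM: assumed 1024-bit port at the 800 MHz GPU clock
esram = bandwidth_gbs(1024, 800e6)     # ~102.4 GB/s

print(f"DDR3:  {ddr3:.1f} GB/s")
print(f"ESRAM: {esram:.1f} GB/s")
```

Both figures fall straight out of width-times-rate, which is why they are quoted as peak numbers; sustained bandwidth would be lower.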
 
Do we even know whether the 12 CUs could benefit from more than 16 ROPs? It has only two more CUs than the 7770, which also has 16 ROPs but runs a ~20% faster core clock, and I assume that was a pretty balanced design.
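To put the ROP question in context, here is a rough fillrate comparison, again using the rumored clocks (Durango at 800 MHz, the HD 7770 at 1000 MHz) and assuming one pixel per ROP per clock; all figures are illustrative peaks:

```python
# Rough fillrate math for the 16-ROP question. Clocks are the rumored
# figures (Durango 800 MHz, HD 7770 1000 MHz); peak rates only.

def fillrate_gpix(rops, clock_hz):
    """Peak pixel fillrate in Gpixels/s (one pixel per ROP per clock)."""
    return rops * clock_hz / 1e9

durango = fillrate_gpix(16, 800e6)    # 12.8 Gpix/s
hd7770  = fillrate_gpix(16, 1000e6)   # 16.0 Gpix/s

# Bandwidth needed just to *write* those pixels at 4 bytes each
# (32-bit color, no blending or depth traffic) -- a lower bound only.
write_gbs = durango * 4               # ~51.2 GB/s

print(f"Durango {durango} Gpix/s, 7770 {hd7770} Gpix/s, writes {write_gbs} GB/s")
```

Even this minimal write traffic is a sizable fraction of the DDR3 bandwidth, which is one argument for putting render targets in the ESRAM.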
 
Do GPGPU ops really depend on latency so much that the normal GPU caches can't handle them effectively? For pure bandwidth reasons, the difference between 68 GB/s and 102 GB/s doesn't sound all that relevant to me.

Depends on what you want to do. If you want some fancy visual physics effects, say a flag fluttering in the wind, you can do it with off-the-shelf graphics hardware: Nvidia PhysX on a GeForce card, for example. Non-visual GPGPU algorithms, on the other hand, such as AI, pathfinding, or driving physics, are extremely latency-sensitive, since the GPU and CPU have to work on them together. AMD claims that such algorithms become practical with HSA.
 
Depends on what you want to do. If you want some fancy visual physics effects, say a flag fluttering in the wind, you can do it with off-the-shelf graphics hardware: Nvidia PhysX on a GeForce card, for example. Non-visual GPGPU algorithms, on the other hand, such as AI, pathfinding, or driving physics, are extremely latency-sensitive, since the GPU and CPU have to work on them together. AMD claims that such algorithms become practical with HSA.

Well, since the GPU works with virtual addresses, that would seem to indicate it could be coherent with the CPU.

I know it's been asked, but can someone run through the disclosure and tell us what's actually different from GCN, besides the ESRAM?
 

Yeah, he said something similar on twitter:

they're accurate as far as i know. just somewhat incomplete. i'm just saying my info is dated feb 2012.
https://twitter.com/aegies/status/298472641935851522

edit: at this point I'm ready to believe what we've seen 100% and just wait for a disclosure on the other blocks in the system (display planes, DMEs, etc.)
 
We don't know how much silicon the other parts eat up. It could still be a big die. For instance, 6T SRAM would be huge.

Yes, but doesn't the vgleaks material claim the ESRAM is 1T? And do GPGPU ops really depend so much on latency that the normal GPU caches are so inefficient that this ESRAM would make a relevant difference? Something here isn't obvious to me.
 
It has nothing to do with cache; the copy overhead kills the latency. For non-visual GPGPU algorithms you have to copy data from CPU to GPU and back again all the time, and the copying takes longer than the computation itself, making it useless for developers. Visual GPGPU algorithms work fine on discrete GPUs because the data never has to be sent back to the CPU (CPU -> GPU -> screen). HSA allows the CPU and GPU to work on tasks without copying the data back and forth.
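The copy-overhead argument can be made concrete with a toy latency model. All the numbers below (per-transfer setup cost, effective bus bandwidth, job size) are illustrative assumptions, not measurements of any real hardware:

```python
# Toy latency model for the copy-overhead argument above. All numbers
# are illustrative assumptions, not measurements of real hardware.

PCIE_LATENCY_S = 10e-6   # assumed fixed cost per transfer (driver + DMA setup)
PCIE_BW_BPS    = 8e9     # assumed effective bus bandwidth, bytes/s

def discrete_gpu_roundtrip(bytes_each_way, compute_s):
    """CPU -> GPU copy, compute on the GPU, GPU -> CPU copy back."""
    copy = 2 * (PCIE_LATENCY_S + bytes_each_way / PCIE_BW_BPS)
    return copy + compute_s

def shared_memory(compute_s):
    """HSA-style shared address space: no copies, just the compute."""
    return compute_s

# A small pathfinding-sized job: 64 KB each way, 5 us of GPU compute.
job = discrete_gpu_roundtrip(64 * 1024, 5e-6)
print(f"discrete: {job * 1e6:.1f} us, shared: {shared_memory(5e-6) * 1e6:.1f} us")
```

With these assumptions the two copies cost several times the computation itself, which is exactly the situation described above where offloading stops being worth it.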
 
from superDaE's twitter (image attachment not preserved)
 
I asked this in the other thread but didn't get a reply:

What is the function of ROPs in relation to rendering and framerate? Also, how many texturing units does Durango have?
 
I think we've been through this too many times on this forum. You should check posts by Hornet, myself and others in the "Predict a Next Gen..." thread (now locked).

Durango's ESRAM will be 1T-SRAM, which is more closely related to eDRAM than to actual SRAM (6T, i.e. six transistors per cell). 32 MB of 6T SRAM would be ridiculous and infeasible, so the only reasonable option is 1T-SRAM, the same type used in the GameCube.

It will be denser than eDRAM, available for manufacturing on a 28 nm process, and able to sit on the same die as the other components.
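A quick transistor count shows why 32 MB of 6T SRAM is considered off the table. This counts the cell array only and ignores decoders, sense amps, and redundancy, so the real totals would be higher still:

```python
# Why 32 MB of 6T SRAM is considered infeasible: transistor count for
# the cell array alone (decoders, sense amps, redundancy not included).

def cell_transistors(megabytes, transistors_per_cell):
    """Transistors in the raw cell array for the given capacity."""
    bits = megabytes * 1024 * 1024 * 8
    return bits * transistors_per_cell

sram_6t = cell_transistors(32, 6)   # ~1.61 billion transistors
sram_1t = cell_transistors(32, 1)   # ~0.27 billion

print(f"6T: {sram_6t / 1e9:.2f}B transistors, 1T: {sram_1t / 1e9:.2f}B")
```

A 6T array alone would rival the transistor budget of a whole mid-range GPU, which is the core of the "1T is the only reasonable option" argument above.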

AFAIK, for smaller nodes (<= 40 nm) 1T-SRAM = eDRAM. Just ask MoSys or TSMC ;-)

eDRAM is nice, but you have to deal with refreshes and (more importantly) leakage issues, not to forget the extra process masks (4-6?) --> higher cost per die.
 
Could the ESRAM make Durango perform very efficiently at tessellation? Nvidia once labelled tessellation the future of gaming. It is supposed to allow better animation, better lighting, and tremendously higher polygon counts, while reducing the overall memory bandwidth and footprint.

http://www.hardwaresecrets.com/datasheets/03_TessellationDeepDive.pdf


Compression: Using tessellation allows us to reduce our memory footprint and bandwidth consumption. This is true both for on-disk storage and for system and video memory usage, thus reducing the overall game distribution size and improving loading time. The memory savings are especially relevant to console developers with scarce memory resources.

Bandwidth is improved because, instead of transferring all of the vertex data for a high-polygon mesh over the PCI-E bus, we only supply the coarse mesh to the GPU. At render time, the GPU will need to fetch only this mesh data, yielding higher utilization of vertex cache and fetch performance. The tessellator directly generates new data which is immediately consumed by the GPU, without additional storage in memory.
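The saving the quoted passage describes can be sketched numerically: only the coarse mesh crosses the bus, while the amplified geometry is generated and consumed on the GPU. The vertex size and tessellation factor below are arbitrary example values:

```python
# Illustration of the memory saving described above: only the coarse
# mesh crosses the bus; tessellated vertices are generated on the GPU.
# Vertex size and tessellation factor are arbitrary example values.

VERTEX_BYTES = 32          # e.g. position + normal + UV per vertex

def mesh_bytes(vertices):
    """Storage for a mesh's vertex data, in bytes."""
    return vertices * VERTEX_BYTES

coarse      = mesh_bytes(10_000)                 # sent over the bus
tess_factor = 16                                 # geometry amplified ~16x
tessellated = mesh_bytes(10_000 * tess_factor)   # never stored in memory

print(f"coarse: {coarse / 1024:.1f} KB vs expanded: {tessellated / 1024:.1f} KB")
```

The factor-of-16 gap between what is stored/transferred and what is actually rendered is the compression win the PDF is talking about.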


Has anyone thought about TBDR? It's supported in Direct3D 11.1 now. Doesn't it use a lot of local memory to reduce read/modify/writes to main memory, saving bandwidth in the process? Could GCN cores be programmed to be TBDR-like?
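For a feel of why tiling saves external bandwidth, here is a rough sizing of one tile's on-chip storage. The tile dimensions and per-pixel formats are typical assumed values, not Durango specifics:

```python
# Rough sizing of a TBDR tile buffer, as mentioned above. Tile size and
# per-pixel formats are typical assumed values, not Durango specifics.

def tile_bytes(width, height, color_bpp, depth_bpp, samples=1):
    """On-chip storage for one tile's color + depth, in bytes."""
    return width * height * samples * (color_bpp + depth_bpp) // 8

# A 32x32 tile with 32-bit color and 32-bit depth, no MSAA:
one_tile = tile_bytes(32, 32, 32, 32)   # 8 KB on chip

# All read-modify-write blending for the tile happens in this on-chip
# buffer; only the final resolved tile is written to main memory once.
print(f"{one_tile} bytes per tile")
```

Because every overdraw and blend hits the small on-chip buffer instead of DRAM, external traffic collapses to roughly one write per pixel, which is the bandwidth saving the question is getting at.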
 