Xbox One (Durango) Technical hardware investigation

Thanks 3dilettante. Here is a similar diagram, from a paper discussing a very similar design proposal for GPGPU in general, not specific to AMD. Those EX blocks from the AMD design appear very similar to the 32 tiny caches in the diagram below.

In the end, though, if you read the text in my photo right above the diagrams, they settled on 16 tiny caches rather than the 32 shown in the diagram, quite similar to what AMD seems to have settled on, if those 16 EX blocks are indeed micro caches.

[image: tiny-cache diagram from the paper]

http://users.soe.ucsc.edu/~renau/docs/islped13.pdf
 
I've not done much more than skim the pdf, but the most likely reason the diagram has 32 lanes is that it's based on an Nvidia design, hence the warp-width lane count and the SM nomenclature.
The tiny caches are also write-back, which a GCN-type architecture would have a problem with.
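For anyone who doesn't follow why that matters: a write-through cache pushes every store to the next level immediately (which is roughly how GCN keeps its L2 as the coherence point), while a write-back cache holds dirty lines locally until eviction, so other clients see stale data in the meantime. A minimal sketch of the two policies (a generic illustration, not a model of GCN or of the paper's design):

```python
# Minimal illustration of write-through vs write-back caching.
# Generic sketch only -- not a model of GCN or the paper's design.

class Cache:
    def __init__(self, write_back, next_level):
        self.write_back = write_back
        self.next_level = next_level  # dict standing in for L2/memory
        self.lines = {}               # addr -> value
        self.dirty = set()            # addrs not yet flushed (write-back only)

    def store(self, addr, value):
        self.lines[addr] = value
        if self.write_back:
            self.dirty.add(addr)           # defer the update; L2 is now stale
        else:
            self.next_level[addr] = value  # write-through: L2 updated at once

    def evict(self, addr):
        if addr in self.dirty:
            self.next_level[addr] = self.lines[addr]  # flush dirty data
            self.dirty.discard(addr)
        self.lines.pop(addr, None)

mem = {}
wb = Cache(write_back=True, next_level=mem)
wb.store(0x100, 42)
print(mem.get(0x100))  # None -- other clients can't see the store yet
wb.evict(0x100)
print(mem.get(0x100))  # 42 -- visible only after eviction
```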
 
http://directxbox.com/viewtopic.php?f=2&t=524

RYSE:

Internal HDD
To menu: 31 seconds
Game Load: 1 minute 18.5 seconds (emphasis mine)
Reload same checkpoint from in game: 7.5 seconds

RAID0:
To menu: 29 seconds
Game Load: 56 seconds
Reload same checkpoint from in game: 6 seconds

SSD (Crucial m4 256GB)
To menu: 27 seconds
Game Load: 56 seconds
Reload same checkpoint from in game: 6 seconds

Wat. I haven't played console games in a long time but is it normal these days for a game to spend 80 seconds loading a level? Anything >~8 seconds feels long to me, and my PC is nothing too special (Intel 330 SSD + i7-3770). I could squeeze out a turd burglar + wipe and wash hands in that time.
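For what it's worth, the numbers above suggest the load isn't even I/O-bound. If you plug in guessed sustained read speeds (~120 MB/s for the internal HDD, ~450 MB/s for the m4; both are my assumptions, not measurements) and model the load as a fixed cost plus a transfer, the two timings pin down both unknowns:

```python
# Back-of-envelope split of Ryse's level load into I/O time and fixed
# (CPU/decompression) time, from the HDD and SSD timings above.
# The drive throughputs are assumptions, not measured values.

hdd_mb_s = 120.0   # assumed sustained read speed of the internal HDD
ssd_mb_s = 450.0   # assumed sustained read speed of the Crucial m4
t_hdd = 78.5       # seconds (1 min 18.5 s)
t_ssd = 56.0       # seconds

# Model: t = fixed + data / rate. Two equations, two unknowns.
data_mb = (t_hdd - t_ssd) / (1.0 / hdd_mb_s - 1.0 / ssd_mb_s)
fixed_s = t_hdd - data_mb / hdd_mb_s

print(f"~{data_mb / 1024:.1f} GB read, ~{fixed_s:.0f} s fixed cost")
# -> roughly 3.6 GB of data and ~48 s that the storage device can't touch
```

Which would explain why the SSD barely helps: most of those 56 seconds go to something the storage device can't accelerate.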
 
Yeah, it's not a mystery where these posters are getting this crap.

The actual diagrams and presentations are worthwhile explorations of a possible future direction.
To varying degrees, some of the worst-case scenarios used as baselines for the simulated improved architecture are what the best APUs are already doing right now, which markedly reduces the potential gains.
The problem is that they are being used to jump to conclusions that they have no relevance for.

It's a case of having a decent quantity of data that has been stripped of context and understanding while being contorted into unnatural forms by tunnel vision and wishful thinking.


Buzzword hoarders like that source bother me, in no small part because of the sheer volume of heedless output; in the absence of any meaningful effort at understanding, it amounts to what is, in my opinion, a very lazy form of argumentation.
 
Wat. I haven't played console games in a long time but is it normal these days for a game to spend 80 seconds loading a level? Anything >~8 seconds feels long to me, and my PC is nothing too special (Intel 330 SSD + i7-3770). I could squeeze out a turd burglar + wipe and wash hands in that time.
Consoles are the last bastion of hard drives. No other consumer device uses them anymore. Your PC has an SSD. Try removing it and it feels like going back to the stone age. All tablets and phones have flash-based (fast seek) storage devices. People are now used to opening apps and the web browser instantly. I still remember the old times when it took like 15 seconds to open the web browser (and 25 seconds to open Photoshop). My old computer was very unresponsive for the first 30 seconds or so after boot (all the background applications were loading, the desktop was loading icons one by one, etc). At least we got mandatory hard drives this time (loading directly from disc was even worse).

1080p is 2.25x the pixel count of 720p. To make a game look good you need 2.25x more texture data. And that's without any other material quality improvements. Add all the stuff you need for a good PBR pipeline, and you easily reach a 4x-5x texture size increase. Hard drives haven't evolved much in the last 10 years (not even a 2x speed improvement). Longer loading times were expected.
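Spelling out the arithmetic (the channel counts for the PBR pipeline are an illustrative guess, not any shipping layout):

```python
# Pixel-count and texture-budget arithmetic from the post above.
# The PBR channel list is an illustrative assumption, not a shipping layout.

pixels_720p  = 1280 * 720    #   921,600
pixels_1080p = 1920 * 1080   # 2,073,600
print(pixels_1080p / pixels_720p)  # 2.25x

# A classic pipeline might get by with diffuse + normal maps;
# a PBR pipeline typically adds roughness/metalness/AO style channels.
classic_channels = 2   # e.g. albedo, normal
pbr_channels     = 4   # e.g. albedo, normal, roughness/metal, AO (assumed)

scale = (pixels_1080p / pixels_720p) * (pbr_channels / classic_channels)
print(scale)  # 4.5x -- in the 4x-5x range sebbbi mentions
```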
 
PDF:

Glossary
SPU = Shader Processing Unit
SPUs = Stream Processing Unit
...
SPU = Shader Processing Unit, which is not the same as SPUs = Stream Processing Unit.

Sorry, but no. SPU doesn't have a fixed meaning like that. It is just an abbreviation that makes sense in the context of what is presented.


Well different meaning of SPU:
- Sound Processing Unit
- Special Processing Unit
- Special Purpose Unit
- Stream Processing Unit
- Shader Processing Unit

So if you don't know the context of the presentation you wouldn't know what SPU means.

And, just as is common in presentations, abbreviations don't necessarily add an "s" when they refer to multiple elements:
768 SPU == 768 SPUs
 
Sebbi - do you have the new SDK docs yet, and if so can you talk to any of the changes it will allow?
If he has access to the SDK docs he's probably under NDA and thus can't share. Sebbi has a number of great posts describing how ESRAM can be utilised to great effect. An additional 10% of GPU time can provide 'more' of that; as of yet I haven't seen any credible claims for it enabling anything new. Of course, if you're not under NDA, Sebbi, I'd be fascinated to hear whether it does enable anything new or allow techniques restricted by 'only' having 90% of the GPU time.
 
The time slice does sound sort of devastating.
It's not like you lost 1 CU worth of power, making it 11. It's more like running for 9 time slices straight and then sitting there doing nothing for 1. If your game relies on continually feeding data back and forth, I can see this being a thorn in your side.
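Rough numbers, assuming the slice is simply spent once per frame (my simplification; the real scheduling granularity isn't public):

```python
# Rough comparison of a 10% time slice vs a 10% capacity reservation.
# Frame budgets are the usual 60/30 fps numbers; everything else follows.

reservation = 0.10

for fps in (60, 30):
    frame_ms = 1000.0 / fps
    blocked_ms = frame_ms * reservation   # GPU idle window per frame
    slowdown = 1.0 / (1.0 - reservation)  # equivalent uniform slowdown
    print(f"{fps} fps: {blocked_ms:.2f} ms/frame unavailable "
          f"(or everything ~{(slowdown - 1) * 100:.0f}% slower if spread out)")
# 60 fps: 1.67 ms/frame unavailable (or everything ~11% slower if spread out)
# 30 fps: 3.33 ms/frame unavailable (or everything ~11% slower if spread out)
```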
 
EX = execution lane or processing element
GCN 1.0 = 4 EX lanes, 4x vector threads, where the vector width is 16 ALUs

In the AMD GCN documentation there is no SPU; there is only SP, CU, and SIMD.

Yes, SPU is an older term, but the leaked VI specs from Chiphell
also posted the spec as:
1 SIMD = 16x SPU
and also xx SP
(SP & SPU are different)


http://amd-dev.wpengine.netdna-cdn....Sea_Islands_Instruction_Set_Architecture1.pdf
from Dec 2013, this is Rev 1.3 (and they keep showing Northern Islands similarities)

From the Sea Islands pdf, we see AMD back to 1 SIMD representing 1 CU:
1 CU = 1 SIMD = 64 ALUs, very much like NI.
In the NI OpenCL documentation, they showed it as 1 SIMD = 16x SPU or 16x PE.

The problem is that the MS official slide says AMD GCN 768 SPUs, not AMD NI 768 SPUs or AMD GCN 768 SPs.
None of the AMD GCN documentation lists the SPU term; there is only SP.

The only one in line with it is the Chiphell forum listing for future VI, with 1 SIMD as 16x SPU, but also xx SP;
basically 1 CU / 1 SIMD = 16 SPU, 64 SP.
 
I'm afraid I am not able to make sense of what you are saying.
Terms that have a general consensus as to their meaning are being used in ways that contradict their usual usage, as established by several AMD conference presentations, engineers, and disclosures from AMD, Sony, Microsoft, and so on.

I won't try to poke further holes in the interpretation, because I cannot make sense of it.
I will caution that, if you are using the diagrams in the Sea Islands documentation that I think you are using, there are some massive copy-paste errors from multiple generations back that AMD has refused to fix for years.
 
That presentation is interesting.

Interestingly, they list all the specs as we know them, 853 MHz GPU etc., but list the ESRAM as 102 GB/s (not 109 or 204), with the caveat "sometimes faster in practice".

Also interesting to me was that they called out the DDR3 as "low latency", which we haven't heard much about. Although they did not say compared to what.
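The 102 vs 109 discrepancy falls out of the clocks, assuming the commonly cited 1024-bit (128-byte) path per direction and the "writes only land on 7 of 8 cycles" caveat from the MS interviews:

```python
# Where the quoted ESRAM bandwidth figures plausibly come from.
# Assumes the commonly cited 1024-bit (128-byte) path per direction
# and the "writes on 7 of 8 cycles" caveat from MS interviews.

bus_bytes = 128  # 1024-bit interface per direction (assumed)

for clock_mhz in (800, 853):
    one_way = clock_mhz * 1e6 * bus_bytes / 1e9
    combined = one_way + one_way * 7 / 8  # simultaneous read + throttled write
    print(f"{clock_mhz} MHz: {one_way:.1f} GB/s one way, "
          f"{combined:.1f} GB/s combined peak")
# 800 MHz: 102.4 GB/s one way, 192.0 GB/s combined peak
# 853 MHz: 109.2 GB/s one way, 204.7 GB/s combined peak
```

So 102 GB/s looks like a pre-upclock (800 MHz) figure, or simply a conservative guaranteed number.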
 
Interesting tidbit about ESRAM usage. I wonder if launch titles were restricted to 1/2 (basically a static allocation) due to unfamiliarity with the hardware, or if it was the API that was lacking (or both).

The wording makes it sound like partial residency of buffers is a recent-ish SDK addition.

It's impossible, given the demands of the rendering pipelines we've seen in X1 launch games (particularly the various deferred-rendered ones), that they'd only have access to half the ESRAM, especially considering the split bandwidth of the main memory and other issues I won't mention because of X1/PS4/PC platform wars.

I doubt there is a single quote from a developer stating, or even suggesting, that they were limited to half the ESRAM in past SDKs.
Lack of familiarity wouldn't limit developers to accessing half of the ESRAM scratchpad.

They are likely making use of various footprint-reducing techniques, such as storing the top portion of render targets (e.g. the sky) in main memory and the bottom portion in ESRAM.
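For a sense of why splitting is needed at all, here's the footprint of an assumed "typical" 1080p deferred setup (the target list is my guess, not from any shipped title):

```python
# Why a 1080p deferred G-buffer doesn't fit in 32 MB of ESRAM.
# The target list is an assumed "typical" layout, not from any shipped title.

width, height = 1920, 1080
esram_mb = 32

targets = {
    "albedo (RGBA8)":      4,  # bytes per pixel
    "normals (RGBA8)":     4,
    "material (RGBA8)":    4,
    "HDR light (RGBA16F)": 8,
    "depth (D32)":         4,
}

total_mb = sum(bpp * width * height for bpp in targets.values()) / (1024 ** 2)
print(f"{total_mb:.1f} MB of render targets vs {esram_mb} MB of ESRAM")
# -> ~47.5 MB, hence keeping the rarely-touched rows (e.g. sky) in DRAM
```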
 