Xbox One (Durango) Technical hardware investigation

Funny that they demo it with Nvidia. Don't believe this is anything new.

Seems to be based on http://www.opengl.org/registry/specs/AMD/sparse_texture.txt
I knew that, but thanks for sharing the more detailed info. IMO, perhaps they showed the demo running on NVidia cards because NVidia didn't have a GCN equivalent until recently, so they have only just started supporting this technology?

Partially Resident Textures were introduced with GCN and are exposed in DirectX 11.2 through the Tiled Resources functionality.

There are two tiers of Tiled Resources support, and AMD's DX11.1-class GCN hardware falls into Tier 2.
This means that the AMD HD 7790 (Bonaire) GPU is Tier 2, and since this technology is going to span several generations, newer AMD GPUs are going to be Tier 3.
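
For anyone wanting to check what their own card reports, here's a minimal sketch (not from any of the posts above, just an illustration) of how an application queries the Tiled Resources tier through Direct3D 11.2's CheckFeatureSupport; the device pointer and error handling are assumed:

```cpp
#include <d3d11_2.h>
#include <cstdio>

// Minimal sketch: ask the driver which Tiled Resources tier it exposes.
// Assumes 'device' is a valid ID3D11Device created on a Windows 8.1+ runtime.
void PrintTiledResourcesTier(ID3D11Device* device)
{
    D3D11_FEATURE_DATA_D3D11_OPTIONS1 opts = {};
    if (SUCCEEDED(device->CheckFeatureSupport(D3D11_FEATURE_D3D11_OPTIONS1,
                                              &opts, sizeof(opts))))
    {
        switch (opts.TiledResourcesTier)
        {
        case D3D11_TILED_RESOURCES_NOT_SUPPORTED: std::puts("Tiled Resources: not supported"); break;
        case D3D11_TILED_RESOURCES_TIER_1:        std::puts("Tiled Resources: Tier 1");        break;
        case D3D11_TILED_RESOURCES_TIER_2:        std::puts("Tiled Resources: Tier 2");        break; // Bonaire-class GCN
        default:                                  std::puts("Tiled Resources: higher tier");   break;
        }
    }
}
```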

Aside from that, as we know only too well, journalists have compared the GPU in the Xbox One with the HD 7790, a comparison that I fully support -in some ways they seem to be fairly even hardware-.

Other people "like" to compare it to the HD 7770. -which I think would be Tier 1, it may well be that it isn't, I am just trying to make sure that I come to understand what you define by your Tiers classification (i.e. Tier 2).

I am curious, which graphics card does the Xbox One GPU resemble the most to you, the 7770 or the 7790?
 
Given the supposed 2nd geometry setup engine, it would seem more akin to Bonaire (CU/TU counts notwithstanding). Of course, neither is sporting a 256-bit bus, let alone DDR3.
 
Many thanks, AlNets -our fellow former... I'll leave it at that; until recently everyone who already knew you knew you as ......., whereas newer members perhaps don't remember you?-. That kinda settles it for me. We could create a thread dedicated to discussing the numbers, but maybe it isn't necessary at all.

Cheers
 
Do the CUs come in groups of 2 or 4 or could they be activated one at a time?

They can be disabled/activated one at a time; however, every 4 CUs share some resources, and if there's a flaw in any part of those shared resources and you can't disable a full 4-CU block, the chip is toast.
 
Call me crazy, but the idea of a possible upclock to Durango has been fermenting in my head a little...

-Eastmen's recent upclock post
-Greenburg's tweet seeming to imply hardware changes before the May reveal. That was never explained.
-The lack of any clocks or flops being revealed for XB1 means they are still unknown, for better or worse.
-Albert Penello's (Microsoft guy) recent post on GAF. To me, it can be read as hinting at an upclock.


http://www.neogaf.com/forum/showpost.php?p=66980146&postcount=542

Albert Penello:

What does the bolded part, and the post in total, mean? "specs change"?

It could also be read to mean they somehow don't want Sony to one-up them?

Note, I'd only indulge in this speculation as an idle sideshow. We have people like DF, and other sources, who have never hinted at any upclock, or have said the clock is 800 MHz.

Edit: Reading Penello's post again, an alternate explanation that seems very plausible is "we're better (due to something not on paper), and we don't want the competition to find out and up their specs to one-up us", rather than hinting at any upclock.

Hope that's not too versus.

I do know Microsoft toyed with 850 MHz and 900 MHz upclocks on the GPU and ESRAM according to my sources, and decided to stay at 800 MHz as far as I know! Plus, 800 MHz is the spec for the alpha dev kits!
 
That "Mynd" guy on PSU forums who was quoted earlier seems to be being discussed on GAF Anyway, does this make any sense (somehow I doubt it) his explanation for the new BW?



There is other stuff he's saying that seems controversial too.

http://www.psu.com/forums/showthrea...lopers/page7?p=6132276&viewfull=1#post6132276

Using the falling edge is an interesting idea that I hadn't thought of. Looking at DDR, it will read or write on both the rising and falling edge of the clock, but it only operates in one mode (read or write) at a time. If that were true of this ESRAM, you could in theory read or write at double rate. But the thing is, DF is saying the full bandwidth for read or write remains unchanged. So would that imply rising edge for read and falling edge for write? Why set it up that way instead of doubling bandwidth for each mode, like DDR? Seems weird. Maybe someone else has a good speculative reason for doing so.

It seems complicated to avoid bus contention alternating reads and writes like that. Is there a strategy for dealing with contention that might suggest the 130 GB/s number, or whatever it is, which is well below the theoretical max, assuming this setup is true for argument's sake? It still seems like a complex design for timings when you have two devices driving the bus while the clock is asserted. How does the second device know when to assert before the falling edge without trampling the data before the rising-edge operation is complete? And how do you prevent the alternating writes from destroying the data you're reading?
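
To put rough numbers on that (my own back-of-envelope sketch, assuming the 128-byte-per-cycle, 800 MHz ESRAM interface quoted later in the thread; purely illustrative):

```cpp
#include <cstdio>

// Back-of-envelope numbers for the "read on one edge, write on the other" idea.
// Assumes a 128-byte-wide interface at 800 MHz (figures quoted later in the thread);
// this is speculation about the speculation, not a description of the real hardware.
int main()
{
    const double bytes_per_edge = 128.0;  // bytes moved per clock edge
    const double clock_mhz      = 800.0;  // ESRAM clock

    const double one_edge_gbps   = bytes_per_edge * clock_mhz / 1000.0; // 102.4 GB/s
    const double both_edges_gbps = 2.0 * one_edge_gbps;                 // 204.8 GB/s

    std::printf("Single-edge peak: %.1f GB/s\n", one_edge_gbps);
    std::printf("Dual-edge peak:   %.1f GB/s\n", both_edges_gbps);
    // The ~130 GB/s figure being discussed sits well below the dual-edge peak.
    return 0;
}
```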
 
That notion comes down to...

Has this technique ever been used on eSRAM before? If so, maybe they built it with this in mind but were unsure whether it could handle that type of maneuvering.

It would make sense, considering we're nearing final silicon and the engineers would finally get to test out that capability.

Maybe that's why we have a number so far from the theoretical max: the people running the tests just wanted to validate that it worked. Then they send the documentation to the developers, tell them how to utilize it, and start writing the tools necessary to capitalize on it.

Risky investment paying off possibly?
 
External DRAM benefits from DDR because it has constraints where it makes sense to prioritize density over performance and physical and economic reasons to limit pin counts.

In the scenario where reads happen on one edge and writes happen on the other over the same data paths for the eSRAM, this assumes that all the routing logic, arbitration, and necessary buffers can be ready twice as fast.
At that point, why not just have another bus, or consider making that bus faster?
It's making a saving on cheap data paths by adding extra complexity to the pipeline stages and logic, which might hinder its ability to work quickly and hinder clock scaling.

That's not to say something like this doesn't happen on-die. A register file can have reads and writes occur on different sides of a clock cycle, which is done to keep one from interfering with the other. The ports are separate and any bandwidth accounting would take note of that. There's no time in that scenario to drive the ports in two directions. The memory and data paths now work at the same speed as the digital logic that is supposed to be handling the doubled rate, so there's no physical speed disparity to take advantage of.

This is not a problem for DDR system memory because it is single-ported, slow, and has to incur a penalty to change modes.


High-bandwidth buses on interposers make the interface cheap again, and for a few versions they are dropping back to SDR. It's still not the same as being purely on-die with high-speed arrays, so they may go to DDR at a later point, if that revision comes to pass.
edit: This is per the Wide-IO and HBM proposals, with a low-power chip-on-chip version used for the PS Vita.
 
Which parts of the document for a discrete SRAM component with two ports show how you can read and write data over the same bus in the same clock cycle, leading to double the design's bandwidth?
(edit: discrete is an improper term, a better one would be "isolated")
From what I've skimmed, the peak bandwidth is what you get with two separate ports.

It looks more like it's a dual-ported SRAM with specifically outlined cases for when the inputs for the two ports lead to a conflict. The interface and control logic would be on the other side of all the control and data lines, and for an on-die version I would expect the pipeline logic to be smart enough to avoid the corner cases, especially since there are read-write conflicts that lead to unknown or old data being read back.
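
As a toy illustration of that conflict behavior (a sketch under my own assumptions, not taken from the SRAM document): a dual-ported array can service a read and a write in the same cycle on separate ports, but a same-address collision hands back stale (or, on real parts, undefined) data.

```cpp
#include <array>
#include <cstdint>
#include <cstdio>
#include <optional>

// Toy model of a dual-ported SRAM: one read port, one write port, both usable
// in the same cycle. The peak bandwidth comes from the two ports being separate;
// a read/write collision on the same address returns the old value here
// (real parts may return unknown data instead).
class DualPortSram {
public:
    std::optional<uint32_t> Cycle(std::optional<size_t> read_addr,
                                  std::optional<size_t> write_addr,
                                  uint32_t write_data)
    {
        std::optional<uint32_t> read_result;
        if (read_addr)
            read_result = mem_[*read_addr];   // read sees pre-cycle contents on collision
        if (write_addr)
            mem_[*write_addr] = write_data;
        return read_result;
    }
private:
    std::array<uint32_t, 1024> mem_{};
};

int main()
{
    DualPortSram sram;
    sram.Cycle(std::nullopt, 5, 0xCAFE);   // cycle 1: write only
    auto r = sram.Cycle(5, 5, 0xBEEF);     // cycle 2: read and write collide on address 5
    std::printf("Collision read returned %#x (stale value)\n", r.value_or(0));
    return 0;
}
```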

edit:
To summarize, I would like some exposition on why this should be considered relevant or what argument it is supporting.
 
Smerfy, I found this http://www.actel.com/documents/AC374_Simul_RW_AN.pdf yesterday after a Google search for dual-port ESRAM. They explain how simultaneous reading and writing to the ESRAM in an FPGA would work. There are also many research PDFs available attempting to explain the same principle.

Thank you :)

I'll admit I'm a novice in this area, but I've always been interested in unique architectures. This really does seem the best place for discussion about hardware, something you can't get in many other places.

I'll be delving into this tonight!
 
Btw, if someone wonders what this compressed AA is, AMD usually calls it EQAA and it is basically equivalent to nV's CSAA. The Cayman ROPs started to support it and all GCN GPUs can do it too.
I really hope that developers will be able to access coverage samples in shader or at least write the information into a buffer.
This would allow some nice things for deferred renderers and custom AA resolves.
 
3d, I only posted it because I thought it showed what scott arm and smirfy were speaking about (issuing a read and write at the same time). You have indicated it doesn't, so I'll leave it at that. I'm not a technical person, just trying to understand what could make it possible.
 
Yeah, it's so weird the way I've thrown my lot in with the company that actually seems to know what they're doing right now and not the backpedaling, tone-deaf fuck-ups. It's almost like I think it's good how forthright Sony have been about their system, and appreciate the enormous value of PlayStation Plus and their commitment to games as a medium for artistic expression and not merely as some kind of dodge or hustle, or stepping stone to greener pastures.

That seems to be just your opinion about what each company's goals are, and with it comes a set of biases.

IMO Microsoft knew what they wanted to do, they just didn't articulate it in a cohesive manner. They let the messaging get away from them. Therefore they made concessions to appease their consumer base, as any company should do in such a situation. Need I remind you of the Sony conference where they bowed their heads in shame for...what was it...7 seconds?

No company in this industry is without failures. But broad-stroking a company's outlook/goal with gaming as a "hustle" through implication doesn't really stand up to critique.

Oh well, that's more than far enough off-topic IMO. More on-topic, has there been any confirmation on where the eSRAM is located? I've been wondering for quite some time whether it passes through a memory controller or not.
 
Last time I take a break for a few days lol...

Unfortunately, it was not much help asking Richard all those questions you guys had, since his source for that article was not a graphics engineer (and he reminds me he isn't one himself either).

It seems all the info he got is already in the article. The original source is a small briefing update sent to devs by MS which was forwarded onto him (and confirmed by another source of his).

The main point he gets from the original document is that "ESRAM is capable of more than was previously thought by virtue of actually being able to test production silicon rather than make theoretical calculations".

Though it definitely seems that the GPU clock is still 800 MHz: in the document's ESRAM bandwidth calculation of 128 bytes x 800 MHz = 102.4 GB/s, MS explicitly states that 'this is still the case' ... so Brad, you can keep your shirt on.

As an aside, does the ESRAM clock necessarily have to match the GPU clock?
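
For what it's worth, the arithmetic scales linearly with clock, so here's a quick sketch using the 850/900 MHz figures floated earlier in the thread purely as hypotheticals (not confirmed specs):

```cpp
#include <cstdio>

// Quick sketch: ESRAM peak bandwidth as a function of clock, assuming the
// 128 bytes/cycle interface from the dev documentation. The 850 and 900 MHz
// entries are the rumored upclocks mentioned earlier, included only as what-ifs.
int main()
{
    const double bytes_per_cycle = 128.0;
    const double clocks_mhz[]    = {800.0, 850.0, 900.0};

    for (double mhz : clocks_mhz)
        std::printf("%4.0f MHz -> %6.1f GB/s\n", mhz, bytes_per_cycle * mhz / 1000.0);
    // Prints 102.4, 108.8 and 115.2 GB/s respectively.
    return 0;
}
```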
 
http://m.oxm.co.uk/57365/microsoft-...ight-not-tell-the-whole-story-on-performance/

Interestingly, Penello went on to hint that Xbox One's specifications might yet be subject to alteration in the months between now and the console's November release.

"I would like to pose this question to the audience," he*remarked*in a subsequent post. "There are several months until the consoles launch, and [as] any student of the industry will remember, specs change.

"Given the rumored specs for both systems, can anyone conceive of a circumstance or decision one platform holder could make, where despite the theoretical performance benchmarks of the components, the box that appears "weaker" could actually be more powerful.

Thoughts?
 
Interesting post. Thanks.
 
If this is really based on a tech-note to developers from MS then it's probably not hyperbole.
Though it sounds an awful lot like the reporter does not understand what he was told.
I guess it's possible that the hardware designers just didn't communicate the actual design to the software side, and when running tests they found anomalous performance and asked for clarification; it does happen.
I worked with a hardware group once that delivered an 8086-based board without bothering to wire the bottom address line, and without bothering to tell anyone on the software side about it.
 