Xbox One (Durango) Technical hardware investigation

There were almost zero programs that were multi-threaded when dual cores appeared. My Athlon X2 ran worse than the single-core Athlon 64 it replaced, because single-threaded programs would bounce back and forth between cores unless process affinity was set.
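For anyone who never had to deal with it, here's a minimal sketch of what "setting process affinity" meant on Windows back then (core index 0 and the error handling are just for illustration):

[code]
/* Minimal sketch: pin the current process to a single core so a
   single-threaded game stops bouncing between cores. Core 0 is arbitrary. */
#include <windows.h>
#include <stdio.h>

int main(void)
{
    DWORD_PTR mask = 1; /* bit 0 set -> only core 0 may run this process */

    if (!SetProcessAffinityMask(GetCurrentProcess(), mask)) {
        printf("SetProcessAffinityMask failed: %lu\n", GetLastError());
        return 1;
    }
    printf("Pinned to core 0.\n");
    return 0;
}
[/code]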

I spoke of dual-threaded code, didn't I? A dual-core CPU is superior to a single-core CPU in a dual-threaded scenario. So what's your problem with that? And again: that's not the point. Use a different example if you don't like mine.

Converting programs from general-purpose CPUs to a GPU/APU architecture isn't going to be straightforward, just as multi-threading programs wasn't straightforward.

Orbis is rumored to be very easy to program for, so I don't see a problem. It's not going to be a second "Cell shock".
 
As long as its games are completely compatible with other W8 machines and Durango is essentially an MS Steambox, I would be fine with that :p
I can definitely see this, at which point all your Durango games will be forwards compatible, and that'll be a really important message for MS. "Buy Durango now, and upgrade to get even better experiences later on." If Sony is a conventional console, it'll have better utilisation but probably no forwards compatibility, so the software you buy will be dropped in later years (save maybe Mobile download games)

If true, then expect this to be universal across both PS4 and Durango.

No wonder the EA CEO was expressing his excitement over the coming next-gen consoles. :LOL:
If true, it must be something MS and Sony have come to decide together. There's no way one would go out on a limb and destroy their second-hand market while the other doesn't. I expect a pact made within the industry, and there won't be AAA disc games made for any platform that doesn't follow the publishers' requirements.

Having just read the article, I'm slightly sceptical. It requires an always-on internet connection, which is a significant requirement IMO. Yes, it can be expected in this day and age, but it also creates barriers to convenience.
 
Could we see games that work both on Win8PC and XBox?
 
One thing that's surprised me is the lack of functionality. No decompression of JPEG into DXTC means you're stuck with uncompressed bitmaps in ESRAM to draw.

The result is either 12 or 16 bits per pixel, which is worse than the 8 bits per pixel of BC6 and BC7, but not catastrophically so.
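Back-of-envelope on those figures (my own assumption that the 12/16 bpp comes from 4:2:0 vs 4:2:2 chroma subsampling):

[code]
/* Rough footprint of a 1920x1080 texture, assuming the 12/16 bpp figures come
   from YCbCr 4:2:0 (8 + 2 + 2 bits) and 4:2:2 (8 + 4 + 4 bits) respectively,
   versus BC6/BC7 at 16 bytes per 4x4 block = 8 bits per pixel. */
#include <stdio.h>

int main(void)
{
    const long pixels = 1920L * 1080L;
    printf("YCbCr 4:2:0 (12 bpp): %ld KB\n", pixels * 12 / 8 / 1024); /* ~3037 KB */
    printf("YCbCr 4:2:2 (16 bpp): %ld KB\n", pixels * 16 / 8 / 1024); /* ~4050 KB */
    printf("BC6/BC7      (8 bpp): %ld KB\n", pixels *  8 / 8 / 1024); /* ~2025 KB */
    return 0;
}
[/code]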

By the way, are the jpeg decoding numbers in the VGLeaks article real, or just examples? Because decoding just 4M pixels (two 1920x1080 jpegs) per frame is exceptionally slow, IMHO.
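Just to put a number on "exceptionally slow" (plain arithmetic on the figures above, nothing from the article):

[code]
/* Two 1080p JPEGs per frame, expressed as a decode rate. */
#include <stdio.h>

int main(void)
{
    const double pixels_per_frame = 2.0 * 1920 * 1080; /* ~4.1 Mpixels */
    printf("at 30 fps: %.0f Mpixels/s\n", pixels_per_frame * 30 / 1e6); /* ~124 */
    printf("at 60 fps: %.0f Mpixels/s\n", pixels_per_frame * 60 / 1e6); /* ~249 */
    return 0;
}
[/code]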

Cheers
 
I can definitely see this, at which point all your Durango games will be forwards compatible, and that'll be a really important message for MS. "Buy Durango now, and upgrade to get even better experiences later on." If Sony is a conventional console, it'll have better utilisation but probably no forwards compatibility, so the software you buy will be dropped in later years (save maybe Mobile download games)

But will gamers accept it? Compatibility means visual improvements only, gameplay has to be exactly the same on all XBox SKUs. Do console gamers really want to buy two, three or four XBoxes only to play with better filter algorithms or better framerates? This would be a very, very risky move.
 
But will gamers accept it? Compatibility means visual improvements only, gameplay has to be exactly the same on all XBox SKUs. Do console gamers really want to buy two, three or four XBoxes only to play with better filter algorithms or better framerates? This would be a very, very risky move.

I think this will be really destructive for the industry unless Microsoft changes its patching policy. Aargh: "to make use of this game on 720 v3.14, buy this $15 DLC patch"...
 
But will gamers accept it? Compatibility means visual improvements only, gameplay has to be exactly the same on all XBox SKUs. Do console gamers really want to buy two, three or four XBoxes only to play with better filter algorithms or better framerates? This would be a very, very risky move.

Let's say Microsoft maintains forward compatibility for 8 years, with an upgrade every 2 years. Users could buy 1-4 consoles over an 8-year timeframe. I would expect most users would be fine upgrading only once, after 4 years. Budget users could buy once and still be able to play every game released for the platform over the full 8 years. Enthusiasts could spend roughly $300 every two years and stay up to date, which I guess would attract many PC gamers. Developers would only have to support 4 different configurations, which doesn't seem too bad compared to current PCs.
 
From the vgleaks article, isn't the most important thing the "free" tiling aspect of the DMEs? No one seems to be talking about that, but if you have highly tiled scenes, possibly at different resolutions, wouldn't this go a long way towards improving the effectiveness of the GPU?
 
But will gamers accept it? Compatibility means visual improvements only, gameplay has to be exactly the same on all XBox SKUs. Do console gamers really want to buy two, three or four XBoxes only to play with better filter algorithms or better framerates? This would be a very, very risky move.
There's a whole other thread for discussing the idea of an upgradeable/forward-compatible architecture and whether it makes business sense. I just think it's worth factoring into our understanding of Durango and what we're currently hearing: this is a possible strategy for MS that fits in with other developments of theirs, and it could affect what we see Durango achieving.
 
Online Activation Codes or No Second Hand Games thread split into general console forum -- http://forum.beyond3d.com/showthread.php?t=63015

I'm sure there's already an entire thread dedicated to that topic, so this spawned discussion may get moved into it. One thing is for certain: it does not belong in here.
 
Because decoding just 4M pixels (two 1920x1080 jpegs) per frame is exceptionally slow, IMHO.
Well, it's still much faster than loading the raw data straight from optical disc, or even harddrive, so I guess some mitigating factor can be found in that, if there's any legitimate reason to store game assets in the now terribly ancient jpeg format... I don't know what games actually do that, I guess they must exist.

The move engines seem very underwhelming to me. They don't operate at full memory bus speed, and they all share bandwidth with each other and other on-chip devices. The decode/encode hardware all use terribly outdated algorithms. This has the smell of "better than nothing", with these specific algos presumably chosen because they could be implemented using a bare minimum of transistors.

It's not as if consoles haven't had DMA hardware in the past; they have, SNES had quite some fancy DMA stuff, and even NES probably had some implementation as well. The Commodore Amiga had 11 DMA channels dedicated to various things, including four of them for a somewhat programmable blitter with boolean logic operations and barrel shifter capabilities.

In the end though, the more you rely on shifting data around the slower your software will run. These "move engines" (DMA channels, really) aren't accelerators, and they don't accelerate anything. Not even image decoding, as you note yourself. ;)
 
How does the Kinect camera feed data to the GPU? Via the DMEs straight into the ESRAM? Or to the DDR3, then DME to ESRAM upon request? Or both, with programmer control?
 
Durango’s Move Engines

Moore’s Law imposes a design challenge: How to make effective use of ever-increasing numbers of transistors without breaking the bank on power consumption? Simply packing in more instances of the same components is not always the answer. Often, a more productive approach is to move easily encapsulated, math-intensive operations into hardware.

The Durango GPU includes a number of fixed-function accelerators. Move engines are one of them.

Durango hardware has four move engines for fast direct memory access (DMA)

These accelerators are truly fixed-function, in the sense that their algorithms are embedded in hardware. They can usually be considered black boxes with no intermediate results that are visible to software. When used for their designed purpose, however, they can offload work from the rest of the system and obtain useful results at minimal cost.

More after the link
http://www.vgleaks.com/world-exclusive-durangos-move-engines/
 

A 64-way L1 seems crazy to me. Perhaps they feel each ALU needs a way in the cache.

I found this interesting too:

The other major difference between GCN and Durango is the amount of L2 cache per SC/CU. The Radeon 7970 has 32 Compute Units and 768K of L2 cache total split into six 128K blocks. Durango has 12 Compute Units and 512K of L2 cache, broken into four 128K blocks. Proportionally, there’s a great deal more L2 cache serving each CU with Durango.

Coupled with the SRAM this is the "secret" sauce. Not much, but a better memory subsystem will certainly help with utilization.
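The per-CU arithmetic from those numbers:

[code]
/* L2 per compute unit, straight from the figures quoted above. */
#include <stdio.h>

int main(void)
{
    printf("Radeon 7970: %4d KB / %2d CUs = %.1f KB per CU\n", 768, 32, 768.0 / 32); /* 24.0 */
    printf("Durango    : %4d KB / %2d CUs = %.1f KB per CU\n", 512, 12, 512.0 / 12); /* ~42.7 */
    return 0;
}
[/code]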
 
Well, it's still much faster than loading the raw data straight from optical disc, or even harddrive, so I guess some mitigating factor can be found in that, if there's any legitimate reason to store game assets in the now terribly ancient jpeg format... I don't know what games actually do that, I guess they must exist.

You get much higher compression ratios with JPEG than with the DXTC variants. It could be quite valuable for decoding small chunks of streamed JPEG into a virtual texture atlas (à la Megatexture), complete with swizzle support.
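Ballpark for a single 2048x2048 colour texture (the JPEG bit rates are typical values I'm assuming for photographic content, not anything from the leak; DXT1 and BC7 are fixed at 4 and 8 bpp):

[code]
/* Storage comparison at assumed bit rates; only DXT1/BC7 are fixed by spec. */
#include <stdio.h>

int main(void)
{
    const double pixels = 2048.0 * 2048.0;
    const double bpp[]  = { 4.0, 8.0, 2.0, 1.0 };                 /* bits per pixel */
    const char  *name[] = { "DXT1", "BC7", "JPEG ~q90", "JPEG ~q75" };

    for (int i = 0; i < 4; ++i)
        printf("%-10s ~%5.0f KB\n", name[i], pixels * bpp[i] / 8.0 / 1024.0);
    return 0;
}
[/code]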

The move engines seem very underwhelming to me. They don't operate at full memory bus speed, and they all share bandwidth with each other and other on-chip devices.

If they operated at full speed, the CPU and GPU alike would be starved for data.

The decode/encode hardware all use terribly outdated algorithms. This has the smell of "better than nothing", with these specific algos presumably chosen because they could be implemented using a bare minimum of transistors.

LZ77 is a good choice for lossless compression/decompression; it's fast and used everywhere. Why they chose JPEG over JPEG XR, I can't say. JPEG XR supports floating-point values directly and has slightly better compression ratios. It must come down to hardware complexity.
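Not the DME hardware path, obviously, but to show how commodity the LZ77 family is, here's the usual zlib round trip (DEFLATE = LZ77 + Huffman); whether the hardware block speaks exactly this bitstream is anyone's guess:

[code]
/* Compress and decompress a small repetitive buffer with zlib (link with -lz). */
#include <stdio.h>
#include <string.h>
#include <zlib.h>

int main(void)
{
    const char *text = "aaaaabbbbbaaaaabbbbbaaaaabbbbb"; /* repetitive -> compresses well */
    unsigned char packed[128], unpacked[128];
    uLongf packed_len = sizeof(packed), unpacked_len = sizeof(unpacked);

    compress(packed, &packed_len, (const Bytef *)text, strlen(text) + 1);
    uncompress(unpacked, &unpacked_len, packed, packed_len);

    printf("%lu bytes -> %lu compressed -> \"%s\"\n",
           (unsigned long)(strlen(text) + 1), (unsigned long)packed_len, unpacked);
    return 0;
}
[/code]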

In the end though, the more you rely on shifting data around the slower your software will run. These "move engines" (DMA channels, really) aren't accelerators, and they don't accelerate anything. Not even image decoding, as you note yourself. ;)

Memory access can be a lot more efficient using a dedicated chunk of silicon like this. Loading/storing tiles in a texture involves swizzling addresses on the fly, producing a lot of stride-1 accesses followed by a big stride. That is a really poor fit for a hardware prefetcher on a CPU, and if you use a GPU, you'll have your massive shader array sitting idle while you just load and store values.
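To make the stride pattern concrete; Durango's actual tile layout isn't spelled out in the article, so this uses a plain Morton/Z-order swizzle as a stand-in (a common arrangement, not necessarily theirs):

[code]
/* Map a linear (x, y) coordinate to a Morton-swizzled offset and walk one row:
   the tiled addresses come out contiguous only in short runs, then jump, which
   is exactly the access pattern a CPU hardware prefetcher copes badly with. */
#include <stdint.h>
#include <stdio.h>

/* Spread the low 16 bits of v into the even bit positions. */
static uint32_t part1by1(uint32_t v)
{
    v &= 0x0000ffff;
    v = (v | (v << 8)) & 0x00ff00ff;
    v = (v | (v << 4)) & 0x0f0f0f0f;
    v = (v | (v << 2)) & 0x33333333;
    v = (v | (v << 1)) & 0x55555555;
    return v;
}

static uint32_t morton2d(uint32_t x, uint32_t y)
{
    return part1by1(x) | (part1by1(y) << 1);
}

int main(void)
{
    for (uint32_t x = 0; x < 16; ++x)
        printf("linear (x=%2u, y=5) -> swizzled offset %u\n", x, morton2d(x, 5));
    return 0;
}
[/code]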

Cheers
 
Of most interest in the new VGLeaks, to me, was using the DMEs to move pieces of tiled textures, which I think answers my earlier question about whether there are any potential benefits to Partially Resident Textures. The other part is the mention of time-slicing the GPU, which is sounding more and more like multitasking to me.

So, what's the impact of the changes to the associativity of the caches?
 
How does the Kinect camera feed data to the GPU? Via the DMEs straight into the ESRAM? Or to the DDR3, then DME to ESRAM upon request? Or both, with programmer control?
From the earliest VGLeak doc (2nd post in this thread), Kinect goes through the Southbridge. I guess it deposits a stream in RAM that is fetched by the CPU/GPU as needed (I expect this happens on the OS timeslice, with the API exposing the various Kinect states to devs).
 
A 64-way L1 seems crazy to me. Perhaps they feel each ALU needs a way in the cache.
Extremetech is wrong though; nothing changed there. Each GCN-based GPU has a 64-way associative L1. It consists of just 4 sets (that's probably what they mixed up). The older VLIW architectures even had a fully associative L1 (which meant 128-way associativity and just a single set comprising the full 8 kB). Latency is high anyway; maximizing the hit rate is what's important here.
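The sizes line up if you assume the usual 64-byte cache lines (my assumption; the line size isn't stated in the article):

[code]
/* cache size = ways x sets x line size */
#include <stdio.h>

int main(void)
{
    const int line = 64; /* assumed bytes per cache line */
    printf("GCN L1 : 64 ways x 4 sets x %d B = %d KB\n", line, 64 * 4 * line / 1024);  /* 16 KB */
    printf("VLIW L1: 128 ways x 1 set x %d B = %d KB\n", line, 128 * 1 * line / 1024); /* 8 KB */
    return 0;
}
[/code]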
 