Xbox One Architecture Panel *Main Console HW Points* Transcript

Warchild

Newcomer
Main points I have compiled from the Architecture Panel, in case anyone missed them:

We have embedded ESRAM on die on our chip. We really like that because it gives us high bandwidth to the GPU to keep the GPU fed with data. Sometimes when you go off chip with memory, it is harder to keep that GPU fed with data. We added lots of memory, both system memory and flash cache, in order to have that simultaneous and instantaneous action. And then, for the ready part of it, we really architected the box at the beginning to have multiple power states, so that you could use just enough power for the experiences that you're in and no more. And if you were transitioning to another state, you could do that easily and quickly, and the best example of that of course is wake on voice. We have just a little bit of power with Kinect and a little bit of power on the console to process the voice that's coming in. When you say "Xbox on" we can immediately power on quickly and get you to your experiences.
If you look at everything we're trying to do with saving power consumption, running multiple OSes, delivering the performance we wanted to deliver, it really is not possible to do that with off-the-shelf silicon. We really developed a lot of those things from the ground up. It turned out we had to develop five pieces of silicon between the console and the new Kinect, and not just individual pieces of silicon: we needed to make them work in a coherent fashion, even across USB in certain cases. So verifying all these things together involved getting the latest simulation/emulation tools. We developed this piece of technology that lets us run as fast as possible before we get any chips back. Equipment that actually takes 50 kilowatts and is water-cooled: we were able to run 10 trillion cycles in simulation before we even got the silicon back from the lab. So that gives you an idea of what we had to go through here.
Now if apps and services and other things from developers are coming and going on the box, that means you're in a dynamic environment, and you don't have guarantees as to exactly how much resource the game developer is getting. So we sat back and noticed that this was a problem, and we decided to take a risk: we went and chose technology that originally came from the services world, so think virtual machines. We started with Hyper-V from the server side and started stripping down all the general-purpose stuff. In normal virtual machine technology you don't know exactly what apps or OS you're running, but in this case we know exactly which: we know there's two, one about apps and one about games, and we know exactly how they're going to be configured.
The last piece I'll mention is that, with Xbox Live, when Mark was talking about the number of machines that are being added, this is a big deal. Next gen isn't just about having lots of transistors local, it's also about having transistors in the cloud. The best way I could explain it is that, to me, next-gen is about change. I've got games that stay the same, I've got apps that are changing, but now you start throwing in servers that are just one hop away, and they could start doing things like: you look at a game and there's latency-sensitive loads and latency-insensitive loads, so let's start moving those insensitive loads off to the cloud, freeing up local resources, and effectively over time your box gets more and more powerful. This is completely unlike previous generations: you've got a fixed number of transistors in your house and a variable number of transistors in the cloud, and as we get smarter about which loads we can move into the cloud, that frees up local resources to do things that are about the here and now, and this is really exciting!
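The latency-sensitive/insensitive split he describes can be sketched as a simple partitioning step. This is purely illustrative Python, not anything from the panel; the task names and the `schedule` helper are invented for the example.

```python
# Hypothetical sketch of the load-splitting idea: latency-sensitive work
# stays on the local box, latency-insensitive work becomes a candidate
# for offload to a remote server. All names here are illustrative.

from dataclasses import dataclass

@dataclass
class Task:
    name: str
    latency_sensitive: bool

def schedule(tasks):
    """Partition a frame's work between local silicon and the cloud."""
    local, cloud = [], []
    for t in tasks:
        (local if t.latency_sensitive else cloud).append(t)
    return local, cloud

frame_work = [
    Task("input + rendering", latency_sensitive=True),
    Task("world simulation for distant regions", latency_sensitive=False),
    Task("NPC long-term planning", latency_sensitive=False),
]

local, cloud = schedule(frame_work)
print([t.name for t in local])   # stays on the console every frame
print([t.name for t in cloud])   # tolerant of a network round trip
```

The interesting engineering is of course in deciding which loads really tolerate a network hop, not in the partitioning itself.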
When I think about the graphics stack in particular, memory bandwidth is really important, and we've got these ESRAM caches on the chip, really, really fast caches on the chip. This next generation of GPU is kind of a break from the last generation of GPUs, which were very microcode-optimization sensitive, you know, the exact order of the code and the shaders made a huge difference in how fast they ran. This time, the chip architecture is based on this supercomputer-like technology and is much more about data flow; having the right data in the right caches in the right places at the right time is what will make all the difference in the world for taking advantage of these chips. It's relatively easy to get your code ported onto it, and relatively hard to get it optimized and really looking good. We worked closely with Nick's team to make sure we had really good caches on the die ready to feed the GPU; that was a really important partnership.
It's sort of been mentioned before, but I wanted to point out that a lot of the technologies we had to go and investigate in the hardware really belong in a data center. So with 64-bit processors we need to support hardware virtualization to be able to have multiple operating systems, and we had to invest a lot in coherency throughout the chip. There's been I/O coherency for a while, but we really wanted to get the software out of the mode of managing caches, so we put in hardware coherency, for the first time at mass scale in the living room, on the GPU.
We talked about parallelism at an operating-system level, but even to get all of this out of the box in the first place, we had to really think about how efficient we are in all of the parts within the silicon. On the Xbox 360, you look at the GPU and it's a vector processor, which is great but a little brute force; it doesn't use all of the hardware every single possible cycle. So we went through and re-analyzed that, and we have the latest out-of-order CPU tech. These CPU cores are capable of doing six operations per cycle, so across the eight cores that's 48 operations. The GPU as well is multitasking, so you can run several rendering and compute threads, so all of the cloud effects, AI and collision detection on the GPU in parallel while you're doing the rendering, and we switched to a scalar engine, which actually means that a lot of the GPU is being utilized and it can do 768 operations per cycle.
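The peak-throughput arithmetic above is easy to check. The 48 figure follows directly from the quoted numbers; the breakdown of the 768-op GPU figure into compute units and lanes below is an assumption for illustration, not something stated in the panel.

```python
# Checking the panel's peak ops-per-cycle arithmetic.

cpu_cores = 8
ops_per_core_per_cycle = 6                    # per the panel
cpu_ops_per_cycle = cpu_cores * ops_per_core_per_cycle
print(cpu_ops_per_cycle)                      # 48, matching the quoted figure

# A 768-op/cycle GPU figure is consistent with, e.g., 12 compute units
# each issuing 64 scalar ALU ops per cycle (assumed breakdown):
compute_units = 12
lanes_per_cu = 64
gpu_ops_per_cycle = compute_units * lanes_per_cu
print(gpu_ops_per_cycle)                      # 768
```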
Again on the RAM, we really wanted to get 8 GB and make that power friendly as well, which is a challenge: to be power friendly for acoustics and also get high capacity and high bandwidth. For our memory architecture we're actually achieving all of that, and we're getting over 200 GB/s across the memory subsystem.
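As a sanity check on figures like that, the peak bandwidth of a memory interface is just bus width times transfer rate, and independent buses (main RAM and an on-die scratchpad) can transfer simultaneously, so a subsystem total is typically a sum of parallel paths. The example numbers below are illustrative assumptions, not confirmed specs.

```python
# Peak bandwidth = (bus width in bytes) x (transfers per second).

def peak_gb_per_s(bus_bits, transfers_per_s):
    return (bus_bits // 8) * transfers_per_s / 1e9

# Example: a 256-bit DDR3-2133 interface (illustrative, not a quoted spec)
main_ram = peak_gb_per_s(256, 2.133e9)
print(main_ram)  # about 68 GB/s; an on-die SRAM path would add to this
```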
Knowing that this is a box that's able to change its power consumption based on the loads that are running, I feel really good about that. If it needs it, the power is there; if it doesn't, it'll go into a lower power mode and use only what it needs to.
Within the silicon are actually power switches which shut off cores that are not being used. We can apply dynamic voltage and frequency scaling, so if you don't need to burn the power, you don't have to.
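Why DVFS pays off so well: dynamic power scales roughly as C·V²·f, so dropping voltage and frequency together gives a quadratic win in voltage on top of the linear win in frequency. The numbers below are purely illustrative, not measured figures for this console.

```python
# Rough dynamic-power model: P ≈ C * V^2 * f (illustrative values only).

def dynamic_power(capacitance_f, voltage_v, freq_hz):
    return capacitance_f * voltage_v**2 * freq_hz

full_load = dynamic_power(1e-9, 1.2, 1.75e9)   # assumed high-power state
low_power = dynamic_power(1e-9, 0.9, 0.8e9)    # assumed low-power state

print(round(low_power / full_load, 2))  # ~0.26: a quarter of full-load power
```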
Last generation that box was fixed, and it was all about optimize, optimize, optimize. The games that we see now on the Xbox 360 look tremendously better than the games at launch because we deeply understand that chip. That's going to happen in this generation, but add to it a growing number of transistors in the cloud that are really not very far away, that you can start to move those loads onto. You can start to have bigger worlds, lots of players together, and you can also take things that are done locally and push them out onto the cloud. This is a new way of thinking about gaming consoles and what can be done with them.
There's also some technology we put in to enable really large dynamic worlds as well. We have this thing called partially resident textures, which the GPU supports, which actually means that you can save essentially gigabytes of memory by not having to have all of the data loaded all of the time. The GPU itself can figure out if something is in memory or not; it doesn't need everything to be physically in memory the whole time. We also put in compression as well: we have LZ77 move engines that can just work behind the scenes and compress/decompress, which is going to be really super important for working with data from the cloud.
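The partially-resident-texture idea can be sketched in miniature: the texture's address space is divided into tiles, only some tiles are backed by physical memory, and a sample first checks residency, with a miss falling back (in a real renderer, to a lower mip) while a tile load is queued. This is an illustrative Python model, not a real GPU API; the tile size and class names are assumptions.

```python
# Toy model of a partially resident (sparse) texture.

TILE = 128  # tile edge in texels (assumed, for illustration)

class SparseTexture:
    def __init__(self):
        self.resident = {}  # (tile_x, tile_y) -> 2D tile data

    def commit(self, tile_x, tile_y, data):
        """Back one tile with physical memory."""
        self.resident[(tile_x, tile_y)] = data

    def sample(self, x, y):
        """Return a texel if its tile is resident, else None (miss)."""
        tile = self.resident.get((x // TILE, y // TILE))
        if tile is None:
            return None  # miss: fall back to a lower mip / queue a load
        return tile[y % TILE][x % TILE]

tex = SparseTexture()
tex.commit(0, 0, [[7] * TILE for _ in range(TILE)])
print(tex.sample(10, 10))    # resident tile -> 7
print(tex.sample(500, 500))  # non-resident tile -> None
```

Only the working set of tiles occupies memory, which is where the "gigabytes saved" in the quote comes from.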
 


Interesting, are those anything novel or just standard stuff? I assume the latter.

Interestingly, they seem to almost stress the ESRAM latency, but it's so vague and dumbed down it's hard to tell.
 
I'm not sure, but the multiple power states intrigue me:
Knowing that this is a box that's able to change its power consumption based on the loads that are running, I feel really good about that. If it needs it, the power is there; if it doesn't, it'll go into a lower power mode and use only what it needs to.
So I'm guessing 100 W+ when playing demanding games? Who mentioned the console was only 100 W?
 
"We added lots of memory, both system memory and flash cache, in order to have that simultaneous and instantaneous action."

First time hearing about the flash cache. How much flash cache are they talking about, or is this just the ESRAM?
 
"We added lots of memory, both system memory and flash cache, in order to have that simultaneous and instantaneous action."

First time hearing about the flash cache. How much flash cache are they talking about, or is this just the ESRAM?

Yeah I was wondering about that reference too.
 
http://www.denali.com/wordpress/index.php/dmr/2010/04/26/what-is-a-flash-cache

A Flash cache acts like SRAM memory caches that are designed to speed up DRAM access times; Flash caches speed access to HDDs in an analogous manner. Data is drawn from HDDs as needed and the retrieved data is cached in NAND Flash. The next time this data is needed, it’s drawn directly from the cache instead of the slower HDD. Flash caches do not require as much NAND Flash memory as SSDs, and therefore cost less, but they can deliver significant performance improvements when paired with HDDs—in fact the effective performance of a Flash cache paired with an HDD can actually exceed that of an SSD.

(Note: It’s also possible to use DRAM to cache HDD data, but DRAM is more expensive than NAND Flash for equivalent capacity and DRAM provides only volatile storage unless you add a backup battery. For these reasons, NAND Flash is the better choice for an HDD memory cache.) Using a faster memory technology as a cache for a slower-yet-cheaper memory technology is a relatively common technique used by computer designers. Designers have always faced memory access-time problems and caching is a very, very common solution to this problem. If the typical working set is a small fraction of the total HDD capacity, then a cache that holds that working set will make the HDD appear to be as fast (or almost as fast) as NAND Flash memory, resulting in a dramatic improvement in application performance.

Adding a cache can deliver significant performance gains for I/O-intensive workloads but it’s critical to make the cache invisible to the application to avoid rewriting the application code. You make a Flash cache invisible by intimately integrating it into the operating system and the file system. This is a critical step because it sidesteps the need to rewrite the application so that it need not decide what goes where. Application code must explicitly manage code and data placement in storage when a system employs a mix of HDDs and faster, Flash-based SSDs but not if the Flash memory is configured as a cache. If you can write or rewrite an application so that it explicitly controls where data is stored, then a mix of SSDs and HDDs can be used effectively. NAND Flash cache used to accelerate HDD performance solves a more common problem—a problem ingrained in all existing application programs that are not written for an explicit SSD/HDD storage hierarchy.
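The caching pattern that article describes can be modeled in a few lines: a small, fast store in front of a large, slow one, with least-recently-used eviction. Here plain dicts stand in for the NAND and the HDD; everything is illustrative.

```python
# Toy flash cache: fast NAND in front of a slow HDD, LRU eviction.

from collections import OrderedDict

class FlashCache:
    def __init__(self, backing, capacity):
        self.backing = backing            # the slow HDD (here: a dict)
        self.capacity = capacity          # blocks of NAND available
        self.cache = OrderedDict()        # block -> data, in LRU order

    def read(self, block):
        if block in self.cache:
            self.cache.move_to_end(block)     # hit: fast path
            return self.cache[block]
        data = self.backing[block]            # miss: slow HDD read
        self.cache[block] = data
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)    # evict the LRU block
        return data

hdd = {n: f"data{n}" for n in range(10)}
c = FlashCache(hdd, capacity=2)
c.read(0); c.read(1); c.read(0)    # block 0 is now most recently used
c.read(2)                          # evicts block 1, not block 0
print(0 in c.cache, 1 in c.cache)  # True False
```

As the article notes, the whole point is that this happens below the application: the game just issues reads and the cache decides what stays in flash.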

Is this what he's talking about?
 
Actually this is probably why the 500 GB hard disk isn't user-replaceable: it's not a normal hard disk.

The Xbox One is probably using a customized hybrid hard drive: flash memory used as a cache to speed up load times, combined with a rotational hard disk for capacity.

Oooh, that would be really nice, depending on size. Sounds a little too good to be true, though. If so, why haven't they talked this up?
 

In the sense of HDD caching? I can't think of any circumstance where that would be useful for a game in a next-gen console.

HDD caching is great for desktops/servers, with multiple applications, or applications that start/stop all the time. For a game, I can't see how that would help.

(you could dump the hibernate data into flash RAM during shutdown to provide an accelerated startup - but that wouldn't really be a 'cache').

It may be he's referring to the ESRAM essentially being a "system-managed cache". That would work basically the same as that article describes, but for RAM->ESRAM rather than HDD->flash.
 
Actually this is probably why the 500 GB hard disk isn't user-replaceable: it's not a normal hard disk.

The Xbox One is probably using a hybrid hard drive: flash memory used as a cache to speed up load times, combined with a rotational hard disk for capacity.

Probably something similar to this: http://www.newegg.com/Product/Product.aspx?Item=N82E16822178339

I think that if it was a hybrid drive, they would have mentioned it as a selling point.
 
Could be, and it could be in RAM.
Yeah, hopefully it is RAM, but I wouldn't hold my breath. It's probably a standard 500 GB hybrid drive with 100 GB of SSD flash cache. I know the OCZ Vertex 4 has 512 MB of DDR3 RAM used for cache, but I doubt the Xbox One is using DDR3 as a cache for the HDD. Probably just SSD.
 
What is the win for putting a flash cache in RAM? I get why you'd put a flash cache on a HDD. The ESRAM makes sense, because it has its own faster bus. Is it possible he called the ESRAM flash for some reason?
I'll watch it again, but the panel interview was all over the place; they would jump from one point to the next and then go back to talking about optimization.
 
I didn't notice before that the ops per cycle were given for the CPU. So much for that secret sauce for the CPU. :LOL:

Actually this is probably why the 500 GB hard disk isn't user-replaceable: it's not a normal hard disk.

The Xbox One is probably using a hybrid hard drive: flash memory used as a cache to speed up load times, combined with a rotational hard disk for capacity.

Probably something similar to this: http://www.newegg.com/Product/Product.aspx?Item=N82E16822178339

That's what's going to be in my PS4, but it's the 1 TB version... ;)
 
I didn't notice before that the ops per cycle were given for the CPU. So much for that secret sauce for the CPU. :LOL:



That's what's going to be in my PS4, but it's the 1 TB version... ;)

Looking at the 1 TB version, it only appears to have an 8 GB cache... The 750 GB version comes with a 64 GB cache. Not sure why that is.
 
Next gen isn't just about having lots of transistors local, it's also about having transistors in the cloud.
You look at a game and there's latency-sensitive loads and latency-insensitive loads, so let's start moving those insensitive loads off to the cloud, freeing up local resources, and effectively over time your box gets more and more powerful.
This is completely unlike previous generations: you've got a fixed number of transistors in your house and a variable number of transistors in the cloud, and as we get smarter about which loads we can move into the cloud, that frees up local resources to do things that are about the here and now, and this is really exciting!
Sounds interesting how transistors are in the cloud. Question is, how many are we able to have?
 
Sounds interesting how transistors are in the cloud. Question is, how many are we able to have?

"Transistors in the cloud" is a really awkward way of describing the vCPU and vRAM allocation you contract for with MS. Given the silence on what the actual resources available in the cloud are, and the Titanfall devs' post about negotiating pricing with MS, I'm going to say "how much you have = what you pay for".

Warchild said:
Yeah, hopefully it is RAM, but I wouldn't hold my breath. It's probably a standard 500 GB hybrid drive with 100 GB of SSD flash cache. I know the OCZ Vertex 4 has 512 MB of DDR3 RAM used for cache, but I doubt the Xbox One is using DDR3 as a cache for the HDD. Probably just SSD.

I'm going to nail my colours to the mast and say you're getting 0 GB of SSD storage. That would be a real benefit, and I can't see MS choosing to do this and then not trumpeting it from the high heavens. Even at the volumes MS would be buying, 128 GB of NAND plus controller logic is not cheap and would probably rival the cost of the APU in any BOM analysis. I'm going to presume he meant a scratchpad in RAM, either from the memory reserved by the OS or just in main RAM.
 