Main points I have compiled from the Architecture Panel, in case anyone missed them:
We have embedded ESRAM on die on our chip. We really like that because it gives us high bandwidth to the GPU to keep the GPU fed with data; sometimes when you go off chip for memory, it's harder to keep that GPU fed. We added lots of memory, both system memory and flash cache, in order to have that simultaneous and instantaneous action. And then, for the ready part of it, we really architected the box from the beginning to have multiple power states, so that you use just enough power for the experiences you're in and no more. And if you're transitioning to another state, you can do that easily and quickly, and the best example of that of course is wake on voice. We keep just a little bit of power on Kinect and a little bit of power on the console to process the voice that's coming in, so when you say "Xbox on" we can power on quickly and get you to your experiences.
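The wake-on-voice flow described above can be pictured as a small state machine: a low-power listening state with just enough juice to match a wake phrase, then a fast transition to full power. This is a toy sketch of the idea; the state names, transitions, and wake-phrase handling are my illustrative assumptions, not the console's actual firmware.

```python
class Console:
    """Toy model of a box with multiple power states (illustrative only)."""

    def __init__(self):
        # "listening": just enough power to process incoming audio.
        self.state = "listening"

    def hear(self, phrase: str) -> str:
        # In the low-power state, only the wake phrase triggers a transition.
        if self.state == "listening" and phrase.strip().lower() == "xbox on":
            self.state = "on"  # power up quickly to full experience
        return self.state


console = Console()
console.hear("hello there")      # ignored: not the wake phrase
state = console.hear("Xbox on")  # transitions to full power
```

The point of the sketch is the asymmetry: the listening state does almost nothing per unit time, so it can run on a trickle of power while still being able to trigger a full wake.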
If you look at everything we're trying to do, saving power, running multiple OSes, and delivering the performance we wanted, it really is not possible to do that with off-the-shelf silicon. We really developed a lot of those things from the ground up. It turned out we had to develop five pieces of silicon between the console and the new Kinect, and not just individual pieces of silicon; we needed to make them work in a coherent fashion, even across USB in certain cases. So we had to verify all these things together, which involved getting the latest simulation/emulation tools. We developed a piece of technology that lets us run as fast as possible before we get any chips back: equipment that actually takes 50 kilowatts and is water-cooled. We were able to run 10 trillion cycles in simulation before we even got the silicon back from the lab. So that gives you an idea of what we had to go through here.
Now if apps and services and other things from developers are coming and going on the box, that means you're in a dynamic environment, and you don't have guarantees as to exactly how much resource the game developer is getting. So we sat back, noticed that this was a problem, and decided to take a risk: we chose technology that originally came from the services world, so think virtual machines. We started with Hyper-V from the services side and stripped out all the general-purpose scoop. With normal virtual machine technology you don't know exactly which apps or which OS you're running, but in this case we know exactly which: we know there are two, one for apps and one for games, and we know exactly how they're going to be configured.
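The payoff of knowing there are exactly two partitions is that resources can be carved up statically, so the game side always gets a guaranteed share no matter what the app side is doing. A minimal sketch of that idea, with hypothetical numbers that are not the console's real reservation split:

```python
# Static resource partitioning between two known OS partitions.
# The figures below are illustrative assumptions, not real values.
TOTAL_RAM_GB = 8
TOTAL_CPU_CORES = 8

partitions = {
    "game_os": {"ram_gb": 5, "cpu_cores": 6},  # guaranteed to the game
    "app_os":  {"ram_gb": 3, "cpu_cores": 2},  # everything else lives here
}

# Because both partitions are known up front, the split is fixed and
# must exactly account for the hardware; no dynamic negotiation needed.
assert sum(p["ram_gb"] for p in partitions.values()) == TOTAL_RAM_GB
assert sum(p["cpu_cores"] for p in partitions.values()) == TOTAL_CPU_CORES

game_ram_guarantee = partitions["game_os"]["ram_gb"]
```

Contrast this with a general-purpose hypervisor, which has to assume an unknown, changing set of guests and therefore cannot make this kind of hard guarantee.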
The last piece I'll mention is that, with Xbox Live, when Mark was talking about the number of machines being added, this is a big deal. Next gen isn't just about having lots of transistors locally; it's also about having transistors in the cloud. The best way I can explain it is that, to me, next-gen is about change. I've got games that stay the same, I've got apps that are changing, but now you start throwing in servers that are just one hop away, and that lets you start doing things like this: you look at a game and there are latency-sensitive loads and latency-insensitive loads, so let's start moving those insensitive loads off to the cloud, freeing up local resources, and effectively over time your box gets more and more powerful. This is completely unlike previous generations, where you had a fixed number of transistors in your house. Now you have a fixed number of transistors in your house and a variable number of transistors in the cloud, and as we get smarter about which loads we can move into the cloud, that frees up local resources to do things that are about the here and now. This is really exciting!
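The split described above, keep latency-sensitive work local and ship latency-insensitive work to servers one hop away, can be sketched as a simple dispatcher. The workload names and the boolean tagging scheme here are hypothetical, just to make the partitioning concrete:

```python
def dispatch(workloads):
    """Split a frame's workloads by latency sensitivity.

    Latency-sensitive work must finish this frame and stays local;
    insensitive work can tolerate a network round trip to the cloud.
    """
    local, cloud = [], []
    for name, latency_sensitive in workloads:
        (local if latency_sensitive else cloud).append(name)
    return local, cloud


# Hypothetical per-frame workloads, tagged by latency sensitivity.
frame_work = [
    ("input_handling",   True),   # must respond within the frame
    ("rendering",        True),   # must respond within the frame
    ("ambient_ai",       False),  # can tolerate a round trip
    ("world_simulation", False),  # can tolerate a round trip
]

local, cloud = dispatch(frame_work)
```

Everything that lands in `cloud` is local GPU/CPU time given back to the game, which is the "box gets more powerful over time" argument in miniature.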
When I think about the graphics stack in particular, memory bandwidth is really important, and we've got these ESRAM caches on the chip, really, really fast caches. This next generation of GPU is kind of a break from the last generation of GPUs, which were very sensitive to microcode optimization; the exact ordering of the code in the shaders made a huge difference in how fast they ran. This time the chip architecture is based on supercomputer-like technology and is much more about data flow: having the right data in the right caches in the right places at the right time is what makes all the difference in the world for taking advantage of these chips. It's relatively easy to get your code ported onto it, and relatively hard to get it optimized and really looking good. We worked closely with Nick's team to make sure we had really good caches on the die ready to feed the GPU; that was a really important partnership.
It's sort of been mentioned before, but I wanted to point out that a lot of the technologies we had to go and investigate for the hardware really belong in a data center. With 64-bit processors we need to support hardware virtualization to be able to have multiple operating systems, and we had to invest a lot in coherency throughout the chip. There's been I/O coherency for a while, but we really wanted to get the software out of the mode of managing caches, so we put in hardware coherency on the GPU, for the first time on a mass scale in the living room.
We talked about parallelism at an operating-system level, but even to get all of this out of the box in the first place, we had to really think about how efficient we are in all of the parts within the silicon. On the Xbox 360, you look at the GPU and it's a vector processor, which is great but a little brute force; it doesn't use all of the hardware every single possible cycle. So we went through and re-analyzed that, and we have the latest out-of-order CPU tech. These CPU cores are capable of doing six operations per cycle, so across the eight cores that's 48 operations. The GPU is multitasking as well, so you can run several rendering and compute threads: all of the cloth effects, AI and collision detection on the GPU in parallel while you're doing the rendering. And we switched to a scalar engine, which actually means that a lot of the GPU is being utilized, and it can do 768 operations per cycle.
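The peak-throughput figures quoted above are just multiplication, and it's worth seeing where they come from. The CPU number follows directly from the quote (6 ops/cycle × 8 cores); the GPU breakdown into 12 compute units of 64 lanes each is my assumption about how 768 is reached, consistent with the figure but not stated in the talk:

```python
# CPU: six operations per cycle per core, across eight cores.
OPS_PER_CORE_PER_CYCLE = 6
CPU_CORES = 8
cpu_ops_per_cycle = OPS_PER_CORE_PER_CYCLE * CPU_CORES  # 48, as quoted

# GPU: 768 ops/cycle is consistent with 12 compute units of 64 lanes
# each (an assumed breakdown, not stated in the panel).
COMPUTE_UNITS = 12
LANES_PER_CU = 64
gpu_ops_per_cycle = COMPUTE_UNITS * LANES_PER_CU  # 768, as quoted
```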
Again on the RAM: we really wanted to get to 8 GB and make it power-friendly as well, which is a challenge, getting it power-friendly (for acoustics) while also getting high capacity and high bandwidth. With our memory architecture we're actually achieving all of that, and we're getting over 200 GB/s across the memory subsystem.
Knowing that this is a box that's able to change its power consumption based on the loads that are running, I feel really good about that. If it needs the power, it's there; if it doesn't, it'll go into a lower power mode and use only what it needs.
Within the silicon there are actually power switches that shut off cores that aren't being used. We can also apply dynamic voltage and frequency scaling, so if you don't need to burn the power, you don't have to.
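Dynamic voltage and frequency scaling boils down to a governor policy: pick the lowest clock that still covers the current load. This is a toy sketch of such a policy; the frequency steps and the linear capacity model are illustrative assumptions, not the console's real clock table.

```python
# Hypothetical available clock steps, lowest to highest (MHz).
FREQ_STEPS_MHZ = [400, 800, 1600]


def pick_frequency(utilization: float) -> int:
    """Choose the lowest clock step that can cover the current load.

    Assumes capacity scales linearly with clock relative to the top step,
    a simplification that real governors refine considerably.
    """
    for freq in FREQ_STEPS_MHZ:
        if utilization <= freq / FREQ_STEPS_MHZ[-1]:
            return freq
    return FREQ_STEPS_MHZ[-1]


idle_clock = pick_frequency(0.10)  # light load: drop to the lowest step
busy_clock = pick_frequency(0.95)  # heavy load: full clock is available
```

Lower clocks also permit lower supply voltage, and since dynamic power scales roughly with frequency times voltage squared, downshifting when idle is where the big savings come from.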
Last generation the box was fixed, and it was all about optimize, optimize, optimize. The games we see now on the Xbox 360 look tremendously better than the games at launch because we deeply understand that chip. That's going to happen this generation too, but add to it a growing number of transistors in the cloud that are really not very far away, that you can start to move loads onto. You can start to have bigger worlds and lots of players together, and you can also take things that are done locally and push them out to the cloud. This is a new way of thinking about gaming consoles and what can be done with them.
There's also some technology we put in to enable really large dynamic worlds. We have this thing called partially resident textures, which the GPU supports, and which means you can save essentially gigabytes of memory by not having to have all of the data loaded all of the time. The GPU itself can figure out whether something is in memory or not; it doesn't need everything to be physically in memory the whole time. We also put in compression: we have LZ77 move engines that can just work behind the scenes and compress/decompress, which is going to be really super important for working with data from the cloud.
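The move engines do LZ77-family compression in hardware, transparently. As a rough software stand-in for the same idea, Python's `zlib` module exposes DEFLATE, which combines LZ77-style match finding with Huffman coding, so it illustrates why repetitive game and cloud data compresses so well (this is an analogy for the technique, not the console's actual format):

```python
import zlib

# Repetitive payload: LZ77-style back-references eat this kind of data up.
payload = b"cloud data cloud data cloud data " * 100

compressed = zlib.compress(payload)   # DEFLATE = LZ77 matching + Huffman coding
restored = zlib.decompress(compressed)

assert restored == payload            # lossless round trip
assert len(compressed) < len(payload) # repetition makes matching effective
```

Doing this in a dedicated engine rather than on the CPU is the point: the compression runs "behind the scenes" without stealing cycles from the game.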