Predict: The Next Generation Console Tech

The issue that prevented the eDRAM from being merged onto the die with the 360 comes down to the process that particular eDRAM technology can be fabbed on. Rumours already point to that changing, which in part has been the reason for the eSRAM name change. In other words, whatever embedded memory they go with this time around will be able to be fabbed on the same process as the rest of the SoC, so it should be integrated as part of the SoC from the start if the die size is feasible.
 
Because 100GB/s is only a fanboy dream, while clarifying the 2GB/s makes it a precise measurement that must be real :O

I think he meant for the eD/eSRAM. Which is pretty much the opposite of a fanboy dream, unless you mean Sony fanboy. ;) Anyhow, I can't see them going with eDRAM if it's only 30GB/s faster than main RAM, even if it can be accessed simultaneously. Unless, of course, the main RAM is only ~30GB/s. If that were the case I'd be hoping for 64MB+ of eDRAM at 250GB/s+ to make up for main memory!
 
If the GPU can only texture etc. from ESRAM/eDRAM (not main RAM), then you'd be right. But if the GPU can read simultaneously from RAM and eDRAM, the BW is aggregate.


Does it really do you any good being able to texture from 32MB though?


I think he meant for the eD/eSRAM. Which is pretty much the opposite of a fanboy dream, unless you mean Sony fanboy. ;) Anyhow, I can't see them going with eDRAM if it's only 30GB/s faster than main RAM, even if it can be accessed simultaneously. Unless, of course, the main RAM is only ~30GB/s. If that were the case I'd be hoping for 64MB+ of eDRAM at 250GB/s+ to make up for main memory!

Well, I'm not a techie, but maybe 102 GB/s is all they need. 102 GB/s for framebuffer and 68GB/s for textures+other stuff.
 
If the GPU can only texture etc. from ESRAM/eDRAM (not main RAM), then you'd be right. But if the GPU can read simultaneously from RAM and eDRAM, the BW is aggregate.

Would that mean that the GPU has two buses/memory interfaces, one for each pool of memory? That might explain the "lower" 100GB/s bandwidth for the eSRAM, because both would be on a 256-bit bus and higher eSRAM bandwidth would complicate things?
 
Does it really do you any good being able to texture from 32MB though?
Sure. Render to a texture/buffer in eDRAM and use that in another pass. Or import a texture, process it on GPU with some procedural modification, and write the result out for use on the surface.

Well, I'm not a techie, but maybe 102 GB/s is all they need. 102 GB/s for framebuffer and 68GB/s for textures+other stuff.
I see no reason at this point not to consider them aggregate bandwidth in thinking of resources available to the processors. I'd be very surprised if it's like a PC and everything has to be copied from main RAM to VRAM (ESRAM) before the GPU can use it. Of course, this concern is really more about comparing the two platforms, than identifying the hardware. Nailing the BW for the part would be as far as this thread ought to go.

Would that mean that the GPU has two buses/memory interfaces, one for each pool of memory? That might explain the "lower" 100GB/s bandwidth for the eSRAM, because both would be on a 256-bit bus and higher eSRAM bandwidth would complicate things?
You can use a variety of memory topologies. Impossible to call at this point.
 
What I was thinking about with regard to the eDRAM/eSRAM (or whatever it's called) is that 102.4 GB/sec is a plausible number.

800 MHz * 1024-bit bus comes out to 819.2 Gbits/sec = 102.4 GB/sec.

One of the things that I have been told (not related to consoles, just in general) is that a measurable portion of power consumption in a modern high-performance CPU goes to clock distribution/generation. Therefore, I think for a power-efficient/embedded design it would make sense for the APU (CPU and GPU) to run from a common clock source with simple integer clock dividers.

The rumors that we've seen are that the CPU is clocked at 1.6 GHz and the GPU at 800MHz, which would fall in line with my speculation. I would expect the embedded RAM to be running at the GPU clock speed on a very wide bus as well. If the embedded memory size is a power of 2, I would expect the bus to be as well. The math on the rumored 32MB of eSRAM on a 1024-bit bus @ 800 MHz works out and seems plausible.
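
As a quick sanity check on that arithmetic, here's a back-of-the-envelope sketch in Python (every number below is a rumoured figure from this thread, treated as an assumption rather than a confirmed spec):

```python
# Back-of-the-envelope check of the rumoured figures; all numbers are
# thread speculation, not confirmed specs.

def bandwidth_gb_s(bus_width_bits: int, rate_mt_s: float) -> float:
    """Peak bandwidth in GB/s for a bus of the given width and transfer rate."""
    return bus_width_bits * rate_mt_s * 1e6 / 8 / 1e9

cpu_clock_mhz = 1600                  # rumoured CPU clock
gpu_clock_mhz = cpu_clock_mhz // 2    # 800 MHz, a simple integer divider off the same source

# 32MB of eSRAM on a 1024-bit on-chip bus running at the GPU clock
print(bandwidth_gb_s(1024, gpu_clock_mhz))   # 102.4 GB/s

# For comparison: a 256-bit DDR3 bus at an effective 2133 MT/s,
# roughly the 68 GB/s main-memory figure mentioned up-thread
print(bandwidth_gb_s(256, 2133))             # ~68.3 GB/s
```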
 
Would that mean that the GPU has two buses/memory interfaces, one for each pool of memory? That might explain the "lower" 100GB/s bandwidth for the eSRAM, because both would be on a 256-bit bus and higher eSRAM bandwidth would complicate things?

Assuming MS goes with an APU with everything integrated into one chip, the bus to the embedded RAM would be completely on-chip, like it is between the eDRAM and the ROPs (256GB/sec) currently. It wouldn't have as many problems and restrictions as external buses have. For example, the bus from the XCGPU chip to the eDRAM daughter die now is only 32GB/sec. I see this as an area that they can really optimize and improve.
 
I don't see how adding the bandwidth of both pools makes a meaningful number, unless we have an access pattern to work with. As soon as one pool is saturated, the other pool's additional bandwidth becomes essentially zero.
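
To illustrate why the access pattern matters, here's a hypothetical sketch (the 102.4 and 68 GB/s figures are just the rumoured numbers discussed above, and the traffic splits are made up): the combined throughput is capped by whichever pool a given traffic split saturates first.

```python
# Hypothetical illustration: effective combined bandwidth of two memory
# pools when a fixed fraction of the traffic must hit each pool.
# The bandwidth figures are rumoured numbers from this thread, not specs.

def effective_bandwidth(bw_a: float, bw_b: float, frac_a: float) -> float:
    """Total achievable GB/s when frac_a of all traffic goes to pool A
    and the rest goes to pool B; limited by whichever pool saturates first."""
    if frac_a <= 0.0:
        return bw_b
    if frac_a >= 1.0:
        return bw_a
    return min(bw_a / frac_a, bw_b / (1.0 - frac_a))

esram_bw, main_bw = 102.4, 68.0   # GB/s, rumoured

# Traffic split matching the pools' ratio -> the full aggregate is usable
print(effective_bandwidth(esram_bw, main_bw, frac_a=102.4 / 170.4))   # ~170.4 GB/s

# Traffic skewed heavily towards the eSRAM -> main RAM bandwidth sits mostly idle
print(effective_bandwidth(esram_bw, main_bw, frac_a=0.9))             # ~113.8 GB/s
```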
 
It's been covered in this thread already, but it's definitely doable and part of the JEDEC spec, and it can go up to 8 layers.

The specs are out there but basically DDR4 will draw much less power than DDR3. And the entire nature of stacking chips in 2.5/3D lets you run more efficiently because of greatly reduced interconnect length.
I still can't find it in the DDR4 JEDEC specs, nor in the thread.
The 2.5D interface has to be specified somewhere, if it exists...

The number of layers doesn't change the interface, it only adds capacity. So that doesn't solve the problem of needing 16 chips on an interposer for a low bandwidth. If they could really use 16 chips on an interposer from the start, they would be MUCH better off using the low-power Wide IO which is currently in production: 17GB/s per chip, 272GB/s total, and ridiculously low power.
 
We're talking about 2 different things, 2.5D and Wide IO. If you're creating a 2.5D chip you want to stack the memory right beside the logic dies. This is for latency, speed and power benefits compared to regular I/O interfaces. Because we're talking about a tiny area here for the interposer, you need to go with stacked RAM (i.e. DDR3/4 stacked, with a Wide IO interface). There's not enough space for something like a typical 256-bit GDDR5 bus. And they're not doing it the typical way because they want to take advantage of the inherent benefits of going 2.5D/3D.

And Micron's already going to be supplying something custom for at least one of the consoles. Don't know if that gives us any more clue into what type of DRAM they'd be using.

Micron said:
And talking about consumer again here. I thought it'd be beneficial to show you across a couple of key applications how this looks in terms of megabyte per system. On the left, what we have are game consoles. This is a space that's been pretty flat for a number of years in terms of the average shipped density per system. That's going to be changing here pretty quickly. I think everyone realizes that these systems are somewhat clumpy in their development. The next generation of system is under development now and that because of 3D and some of the bandwidth requirements, drives the megabyte per console up fairly quickly. So we're anticipating some good growth here.

We've worked with a number of these vendors specifically on both custom and semi-custom solutions in that space.
 
And Micron's already going to be supplying something custom for at least one of the consoles. Don't know if that gives us any more clue into what type of DRAM they'd be using.

Interesting. Suggests one of our current Durango or Orbis rumors is completely wrong on the RAM front (256-bit traditional DDR3 bus or 256-bit GDDR5 bus).

Here's a transcript of the whole meeting: http://seekingalpha.com/article/291012-micron-technology-inc-shareholder-analyst-call?part=single

When they say a number of vendors in that space, I wonder if that means Vita + PS4/XB3 or PS4 + XB3.
 
Concerning RT:

1) Would require a LARGE amount of memory (try several Gig) to store the entire scene to test for intersections - primary or secondary rays. And the rendering engine can't utilize a delayed load geometry set as we need to see the entire scene. We aren't dealing with bucket rendering (i.e. offline).

2) How are you going to deal with aliasing? You'd need to cast several rays per pixel, and each intersection would have to evaluate the shaders every time (see the rough ray-count numbers sketched after this list).

3) What would your depth limit per ray type be before quitting? If you can't bounce at least 2-3 indirect rays, then you won't get very good results. What if you had 2 geometric objects that both refracted, one behind the other? Objects would suddenly have to have "thickness" (which means even more geometry).

4) RT direct lighting would only be beneficial if you had area lights. But that would require even more samples. Doing specular lighting would most certainly require importance ray sampling (firing rays from the lights as well as the materials) with PDF and special sampling algorithms for both the light and BSDF. This would be the only way to get rid of the noise.

5) Notice how the Kepler demo that Nvidia showed only had 3 objects in the scene. LOL! Not even close.

In short, if they came out with a viable hardware device for RT, the film industry would get it first. :D And I don't see that happening for at least 2-3 more generations.
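
For a sense of scale on the ray-count point in 2) above, here's a rough estimate sketched in Python (the resolution, sample count, bounce depth and frame rate are purely illustrative assumptions, not figures from any rumour):

```python
# Rough ray-budget estimate for real-time ray tracing.
# All settings here are illustrative assumptions, not console rumours.

width, height  = 1920, 1080   # 1080p target
samples_per_px = 4            # modest anti-aliasing
max_bounces    = 3            # the 2-3 indirect bounces mentioned above
shadow_rays    = 1            # a single light sample per hit (no area lights)
fps            = 30

primary_rays   = width * height * samples_per_px
rays_per_frame = primary_rays * (1 + max_bounces) * (1 + shadow_rays)
rays_per_sec   = rays_per_frame * fps

print(f"{rays_per_frame / 1e6:.0f} M rays per frame")    # ~66 M
print(f"{rays_per_sec / 1e9:.1f} G rays per second")     # ~2.0 G
```

Even with these modest settings you end up needing on the order of a couple of billion rays per second, each one requiring acceleration-structure traversal plus shader evaluation at the hit point, which illustrates why the post above doesn't expect viable real-time RT hardware for a few more generations.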


Lots of the patents I've included previously in this thread around RayTracing using BTE/VTE explain primary and secondary rays and how they are stored and passed around in inboxes as vector primitive data structures and in more complex spatial tree structures. They also go into the use of various cache and RAM strategies on the HW/chip that will facilitate the recursive or iterative traversal of these rays over a scene per frame. Again, I know they're only patents, BUT they're full of information on the HOW of what MS/AMD/IBM are thinking around RayTracing.

An Xbox Next with lots of on-board memory (L1/L2 caches, eDRAM, system RAM, etc.) and very large GB/s buses between components would facilitate a RayTracing engine; that is what I'm looking for in rumoured specs for the Xbox Next. Time will tell ...


As for the aliasing, interesting that you bring that up ... If AMD GCN is indeed used in the next Xbox, then quite possibly a hardware-based media accelerator could be used. This pic is from the AMD HSA page (top right, 'AMD HD Media Accelerator'):

[Image: HSAAcceleratedProcessingUnit.png]



This apparently can do hardware based ".. edge enhancement, noise reduction ... "
 
Interesting. Suggests one of our current Durango or Orbis rumors is completely wrong on the RAM front (256-bit traditional DDR3 bus or 256-bit GDDR5 bus).

Orbis 2.5D stacked + GDDR5 is nonsense. Either we have a simple SoC Orbis with GDDR, which I doubt since every piece of REAL insider information (Yole, Tsuruta) points towards an interposer and TSVs, or we have the famous 2.5D stacked SiP Orbis with WideIO memory, for example HBM.

GDDR5 could be devkit inventory, though. For XBox I wouldn't believe any of the rumours out there.
 
An Xbox Next with lots of on-board memory (L1/L2 caches, eDRAM, system RAM, etc.) and very large GB/s buses between components would facilitate a RayTracing engine; that is what I'm looking for in rumoured specs for the Xbox Next.
But you want lots of RAM and fast RAM anyway, regardless of whether it's raytracing, scanline rendering, voxel rendering, or whatever else. I don't see how large RAM amounts point to anything. :???:
 
Orbis 2.5D stacked + GDDR5 is nonsense. Either we have a simple SoC Orbis with GDDR, which I doubt since every piece of REAL insider information (Yole, Tsuruta) points towards an interposer and TSVs, or we have the famous 2.5D stacked SiP Orbis with WideIO memory, for example HBM.

GDDR5 could be devkit inventory, though. For XBox I wouldn't believe any of the rumours out there.

Not sure if you thought I disagreed, but I agree. GDDR5 absolutely means a traditional bus.

You're combining 2 different standards there. The interface is part of the standard.
It's either DDR4 or Wide-IO. You can't mix and match them.

Micron did say custom and semi-custom :p (to play devil's advocate)
 
As for the aliasing, interesting that you bring that up ... If AMD GCN is indeed used in the next Xbox, then quite possibly a hardware-based media accelerator could be used. This pic is from the AMD HSA page (top right, 'AMD HD Media Accelerator'):


This apparently can do hardware based ".. edge enhancement, noise reduction ... "

The media accelerator is used for media processing. The hardware and software used to clean up video file output aren't specced to handle the throughput or to perform appropriate work on the output of a 3D game render.
 
But you want lots of RAM and fast RAM anyway, regardless of whether it's raytracing, scanline rendering, voxel rendering, or whatever else. I don't see how large RAM amounts point to anything. :???:

No, they don't, not directly for RayTracing. BUT for TransMedia scenarios that involve RayTracing-generated scenes, that's where I want large system RAM.

More like having an Avatar-like rendered game whilst having:
1. A Skype video conference application with a mate running on top of the game on the top right of my screen
2. A browser app running on the bottom right of my screen hosting some web pages
3. A sports Ticker app showing me scores at the bottom of the screen ..
4. etc..

Just suggesting that in the very rich TransMedia world that the Xbox Next, going by the rumours, is claiming it wants to be part of, a lot of RAM would be beneficial.

But you're right, for RayTracing itself it's not warranted.
 
I might be wrong, but I don't think IBM is much involved in the next-generation Xbox, if at all. Also, I think we are still more than a decade away from getting a viable hardware-based ray tracing engine.
 