Predict: The Next Generation Console Tech

AMD memory controllers work well at high speed on the latest CPUs/APUs.
So I would expect DDR3-2133. It's decently cheap now; they could use a high CAS latency such as CL13, which still gives a good absolute latency.
http://en.wikipedia.org/wiki/DDR3_SDRAM#JEDEC_standard_modules

I now definitely believe the memory will be DDR3 and not DDR4; the latter should be hideously expensive and booked for servers.
And well, 68GB/s is not so bad: you find that kind of bandwidth on midrange graphics cards, and here you have the ESRAM to play with as well.
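
For reference, a quick back-of-the-envelope in Python (the 256-bit bus width is an assumption, but it's what the 68GB/s figure implies; the DDR3-1600 CL9 part is just a typical comparison point):

    # Absolute CAS latency: CL cycles at the I/O clock (half the MT/s rate).
    def cas_latency_ns(mt_per_s, cl):
        io_clock_mhz = mt_per_s / 2.0       # DDR: two transfers per I/O clock
        return cl / io_clock_mhz * 1000.0   # CL cycles at that clock, in ns

    print(cas_latency_ns(2133, 13))  # DDR3-2133 CL13 -> ~12.2 ns
    print(cas_latency_ns(1600, 9))   # typical DDR3-1600 CL9 -> ~11.3 ns

    # Peak bandwidth of a 256-bit (32-byte-wide) DDR3-2133 interface:
    print(2133e6 * 32 / 1e9)         # -> ~68.3 GB/s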
 
Only to the ROPs. But otherwise you're right: 270 GB/s would be fine for a console without needing eDRAM, which would contribute nothing in that case. If Durango has ESRAM/EDRAM/eDRAM/RedRUM, the DDR3 will be on a traditional, slow bus, with the on-chip RAM providing the high-speed BW.

Out of curiosity, what was MS' reason for this config (i.e. eDRAM and ROPs on a daughter die) with the 360?

And assuming they use a larger pool of eDRAM/1T-SRAM in Durango, would they be likely to use this configuration again?
 
AMD memory controllers work well at high speed on the latest CPUs/APUs.
So I would expect DDR3-2133. It's decently cheap now; they could use a high CAS latency such as CL13, which still gives a good absolute latency.
http://en.wikipedia.org/wiki/DDR3_SDRAM#JEDEC_standard_modules

I now definitely believe the memory will be DDR3 and not DDR4; the latter should be hideously expensive and booked for servers.
And well, 68GB/s is not so bad: you find that kind of bandwidth on midrange graphics cards, and here you have the ESRAM to play with as well.

I suspect you're right. Great find ;-)
 
I wonder if MS will again go with the "tile-based" (big tiles) rendering approach and use a smaller eDRAM configuration for the framebuffer. If I didn't make a mistake, at 4xFSAA and 1080p you need about 64MB - which might be too expensive for MS. However, if they want both the front and back buffer in eDRAM, 64MB might be a good configuration - fitting two 720p frames with 4xFSAA into the buffer.
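
A quick sanity check of that arithmetic (a minimal Python sketch, assuming 4 bytes of color plus 4 bytes of depth/stencil per sample, uncompressed):

    def framebuffer_mb(width, height, samples, bytes_per_sample=8):
        # 4B color + 4B depth/stencil per sample, no compression assumed
        return width * height * samples * bytes_per_sample / (1024.0 ** 2)

    print(framebuffer_mb(1920, 1080, 4))  # 1080p, 4xFSAA -> ~63.3 MB
    print(framebuffer_mb(1280, 720, 4))   # 720p, 4xFSAA -> ~28.1 MB each,
                                          # so two frames fit in ~56 MB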
 
As it's always embedded, there's no need to precede it with an 'E'.

Actually, SRAM has always been available as external chips. Consoles sometimes used it as regular memory. Microcontrollers can still be expanded with it: small amounts (you don't have many address lines anyway), and no refresh needed.

It's just that it has been embedded for ages as well: x86 got on-die SRAM in 1989, and microcontrollers had it back in the 70s.
DRAM is just a lot more troublesome, whereas an SRAM array might be the first thing you build to test a new silicon process that will later be used for CPUs/GPUs.
 
Actually, reading more about 1T-SRAM, I'm fairly certain now that this is what will be in Durango instead of eDRAM (there seem to be minor functional differences between the two) - mainly because 1T-SRAM is marketed as a SoC solution and would give MS more options than eDRAM for where they can have it fabbed.

1T-SRAM, which, as you said, is basically EDRAM, sounds good. Wikipedia only reports densities down to 45 nm. If we assume that at 28 nm the density, including overhead, is roughly ~0.08 mm^2 per Mbit, then 100 MB (800 Mbit) of 1T-SRAM would be roughly 64 mm^2. That should be doable in a large SoC.
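
Spelling out that arithmetic (the 0.08 mm^2/Mbit density at 28 nm is the assumption here):

    mm2_per_mbit  = 0.08       # assumed 28 nm density, overhead included
    capacity_mbit = 100 * 8    # 100 MB expressed in Mbit
    print(capacity_mbit * mm2_per_mbit)  # -> 64.0 mm^2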

I now definitely believe the memory will be DDR3 and not DDR4; the latter should be hideously expensive and booked for servers.

How expensive will DDR3/DDR4 be 3-4 years from now, though? The long-term cost might be in favor of DDR4.

And well, 68GB/s is not so bad: you find that kind of bandwidth on midrange graphics cards, and here you have the ESRAM to play with as well.

If there is a large chunk of fast 1T-SRAM/EDRAM, it should be fine. Otherwise, it will be hard to compete with the PS4 if it has ~192 GB/s of main memory bandwidth.
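
For what it's worth, ~192 GB/s is what a 256-bit GDDR5 interface at 6 Gbps per pin would deliver (both numbers are rumor-mill assumptions, not confirmed Orbis specs):

    bus_width_bits = 256
    gbps_per_pin   = 6.0   # assumed GDDR5 data rate
    print(bus_width_bits * gbps_per_pin / 8.0)  # -> 192.0 GB/s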

And assuming they use a larger pool of eDRAM/1T-SRAM in Durango, would they be likely to use this configuration again?

The best option, in my opinion, would be a high-speed ring bus with attached:
- the GPU second-level caches*
- the CPU second-level caches
- the 1T-SRAM/EDRAM
- the 256-bit wide memory interface.
In this way the 1T-SRAM/EDRAM could be mapped to a specific memory address range and accessed by both the CPU and GPU for all intents and purposes.

*The current bandwidth of the L2 caches in GCN is 64B/cycle per 64-bit channel. At 1 GHz with 4 channels (256-bit), the peak is 256 GB/s. Considering the chip is supposedly highly customized, I wouldn't rule out a doubling of the bandwidth to 128B/cycle per 64-bit channel, though. With a sufficiently fast ring bus, this would allow up to 512 GB/s of bandwidth from the 1T-SRAM/EDRAM to the GPU.
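
The same math as a small sketch (the 64B/cycle per channel figure is from public GCN descriptions; the 128B/cycle variant is pure speculation, as noted above):

    def l2_bandwidth_gbs(bytes_per_cycle_per_ch, channels, clock_ghz):
        # bytes/cycle * 1e9 cycles/s = GB/s (decimal)
        return bytes_per_cycle_per_ch * channels * clock_ghz

    print(l2_bandwidth_gbs(64, 4, 1.0))   # stock GCN: 256 GB/s
    print(l2_bandwidth_gbs(128, 4, 1.0))  # speculative doubling: 512 GB/s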
 
Well apparently, according to what Rangers posted a few pages back, the e-(insert your favorite RAM buzzword here) is not part of said special units. Anyway, I don't believe MS will include such a large amount of RAM only to let it be crippled by its bandwidth. I think the fact that, according to the rumor mill, it's going to be an APU/SoC will probably allow them to engineer a system that properly exposes all the benefits of the design. The design of Xenos gives me confidence that they are going to design a really fantastic system.

Btw, I understand that the eDRAM in Xenos has a bandwidth of 256GB/s; my question is, in what way did this alleviate the bandwidth constraints of the main RAM? I get that some operations were moved to the eDRAM unit, but I would like to know how that solved the bandwidth constraint of the system. I ask because an understanding can give us some insight into the design choice of going with 8GB of slow RAM supplemented with eDRAM.
 
The best option, in my opinion, would be a high-speed ring bus with attached:
- the GPU second-level caches*
- the CPU second-level caches
- the 1T-SRAM/EDRAM
- the 256-bit wide memory interface.
In this way the 1T-SRAM/EDRAM could be mapped to a specific memory address range and accessed by both the CPU and GPU for all intents and purposes.

*The current bandwidth of the L2 caches in GCN is 64B/cycle per 64-bit channel. At 1 GHz with 4 channels (256-bit), the peak is 256 GB/s. Considering the chip is supposedly highly customized, I wouldn't rule out a doubling of the bandwidth to 128B/cycle per 64-bit channel, though. With a sufficiently fast ring bus, this would allow up to 512 GB/s of bandwidth from the 1T-SRAM/EDRAM to the GPU.

Yes, sounds like a good guess. Jaguar's architecture doesn't support more than 4 cores per cluster or an L3 cache, so I think the two 4-core "islands" will be connected, as you said, to all the other elements of the SoC.
 
1T-SRAM, which, as you said, is basically EDRAM, sounds good. Wikipedia only reports densities down to 45 nm. If we assume that at 28 nm the density, including overhead, is roughly ~0.08 mm^2 per Mbit, then 100 MB (800 Mbit) of 1T-SRAM would be roughly 64 mm^2. That should be doable in a large SoC.



How expensive will DDR3/DDR4 be 3-4 years from now, though? The long-term cost might be in favor of DDR4.



If there is a large chunk of fast 1T-SRAM/EDRAM, it should be fine. Otherwise, it will be hard to compete with the PS4 if it has ~192 GB/s of main memory bandwidth.



The best option, in my opinion, would be a high-speed ring bus with attached:
- the GPU second-level caches*
- the CPU second-level caches
- the 1T-SRAM/EDRAM
- the 256-bit wide memory interface.
In this way the 1T-SRAM/EDRAM could be mapped to a specific memory address range and accessed by both the CPU and GPU for all intents and purposes.

*The current bandwidth of the L2 caches in GCN is 64B/cycle per 64-bit channel. At 1 GHz with 4 channels (256-bit), the peak is 256 GB/s. Considering the chip is supposedly highly customized, I wouldn't rule out a doubling of the bandwidth to 128B/cycle per 64-bit channel, though. With a sufficiently fast ring bus, this would allow up to 512 GB/s of bandwidth from the 1T-SRAM/EDRAM to the GPU.

Couldn't they start with DDR3-2133 and then move to DDR4 later on?

Also, your proposed system sounds like a very good solution ;-)
 
Couldn't they start with DDR3-2133 and then move to DDR4 later on?

DDR4 banking is different from DDR3's. In some pathological cases, I can see a workload running much slower on DDR4 than it does on DDR3 of the same specified speed. I don't see them doing that.

Which is fine, because they don't have to. DDR4 is not in some way inherently more expensive. It's made in the same facilities by the same companies as DDR3, and each die is of roughly the same size. Its market price is simply a reflection of the fact that there is little demand or supply. If you placed a large order for DDR4 for the summer, you'd probably get a price reasonably close to the DDR3 price.
 
And DDR4 should already allow for higher speeds than DDR3. Micron says that 2400 MHz and 2800 MHz are already possible, and they can go higher with binning.
 
AMD memory controllers work well at high speed on the latest CPUs/APUs.
So I would expect DDR3-2133. It's decently cheap now; they could use a high CAS latency such as CL13, which still gives a good absolute latency.
http://en.wikipedia.org/wiki/DDR3_SDRAM#JEDEC_standard_modules

I now definitely believe the memory will be DDR3 and not DDR4; the latter should be hideously expensive and booked for servers.
And well, 68GB/s is not so bad: you find that kind of bandwidth on midrange graphics cards, and here you have the ESRAM to play with as well.

As far as I recall, AMD doesn't support DDR3-2133 on any of their processors. Not to say they couldn't, just that they don't, so it would be a customization.
 
AMD memory controllers work well at high speed on the latest CPUs/APUs.
So I would expect DDR3-2133. It's decently cheap now; they could use a high CAS latency such as CL13, which still gives a good absolute latency.
http://en.wikipedia.org/wiki/DDR3_SDRAM#JEDEC_standard_modules

I now definitely believe the memory will be DDR3 and not DDR4; the latter should be hideously expensive and booked for servers.
And well, 68GB/s is not so bad: you find that kind of bandwidth on midrange graphics cards, and here you have the ESRAM to play with as well.
DDR4 is more realistic. It's not like Durango will only be on the market for 2.5 years, and Microsoft isn't suddenly going to go bankrupt because of DDR4.
 
Aegis has stated that Durango has a monster SoC and that both next-gen consoles consume around 170-200W (he said that in two different statements on GAF).

The current rumored specs leave quite a lot of room for our "mystery sauce".
 
Out of curiosity, what was MS' reason for this config (i.e. eDRAM and ROPs on a daughter die) with the 360?

And assuming they use a larger pool of eDRAM/1T-SRAM in Durango, would they be likely to use this configuration again?


Well, if they use that configuration then it's not a GCN (or GCN2) GPU anymore, is it? ;)
 
Aegis has stated that Durango has a monster SoC and that both next-gen consoles consume around 170-200W (he said that in two different statements on GAF).

The current rumored specs leave quite a lot of room for our "mystery sauce".

I don't know about that. Is final silicon even in the dev kits yet? It may be a power envelope goal, but I can't see how that would be known yet.

Also, if true, doesn't it suggest we are missing a lot from Orbis? There could be lots of "secret sauce" in there too.
 
I guess it makes sense that it's a monster because of all the added chips. ARM is in there too, isn't it? And hell, maybe even that PowerPC CPU for backwards compatibility.
 
I don't know about that. Is final silicon even in the dev kits yet? It may be a power envelope goal, but I can't see how that would be known yet.

Also, if true, doesn't it suggest we are missing a lot from Orbis? There could be lots of "secret sauce" in there too.

If I remember correctly, a rumor said that MS has shipped the last devkit with final silicon.
 
Out of curiosity, what was MS' reason for this config (i.e. eDRAM and ROPs on a daughter die) with the 360?

And assuming they use a larger pool of eDRAM/1T-SRAM in Durango, would they be likely to use this configuration again?

Btw, I understand that the eDRAM in Xenos has a bandwidth of 256GB/s; my question is, in what way did this alleviate the bandwidth constraints of the main RAM?
To answer both of you: the rendering of pixels is a large consumer of bandwidth. Moving that to the smart eDRAM is equivalent to saving that BW from main RAM, but not the same as adding the BWs together (in the early days of PS3 vs. XB360 we had the ludicrous PR bandwidth wars on top of the Flops wars and the Prerender wars). Xenos's internal BW was exactly what was needed to never starve the ROPs, but the ROPs wouldn't be consuming that much BW all the time, so it wouldn't peak at 256 GB/s.
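
For the curious, here's roughly where the 256 GB/s figure comes from (a sketch using the commonly cited Xenos ROP numbers; the per-sample traffic breakdown is an assumption):

    rops              = 8
    samples_per_clock = 4      # per ROP, with 4xMSAA
    bytes_per_sample  = 8      # 4B color + 4B depth/stencil
    rw_factor         = 2      # blending/Z-test means read + write
    clock_hz          = 500e6  # Xenos core clock

    bw = rops * samples_per_clock * bytes_per_sample * rw_factor * clock_hz
    print(bw / 1e9)  # -> 256.0 GB/s: sized so the ROPs are never starved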

It's a good design feature for a GPU using eDRAM. The alternative is to sit the ROPs outside the eDRAM and have them share the same GPU<>eDRAM bus. I'd be surprised if MS don't go with the same design, but maybe there's a downside I'm unaware of? Only I'd expect the eDRAM to be advanced further to have full GPU access for read/write.
 