Predict: The Next Generation Console Tech

So 102.4 GB/s sounds right for the GPU connection. But the internal bandwidth should be on the order of 819.2 GB/s to sustain the same kind of ROP operations that Xenos did, without resorting to compression (which, according to the Xbox 360 designers, is undesirable because it is unpredictable).
Well, I don't read it the same way: it is faster than the GPU connection to the main memory pool, which according to the rumors is ~60 GB/s shared between the CPU and GPU.

He may clarify, but I don't think he's speaking about what you are speaking about, i.e. an external die or whatnot, just about the fact that the scratchpad would be dedicated only to raster operations / tied to the ROPs.

By the way, 12 ROPs is nothing amazing; Pitcairn has 32 (though underfed).

The whole point is which is more efficient: keep modern, powerful, efficient ROPs and link them to a scratchpad memory pool through a reasonably fast interface (enough to keep the ROPs fed all the time), or go with sucky ROPs as in Xenos and design a really wide interface between those ROPs and the scratchpad memory?

I'm not sure that the scratchpad will be tied to the ROPs; you may want the shader cores to be able to read your render target without having to move your data back to main RAM.
Though I've no idea how much internal bandwidth there is in the latest AMD GPUs, I'm speaking of the bandwidth at the level of what looks like a "ring" on this graph: the thing that connects the L2, shader cores and ROPs to the memory controllers / everything (in grey, just above the memory controllers).

I could see the scratchpad tied to that ring, with the ROPs using it as a favorite/primary target but having the option to render to main RAM if they want. That could be the most flexible solution (say your ROPs have to render a low-res render target that compresses well, you don't want to move the data into the scratchpad => render straight to main RAM; it seems MSFT went with a healthy amount of bandwidth vs something like Trinity, and it could be a Z-only render target for occlusion purposes, for example).
Though I don't know how much bandwidth that ring/crossbar provides (I would think significantly more than the external bandwidth, for obvious reasons), I guess it is scalable (not the same needs for Cape Verde, Pitcairn and Tahiti). Something like Xenos would be backward-looking imho.
 
If MS cares about BC more than about launching a new platform that refines and corrects past mistakes with an eye toward a better design, then sure, that makes a lot of sense. The eDRAM in Xenos, while helpful, also caused its own set of problems, and modern GPUs--8 years later--have significantly shifted the targets and bottlenecks.

Going with a Xenos style eDRAM setup would be a major mistake IMO and a complete mismanagement of silicon resources.

I agree with you, but in my opinion you would need more than 32 MB and more than 102.4 GB/s to use the eDRAM as a general-purpose scratchpad memory. If we assume the current rumors are grounded in truth, the 32 MB at 102.4 GB/s external and 819.2 GB/s to the ROPs makes the most sense.

At least 32 MB of eDRAM is sufficient for 1080p with 2xMSAA. If Microsoft had gone for 14 MB of eDRAM on the 360, the number of games with 2xMSAA would have been much higher, in my opinion.
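As a quick sanity check on that claim, here's a back-of-envelope sketch in Python; the formats (32-bit colour, 32-bit depth/stencil, both multisampled) are my assumptions, not anything confirmed:

```python
# Rough render-target sizing for 1080p with 2xMSAA (formats are assumptions).
MIB = 1024 * 1024

def target_mib(width, height, bytes_per_sample, samples):
    """Size of one render target in MiB."""
    return width * height * bytes_per_sample * samples / MIB

w, h = 1920, 1080
colour = target_mib(w, h, 4, 2)   # e.g. RGBA8 with 2xMSAA -> ~15.8 MiB
depth  = target_mib(w, h, 4, 2)   # e.g. D24S8 with 2xMSAA -> ~15.8 MiB
print(f"colour {colour:.1f} MiB + depth {depth:.1f} MiB = {colour + depth:.1f} MiB")
# ~31.6 MiB total, so a 1080p colour+depth target with 2xMSAA just fits in 32 MB.
```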

Silicon-wise, the cost of 32 MB of eDRAM would be pretty negligible (around 20 mm^2 at 28 nm). Rather than a daughter die, it will probably be on the same chip, but I believe the bandwidth will be as stated (819.2 GB/s for the ROPs and 102.4 GB/s to the rest of the SoC). 20 mm^2 worth of silicon costs much less than switching to GDDR5 and allows them to have 8 GB of main memory.
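And to show where bandwidth figures on that scale can come from, a rough fill-rate sketch (the ~800 MHz clock, the ROP count and the per-pixel traffic are illustrative assumptions, not confirmed specs):

```python
# Back-of-envelope ROP bandwidth demand (all inputs are assumptions, not specs).
def rop_bandwidth_gb_s(rops, clock_ghz, bytes_per_pixel):
    """Peak GB/s the ROPs can demand at full fill rate."""
    return rops * clock_ghz * bytes_per_pixel

# Plain fill: 4 B colour write + 4 B depth write -> 8 B per pixel.
plain = rop_bandwidth_gb_s(12, 0.8, 8)    # ~77 GB/s, in the ballpark of 102.4 GB/s

# Xenos-style worst case: 4xMSAA with alpha blending, i.e. colour read+write and
# Z read+write for each of the 4 samples -> 4 * (8 + 8) = 64 B per pixel.
worst = rop_bandwidth_gb_s(12, 0.8, 64)   # ~614 GB/s (16 ROPs would give 819.2 GB/s)

print(f"plain fill: {plain:.1f} GB/s, blended 4xMSAA: {worst:.1f} GB/s")
```

The point is just that an uncompressed, heavily blended MSAA target can demand several times the 102.4 GB/s external figure, which is why the internal number has to be so much larger.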
 
Could someone please explain the relation between memory bandwidth, capacity and frame rates? Why would a frame be restricted to accessing a certain amount of GBs out of the full memory configuration?

A frame rate is just a given time frame. The easiest way to think about this is a hose, a tank, and some time period like 1 s. Within that second (or frame), the tank is the total amount of RAM in the config, and the hose controls how fast it is emptied. The larger the hose, the faster it is emptied, i.e. the more bandwidth it has.
 
You must have missed or forgotten the posts about lherre's comment, and if you noticed, the other "hype" comments were from around or before that time as well. No one is saying the same thing now. That's what I said before and that's what I'm saying now. You're only getting (or focusing on) part of the context, and that's the recent comments.

Let's go back to June 2012:

http://forum.beyond3d.com/showpost.php?p=1649097&postcount=12175
http://forum.beyond3d.com/showpost.php?p=1649106&postcount=12182

Yeah, I have seen and read all of that, and you have yet to show me where he changed his tune. That is what I am asking you, because as far as I know he is saying that both will be close, with some things favoring one or the other as the case may be. He even said back then that the Durango GPU was weaker, but I cannot recall him singing a different tune. And he is a developer, mind you, so they know the final specs, even if they had been working with alpha or beta kits.
 
That's not entirely true. AMD and Nvidia research is aimed at running current games and current rendering technology at their best. They may have some long-term graphics research going on, but they can't use it until it becomes a standard. Take the tessellation unit as an example of this. Microsoft, on the other hand, provides the API and can do much more to affect the future of graphics technology. I'm not saying that Microsoft reinvented the wheel, but with the help of AMD they may have designed some unique rendering solution that will be standardized in the next iteration of DirectX. Therefore I think that we are missing an element of Durango's design, which makes up for the power difference we see between the two consoles.

By the time the 360 came around, ATI had been experimenting with tessellation for a long time. Remember TruForm? That goes back to like the old Radeon 7500.

Sorry, but do you assume the current AMD/NV designs are the end of the pipeline? These companies obviously have parallel teams which develop future designs, and MS could surely finance AMD to speed up some next-generation design team combined with some DX11++ extensions. I can't really see why they shouldn't play to their strengths to get an edge over their competition.

No, I just don't foresee magical unanticipated leaps forward.
 
Absolutely, in those areas where you would be able to sub it in for said CPU/GPU - but we're talking in addition to those chips. Is this CPU/GPU combo going to be unable to decode AVCHD streams, however inefficiently? I don't think so. So for me, now it's about - are these tasks expected to be taking place while the user is engaged in gameplay? I wouldn't think so there either. Which begs the question: would the entire idea be redundant for a console?

Well we know it's great in image processing. So anything post-processing related perhaps? That's a fairly significant part of rendering. Also, check this out, fractal compression:

Fractal compression is an efficient technique for image and video encoding that has not gained widespread acceptance due to its computational intensity. In this paper, we present a real-time implementation of fractal compression in OpenCL, and show how the algorithm can be efficiently optimized for multi-CPUs, GPUs, and FPGAs. We show that the core computation implemented on the FPGA through OpenCL is 3x and 114x faster than a high-end GPU and multi-core CPU, respectively. We also compare to a hand-coded FPGA implementation to showcase the effectiveness of OpenCL-to-FPGA compilation.

http://www2.infonets.hiroshima-u.ac.jp/aspdac/program/4A_abst.html#4A-1

I don't think there's much in PS4 that's going to be ported to CMOS sensors outside of a pure R&D angle, do you?

For the most part I just meant that if they're buying in bulk chip wise it'll have uses in other arms of their business. But I could see compression or video processing algos possibly being ported. They'll be able to make use of any potential R&D budget put towards it more effectively.
 
The whole point is which is more efficient: keep modern, powerful, efficient ROPs and link them to a scratchpad memory pool through a reasonably fast interface (enough to keep the ROPs fed all the time), or go with sucky ROPs as in Xenos and design a really wide interface between those ROPs and the scratchpad memory?

As stated in my previous post, I would prefer a scratchpad memory, but you would need more than 32 MB and more than 102.4 GB/s to make it worth the complexity. Hence, if the rumors are true, this Xenos-style setup is the most sensible one I can think of.

I'm not sure that the scratchpad will be tied to the ROPs; you may want the shader cores to be able to read your render target without having to move your data back to main RAM.

If it's going to be a general-purpose scratchpad, I cannot see a good reason not to go with much higher bandwidth (at least 256 GB/s, but 512 GB/s would be useful). The only way in which 102.4 GB/s makes sense is if it's the bandwidth required to feed 12 ROPs.

Though I've no idea how much internal bandwidth there is in the latest AMD GPUs, I'm speaking of the bandwidth at the level of what looks like a "ring" on this graph: the thing that connects the L2, shader cores and ROPs to the memory controllers.

I don't know exactly. I do know that the L2 cache bandwidth is 64 bytes/cycle for each memory channel, which is 256 GB/s aggregate for Cape Verde.

I could see the scratchpad tied to that ring, with the ROPs using it as a favorite target but having the option to render to main RAM if they want. That could be the most flexible solution.
Though I don't know how much bandwidth that ring/crossbar provides, I guess it is scalable (not the same needs for Cape Verde, Pitcairn and Tahiti). Something like Xenos would be backward-looking imho.

Much more than 102.4 GB/s, otherwise it would not even be able to take advantage of the 288 GB/s of main memory bandwidth on Tahiti. That's why 102.4 GB/s only makes sense for a Xenos-style setup.
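For a rough idea, extrapolating the 64 bytes/cycle-per-channel figure quoted above across the publicly listed channel counts and core clocks (so this is an estimate, not a documented number):

```python
# Estimated aggregate L2 <-> memory-controller bandwidth for GCN parts,
# assuming 64 B/cycle per memory channel (the figure quoted above).
def l2_aggregate_gb_s(channels, core_clock_ghz, bytes_per_cycle=64):
    return channels * bytes_per_cycle * core_clock_ghz

parts = {
    "Cape Verde (HD 7770)": (4, 1.000),   # 128-bit bus -> 4 x 32-bit channels
    "Pitcairn (HD 7870)":   (8, 1.000),   # 256-bit bus -> 8 channels
    "Tahiti (HD 7970)":     (12, 0.925),  # 384-bit bus -> 12 channels
}
for name, (channels, clock) in parts.items():
    print(f"{name}: ~{l2_aggregate_gb_s(channels, clock):.0f} GB/s internal")
# Cape Verde ~256 GB/s, Pitcairn ~512 GB/s, Tahiti ~710 GB/s --
# all comfortably above their external memory bandwidth.
```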
 
Well we know it's great in image processing. So anything post-processing related perhaps? That's a fairly significant part of rendering. Also, check this out, fractal compression:

http://www2.infonets.hiroshima-u.ac.jp/aspdac/program/4A_abst.html#4A-1

No doubt it is superior in compression/decode and other traditional signal processing tasks. The question for me is, is it needed? Because as inefficient as the AMD chips will be in that regard, they'll still get it done.

For the most part I just meant that if they're buying in bulk chip wise it'll have uses in other arms of their business. But I could see compression or video processing algos possibly being ported. They'll be able to make use of any potential R&D budget put towards it more effectively.

I'm keeping the two separate - Playstation and the other divisions. FPGAs should be inexpensive enough on a per-unit basis that you would not include one unnecessarily just to reduce your volume costs across other product lines.

Maybe the programmable logic Masaaki Tsuruta alludes to is as simple as is being posited, maybe it is something else entirely. Maybe it is for "background" AV services, maybe it is for cloud-based services... who knows.
 
Let's assume for a moment that the rumors for the systems are true. We're looking at a 1.8 vs 1.2 TF GPU; that's roughly the discrepancy between a Radeon 7850 and a Radeon 7770. While the performance difference isn't as dramatic as a 7770 vs a 7970, I would say that it's definitely noticeable, and I wouldn't use the word "comparable" to describe it.

I wouldn't expect either GPU to be imbalanced in terms of the other units (TMUs and ROPs), so the way I see it, one would have to have a major architectural difference (and I don't think embedded RAM for the ROPs would be enough. Would it?)

Perhaps an improved ISA for the lower-flops GPU, allowing for much higher utilization of the theoretical max? In theory, if the lower GPU can achieve a utilization of 75% of its flops while the higher one maxes out around 50%, then they are roughly equivalent. What's a typical utilization/profile of a GPU workload on a modern GPU nowadays?
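Just to put numbers on that (both utilization figures are purely hypothetical):

```python
# Effective throughput = peak TFLOPS * achieved utilization (hypothetical figures).
peak_tflops = {"lower GPU": 1.2, "higher GPU": 1.8}    # from the rumored specs
utilization = {"lower GPU": 0.75, "higher GPU": 0.50}  # assumed, for illustration

for name in peak_tflops:
    effective = peak_tflops[name] * utilization[name]
    print(f"{name}: {effective:.2f} effective TFLOPS")
# Both land at 0.90 effective TFLOPS -- the "roughly equivalent" scenario.
```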
 
If MS cares about BC more than about launching a new platform that refines and corrects past mistakes with an eye toward a better design, then sure, that makes a lot of sense. The eDRAM in Xenos, while helpful, also caused its own set of problems, and modern GPUs--8 years later--have significantly shifted the targets and bottlenecks.

Going with a Xenos style eDRAM setup would be a major mistake IMO and a complete mismanagement of silicon resources.

That's if the ESRAM is set up like eDRAM. :D

Perhaps an improved ISA for the lower-flops GPU, allowing for much higher utilization of the theoretical max? In theory, if the lower GPU can achieve a utilization of 75% of its flops while the higher one maxes out around 50%, then they are roughly equivalent. What's a typical utilization/profile of a GPU workload on a modern GPU nowadays?

I don't think it's impossible for a console GPU to be architected to have 100% utilization.
 
This is all quite fascinating, so I thought I would chime in and try to add a little strategic analysis to the discussion. Fair warning, I am not a technical person. However, I am a strategy professional.

I will take Orbis first.

The latest information and rumors line up fairly well with Sony's vision, capabilities, and limitations. The only questionable item is the cost of the memory, which I do believe was raised due to competitive pressure from MS and developer requests.

Durango is far more interesting, since there are such wildly divergent rumors/information out there. I will first lay out the "facts" and assumptions and follow that with analysis/predictions.

Facts:
Powerpoint Leak
MS has publicly discussed the idea of forward compatibility
MS wants/needs to dominate the living room
Apple and Google are the true competitors
Gaming will drive early adoption and provide a differentiator to above competitors
Multiple media outlets have written articles about Xbox Surface tablets, supported by multiple MS sources; see the Verge article http://www.theverge.com/2012/11/6/3608432/xbox-surface-xbox-tablet-7-inch.
The same sources indicated that the June surface leaks did have the correct specs.


Assumptions:
Initial plans called for at least 680 level performance
Many, if not most, of the rumors with concrete numbers are true, for certain values of true.
Lots of others that I don’t have time to reference.

Analysis/prediction:
1. All software for the next box is scalable through forward compatibility
2. There are at least 4 SKUs being prepared
a. Set-top xbox
b. Xbox surface (if the 24th meeting confirms this, then the probability of the rest of this shoots through the roof)
c. Xbox next
d. Xbox Pro/server
3. The leaked powerpoint has most of the major initiatives in it

Description of SKUs
Set-top box – either a super-slim 360 SoC at 28 nm or some Jaguar APU. This is the basic cable-box alternative.

Xbox Surface – 7-inch tablet that plays Windows mobile games and can act as a terminal for content/games streamed from the Xbox next and the server. Has a Jaguar APU.

Xbox next – base games machine. Has most of the leaked specs – 1.2 TF GPU, Jaguar APU, 8 GB, secret sauce creating something roughly equal to Orbis. Target 1080p, 30 FPS, high image quality. Can play games and handle multimedia simultaneously, including streaming to 1 Surface. May have HW BC initially.

Xbox Pro – enhanced Xbox with more memory, hardware BC with the 360, and 2 GPU units; can play graphically enhanced versions of next-box games at 1080p+, 60 FPS, with enhanced IQ, and stream to multiple Surfaces.

This structure best fits the available data and provides a framework to achieve MS's goals for living-room dominance. Could they go in another direction? Absolutely, but given the available information, this seems to be the best read of it. The meeting on the 24th should be very telling.

I thought Surface was Nvidia Tegra? Oddly enough, the person in charge of the Tegra design has ample console experience... he led the design of the Atari Jaguar console. So now MS changes to AMD Jaguar CPU cores?

If Microsoft is going with AMD Jaguar they would be giving up control of the CPU... MS won't be allowed to own/control the x86 design. On top of the backwards-compatibility issue. And if they are switching from Power, why not ARM?

The Durango picture is very strange to me if it is using AMD x86 cores. It seems to make things so complicated.
 
I thought Surface was Nvidia Tegra? Oddly enough, the person in charge of the Tegra design has ample console experience... he led the design of the Atari Jaguar console. So now MS changes to AMD Jaguar CPU cores?

If Microsoft is going with AMD Jaguar they would be giving up control of the CPU... MS won't be allowed to own/control the x86 design. On top of the backwards-compatibility issue. And if they are switching from Power, why not ARM?

The Durango picture is very strange to me if it is using AMD x86 cores. It seems to make things so complicated.


I think Surface is a product line. There is RT, which competes with the iPad; Pro, which competes with ultrabooks; and supposedly Xbox Surface may be more of a $199-type tablet crossed with streaming media and gameplay, including full 720 games a la the Wii U tablet.
 
As a complete aside, be it known that only a minority of developers will know the actual final specs at this time, even though they will all be NDA'd.

Indeed. But he hasn't actually come out and said these are the specs of the nextbox or PS4. All he has done is give hints as to what to expect relative to the current gen and what has been leaked. Thus he hasn't broken any NDA.
 
Indeed. But he hasn't actually come out and said these are the specs of the nextbox or PS4. All he has done is give hints as to what to expect relative to the current gen and what has been leaked. Thus he hasn't broken any NDA.

Sure - my angle wasn't the NDA, which is anyone's own business to break or not and which I don't care about, but rather that he might not know the final specs.

To the extent that he is giving honest feedback relative to what his experiences or conversations related to the dev kits have been, there's certainly that, but everything else is grain of salt mode for me right now.
 
Nice, how do you know this?

Already answered
64 MB of eSRAM would cost about as much as 10 MB of eDRAM did in 2005, so what's the point?

Based on?

It's not CrossFire, and it's transparent to the developer.

If they're on two separate SoCs and each one has its own pool, let me guess: the blitter?


It's easy and elegant to print a SoC with a VTE inside and put two of them on the board.
A SoC is meant to contain the CPU, GPU and the rest of the microsystem, audio included, so what's so strange to you?

Again, why have a VTE on both SoCs just to handle audio? One would be enough.

A PS3 with a PS2 chip inside for BC says nothing to you?

Them removing it says something to me.

A lot of normal things seem incredible to you; why?

I think $400-450.

Nothing of what you're wishing for in Durango is "normal". So far, all the rumors that have come forward paint Durango as using one pool of DDR3. I honestly hadn't even heard of a 3-memory-pool + blitter setup until... well, you (and misterx... lulz).
 
Could someone please explain the relation between memory bandwidth, capacity and frame rates? Why would a frame be restricted to accessing a certain amount of GBs out of the full memory configuration?
RAM capacity represents how much content you have available. Each texture and object takes up so much RAM. Let's take a representative case where each car, person, and building consumes 10 megabytes of data.

RAM bandwidth represents how much content you can move to the GPU in a given time for it to draw on the screen. Thus with our example, 100 megabytes a second would let you draw 10 objects in one second. 200 MB/s would allow for 20 objects to be drawn in one second.

Framerate is how quickly you get the GPU to draw stuff, which clearly means the faster the bandwidth, the more you can draw in a frame. Let's take a 30 fps game. For our GPU to draw one object every frame, it is going to need to access 10 MBs of object data 30 times a second, for a needed bandwidth of 300 MB/s. If we have 600 MB/s, we can draw 2 objects a frame, or one object a frame twice as quickly at 60 frames per second.

This means a rule of thumb for BW consumption is that you have max bandwidth divided by frames-per-second for how much data you can access each frame. A console with 150 GB/s can provide the GPU with 5 GB/frame in a 30 fps game. A console with 200 GB/s can provide 6.67 GB/frame, which would mean 1/3 more content, whether more detailed textures, objects, or more scene variety.
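In code form, the rule of thumb is just (numbers taken from the example above):

```python
# Per-frame data budget: bandwidth divided by frame rate (illustrative numbers).
def per_frame_budget_gb(bandwidth_gb_s, fps):
    return bandwidth_gb_s / fps

for bandwidth in (150, 200):
    print(f"{bandwidth} GB/s at 30 fps -> {per_frame_budget_gb(bandwidth, 30):.2f} GB per frame")
# 150 GB/s -> 5.00 GB/frame; 200 GB/s -> 6.67 GB/frame, i.e. about 1/3 more per frame.
```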

Of course, in real life it's not that simple. ;)
 
I agree with you, but in my opinion you would need more than 32 MB and more than 102.4 GB/s to use the eDRAM as a general-purpose scratchpad memory. If we assume the current rumors are grounded in truth, the 32 MB at 102.4 GB/s external and 819.2 GB/s to the ROPs makes the most sense.

At least 32 MB of eDRAM is sufficient for 1080p with 2xMSAA. If Microsoft had gone for 14 MB of eDRAM on the 360, the number of games with 2xMSAA would have been much higher, in my opinion.

Silicon-wise, the cost of 32 MB of eDRAM would be pretty negligible (around 20 mm^2 at 28 nm). Rather than a daughter die, it will probably be on the same chip, but I believe the bandwidth will be as stated (819.2 GB/s for the ROPs and 102.4 GB/s to the rest of the SoC). 20 mm^2 worth of silicon costs much less than switching to GDDR5 and allows them to have 8 GB of main memory.

I don't know if this is the right place to discuss eDRAM's potential for next-gen consoles (I think a lot of threads and posts were dedicated to this in the past), but I do believe one thing about eDRAM: eDRAM is worth it ONLY IF you can include enough of it for your target graphics. But that's very expensive. Every time a console maker includes eDRAM in its console it's always the same problem: they want to reduce costs by avoiding fast main RAM, but they ALWAYS end up with terrible bottlenecks and problems from an insufficient quantity of eDRAM (PS2, Xbox 360)... so unless Microsoft comes up with 128 MB of eDRAM, it's just not worth it...

In short, I prefer faster, bigger, general-purpose main RAM anytime, anywhere, over an insufficient amount of eDRAM.
 
RAM capacity represents how much content you have available. Each texture and object takes up so much RAM. Let's take a representative case where each car, person, and building consumes 10 megabytes of data.

RAM bandwidth represents how much content you can move to the GPU in a given time for it to draw on the screen. Thus with our example, 100 megabytes a second would let you draw 10 objects in one second. 200 MB/s would allow for 20 objects to be drawn in one second.

Framerate is how quickly you get the GPU to draw stuff, which clearly means the faster the bandwidth, the more you can draw in a frame. Let's take a 30 fps game. For our GPU to draw one object every frame, it is going to need to access 10 MBs of object data 30 times a second, for a needed bandwidth of 300 MB/s. If we have 600 MB/s, we can draw 2 objects a frame, or one object a frame twice as quickly at 60 frames per second.

This means a rule of thumb for BW consumption is that you have max bandwidth divided by frames-per-second for how much data you can access each frame. A console with 150 GB/s can provide the GPU with 5 GB/frame in a 30 fps game. A console with 200 GB/s can provide 6.67 GB/frame, which would mean 1/3 more content, whether more detailed textures, objects, or more scene variety.

Of course, in real life it's not that simple. ;)

Fixed for accuracy ;-)

MOD: Thanks. Typed GB/s to habitual amount
 