Predict: The Next Generation Console Tech

PS - I will be ticked if Loop really is the next Xbox! But there is a thread to discuss that rumor and our feelings.
The only place I've ever seen the codename "loop" is that rumor site.
(Oh, and it's what we called the main screen of the Kin)
 
How much work is required to change the interface in a memory chip?
Sony owns fabs; could those be used to produce their own memory chips?

I was thinking that, logically, generic GDDR5 should be less expensive than a custom production run of XDR2, making XDR2 very improbable. But... if we suppose for a moment that Rambus actually delivered on their promises (twice the performance per pin, easier board layout, fewer board layers, lower power), Sony might have a big advantage if they can keep an XDR2 production line secret, preventing competitors from piggybacking on that production and giving them the edge they need.

With XDR being EOL, and no memory manufacturer wanting to make XDR2, there's not much else in sight for Rambus. Their technology has been there since 2005, but it's not produced because nobody asks for it, and nobody asks for it because it's not sampled. Let's say Sony goes to Samsung and asks them for an XDR2 chip. Sony would need, say, half a billion units for the PS4; that's a LOT of wafers and could make the initial investment in the masks worth it - it's a big contract. Sony would also be able to negotiate a very low licensing cost from Rambus, squeezing their balls. Rambus would have a strong incentive to accept Sony's terms, because without a contract with Sony, XDR2 is essentially dead. Memory producers haven't wanted to make XDR2 since 2005, so a contract requiring the XDR2 production to be kept secret isn't an impediment for any of them.

I mean, going to a memory manufacturer and asking "we need half a billion of your memory chips, but with this XDR2 macro instead of GDDR5, can we work something out?" is probably a lot of work, but it's not like you're asking them to design a complete CPU or something.
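
A minimal sketch of why the per-pin claim matters for cost, taking the "twice the performance per pin" figure at face value: for a fixed bandwidth target, doubling the per-pin data rate halves the number of data pins, and with them the traces and board layers. The data rates and bandwidth target below are illustrative assumptions, not vendor specs.

Code:
# Rough bus-width arithmetic -- the per-pin rates are assumptions, not vendor specs.
GDDR5_GBPS_PER_PIN = 5.0      # assumed mid-range GDDR5 data rate
XDR2_GBPS_PER_PIN = 10.0      # assumes the "twice the performance per pin" claim
TARGET_BANDWIDTH_GBS = 160.0  # assumed bandwidth budget for the console

def data_pins_needed(target_gb_per_s, gbps_per_pin):
    """Data pins required to hit a bandwidth target at a given per-pin rate."""
    return target_gb_per_s * 8 / gbps_per_pin

print(data_pins_needed(TARGET_BANDWIDTH_GBS, GDDR5_GBPS_PER_PIN))  # 256.0 -> 256-bit bus
print(data_pins_needed(TARGET_BANDWIDTH_GBS, XDR2_GBPS_PER_PIN))   # 128.0 -> 128-bit bus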
 
I'm quite intrigued by the idea of some sort of on-die cache with the GPU. I mentioned the XDR2 idea, but I haven't the knowledge to comment on what size of cache would be worthwhile. The 360 had 10MB on a daughter die, which meant 256GB/s of bandwidth for some simple things like Z/stencil, AA, etc. Is that right?

Yes.

I also read, maybe in an article here from Dave Baumann, that if you moved the cache completely on-die, the whole GPU gets access to that bandwidth? Is that correct?

sort of...

So would putting, say, 20MB of eDRAM on that Tahiti GPU be beneficial?
Then you could just unify the rest of the system with some high-density/cheap GDDR3.

The big deal isn't really which parts of the GPU can touch the eDRAM, it's which GPU workloads benefit from a small pool of very fast memory. The framebuffer is essentially an ideal fit for this -- what you want is very fast access to a data set whose size mostly depends on the screen resolution, and even at full HD there's really no need for more than 20MB. (And here's when the deferred rendering crowd lynches me.:mrgreen:)
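
For a sense of scale, a minimal back-of-the-envelope sketch of a forward-rendered 1080p framebuffer, assuming an ordinary RGBA8 colour target and a D24S8 depth/stencil buffer:

Code:
# Back-of-the-envelope framebuffer size at 1920x1080, no AA.
WIDTH, HEIGHT = 1920, 1080
PIXELS = WIDTH * HEIGHT        # 2,073,600 pixels

COLOR_BYTES = 4                # RGBA8 colour
DEPTH_BYTES = 4                # 24-bit depth + 8-bit stencil

size_mb = PIXELS * (COLOR_BYTES + DEPTH_BYTES) / (1024 * 1024)
print(round(size_mb, 1))       # ~15.8 MB -- comfortably under 20MB

MSAA multiplies the per-pixel sample storage, which is why the 360's 10MB couldn't hold a full 720p 4xAA target without tiling.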

On the other hand, textures are pretty much the exact opposite -- the access patterns of textures are very hostile to caching. In the modern world of "every object has its own texture", when you sample a texel, that texel instantly becomes the least likely piece of memory to be accessed again until the next frame. And since you will be working through a large proportion of your texture pool per frame, having a small fast cache for textures doesn't really help you at all. Accordingly, the texture caches on modern GPUs are designed for spatial locality, not temporal locality, because when you read a single texel from a texture, it's quite likely that you *will* need the adjacent texels. So for texture units, you will always benefit from a wide path to a large pool, and you cannot really replace this with an eDRAM cache. At least until you can fit hundreds of MB of eDRAM on the die.
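
To make the spatial-vs-temporal point concrete: a single bilinear sample already touches a 2x2 footprint of neighbouring texels, which is exactly what a small, wide texture cache is built to exploit. A toy sketch of the access pattern (illustrative only, not any real GPU's cache logic):

Code:
# Toy illustration of spatial locality in texture sampling:
# one bilinear sample reads a 2x2 block of adjacent texels.
import math

def bilinear_footprint(u, v, tex_w, tex_h):
    """Return the four texel coordinates a bilinear sample would read."""
    x = u * tex_w - 0.5
    y = v * tex_h - 0.5
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    return [(x0, y0), (x0 + 1, y0), (x0, y0 + 1), (x0 + 1, y0 + 1)]

print(bilinear_footprint(0.25, 0.5, 1024, 1024))
# [(255, 511), (256, 511), (255, 512), (256, 512)] -- neighbours, fetched together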
 
(And here's when the deferred rendering crowd lynches me.:mrgreen:)
Yep. ;) Everything seems to be moving towards deferred rendering; at least some parts of the pipeline like light prepass. It's not safe to think of the FB as a small file any more. It's gonna be built out of lots of buffers and render targets which won't all fit in eDRAM concurrently unless there's a truckload of eDRAM, which would cost too much. eDRAM as a working space per buffer makes sense, with each buffer being written out to main RAM and then copied over for final composite, but then you have significant RAM BW consumption which is what the eDRAM is supposed to alleviate.
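
Rough numbers for why those buffers stop fitting: assuming a fairly typical deferred G-buffer layout (the formats below are illustrative picks, not any particular engine's), the per-frame targets already blow well past a 10-20MB pool before any MSAA:

Code:
# Rough G-buffer sizing at 1080p for a deferred renderer.
# The layout is an illustrative assumption, not any specific engine's.
WIDTH, HEIGHT = 1920, 1080
PIXELS = WIDTH * HEIGHT

targets_bytes_per_pixel = {
    "albedo (RGBA8)":         4,
    "normals (RGBA16F)":      8,
    "specular/gloss (RGBA8)": 4,
    "depth/stencil (D24S8)":  4,
    "light accum (RGBA16F)":  8,
}

total_mb = PIXELS * sum(targets_bytes_per_pixel.values()) / (1024 * 1024)
print(round(total_mb, 1))  # ~55.4 MB, with no MSAA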
 
Isn't that what I said? Possible hardware speculation is speculation.
That whole line wasn't speculation but console warring. Console A is more powerful than Console B. No actual hardware speculation involved, unlike the rest of this thread, which is looking at well-explained options even if they are unlikely or far-fetched (PS4 will be 100% raytraced graphics using a PowerVR OpenRL engine!). Had the original article had any hardware speculation in it, then it could have been considered. As it was, bkilian was right:
My point was that discussing the ephemeral concept of "power" (which as we've seen in the last 5 years, is a difficult concept to measure) in relation to rumors seems a little premature.
Power speculation isn't the same as hardware speculation. Hell, we can't even measure power on existing, well known systems, so how we're supposed to rationally discuss relative power on unknown boxes, I don't know! ;)
 
Right, sort of got that. So what you are saying is that for some tasks, such as texturing, the eDRAM would have to be significantly bigger, which wouldn't justify its cost. As some of the rumours focused on 64MB of L3 on the PowerPC core, would that be enough cache to be worth the cost?

Basically what I'm getting at is: is there a neat trick where you could hit every target you want (high bandwidth, 4+GB of RAM, a small pin count for better cost reductions in the future) without taking too much of the hardware budget?
 
Right, sort of got that. So what you are saying is that for some tasks, such as texturing, the eDRAM would have to be significantly bigger, which wouldn't justify its cost. As some of the rumours focused on 64MB of L3 on the PowerPC core, would that be enough cache to be worth the cost?

64 MB won't cut it, 500MB probably would. And that's just not manufacturable at present.

Basically what I'm getting at is: is there a neat trick where you could hit every target you want (high bandwidth, 4+GB of RAM, a small pin count for better cost reductions in the future) without taking too much of the hardware budget?
And the reason they won't do that, and that it wouldn't work, is that it would sacrifice texturing performance, and texture detail is one of those things that is very easily visible when you put two differently specced machines side by side.

I know everyone is moving to deferred rendering, and I still think it's probably not the right way to go for consoles. Having an eDRAM render target will essentially double the effective performance of the memory interface, in a box where a huge chunk of the total cost to manufacture for the console maker depends very much on the memory interface. I really do think that what they will do is tell people writing fancy renderers to suck it; a 20MB framebuffer is what you'll get.
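
For a hedged sense of the traffic an eDRAM render target takes off the external bus, here's a very rough estimate of raw colour/depth traffic at 1080p60; every factor is an assumption, and real hardware compresses a lot of this:

Code:
# Very rough framebuffer bandwidth estimate -- every factor here is an assumption.
WIDTH, HEIGHT, FPS = 1920, 1080, 60
PIXELS = WIDTH * HEIGHT

BYTES_PER_PIXEL = 4 + 4    # RGBA8 colour + D24S8 depth
OVERDRAW = 3.0             # assumed average layers rasterised per pixel
RW_FACTOR = 2.0            # read-modify-write for blending / depth testing

gb_per_s = PIXELS * BYTES_PER_PIXEL * OVERDRAW * RW_FACTOR * FPS / 1e9
print(round(gb_per_s, 1))  # ~6.0 GB/s; uncompressed 4x MSAA would roughly quadruple it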
 
Right, sort of got that. So what you are saying is that for some tasks, such as texturing, the eDRAM would have to be significantly bigger...
If you wanted to texture from eDRAM you'd need lots, but texturing is fairly well served by a normal RAM bus and the GPU's caches - it's not a huge BW consumer.

A 1080p FP16 buffer requires ~16MB with no AA, and that's excluding Z. So rendering even a single render target at a time into eDRAM is going to need a great chunk of the stuff, and you'll still be writing loads of render targets back and forth to RAM.

which wouldn't justify its cost. As some of the rumours focused on 64MB of L3 on the PowerPC core, would that be enough cache to be worth the cost?
Those are stupid rumours. 64MB of eDRAM on the CPU makes zero sense in a console.

Basically what I'm getting at is: is there a neat trick where you could hit every target you want (high bandwidth, 4+GB of RAM, a small pin count for better cost reductions in the future) without taking too much of the hardware budget?
Nope. ;) Hence the choices of varying compromises. Every possible solution is imperfect in some way and the hardware vendors have to make the most of them. The best chance looks like XDR2 at the moment, which is still in the vapourware phase.

I know everyone is moving to deferred rendering, and I still think it's probably not the right way to go for consoles.
Why? The visual advantages can't be disputed IMO. I think the console companies should focus on deferred rendering and design the hardware to maximise it.
 
Yep. ;) Everything seems to be moving towards deferred rendering; at least some parts of the pipeline like light prepass. It's not safe to think of the FB as a small file any more. It's gonna be built out of lots of buffers and render targets which won't all fit in eDRAM concurrently unless there's a truckload of eDRAM, which would cost too much. eDRAM as a working space per buffer makes sense, with each buffer being written out to main RAM and then copied over for final composite, but then you have significant RAM BW consumption which is what the eDRAM is supposed to alleviate.

Would 64MB of eDRAM, which I think is a reasonable target for next gen, be enough to alleviate a lot of these concerns?

Then I wonder if 28nm EDRAM exists or is in the near future, but anyway.
 
I see what you are saying: you're better off going a conventional bus/RAM route, as the amount of eDRAM needed would be massive, and in any case you would still be writing to main memory anyway.

One thing's for sure: if they intend to make these consoles the centre of entertainment for the next 8 years or so, they will have to come up with a unique design... unless they just go completely conventional and slap on a 256-bit bus and 4GB of GDDR5. I suspect that is too expensive, but who knows.
 
Then I wonder if 28nm EDRAM exists or is in the near future, but anyway.

Here is a PR from Renesas from Dec. 2010.

http://www.renesas.com/press/news/2010/news20101209b.jsp

Renesas Electronics Corporation (TSE: 6723), a premier provider of advanced semiconductor solutions, today announced the development of a basic structure for embedded DRAM (eDRAM) highly compatible with standard logic circuit design assets (IP: intellectual property) of the next generation system LSIs at the 28-nanometer (nm) node and beyond.
 
It's a start, but... probably not close enough yet to meet the demands of a high-performance part (it'd be easier/sooner for mobile than a game console, of course). There's just not enough information.
 
It's a start, but... probably not close enough yet to meet the demands of a high-performance part (it'd be easier/sooner for mobile than a game console, of course). There's just not enough information.

You're right that it's definitely not enough info. I was just showing they've supposedly at least started work on it.
 
How viable would it be to have one (or several) PowerVR graphics cores in a console, since its rendering method is already based on TBDR? Tiling is already done for the 360's framebuffer, so what exactly does it give up compared to Nvidia's/ATI's approach?

From a business standpoint, perhaps a better case can be made for using PowerVR.
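
For anyone unfamiliar with the distinction: a TBDR bins geometry into small screen tiles, then resolves visibility and shades each tile entirely in on-chip memory before writing it out once, so external framebuffer traffic drops to roughly one write per pixel. A highly simplified, runnable toy of the idea (not PowerVR's actual pipeline; the tile size, data layout and "last triangle wins" rule are all stand-ins):

Code:
# Highly simplified toy of tile-based deferred rendering (TBDR).
TILE = 32  # assumed tile size in pixels

def bin_triangles(tris, screen_w, screen_h):
    """Pass 1: record which triangles touch which screen tile."""
    bins = {}
    for tri in tris:
        x0, y0, x1, y1 = tri["bbox"]
        for ty in range(y0 // TILE, min(y1, screen_h - 1) // TILE + 1):
            for tx in range(x0 // TILE, min(x1, screen_w - 1) // TILE + 1):
                bins.setdefault((tx, ty), []).append(tri)
    return bins

def render(tris, screen_w, screen_h):
    """Pass 2: shade each tile in 'on-chip' memory, write it out once."""
    framebuffer = {}
    for tile_xy, tile_tris in bin_triangles(tris, screen_w, screen_h).items():
        # In real hardware this tile buffer lives on chip; Z testing, blending and
        # AA resolve all happen here, so only finished pixels ever hit external RAM.
        tile_buffer = {"colour": tile_tris[-1]["colour"]}  # toy rule: last triangle wins
        framebuffer[tile_xy] = tile_buffer                 # one external write per tile
    return framebuffer

tris = [{"bbox": (0, 0, 100, 100), "colour": "red"},
        {"bbox": (40, 40, 200, 120), "colour": "blue"}]
print(len(render(tris, 1920, 1080)), "tiles written")  # 25 tiles written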

----------

I'm weird, I want something totally exotic like a gig of eDRAM, thousands of Series 6 Rogue cores, and a quad-core CBE w/ 32 SPEs.
 
Imagination Technologies unveils G6200 and G6400, first two GPUs based on PowerVR Series6
http://www.engadget.com/2012/01/10/powervr-series6-g6200-and-6400/

First announced in February of last year, Imagination Technologies has officially announced the licensing availability of its first two GPUs based on the Series6 platform. The PowerVR G6200 and G6400 each promise to bring low power graphics to unprecedented levels and are said to deliver up to 20 times more horsepower than the current generation while also being five times more efficient. In tangible terms, the Series6 GPU cores are capable of exceeding 100 gigaflops and are said to approach the teraflop range. All chipsets based on Series6 are backward compatible with Series5 and fully support OpenGL 3.x, 4.x and ES, along with OpenCL 1.x and DirectX 10. Further, specific models will also support DirectX 11.1 with full WHQL compliance.

...
 
Yeah, it's a nice release and all that, but realistically Rogue couldn't be used for a console; they are talking about later editions being able to scale up to 1 teraflop and DX11.1.

But the early versions will be nowhere near that, and besides, the high-end consoles will need multi-teraflop performance.
 
Yeah, it's a nice release and all that, but realistically Rogue couldn't be used for a console; they are talking about later editions being able to scale up to 1 teraflop and DX11.1.

But the early versions will be nowhere near that, and besides, the high-end consoles will need multi-teraflop performance.


If each core can exceed 100 GFLOPS, a Series6 with 32 or 64 cores will be multi-teraflop.
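
The arithmetic behind that, taking the announced per-core figure at face value (marketing numbers rather than benchmarks, and assuming the design really scales linearly to that many cores):

Code:
# Straight multiplication of the quoted "100+ GFLOPS per core" figure.
GFLOPS_PER_CORE = 100
for cores in (32, 64):
    print(cores, "cores ->", cores * GFLOPS_PER_CORE / 1000, "TFLOPS")
# 32 cores -> 3.2 TFLOPS, 64 cores -> 6.4 TFLOPS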
 