Predict: The Next Generation Console Tech

Isn't the i7 reading from a single L2 cache, and not from all of them in parallel?

Don't know; it's a performance benchmark, so I'd assume it's testing things at full throttle, as hard as possible. But I'm not familiar with this benchmark, so it may or may not be.

I just tested code I wrote myself on my 2500K, and I can pull >60GB/s from the L2 without issue. That benchmark is single-threaded.

(Which makes sense, because the caches are not shared so reporting the aggregate bandwidth from all the cores makes no sense at all. It's not a number you can program against.)
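For reference, a single-threaded probe of that sort can be very small. Below is a minimal sketch in C, not the poster's actual code: the buffer size, pass count, and four-way unrolling are my assumptions. Build with optimizations, e.g. gcc -O2 bw.c.

```c
/* Minimal single-threaded L2 read-bandwidth probe (a sketch with assumed
 * sizes; not the benchmark discussed above). */
#include <stdio.h>
#include <stdint.h>
#include <stddef.h>
#include <time.h>

#define BUF_BYTES (128 * 1024) /* small enough to sit in a 256 KB L2 */
#define PASSES    (1 << 16)    /* re-read the buffer many times */

static uint64_t buf[BUF_BYTES / sizeof(uint64_t)];

int main(void) {
    const size_t n = BUF_BYTES / sizeof(uint64_t);
    /* four accumulators so the adds don't serialize the loads */
    uint64_t a0 = 0, a1 = 0, a2 = 0, a3 = 0;
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int p = 0; p < PASSES; p++)
        for (size_t i = 0; i < n; i += 4) {
            a0 += buf[i];
            a1 += buf[i + 1];
            a2 += buf[i + 2];
            a3 += buf[i + 3];
        }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    /* use the sums so the compiler can't delete the loop */
    printf("%.1f GB/s read (checksum %llu)\n",
           (double)BUF_BYTES * PASSES / secs / 1e9,
           (unsigned long long)(a0 + a1 + a2 + a3));
    return 0;
}
```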

Why are you comparing internal bus bandwidth in Cell to L2 cache?! Surely you should be comparing this to the ring bus bandwidth of Sandy Bridge (which I don't know offhand).

AFAIK the bandwidth is 32B per clock per agent at full clock speed, which makes the (completely unreachable) theoretical maximum, the one comparable to that 300GB/s number, about 844GB/s.
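For what it's worth, here's the arithmetic that gets you to ~844GB/s as a tiny C snippet. The agent count (8: four cores plus four cache slices) and the ~3.3GHz clock are my assumptions; the post only gives the 32B/clock/agent figure.

```c
/* Back-of-the-envelope check of the ~844 GB/s ring figure.
 * Assumptions (not from the post): 8 ring agents, ~3.3 GHz clock. */
#include <stdio.h>

int main(void) {
    double bytes_per_clk_per_agent = 32.0;  /* from the post */
    double agents                  = 8.0;   /* assumed: 4 cores + 4 L3 slices */
    double clock_hz                = 3.3e9; /* assumed full clock speed */
    double aggregate = bytes_per_clk_per_agent * agents * clock_hz;
    printf("theoretical ring aggregate: %.1f GB/s\n", aggregate / 1e9); /* 844.8 */
    return 0;
}
```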
 
I just tested code I wrote myself on my 2500K, and I can pull >60GB/s from the L2 without issue. That benchmark is single-threaded.

(Which makes sense, because the caches are not shared so reporting the aggregate bandwidth from all the cores makes no sense at all. It's not a number you can program against.)
Interesting.

AFAIK the bandwidth is 32B per clock per agent at full clock speed, which makes the (completely unreachable) theoretical maximum, the one comparable to that 300GB/s number, about 844GB/s.
Wow:oops:


pjbliverpool said:
The second link simply repeats what was mentioned earlier with regards to peak FlexIO transfer capacity, without any regard to what other off-chip communication might be required. The simple fact remains that the PS3 implementation of Cell has 25.6GB/s of main memory bandwidth (IIRC it's fixed at 12.8GB/s in either direction, but I may be wrong), and that's half the main memory bandwidth of an i7.
3.2 GHz
× 2 bits/Hz (double data rate)
× 20 (QPI link width)
× (64/80) (data bits/flit bits)
× 2 (unidirectional send and receive operating simultaneously)
÷ 8 (bits/byte)
= 25.6 GB/s (per Wikipedia)
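Spelled out as code, that breakdown is just a chain of factors; a quick sanity check using exactly the quoted numbers:

```c
/* Recomputes the quoted Wikipedia QPI figure; a sanity check only. */
#include <stdio.h>

int main(void) {
    double bw = 3.2e9         /* 3.2 GHz base clock */
              * 2.0           /* double data rate: 2 bits/Hz per lane */
              * 20.0          /* QPI link width in lanes */
              * (64.0 / 80.0) /* 64 data bits per 80-bit flit */
              * 2.0           /* send and receive simultaneously */
              / 8.0;          /* bits to bytes */
    printf("QPI peak: %.1f GB/s\n", bw / 1e9); /* prints 25.6 */
    return 0;
}
```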

Notice it says unidirectional. Those numbers are for the previous i7 processor; IIRC the new one has double the bandwidth, but unless they changed it from 2x unidirectional, only half will be available for reading: roughly 50GB/s aggregate, 25GB/s for reads.

Regarding Cell I don't know for sure, but I've read that the ~25GB/s is achieved while doing either all reads or all writes; intermingling reads and writes reduces it to ~20GB/s, a drop similar to what the i7 sees in the real world.

Why are you comparing internal bus bandwidth in Cell to L2 cache?! Surely you should be comparing this to the ring bus bandwidth of Sandy Bridge (which I don't know offhand).

A 32-KB instruction and 32-KB data first-level cache (L1) for each core [i7]
Well, given the paltry L1 cache size, data to be computed will most likely be housed in L2. (ed: thanks to the update it appears this is per core; as there are six cores, it may suffice)
Sure, that number is big, but if we used the Cell SPU local store bandwidth, which is what Cell uses for computing from local memory, it would seemingly far eclipse the computational bandwidth available to the i7 at L2. (ed: or it might not, given this latest update)


While we're comparing random numbers, why not look at L1 cache bandwidth while we're at it, which has been measured at ~600GB/s on older i7s:
Interesting, but there is a reason the local store is bigger than that, and if size affects performance it may explain the difficulties the i7 faces in synthetic benchmarks, even overclocked.

And like the last time you posted this I'll ask again: was that measuring single- or double-precision performance?
Real-time physics does not seem to necessitate double precision, AFAIK; if it does, then you've got a point there.

I'm not sure in what way his comparison is relevant. Actually, one has nothing to do with the other.
By the way, SnB has a stated 394 GB/s of aggregate bandwidth to L3, and it offers coherency, etc. (so the actual traffic has to be higher).
I'd assumed local store access was similar to L2 access in terms of computational cost. If L3 access is as efficient, or can serve to stream into L2, and L2's performance is sufficient for approaching theoretical throughput, then you have a point.

The i7 benchmark is called IntelBurnTest; let's not assume it is some lousy unoptimized code tuned for some obscure ancient CPU. It's had official Intel support for Core i7 processors for years, and was recoded from the ground up not so long ago.

While running that benchmark the i7 is said to see a 20+°C rise in temperature, and the processor may even fail. Even so, it cannot achieve what Cell sustains without burning up.

As far as I can see, it is very likely Cell needs both the local store size and the ~25GB/s bandwidth to attain its sustained 200 GFLOPS single-precision performance.

If the new i7 QuickPath is unidirectional like the old one, the available read bandwidth is ~25GB/s, the same as Cell's. But the memory likely has higher latency.

1600MHz DDR3: ~10ns latency, vs. XDR: 1.25/2.0/2.5/3.33ns request packets.

ed: If the above is true, it is unlikely the latest i7 will be able to sustain substantially higher performance than Cell at the tasks Cell was designed for, assuming bandwidth plays a critical part in Cell sustaining that performance.
 
Here are some interesting comments about developer-optimized code on the old 4-core i7 for a mid-2009 game.
1500 boxes and 200 ragdolls "maxing out" our 4 core / 8 thread Intel i7 processor. -HardOCP
My opinion is that GPGPU has proven to be a huge red herring. HardOCP has been posting videos of tech demos from the Infernal Engine all week. The developers narrating are very emphatic on the point that it is far preferable to have your GPU processing graphics. Their physics engine, which is very impressive, relies on heavily threaded, CPU-based processing. The results are pretty cool, but they still point out that the PS3 version is the one they like to demo because it can simulate the most objects. Even better than their PC build with 8 threads on an i7 platform. -Brad Grenz
Keep in mind that on the current PS3 one of the SPUs is reserved (for OS purposes), and if this is like other PS3 games, the rest of Cell is not fully dedicated to physics but is likely also aiding the frail GPU.

If the above quotes are true, we can assume that a fraction of Cell is enough to out-muscle a 4-core i7 at physics in game code optimized by a professional development team.

The question is: could the latest i7 outperform the full Cell at its own game? And if so, by how much and at what cost (burning 100+W with a high-speed fan attached)?
 
Well, because I can discard rumors about specs, but I can't discard the rumor that the thing, either a GPU or a SoC, has taped out.

It will take a lot to convince me that AMD would have handed over GCN IP quite a long time ago, when their own chip just launched, whether for a standalone GPU or in a SoC.
They are not even using GCN for their upcoming APU, a critical segment in their strategy.

Xenos, with its unified shader architecture, launched one and a half years earlier than the Radeon 2900.

And why would MS or SONY give Nvidia or AMD bags of money for old tech?

Either way, a 6850 is more likely than a 7850, if you see what I mean. Or better, qualify the option by number of SIMD arrays and by architecture.

AFAICT the 7000 series is more efficient per mm² of die area, per watt spent, and per byte/s of bandwidth.

I will give you my answer for why something akin to the 6670 could be considered, and it's simple: it's dirt cheap and mass production is not a problem.

Price != cost. 6670s are cheap because their performance is pitiful compared to the alternatives. You should look at die size; cost correlates directly with that.

Cheers
 
Xenos, with its unified shader architecture, launched one and a half years earlier than the Radeon 2900.

And why would MS or SONY give Nvidia or AMD bags of money for old tech?
Because they can't produce competent hardware themselves?
AMD offers the best bang for the buck; they are in a strong position to negotiate, since in this regard even their older tech beats Nvidia's offerings.
AMD is in better shape now; they have no incentive to let their best tech out into the wild.

Honestly, I can't really think of a scenario where MS or Nintendo comes to AMD and simply gets whatever they want. AMD knows what Nvidia offers. I believe both MS and Nintendo would be happy to get a good price for whatever they can get.

Looking at it the other way around: GCN just launched, and it was unproven when MS or Nintendo made their decision, which I'd say was a while ago.
Best case is that AMD agrees, but asks a significant premium for GCN or the VLIW4 architecture.

AFAICT the 7000 series is more efficient per mm² of die area, per watt spent, and per byte/s of bandwidth.
Disputable: a 57xx at 28nm would be significantly tinier and consume less power than its current incarnation at 40nm.


Price != cost. 6670s are cheap because their performance is pitiful compared to the alternatives. You should look at die size; cost correlates directly with that.
Cheers
Not sure where you're going here. The real HD 6670 is cheap because of many things, including, I guess, the cooling solution, board layout, RAM speed, etc., as well as market positioning. I could see the price sliding, as we should see a ~$100 7750 in a few months.
Also, Nvidia and AMD don't compete directly in this price range. The GTS 450 performs better and is more expensive, but it's almost twice as big.

Speaking of the "ROPless" chip I described, it would instead be cheap because it's tiny. You're likely to get a lot of functional chips out of a wafer. As a side effect, you remove a possible constraint on production.

Think about it: we heard that the chip/SoC could be in production at both IBM and GF, which sounds like they want to produce a lot early in the product's life. I expect that chip to use a 32nm SOI HKMG process (EDRAM could still be an option, though production cost gets higher). If they then produce a super tiny GPU at TSMC, they would be able to have enough parts to match the production of IBM and GF.
 
How does AMD benefit from selling older tech over newer tech to MS? I expect they'll negotiate a price and customize a part (perhaps just a modification of Tahiti, Pitcairn, some future product or whatever, but 6670 (Turks) makes absolutely no sense) to best suit Microsoft's needs (price, power and performance). AMD in this deal is selling IP, design and technical expertise, not chips or wafers.
 
If anything, it's in AMD's best interest to get MS to take their newest tech. If AMD plans to use GCN for 2-3 years and the next Xbox ships within its first year of life, it would be smart for AMD to push it, because then every next-gen Xbox game would be optimised for GCN in a way that it wouldn't be for any other piece of tech in the PC world.

AMD would have to pay out the backside for a dev program as good as that.

It's the same with Nintendo. AMD is better off selling them a low-end version of GCN over anything else, because then all Wii U games will also be tuned for it.
 
How does AMD benefit from selling older tech over newer tech to MS? I expect they'll negotiate a price and customize a part (perhaps just a modification of Tahiti, Pitcairn, some future product or whatever, but 6670 (Turks) makes absolutely no sense) to best suit Microsoft's needs (price, power and performance). AMD in this deal is selling IP, design and technical expertise, not chips or wafers.
They keep their best IP for themselves, that's it. Trinity still doesn't include GCN. I see no incentive for them to let another company launch their IP before they do, stealing their thunder.
Even if they were to green-light that, why would they sell this tech at the same price as their older IP when the competition (Nvidia) can't match it? AMD is in a position of strength; I don't expect either Nintendo or MS to get whatever they want out of any salespeople in that situation. The best-case scenario for MS and Nintendo is that AMD asks a significant premium for its newer tech, possibly too much of a premium, and you end up with Nintendo using RV7xx and MS using Northern Islands.

Turks, and Northern Islands in general, are pretty good products; I don't see how it makes no sense.
Look further at HD 57xx vs HD 77xx: a 50% increase in transistor count for a 50% increase in performance? In a few specific cases maybe; if you take into account the difference in clock speed between the 5770 and the 7770, the picture is not that pretty IMHO.
Northern Islands has more "fat" than the 5770, but it comes with benefits.
 
AMD isn't in a position of strength vs MS. Period. They have compelling products for a console, and experience, but that doesn't mean there isn't competition for the hundreds of millions of dollars such a contract is worth.

And keeping their best IP for themselves, why? To do what with it? Isn't the idea to make money with it? I'm sure the negotiated deal includes all sorts of competition clauses whereby MS won't be selling GPUs that compete with their products. Not that it will matter when the next box launches because any of the parts you're talking about from AMD will be EOL.
 
AMD isn't in a position of strength vs MS. Period. They have compelling products for a console, and experience, but that doesn't mean there isn't competition for the hundreds of millions of dollars such a contract is worth.

And keeping their best IP for themselves, why? To do what with it? Isn't the idea to make money with it? I'm sure the negotiated deal includes all sorts of competition clauses whereby MS won't be selling GPUs that compete with their products. Not that it will matter when the next box launches because any of the parts you're talking about from AMD will be EOL.
Sorry, I never post car comparisons, but there is clearly an issue here about how the price of something is determined.

You have two engine manufacturers; one provides the same performance as the other but at 33% lower cost.
Both manufacturers have multiple generations of products, but the difference is consistent across those generations.

What you guys are saying is that a company that wants to produce a car and goes to the manufacturer whose engine IP provides a 33% reduction in production cost will, de facto, be offered that manufacturer's latest IP without any premium, because they have no incentive to charge one?

:LOL: I don't believe it works that way; actually, I'm pretty sure no economic model expects things to work that way.

How it works is that the engine provider is well aware of the other engine company's product portfolio. They have salespeople for that. Given the competition's offerings and their own portfolio, they will set price ranges for their existing IP and try to get the most out of it during the negotiation. They are more than aware that they have a competitive advantage over engine company B, and they had better get the most out of it, because otherwise they are fired.

So here is how the negotiations turn out: the salespeople leverage their portfolio to have you pay more than you want to pay. The car company will also try to get the best deal, but they know they are in the weak position, as they want one of those engines.

In concrete terms, the salespeople will ask quite a premium for their latest-generation model. You can pay it if you want. Otherwise, if you want a better deal, you will end up negotiating a sweeter deal on one of their previous-generation engines.

Salespeople don't give a shit whether it makes sense to produce an older IP or not.

So I don't see how MS is in a position of strength. "Period" is a trolling argument.
"Evergreen", "Southern Islands", "Northern Islands" and "GCN" IP have no price by themselves. Negotiation, in the more global context of what the market offers (see engine company B), will define a price for whichever IP the car manufacturer ultimately chooses. AMD has no reason to let go of their best IP without a premium.

Sorry again, but that's how things work in the real world.
AMD is not ATI crumbling under debt, but they still need to get the most out of a license, and they are the leader. What do they have to leverage during the negotiation but their IP portfolio?

That's not to say GCN can't be chosen. The car company can agree to pay a bit more than they wanted, less than what the salespeople were asking, and say "I can go this far for your latest-gen engine", and the salespeople comply with smiles, dreaming of those extra pennies per engine sold and how their bonuses will skyrocket. It could be the same at the car company, which may have played it well and ultimately paid less than it was willing to give.

Still, the car company may just pay what they wanted to pay, and it may not be enough for the latest-generation engine, because sometimes life sucks. The same process applies as above; the salespeople may do well, or the buyer may get a sweeter deal, but it's not written in advance.

Either way, the salespeople come in with a really good deal based on a previous-generation engine, within the budget of what they think their customer is willing to pay. Then they leverage their latest generation and its shiny cylinders to get the customer to pay more.

etc. etc.

The car company will come with a list of requirements, usually more than just the exact engine they want, though they could come with what they think is the best deal. Ultimately, they negotiate.


Short story made long: a world where a company is not trying to make the most of its latest-generation IP is not our world.

--------------------
And even if AMD were angels doing charity business, there is also the possibility that GCN doesn't offer the most bang per transistor for the intended use of the product, but that's a completely different issue.
Or that MS or Nintendo were iffy about what was unproven IP at the time of the decision.
-------------------
 
Why would AMD be competing with MS?

Is the 7970 competing with the Xbox 360?

They are two different markets, and the chip in a console is static while AMD's lineup is constantly changing.

For AMD, getting $5 or $10 a console for IP they already designed is a huge boon. Add in the fact that it could be a design they will use for 3 more years and that will now be coded for by the huge majority of devs, and being in the console could make them even more money through better support and performance on PC-released products.

For AMD, the GCN, VLIW4 and VLIW5 IP is all already designed, so it's no different to them price-wise; MS may give them more for GCN than VLIW4, however, especially if it's more efficient. Once AMD finishes the design for MS, they are largely done and will just expect payments for the next decade or so.

The Xbox 360 was huge for AMD/ATI, and I'm sure the next Xbox will be even bigger.
 
By 2013 GCN gets into every AMD APU, even the low-power ones that succeed the Bobcat-based chips, right down to a tablet variant.

This can be seen on the roadmaps here: http://www.techpowerup.com/159870/A...2013-Client-Roadmap-Big-Focus-is-on-APUs.html

So for some reason AMD only does VLIW5/VLIW4 at 32nm and 40nm, but everything at 28nm will be GCN. Maybe it's coming later than we'd like, but that's what happens with processes arriving late and Trinity having presumably been designed a while ago.

This may be an indication that any 28nm console AMD GPU is GCN. (If it's an APU: GloFo or TSMC, by the way?)
 
By 2013 GCN gets into every AMD APU, even the low-power ones that succeed the Bobcat-based chips, right down to a tablet variant.

This can be seen on the roadmaps here: http://www.techpowerup.com/159870/A...2013-Client-Roadmap-Big-Focus-is-on-APUs.html

So for some reason AMD only does VLIW5/VLIW4 at 32nm and 40nm, but everything at 28nm will be GCN. Maybe it's coming later than we'd like, but that's what happens with processes arriving late and Trinity having presumably been designed a while ago.

This may be an indication that any 28nm console AMD GPU is GCN. (If it's an APU: GloFo or TSMC, by the way?)
How is this relevant to what I said, whether you consider the negotiating part or simply the fact that manufacturers may find better bang per transistor in older designs?
AMD will use the IP they develop; that's pretty obvious.
There is no valid reason why any of AMD's VLIW5 architectures can't be implemented on a 28nm process if somebody wants to do so.

Eastman, you are purposefully misreading what I wrote; it's becoming Below3D here, and I'm seriously considering actively reporting it, which I usually never do. What you said is that IP has no intrinsic value (like pretty much everything; that's not how a price is actually determined by the market or in a negotiation) and so one should not try to make the most of one's IP. That's not how it works, full stop, for anybody selling IP or anything else for that matter.

Are you purposefully ignoring that I said GCN could still be used, but that clearly there will be negotiation, which has only one end purpose: the price, whether you buy the IP or license it?

Nor do you consider what I just wrote again to Blackowitz: GCN is more efficient, etc., but it comes at a cost of 50% more transistors if you compare the 5770 to the 7770.

Why would Nintendo supposedly use an RV700 design in your world? I'll tell you: it's what Nintendo considers the best bang for the buck, both in licensing/IP costs and in bang per transistor, which also translates into production costs.

Either way, I'm a bit tired of writing long posts only to get that kind of answer/attitude. Next time I'll pass and use other means.
 
How is this relevant to what I said, whether you consider the negotiating part or simply the fact that manufacturers may find better bang per transistor in older designs?
AMD will use the IP they develop; that's pretty obvious.
There is no valid reason why any of AMD's VLIW5 architectures can't be implemented on a 28nm process if somebody wants to do so.

Eastman, you are purposefully misreading what I wrote; it's becoming Below3D here, and I'm seriously considering actively reporting it, which I usually never do. What you said is that IP has no intrinsic value (like pretty much everything; that's not how a price is actually determined by the market or in a negotiation) and so one should not try to make the most of one's IP. That's not how it works, full stop, for anybody selling IP or anything else for that matter.

Are you purposefully ignoring that I said GCN could still be used, but that clearly there will be negotiation, which has only one end purpose: the price, whether you buy the IP or license it?

Nor do you consider what I just wrote again to Blackowitz: GCN is more efficient, etc., but it comes at a cost of 50% more transistors if you compare the 5770 to the 7770.

Why would Nintendo supposedly use an RV700 design in your world? I'll tell you: it's what Nintendo considers the best bang for the buck, both in licensing/IP costs and in bang per transistor, which also translates into production costs.

Either way, I'm a bit tired of writing long posts only to get that kind of answer/attitude. Next time I'll pass and use other means.
AMD is not in a position of strength... As I recall, their last quarter, they lost hundreds of millions of dollars.

Oh yes, they lost $177 million on revenue of 1.69 billion. They have every incentive to offer the best deal and get the most they can out of any technology license, and the way to do that is to offer the best tech they have, or exactly what the partner is looking for. Trust me, as NVidia found out, it's much better to be known as a good partner, because it's not just this deal you're negotiating, it's all the future business it'll generate too.
 
AMD is not in a position of strength... As I recall, their last quarter, they lost hundreds of millions of dollars.

Oh yes, they lost $177 million on revenue of 1.69 billion. They have every incentive to offer the best deal and get the most they can out of any technology license, and the way to do that is to offer the best tech they have, or exactly what the partner is looking for. Trust me, as NVidia found out, it's much better to be known as a good partner, because it's not just this deal you're negotiating, it's all the future business it'll generate too.
Hmm, disagree: all their IP since Evergreen offers far more performance than Nvidia's offerings.
As you say, there is nothing else on the market; they pretty much have a free run. If they can't even get a reasonable deal for themselves in this situation, they are as good as dead. Microsoft and Nintendo have to use their technology/IP.

So it comes down to negotiations, which is different from ripping off your partner/customer. What are negotiations about? Price, and what you get for that price.
Especially in their situation, AMD needs to get as much money as they can out of selling or licensing an IP.
The last thing AMD, or any company in this situation (basically a contract won in advance), wants is to be caught with their pants down and get ripped off on something nobody but them can provide to their customer.

So there will be negotiations (I guess they are already behind us), and for good reason. AMD is doing business and so is MS; MS has no reason to overpay, and AMD has no reason to undersell what it has (especially as MS has no good alternative). It's pretty much what happens all the time in business. There is no reason to consider an agreement a bad one just because neither side (vendor or buyer) gets the best at the best price; usually there are compromises here and there on both sides, as long as the requirements are fulfilled.

Did MS quit IBM after out-of-order execution did not make it into Xenon? Were they bad business partners? Will MS pass on them this time around? (Shamelessly trying to pull info out of your nose :LOL:.)

Anyway, the main point is that, as with many things, getting GCN is not automatic or free; it's part of a larger process. I guess MS came to see AMD with a list of requirements that included GPU feature set, GPU size, projected performance, etc., and then software support, engineering teams, etc. I guess AMD came back with different offerings, which may or may not have included GCN, and then there were price negotiations. The architecture is only one part of the deal, and there's no reason to discuss price based on it alone.

Or you can tell us that MS gave AMD a fixed budget and said that's it, in which case there isn't much point discussing price (another shameless attempt to get information out of you). That case is different from a more practical POV: looking at, say, the 5770 and the 7770, and setting aside the improved video engine, support for DirectX 11.1 (irrelevant for a console), and support for PCI Express 3, I don't feel that GCN offers that much bang per transistor. There is a 50% increase in transistors, and even allowing for the increase in clock speed the whole thing is not amazing. So the best offers from AMD may once again not include GCN, and AMD will have to do its best within fixed costs and human resources.

Then there are timelines, proven vs. unproven products, etc. If something is now in production for dev kits (whatever the specs are), I expect the decision was made a while ago, when GCN would still have been a hazy project. As MS will want something custom, dealing with a well-known architecture may have been a strong argument: no "OoO can't in fact make it into the chip" like what supposedly happened with Xenon. Even if that is an urban legend, I can still see its relevance to the argument.

Either way, I can't see how GCN should be a given, whether versus an older architecture that has proved brilliant or versus something really custom (changing a proven, well-performing design too much sounds risky, depending on how many resources you throw at the design).
 
For the Nintendo GPU, sure: they'll use one form of VLIW or the other and get a shorter time to market (heck, are they on 40nm?).

Transistors get cheaper at 28nm; you may be willing to spend more of them to extract more performance and features from your power and bandwidth budget. I assume a 128-bit memory bus as a maximum.
Actually, the mobile space is going that way, in a seemingly absurd manner: you get 4+1 cores in the latest Tegra, with many specialised blocks. At any particular time most of the silicon is idle and power-gated off. Wasting transistors is the business model there.

Looking at raw numbers too: there may be +50% transistors, but the die is 35% smaller going from the 5770 to the 7770. So small that a 20nm shrink might end up pad-limited, I don't know.
Last but not least, there was that marketing slide from AMD showing ugly VLIW assembly code on the left and nice clean GCN assembly on the right (cheating with ugly COBOL-like uppercasing for the older code, but well). GCN wins on the decade-long software side of things.
 
I don't feel that GCN offers that much bang per transistor. There is a 50% increase in transistors, and even allowing for the increase in clock speed the whole thing is not amazing. So the best offers from AMD may once again not include GCN, and AMD will have to do its best within fixed costs and human resources.

This is only half true. The 7770 matches the 6950 in tessellation benchmarks AFAIK, and the 6950 has 2.6 billion transistors (1.1 billion more than Cape Verde).

I don't expect full GCN in either the Wii U or the next Xbox, but I would be disappointed if they didn't use some of its tech.
 
I don't expect full GCN in either the Wii U or the next Xbox

Why not? Probably not the Wii U, I suppose, since I'm still going with the R700 rumors there, but the next Xbox reasonably shouldn't be out until 2013 at the earliest.

The question of whether VLIW or GCN is "better" for a console is one I'm not sure about, though. I wonder what the specs, power usage, and die size of a 28nm 6970 would look like, and how it would compare in a console to a GCN part of the same size?
 
Add in the fact that it could be a design they will use for 3 more years and that will now be coded for by the huge majority of devs, and being in the console could make them even more money through better support and performance on PC-released products.

Yep, I see that as a huge advantage for AMD. Having the new architecture in the next console would give them quite a nice advantage as the effort of most developers trying to get the most out of that design would also spill over to the PC realm...
 