Predict: The Next Generation Console Tech

3dilettante · Apr 12, 2012

Heinrich4 said:
In my ignorance of the subject, I still think that "old" CPUs such as a customized Athlon II X4 (with some tweaks for consoles), die-shrunk to 32nm, could be more interesting for their reliability, especially with something custom added. Paired with an HD 5850 shrunk to 28nm (which dissipates "only" 151 watts on the 40nm process), it could offer more raw power (2.1 TFLOPS plus the CPU), even if there are problems with two memory controllers (DDR3 and GDDR5) for 4 GB (8 modules), and still remain below 200 watts.

The closest thing approaching an Athlon X4 on 32nm is Llano.
In fact the Athlon II X4 638 and 641 are 32nm Llano chips, albeit most likely salvage product.
Despite it being a power-optimized design, we see Llano pulling about 100W when approaching 3 GHz, and most blame probably falls on the CPU part.

A straight shrink of an X4 to 32nm would most likely have been non-functional, since shrinks aren't that simple anymore.
An X4 tweaked and shuffled about to make it remotely acceptable at 32nm with AMD's newest design methodologies is what Llano is.
The TDPs for the chip were way more sensitive to CPU clocks and power, and Llano's yields were initially terrible and are rumored to be not all that great even with AMD and GF stating the process problems have been significantly improved.

If Sony wanted a 32nm X4 that's not Llano, it would pay more money for AMD to dust off the pipeline it abandoned for good reasons. Llano isn't acceptable for a console now, and it is far better equipped to work at 32nm than an X4.

liolio · Apr 12, 2012

Heinrich, it's not even about raw power: there is a significant difference between marketing FLOPS and the real world.
Putting some architectural differences aside, we have 3 types of FLOPS in the GPU world.
You count FLOPS with:
vliw5
vliw4
scalar (Nv and GCN).

As I was saying to KB-somker about sites that swallow the marketing Kool-Aid with regard to the shrinking peak FLOPS figures in recent AMD GPUs and speak of "efficiency": I doubt they understand the difference in design.
The difference is not really efficiency; it's a real architectural difference. You can literally remove 20% (possibly a bit more on average) from the peak of a VLIW5 design versus a scalar one, so from 2.1 TFLOPS, for example, you go down to 1.68 TFLOPS.

It's not efficiency, it's design.
In previous AMD VLIW5 GPUs, the basic building block is not the stream processor (SP) touted by marketing materials, but a group of 5 ALUs.

Those ALUs are not all equal: you have 4 standard ALUs and one special one (in charge of trigonometry, etc.; the "RYS unit" as people call it here).

A SIMD is not 80 SPs acting in a vectorized fashion, not at all.
It is really 16 five-wide units acting in a vectorized fashion, which is why hardware.fr / behardware.com calls them 16 Vec5 (16 Vec4 for Cayman).

In fact those VLIW units are organized in bigger blocks, quads of VLIW5 units, with regard to the register files (a massive amount of register files).

So those 5-wide blocks are MIMD units "addressed" or "operated" (I can't find the proper term) in a VLIW fashion. It's up to the compiler to extract parallelism and make sure that those units are busy.
All 16 blocks in a SIMD receive the same VLIW instruction but work on different data.

At this point you will notice that most of the work done by a GPU is on four elements. Basically the fifth ALU, the transcendental one, is not there to be used all the time but to make sure specific operations execute fast. It was a cheap way to achieve that, as ALUs are cheap.
On average, utilization of a VLIW5 unit is 3.8 instructions per cycle, I believe (I should check, but it's a bit below 4).
There are reasons for that in the graphics workload itself, but there are also architectural ones: you have five ALUs, and the registers have to be accessible by all of them at the same time. This is complicated, but there are plenty of posts in the forum that explain why it sets a limitation on the design.

That's the reason why AMD first moved to a VLIW4 design. It removes headaches in the register port design (if I understand properly...) as well as the pressure on the compiler.
There was a trade-off: transcendental operations are slower. The thing is, on graphics workloads the IPC is mostly the same as in a VLIW5 design (and still below 4).

That's where marketing FLOPS kick in: the FLOPS were always given including the T unit and based on the SP count, as if all SPs were equal. BS: so-called hardware sites don't give a shit, they care about clicks, not accuracy. It was an irrelevant figure. Now AMD has moved away from it and is using marketing parlance for people who are not interested in tech (I can understand that some people are just gamers, and that's not a sin). In marketing parlance it's "more efficient".
Whereas this shrinking number of SPs and FLOPS has no impact, or a really marginal one, on the design; but when you have fed people SPs and FLOPS as the metric for performance, you have to come up with something.

For me it's not efficiency: the compiler can't extract more ILP (instruction-level parallelism) with the new design, it's just easier to avoid conflicts in register access. Neither can the hardware: VLIW is by design as dumb as can be.

Then why did AMD move to a scalar / pure SIMD design like Nvidia? Because in some situations (it doesn't happen much in graphics) the IPC that can be extracted by the compiler is way below 3.8.
By design it can go as low as 1 or 2. In effect, using marketing parlance, you don't have 80 SPs in your SIMD but 16 and 32 respectively... a massive hit in efficiency.
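
To put numbers on that, here is a minimal sketch in Python. The helper name `effective_sps` is made up for illustration; the 16 units per SIMD and the IPC values are the ones from the post. It simply multiplies the number of VLIW units by the operations the compiler manages to pack into each instruction.

```python
# Hypothetical helper: how many of the 80 "marketing" SPs in a VLIW5 SIMD
# actually do work, given the average number of ops packed per VLIW instruction.
VLIW_UNITS_PER_SIMD = 16   # 16 five-wide units per SIMD (figure from the post)
ALUS_PER_UNIT = 5          # VLIW5


def effective_sps(ipc: float, units: int = VLIW_UNITS_PER_SIMD) -> float:
    """Busy ALUs per clock = VLIW units * ops packed per instruction."""
    return units * ipc


if __name__ == "__main__":
    marketing_sps = VLIW_UNITS_PER_SIMD * ALUS_PER_UNIT  # 80
    for ipc in (1, 2, 3.8):
        busy = effective_sps(ipc)
        print(f"IPC {ipc}: {busy:g} of {marketing_sps} SPs busy "
              f"({busy / marketing_sps:.0%})")
```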

That's why AMD moved to a scalar / plain SIMD design. Extracting ILP in graphics was pretty easy, but as compute becomes relevant it turns into a double-edged sword. There are cases where there is simply not that much ILP to be extracted, and when that happens the architecture (even the refined VLIW4) fails.
With GCN, AMD gave up on leveraging instruction-level parallelism, that's it.

A nice effect shows up when comparing a CU (64 ALUs) and a Cayman SIMD (64 ALUs too): the former always achieves 64 operations per cycle (it's an incorrect way to put it, but that's pretty much the figure looking from a distance), whereas on average a Cayman SIMD will do 3.8*16 in graphics and can end up well below that in other cases. Even a VLIW5 design would not beat GCN; it would push 3.8*16, pretty much like Cayman.
EDIT: ALU utilization is a more correct way to put it, as instructions are likely to take more than one cycle to execute. So you have 100% ALU utilization on one side (plain SIMD) and 3.8/5*100 or 3.8/4*100 on the other. /EDIT
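
The utilization figures in the EDIT can be worked out in a few lines of Python. This is only a sketch: the 3.8 average is the figure quoted above from memory, and the names are invented for the example.

```python
# ALU utilization = ops actually issued per clock / ALUs available.
AVG_IPC = 3.8  # average ops per VLIW instruction in graphics (figure from the post)


def utilization(ops_per_clock: float, alus: int) -> float:
    return ops_per_clock / alus


designs = {
    "VLIW5 unit": utilization(AVG_IPC, 5),
    "VLIW4 unit (Cayman)": utilization(AVG_IPC, 4),
    "plain SIMD lane (GCN)": utilization(1, 1),
}

if __name__ == "__main__":
    for name, u in designs.items():
        print(f"{name}: {u:.0%}")
    # Per 64-ALU block: a Cayman SIMD has 16 VLIW4 units, a GCN CU has 64 lanes.
    print("Cayman SIMD ops/clock:", 16 * AVG_IPC)
    print("GCN CU ops/clock:", 64)
```

That works out to 76% for VLIW5 and 95% for VLIW4, against 100% for the plain SIMD case.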

I have no clear understanding of this low-level stuff, but that's the best description I can give you of "not taking marketing and FLOPS figures... at face value". Hope it helps.

Other members can correct approximations or things I got wrong if they want, or provide even more information.

Heinrich4 · Apr 12, 2012

3dilettante said:
The closest thing approaching an Athlon X4 on 32nm is Llano.
In fact the Athlon II X4 638 and 641 are 32nm Llano chips, albeit most likely salvage product.
Despite it being a power-optimized design, we see Llano pulling about 100W when approaching 3 GHz, and most blame probably falls on the CPU part.

A straight shrink of an X4 to 32nm would most likely have been non-functional, since shrinks aren't that simple anymore.
An X4 tweaked and shuffled about to make it remotely acceptable at 32nm with AMD's newest design methodologies is what Llano is.
The TDPs for the chip were way more sensitive to CPU clocks and power, and Llano's yields were initially terrible and are rumored to be not all that great even with AMD and GF stating the process problems have been significantly improved.

If Sony wanted a 32nm X4 that's not Llano, it would pay more money for AMD to dust off the pipeline it abandoned for good reasons. Llano isn't acceptable for a console now, and it is far better equipped to work at 32nm than an X4.

Excellent information, as you always send us.

But if a 32nm Athlon II X4 does exist in some form, in the shape of the Llano CPU cores, then perhaps it could best be part of this hypothesis of a CPU and a GPU apart.

And indeed your statements are accurate as always: the Athlon II X4 may not be the best choice. But as I said before ("it can be another CPU"), perhaps another CPU (Bulldozer, Bobcat, "Pile-something", etc.) could be interesting in this "non-APU-like" paradigm.

liolio · Apr 12, 2012

Heinrich4 said:
Excellent information, as you always send us.

But if a 32nm Athlon II X4 does exist in some form, in the shape of the Llano CPU cores, then perhaps it could best be part of this hypothesis of a CPU and a GPU apart.

And indeed your statements are accurate as always: the Athlon II X4 may not be the best choice. But as I said before ("it can be another CPU"), perhaps another CPU (Bulldozer, Bobcat, "Pile-something", etc.) could be interesting in this "non-APU-like" paradigm.

I think what he means is that AMD moved to Bulldozer. It's not like AMD has that many resources.
AMD focuses on two architectures: Bulldozer for high-power CPUs and Brazos (or whatever they are named) for low-power CPUs.

And implementing a chip in silicon is getting tougher and tougher, as well as more and more expensive. Sony would have to fund them with a hell of a lot of money to push an architecture they may have left behind for good reasons, no matter how BD v1 performs.

Whether CPU or APU (if not Llano), his point of view is that Bulldozer is to be expected on the CPU side. So for me he already answered your question.

Heinrich4 · Apr 12, 2012

liolio said:
Heinrich, it's not even about raw power: there is a significant difference between marketing FLOPS and the real world.
Putting some architectural differences aside, we have 3 types of FLOPS in the GPU world.
You count FLOPS with:
vliw5
vliw4
scalar (Nv and GCN).

As I was saying to KB-somker about sites that swallow the marketing Kool-Aid with regard to the shrinking peak FLOPS figures in recent AMD GPUs and speak of "efficiency": I doubt they understand the difference in design.
The difference is not really efficiency; it's a real architectural difference. You can literally remove 20% (possibly a bit more on average) from the peak of a VLIW5 design versus a scalar one, so from 2.1 TFLOPS, for example, you go down to 1.68 TFLOPS.

It's not efficiency, it's design.
In previous AMD VLIW5 GPUs, the basic building block is not the stream processor (SP) touted by marketing materials, but a group of 5 ALUs.

Those ALUs are not all equal: you have 4 standard ALUs and one special one (in charge of trigonometry, etc.; the "RYS unit" as people call it here).

A SIMD is not 80 SPs acting in a vectorized fashion, not at all.
It is really 16 five-wide units acting in a vectorized fashion, which is why hardware.fr / behardware.com calls them 16 Vec5 (16 Vec4 for Cayman).

In fact those VLIW units are organized in bigger blocks, quads of VLIW5 units, with regard to the register files (a massive amount of register files).

So those 5-wide blocks are MIMD units acting in a VLIW fashion. It's up to the compiler to extract parallelism and make sure that those units are busy.
All 16 blocks in a SIMD receive the same VLIW instruction but work on different data.

At this point you will notice that most of the work done by a GPU is on four elements. Basically the fifth ALU, the transcendental one, is not there to be used all the time but to make sure specific operations execute fast. It was a cheap way to achieve that, as ALUs are cheap.
On average, utilization of a VLIW5 unit is 3.8 instructions per cycle, I believe (I should check, but it's a bit below 4).
There are reasons for that in the graphics workload itself, but there are also architectural ones: you have five ALUs, and the registers have to be accessible by all of them at the same time. This is complicated, but there are plenty of posts in the forum that explain why it sets a limitation on the design.

That's the reason why AMD first moved to a VLIW4 design. It removes headaches in the register port design (if I understand properly...) as well as the pressure on the compiler.
There was a trade-off: transcendental operations are slower. The thing is, on graphics workloads the IPC is mostly the same as in a VLIW5 design (and still below 4).

That's where marketing FLOPS kick in: the FLOPS were always given including the T unit and based on the SP count, as if all SPs were equal. BS: so-called hardware sites don't give a shit, they care about clicks, not accuracy. It was an irrelevant figure. Now AMD has moved away from it and is using marketing parlance for people who are not interested in tech (I can understand that some people are just gamers, and that's not a sin). In marketing parlance it's "more efficient".
Whereas this shrinking number of SPs and FLOPS has no impact, or a really marginal one, on the design; but when you have fed people SPs and FLOPS as the metric for performance, you have to come up with something.

For me it's not efficiency: the compiler can't extract more ILP (instruction-level parallelism) with the new design, it's just easier to avoid conflicts in register access. Neither can the hardware: VLIW is by design as dumb as can be.

Then why did AMD move to a scalar / pure SIMD design like Nvidia? Because in some situations (it doesn't happen much in graphics) the IPC that can be extracted by the compiler is way below 3.8.
By design it can go as low as 1 or 2. In effect, using marketing parlance, you don't have 80 SPs in your SIMD but 16 and 32 respectively... a massive hit in efficiency.

That's why AMD moved to a scalar / plain SIMD design. Extracting ILP in graphics was pretty easy, but as compute becomes relevant it turns into a double-edged sword. There are cases where there is simply not that much ILP to be extracted, and when that happens the architecture (even the refined VLIW4) fails.
With GCN, AMD gave up on leveraging instruction-level parallelism, that's it.

A nice effect shows up when comparing a CU (64 ALUs) and a Cayman SIMD (64 ALUs too): the former always achieves 64 operations per cycle (it's an incorrect way to put it, but that's pretty much the figure looking from a distance), whereas on average a Cayman SIMD will do 3.8*16 in graphics and can end up well below that in other cases. Even a VLIW5 design would not beat GCN; it would push 3.8*16, pretty much like Cayman.
EDIT: ALU utilization is a more correct way to put it, as instructions are likely to take more than one cycle to execute. So you have 100% ALU utilization on one side (plain SIMD) and 3.8/5*100 or 3.8/4*100 on the other. /EDIT

I have no clear understanding of this low-level stuff, but that's the best description I can give you of "not taking marketing and FLOPS figures... at face value". Hope it helps.

Other members can correct approximations or things I got wrong if they want, or provide even more information.

I always enjoy your posts immensely, liolio.

I agree entirely that the design has a decisive influence on efficiency (pipes kept full or processing much more, lower latencies, fewer cycles, etc.) in these GPUs and APUs. But despite this plethora of "flops", VLIW5 (5850, if I'm not mistaken), VLIW4 (6950) and scalar (NV GPUs, AMD HD 7970; 5850/6970 only in double precision), I'm still interested in the "2.1 TFLOPS marketing" 5850, VLIW5 versus scalar (just a simple example... efficiency reaches something like 74% on these architectures according to Stanford), against an APU of about "1.25 TFLOPS marketing".

Speaking hypothetically, suppose Sony/MS don't much care about the development cost of a closed-box console (without bleeding resources like with PS360 before...). We may not know for sure (speaking for myself) the full real pros and cons of an APU with about 2.5 times more power than an A8-3850 (A8-3850 + HD 6670 seems to be what Sony and MS are settling on for their APU console). But if we look at tests and benchmarks, discounting the efficiencies of the closed-box world, I still think an APU is probably less interesting than a set with a separate CPU (Bobcat, enhanced Bobcat, Bulldozer, whatever-"pile", etc.) and a custom GCN GPU (lower clocks, etc.) at the same TDP... but that's just my 2 cents.

Heinrich4 · Apr 12, 2012

liolio said:
I think what he means is that AMD moved to Bulldozer. It's not like AMD has that many resources.
AMD focuses on two architectures: Bulldozer for high-power CPUs and Brazos (or whatever they are named) for low-power CPUs.

And implementing a chip in silicon is getting tougher and tougher, as well as more and more expensive. Sony would have to fund them with a hell of a lot of money to push an architecture they may have left behind for good reasons, no matter how BD v1 performs.

Whether CPU or APU (if not Llano), his point of view is that Bulldozer is to be expected on the CPU side. So for me he already answered your question.

Thanks a lot.

Yes, your statement is perfect, but I just wanted to reiterate that, no matter which AMD CPU model is used, I still have in mind that a CPU and a dedicated GPU would be better than an APU, even in a closed-box console... I have to remember that closed-box consoles have always required customization, unless Sony and MS follow the "off the shelf" model to reduce costs... and if that happened it would be unfortunate and a setback (even RSX was modified with more cache and access to XDR, likewise the Pentium III and NV2A on Xbox, etc.). That's my two cents...

Of course we are in the field of speculation, but I think it would be interesting to imagine whether a streamlined APU at the 1 TFLOP/1.25 TFLOP level (with its "marketing flops", VLIW5/4/scalar, "fewer cycles", etc.) in a closed-box console in 2013/2014 with 2GB of GDDR5 would really be so efficient that it offers superior results in games (at the same resolution, shader effects, FSAA, frame rate, etc.) compared to a mid-range 2011 PC with a Radeon HD 6970/GeForce GTX 580, 4GB of DDR3 and an AMD/Intel quad-core i7 CPU at 3.5+ GHz...

(Some developers want 10+ times more power/performance/memory than PS360.)

IllusionistK · Apr 13, 2012

Would the 360's Xenos and daughter die silicon give a real combined footprint of 262mm^2 for the GPU, or should you only consider Xenos? Even if we consider only Xenos, something like the Pitcairn XT (1280 SPs) could still be shoved in, as it's only 212mm^2 to Xenos' 182mm^2.

On the CPU side I can see a modern quad-core such as Steamroller or Piledriver, which would be <152mm^2 compared with Xenon's 172mm^2 footprint.

I wonder if Microsoft will take advantage of TSMC & GF's 20nm process and prepare for a shortage launch for 2013. It may give them the edge to pack in more.

Quad-core Piledriver, 3.2GHz
Pitcairn XT, 1280 SPs, 900MHz
2GB GDDR5, 5GHz

With the GPU alone, that gives us the 2.5TF needed to run Samaritan-quality and is relatively small and power efficient.
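
For reference, the usual way to get a peak figure like this is shader count x 2 FLOPs per clock (one multiply-add) x clock speed. A quick sketch in Python, assuming that convention; `peak_tflops` is just a name made up here. Note that at the 900MHz listed above it comes out nearer 2.3 TF, and it takes about 1GHz to pass 2.5 TF.

```python
def peak_tflops(shaders: int, clock_mhz: float, flops_per_clock: int = 2) -> float:
    """Peak single-precision TFLOPS, assuming one multiply-add (2 FLOPs) per SP per clock."""
    return shaders * flops_per_clock * clock_mhz / 1_000_000


if __name__ == "__main__":
    print(f"1280 SPs @ 900 MHz:  {peak_tflops(1280, 900):.2f} TFLOPS")
    print(f"1280 SPs @ 1000 MHz: {peak_tflops(1280, 1000):.2f} TFLOPS")
```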

Perhaps if power permits, we could have a Piledriver SoC w/ 7660D (384 SPs) + Pitcairn XT.

Of course this is all conjecture based on the 200w TDP... I'm hoping more for a 250-300w TDP.

tunafish · Apr 13, 2012

IllusionistK said:
I wonder if Microsoft will take advantage of TSMC & GF's 20nm process and prepare for a shortage launch for 2013. It may give them the edge to pack in more.

GF does *not* have the best track record on sticking to dates on its schedule. If they wait for 20nm, they might well have to wait quite a bit longer than they anticipate before they have shippable quantities.

Acert93 · Apr 13, 2012

Even if they could get 20nm, it sounds like it will initially be an expensive node. TSMC is spending a lot to expand fab space, and nodes do become cheaper as they mature, but out of the gate they are getting more and more expensive as the cost to bring them online rises. Going with a costly node with low volume, and possibly delays, could be a disaster. Honestly it looks like 2012 would be the window to get on 28nm to "pack" stuff in, and 2014 on 20nm. 2013 is essentially riding the tail end, when the node will be more affordable and allow a quick shrink in 18-24 months to help reduce costs.

Mianca · Apr 13, 2012

Didn't IBM show off 20nm wafers as early as January 2011?

Seems like they're about half a year ahead of GF.

AlNom · Apr 13, 2012

Doesn't mean they're anywhere near volume production capable.

Mianca · Apr 13, 2012

They don't need to go for volume production until a year from now.

They just need to make a few hundred working chips to power the final dev kits. Still plenty of time to ramp up volume production after that.

pjbliverpool · Apr 13, 2012

Heinrich4 said:
Of course we are in the field of speculation, but I think it would be interesting to imagine whether a streamlined APU at the 1 TFLOP/1.25 TFLOP level (with its "marketing flops", VLIW5/4/scalar, "fewer cycles", etc.) in a closed-box console in 2013/2014 with 2GB of GDDR5 would really be so efficient that it offers superior results in games (at the same resolution, shader effects, FSAA, frame rate, etc.) compared to a mid-range 2011 PC with a Radeon HD 6970/GeForce GTX 580, 4GB of DDR3 and an AMD/Intel quad-core i7 CPU at 3.5+ GHz...

You're vastly overestimating the relative performance you can extract from a console.

A 6970 comes in at 2.7 TFLOPS; that's 2.16 - 2.7x the performance of your hypothetical APU. Assuming you're using FLOPS as a yardstick for the real-world performance of the APU, then there's no way it's going to match a 6970 in a PC, never mind a GTX 580.
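
The ratio is just the 6970's peak divided by the two APU figures from the quoted post (1.25 and 1 TFLOPS). A trivial check in Python, with variable names chosen for the example:

```python
HD6970_TFLOPS = 2.7            # peak figure used in the post
APU_TFLOPS = (1.25, 1.0)       # the hypothetical APU range being discussed

# How many times the APU's peak the 6970's peak is, for each APU figure.
ratios = [HD6970_TFLOPS / apu for apu in APU_TFLOPS]

if __name__ == "__main__":
    for apu, r in zip(APU_TFLOPS, ratios):
        print(f"6970 vs {apu} TFLOPS APU: {r:.2f}x")
```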

Realistically compared with a DX11 PC I'd expect you to be achieving 1.6-1.8x the real world performance for a given spec but not until at least 3 years into the generation.

I expect the next generation of console to be packing a lot more than 1TFLOPS worth of GPU power though.

TheWretched · Apr 14, 2012

I recently saw a video about the PS1 launch, which stated that from initial draft to production, the PS1 we got in the end had 6 months of development... I found this pretty interesting indeed.

https://www.youtube.com/watch?v=JJw...DvjVQa1PpcFOHnAx7QzmeSBgL27JjNg1PhNX6517ks5I=

Blazkowicz · Apr 14, 2012

pjbliverpool said:
I expect the next generation of console to be packing a lot more than 1TFLOPS worth of GPU power though.

one TFLOPS is already "fucking incredibly high throughput". let's say the GPU can reach 1 sustained teraflops, then I believe it would match your expectations.

we can't get a really monstrous console, it will have to do with good gddr5 on a 128bit bus, or that with edram.
in my book that's a great enough console, paired with a fat quad core CPU (bulldozer refresh, POWER7 derivative or something else, which would be like a Xenon CPU but out-of-order and with more cache)

storage is another debate, there's a whole thread for that but I believe a modern HDD with average 100MB/s reading and writing, with 15ms access time is enough.

Heinrich4 · Apr 14, 2012

pjbliverpool said:
You're vastly overestimating the relative performance you can extract from a console.

A 6970 comes in at 2.7 TFLOPS; that's 2.16 - 2.7x the performance of your hypothetical APU. Assuming you're using FLOPS as a yardstick for the real-world performance of the APU, then there's no way it's going to match a 6970 in a PC, never mind a GTX 580.

Realistically compared with a DX11 PC I'd expect you to be achieving 1.6-1.8x the real world performance for a given spec but not until at least 3 years into the generation.

I expect the next generation of console to be packing a lot more than 1TFLOPS worth of GPU power though.

I fully agree... sorry if I wasn't clear... I was being ironic, because I don't believe it is possible for a 1/1.25 TFLOP APU to match a dedicated 2+ TFLOP GPU (according to Stanford, the 5870 and 6970 reach something like 74 to 80% efficiency at maximum).

And I hope that by mid-2013 (mass production) console makers are able to bring us the levels of performance (in shader processing at least) that developers want (10x PS360), but these reports/rumours of SDKs at the 1.25 TFLOP level left me disappointed... the best hypothesis I can imagine is that these SDKs are only distant early tools, like the early PS360 SDKs between 2004 and 2005.

Acert93 · Apr 14, 2012

1TFLOPs is a lot on a GPU? I guess it is perspective. Blaz, I am going to disagree for these reasons.

From the perspective of a 4x increase in flops from 2005 to 2013 in a potential console -- 8 years? No, I think that this is the opposite of incredible. That is pathetic considering (a) the chips have moved from a 90nm process to a 28nm process (roughly a 10x increase in feature density), (b) frequencies at a set TDP have increased, and (c) shaders have proportionately increased faster than other units, hence flops have ballooned. How is a 4x increase in FLOPs incredible when the process allows, at the same TDP, an increase safely in the 10-15x range, if not more?
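
The ~10x density figure follows from the square of the linear shrink, assuming ideal scaling where transistor density goes with 1 / (feature size)^2 (real processes rarely scale that cleanly). A sketch in Python; `density_gain` is a hypothetical helper:

```python
def density_gain(old_nm: float, new_nm: float) -> float:
    """Ideal area-density improvement from a linear shrink: (old / new) squared."""
    return (old_nm / new_nm) ** 2


if __name__ == "__main__":
    print(f"90nm -> 28nm: {density_gain(90, 28):.1f}x")
```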

From the market perspective, 1 TFLOP GPUs are going to be a 5-year-old affair come 2013. The 4850, 4870, and 4890 (256mm^2 on 55nm) all broke 1 TFLOPs in the summer of 2008, and the 4770 (137mm^2 on 40nm) was within spitting distance (960 GFLOPs). How is it incredible to get performance in 2013 that the PC had 5 years earlier in 2008?

From a market placement perspective I don't see how it is incredible either. The 7770 at $159 MSRP is well over 1 TFLOPs and the $109 MSRP 7750 is within spitting distance. These are the lowest-end Southern Islands parts you can get (the 7670 is an OEM product). Or take the 6670 from Northern Islands, which breaks 1 TFLOPs. Basically these are your below-mid-range GPUs... in 2011. Oh, wait, this is a re-badge of the 5750 from 2009, which had almost 1/3rd of the single-GPU enthusiast FLOPs at the time. So moving forward 4 years to 2013, I don't see how what was midrange in 2009 and had dropped to low end in 2011 can be anything but "packed in bottom of the barrel" in 2013. How are GPUs that cost $109 MSRP in early 2012 (probably less than $70 for the GPU, memory, PCB, fan, outputs, etc. when you consider the retailer cut, assembler and distributor cut, and then AMD's cut) going to be incredible almost 2 years later in 2013?

From looking at what AMD is packing into APUs, where the GPU has to share die space with HOT CPU cores under a constrained footprint and TDP budget, we are hearing APUs in 2013 will be hitting 800 GFLOPs. Again, how is 1 TFLOPs in 2013 impressive when a GPU sharing space and power limits with 4 ("8") to 6 ("10") AMD CPU cores reaches 800 GFLOPs?

The 7970 in early 2012 hit 3.7 TFLOPs. How is 1 TFLOPs going to be incredible 18+ months later, when single-chip PC GPUs are going to be marching toward 5 TFLOPs?

From the pure technological perspective of where we were and where we are, yeah, it is impressive. Especially when you see that desktop CPUs have taken a very conservative core count increase since 2005 (dual cores were available in 2005 and most new systems are dual or quad core) and their flops are still lagging behind the peak of 2005 consoles. That said, they are also packing in GPUs now to go along with discrete GPUs. The technology is cool, but in big-picture perspective what I find incredible is that we think what amounts to entry-level hardware, with budgets in mm^2 WELL BELOW the past generation, is anything but a big step BACKWARDS.

That is what I think is incredible

Blazkowicz · Apr 14, 2012

hey, look at the radeon 7750. same old tired radeon 4850 performance.
but it's 50 watts, efficient, computing friendly, though there's no software to run on it as any GPGPU computing is done on nvidia hardware for now.

I don't care for more performance, run it at 720p with AA and lots of objects on the screen (not run through opengl or something) if needed.
don't pair it with a slow CPU such as PowerPC A2, which is often suggested to save on power and die area.
good enough

if we want sustained 1 tflops, i.e. real 1 tflops in game and not paper flops, bump it up to radeon 7770 level maybe.

hell many say tablets will kill consoles so such a GPU isn't unreasonable.

Acert93 · Apr 14, 2012

The problem with the CPU is who are you going to get a high performance one from at a reasonable TDP and mm^2? Obviously not Intel who has great serial performance and some nice options for power constrained designs but a. too large b. too expensive and c. not available. AMD? Given their size and TDP (and cost?) it is hard to argue that there may not be better options when you, say, have 150mm^2 on 32nm and <50W TDP. A many-core design with SIMD may be, for long term performance, your best bet. Who knows what is really out there at this point though.

As for the GPU, good point on tablets. A 1 TFLOPs console in 2013 may be *matched* in TFLOPs by a tablet in the same year.

If you wanted to kill the consoles I guess that is what you do

liolio · Apr 14, 2012

I'm not even sure that the HD 7750 is on HD 4850 level.
Using techreport data, it beats the GTX 460 in all their games (not many, but a relevant selection, and they measure gameplay, not benchmarks) except Skyrim.
Not just on average but also at the 99th percentile, as well as in the worst-case scenario.
This type of measurement is even more relevant in the console realm, where games are soft-locked at 30 fps (lots and lots of games). Without the spikes in performance, the GTX 460 would lag behind the HD 7750 even on raw average.
Great arch; the techreport review opened my eyes on the GCN architecture. Kepler is overrated, and for the record let's wait for the next gen of GPUs (the real next jump, which I don't expect Sea Islands to be; at most one GPU could serve as an experiment for significant tweaks).

Anyway, if there is no SoC, I would like to see 12 CUs running at HD 7750 clocks (+50% in shading power).
