Predict: The Next Generation Console Tech

- Athlon X2, dual core, 2 hardware threads, 2 GHz
- Intel Pentium D (Pentium 4), dual core, 2 hardware threads (it didn't have hyper threading), 2 GHz
- Intel Pentium 4, single core, 2 hardware threads (hyper threading), 3.2 GHz
Actually AMD had 2.4GHz dualcores and Intel had 3.8GHz single/3.2GHz dualcores before 2005 was over.

Though NetBurst was a "somewhat" ugly/sub-optimal architecture, so the actual clock speed didn't really matter much. Then again, a big part of its inefficiency came from huge memory latency and small bandwidth. Compared to the Power cores in the XB, the latency on NetBurst was still several times better.
PowerPC had much lower power consumption, and equal (or higher) performance compared to the other options (when running optimized code). I think they made a really good pick.
True, but that lower power consumption only came because that CPU core was designed specifically for those consoles. The Power cores before that were rather awful, with their huge power consumption and lowish speed.

Though as you said, they were pretty good when running specially optimized code. Being in-order cut out quite a bit of unneeded stuff and allowed more raw computational power to be added. Being a fixed target also meant that making specially optimized code wasn't that big of a problem, while on PCs the CPUs rarely get anywhere near as optimized code to crunch through.
 
Sorry, but that's a bit pointless; those chips were bigger, more expensive, more power hungry, etc. Neither MS nor Sony could have afforded such CPUs.
As for ARM, I believe the A15 is not enough of an improvement over today's console CPUs, as they are unlikely to run at the same speed.

Anyway, if Sony put 16 SPUs + 4 ARM A15 CPUs + a GPU, I'm far from sure it's worth the headache in regard to BC.
 
Oh Rys you're a magical person! :p

Hey Sebbbi, would it make much difference for you to target ARM as an architecture?

Also

What of the rumours of Microsoft buying Nokia? Would that support the idea that I had imagined of an ARM + PowerVR Xbox next by virtue of the awesome synergies they could gain by fully leveraging an ARM ecosystem?
 
I wouldn't say MS buying Nokia's cellphone manufacturing has much effect on their consoles. It's just their attempt to get into that market without having to rely on simply being superior in some sense, just like they did with the original XB.
 
Hey Sebbbi, would it make much difference for you to target ARM as an architecture?
ARM and POWER are both RISC instruction sets / architectures. In-order ARM vs in-order POWER wouldn't be that different. The newest ARM instruction sets also have SIMD vector instructions (NEON) that include mostly the same basic vector instructions as competing instruction sets. Of course it depends on the implementation how powerful the vector processing capacity of the CPU is; the instruction set alone doesn't set the performance. For a gaming CPU, investing more die space in faster/more vector instruction units should be a good idea. Mobile phone / tablet software doesn't generally utilize the vector units that much compared to highly optimized game code.
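
As a rough, hedged illustration (not from the original post): basic NEON intrinsics look and behave much like their VMX/SSE counterparts. A hypothetical 4-wide multiply-add loop might look like this; the function name and data layout are assumptions for the sketch.

```cpp
// Minimal sketch: 4-wide multiply-add over float arrays using NEON intrinsics.
// Assumes count is a multiple of 4 and an ARM toolchain providing <arm_neon.h>.
#include <arm_neon.h>

void madd_neon(const float* a, const float* b, float* acc, int count)
{
    for (int i = 0; i < count; i += 4)
    {
        float32x4_t va = vld1q_f32(a + i);    // load 4 floats from a
        float32x4_t vb = vld1q_f32(b + i);    // load 4 floats from b
        float32x4_t vc = vld1q_f32(acc + i);  // load current accumulator
        vc = vmlaq_f32(vc, va, vb);           // acc += a * b (per lane)
        vst1q_f32(acc + i, vc);               // store 4 results
    }
}
```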

Actually AMD had 2.4GHz dualcores and Intel had 3.8GHz single/3.2GHz dualcores before 2005 was over.
The AMD dual core was more powerful than the P4 at that time, and its heat/power figures were better.

The question really boils down to this: Would I rather have a first-generation Athlon X2 (two hardware threads) running at 2.4GHz in my Xbox?
No, I really like the six HW threads of the POWER CPU, the 3.2 GHz clock speed and the powerful VMX128 vector instruction set. Having only two hardware threads would be painful.

For the future CPU: the CPU would likely be more future proof if we had more cores / HW threads running at a lower frequency (or even in-order) compared to a high-IPC / high-clock CPU with just a few cores. 8 cores with SMT = 16 HW threads would be a really good starting point. Next-generation game engines should have no problem splitting tasks across that many threads. More cores and less IPC seems to provide more total performance, assuming that the software fully utilizes all the cores.
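
As a hedged sketch of that point (not from the original post): per-frame work that is split into independent chunks scales naturally with however many hardware threads are available. A real engine would use a job/task system; the names below are illustrative only.

```cpp
// Minimal sketch: split per-frame work across all available hardware threads.
#include <algorithm>
#include <thread>
#include <vector>

void update_entities(std::vector<float>& data)
{
    const unsigned threads = std::max(1u, std::thread::hardware_concurrency());
    const std::size_t chunk = (data.size() + threads - 1) / threads;

    std::vector<std::thread> workers;
    for (unsigned t = 0; t < threads; ++t)
    {
        workers.emplace_back([&, t]
        {
            const std::size_t begin = t * chunk;
            const std::size_t end = std::min(data.size(), begin + chunk);
            for (std::size_t i = begin; i < end; ++i)
                data[i] *= 1.01f;          // placeholder per-entity work
        });
    }
    for (auto& w : workers)
        w.join();                          // wait for all hardware threads
}
```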
 
Sebbbi, on which config would you put your bet: one big chip (so an APU) or two chips (a CPU + GPU, or 2 APUs)?
 
So basically we don't think we'll be able to get graphics like the Samaritan demo next gen, as there's no GPU on the horizon that can match the shading power of the 3 GTX580s?
Will the console optimisations make up for that, or will there still be some things that the 8th gen consoles can't do (or can't do in similar quality)?
 
So basically we don't think we'll be able to get graphics like the Samaritan demo next gen, as there's no GPU on the horizon that can match the shading power of the 3 GTX580s?
What about the power of 2 GTX580 in one chip? Anything like that coming down the pipe in the next 2 years? I remember Epic using 2 GeForce 6800 Ultra when they first showed off UE3 in 2005.
 
So basically we don't think we'll be able to get graphics like the Samaritan demo next gen, as there's no GPU on the horizon that can match the shading power of the 3 GTX580s?
Will the console optimisations make up for that, or will there still be some things that the 8th gen consoles can't do (or can't do in similar quality)?
At what resolution was the demo running?
 
How good is the latest ARM in performance/watt, register count and size, cache size, number of cores, clock speed and vector performance?
 
How good is the latest ARM in performance/watt, register count and size, cache size, number of cores, clock speed and vector performance?
Honestly I don't expect miracles; the A15 is a three-issue CPU with a simple OoO implementation, matched up with poor bandwidth. Overall, depending on the situation, it may be a match (per cycle) for Bobcat while consuming less power. I'm not sure the NEON units are a match for the SIMD units in Bobcat either.
It could do better with a better memory interface, but that consumes more power.
I don't believe ARM processors are the magic bullet: as they aim at higher performance, the benefits vs x86 or other RISC architectures such as MIPS or PowerPC will become close to irrelevant.
ARM is cleverly hiding this trend by switching to multicore designs way earlier than competing architectures. I mean that while the others wait until single-thread performance improvements get really costly before moving to multi-core designs, ARM has the chance to benefit from the improvements made by software and developers on other platforms. They are already pushing four cores, and I'm not sure they will go past that number for a while. Soon they will have to significantly raise single-thread performance, and that is costly in many ways: transistors, power, memory interface, etc.
 
What about the power of 2 GTX580 in one chip? Anything like that coming down the pipe in the next 2 years? I remember Epic using 2 GeForce 6800 Ultra when they first showed off UE3 in 2005.
Didn't Epic state that with a lot of optimizations they could get the thing to run on a single card? (If we speak of consoles there is room for extra optimizations.)
The sad thing is that the GTX580 is a monster on all accounts, and it will most likely still be quite a chip at 28 nm.
 
IIRC a huge part of the performance in the UE demo was soaked up by their new AA algorithm, which was not optimized at all.
 
ARM and POWER are both RISC instruction sets / architectures. In-order ARM vs in-order POWER wouldn't be that different. The newest ARM instruction sets also have SIMD vector instructions (NEON) that include mostly the same basic vector instructions as competing instruction sets. Of course it depends on the implementation how powerful the vector processing capacity of the CPU is; the instruction set alone doesn't set the performance. For a gaming CPU, investing more die space in faster/more vector instruction units should be a good idea. Mobile phone / tablet software doesn't generally utilize the vector units that much compared to highly optimized game code.

Could you depend on the GPU to do the vector math? Assuming that the processor was a Fusion-style architecture, would it be too difficult to rely upon the GPU for that aspect of programming, or is it essential that the CPU retain VMX-style units?
 
Sorry, why? What makes ARM more interesting than using a Power architecture PPU?

Some integration with the NGP and cell-phone universe, very low wattage even at 2.5 GHz (the A9 in the NGP at 2.5GHz) to 3.5GHz, and low cost*.

* A5 dual core US$14 (costs 75% more than the A4); an A15 quad core might be in the range of US$60 (28nm) to US$80 (40nm).

http://www.isuppli.com/Teardowns/News/Pages/iPad-2-Carries-Bill-of-Materials-of-$326-60-IHS-iSuppli-Teardown-Analysis-Shows.aspx
 
Some integration with the NGP and cell-phone universe, very low wattage even at 2.5/3.5GHz, low cost.

Yeah... I wanted to know how low is low? (Some numbers compared to other CPUs, such as the ones in our consoles today.)

General CPU code should be fairly portable.
 
Yeah... I wanted to know how low is low? (Some numbers compared to other CPUs, such as the ones in our consoles today.)

General CPU code should be fairly portable.

Something like 15 watts at 2.5 GHz ("compete with Intel"):

http://informationtipsonline.blogspot.com/2011/04/arm-cortex-a15-mobile-devices-to-ship.html

http://www.informationweek.com/articles/229401983?cid=RSSfeed_IWK_ALL

http://www.jp.arm.com/event/pdf/forum2010/am-4.pdf

For processing power I found this one:

At 3.5GHz (customized for consoles) a quad-core A15 could maybe reach 45,000-50,000 MIPS, or even more with some "console tweaks".
http://en.wikipedia.org/wiki/Instructions_per_second
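(Rough back-of-envelope check, assuming the A15's three-wide issue mentioned earlier in the thread: 4 cores × 3 instructions/cycle × 3,500 MHz ≈ 42,000 MIPS peak, so the 45,000-50,000 figure is in that ballpark with some extra tweaks.)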

(If this CPU is not oriented towards rendering purposes it could be a very interesting option...)

Off topic... Edit: while searching about ARM wattage I found this one, not fully related to ARM:

http://www.itproportal.com/2011/01/21/nvidia-fuse-cortex-a15-geforce-together-fermi-successor/
 
Could you depend on the GPU to do the vector math? Assuming that the processor was a Fusion-style architecture, would it be too difficult to rely upon the GPU for that aspect of programming, or is it essential that the CPU retain VMX-style units?
GPU is only suitable for large scale computation tasks. You need large-scale calculations that can be spread across hundreds of GPU threads to get peak performance out of the system. You cannot, for example, instruct the GPU to multiply one matrix with another and get the result back in less than 100 CPU cycles, like you would with CPU vector instructions. GPUs are deeply pipelined and run asynchronously. Synchronous CPU-GPU calculation would cause the GPU to run idle most of the time.

CPUs do a lot of vector math in games. You need to do stuff like determine the geometry that is visible in the viewport (viewport culling), set up the object matrices for rendering (camera transform), apply animations and bone transformations for skinning, determine collisions between objects, simulate/apply physics to game objects, and do ray casting (for various reasons). Many AI algorithms also need lots of vector processing (AI is interacting in a 3D world, after all). Some of these tasks can be offloaded to the GPU, but many tasks remain that either require too much logic (too much branching and data-structure traversal), require immediate results (GPU calculation has basically one frame of latency) or cannot be grouped into large tasks that run hundreds of identical calculations at the same time (the GPU calculation model).
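
As a hedged sketch of the kind of CPU-side vector work listed above (viewport culling), not taken from the original post: a simple sphere-vs-frustum test over an array of objects might look like the following. The types and layout are assumptions for illustration; a console engine would typically run this with VMX128/NEON-style vector instructions over SoA data.

```cpp
// Minimal sketch: CPU-side viewport (frustum) culling of bounding spheres.
#include <cstddef>

struct Plane  { float nx, ny, nz, d; };   // plane equation: nx*x + ny*y + nz*z + d
struct Sphere { float x, y, z, radius; };

// Returns true if the sphere is at least partially inside all six planes.
bool sphere_visible(const Plane frustum[6], const Sphere& s)
{
    for (int p = 0; p < 6; ++p)
    {
        const float dist = frustum[p].nx * s.x
                         + frustum[p].ny * s.y
                         + frustum[p].nz * s.z
                         + frustum[p].d;
        if (dist < -s.radius)              // fully behind this plane
            return false;
    }
    return true;
}

// Mark visibility for a whole array of objects; trivial to split across threads.
void cull(const Plane frustum[6], const Sphere* spheres, bool* visible, std::size_t count)
{
    for (std::size_t i = 0; i < count; ++i)
        visible[i] = sphere_visible(frustum, spheres[i]);
}
```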
 
GPU is only suitable for large scale computation tasks.
That's a very important reminder that needs repeating! GPGPU is all the rage, but doing small float calcs on a GPU is grossly inefficient. There's still (and always will be?) plenty of room for fast, versatile maths crunchers. SPUs still have a valid purpose as flexible and efficient maths crunchers if the general processing core itself isn't particularly great at maths; otherwise you need a core with good float and vector capabilities and the memory management to feed them.
 