Predict: The Next Generation Console Tech

Status
Not open for further replies.
What would both of you, Adex and 3Dilettante, think about an 8-core CPU made of the PA Semi PPC cores hinted at by gubby? These look like very well-rounded cores.
At 32nm this would be a pretty tiny chip, and the inclusion of a pretty big L3 cache could even be considered; if the GPU has access to it, it could really help the CPU and GPU work together on graphical and non-graphical tasks.
Moreover, this would allow for a pretty huge GPU, so there would be no need for, say, MS to choose a high-GFLOP CPU for marketing purposes...
 
It's actually a bit over-engineered for a console, considering that it's an SoC capable of 3-wide issue with 64 instructions in flight.

It targets telecom embedded applications, so it's not so clear how it would perform on other apps.

Direct comparisons are difficult, though I think they'd give the edge on power consumption to the SPE, while PA Semi's cores look better in comparison to the PPE.

The PA Semi chip is 65nm, which is a finer process than the 90nm Cell.
The 90nm SPEs burn approximately 4W at 3.2 GHz at 1.1V.
Per-core power consumption at 2.0 GHz for the 65nm PA Semi chip is about 7.5W.
At 1.5 GHz, it's about 3.4 W.

GFLOPs per Watt favors the SPE.
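A quick back-of-envelope check of those numbers (a sketch only: the 8 FLOPs/cycle assumed for both vector units is my assumption for the PA Semi part, not a published spec):

```python
# Rough GFLOPS-per-watt comparison from the figures quoted above.
# Assumes 8 single-precision FLOPs/cycle (4-wide fused multiply-add)
# for both cores; that is an assumption for the PA Semi core.

def peak_gflops(ghz, flops_per_cycle=8):
    """Peak single-precision throughput for one core."""
    return ghz * flops_per_cycle

spe = peak_gflops(3.2) / 4.0    # 90nm SPE: ~25.6 GFLOPS at ~4 W
pa_20 = peak_gflops(2.0) / 7.5  # 65nm PA Semi core at 2.0 GHz, ~7.5 W
pa_15 = peak_gflops(1.5) / 3.4  # same core at 1.5 GHz, ~3.4 W

print(f"SPE:             {spe:.1f} GFLOPS/W")    # ~6.4
print(f"PA Semi 2.0 GHz: {pa_20:.1f} GFLOPS/W")  # ~2.1
print(f"PA Semi 1.5 GHz: {pa_15:.1f} GFLOPS/W")  # ~3.5
```

Even on the finer process, the SPE's specialization keeps it well ahead on this particular metric.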

The PPE and Xenon, on the other hand, do not have the SPE's level of specialization, and PA Semi's core design looks interesting in comparison.

If we removed the significant amount of SOC transistors, the transistor count seems to be in the same neighborhood as Xenon's.
The thread count is significantly lower for PA Semi's chip, and there are a number of design decisions appropriate for embedded work that would hurt it compared to the more focused Xenon.

I'd like to see an attempt at a slimmer chip similar to the PA Semi design in a Xenon-style or future chip, and I suspect sustained performance would be better than Xenon's at 65nm.
If this design had been available for the Xbox 360, I'd give the edge to Cell for peak flops and peak throughput; however, I suspect the sustained performance would be good enough that any real advantage would not be discernible until late in the PS3's life cycle, and in many areas I think the other chip would still compete.

To be honest, other factors are making it so that Cell may not display a significant visible advantage over Xenon until rather late in life anyway.
 
As always, an interesting and focused answer, 3dilettante.
From what I read in your response, the PA core may not be optimal for a console, but I also read that low-power OoO PPC cores wouldn't be a bad starting point.
------------------------------------------------------------------

I will take another stab at what could come next from MS.
I will use more of a marketing point of view than a technical one.
I think we (myself included) have been way too optimistic about next-gen specifications.
MS will fight to be profitable this gen; Sony may never be profitable with the PS3; Nintendo, on the other hand, was profitable from scratch.
MS will aim for a middle ground.

First point:
*MS won't want to be caught with its pants down by the Wii2; as Nintendo is the new leader, the launch of its new system can't be underestimated by competitors, no matter its technical merit.

Second point:
*I believe the Wii2 won't be launched at a higher price than the Wii; my bet is that $300 is the max Nintendo will aim for.

Third point:
*Nintendo proved that even if they manage to attract a new demographic to consoles, the main part of the existing market is made up of casual gamers.

Fourth point:
*The Wii2 will be a sexy console (i.e. tiny, with a neat design).

Fifth point:
*Related to the third: the Wii has failed (so far) to push hardcore gamers to spend most of their money on the platform (and they can spend quite a lot). The 360 did, and I think that beyond the difference in technical prowess this is mostly due to online appeal and a good game library, which implies:

Sixth point:
*The system has to offer a great development environment and not be a huge departure from what exists elsewhere or previously (think last gen and the PC world), as middleware becomes more and more important as the complexity of games rises.

Seventh point:
*Related to the sixth: games are damned expensive to produce, so multiplatform games will become even more predominant and complex (see Joshua's thread in the gaming section for complaints ;))

What can we guess:

*I think MS won't be able to wait until the 32nm process is widely available, so MS is likely to use 45nm parts.

*MS will launch a pretty affordable system; I think $300 should be the goal. A low price is the key to quick mass adoption. If MS wants to counter Nintendo quickly they will have to be competitive on price.

*MS should give up on the "multi SKU" approach; a Core-like system should be their only offer, and it should include an SSD
(an HDD should be sold separately or bundled).
A console has to be a nice convenient little thing!

*The console has to be relatively cheap; this isn't ten years ago, desktop computers are damned cheap, and most people are not likely to pay a lot for a console.

*Hardcore gamers expect good performance from their system.

*As said before:
myself said:
MS will fight to be profitable this gen; Sony may never be profitable with the PS3; Nintendo, on the other hand, was profitable from scratch.
MS will aim for a middle ground.
MS won't have that many transistors to spend, though likely more than Nintendo.

*MS will come up with a shorter profitability plan than with the 360.

*MS will choose something close enough to the PC world (middleware, multiplatform, etc.).
 
How could the system look?

Pretty conservative. I would say power efficient, as Nintendo put it, but taking into account that I don't think MS will look for profitability on hardware from scratch.

This will be (one of :LOL: ) my last predictions (I spam this thread, but my job has been pretty boring lately...)

UMA architecture, 4GB of RAM + eDRAM

CPU, where MS will be weird:
------------------------------------------------------------------
6 PPC cores:

***When the next systems launch, I bet SMP will still be standard for the main hardware vendors, so MS will leverage this when speaking to publishers (easy PC ports).

***Not a huge chip at 45nm.

***MS will try to include OoO execution; MS could use the kind of cores PA Semi uses for its SoC solution, though maybe not as wide. MS could keep the PPC cores two-issue, as well as include more performant AltiVec units (the work done by IBM on the POWER6 could help there).

***Not sure if MS will keep hyperthreading, as not all games use the 6 hardware threads available in Xenon.

***Not clocked as high as the one in Xenon
(power envelope budget; MS may want to keep everything tiny and won't want bulky fans; think something barely bigger than a Wii; by PA Semi's figures, ~2GHz could fit the bill).

***The memory architecture will be MS's main focus:

*Faster caches; a bigger L2 cache, configurable/lockable (once again, all the work done by IBM on the POWER6 could help, even if MS chooses OoO cores).

A question for knowledgeable people: which would provide better performance, given the kind of calculations the CPU is likely to handle:
a completely shared L2 cache?
an L2 cache shared between 2 or 3 cores?
individual L2 caches?

*Fast communication between the different cores or blocks (depending on how the L2 cache is shared).

*The whole "north bridge" should be included on the CPU (whereas it's included on the GPU in the 360).
The integrated memory controller should provide lower-latency access to RAM, which would help CPU performance, and enough bandwidth (up and down) for the GPU (cores) (texturing, geometric data, or the results of non-graphical calculations).

*Make communication between the CPU and GPU easy if some of the GPU's power is to be used for non-graphical calculations. Some L3 cache accessible by both devices (costly silicon-wise)? Access to the L2 cache by the GPU (but better than what the 360 offered)?

***The CPU could be a SoC:

1) As the GPU may be made of more than three cores (see below), the mobo layout could already be complex (fast ring bus), so it could help.

2) A network accelerator device, as in PA Semi's PWRficient PPC chips, as online will be even more important for the next generation.

3) Why not include DSP-like devices for compression/decompression? It could be interesting, as every game on the 360 spends a healthy amount of resources on this task (think one core... more, if I remember properly, judging by some graph about PGR3).

4) The same could be true for sound processing.

5) The goal? Let the six cores handle only the main game loop, physics and AI.

GPU: MS will invest more of its silicon budget here than, say, Sony, who may want to leverage Cell development, which implies a healthy amount of silicon spent on the CPU (I'm not sure Sony will consider launching a more expensive system than its competitors again):
-----------------------------------------------------------------
***Could be made of three pieces or more:
two or three shader+TMU cores
+
one ROP/eDRAM core

or
two or three shader cores
+
one TMU/ROP/eDRAM core

depending on which would put the least stress on the memory system.

***Pretty tiny cores => better yields => easier to cool, but a more complex mobo design.

***Maybe, like Xenos, slightly ahead of DirectX relative to the PC parts available at the time.
 
If we removed the significant amount of SOC transistors, the transistor count seems to be in the same neighborhood as Xenon's.

The PA Semi cores are 10mm^2 each in a generic 65nm foundry process, and that's without caches (since SRAM macro sizes differ from foundry to foundry, I guess). A conservative estimate with level 1 caches would put one core at 20 mm^2 in 90nm, still smaller than the XCPU cores.
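The scaling in that estimate can be sanity-checked with simple area math (a sketch; it assumes ideal linear-dimension scaling between nodes, which real logic doesn't quite achieve):

```python
# Area scales roughly with the square of the linear process dimension.
scale_65_to_90 = (90 / 65) ** 2  # ~1.92x going from 65nm up to 90nm
core_no_caches_mm2 = 10          # PA Semi core at 65nm, no caches

print(core_no_caches_mm2 * scale_65_to_90)  # ~19.2 mm^2 at 90nm
# The ~20 mm^2 with-L1 estimate above is in the same ballpark.
```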

The core itself, not the dual-core SOC, is what I considered a good building block for a future console-specific MPU.

The thread count is significantly lower for PA Semi's chip, and there are a number of design decisions appropriate for embedded work that would hurt it compared to the more focused Xenon.

But Xenon is really six 1.6GHz dual-issue in-order cores to the programmer. I'm guessing 3 or 4 higher-performing cores could make up for that.

Cheers
 
The PA Semi cores are 10mm^2 each in a generic 65nm foundry process, and that's without caches (since SRAM macro sizes differ from foundry to foundry, I guess). A conservative estimate with level 1 caches would put one core at 20 mm^2 in 90nm, still smaller than the XCPU cores.

The core itself, not the dual-core SOC, is what I considered a good building block for a future console-specific MPU.

The area figures sound like they might be close, but there are details that might affect the outcome, such as how much dedicated hardware the PA Semi chip has for VMX and at what latencies. VMX-128, if ported over, would contribute to the transistor count.

I also can't find confirmation on the number of ports for Xenon's caches.
If it's double-ported, then data cache density would be less than that of the single-ported cache on the PA Semi chip.

The key spanner in the works that might help out PA Semi is that high-clock logic scales less than ideally with process size, though logic in general scales less than perfectly.

But Xenon is really six 1.6GHz dual-issue in-order cores to the programmer. I'm guessing 3 or 4 higher-performing cores could make up for that.

Cheers

I believe this would be true.

There are some features I don't know about, such as latencies and how pipelined its VMX and FP units are, and how PA Semi's L2, sitting on the other side of an interconnect, compares to the crossbar arrangement used by Xenon.

I haven't found a latency figure for Xenon's L2 cache, though PA Semi has a 22-cycle latency.

Changing them to target a console might change some of the compactness of the design.

addendum:
I'm also not sure about how much PA Semi designed for yields.
Redundancy and other process parameters optimized for yields do not always lend themselves well to a compact design.
The product market PA Semi targets (when this product is finally released) is significantly higher-margin, so yield concerns may have more slack.
 
Last edited by a moderator:
As for Nintendo, I think they will start to act again as the leader.

What will the situation be for them?

***The PROS
*They will be the healthy leader in Japan and Europe
*They could be co-leader with MS in the US
*Their hardware will be cheap

***The CONS
*Their hardware is already lagging performance-wise, which hurts in the hardcore gamer segment of the market
*Their hardware doesn't support HD

***SO:
*Nintendo will want to grow its market share
*Nintendo will want to attract hardcore gamers, who spend a lot of money on software => royalties => even more money

***HOW:
Nintendo will catch everybody with their pants down.
Nintendo will provide complete BC so the Wii can continue to sell as a cheap/kids' device.
Nintendo will have to provide something good enough to hurt the 360 and the PS3 when they are at their most dangerous (i.e. good performance/HD/low price).
Nintendo won't sell the Wii2 at a loss but may consider zero profit on hardware at launch (and later if needed, obviously).

My guesses:
***Nintendo will launch first in its home market.
*Sony is their only competitor there
*A lot of factors will make the PS3 more and more attractive
*As a company, Nintendo won't want Sony to start stealing some of its market share.

I think Nintendo could launch the Wii2 in Japan as soon as fall 2009.
This way Nintendo won't be too supply-limited, and will catch Sony with its pants down just when Sony's system has reached its sweet spot in the performance-functionality/price ratio;
i.e. when the PS3 has become cheap as a gaming system and as a BRD player, when people no longer forgive the lack of HD compliance, and when the PS3 has a nice game library.

Sadly for Sony, I fail to see how Nintendo could fail to catch them with their pants down.

For the two other big territories, I think that, depending on supply, Nintendo will launch at the end of the first half of 2010.

***Consequences:

I think Nintendo will manage to catch everybody with their pants down...

I expect MS to be able to react quickly:
they launched the 360 in 2005.
they've been working on upcoming hardware for quite some time.
I think they know Nintendo's next move and will try to adapt:
launch a more performant system at the same price, with an adapted profitability plan (see my previous post).
I could see them launching in fall 2010, US-only, to ensure a strong presence and good supply in their home market, which is by the way the world's biggest market.

They will launch in Europe and Japan in the first half of 2011.

Sadly for Sony lovers, I think Sony will be caught with its pants really low.
I don't know what else to say; maybe Sony would do better to delay its next system until 32nm is readily available and provide the best hardware by fall 2011 (but at an affordable price).

-------------------------------------------------------------------------------------------------------------------------

The hardware:

I think it's pretty easy to guess what the Wii2 will be made of.

The Wii2 will be produced at 45nm.

For SMP, 4 cores is the sweet spot.

Nintendo will go with 4 OoO PPC cores clocked pretty low to fit their power/heat dissipation budget (passive cooling if they can achieve good enough performance).

Nintendo will use an almost standard low/mid-end PC GPU + eDRAM (say 15MB, for 720p + 2xAA without headaches); Nintendo will use parts that allow for a passive cooling system.

If Nintendo is clever, they will use 1GB of cheap low-clocked RAM, which would provide just enough bandwidth for the CPU and textures. If Nintendo wants the Wii2 to remain a good enough performer for longer, they could invest more heavily in their RAM budget.

Nintendo will use HD-DVD or BRD depending on which leads, or could choose the Chinese format for cost reasons.

The Wii2 will include an SSD just big enough for the OS and caching, and Nintendo will make money on HDDs or memory cards.

In short, Nintendo shouldn't have a hard time providing at $300 a system that will graphically outperform both the 360 and the PS3:
1GB RAM => better textures
a better GPU
a CPU good enough to go with good AI and not-over-the-top physics.

Nintendo mostly just has to execute properly to stay first for quite some years.
 
The area figures sound like they might be close, but there are details that might affect the outcome, such as how much dedicated hardware the PA Semi chip has for VMX and at what latencies. VMX-128, if ported over, would contribute to the transistor count.

The register renaming capability would limit the benefit of the 128-register implementation of VMX seen in Xenon. The VMX implementation, though, is fully pipelined in the PA core, where the latency of an FMADD is 8 cycles AFAICR; this is higher than for Xenon, but the self-scheduling capability of the PA Semi core would mean fewer stalls.

The load-to-use latency of the level one caches is only 4 cycles for PA Semi's core vs. 6 (!!!) for Xenon.

I also can't find confirmation on the number of ports for Xenon's caches.
If it's double-ported, then data cache density would be less than that of the single-ported cache on the PA Semi chip.

I can't imagine Xenon's cache is dual-ported (though I admit I don't know); it's only dual-issue after all. The effort spent dual-porting the level 1 cache to handle the rare event of two loads issuing in the same cycle would be better spent trying to lower the high load-to-use latency of the D-cache.

There are some features I don't know about, such as latencies and how pipelined its VMX and FP units are, and how PA Semi's L2, sitting on the other side of an interconnect, compares to the crossbar arrangement used by Xenon.

I haven't found a latency figure for Xenon's L2 cache, though PA Semi has a 22-cycle latency.

PA Semi's core and cache architecture is well thought out, I think. The 22-cycle latency figure for the level 2 cache is low enough that the core can schedule past a level 1 cache miss that hits in the level 2 cache at full blast (22 cycles x 3 instructions per cycle fits almost perfectly with the 64-instruction ROB).
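The ROB arithmetic in that claim is easy to verify (a sketch; the 22-cycle and 64-entry figures are the ones quoted above):

```python
# Can a 64-entry ROB hide an L1-miss/L2-hit on a 3-wide core?
l2_hit_latency = 22  # cycles, as quoted for PA Semi's L2
issue_width = 3      # instructions issued per cycle
rob_entries = 64

in_flight_needed = l2_hit_latency * issue_width
print(in_flight_needed)                  # 66
print(in_flight_needed <= rob_entries)   # False, but only just:
# 66 needed vs. 64 entries, hence "almost perfectly" rather than exactly.
```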

The product market PA Semi targets (when this product is finally released) is significantly higher-margin, so yield concerns may have more slack.

I'm not saying that it has to be PA Semi's core; I just think it looks like a well-thought-out implementation of the PPC ISA from any perspective: power, performance or die size. It certainly makes it clear that you can do better than the PPE/Xenon core design, IMO.

Cheers
 
The load-to-use latency of the level one caches is only 4 cycles for PA Semi's core vs. 6 (!!!) for Xenon.

Interestingly, in wall-clock time, Xenon's caches are slightly faster, since the clock speed ratio is slightly greater than 3:2 in favor of Xenon.

The lack of self-scheduling still makes it more painful, though SMT should lessen the hit in most cases, probably.
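In concrete numbers (assuming the 2.0 GHz PA Semi and 3.2 GHz Xenon clocks discussed above):

```python
# Load-to-use latency in wall-clock time: cycles / GHz = nanoseconds.
def load_to_use_ns(cycles, ghz):
    return cycles / ghz

pa = load_to_use_ns(4, 2.0)     # PA Semi L1: 2.000 ns
xenon = load_to_use_ns(6, 3.2)  # Xenon L1:  1.875 ns

print(f"PA Semi: {pa:.3f} ns, Xenon: {xenon:.3f} ns")
# Xenon's L1 is marginally faster in absolute time despite taking two
# more cycles, because its clock is more than 3:2 faster.
```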

I can't imagine Xenon's cache is dual-ported (though I admit I don't know); it's only dual-issue after all. The effort spent dual-porting the level 1 cache to handle the rare event of two loads issuing in the same cycle would be better spent trying to lower the high load-to-use latency of the D-cache.
It probably isn't, now that I've reviewed the block diagram, two ports would be overkill for a single Load/Store unit.

PA Semi's core and cache architecture is well thought out, I think. The 22-cycle latency figure for the level 2 cache is low enough that the core can schedule past a level 1 cache miss that hits in the level 2 cache at full blast (22 cycles x 3 instructions per cycle fits almost perfectly with the 64-instruction ROB).
I'm sure they could shave a few cycles off with a more direct connection between the cores, and by not designing the cache to be scalable up to 8 MB, which would likely not be necessary for the target design.

A faster cache couldn't hurt the design.

I'm not saying that it has to be PA Semi's core; I just think it looks like a well-thought-out implementation of the PPC ISA from any perspective: power, performance or die size. It certainly makes it clear that you can do better than the PPE/Xenon core design, IMO.

Cheers

I agree with the idea a design like this would likely be better overall than Xenon.

I just wanted to point out that the core might not be as compact if the implementation has to add redundancy or circuits designed to maximize yield.
As the only example of its type, the PA Semi core doesn't give us too many data points.

The big disadvantage is the design and verification cost.
It certainly isn't easier to design a chip of this type with that level of custom design, and time to market concerns likely left Xenon in the shape it is.
 
But Xenon is really six 1.6GHz dual-issue in-order cores to the programmer. I'm guessing 3 or 4 higher-performing cores could make up for that.

Cheers

Just one word: given MS's recommendations about SMT, it would be more like one 2.2GHz core and one 1GHz core ;)

That's why MS hints at using helper threads to keep the cores busy.

I agree with you about the need for OoO; if we consider at least 6 cores, how many helper threads are developers likely to find uses for?

As for the transistor count in the Xenon cores vs. the PA ones: would the support for integer ops make up for the extra registers and special functions in VMX128?

For a console, which would you (both of you, gubby and 3Dilettante) consider a better implementation of the PPC ISA than the PA one: a narrower core (two-issue)?

And for the cache architecture, with between 4 and 8 cores:
how would you share the L2?
completely shared (complex, slower)?
multiple L2 caches, and if so how would you share them (one L2 per core, per two cores, etc.)?
Would you prefer tiny, individual, super-fast L2 caches + a slow L3 cache?
 
With regards to the PlayStation 4, you're looking at a 16-24 core enhanced Cell-based processor which I am going to dub the Sense Engine. Probably looking at 1.5 TFLOPS tops out of this thing; Kutaragi is out of the picture, and the moon missions are over. They will try to sell this thing at 400 dollars with minimal loss per unit.

On the GPU side, I do see nVIDIA having a part in it again. However, I do not see the GPU just being an off-the-shelf part again, as was sadly the case with RSX. It's going to leverage the Cell architecture TO A DEGREE; think Visualizer, but not with that many cores. This thing is going to be blisteringly fast on paper compared to what is available at the time, because Sony will be going for the biggest raw numbers for the buck, and there's no better way than making the thing Cell-based to a degree (think a few fast cores for shaders). Definitely a lot more interoperability between the Sense Engine and RSX2 also.

I see the architecture being a lot more flexible this time around, sustained performance should be pretty good especially if they opt for some fast e-DRAM. The advent of a more custom GPU should lead to some more interesting programming models.

3-4GB total next gen XDR memory for the thing, I do not see them going with the TB/S bandwidth as some expect, though it is a possibility, depending on their use of e-DRAM.
 
While I agree on the Cell part, i.e. Sony needs to spend a lot of silicon there, I'm not sure Sony will spend as much as MS on the GPU.
Otherwise Sony will be late again and will deliver an expensive product that won't have many legs for price reductions.

Edit: don't answer yet, I will finish the post later (I'm currently working :oops: )

Paul, I think a huge marketing effort will have to happen at Sony headquarters.

If Sony gives up on PC GPUs and opts for something completely custom, this will compromise BC and will require new development from publishers, while Sony won't be in the same position as they were with the PS2.

I think people at Sony headquarters should make a marketing effort to define their goals for the PS4.
KK is no longer there with his dream of a supercomputer; the whole concept of the PlayStation à la KK has to be redefined.
Sony needs to adapt to market changes: Nintendo will definitely aim for shorter cycles for its products with short profitability plans, and MS may choose a middle ground.
No serious manufacturer will again come out with a $400-or-more system.
Development costs will be even higher, and publishers won't be happy with anything that is a huge departure from what currently comes from the main hardware vendors.
The inclusion of PC GPU parts was a step in the right direction!
When next gen launches (next gen won't start when Sony enters the market), SMP will still be king of the hill in the PC world, except in the low/mid end where Fusion-like CPUs or SoCs could become more mainstream.

For the 1.5 TFLOPS figure, how do you get there?
An SPE peaks at ~25 GFLOPS, so 16 => 400 GFLOPS, 24 => 600 GFLOPS.
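The arithmetic behind those figures, assuming the current SPE's 8 FLOPs/cycle at 3.2 GHz (which is where the ~25 GFLOPS per SPE comes from):

```python
def cell_peak_gflops(n_spes, ghz=3.2, flops_per_cycle=8):
    """Peak single-precision GFLOPS for the SPE array alone."""
    return n_spes * ghz * flops_per_cycle

print(cell_peak_gflops(16))  # ~409.6 GFLOPS
print(cell_peak_gflops(24))  # ~614.4 GFLOPS
# Either way, well short of 1.5 TFLOPS at today's clocks.
```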
 
If Sony gives up on PC GPUs and opts for something completely custom, this will compromise BC

I should have emphasized the custom "to a degree" part of my post much more. I do not expect a completely custom GPU, just one with more interoperability with the CPU and some fast SPEs tacked on for shader ops, etc. This should not hinder backwards compatibility.

As for the CPU, I fully expect future versions of the SPE to pack more than 25 GFLOPS. If you merely double the clock speed, that gives you 50 GFLOPS x 24 = 1.2 TFLOPS. Not that I expect them to merely double the clock; we may not even see 6+GHz in 2012 the way things are going now.
 
For the PS4 GPU, I agree that they should leverage Cell(2) as much as possible and simply have it perform all geometry calculations. RSX(2) should just be a large pool of SPs with a sprinkling of ROPs & TMUs, a large local store/cache, access to a large(r) pool of DRAM, and a low-latency two-way direct connection to Cell(2).
 
I should have emphasized the custom "to a degree" part of my post much more. I do not expect a completely custom GPU, just one with more interoperability with the CPU and some fast SPEs tacked on for shader ops, etc. This should not hinder backwards compatibility.

As for the CPU, I fully expect future versions of the SPE to pack more than 25 GFLOPS. If you merely double the clock speed, that gives you 50 GFLOPS x 24 = 1.2 TFLOPS. Not that I expect them to merely double the clock; we may not even see 6+GHz in 2012 the way things are going now.


IBM said it would achieve 1 TFLOP by 2010 using 32 SPEs. They could do that *without* increasing the clock speed far above 3.2 GHz.

4 GHz should be enough to hit 1 TFLOP with 32 SPEs. As you said, each SPE could be doing more than 25 GFLOPS, mainly because of architectural improvements, not so much clock speed. If the PS4 comes out later, say in the 2012-2014 timeframe (6-8 years after the PS3), well over 1 TFLOP should easily be possible. Maybe 2-3 TFLOPS?
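That 32-SPE, 4 GHz figure checks out with the same per-SPE peak assumed today (8 FLOPs/cycle, SPE array only, ignoring the PPE):

```python
spes = 32
ghz = 4.0
flops_per_cycle = 8  # 4-wide single-precision fused multiply-add

total_gflops = spes * ghz * flops_per_cycle
print(total_gflops)  # 1024.0, i.e. just over 1 TFLOP
```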


With regards to the Playstation 4, you're looking at a 16-24 core enhanced Cell based processor which I am going to dub the Sense Engine

where did you get 16-24 cores?

A 16-24 core Cell doesn't make 'sense' when:

1) Epic mentioned Unreal Engine 4 being designed to take advantage of 30 to 40 cores.
http://www.eurogamer.net/article.php?article_id=61668

2) IBM's chief Cell architect said they're targeting 1 TFLOP using 32 SPEs by 2010, which means possibly even more SPEs by the time the PS4 arrives.

3) The Cell processor roadmap showed a 64-core Cell:
1027sce_cell_roadmap.jpg
 
Here's another recent roadmap, from Toshiba in October:

Top = high-performance
Bottom = low-power
Blue = in development
Red = planned

L_Spurs05.JPG
 
In my dream of the next generation... I want to see at least some implementation of real-time ray tracing on the GPU or CPU (Cell-processor-like).

In another dream for next gen... I'd see the PS4 with a "full Broadband Engine" Cell with 4 Cells, each with 8 SPEs/SPUs, and a Visualizer with 4 Cells, each with 4 SPEs/SPUs + 4 Pixel Engines... like the pictures in the US patent from September 26, 2002.

(If I'm not mistaken, the OpenRT/saarcor.de/Slusallek guys are working on some hardware for real-time ray tracing.)
 
Talk of ~30 core consoles scares me somewhat.
While great from a theoretical viewpoint, I don't really see the enormous benefit to developers. In today's environment, only the absolute top-level developers with the right backing, timeframe and a publisher who doesn't care so much about ROI would see meaningful benefits...

The concern I have is: hypothetically, had the PS3 had 16 SPUs, or 32, what benefit would have been seen in the majority of games that came out this year? Developers seem (at least from my viewpoint) to be struggling with problems like those imposed by optical disks and having systems with hard limits - not the slightly fuzzy boundaries of the PC world (e.g. virtual memory).
Pushing those limits outwards helps today, no doubt, but does it help developers in the future? You only push the hurdle higher.

The industry is expanding, and technology is going crazy. Finding a programmer who is honest is bloody hard as it is; finding one who is competent is a challenge. Finding one who can cope with the technology, adapt to it, learn with it and ultimately exploit it... well, hell. What university teaches multithreading, let alone on such a vast scale? In my opinion, the pool of people who have the ability to exploit these theoretical systems will only shrink in relation to the size of the industry - and yet that is where everyone expects these systems to focus: on the elite hardcore programmers.

Put me on the fence because I don't like it.
The console makers need to focus on making their systems easy to exploit if the industry is to continue to expand. Hell, most developers already walk a knife edge as it is - just look at the peril Lionhead were in until the MS buyout.

So, my thoughts. I'll take the 360 as my reference since it is the system I am most known to champion (I suppose). Although I'm still technically a casual observer, so to speak.

Soften the hard limits. Currently, it's something like:
Code:
    EDRAM <== GPU  <~~> Memory
               |
              CPU <-> slow HDD
Why can't it be something more akin to:
Code:
[CPU/GPU] <===> gihugeous fast cache (like edram) <---> Main memory <---> Flash <---> slow HDD
Yeah, I know - not too well thought out, but it's theoretical. Let the machine deal with the hardware; let developers know they can expect lightning-fast performance for their most recent ~64MB, good performance for 2GB (RAM), OK for the next 2GB (say, flash, or something), then utter disaster for everything beyond that (basically, just like a PC). Point is, let the machine do what machines do best, logical ordering and structuring; don't let a human deal with memory :)
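A toy model of that tiered idea (everything here is made up for illustration; the tier sizes and relative costs are not from any real console):

```python
# Hypothetical memory tiers: (name, capacity in bytes, relative cost).
# An access is charged by whichever tier the address falls into.
TIERS = [
    ("eDRAM-like cache", 64 * 2**20, 1),      # ~64 MB, fastest
    ("main RAM",          2 * 2**30, 10),
    ("flash",             2 * 2**30, 1000),
    ("HDD",             250 * 2**30, 100000),  # utter disaster tier
]

def access_cost(offset):
    """Return (tier name, relative cost) for a byte at a given offset."""
    base = 0
    for name, size, cost in TIERS:
        if offset < base + size:
            return name, cost
        base += size
    raise ValueError("offset beyond modeled capacity")

print(access_cost(10 * 2**20))  # well inside the fast cache tier
print(access_cost(3 * 2**30))   # spills past RAM, lands in flash
```

The point of the sketch is only that the hierarchy degrades gradually rather than hitting one hard wall, which is the "soften the hard limits" idea above.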

Now. Then comes the programming.
For 90% of the computing world, C++ is dead, let alone assembly. As crazy as I possibly sound, the next Xbox must be designed with .NET as a first-class citizen, perhaps above C++.
If you are a game company, hiring is a huge risk as it is. Getting someone familiar with C++ is hard enough, let alone in such a resource-restricted environment where a memory leak or memory corruption of pre-allocated resources spells doom. Once again, let the machine do it. .NET is stupidly efficient at what it does, and at the low level does a better job of targeting the hardware.

For multicore, I don't have an obvious answer. But I do have suggestions. Having cores of differing performance is OK in my book, provided there are no hoops to jump through to use them. The other thing is GP/GPU, layer this into the language and treat the GPU as an extra set of cores. Treat them as a subset of the primary cores, not an entirely different system. Obviously you shouldn't do networking on a GPU core, but that doesn't mean you should segregate them. With a smart language, IDE and compiler, the developer should know this the instant they try. I cannot stress that enough, a *smart* IDE and language/compiler. You read about things like load-hit-store performance optimization (is that the one?) and think 'how on earth would you describe this to an intern, let alone a normal developer?'...

A similar story can be told of optical media. A horrific limitation that has to have huge thought and investment put in to avoid a disaster. The PS3 allowing copying parts of the game to the HDD must certainly ease the burden somewhat, but to say it's an ugly hack is to give it credit. Honestly something needs to change radically here, and for the life of me I can't see it (for physically distributed games). Some form of cheap (read-only?) flash would be a good start, ala DS - (and you could save your game to the disk :)) but it's still hardly ideal yet, and the scale isn't there yet.
Whatever the solution, it needs to be smart. Let the machine handle it, make it fast, and make it good.

And as you may have noticed, I haven't mentioned graphics. As much of a graphics geek as I am (in the programmer sense), I don't see challenging hurdles. The problems at the moment are all API and design problems, not hardware. Investment in tools and software will (IMO) give much, much better results here than investment in hardware. Why do we *still* not have high-level filtering functions (crazy-optimized for the system)? Logically simple things like calculating a tonemapping constant are hard for lots of programmers, and even shipping games get it badly wrong (I'm looking at you, R6: Vegas...) - and let's not get into efficiency here. Look at XNA: a perfect opportunity, yet still painfully low-level in places (for no obvious benefit); the heritage of DirectX shows through. (Hence my current hobby project, making a shader plugin for XNA.)

sigh.

Ok I've probably gone on long enough. Hopefully my point is clear, even if my words are somewhat muddled.
 
where did you get 16-24 cores?

a 16-24 core Cell doesn't make 'sense' when:

I am the last person in the world to doubt STI's abilities to create a monster CPU, as many of you already know ;) However, I am a lot more conservative with regards to the specifications on the PS4 because of a few things.

1. The success of the Wii means things will be different going into next generation. Expect to see Sony focus on some type of revolutionary (or not) interface, perhaps utilizing built-in cameras that can track and differentiate between several bodies. This is why I have dubbed the PS4 CPU the Sense Engine.

2. Kutaragi is(largely) out of the picture, he was the major drive for the most cutting edge hardware possible.

3. They will not launch at anything higher than 400 US dollars and I do not see them taking massive hits on hardware in the beginning.

4. 32 or more cores is just impractical IMO. Think of the SPEs you'll have to reserve for redundancy, sustained performance, die size, etc. It would just be a nightmare for developers; it's better to jump up the clock speed and make the necessary architectural improvements for a 16-24 SPE beast, rather than throwing 32+ less powerful cores on there to inflate the floating-point numbers.


Things have changed, and while the PS4 will feature an incredible hardware spec, we will not see Sony pushing things to the max. A CPU with 1-1.5 TFLOPS of computing power and a complementary Cell-based GPU will be nothing to scoff at. You'll be looking at a next-gen Uncharted game looking roughly on par with the CGI in the commercial.
 