Next gen consoles: hints?

darkblu said:
At least one ISA comes to mind where they didn't share your view on instruction decoding 'compression', Dio. Then there's VLIW - a particularly good CPU philosophy in terms of efficiency (IMHO) where constant instruction length is mandatory.
I'm not saying there is a right answer. I tend to be nervous about proclaiming that there is One True Way (because there never is). VLIW is a great addition to the philosophies behind ISAs.

There is no doubt that VLIW, if supplied with effective compilers, is very performance-efficient (in terms of making the best possible use of the available execution units). However, I was considering cost-efficiency as well as performance efficiency, and I'm not aware of any VLIW architectures where cost-efficiency is any kind of factor at all :). VLIW is inefficient in terms of (instruction) memory usage, and cache memory is a high percentage of chip cost.

And of course it might be possible to make a cheap but highly effective VLIW chip if the savings you get from using VLIW outweigh other increases in cost.
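To put rough numbers on the code-density point (purely illustrative - the figures are made up, not taken from any real ISA):

#include <cstdio>

// Purely illustrative: a hypothetical 4-slot VLIW machine where every bundle
// is a fixed 16 bytes (4 slots x 4 bytes) and empty slots must be padded with
// NOPs. The 2.5 filled slots per bundle is an assumed figure, not a
// measurement of any real chip.
int main() {
    const int slots_per_bundle = 4;
    const int bytes_per_slot   = 4;
    const double avg_filled_slots = 2.5;   // assumed compiler packing rate

    const double vliw_bytes_per_op   = (slots_per_bundle * bytes_per_slot) / avg_filled_slots;
    const double scalar_bytes_per_op = 4.0; // a fixed 32-bit scalar encoding

    std::printf("VLIW:   %.2f bytes of I-cache per useful operation\n", vliw_bytes_per_op);
    std::printf("Scalar: %.2f bytes of I-cache per useful operation\n", scalar_bytes_per_op);
    return 0;
}

With numbers like those, the VLIW side needs roughly half again as much instruction cache to hold the same amount of useful work - which is where the cost argument comes from.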
 
DaveBaumann said:
Dio said:
Really? So, I can throw out my CPU and run my Office off it? Wow, I had no idea, and here I was thinking you guys were still unable to share Shading resources in DX9.
It wouldn't be easy to port, but in theory it can be done. Did you see ET's entry for the shading competition?
http://www.beyond3d.com/forum/viewtopic.php?t=8811

Heh - I was thinking about that as well.

Something else of interest: http://www.gpgpu.org/

With all due respect to ET and the authors at the above site, the "vector/parallel processor attached via bus" idea has been approached many times over the years on systems from mainframes via minis down to PCs. They have always had their niche uses, but have never taken off. Nor is the idea likely to now, given that CPUs already have pretty spiffy vector extensions running at high clock and with access to fast cache. These are also easily extendable by the CPU manufacturers if need be. GPUs have the advantage of being there "for free", so to speak, but that is not sufficient for the concept to become widespread.
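Just to be concrete about what those vector extensions give you on the CPU side, a rough sketch using SSE intrinsics (illustrative only, not tuned for anything in particular):

#include <xmmintrin.h>   // SSE intrinsics

// Illustrative SAXPY (y = a*x + y) over data that fits in cache - the kind of
// regular, vectorizable work the CPU's own SIMD unit already handles well.
// Assumes n is a multiple of 4 and both arrays are 16-byte aligned.
void saxpy_sse(float a, const float* x, float* y, int n) {
    const __m128 va = _mm_set1_ps(a);          // broadcast the scalar
    for (int i = 0; i < n; i += 4) {
        __m128 vx = _mm_load_ps(x + i);        // load 4 floats
        __m128 vy = _mm_load_ps(y + i);
        vy = _mm_add_ps(_mm_mul_ps(va, vx), vy);
        _mm_store_ps(y + i, vy);               // store 4 results
    }
}

No bus transfers, no separate memory pool to manage - which is exactly why the attached accelerator has to offer a big win before it is worth the trouble.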

For a computing problem to be profitably migrated to the GPU of a PC, it pretty much has to be parallelizable, vectorizable, and fit nicely within the memory available on the card but not within the cache of a CPU. That's a pretty narrow range of problems.
Furthermore, the code has to be programmed by an enthusiast or a masochist, because the tools for using a GPU as a general processor just aren't there AFAIK. With the above-mentioned add-on accelerators, there typically followed a linkable library of routines, and at a minimum a set of extensions to one or more FORTRAN/C compilers, so that you had a reasonable chance of being productive.
I don't know if such tool development is underway for GPUs; it would be very interesting to know.

The situation for the PS3 should be quite different.

Edit: Never mind the relative time and cost of developing for a GPU, debugging and extending your code, and keeping it up to date and running smoothly across a wide gamut of GPUs from different manufacturers and over time. Ouch. And ouch again.
 
I agree that GPUs aren't designed for this and shouldn't be used to do much of it, but I was pointing out that anything can be made to run, which Vince thought was impossible.

However, I wouldn't have thought that the PS3 will just run 'plain C' code either, given the unsuitability of standard C and C++ for exploiting parallelism in code.

I would imagine that efficient code will have to be either in some HLL designed for parallelism or in asm. Or - as you describe - it will be vanilla C code making heavy use of libraries that are written in asm or some parallelism-aware HLL.
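As a sketch of what that last option could look like - every identifier below is invented for illustration, not Sony's API or any real SDK - the game code stays plain C/C++ and the parallel machinery hides behind the library:

#include <cstddef>

// Hypothetical sketch of the "vanilla C plus parallel library" split. The
// library function below is invented for illustration; its body is a trivial
// serial stand-in so the sketch compiles, but the point is that in practice
// it would be asm or some parallelism-aware HLL.
void lib_stream_madd(float* dst, const float* a, const float* b,
                     float scale, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i)
        dst[i] = a[i] * scale + b[i];          // dst = a*scale + b
}

// The game code stays plain C/C++ and never sees the parallel machinery:
// position += velocity * dt, expressed as one library call.
void integrate_positions(float* positions, const float* velocities,
                         float dt, std::size_t count) {
    lib_stream_madd(positions, velocities, positions, dt, count);
}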
 
Dio said:
Obviously, the PS3 people here disagree. It seems to me there are three major arguments being expressed:
- Sony are that good and therefore design efficiencies can be a win
- PS3's silicon cost will be so high that nobody else can compete
- some exotic rendering approach such as REYES, volumetrics, etc. will ultimately prove significantly more efficient than a primitive pipe.

Personally I'm highly unconvinced by two of those three. Only time will tell, of course.


Isn't it more appropriate to compare the PS3's approach to a stream processor? The hardware in the PS3 should have much in common with the Imagine VPU.

Modern commercial graphics processors have hardwired graphics pipelines that implement a given API. Recently, these processors have added programmable elements to their pipelines.

We take the opposite approach: we begin with Imagine, a fully programmable processor, and implement a polygon rendering pipeline on it. Our research centers around the following goals:

- We are developing algorithms which exploit our SIMD architecture and fit elegantly into the stream programming model.
- Our algorithms make efficient and high-performance use of stream programming hardware.
- We explore the advantages of the stream processing approach:
  - Compare our machine organization against current graphics processors: our organization reduces load imbalance and amortizes the cost of programmability over all stages
  - Exploit our greater amount of programmability
  - Reduce or eliminate the need for multipass
  - Easily map our pipeline to a shading language

http://cva.stanford.edu/imagine/project/polygon_rendering.html
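For anyone unfamiliar with the "stream programming model" mentioned there: a kernel is a small program applied to every record of a stream, with no random memory access inside it, which is what lets hardware run many copies in parallel. A minimal plain C++ caricature (serial, and nothing like the Imagine project's actual code):

#include <vector>

// Caricature of the stream model: one kernel applied uniformly to every
// record of an input stream to produce an output stream, with no random
// memory access inside the kernel. Plain serial C++ for illustration only.
struct Vertex   { float x, y, z; };
struct Screen2D { float sx, sy;  };

// The "kernel": one record in, one record out, no side effects.
Screen2D project(const Vertex& v) {
    float inv_z = 1.0f / v.z;
    return Screen2D{ v.x * inv_z, v.y * inv_z };
}

// The "stream" operation: sweep the kernel across the whole stream.
std::vector<Screen2D> run_kernel(const std::vector<Vertex>& in) {
    std::vector<Screen2D> out;
    out.reserve(in.size());
    for (const Vertex& v : in)
        out.push_back(project(v));
    return out;
}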
 
Dio said:
I agree that GPUs aren't designed for this and shouldn't be used to do much of it, but I was pointing out that anything can be made to run, which Vince thought was impossible.

However, I wouldn't have thought that the PS3 will just run 'plain C' code either, given the unsuitability of standard C and C++ for exploiting parallelism in code.

I would imagine that efficient code will have to be either in some HLL designed for parallelism or in asm. Or - as you describe - it will be vanilla C code making heavy use of libraries that are written in asm or some parallelism-aware HLL.

MfA and I asked the same question some time ago, and we both felt that explicitly coding for parallel execution would probably be necessary to achieve the best results. Over here in the console forum, and in another thread, Vince pointed the way to the PDF linked here:
http://wwwooti.win.tue.nl/visions/abstracts/dijkstra.html
This guy is obviously talking about the PS3, and he seems to lean towards using explicitly parallel programming constructs. Whether he has any inside track on the plans of Sony is another question altogether.

I'd assume a combination of techniques and tools will be used, but generally I would tend to regard a movement towards explicitly parallel programming as a good thing, particularly if we project into a future where we fit not four, but 16-64 processors on a die. Good profiling tools will be invaluable. :)
 
He is simply being reasonable: parallelizing compilers for non-regular problems are to explicitly parallel programming what Java is to C(++) ... only a little more pronounced.

Personally I think Charm++ would probably be a decent start as far as something programmers might be remotely comfortable with is concerned ... a C++ framework with async message passing (personally I think async messaging doesn't give you anything beyond buffered sync message passing except a lot of trouble, but most people with more experience seem to disagree, so who am I to argue).

Or at least I would think so if the architecture were a little more general than the one from the broadband processor patents. I wouldn't be surprised if stuff like moving data around and resource allocation actually ends up being handled by the developer, despite all the fairy tales about automatic distribution mentioned in the patent.
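For what it's worth, the buffered sync vs. async distinction in a toy C++ sketch - this is not Charm++ and not any real runtime's API, just the shape of the two send calls:

#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>

// Toy illustration of buffered synchronous vs. asynchronous sends - not
// Charm++ and not any real runtime's API, just the shape of the two calls.
template <typename Msg>
class Channel {
public:
    explicit Channel(std::size_t capacity) : capacity_(capacity) {}

    // Buffered synchronous send: blocks only when the buffer is full, so a
    // slow receiver pushes back on the sender.
    void send_sync(const Msg& m) {
        std::unique_lock<std::mutex> lock(mtx_);
        not_full_.wait(lock, [this] { return buf_.size() < capacity_; });
        buf_.push(m);
        not_empty_.notify_one();
    }

    // Asynchronous send: never blocks, the buffer just grows - simpler to
    // call, but the sender learns nothing about how far behind the receiver
    // is, which is the "lot of trouble" mentioned above.
    void send_async(const Msg& m) {
        std::lock_guard<std::mutex> lock(mtx_);
        buf_.push(m);
        not_empty_.notify_one();
    }

    Msg receive() {
        std::unique_lock<std::mutex> lock(mtx_);
        not_empty_.wait(lock, [this] { return !buf_.empty(); });
        Msg m = buf_.front();
        buf_.pop();
        not_full_.notify_one();
        return m;
    }

private:
    std::mutex mtx_;
    std::condition_variable not_full_, not_empty_;
    std::queue<Msg> buf_;
    std::size_t capacity_;
};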
 
MfA said:
He is simply being reasonable: parallelizing compilers for non-regular problems are to explicitly parallel programming what Java is to C(++) ... only a little more pronounced.

Personally I think Charm++ would probably be a decent start as far as something programmers might be remotely comfortable with is concerned ... a C++ framework with async message passing (personally I think async messaging doesn't give you anything beyond buffered sync message passing except a lot of trouble, but most people with more experience seem to disagree, so who am I to argue).

Or at least I would think so if the architecture were a little more general than the one from the broadband processor patents. I wouldn't be surprised if stuff like moving data around and resource allocation actually ends up being handled by the developer, despite all the fairy tales about automatic distribution mentioned in the patent.

To be fair, the patent outlines the facilities to route data around nicely, but it does not tell you which software will do that work: is it the API Sony gives you? I would hope so, but something might have to be done by the programmers if they want finer control.
 
Dio said:
Vince said:
Dio said:
One could of course reverse that: why do you think it is acceptable for sampling to be done in a fixed-function manner, but not acceptable for anything else?
Because I believe that there are some tasks which are inherently iterative in nature, whose function scales linearly or is constant. These are generally things like filtering, sampling, et al. They're resolution- or intensity-dependent - if you want bilinear, put your 16 multipliers and 12 adders down in silicon concurrently. It just makes sense.
Is your inference "It will be faster to do this bit fixed-function"? If so, the limits you have set there are just as arbitrary as the limits you deride DaveBaumann for supporting - his logic is much the same, just with different arbitrary limits.

I mentioned nothing about performance (absolute or relative), and I thought this was clear when taking my post as a whole. There are some tasks whose resource usage is more or less static.

So it would be obtuse not to explicitly put these into dedicated constructs, since their level of usage is very, very constant. Yet there are other tasks which don't have these stabilized regions, whose computational usage fluctuates rapidly and whose demands are highly dynamic - I see these as Shaders, Topology and such typically 'front end' tasks of the past.

Basically, if you were to reduce the tasks a hypothetical Rxx0 can do into a pseudo-mathematical/logical statement - when looking at where developer computational demand potentialities exist, what tasks (or types of tasks) would define its order? Which are pseudo-constant and irrelevant?
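To make the fixed-cost point concrete: an RGBA bilinear blend in plain C++ - per sample it is always 16 multiplies and 12 adds (plus a little weight setup), no matter what else the application is doing:

// Plain C++ bilinear blend of four RGBA texels, to show why the cost is
// static: per sample it is always 16 multiplies and 12 adds (plus a little
// weight setup), independent of anything else the application is doing.
struct RGBA { float r, g, b, a; };

RGBA bilinear(const RGBA& t00, const RGBA& t10,
              const RGBA& t01, const RGBA& t11,
              float fx, float fy) {
    // Weight setup (not counted in the 16/12 figure).
    float w00 = (1.0f - fx) * (1.0f - fy);
    float w10 = fx * (1.0f - fy);
    float w01 = (1.0f - fx) * fy;
    float w11 = fx * fy;

    // 4 multiplies + 3 adds per channel, 4 channels: 16 multiplies, 12 adds.
    RGBA out;
    out.r = t00.r * w00 + t10.r * w10 + t01.r * w01 + t11.r * w11;
    out.g = t00.g * w00 + t10.g * w10 + t01.g * w01 + t11.g * w11;
    out.b = t00.b * w00 + t10.b * w10 + t01.b * w01 + t11.b * w11;
    out.a = t00.a * w00 + t10.a * w10 + t01.a * w01 + t11.a * w11;
    return out;
}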
 
Vince said:
tasks a hypothetical Rxx0 can do
Obviously, I cannot discuss anything but existing ATI hardware, and there are clearly even aspects of that which I cannot discuss.

In an ideal world nothing would be fixed-function - but we have to make chips that deliver the performance people want and that they can afford to buy. So someone has to make a decision about what is fixed-function and what isn't. That decision is an arbitrary one, based upon a cost/benefit analysis.

I could - and originally did - say much more, but I think it's unprofitable for everyone for us to continue this pointlessness.
 
Entropy said:
Furthermore, the code has to be programmed by an enthusiast or a masochist, because the tools for using a GPU as a general processor just aren't there AFAIK. With the above-mentioned add-on accelerators, there typically followed a linkable library of routines, and at a minimum a set of extensions to one or more FORTRAN/C compilers, so that you had a reasonable chance of being productive.
I don't know if such tool development is underway for GPUs; it would be very interesting to know.

http://developers.slashdot.org/developers/03/12/21/169200.shtml?tid=152&tid=185
http://graphics.stanford.edu/projects/brookgpu/

;)
 
DaveBaumann said:
Dio said:
I agree that GPUs aren't designed for this and shouldn't be used to do much of it, but I was pointing out that anything can be made to run, which Vince thought was impossible.

However, I wouldn't have thought that the PS3 will just run 'plain C' code either, given the unsuitability of standard C and C++ for exploiting parallelism in code.

I would imagine that efficient code will have to be either in some HLL designed for parallelism or in asm. Or - as you describe - it will be vanilla C code making heavy use of libraries that are written in asm or some parallelism-aware HLL.

http://developers.slashdot.org/developers/03/12/21/169200.shtml?tid=152&tid=185
http://graphics.stanford.edu/projects/brookgpu/

;)

Great find :)
 