What can streaming FPU processors achieve - example here.

Shifty Geezer

Tacitblue posted this in another thread where it was lost, but I think it's worth reading. Apparently GAMMA has used the G70 (an FP-intensive stream processor, like Cell) to accelerate applications, including database and spreadsheet work. Reading this, it goes totally against the idea that you need GP cores for this sort of work, and suggests that Cell will be able to perform well in PC apps. If they can develop libraries for accelerating via a stream processor, I guess that'll pave the way for a dramatic shift towards Cell-like processing even on PC (using GPUs) and also aid STI's position in promoting Cell. (And maybe then these programmers will stop whinging about XeCPU's and Cell's lack of GP performance!!)

Tacitblue said:
Here's an interesting article about using GPUs to run non-graphics processing. Looks like GPGPU tech is finding some inroads. Wonder what else they can do with it.

http://www.tomshardware.com/hardnews/20050630_161353.html
 
While I'm not so much interested in running databases etc. on graphics cards, this seemed to be an apt quote given the discussions going on recently:

The trick to exploiting the latent power of the graphics processor while it isn't producing scenery for 3D games, UNC Professor Dinesh Manocha told us, is to rephrase everyday operations as though they were specific two-dimensional graphics functions, like texture mapping.

Beyond GPUs, with the CPUs of the next-gen systems, "rephrasing" is evidently needed. Obviously some things are just not going to work very well on those chips, or parts of those chips, but if you make the effort, the potential reward is high, and you could expand their usefulness beyond the pretty tight constraints some are placing on them at the moment. Certainly, if you can make general stuff like a database work (well) on a GPU, I can't see it being any more difficult to map some other general things to X360's cores or Cell/SPEs (which isn't to say it's not difficult, of course..). Y'hear that, Anandtech sources? :LOL: ;)

I think we will see that happen with the more ambitious devs. And the performance and technical difference between games from such devs next gen and games from those who don't bother will be much larger than it was this gen (certainly on Cell at least, where such a large proportion of its power is in its SPEs).
 
Precisely. What's the GP performance of a GPU? Zip - far worse than XeCPU or PPE. What's its performance in a database sort? Fantastic! Only not if you try to run the Intel code on it.
 
So how does this GPUSort algorithm work, and can these results be gained when attempting to do something other than sorting? Also, when will we see GPUs accelerating entire databases, and will enough DB functionality be able to be offloaded onto the GPU to stop the CPU becoming a bottleneck, thus negating the GPU's impact?

This technique of essentially pretending everything is a game, stated Prof. Govindaraju, reduces the critical elements of such everyday functions as sorting algorithms to a single instruction, which the graphics processor then applies to multiple pipelines at once.

When he says "instruction", he does mean shader, right?
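For what it's worth, the GAMMA group's GPUSort is described in their papers as a bitonic sorting network, which is what makes the "single instruction, multiple pipelines" description fit: every pass applies the same compare-exchange rule uniformly across the whole data set, so each pass maps onto one fragment-shader pass over a texture. A minimal CPU-side sketch of that structure in Python (illustrative only - the names and structure here are mine, and the real thing runs as fragment programs on the GPU):

```python
def bitonic_sort(a):
    """Bitonic sorting network sketch: every inner pass applies ONE
    compare-exchange rule to every element of the array, which is the
    data-parallel structure that suits a GPU shader pass."""
    a = list(a)
    n = len(a)              # n must be a power of two
    k = 2
    while k <= n:           # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:       # compare-exchange distance within a pass
            b = a[:]        # on a GPU: render into a second texture
            for i in range(n):      # on a GPU: all i evaluated in parallel
                partner = i ^ j     # fixed partner index for this pass
                if partner > i:     # handle each pair once
                    ascending = (i & k) == 0
                    if (a[i] > a[partner]) == ascending:
                        b[i], b[partner] = a[partner], a[i]
            a = b
            j //= 2
        k *= 2
    return a
```

Note that the inner loop body is identical for every index `i` - there's no data-dependent control flow across elements, only a fixed partner index and a fixed direction per pass - which is exactly why it can be "rephrased" as a texture operation.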
 
Wow... that Anandtech article sure spawned a lot of Cell/Xenon apologists.

The main question is: Is it relatively simple to implement? Simple = less time = less money.
 
Alpha_Spartan said:
The main question is: Is it relatively simple to implement? Simple = less time = less money.

No one's saying it's simple, and whilst that is absolutely relevant to development and so forth for most devs (unfortunately?), it's less relevant if trying to judge technical merit. If something CAN be done on a chip, it doesn't matter if it's hard or simple; it can still be done.

As for accelerating whole DB apps etc. the point is that if you split a task up, even the "general" ones, you may just find some part of it that can be accelerated, even if the whole app/task can't. And if that something was taking a lot of time, it could be very worthwhile.
 
Alpha_Spartan said:
Wow...that Anandtech article sure spawned alot of Cell/Xenon apologists.

The main question is: Is it relatively simple to implement? Simple = less time = less money.
As Titanio says, it's not a case of whether it's difficult or not, but whether it's possible. Anand's article said the SPEs were next to worthless, and that the XeCPU and Cell were incapable of achieving much because they aren't strong in GP. What this GAMMA research proves is that instead of using GP (which is a misnomer, as GP is only an approach to a problem based on one way of modelling it), programs written for streaming FP monsters can outperform GP monsters, by a huge margin even. This is why multithreaded, parallel FP processors are the future, and coders will have to learn to think and work in the new model.

I think the GAMMA article is the first to illustrate the 'new thinking' and its benefits.
 
Anand never said the cores were worthless.

He just said that they were weak single-threaded processors, and that it will take devs 3-5 years to actually figure out how to properly multi-thread their code, at which point it would be too late because the PS4 and Xbox 3 will be coming out.
 
"the SPE array ends up being fairly useless in the majority of situations, making it little more than a waste of die space"

The article's gone now, so let's not debate it any more. Just pointing out that, for anyone thinking GP is essential, there's evidence to the contrary (though, as you say, devs need to get a handle on it. Though I'm not happy to think of the PS4 out in 3 years!!!)
 
_phil_ said:
He just said that they were weak single-threaded processors

my car is weak for sea travel ,too.

That might make sense if the vast majority of game code wasn't single threaded.

To use your analogy, 95% of roads would be on water, and your car sucks at sea travel....shitty...

Anyways, I wasn't saying I agree with his conclusions, just wanted to clarify.

article: http://www.ansonwilson.com/anandreview.htm

"While the developers we've spoken to agree that heavily multithreaded game engines are the future, that future won't really take form for another 3 - 5 years. Even Microsoft admitted to us that all developers are focusing on having, at most, one or two threads of execution for the game engine itself - not the four or six threads that the Xbox 360 was designed for.

Even when games become more aggressive with their multithreading, targeting 2 - 4 threads, most of the work will still be done in a single thread. It won't be until the next step in multithreaded architectures where that single thread gets broken down even further, and by that time we'll be talking about Xbox 720 and PlayStation 4. In the end, the more multithreaded nature of these new console CPUs doesn't help paint much of a brighter performance picture - multithreaded or not, game developers are not pleased with the performance of these CPUs. "

Personally I think it's BS that devs will sit there for 5 years wasting all the potential of these processors. I'm sure they'll move much quicker than Anand predicts.
 
Isn't triangle sorting one of the primary functions of the GPU?
I'd imagine it's easy to leverage that capability for use in a GP sorting algorithm. The GPU is probably good at other GP routines as well.

What I'd really like to see are the application notes explaining how to port GP code to the Cell architecture.
 
If you look at the Windows or Linux Application Programming Interfaces, you see that most of the stuff that handles unspecified blocks of data (and that is most of it) actually uses streams.

For example, if you ask Windows for a list of something, be it a directory listing, the groups a user is a member of, or the network packets received, it uses a derivative of the IUnknown (COM) interface, which makes no assumptions about the type of data passed, but just hands you a random amount of data (in the form of structures that can each have their own size) and a description.

As you need your own functions / objects for handling such a stream, it is pretty easy to offload the processing. And because just about everyone writes or uses library functions to do that, which deliver a nicely formatted data structure, you would only have to change that once and it would still function as intended afterwards.

Especially when you use RPC calls to a server, it makes a lot of sense to use a callback function for that. A callback function is also often available, especially with Windows, and it will handle the task and call (the event handler of) your program when the next block of data is available. Offloading such things entirely, especially for large data structures, would allow the main program thread to do a lot more useful work in the meantime.

So, if the most important of those libraries are changed, it would speed up a lot of programs immensely just by recompiling them with the new libraries. If they can do that with Linux, it will instantly speed things up a whole lot.
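To make the pattern above concrete, here's a hypothetical sketch (the names are invented for illustration, not any real Windows/COM API) of a callback-driven enumeration routine in the FindFirstFile/FindNextFile style. The point is that callers depend only on the callback signatures, so the library's internals are the single place you'd change to offload the per-record work to an SPE or GPU, without touching caller code:

```python
from typing import Callable, Iterable

# Stand-in for a variably-sized record plus its description. In the COM
# pattern described above this would be a structure + metadata, with no
# assumptions made about the payload's type.
Record = dict

def enumerate_stream(source: Iterable[Record],
                     on_record: Callable[[Record], None],
                     on_done: Callable[[int], None]) -> None:
    """Hypothetical library routine: delivers records one at a time via a
    callback, then reports the total. Callers never see how the records
    are produced or processed, so this loop could be reimplemented to
    offload work to a coprocessor and every caller would still work."""
    count = 0
    for rec in source:
        on_record(rec)      # caller's event handler gets the next block
        count += 1
    on_done(count)          # signal completion, like an RPC callback

# Usage: a directory-listing style consumer.
names = []
totals = []
enumerate_stream(
    [{"name": "a.txt", "size": 12}, {"name": "b.txt", "size": 34}],
    on_record=lambda rec: names.append(rec["name"]),
    on_done=totals.append,
)
```

Because the consumer only supplies `on_record` and `on_done`, swapping in an accelerated `enumerate_stream` is exactly the "change the library once, recompile, and everything benefits" scenario described above.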
 
scooby_dooby said:
_phil_ said:
He just said that they were weak single-threaded processors

my car is weak for sea travel ,too.

That might make sense if the vast majority of game code wasn't single threaded.

To use your analogy, 95% of roads would be on water, and your car sucks at sea travel....shitty...

To make this analogy more apt, we should add that it is demanded that the cars get ever faster, but we can't get much faster on water anymore ;)
 
Titanio said:
scooby_dooby said:
_phil_ said:
He just said that they were weak single-threaded processors

my car is weak for sea travel ,too.

That might make sense if the vast majority of game code wasn't single threaded.

To use your analogy, 95% of roads would be on water, and your car sucks at sea travel....shitty...

To make this analogy more apt, we should add that it is demanded that the cars get ever faster, but we can't get much faster on water anymore ;)

So fill the oceans with liquid nitrogen then.
 
I don't like the way people dismiss ease of development so flippantly. How powerful your platform is is meaningless when you can't tap into that power. If a platform has an easier environment for tapping its performance, even if that platform is less powerful on paper than the competing platform, then you're going to get better games on the platform that's easier to develop on. Untapped power, or power so elusive that it requires a prohibitive development cost, is useless.
 
ralexand said:
I don't like the way people dismiss ease of development so flippantly. How powerful your platform is is meaningless when you can't tap into that power. If a platform has an easier environment for tapping its performance, even if that platform is less powerful on paper than the competing platform, then you're going to get better games on the platform that's easier to develop on. Untapped power, or power so elusive that it requires a prohibitive development cost, is useless.


Or only worth 2-3x the power of the last Xbox cpu?

And as a consumer, I don't care if somebody's job is hard. Somebody else will step up to the plate and put out games that utilize the system (see PS2), and developers like Anand's source can find easier jobs.
 
ralexand said:
I don't like the way people dismiss ease of development so flippantly. Untapped power, or power so elusive that it requires a prohibitive development cost, is useless.
True, but I believe in humanity's ability to learn and adapt. The PS2 is a hard system to develop for, but they've managed it. Next-gen is forcing new ways onto devs (preceding the same developments in the PC space) and they WILL adapt, because their livelihood depends on it - unless the consoles die and only PC games get written in future!

Like the PS2, it'll be a learning experience, with developers learning as they go, making better and better use of the hardware, and developing new techniques and understanding that will undoubtedly find their way into the PC space when its turn for multicore processing comes. And it's not like devs are on their own: the console providers are working to develop tools and make the job easier in this new area.
 
ralexand said:
I don't like the way people dismiss ease of development so flippantly. How powerful your platform is is meaningless when you can't tap into that power.

You're missing the point. This isn't about how easily something is tapped, but rather whether it can be tapped at all. The determining factor here is less the hardware and more the people working with it - some will make these systems sing, some won't - and if some can, that proves a system's technical capability. And as I said above, the difference in results between those two groups next gen will be even more startling than before.

The ease of development issue is of course important and relevant, but it's a slightly different point than this.

ralexand said:
If a platform has an easier environment for tapping its performance, even if that platform is less powerful on paper than the competing platform, then you're going to get better games on the platform that's easier to develop on.

Better... in what way? Gameplay isn't a function of power. Technically better? Perhaps a wider range of games would take fuller advantage of the former system's power, but the latter system is likely to have at least some games, even if a smaller subset, that are technically more impressive than any of them (those games that do take full advantage of the system).

Without naming names, I think the question is: do you aim your system so that everyone can get decent enough power out of it - albeit only relatively speaking, since it'll still require more work than the norm anyway - or do you assume a higher level of talent/competency and aim to reward that with higher performance? I don't think one approach is necessarily better than the other. I think with closed systems and consoles you can take more liberties, though - while not all console developers are equally talented, they are generally a hardier and more "creative" bunch when it comes to finding solutions than the regular programmer, I think.
 