Well no - forcing this type of data rearranging effectively eliminates any real chance of writing readable code, especially if it's in C++.
nAo said: Faf.. I wouldn't be so worried, 'cause you would work that way (with at least 4 elements at a time) anyway if you like efficiency.
Well, for C/C++ code, it's not like the compiler could rotate vector memory patterns for me, so whatever code I want to write in straight C will be all manual labour to optimize anyhow. Though I at least expect/hope for decent loop optimizers.
Let's hope they give us a GOOD SPE compiler, that's the real concern to me.
Well yeah, but that was when I was 16.
Guden Oden said: What, you thought being a games programmer was supposed to be easy, fun, making lots of money and driving around in a yellow topless Ferrari?
I share some of your fears, but if they give us a good compiler I'll just start to use matrices of 4 vectors instead of single vectors in my loops.
Fafalada said: Well no - forcing this type of data rearranging effectively eliminates any real chance of writing readable code, especially if it's in C++.
And I don't know how much you've dealt with writing non-graphics code for vector units, but these kinds of limitations will make it insanely difficult to write efficient physics code. Heck, you even run into issues on the VUs, with a far more capable ISA.
Umh.. imho it's just a matter of changing the way you look at how data is stored.
Well, for C/C++ code, it's not like the compiler could rotate vector memory patterns for me, so whatever code I want to write in straight C will be all manual labour to optimize anyhow. Though I at least expect/hope for decent loop optimizers.
We don't know the SPEs' ISA, so we can't know if they support this kind of rearrangement through special instructions. Let's hope so, though.
I sincerely doubt efficiency if you start mixing and matching many single vector operations with matrix transforms and expect it to be handled transparently without doing the above mentioned manual rearrangements.
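To make the "matrices of 4 vectors" idea concrete, here is a minimal sketch in plain C++. The names `Vec4`, `Vec4x4`, `to_soa`, `lane` and `dot4` are hypothetical and do not come from any SPE or VU toolchain; the arrays only stand in for 4-wide registers. It shows the manual rearrangement (four xyzw vectors transposed into one array per component), after which four dot products need nothing but lane-wise multiplies and adds.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical types for illustration only (not from any real SDK).
struct Vec4 { float x, y, z, w; };

// Four vectors stored "rotated": one 4-wide array per component.
struct Vec4x4 {
    std::array<float, 4> x, y, z, w;
};

// Manual rearrangement: transpose four xyzw vectors into component lanes.
inline Vec4x4 to_soa(const std::array<Vec4, 4>& v) {
    Vec4x4 out{};
    for (std::size_t i = 0; i < 4; ++i) {
        out.x[i] = v[i].x;
        out.y[i] = v[i].y;
        out.z[i] = v[i].z;
        out.w[i] = v[i].w;
    }
    return out;
}

// Read vector i back out of the rotated layout.
inline Vec4 lane(const Vec4x4& s, std::size_t i) {
    assert(i < 4);
    return Vec4{s.x[i], s.y[i], s.z[i], s.w[i]};
}

// Four dot products at once: only lane-wise multiplies and adds,
// no horizontal sum and no scalar broadcast.
inline std::array<float, 4> dot4(const Vec4x4& a, const Vec4x4& b) {
    std::array<float, 4> r{};
    for (std::size_t i = 0; i < 4; ++i) {
        r[i] = a.x[i] * b.x[i] + a.y[i] * b.y[i]
             + a.z[i] * b.z[i] + a.w[i] * b.w[i];
    }
    return r;
}
```

The readability cost raised in the thread is visible even here: every algorithm has to be phrased in groups of four, and any code that wants a single vector has to go through something like `lane`.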
AFAIK they are going to lecture on the usage of OpenMP, so it looks like just another SMP configuration where no such special briefing is required except for the obvious focus on multithreaded programming.
rabidrabbit said: Did Microsoft give similar info on the insides of their Xbox Next? Or are they relying on XNA to solve everything?
That was a game developers' conference after all, right? Not just some PR session for Live end-user features.
Actually that's what I was just wondering about - why bother even giving us vector instructions, just stick a 4x4 matrix multiplier in there and be done with it, it'll be about as usable anyhow. :?
nAo said: I share some of your fears, but if they give us a good compiler I'll just start to use matrices of 4 vectors instead of single vectors in my loops.
Question is, is the performance worth it if the ISA might limit you to something far lower in everything but trivial code? If all they wanted was something that can transform vertices fast, I am sure they could have gotten comparable performance cheaper.
You're right about potentially fucking up code readability, but if you want huge flops figures (we're talking about being 2/3x more powerful than their direct competitor's hw..) you have to sacrifice things. Nothing is free..
Well, it still forces you to use bizarro classes such as vector4x4. Basically it boils down to obfuscating your algorithms into quadruples or whatever larger atomic unit you define.
Umh.. imho it's just a matter of changing the way you look at how data is stored.
I actually wanted to ask, what prompted you to do this eventually? I know writing all code unrolled x4 can help with the VCL optimizer among other things, but I doubt that would be the only reason. And I don't suppose you store anything but the normals like that?
I'm already working that way on the PS2.
one said: This is reminiscent of BlueGene/L, as its single compute node has a mini-kernel (CNK) running on it.
Jaws said:...
They get around the uniform ISA problem in a heterogeneous multi-processor environment by compiling to a 'virtual' processor with no overheads in translation due to a very efficient nano-kernel running on each processor. I know the CELL press releases mention that the CELL processors can run multiple operating systems. If this means multiple nano-kernels or equivalent on each core, then they could be borrowing many ideas from TAOS for CELL.
...
Fafalada said: Well, there was nothing really conclusive said in that discussion though.
nAo said: IIRC we already discussed that. No broadcast my friend.. AFAIK
Anyway, what do they expect me to do with no broadcast and no dot product, replicate a scalar value into an entire vector for every dot product during matrix and vector transforms?
Don't even get me started on the compiler trying to work in such ways, it took GCC like a decade just to support madds.
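The cost being complained about can be sketched in plain C++. This assumes, as the thread speculates, an instruction set with only lane-wise multiply-adds and neither a broadcast operand form nor a dot product; the actual SPE ISA was not known at the time. `Lane4`, `splat`, `madd`, `Mat4` and `transform` are hypothetical names, with `Lane4` standing in for one 4-wide register.

```cpp
#include <array>

using Lane4 = std::array<float, 4>;  // stand-in for one 4-wide register

// Replicate a scalar into all four lanes. Without a broadcast form of
// multiply-add, this is a separate step for every scalar operand.
inline Lane4 splat(float s) { return {s, s, s, s}; }

// Lane-wise multiply-add: a * b + c.
inline Lane4 madd(const Lane4& a, const Lane4& b, const Lane4& c) {
    Lane4 r{};
    for (int i = 0; i < 4; ++i) r[i] = a[i] * b[i] + c[i];
    return r;
}

// Column-major 4x4 matrix (hypothetical layout chosen for this sketch).
struct Mat4 { Lane4 col[4]; };

// Matrix * vector as a sum of scaled columns: four splats and four
// multiply-adds per transformed vector.
inline Lane4 transform(const Mat4& m, const Lane4& v) {
    Lane4 r = splat(0.0f);
    for (int c = 0; c < 4; ++c) {
        r = madd(m.col[c], splat(v[c]), r);
    }
    return r;
}
```

With a broadcast operand the four `splat` calls would fold into the multiply-adds; without one they are extra work for every single vector, which is why the discussion keeps returning to processing four vectors at a time instead.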
That's true.. but we still don't know if there is any other kind of support for this stuff in the ISA. A 4x4 matrix multiplier would have been an even bigger waste with scalar ops.
Fafalada said: Actually that's what I was just wondering about - why bother even giving us vector instructions, just stick a 4x4 matrix multiplier in there and be done with it, it'll be about as usable anyhow. :?
I don't know where the sweet spot is.
Question is, is the performance worth it if the ISA might limit you to something far lower in everything but trivial code?
100% agreed. But it's clear SPEs will do much more than transforming vertices. Maybe the GPU is going to do most of the vertex shading stuff, who knows :?
If all they wanted was something that can transform vertices fast, I am sure they could have gotten comparable performance cheaper.
yeah.. it's like an obfuscated C++ code compo (I don't know if I should cry or if I should laugh..)
Basically it boils down to obfuscating your algorithms into quadruples or whatever larger atomic unit you define.
The idea came up when I wrote some VU code for a classic point light for the first time. I realized all the renormalization stuff would have taken tons of clock cycles, then I re-discovered the ftoi/itof trick (doing calculations in logarithmic space..). That happened when I worked on per-triangle mip mapping and thought about a fast log2 implementation on the VUs.
I actually wanted to ask, what prompted you to do this eventually? I know writing all code unrolled x4 can help with the VCL optimizer among other things, but I doubt that would be the only reason. And I don't suppose you store anything but the normals like that?
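One common reading of "doing calculations in logarithmic space" is sketched below in plain C++. This is not the VU code referred to in the post. It assumes IEEE-754 single precision and positive, normal inputs, and the function names are hypothetical. Treating a float's bits as an integer gives a cheap approximation of log2, and the reverse conversion gives exp2, so a renormalization factor 1/sqrt(x) becomes a multiply by -0.5 in between.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Approximate log2(x) for positive, normal x. The bits of a float read as
// an integer and divided by 2^23 give (biased exponent) + (mantissa
// fraction); subtracting the bias 127 leaves a piecewise-linear log2.
inline float fast_log2(float x) {
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    return static_cast<float>(bits) * (1.0f / 8388608.0f) - 127.0f;
}

// Approximate 2^y by running the same mapping backwards.
inline float fast_exp2(float y) {
    assert(y > -127.0f && y < 128.0f);  // keep the result a normal float
    std::uint32_t bits =
        static_cast<std::uint32_t>((y + 127.0f) * 8388608.0f);
    float x;
    std::memcpy(&x, &bits, sizeof x);
    return x;
}

// 1/sqrt(x) in log space: log2(x^-0.5) = -0.5 * log2(x).
inline float fast_rsqrt(float x) {
    return fast_exp2(-0.5f * fast_log2(x));
}
```

The results are exact for powers of two and only rough in between, so in practice a correction constant or a refinement step would usually be added; whether that accuracy is acceptable for lighting is a judgment call this sketch does not settle.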
It's up to you to decide.
Megadrive1988 said: Is Cell looking good or bad at this moment in time?
nAo said: It's up to you to decide.
Megadrive1988 said: Is Cell looking good or bad at this moment in time?
london-boy said: Well, I think it's easy for the gamer to say "YEAH I LOVE IT!", but they're not the ones having to code the bloody thing...
Oh well...
Jaws said: They're just moaning without knowing all the facts. I believe Faf has 294.912 moan threads concurrently running in his head.
We don't know the CELL ISA yet...and coding to a consistent CELL VM ISA would make life easier...