trinibwoy said:Really? Do you have a source for that info? I thought G70 was SIMD across all quads.
I'm pretty sure Jawed means that 'all' the quads together act as an MIMD 'cluster', but 'each' quad is actually still SIMD...
trinibwoy said:Really? Do you have a source for that info? I thought G70 was SIMD across all quads.
No, this is a critical, and mostly un-heralded change in G70.
It even shows itself in synthetic tests (! damn, they do have their uses).
Basically the thread size in G70 is nominally 1024 fragments (as opposed to 4096, 1024 per quad, in NV40 where all the quads are in lockstep).
Yes, that does mean that G70 is significantly better at dynamic branching than NV40 - but 1024 fragments (32x32) is a long way from the granularity of R520: 4x4 or R580: 12x4 or Xenos: 8x8.
I feel somewhat foolish for saying that G70 is single-threaded in its fragment shaders.
Jawed
Jawed said:No, this is a critical, and mostly un-heralded change in G70.
Makes me curious, what advantages do they have keeping it that large? Any advantage? h/w for handling smaller Design difficulty?
zidane1strife said:Makes me curious, what advantages do they have keeping it that large?
1. I'm sure it saves transistors.
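To put rough numbers on the granularity point, here is a minimal sketch in Python. It assumes a deliberately simplified model that is not taken from any vendor documentation: if any fragment in a batch takes the expensive branch, every fragment in that batch pays for it. The tile sizes are the ones quoted above; the function name and the 64x64 test mask are made up for illustration.

```python
def fragments_paying_for_branch(mask, tile_w, tile_h):
    """Count fragments that run the expensive path under the simplified model:
    a tile runs the expensive path in full if any fragment in it needs it."""
    height = len(mask)
    width = len(mask[0])
    paying = 0
    for y0 in range(0, height, tile_h):
        rows = mask[y0:y0 + tile_h]
        for x0 in range(0, width, tile_w):
            # Slicing clips tiles that hang over the edge of the mask.
            tile = [row[x0:x0 + tile_w] for row in rows]
            if any(any(row) for row in tile):
                paying += sum(len(row) for row in tile)
    return paying


# Hypothetical 64x64 render target where one fragment takes the expensive branch.
mask = [[False] * 64 for _ in range(64)]
mask[0][0] = True

# Batch shapes as quoted in the thread.
TILE_SIZES = {
    "G70 (32x32)": (32, 32),
    "Xenos (8x8)": (8, 8),
    "R580 (12x4)": (12, 4),
    "R520 (4x4)": (4, 4),
}

costs = {name: fragments_paying_for_branch(mask, w, h)
         for name, (w, h) in TILE_SIZES.items()}
for name, cost in costs.items():
    print(f"{name}: {cost} fragments run the expensive path")
```

With a single divergent fragment, the whole cost is the batch size itself, which is why the gap between 1024 and 16 fragments matters so much for dynamic branching.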
Shifty Geezer said:Kutaragi's commented a fair bit on BC. He's said they can use Cell for a software emu but he wants hardware assistance to get things 'perfect' and accommodate developers doing things in a less than usual manner. He's also said there's no eDRAM in RSX as you'd need loads of eDRAM if you aren't tile rendering to fit 1080p.
From this it's pretty safe to say, assuming things haven't changed, there's a degree of hardware BC and no eDRAM or full PS2 chipset; otherwise, why waste time with software emu? Even if the PS3 chipset is only 5 quid, that'd be like 500 million pounds over the life of PS3. If they can save the cash by using PS3's hardware it'd make sense to do so.
The 48 GB/s seems the limiting factor, but PS2 didn't support hardware compression. At an average 4x compression the GDDR BW is plenty enough. I don't know what the difference in latency would be though between this and PS2's eDRAM. My guess is for some GS emulation on RSX, which accounts in part for RSX's long development from just a G70, and software EE emulation on Cell.
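The arithmetic behind those two estimates can be sketched as follows. This is only a back-of-the-envelope check in Python using the figures quoted in the post; treating compression as a plain divisor on bandwidth is an assumption, and it says nothing about the latency question raised above.

```python
# Figures quoted in the post.
PS2_EDRAM_BANDWIDTH_GBPS = 48.0
AVERAGE_COMPRESSION_RATIO = 4.0
COST_PER_CONSOLE_GBP = 5
TOTAL_COST_GBP = 500_000_000


def raw_bandwidth_needed(target_gbps, compression_ratio):
    """Raw memory bandwidth needed to match target_gbps of uncompressed
    traffic, assuming (simplistically) compression divides traffic evenly."""
    return target_gbps / compression_ratio


needed_gbps = raw_bandwidth_needed(PS2_EDRAM_BANDWIDTH_GBPS,
                                   AVERAGE_COMPRESSION_RATIO)

# Lifetime unit count implied by "5 quid" per console and "500 million pounds".
implied_units = TOTAL_COST_GBP // COST_PER_CONSOLE_GBP

print(f"Raw bandwidth needed at 4x compression: {needed_gbps} GB/s")
print(f"Implied lifetime console sales: {implied_units:,}")
```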
tema said:Onimusha3 for the PC, http://onimusha3.typhoongames.com/demo/onim3_demo.zip
Nothing was left out or downgraded when I did a side by side comparison. Trilinear filtered textures, locked 60FPS, perfect_plus_ port.
Video captured at 15FPS with Fraps.
*pic*
My PC card is a 128MB AGP8x 6600GT, 14GB/s.
tema said:I heard Sony have a high level emulation wrapper. Onimusha 3 doesn't look like a conversion port, can somebody look at the files?
zidane1strife said:Makes me curious, what advantages do they have keeping it that large? Any advantage? h/w for handling smaller Design difficulty?
It's basically a tweak to an architecture that is fundamentally single-threaded. G70's architecture is, in effect, 6-threaded.
Jawed said:It's basically a tweak to an architecture that is fundamentally single-threaded. G70's architecture is, in effect, 6-threaded.
...
Jaws said:In addition to 6 threads across 6 SIMD fragment quad units, there are further 8 MIMD vertex units, running another 8 threads capable of per vertex dynamic branching. The G70 architecture would then have 14 threads of execution...
That doesn't really help with pixel processing, though, because the pixel and vertex processors are completely independent.
Chalnoth said:That doesn't really help with pixel processing, though, because the pixel and vertex processors are completely independent.
For instance, one could write a shader that loops through a certain number of vertex lights, determines which ones might influence a particular vertex, and then passes down the index of each relevant light to the pixel shader. The pixel shader could then use this ‘light index’ to determine which light parameters to apply. The pixel shader would loop over the active lights and use dynamic branching to exit the shader early once all lights are processed.
Most light types only apply to the front side of an object (the side facing the light). Therefore, you can use both vertex and pixel branching to skip processing for lights when the shader detects that the surface faces away from them. This can save significant processing time and speed up the shader. Similar speedups can be had by skipping the processing of character bone animation, and in many similar algorithms.
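The light-index idea described above can be sketched on the CPU in Python. This is an illustration only, not shader code and not anyone's actual implementation: `MAX_LIGHTS`, the `NO_LIGHT` sentinel, the dictionary layout of a light and both function names are assumptions made up for the example.

```python
import math

MAX_LIGHTS = 4   # hypothetical fixed-size slot count passed between stages
NO_LIGHT = -1    # hypothetical sentinel marking an unused slot


def select_lights(vertex_pos, lights):
    """Vertex-stage step: keep the indices of lights whose range reaches the vertex."""
    indices = [i for i, light in enumerate(lights)
               if math.dist(vertex_pos, light["pos"]) <= light["radius"]]
    indices = indices[:MAX_LIGHTS]
    # Pad with the sentinel so the pixel stage always gets MAX_LIGHTS slots.
    return indices + [NO_LIGHT] * (MAX_LIGHTS - len(indices))


def shade_pixel(pixel_pos, normal, lights, light_indices):
    """Pixel-stage step: loop over active lights, exit early, skip back-facing ones."""
    total = 0.0
    for index in light_indices:
        if index == NO_LIGHT:
            break  # early exit: all active lights have been processed
        light = lights[index]
        to_light = [l - p for l, p in zip(light["pos"], pixel_pos)]
        length = math.sqrt(sum(c * c for c in to_light))
        if length == 0.0:
            continue  # light sits exactly on the pixel; direction is undefined
        n_dot_l = sum(n * c / length for n, c in zip(normal, to_light))
        if n_dot_l <= 0.0:
            continue  # surface faces away from this light, skip its lighting math
        total += light["intensity"] * n_dot_l
    return total


# Made-up scene: one light in front, one behind the surface, one out of range.
lights = [
    {"pos": (0.0, 0.0, 2.0), "radius": 5.0, "intensity": 1.0},
    {"pos": (0.0, 0.0, -2.0), "radius": 5.0, "intensity": 1.0},
    {"pos": (100.0, 0.0, 0.0), "radius": 5.0, "intensity": 1.0},
]
origin = (0.0, 0.0, 0.0)
normal = (0.0, 0.0, 1.0)

active = select_lights(origin, lights)
brightness = shade_pixel(origin, normal, lights, active)
print(active, brightness)
```

On real hardware the saving from the `break` and `continue` branches depends on how many neighbouring fragments take the same path, which is where the batch sizes discussed earlier in the thread come in.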
Jaws said:This seems to be mirrored with CELL, with 1 MIMD PPE and 7 SIMD SPUs, tuned differently for dynamic branching performance.
Why do you class SPUs as SIMD? Each is capable of processing data independently of each other and with multiple instructions per data element.
Shifty Geezer said:Why do you class SPUs as SIMD? Each is capable of processing data independently of each other and with multiple instructions per data element.