Where is Revenge? (3dfx related)

One question about the 2 Rampage/2 SAGE setup would be: which SAGEs are connected to which Rampages? Unless the Rampage was designed specifically to accept input from 2 SAGE chips, the only way 2 SAGEs would give a performance increase over just 1 would be in an AFR setup.
 
Colourless said:
Assumption would be "execute 2 long shaders in parallel" i.e. each chip processes different vertices. Of course, I don't know any specifics about it.

But would that not limit the shader length to the one supported by a single SAGE? In other words, if a single SAGE supported a maximum shader length of 96 instructions, would the dual configuration not be limited to the same maximum number of instructions? So while I definitely see how adding a second SAGE would increase VS performance, I am somewhat at a loss as to how it would specifically benefit the execution of long shaders.
 
Most GPUs seem to be designed so that T&L and setup both run at full speed with 1 vertex/tri transformed by a 4-instruction shader. With longer shaders you'll be T&L limited. Even with a really good mesh at 0.5 vertex/tri, you'd still hit the T&L limit with an 8-instruction shader.
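A quick way to see that balance point: if setup can absorb one triangle in the time T&L executes 4 shader instructions (the post's own assumption, normalized here as a hypothetical `setup_budget` of 4 instruction slots per triangle), then T&L cost per triangle is simply shader length times vertices per triangle. A minimal sketch under that simplified model:

```python
# Hypothetical normalization: T&L gets 4 instruction slots per triangle of setup time,
# matching the "1 vertex/tri with a 4-instruction shader" balance point above.
SETUP_BUDGET = 4.0

def breakeven_shader_len(verts_per_tri, setup_budget=SETUP_BUDGET):
    """Longest shader that keeps T&L from becoming the bottleneck."""
    return setup_budget / verts_per_tri

def bottleneck(shader_len, verts_per_tri, setup_budget=SETUP_BUDGET):
    """Report which stage limits throughput under this simple model."""
    tl_cost_per_tri = shader_len * verts_per_tri
    if tl_cost_per_tri > setup_budget:
        return "T&L"
    if tl_cost_per_tri < setup_budget:
        return "setup"
    return "balanced"

print(breakeven_shader_len(1.0))  # 4.0
print(breakeven_shader_len(0.5))  # 8.0
print(bottleneck(12, 0.5))        # T&L
```

Halving vertices per triangle doubles the shader length you can afford before T&L becomes the limit, which is all the 4 vs 8 instruction figures express.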

That's a rather short shader. 4 instructions is enough for one matrix*vector operation. Do some matrix palette skinning, morphing and/or per-vertex lighting, and you'll use far more instructions than that. And then there's plenty of spare time in the setup engine.
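For reference, "4 instructions per matrix*vector" corresponds to four dot-product (`dp4`-style) operations, one per output component. The sketch below mimics that in plain Python; `dp4`, `transform` and `skin` are illustrative names, and the skinning shown is the simple "transform by each bone, then blend" form, which already costs 4 dot products per bone for the position alone, before any normals or lighting.

```python
def dp4(a, b):
    """One 4-component dot product, like a single vertex shader dp4 instruction."""
    return sum(x * y for x, y in zip(a, b))

def transform(rows, v):
    """Matrix * vector as four dp4 operations (one per matrix row)."""
    return [dp4(row, v) for row in rows]

def skin(bones, weights, v):
    """Blend the vertex as transformed by each bone matrix: 4 dp4s per bone."""
    out = [0.0, 0.0, 0.0, 0.0]
    for rows, w in zip(bones, weights):
        p = transform(rows, v)
        out = [o + w * c for o, c in zip(out, p)]
    return out

translate = [[1, 0, 0, 2], [0, 1, 0, 3], [0, 0, 1, 4], [0, 0, 0, 1]]
print(transform(translate, [1, 1, 1, 1]))  # [3, 4, 5, 1]
```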

The same goes for the SAGE/Rampage bus. If you're running "advanced" shaders (which don't really need to be all that advanced), then you're not pushing as many vertices over that bus, so there is time left for two SAGEs to push data.

There *could* be one shared bus. When a sage finishes a vertex, it sends it to both rampages simultaneously on one bus. When the other sage finishes its vertex, it does the same thing, over the same bus. The sages would need some extra connection to keep in sync so they know when they can send data.

If you run 4-instruction shaders on such a setup, there will of course be stalls that drag performance down to single-Rampage levels. But in such a situation you likely wouldn't be limited by T&L anyway.
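The shared-bus argument can be checked with a toy cycle-level model. Everything here is an assumption for illustration, not known SAGE behaviour: each SAGE needs `shader_len` cycles per vertex, holds a finished vertex until it wins the bus (no extra output buffering), the bus takes `bus_cycles` per vertex, and arbitration is round-robin. With the bus sized for 4-instruction shaders, two SAGEs deliver no more than one; with longer shaders the second SAGE roughly doubles throughput.

```python
def simulate(num_sages, shader_len, bus_cycles, total_cycles):
    """Vertices per cycle that win the shared bus in a toy multi-SAGE model."""
    remaining = [shader_len] * num_sages  # cycles left on each SAGE's current vertex
    holding = [False] * num_sages         # SAGE has a finished vertex waiting for the bus
    bus_busy = 0                          # cycles left on the current bus transfer
    next_grant = 0                        # round-robin arbitration pointer
    delivered = 0
    for _ in range(total_cycles):
        # Arbitration: an idle bus is granted to the next SAGE holding a vertex.
        if bus_busy == 0:
            for k in range(num_sages):
                s = (next_grant + k) % num_sages
                if holding[s]:
                    holding[s] = False
                    bus_busy = bus_cycles
                    delivered += 1  # count a vertex when it wins the bus
                    next_grant = (s + 1) % num_sages
                    break
        # Compute: a SAGE still holding an unsent vertex stalls.
        for s in range(num_sages):
            if not holding[s]:
                remaining[s] -= 1
                if remaining[s] == 0:
                    holding[s] = True
                    remaining[s] = shader_len
        if bus_busy > 0:
            bus_busy -= 1
    return delivered / total_cycles

for sages, length in [(1, 4), (2, 4), (1, 16), (2, 16)]:
    print(sages, "SAGE(s),", length, "instr:", simulate(sages, length, 4, 4000))
```

In this model the four runs come out at about 0.25, 0.25, 0.0625 and 0.125 vertices per cycle: the second SAGE only pays off once the shader is long enough to leave the bus idle part of the time.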


Geeforcer:
It wouldn't make it possible to run longer shaders. But "long" shaders would get higher throughput.
 
How would such a setup handle clipping? You can't clip on a per-vertex basis; you need triangles, so I think you'd need to move clipping onto the Rampage chips, which would require a redesign of both chips. You'd also have problems with caching transformed vertices.
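To illustrate why clipping wants whole triangles: each vertex can compute its own clip outcode, but the accept/reject/clip decision needs all three. The sketch below uses Cohen-Sutherland-style outcodes against an OpenGL-style clip volume (-w <= x, y, z <= w); that convention and the function names are assumptions for illustration, not how SAGE or Rampage actually did it.

```python
def outcode(x, y, z, w):
    """Per-vertex bitmask of clip planes the point lies outside of."""
    code = 0
    if x < -w:
        code |= 1
    if x > w:
        code |= 2
    if y < -w:
        code |= 4
    if y > w:
        code |= 8
    if z < -w:
        code |= 16
    if z > w:
        code |= 32
    return code

def classify(tri):
    """Per-triangle decision: needs the outcodes of all three vertices."""
    a, b, c = (outcode(*v) for v in tri)
    if (a | b | c) == 0:
        return "accept"  # fully inside: no clipping needed
    if (a & b & c) != 0:
        return "reject"  # all outside the same plane: trivially culled
    return "clip"        # straddles the volume: must be clipped as a triangle

print(classify([(0, 0, 0, 1), (2, 0, 0, 1), (0, 0.5, 0, 1)]))  # clip
```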

Of course, you could send alternate triangles to alternate SAGE chips instead of alternate vertices. But then you'd have problems with tri strips etc. that share vertices between triangles...
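A quick count shows the strip problem. In a strip, triangle i uses vertices i, i+1 and i+2. If even triangles go to one SAGE and odd ones to the other (a hypothetical round-robin split), each chip ends up needing almost every vertex, so the total transform work nearly doubles:

```python
def vertices_needed(num_tris, num_sages=2):
    """Unique strip vertices each SAGE must transform when triangles are dealt round-robin."""
    needed = [set() for _ in range(num_sages)]
    for i in range(num_tris):
        # Triangle i of a strip references vertices i, i+1, i+2.
        needed[i % num_sages].update((i, i + 1, i + 2))
    return [len(s) for s in needed]

per_sage = vertices_needed(100)
print(per_sage, sum(per_sage), "vs", 100 + 2, "for a single SAGE")
# [101, 101] 202 vs 102 for a single SAGE
```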
 
Are you sure clipping wasn't already done in Rampage? That seems like the most natural place to put it, since it's closer to setup than to T&L.

Vertices could be cached at the SAGE that transformed them. The master SAGE can keep a list of the vertex indices that the slave SAGE has in its cache, so it knows when it can tell the slave to send one.
Notice you only need to duplicate a small list of vertex indices, not the whole cache.
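A sketch of that bookkeeping, under assumptions not stated in the thread: both caches use FIFO replacement that only changes on a miss (so the master's copy of the slave's index list stays exact), misses are handed out alternately to master and slave, and the command names are made up for illustration. Only indices are tracked, never vertex data.

```python
from collections import deque

class IndexFifo:
    """Vertex indices held in a FIFO post-transform cache (indices only, no vertex data)."""
    def __init__(self, size):
        self.entries = deque(maxlen=size)  # oldest index drops out when full

    def __contains__(self, index):
        return index in self.entries

    def insert(self, index):
        self.entries.append(index)

def schedule(indices, cache_size=8):
    """Master-side decisions: reuse a cached vertex on either SAGE, or assign a new transform."""
    master_cache = IndexFifo(cache_size)
    slave_mirror = IndexFifo(cache_size)  # master's copy of the slave's cache contents
    commands = []
    next_owner = 0  # alternate misses between master (0) and slave (1)
    for idx in indices:
        if idx in master_cache:
            commands.append(("master_resend", idx))
        elif idx in slave_mirror:
            commands.append(("slave_resend", idx))
        elif next_owner == 0:
            master_cache.insert(idx)
            commands.append(("master_transform", idx))
            next_owner = 1
        else:
            slave_mirror.insert(idx)
            commands.append(("slave_transform", idx))
            next_owner = 0
    return commands

for cmd in schedule([0, 1, 2, 2, 1, 3]):
    print(cmd)
```

Because hits never reorder a FIFO, the slave's real cache and the master's mirror see exactly the same insertions, which is what lets a short index list stand in for the whole cache.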
 
Basic: Excellent speculation :) And thanks for explaining how two SAGEs could increase performance!

People, that should teach you not to make assumptions about what would or wouldn't be beneficial! Don't just assume it'd take 1,000+ instructions to slow a Shader down. :p
 

I actually expected a reasonable explanation from you two, not from Basic, heh.

When IHVs design architectures, they are fully aware of the advantages and disadvantages of each approach. I'm still waiting to see some solid proof, a hint, or even an indication that there was to be a high-end Spectre for the PC graphics market that would have exceeded the 2x1 Rampage/SAGE config at $500.

The key point is whether the hypothetical benefit would have been proportional to the final cost (and by extension the street price) of adding an extra geometry processor plus all the architectural modifications needed to host it. Put more simply: you add another geometry unit and, say, another $100 to the final price. Would that added cost have bought a proportional benefit over the projected lifespan of the card and the games that were already on shelves or due to be released soon? The first truly T&L-optimized game is UT2003. Up to then, the vast majority of games were CPU bound or fillrate limited, and for those 1600 MPps/MTps was more than enough.

I saw Colourless state clearly on page 3 that there were never plans for anything higher than 2+1 configs. SAGE II was of course a different chapter of its own; nonetheless, it was to be twice as powerful as the initial SAGE anyway.
 
Well, I dunno, maybe I misunderstood - perhaps the 2x2 boards were to be samples only, with 2x1 being the max production?

Ah, well, all I know is, 2x2's do exist... :-?
 