Shader units: RISC or CISC?

K.I.L.E.R

I would say RISC, because of the limited set of instructions that can be executed incredibly fast. The problem, though, is that if they are RISC, it doesn't explain how they do CISC-type operations.

OR

Are the shader units a hybrid? Best of both worlds type of thing.
 
Possibly VLIW? There was some talk about doing scalar and vector ops in parallel on some architectures, and a VLIW approach might be one way of achieving that.
 
DX9 shaders do not have a fixed-length instruction word (some instructions occupy two or more slots). However, the internal implementation can be quite different.

On the other hand, DX9 shaders have a register-register model and simple addressing modes (two notable RISC features). So they are quite "RISCy."
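To make "register-register" concrete, here is a toy C sketch (my own model, not any real hardware's): every ALU operand names a register, and there are no memory addressing modes at all.

/* Toy register-register machine: ALU ops touch only the register file. */
enum { OP_ADD, OP_MUL };

typedef struct {
    int opcode;
    int dst, src0, src1;   /* register indices only; no memory operands */
} Instr;

static float reg[32][4];   /* 4-component vector registers */

static void execute(Instr in)
{
    for (int c = 0; c < 4; c++) {
        float a = reg[in.src0][c], b = reg[in.src1][c];
        reg[in.dst][c] = (in.opcode == OP_ADD) ? a + b : a * b;
    }
}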
 
Simon F said:
Possibly VLIW? There was some talk about doing scalar and vector ops in parallel on some architectures, and a VLIW approach might be one way of achieving that.

Could be so with ATi's R300. Maybe Chromatic Research's MPact team was a real help here. The MPact chips were general-purpose VLIW designs able to work as a modem(?), sound card, MPEG-2 decoder, 2D card and 3D card. The VLIW design was quite advanced but really cumbersome to program, so they had nearly no customers for their chips (MPact 1 and 2). In the end ATi bought them and integrated them into its design teams in 1998(??).
 
Another way is an all-scalar-units design, i.e. a 4-way VLIW engine of scalar units. But it can be inefficient if most operations in shaders are vector ops.

A vector+scalar paired design is the basis of the DX8 pixel shader, so it could be extended to a DX9 shader implementation. It would be interesting to see whether a driver can "merge" a vector instruction and a scalar instruction into a "bundle." Perhaps I should go back to work on my shader benchmarker? :)
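A rough C sketch of what such a merge pass might check; the bundle format and the dependence test are entirely my own invention:

typedef struct { int opcode, dst, src0, src1; } Op;
typedef struct { Op vec, scal; int has_scal; } Bundle;

/* The scalar op may share the bundle only if there is no register
   dependence with the vector op in either direction. */
static int independent(Op v, Op s)
{
    return s.src0 != v.dst && s.src1 != v.dst &&  /* s doesn't read v's result */
           v.src0 != s.dst && v.src1 != s.dst &&  /* v doesn't read s's result */
           s.dst != v.dst;                        /* no write-after-write      */
}

/* If the test fails, the driver must emit the scalar op in a later bundle. */
static Bundle merge(Op v, Op s)
{
    Bundle b = { v, s, independent(v, s) };
    return b;
}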
 
[Attached image post-1-1043238922.jpg: XBOX GPU's Vertex Shader]
 
I'd say a VLIW approach would be the most likely. You can then do away with decode and scheduling of instructions and just do fetch->execute. Of course you'll end up with a lot of empty slots in the instruction word, but shader programs are short and they are shared across all pipelines.
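For instance, the front end reduces to something like this toy C loop (the two-slot word format is just an assumption), with explicit NOPs in the unused slots:

enum { OP_NOP = -1 };
typedef struct { int opcode, dst, src0, src1; } Op;
typedef struct { Op slot[2]; } Word;  /* e.g. one vector slot, one scalar slot */

void issue(const Op *op);             /* hypothetical execution-unit hook */

void run(const Word *prog, int len)
{
    for (int pc = 0; pc < len; pc++)  /* fetch */
        for (int s = 0; s < 2; s++)   /* both slots issue in the same cycle */
            if (prog[pc].slot[s].opcode != OP_NOP)
                issue(&prog[pc].slot[s]);
}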

Cheers
Gubbi
 
Hmm... whether VLIW or not, I think it's the driver doing scheduling, or no scheduling at all if all pipelines are of the same length.
 
Having the hardware depend on software (like a shader compiler) to do instruction scheduling is IMO one of the defining characteristics of VLIW. If the hardware does the scheduling itself, then I'd say it's more RISC/CISC-like. Except that for present vertex/pixel shaders, the instruction set is a bit too small and orthogonal to be called CISC, and its instructions are a bit too complicated for it to be really RISC. IMO, RISC and CISC are rather old CPU categories, and it doesn't make much sense to shoehorn present-day GPU shaders into them.
 
Each component in the XBOX vertex shader can be 'microprogrammed' independently? :oops:

That could be a pain in the ass to emulate using DX ;)

About CISC/RISC/VLIW: it is always 'hard' to say what is CISC and what is RISC (everyone agrees that VAX and x86 are CISC and that Alpha and MIPS are RISC, but between them there is a whole world of designs). RISC is more about control simplicity: for example, a load/store architecture and register-to-register-only operations. CISC is more about 'complexity': for example, microcoded instructions and a bunch of addressing modes.

VLIW isn't directly related to CISC/RISC; it is about functional units and parallelism. The opposite of a VLIW processor is an out-of-order processor, not a RISC or CISC one. In a VLIW, all scheduling of instructions onto the functional units is done by the compiler, while in an out-of-order processor it is done by the hardware (the instruction window). Out-of-order has much more complex control (and is a lot larger in transistor count). VLIW has the problem that a lot of dependences are hard to detect (or, more correctly, avoid) at static (compile) time.
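A classic example in plain C: the compiler usually cannot prove two pointers don't alias, so it has to assume a dependence and keep the operations serialized:

/* If a and b can point to the same element, the load depends on the
   store, and a static scheduler cannot safely reorder or pair them. */
float serialize_me(float *a, float *b)
{
    a[0] = 1.0f;   /* store */
    return b[0];   /* load: aliases a[0] whenever a == b */
}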

Of course there are VLIW architectures, like IA64, that are also very complex, but that is a problem with the Intel IA64 architecture group, which I think completely missed the point with that architecture...

I don't think there will be out-of-order shaders in GPUs for a while, if ever...
 
arjan de lumens said:
Having the hardware depend on software (like a shader compiler) to do instruction scheduling is IMO one of the defining characteristics of VLIW. If the hardware does the scheduling itself, then I'd say it's more RISC/CISC-like. Except that for present vertex/pixel shaders, the instruction set is a bit too small and orthogonal to be called CISC, and its instructions are a bit too complicated for it to be really RISC. IMO, RISC and CISC are rather old CPU categories, and it doesn't make much sense to shoehorn present-day GPU shaders into them.

Well, VLIW != software doing instruction scheduling. If you have only one functional unit, there won't be any VLIW at all, since only one instruction issues at a time.

I agree with you on the RISC/CISC remark. The internal implementation of a GPU can be very different from these.
 
Vector+scalar would be more aptly named LIW. I'm guessing Trident used VLIW, though, i.e. it compiles the instruction streams for the separate pixels normally rendered by separate pixel shaders into a single VLIW instruction stream; that would explain their PR about low transistor counts.
 
VLIW = "Very Long Instruction Word". When you have only 1 or 2 functional units to encode instructions for in each instruction word, you may argue that the instruction word is hardly "Very Long", even if the basic principles of VLIW otherwise apply.
 
Greater utilization of functional units than separate pipelines, and probably less complex to implement in hardware than SMT.
 
MfA, the drawback AFAICS is the requirement for a large, highly ported register file. On the other hand, implementing ddx/ddy becomes trivial.
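For example, if the four pixels of a 2x2 quad sit in one shared register file, ddx/ddy are just subtracts across lanes; a C sketch with a lane layout I made up:

/* Lanes: 0 = top-left, 1 = top-right, 2 = bottom-left, 3 = bottom-right. */
float ddx(const float quad[4], int lane)
{
    return (lane & 1) ? quad[lane] - quad[lane - 1]   /* right pixel */
                      : quad[lane + 1] - quad[lane];  /* left pixel  */
}

float ddy(const float quad[4], int lane)
{
    return (lane & 2) ? quad[lane] - quad[lane - 2]   /* bottom row */
                      : quad[lane + 2] - quad[lane];  /* top row    */
}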

Seems like this approach pretty much falls down once you need per-pixel dynamic branching... but staying with the VLIW theme:

How about making the execution units needed for gamma correction, a floating-point bilinear filter per clock, and pixel shader input parameter setup part of the general-purpose processor? Basically, throw out most of the specialized logic and replace it with enough general-purpose fp32 hardware that each "pipe" would have a maximum throughput equivalent to a traditional pipeline's 1 vector op, 1 scalar op, 1 load/store (local storage), and 1 bilinear-filtered floating-point texel per clock.

The VLIW program would come from interleaving the instructions required for parameter setup, texture filtering, and shading. You could still look at multiple VLIW instruction streams to further improve efficiency...
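As a sketch of the interleaving itself (C, with an invented three-slot word: setup, filter, ALU), a round-robin merge of the three streams might look like this, ignoring cross-stream dependences for simplicity:

#define OP_NOP (-1)
typedef struct { int opcode, dst, src0, src1; } Op;
typedef struct { Op slot[3]; } Word;   /* 0: setup, 1: filter, 2: ALU */

/* Merge three instruction streams into 3-slot VLIW words, padding
   exhausted streams with NOPs; returns the number of words emitted. */
int interleave(const Op *s[3], const int n[3], Word *out)
{
    Op nop = { OP_NOP, 0, 0, 0 };
    int i[3] = { 0, 0, 0 }, max = 0;
    for (int k = 0; k < 3; k++)
        if (n[k] > max) max = n[k];
    for (int w = 0; w < max; w++)
        for (int k = 0; k < 3; k++)
            out[w].slot[k] = (i[k] < n[k]) ? s[k][i[k]++] : nop;
    return max;
}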
 
RoOoBo said:
Each component in the XBOX vertex shader can be 'microprogrammed' independently? :oops:

That could be a pain in the ass to emulate using DX ;)

About CISC/RISC/VLIW: it is always 'hard' to say what is CISC and what is RISC (everyone agrees that VAX and x86 are CISC and that Alpha and MIPS are RISC, but between them there is a whole world of designs). RISC is more about control simplicity: for example, a load/store architecture and register-to-register-only operations. CISC is more about 'complexity': for example, microcoded instructions and a bunch of addressing modes.

VLIW isn't directly related to CISC/RISC; it is about functional units and parallelism. The opposite of a VLIW processor is an out-of-order processor, not a RISC or CISC one. In a VLIW, all scheduling of instructions onto the functional units is done by the compiler, while in an out-of-order processor it is done by the hardware (the instruction window). Out-of-order has much more complex control (and is a lot larger in transistor count). VLIW has the problem that a lot of dependences are hard to detect (or, more correctly, avoid) at static (compile) time.

Of course there are VLIW architectures, like IA64, that are also very complex, but that is a problem with the Intel IA64 architecture group, which I think completely missed the point with that architecture...

I don't think there will be out-of-order shaders in GPUs for a while, if ever...

I don't think there's much difference between RISC and CISC hardware anymore, only in the instruction sets. Both the Athlon and the P4 are very efficient at breaking down CISC instructions into equal-sized RISC-like operations.
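As a conceptual illustration only (the real micro-op formats aren't public, so the names are invented): an x86 read-modify-write such as add [esi], eax cracks into fixed-format, register-register micro-ops, essentially a small RISC sequence:

/* One CISC instruction -> three RISC-like micro-ops. */
enum { UOP_LOAD, UOP_ADD, UOP_STORE };
enum { R_EAX, R_ESI, R_TMP, R_NONE };   /* R_TMP is an internal temporary */

typedef struct { int op, dst, src0, src1; } Uop;

/* add dword [esi], eax */
const Uop cracked[] = {
    { UOP_LOAD,  R_TMP, R_ESI, R_NONE },  /* tmp <- mem[esi]  */
    { UOP_ADD,   R_TMP, R_TMP, R_EAX  },  /* tmp <- tmp + eax */
    { UOP_STORE, R_ESI, R_TMP, R_NONE },  /* mem[esi] <- tmp  */
};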
 