Luminescent said: I do not believe the pipelines can function independently as you mention. How ridiculous is the idea that each pixel pipeline has fp16 capabilities, and one pipeline's fp16 processing capabilities stall when the other is doing an fp32 op?
If a pipeline is working on a certain texture section (2*2) with fp32 precision and the texture contains multiple pixel blocks, requiring the same level of precision, how could pipelines vary precision level independently within the clock cycle?
Hmm? Well, a mathematical operation has data, processes it, then outputs data. What I was suggesting was that two units capable of fp16 precision operation be used when operating on fp32 data (perhaps even only some operations, not all). I am suggesting this as opposed to having one unit capable of an operation on fp16 data in one cycle taking two cycles to perform an operation on fp32 data.
I was under the impression data was stored in a commonly accessible area (instruction slot storage space), and this struck me as one thing this might allow.
Given the lack of flow control in the fragment shaders, it seems more likely that the work-load is partitioned equally at the given precision (either fp16 or fp32).
I guess what I'm missing is an understanding of how each pipeline would always be performing all the same operations in each clock cycle. I thought there might be opportunities to put idle operational units to use this way. Especially if units were operating on the same data (how unlikely is that?), or in cases of data dependency (how common could this be?), this seemed a simple opportunity for optimization.
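To put rough numbers on what I mean, here is a toy cycle count in Python. It is only a sketch under my own assumptions, not a model of any real hardware: the function name `cycles_for_fp32_ops`, the "two_pass" and "paired" scheme labels, and the idea that two fp16 units can gang up to finish one fp32 op in a single cycle are all hypothetical.

```python
from math import ceil

def cycles_for_fp32_ops(n_ops, units, scheme, dependent):
    """Toy cycle count for n_ops fp32 operations on `units` fp16-capable units.

    scheme="two_pass": each unit handles an fp32 op alone, taking 2 cycles.
    scheme="paired":   two units combine to finish one fp32 op in 1 cycle.
    dependent=True means each op needs the previous op's result, so ops
    cannot overlap. All of this is an assumption for illustration only.
    """
    if scheme == "two_pass":
        cycles_per_op = 2
        parallel_slots = units        # every unit can work on its own op
    elif scheme == "paired":
        cycles_per_op = 1
        parallel_slots = units // 2   # units are consumed two at a time
    else:
        raise ValueError("unknown scheme: " + scheme)

    if dependent:
        # A dependent chain runs strictly one op after another.
        return n_ops * cycles_per_op
    # Independent ops fill every available slot in each round.
    return ceil(n_ops / parallel_slots) * cycles_per_op

if __name__ == "__main__":
    for dependent in (False, True):
        for scheme in ("two_pass", "paired"):
            print(scheme, "dependent" if dependent else "independent",
                  cycles_for_fp32_ops(8, units=2, scheme=scheme, dependent=dependent))
```

In this toy model, with two units and eight ops, both schemes take 8 cycles when the ops are independent, so pairing buys nothing there. On a dependent chain the two-pass scheme takes 16 cycles and the paired scheme takes 8, which is the only case where I'd expect the idea to show a gain.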
The comments so far, to my understanding, seem to preclude the possibility of anything other than 1 fp32 op per cycle per pipeline, since this idea is just another method of implementing the alternative (AFAICS).