#1
Member
Join Date: Jul 2003
Location: Beijing
Posts: 640
|
Well, this might not be as interesting as the NV40/R420 discussion, but I want an answer.
I did some quick tests on R300; here are the results:
Code:
mov r1, c0
texld r0, t0, s0
mul r0, r0, c2
add r0, r0, r1
Code:
mov r1, c0
texld r0, t0, s0
add r0, r0, r1
mul r0, r0, c2
These results suggest R300's mini ALU can only do add (I know it can also act as a register modifier). And I have more:
Code:
mov r1, c0
texld r0, t0, s0
add r0, r0, r1
add r0, r0, c2
Code:
def c5, 2.0f, 4.0f, 8.0f, 1.0f
mov r1, c0
texld r0, t0, s0
add r0, r0, r1
mul r0, r0, c5.r
Now the question is: what keeps the full and mini ALU from doing an add simultaneously?
#2
Senior Member
|
It works this way:
1:
Code:
texld r0,t0,s0 (TEX:Pass1)
mad r0,r0,c2,c0 (ALU:Pass1)
2:
Code:
texld r0,t0,s0 (TEX:Pass1)
add r0,r0,c0 (ALU:Pass1)
mul r0,r0,c2 (ALU:Pass2)
3:
Code:
texld r0,t0,s0 (TEX:Pass1)
add r0,r0,c0 (ALU:Pass1)
add r0,r0,c2 (ALU:Pass2)
4:
Code:
texld r0,t0,s0 (TEX:Pass1)
add r0,r0,c0 (ALU:Pass1)
r0=r0*2 (Mini-ALU:Pass1)
__________________
GPU blog
#3
Member
Join Date: Jul 2003
Location: Beijing
Posts: 640
|
Thanks Demirug, I think you're right: the mini ALU cannot do add at all.
#4
Off-season
Join Date: Feb 2002
Location: On the pursuit of happiness
Posts: 3,019
|
Actually, the last example as it is doesn't even prove the presence of a mini-ALU. MOVing c0 to r1 doesn't change the fact that c0 is a constant that can be premultiplied by 2, so you can replace the add and mul with a mad.
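The algebra behind that folding can be checked numerically. This is a minimal sketch in plain Python, not shader code: `add_then_mul` and `mad` are hypothetical helpers that model the per-component arithmetic of the last test, and the texel and constant values are made up for illustration.

```python
def add_then_mul(texel, c0, scale):
    """Model of the last test: add r0, r0, c0 followed by mul r0, r0, scale."""
    return [(t + c) * scale for t, c in zip(texel, c0)]


def mad(a, b, c):
    """Model of a single mad: per-component a * b + c."""
    return [x * y + z for x, y, z in zip(a, b, c)]


# Illustrative values only (assumed, not taken from the thread).
texel = [0.25, 0.5, 0.75, 1.0]
c0 = [0.125, 0.25, 0.375, 0.5]
scale = 2.0

# Premultiply the constant once, outside the shader: c0' = c0 * scale.
c0_premultiplied = [c * scale for c in c0]

two_instructions = add_then_mul(texel, c0, scale)
one_instruction = mad(texel, [scale] * 4, c0_premultiplied)
print(two_instructions == one_instruction)
```

Since (r0 + c0) * 2 equals r0 * 2 + (c0 * 2), the add/mul pair and the single mad give the same result, so this test by itself cannot show that a mini-ALU performed the multiply.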
__________________
Binary prefixes for bits and bytes
#5
Member
Join Date: Jul 2002
Location: Santa Clara, CA
Posts: 348
|
You guys are running under the false assumption that both ALUs have 0 latency. They don't. Consequently, assuming a data dependency on a series of adds, it will take 2 cycles of latency for 2 adds. However, we can issue to all units every cycle, so you'll need to interleave multiple operations to hide the latencies.
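A toy model makes the point about interleaving concrete. Everything here is an assumption for illustration: `cycles_to_finish` is a hypothetical helper, the units are treated as interchangeable (which the tests earlier in the thread suggest is not true of R300's mini ALU), and the one-cycle latency is a placeholder rather than a published figure.

```python
def cycles_to_finish(num_chains, ops_per_chain, units=2, latency=1):
    """Cycles needed to finish independent chains of dependent ops.

    Toy model: up to `units` ops issue per cycle, and an op's result is
    usable `latency` cycles after it issues. Ops within a chain depend on
    each other; ops in different chains do not.
    """
    remaining = [ops_per_chain] * num_chains   # ops left in each chain
    ready_at = [0] * num_chains                # earliest cycle each chain can issue
    cycle = 0
    finish = 0
    while any(remaining):
        issued = 0
        for chain in range(num_chains):
            if issued == units:
                break
            # A chain can issue only once its previous result has arrived.
            if remaining[chain] and ready_at[chain] <= cycle:
                remaining[chain] -= 1
                ready_at[chain] = cycle + latency
                finish = max(finish, ready_at[chain])
                issued += 1
        cycle += 1
    return finish


print(cycles_to_finish(1, 2))  # two dependent adds
print(cycles_to_finish(2, 2))  # two interleaved chains of two adds each
```

In this model both calls return 2: the second unit sits idle for a single dependent chain, while two interleaved chains do twice the work in the same number of cycles.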
#6
Naughty Boy!
Join Date: Jan 2002
Posts: 3,266
|
Man, and I thought all the DevRel guys do at dev houses is play games...
__________________
Reverend Dev Anon : Best game ever? Hmm... you mean other than anything from us? (2005)
#7
Member
Join Date: Jul 2003
Location: Beijing
Posts: 640
|
Quote:
And another thing: it seems R300 only has one interpolator for v0 and v1 in the pixel shader. Is that true?
#8
Senior Member
Join Date: Nov 2002
Location: Edmonton, Alberta, Canada
Posts: 1,765
|
Hmm... Their documentation says that ten vec4s can be interpolated (two being reserved for colours and clamped to [0, 1] at 12-bit precision).
__________________
"Extremism is so easy. You've got your position, and that's it. It doesn't take much thought. And when you go far enough to the right, you meet the same idiots coming around from the left." -- Clint Eastwood -Ostsol |
#9
Tea maker
Join Date: Feb 2002
Location: In the Island of Sodor, where the steam trains lie
Posts: 4,382
|
Quote:
__________________
"Your work is both good and original. Unfortunately the part that is good is not original and the part that is original is not good." -(attributed to) Samuel Johnson "I invented the term Object-Oriented, and I can tell you I did not have C++ in mind." Alan Kay |
#10
Epsilon plus three
Join Date: Feb 2002
Location: Chania
Posts: 7,768
|
Quote:
#11
Senior Member
Join Date: Feb 2002
Location: gjethus, Norway
Posts: 1,256
|
Quote:
#12
Off-season
Join Date: Feb 2002
Location: On the pursuit of happiness
Posts: 3,019
|
Quote:
#13
Member
Join Date: Jul 2002
Location: Santa Clara, CA
Posts: 348
|
Quote:
#14
Member
Join Date: Jul 2002
Location: Santa Clara, CA
Posts: 348
|
Quote:
#15
Member
Join Date: Jul 2002
Location: Santa Clara, CA
Posts: 348
|
Quote:
Not sure that we've put out the iterator rate, so I'll leave it as an exercise for the reader.
#16
Senior Member
Join Date: Aug 2002
Location: Miami, Fl
Posts: 1,036
|
Do the Vec units ever use the scalar units for input on an instruction? Would this be possible?
__________________
"Friendship is unnecessary, like philosophy, like art... It has no survival value; rather it is one of those things that give value to survival." -C.S. Lewis |
#17
Member
Join Date: Jul 2002
Location: Santa Clara, CA
Posts: 348
|
I'm not sure I understand. The scalar and vector units (both full and mini) run in parallel. You can issue to both sets every cycle, and you can use the output of one as input to the other every cycle too, but, again, everything takes time to compute. There's always going to be a serial data dependency between operations: R0 = (A op B) followed by R1 = (R0 op C) followed by R2 = (R1 op D) has a serial data dependency. Assuming you can't do an operation such as (X op Y op Z op W) in 1 cycle (op being some sort of operation), then there's a latency you need to wait for, regardless of the number of parallel units.
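The lower bound set by such a chain can be sketched as the depth of the dependency graph. This is only an illustration under stated assumptions: `min_cycles` is a hypothetical helper, every operation is assumed to take one cycle, and the graph is assumed to be acyclic with at least one source per operation.

```python
def min_cycles(deps, latency=1):
    """Lower bound on cycles imposed by data dependencies alone.

    `deps` maps each result name to the names it reads. Names that never
    appear as keys are treated as inputs available at cycle 0.
    """
    done_at = {}

    def finish_time(name):
        if name not in deps:          # plain input such as A, B, C, D
            return 0
        if name not in done_at:
            done_at[name] = latency + max(finish_time(src) for src in deps[name])
        return done_at[name]

    return max(finish_time(name) for name in deps)


# The chain from the post: R0 = A op B, R1 = R0 op C, R2 = R1 op D.
serial = {"R0": ["A", "B"], "R1": ["R0", "C"], "R2": ["R1", "D"]}

# The same number of operations with no dependencies between them.
independent = {"R0": ["A", "B"], "R1": ["C", "D"], "R2": ["E", "F"]}

print(min_cycles(serial), min_cycles(independent))
```

The bound does not take the number of units as an input at all: extra parallel units can help the independent case approach its bound, but they cannot shorten the serial chain.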
#18
Regular
Join Date: Feb 2002
Location: California
Posts: 4,732
|
Maybe he means feeding the result of a special function (like rsq) into a vec? I think he's asking if the serialization is a crossbar.
#19
Member
Join Date: Jul 2002
Location: Santa Clara, CA
Posts: 348
|
In that case, yes.
You can take any scalar output and send it to any of the component inputs of the vector on the next instruction. It's very component. Same thing the other way: you can take the vec output and send it to the scalar. Edit: Of course, that also means that any of these types of sequences will take 2 cycles.
#20
Senior Member
Join Date: Aug 2002
Location: Miami, Fl
Posts: 1,036
|
That is exactly what I meant, DemoCoder.
Sireric, you mean to say that a Vec/Scalar unit could offer its output as input to any of the other ALUs, mini or large?