#1 |
|
Gamerscore Wh...
Join Date: Jan 2002
Posts: 12,947
|
Does anyone know the cost of doing an FP16 Bilinear Filter on NV40? 2 Cycles? 3 Cycles? Something else?
|
|
|
|
|
|
#2 |
|
Member
Join Date: Jun 2002
Location: France
Posts: 233
|
It seems that using FP16 textures costs 2 cycles on the NV40 whether you're using point sampling or bilinear filtering.
__________________
|
|
|
|
|
|
#3 |
|
Regular
Join Date: Apr 2003
Location: Louvain-la-Neuve, Belgium
Posts: 523
|
FP16 bilinear filtering is free on NV40. However, its texturing unit can't output more than 2 FP16 components per cycle (that's the same with every GPU).
FP16 point sampling x or xy : 1 cycle
FP16 point sampling xyz or xyzw : 2 cycles
FP16 bilinear filtering x or xy : 1 cycle
FP16 bilinear filtering xyz or xyzw : 2 cycles
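The figures above fit a simple model in which the texture unit returns at most 2 FP16 components per cycle, regardless of filter mode. A minimal Python sketch of that model follows; the function name and the `components_per_cycle` parameter are illustrative assumptions, not taken from any NVIDIA documentation.

```python
import math

def fp16_fetch_cycles(num_components: int, components_per_cycle: int = 2) -> int:
    """Cycles to return one FP16 texel, assuming a fixed output rate.

    components_per_cycle=2 is the limit quoted in the post; it is treated
    here as a plain parameter of the model.
    """
    if not 1 <= num_components <= 4:
        raise ValueError("a texel has 1 to 4 components")
    return math.ceil(num_components / components_per_cycle)

# Same cost for point sampling and bilinear filtering in this model.
for swizzle in ("x", "xy", "xyz", "xyzw"):
    print(swizzle, fp16_fetch_cycles(len(swizzle)))
```

In this model the filter mode does not appear at all, which is what "bilinear filtering is free" amounts to: only the component count changes the cost.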
__________________
Damien Triolet - HardWare.fr Sorry for my bad English. Maybe one day it'll be better :D |
|
|
|
|
|
#4 |
|
Member
Join Date: Jul 2003
Location: Houston
Posts: 652
|
Would it make sense that the reason is that the data bus between the texture filtering unit and the shading units is only 32 bits wide (designed for the usual case of RGBA_8 textures)?
__________________
"The struggle of man against power is the struggle of memory against forgetting." -Milan Kundera |
|
|
|
|
|
#5 | |
|
Senior Member
Join Date: Feb 2002
Location: gjethus, Norway
Posts: 1,256
|
Quote:
|
|
|
|
|
|
|
#6 |
|
Senior Member
Join Date: Aug 2002
Location: Miami, Fl
Posts: 1,036
|
Current GPU architectures can read 4 FP32 values in a cycle, but can something like NV4x bilinearly filter those 4 FP32 values in a single cycle, bandwidth limitations aside?
__________________
"Friendship is unnecessary, like philosophy, like art... It has no survival value; rather it is one of those things that give value to survival." -C.S. Lewis |
|
|
|
|
|
#7 | |
|
Senior Member
Join Date: Feb 2002
Location: gjethus, Norway
Posts: 1,256
|
Quote:
|
|
|
|
|
|
|
#8 | |
|
Off-season
Join Date: Feb 2002
Location: On the pursuit of happiness
Posts: 3,019
|
Quote:
__________________
Binary prefixes for bits and bytes |
|
|
|
|
|
|
#9 |
|
13 short of a dozen
|
why would they cripple fp reads in this way? surely it wouldn't be very difficult to double or even quadruple that, since we're talking about an on-chip bus. do they just not expect anyone to ever actually use fp textures on current-generation hardware?
__________________
This post powered by Macintosh.
|
|
|
|
|
|
#10 | |||
|
Senior Member
Join Date: Aug 2002
Location: Miami, Fl
Posts: 1,036
|
Quote:
Quote:
Secondly, I meant 4 FP16 values, since, as Xmas pointed out, NV40 cannot filter FP32 textures.
__________________
"Friendship is unnecessary, like philosophy, like art... It has no survival value; rather it is one of those things that give value to survival." -C.S. Lewis |
|||
|
|
|
|
|
#11 | |
|
Member
Join Date: Jul 2003
Location: Houston
Posts: 652
|
Quote:
__________________
"The struggle of man against power is the struggle of memory against forgetting." -Milan Kundera |
|
|
|
|
|
|
#12 |
|
Off-season
Join Date: Feb 2002
Location: On the pursuit of happiness
Posts: 3,019
|
As I tried to point out, it's not only the data path, it's the number of units as well. There would be no point to restrict one but not the other.
So there are only two FP16 bilinear interpolators, while there are four 8-bit-capable interpolators. They could be implemented as 2* FP16 + 2* FX8, or maybe you can somehow combine two FX8 interpolators to form one FP16 interpolator (though I don't see an easy way to do that).
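For reference, a bilinear filter on one component is three linear interpolations over the 2x2 texel footprint, so a 4-component texel needs four such evaluations. A rough Python sketch under that standard definition follows. All names are made up for illustration, the default of two interpolators is the figure claimed in this post, and real hardware would work in half or fixed precision rather than Python floats.

```python
import math

def lerp(a: float, b: float, t: float) -> float:
    """Linear interpolation between a and b with weight t in [0, 1]."""
    return a + (b - a) * t

def bilinear(t00: float, t10: float, t01: float, t11: float,
             fx: float, fy: float) -> float:
    """Bilinear filter of one component over a 2x2 footprint.

    t00/t10 are the texels on the first row, t01/t11 on the second;
    fx and fy are the fractional sample offsets.
    """
    top = lerp(t00, t10, fx)
    bottom = lerp(t01, t11, fx)
    return lerp(top, bottom, fy)

def filter_passes(num_components: int, interpolators: int = 2) -> int:
    """Passes needed if each interpolator handles one component per pass."""
    return math.ceil(num_components / interpolators)

print(bilinear(0.0, 2.0, 4.0, 6.0, 0.5, 0.5))
print(filter_passes(4), filter_passes(4, interpolators=4))
```

With two interpolators an xyzw texel takes two passes, and with four it takes one, which is the FP16 versus 8-bit difference described above.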
__________________
Binary prefixes for bits and bytes |
|
|
|
|
|
#13 | |
|
Join Date: May 2002
Location: New York, NY
Posts: 12,678
|
Quote:
__________________
April 20, 1979 - America must never forget. |
|
|
|
|
|
|
#14 | ||
|
Off-season
Join Date: Feb 2002
Location: On the pursuit of happiness
Posts: 3,019
|
Quote:
And there's one additional MAD for trilinear/AF sample accumulation.
__________________
Binary prefixes for bits and bytes |
||
|
|
|
|
|
#15 |
|
Join Date: May 2002
Location: New York, NY
Posts: 12,678
|
Actually, it was that one additional interpolator for sample accumulation that I was concerned with. I was assuming each interpolator would pretty much automatically be operating on 4-component objects (though it seems that in nVidia's case the FP16 interpolators are a bit more flexible and capable of dual-issue).
__________________
April 20, 1979 - America must never forget. |
|
|
|
|
|
#16 | ||
|
Member
Join Date: Sep 2002
Posts: 559
|
Quote:
Quote:
-FUDie
__________________
Ph.D. - Piled Higher and Deeper |
||
|
|
|
|
|
#17 | |
|
Join Date: May 2002
Location: New York, NY
Posts: 12,678
|
Quote:
__________________
April 20, 1979 - America must never forget. |
|
|
|
|
|
|
#18 | |
|
Crazy coder
|
Quote:
|
|
|
|
|
|
|
#19 | |
|
Join Date: May 2002
Location: New York, NY
Posts: 12,678
|
Quote:
__________________
April 20, 1979 - America must never forget. |
|
|
|
|
|
|
#20 |
|
13 short of a dozen
|
okay I guess I didn't read carefully enough, I was under the impression that the bus was the limiting factor.
__________________
This post powered by Macintosh.
|
|
|
|
|
|
#21 | |
|
Join Date: May 2002
Location: New York, NY
Posts: 12,678
|
Quote:
__________________
April 20, 1979 - America must never forget. |
|
|
|
|
|
|
#22 |
|
Member
Join Date: Jul 2003
Location: Houston
Posts: 652
|
If that were so, then why does point sampling an FP32 texture have a 4-cycle latency? Point sampling shouldn't require any interpolation at all.
__________________
"The struggle of man against power is the struggle of memory against forgetting." -Milan Kundera |
|
|
|
|
|
#23 | |
|
Join Date: May 2002
Location: New York, NY
Posts: 12,678
|
Quote:
__________________
April 20, 1979 - America must never forget. |
|
|
|
|
|
|
#24 |
|
Member
Join Date: Feb 2002
Location: LA, California
Posts: 825
|
Chalnoth - a complete guess, but maybe it's because the texture caching logic fetches/determines cache hits in 2x2 texel blocks. Or, if the latency is always 4 cycles, then probably the bus between the texture unit and the shader is only 32 bits wide, so you would need 4 cycles to transfer one FP32 4-vector.
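The 32-bit bus guess can be checked with simple arithmetic. The Python sketch below assumes the only cost is moving the texel's bits across a bus of fixed width; the function name and parameters are hypothetical, and the 32-bit width is the poster's guess, not a confirmed hardware figure.

```python
def transfer_cycles(components: int, bits_per_component: int,
                    bus_bits: int = 32) -> int:
    """Cycles to move one texel across a bus of bus_bits width."""
    total_bits = components * bits_per_component
    return -(-total_bits // bus_bits)  # ceiling division on integers

print("RGBA8    ", transfer_cycles(4, 8))
print("FP16 xy  ", transfer_cycles(2, 16))
print("FP16 xyzw", transfer_cycles(4, 16))
print("FP32 xyzw", transfer_cycles(4, 32))
```

Under this assumption an FP32 4-vector takes 4 cycles and an FP16 4-vector takes 2, which lines up with the numbers quoted in the thread. It does not rule out the interpolator-count explanation, since both would give the same FP16 figures.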
|
|
|
|
|
|
#25 | |
|
Senior Member
Join Date: Mar 2002
Posts: 3,779
|
Quote:
When doing HDR post processing, the bandwidth needed will slow you down anyway, since you have a 1:1 mapping between pixels and texels. Say you were doing a blur of 4 pixels: you need to read 256 bits of data, then write 64 when you're done. I think many of today's GPUs have only 32 bits of bandwidth per pipe per clock, and rarely will you get >90% utilisation. It makes sense to me. Why make the GPU capable of more than the memory will be able to feed it? Only when you get into ordinary usage of FP textures (i.e. not 1:1) will bandwidth be less of an issue, and I think the sky in FarCry's HDR mode is the only example so far.
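The blur example works out as follows. This Python sketch assumes a 64-bit texel (four FP16 components), which is what makes 4 reads come to 256 bits and one write come to 64. The 32 bits per pipe per clock and the 90% utilisation are the post's own rough figures, and the function name is invented for illustration.

```python
def blur_clocks(taps: int = 4, bits_per_texel: int = 64,
                bits_per_clock: int = 32, utilisation: float = 1.0) -> float:
    """Clocks of memory traffic per output pixel for a 1:1 blur pass."""
    read_bits = taps * bits_per_texel   # 4 x 64 = 256 bits read
    write_bits = bits_per_texel         # 64 bits written
    return (read_bits + write_bits) / (bits_per_clock * utilisation)

print(blur_clocks())                           # ideal bandwidth
print(round(blur_clocks(utilisation=0.9), 2))  # at 90% utilisation
```

At ideal bandwidth that is 10 clocks of memory traffic per pixel, against 8 cycles to fetch four FP16 xyzw texels at 2 cycles each, so memory rather than the texture unit would be the limit in this 1:1 case.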
|
|
|
|