Poll Results: Pixel Fill Rate or Texel Fill Rate
A GPU with 4 pixel pipes and 4 TMUs per pipe: 9 votes (20.45%)
A GPU of Configuration X (tell us below): 6 votes (13.64%)
A GPU with 8 pixel pipes and 1 TMU per pipe: 29 votes (65.91%)
Voters: 44

Old 05-Sep-2002, 05:12   #1
Hellbinder
Naughty Boy!
 
Join Date: Feb 2002
Posts: 1,444
Pixel fill rate vs. texel fill rate...

Given today's games, and perhaps games coming out over the next year (but no further)...

All theoretical cards have the same core clock, comparable memory bandwidth (using whatever means), and similar pixel/vertex shader performance. Would you rather have...
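
For readers comparing the poll options, here is a minimal sketch of the usual peak fill-rate arithmetic: pixel fill rate is pipes times clock, and texel fill rate is pipes times TMUs per pipe times clock. The 300 MHz core clock and the `GpuConfig` class are assumptions made up for illustration; the thread only says all cards share the same clock.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuConfig:
    """Hypothetical description of a pipes-x-TMUs configuration."""
    name: str
    pixel_pipes: int
    tmus_per_pipe: int
    core_mhz: float

    def pixel_fill_mpixels(self) -> float:
        # One pixel per pipe per clock at peak.
        return self.pixel_pipes * self.core_mhz

    def texel_fill_mtexels(self) -> float:
        # One texel per TMU per clock at peak.
        return self.pixel_pipes * self.tmus_per_pipe * self.core_mhz


# Assumed clock, identical for every card as the poll stipulates.
ASSUMED_CORE_MHZ = 300.0

CONFIGS = [
    GpuConfig("4x4", 4, 4, ASSUMED_CORE_MHZ),
    GpuConfig("8x1", 8, 1, ASSUMED_CORE_MHZ),
]

for cfg in CONFIGS:
    print(f"{cfg.name}: {cfg.pixel_fill_mpixels():.0f} Mpixels/s, "
          f"{cfg.texel_fill_mtexels():.0f} Mtexels/s")
```

At the assumed clock, the 4x4 card has half the pixel rate but twice the texel rate of the 8x1 card, which is the trade-off the poll is asking about.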
Old 05-Sep-2002, 06:08   #2
Chalnoth
 
Join Date: May 2002
Location: New York, NY
Posts: 12,681
Default

There is only one thing that you didn't mention: Independent of textures, how much math can one pixel pipeline perform?

If the 8x1 pipelines can actually perform all the same math as each of the 4x4 pipelines (as appears to be the case with the R300...), then the 8x1 would definitely be better, for use with anisotropic filtering.
Old 05-Sep-2002, 06:21   #3
multigl2
Junior Member
 
Join Date: May 2002
Posts: 64
Default

Call me crazy, but I wouldn't mind seeing a good implementation of 16x0. If you had good loopback capabilities, 16x0 could do a lot of damage to current games. Of course, the tradeoff would most definitely be slower or harder-to-implement-at-decent-speeds trilinear/anisotropic filtering.
Old 05-Sep-2002, 06:27   #4
Saem
Senior Member
 
Join Date: Feb 2002
Posts: 1,532
Default

Multigl2,

Not only would you need 16X0, you'd likely need multiple triangle setup engines and the setup in a non-fixed rendering pattern (say 4*4). Otherwise the diminishing returns would really hose your performance.

Personally, I like the super generalized P10 architecture. General execution units, geared for the problems you'll usually encounter.
__________________
Regards.
Old 05-Sep-2002, 07:05   #5
multigl2
Junior Member
 
Join Date: May 2002
Posts: 64
Default

Most definitely, Saem... but from my just-toying-with-shaders perspective:

It would be really nice to see what a 16x0 math powerhouse could do... I mean, it should technically (setup and bandwidth permitting) be as fast as 8x1 in multitexturing duties, but it could do some serious shaders if the pipelines were set up nicely. For instance, if the setup permitted, you could treat it as a 4x2 card with 2 free pipes to help process shaders.
Old 05-Sep-2002, 10:56   #6
Nagorak
Member
 
Join Date: Jun 2002
Posts: 854
Default

Sorry for being ignorant, but how exactly does a 16*0 setup work? I mean, wouldn't that end up being a bunch of untextured polys (obviously not, so please explain)?
Old 05-Sep-2002, 11:32   #7
Reverend
Naughty Boy!
 
Join Date: Jan 2002
Posts: 3,266
Default

The poll is too simplistic IMO, but if I have lots of shaders in my game, I'd probably prefer a card with more pipes. However, given the differences in architectures (which will probably always exist), the bottom line is the performance - it won't matter to me if it is 8x1 or 4x4 or whatever, since this is transparent to a developer.
__________________
Reverend
Dev Anon : Best game ever? Hmm... you mean other than anything from us? (2005)
Old 05-Sep-2002, 11:57   #8
Saem
Senior Member
 
Join Date: Feb 2002
Posts: 1,532
Default

Quote:
Sorry for being ignorant, but how exactly does a 16*0 setup work? I mean, wouldn't that end up being a bunch of untextured polys (obviously not, so please explain)?
Think PS2. It can do something on the order of 2400 Mpixels/s, all untextured. You cut that number in half when adding a texture layer. The point of this setup is doing things that don't involve texturing (I'm guessing stencil buffers would be one); this ends up being more efficient since you're not using the TMU anyway. People can argue that the returns provided by a TMU are huge. One could allow for significant loopback and this would be less of a problem; a simplification of the circuit could also lead to higher clocks. Though I'm guessing TMUs aren't your big inhibitors.
__________________
Regards.
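
The PS2 figures and the loopback idea in the post above can be put into a rough model. This is only a sketch: the 16 pipes and 147.456 MHz clock are the commonly quoted Graphics Synthesizer figures and are an assumption here (the post itself only gives "on the order of 2400"), and `pixels_per_clock` is a hypothetical helper that assumes a pipe simply loops back until every texture layer has been applied.

```python
import math


def pixels_per_clock(pipes: int, tmus_per_pipe: int, texture_layers: int) -> float:
    """Peak pixels per clock when each pipe loops back until all layers are done.

    Assumes tmus_per_pipe >= 1; untextured work (0 layers) still takes one pass.
    """
    passes = max(1, math.ceil(texture_layers / tmus_per_pipe))
    return pipes / passes


# Assumed, commonly quoted PS2 Graphics Synthesizer figures (not from the thread).
GS_PIPES = 16
GS_CLOCK_MHZ = 147.456

untextured_mpixels = GS_PIPES * GS_CLOCK_MHZ   # roughly 2400 Mpixels/s
textured_mpixels = untextured_mpixels / 2      # the "cut in half" rule from the post

print(f"PS2 untextured: {untextured_mpixels:.0f} Mpixels/s")
print(f"PS2 one texture layer: {textured_mpixels:.0f} Mpixels/s")

# The two poll configurations under the same loopback model.
for layers in (1, 2, 4, 8):
    print(layers, "layers:",
          "8x1 ->", pixels_per_clock(8, 1, layers),
          "| 4x4 ->", pixels_per_clock(4, 4, layers))
```

Under this model, 8x1 is ahead with one or two layers, the two are level at four layers only in texel terms (8x1 drops to 2 pixels per clock while 4x4 holds 4), and it ignores bandwidth and filtering cost entirely.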
Old 05-Sep-2002, 14:21   #9
alexsok
Member
 
Join Date: Jul 2002
Location: Toronto, Canada
Posts: 803
Default

8 pixel pipes and two TMUs per pipe

I do have to agree with Saem though, P10's architecture is really flexible in many ways and is targeted towards generalization of everything.

I'm very interested in the flexibility of NV30's architecture, since it's been suggested that it might be even more flexible than P10's! (not in all areas obviously, but in most of them).
Old 05-Sep-2002, 18:21   #10
Tahir2
Itchy
 
Join Date: Feb 2002
Location: United Queendom
Posts: 2,873
Default

8*2 would then require a very high memory bandwidth to take advantage of. Something a lot higher than the 19 GB/sec of the Radeon 9700 Pro.
__________________
"Unless I am very mistaken... and yes, I am very much mistaken." - The Legend M Walker
Old 05-Sep-2002, 18:50   #11
alexsok
Member
 
Join Date: Jul 2002
Location: Toronto, Canada
Posts: 803
Default

Quote:
Originally Posted by misae
8*2 would then require a very high memory bandwidth to take advantage of. Something a lot higher than the 19 GB/sec of the Radeon 9700 Pro.
I know m8, I know...
Old 05-Sep-2002, 19:14   #12
Tahir2
Itchy
 
Join Date: Feb 2002
Location: United Queendom
Posts: 2,873
Default

Maybe the NV30 has it?
__________________
"Unless I am very mistaken... and yes, I am very much mistaken." - The Legend M Walker
Old 05-Sep-2002, 19:19   #13
alexsok
Member
 
Join Date: Jul 2002
Location: Toronto, Canada
Posts: 803
Default

Quote:
Originally Posted by misae
Maybe the NV30 has it?
Well... we shall see...
Old 05-Sep-2002, 19:33   #14
Dave Baumann
Gamerscore Wh...
 
Join Date: Jan 2002
Posts: 13,598
Default

Quote:
Originally Posted by alexsok
Quote:
Originally Posted by misae
Maybe the NV30 has it?
Well... we shall see...
Yawn. Let's get back on topic, shall we?
__________________
Radeon is Gaming
Tweet Tweet!
Old 05-Sep-2002, 19:40   #15
alexsok
Member
 
Join Date: Jul 2002
Location: Toronto, Canada
Posts: 803
Default

Quote:
Yawn. Let's get back on topic, shall we?
Sure Dave! Now where were we? Oh yeah, 8 pipes and 2 TMUs on each pipe would be great with approximately 25-30 GB/s of bandwidth and of course a 256-bit memory bus.
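
To put the bandwidth figures in the last few posts in context, here is a small sketch of the standard peak-bandwidth arithmetic for DDR memory: bus width in bytes times memory clock times two transfers per clock. The 310 MHz memory clock used for the Radeon 9700 Pro is the commonly quoted figure and is an assumption here, not something stated in the thread; both function names are hypothetical.

```python
def bandwidth_gb_s(bus_bits: int, mem_clock_mhz: float, transfers_per_clock: int = 2) -> float:
    """Peak memory bandwidth in GB/s (1 GB = 1e9 bytes); DDR moves 2 transfers per clock."""
    bytes_per_transfer = bus_bits / 8
    return bytes_per_transfer * mem_clock_mhz * 1e6 * transfers_per_clock / 1e9


def required_clock_mhz(target_gb_s: float, bus_bits: int, transfers_per_clock: int = 2) -> float:
    """Memory clock (MHz) needed to reach a target peak bandwidth on a given bus."""
    bytes_per_clock = (bus_bits / 8) * transfers_per_clock
    return target_gb_s * 1e9 / bytes_per_clock / 1e6


# Assumed Radeon 9700 Pro figures: 256-bit bus, 310 MHz DDR.
print(f"9700 Pro peak: {bandwidth_gb_s(256, 310.0):.2f} GB/s")

# Clock a 256-bit DDR bus would need for the 25-30 GB/s suggested above.
for target in (25.0, 30.0):
    print(f"{target:.0f} GB/s needs about {required_clock_mhz(target, 256):.1f} MHz DDR")
```

With those assumptions the 9700 Pro lands just under 20 GB/s, and the 25-30 GB/s range works out to roughly 390-470 MHz DDR on the same bus width.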
Old 05-Sep-2002, 20:45   #16
psurge
Member
 
Join Date: Feb 2002
Location: LA, California
Posts: 854
Default

Question on the pixel pipes of the p10.

Notice that they rasterize tris into 8x8 tiles (and perform visibility culling at this level).

On top of that they have 64 = (8*8) texture coordinate processors and 64 pixel shading ALUs.

To me this says: 64 pipe card, with each pipe locked to a specific pixel in an 8x8 tile? I haven't seen any claims that p10 can do data-dependent branching in pixel programs (there does appear to be some form of loop support for texture sampling), or that it can handle programs of arbitrary length, or that its pixel pipes can operate on arbitrary pixels, pixels from different triangles, or even pixels with different shaders.

IMO if this kind of thing were possible with p10, wouldn't the performance numbers reflect it?

(Before you all say, 64 pipes! no way! - note that the p10 ALUs are not SIMD - i.e. they process 1 float/int at a time as opposed to 4.)

However, they do describe their programmable units as "SIMD vertex texture and pixel arrays". That would tend to indicate that each ALU is executing the same instruction as all the other ones each cycle.

So why exactly does everyone think p10 is "so flexible" compared to say r300?
Old 05-Sep-2002, 22:02   #17
BRiT
...
 
Join Date: Feb 2002
Location: Cleveland
Posts: 5,503
Default

Quote:
Originally Posted by psurge
So why exactly does everyone think p10 is "so flexible" compared to say r300?
Points to marketing material from 3DLabs. It says so right there. :P

--|BRiT|
Old 05-Sep-2002, 23:06   #18
Tonyo
Junior Member
 
Join Date: Aug 2002
Posts: 29
Default

Quote:
Originally Posted by psurge
Question on the pixel pipes of the p10.
[...]
To me this says: 64 pipe card, with each pipe locked to a specific pixel in an 8x8 tile? I haven't seen any claims that p10 can do data-dependent branching in pixel programs (there does appear to be some form of loop support for texture sampling), or that it can handle programs of arbitrary length, or that its pixel pipes can operate on arbitrary pixels, pixels from different triangles, or even pixels with different shaders.
[...]
However, they do describe their programmable units as "SIMD vertex texture and pixel arrays". That would tend to indicate that each ALU is executing the same instruction as all the other ones each cycle.

So why exactly does everyone think p10 is "so flexible" compared to say r300?
Because it is.
Yes, P10 has data-dependent branching and looping in the fragment shader. Regarding the relationship between shaders and pixels/fragments: at rendering time, a primitive (say, a triangle) is decomposed into the tiles that the projected 2D primitive touches, and the shaders are run for each tile. In that sense the shader cannot displace pixels around the screen, and the shader run is the same for the whole primitive.

From Wavy's P10 preview:
Quote:
The maximum number of instructions that the vertex processor can handle at a time is 256 instructions (per unit); but, as mentioned before, the processors can use loops and subroutines so it can be much more efficient in the use of the 256 instructions.
http://www.beyond3d.com/articles/p10...page=page2.inc

I haven't been able to find any source disclosing the number of instructions in any of the fragment-pixel units (coordinate, shader, address and pixel), though.
Old 05-Sep-2002, 23:11   #19
sancheuz
Junior Member
 
Join Date: Jul 2002
Posts: 44
Default

I would recommend no less than 20 pixel pipes and around 25 TMUs per pass
__________________
What the heck is an NV30? ;-)
Old 05-Sep-2002, 23:20   #20
Saem
Senior Member
 
Join Date: Feb 2002
Posts: 1,532
Default

psurge,

In this thread over here, I asked Dave Baumann whether the "pixel pipes" were fixed, and he felt that they weren't.

When looking at the diagram at the end of the page here, which describes the P10 microarchitecture, it seems that it is possible to load more than one triangle and have the pixel processing -of course this will take more cycles. I feel this is the case because, as Dave mentioned in his P10 technology preview, the P10 uses a lot of multilevel cache, so the P10 could easily have the ability to cache a few tiles or patches. As pixels are processed, the cache (FIFO buffer) would spit out another pixel onto the chopping block.

As for the "SIMD arrays", this could be very much like the vertex processor, where this is simply an abstracted look and in actuality the pipelines are independently executing.
__________________
Regards.
Old 06-Sep-2002, 01:12   #21
Althornin
Senior Lurker
 
Join Date: Feb 2002
Posts: 1,326
Default

Quote:
Originally Posted by sancheuz
I would recommend no less than 20 pixel pipes and around 25 TMUs per pass
per pass?
wtf?
Old 06-Sep-2002, 01:51   #22
GetStuff
Junior Member
 
Join Date: Jul 2002
Posts: 67
Default

Quote:
Originally Posted by sancheuz
I would recommend no less than 20 pixel pipes and around 25 TMUs per pass

No, you need at least 28 TMUs per pass for tomorrow's 2D applications.
Old 06-Sep-2002, 03:13   #23
psurge
Member
 
Join Date: Feb 2002
Location: LA, California
Posts: 854
Default

Tonyo, Saem

How do you know there is data-dependent branching in the fragment shaders - link? I figured this to be the case for the vertex processors, as Dave's preview mentions that each vertex unit has its own program storage. But - is it really realistic to assume that all 128 of the pixel processors have their own program/temp storage?

Tonyo, I'm not sure if I understand what you mean with the tiles - here is what my guess was: take a tri, split it into 8x8 tiles. For the 64 pixels in each tile, run the same pixel program on each pixel inside the triangle.
Is this what you're saying?

Saem, what confuses me the most in that diagram is the 4 "texture pipes". Each has a "setup" stage - does this mean each can handle pixels from a different triangle?
Old 06-Sep-2002, 04:14   #24
Saem
Senior Member
 
Join Date: Feb 2002
Posts: 1,532
Default

psurge,

Well, the thing is that there doesn't need to be that much of a "program" for the shading pipes - they might just get one instruction and then get another in the next clock; not sure where the control logic is. All that needs to happen is they get a pixel, execute all the work necessary for that pixel (what they're told to do) and say "done". When "done" is said, they get more work; lather, rinse and repeat. It doesn't matter where that work comes from, if I understand things correctly - this is handled by the elusive control logic. >=|

The "setup" stage is bugging me as well. It says it's housekeeping, but I'm not sure if that's the whole story. It could be some sort of program setup and evaluation. Pixel operations might be rather numerous and some, if not all, will require some special provisions; this could be an area where these are taken care of. Perhaps one can even program the "housekeeping."

As for data-dependent branching, I'm not sure. It could be that the "setup" stage actually evaluates a program and then runs it. Again, this setup stage could be large or small; I'm not sure what to think of it right now. They might have basically recycled the RISC cores they used in the vertex shaders; here they use them to do some extra logic to handle housekeeping tasks. ARGH, what the heck does that do? *poke Mr Baumann*
__________________
Regards.
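
The "get a pixel, do the work, say done, get more work" idea in this post can be sketched as a toy simulation. This is purely an illustration of the speculation above, not a description of the real P10 control logic; the function names, the two-pipe setup and the per-pixel cycle costs are all made up for the example.

```python
from collections import deque


def fifo_dispatch_cycles(pixel_costs, num_pipes):
    """Cycles needed when each pipe pulls the next pixel from a FIFO as soon as it is done."""
    fifo = deque(pixel_costs)
    busy = [0] * num_pipes          # remaining cycles of work per pipe
    cycles = 0
    while fifo or any(busy):
        for i in range(num_pipes):
            if busy[i] == 0 and fifo:
                busy[i] = fifo.popleft()   # pipe said "done", hand it more work
        busy = [max(0, b - 1) for b in busy]
        cycles += 1
    return cycles


def lockstep_cycles(pixel_costs, num_pipes):
    """Cycles needed when pipes advance in fixed groups and wait for the slowest pixel."""
    total = 0
    for start in range(0, len(pixel_costs), num_pipes):
        total += max(pixel_costs[start:start + num_pipes])
    return total


# Hypothetical workload: cheap (1-cycle) and expensive (4-cycle) pixels interleaved.
costs = [1, 4, 1, 4, 1, 4, 1, 4]
print("FIFO dispatch:", fifo_dispatch_cycles(costs, 2), "cycles")
print("Lockstep groups:", lockstep_cycles(costs, 2), "cycles")
```

In this toy workload the FIFO-fed pipes finish sooner because a pipe with a cheap pixel does not wait for its neighbour, which is the practical benefit of pipes that execute independently.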
Old 06-Sep-2002, 21:00   #25
Tonyo
Junior Member
 
Join Date: Aug 2002
Posts: 29
Default

Quote:
Originally Posted by psurge
How do you know there is data-dependent branching in the fragment shaders - link?
Well, I know it, but I don't have links to back it up :'(

Quote:
Originally Posted by psurge
But - is it really realistic to assume that all 128 of the pixel processors have their own program/temp storage?
The instruction storage could be shared across all the SIMD processors (Single Instruction), but I guess that would complicate the routing :?.

Note that all the processors in a tile work on the same primitive, so they all are executing the same program, or as you put it:

Quote:
Originally Posted by psurge
Tonyo, I'm not sure if i understand what you mean with the tiles - here is what my guess was : take a tri, split it into 8x8 tiles. For the 64 pixels in each tile, run the same pixel program on each pixel inside the triangle.
Is this what you're saying?
Yes, it's exactly that.

One of the slides says "Rasterize triangle into tiles" and then, with those tiles:
Quote:
The 64 processor arrays through the pixel and texture pipelines are arranged in an 8x8 block, which is the basic unit of processing and memory transfer - 3Dlabs refer to this block as a 'tile' or 'patch'.
http://www.beyond3d.com/articles/p10...page=page3.inc
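
The tile scheme agreed on above (split a triangle into 8x8 tiles, then run the same pixel program on every covered pixel in each tile) can be sketched in software. This is a minimal illustration under stated assumptions: coverage is tested at pixel centres with simple edge functions, the loops stand in for what hardware would do in parallel, and all names are hypothetical rather than taken from 3Dlabs material.

```python
import math

TILE = 8  # tile edge in pixels, matching the 8x8 block described above


def edge(ax, ay, bx, by, px, py):
    """Signed area test: which side of edge A->B the point P lies on."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def inside(tri, px, py):
    """True if point P is inside the triangle, for either winding order."""
    (x0, y0), (x1, y1), (x2, y2) = tri
    w0 = edge(x1, y1, x2, y2, px, py)
    w1 = edge(x2, y2, x0, y0, px, py)
    w2 = edge(x0, y0, x1, y1, px, py)
    return (w0 >= 0 and w1 >= 0 and w2 >= 0) or (w0 <= 0 and w1 <= 0 and w2 <= 0)


def rasterize_tiles(tri):
    """Yield (tile_x, tile_y, covered_pixels) for each 8x8 tile with covered pixels."""
    xs = [p[0] for p in tri]
    ys = [p[1] for p in tri]
    for ty in range(math.floor(min(ys)) // TILE, math.floor(max(ys)) // TILE + 1):
        for tx in range(math.floor(min(xs)) // TILE, math.floor(max(xs)) // TILE + 1):
            covered = [
                (x, y)
                for y in range(ty * TILE, (ty + 1) * TILE)
                for x in range(tx * TILE, (tx + 1) * TILE)
                if inside(tri, x + 0.5, y + 0.5)   # sample at the pixel centre
            ]
            if covered:
                yield tx, ty, covered


def shade(tri, program):
    """Run one pixel program over every covered pixel, tile by tile."""
    framebuffer = {}
    for _tx, _ty, covered in rasterize_tiles(tri):
        for x, y in covered:   # in hardware the 64 lanes of a tile would run together
            framebuffer[(x, y)] = program(x, y)
    return framebuffer


triangle = [(0, 0), (16, 0), (0, 16)]
tiles = list(rasterize_tiles(triangle))
image = shade(triangle, lambda x, y: (x + y) % 2)   # trivial checkerboard "shader"
print(len(tiles), "tiles,", len(image), "pixels shaded")
```

Note how the tiles along the diagonal are only partly covered, so a fixed 64-lane array would leave some lanes idle there; that partial occupancy is one cost of locking each pipe to a pixel position within a tile.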
