Parhelia core clock speed, performance expectations?

Teasy said:
Having a 1200Mpixel/s fillrate like, say, a Geforce 4 isn't so important if you only have the bandwidth to achieve about 700Mpixels/s or less. Core speed and theoretical fillrate aren't everything; in fact they're nothing without the bandwidth to actually achieve that fillrate. A high theoretical fillrate can help sometimes, but 880Mpixels/s (220MHz x 4 pipes) is high enough, especially with 20GB/s of bandwidth behind it.

Of course I understand that. (Except I'm not positive the GF4 is that bandwidth limited.) No need for the elementary lecture in this instance, pretty please. :p
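For the record, here are the sums behind that argument as I understand them -- a minimal Python sketch, where the ~14 bytes of memory traffic per pixel (colour write, Z read/write, some texture fetches) is purely my own guess:

```python
# Rough fillrate-vs-bandwidth arithmetic from the discussion above.
# BYTES_PER_PIXEL is my own assumption, not a measured figure.
BYTES_PER_PIXEL = 14  # 32-bit colour write + Z read/write + texture traffic

def theoretical_fillrate(core_mhz, pipes):
    """Core clock x pixel pipes, in Mpixels/s."""
    return core_mhz * pipes

def bandwidth_ceiling(bandwidth_gbs):
    """Most pixels/s the memory bus can feed, in Mpixels/s."""
    return bandwidth_gbs * 1e9 / BYTES_PER_PIXEL / 1e6

print(theoretical_fillrate(220, 4), bandwidth_ceiling(20))    # Parhelia: 880 vs ~1430
print(theoretical_fillrate(300, 4), bandwidth_ceiling(10.4))  # GF4 Ti 4600: 1200 vs ~740
```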

Also, because of the 4 TUs per pipe, the Parhelia will keep that full fillrate even with 4 texture layers, while cards like the Geforce 4 will drop down to 600Mpixels/s.

Correct me if I'm ass-backwards... but Matrox says Parhelia's 4 TMUs per pipe can do (bilinear quad-texturing or) trilinear dual-texturing. (Respectively, 4 pipes x 4 textures x 4 samples, or 4 pipes x 2 textures x 8 samples, for the total 64-samples figure they advertise.) As trilinear is the minimum for *me*, Parhelia is effectively dual-texturing capable from my personal point of view!
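In sketch form, here's how I read those figures (the TMU pairing for trilinear is my interpretation of the 64-samples number, not anything official from Matrox), plus Teasy's multitexturing point from above:

```python
import math

PIPES = 4
TMUS_PER_PIPE = 4
BILINEAR_SAMPLES = 4   # 2x2 texel footprint
TRILINEAR_SAMPLES = 8  # two mip levels, 4 texels each

# Bilinear quad-texturing: 4 pipes x 4 textures x 4 samples
print(PIPES * TMUS_PER_PIPE * BILINEAR_SAMPLES)  # 64

# Trilinear dual-texturing: TMUs pair up, leaving 2 textures per pipe
trilinear_textures = TMUS_PER_PIPE * BILINEAR_SAMPLES // TRILINEAR_SAMPLES
print(PIPES * trilinear_textures * TRILINEAR_SAMPLES)  # still 64

# Multitexturing fillrate: layers beyond the TMU count cost extra loops
def mt_fillrate(core_mhz, pipes, tmus_per_pipe, layers):
    loops = math.ceil(layers / tmus_per_pipe)
    return core_mhz * pipes / loops

print(mt_fillrate(220, 4, 4, 4))  # Parhelia, 4 layers: 880.0 Mpixels/s
print(mt_fillrate(300, 4, 2, 4))  # GF4, 4 layers: 600.0 Mpixels/s
```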

I don't know how the GF4 does it; I remember seeing a claim that it can deliver trilinear dual-texturing almost for free. If this is correct (is it?), then IMHO Parhelia doesn't benefit from having 4 bilinear-capable instead of 2 trilinear-capable TMUs per pipe.

I repeat: I don't want bilinear! I loathe and despise bilinear! ;)

So in conclusion, 880Mpixels/s is fine; it's no big deal that it doesn't have a huge theoretical fillrate. It's the memory bandwidth that's important here IMO, and it certainly does have a huge memory bandwidth.

Yes, I'm sure Parhelia sustains that fillrate very well. All in all, I just thought it could sustain a bit more just as well, and that's why I wondered why the clock speed ended up lower than even the most cautious estimates I saw (rates between 250 MHz and 375 MHz were aired).

I'll probably buy it anyway for the IQ reasons I stated earlier; I'm just b*tching that they couldn't get the core faster, what with all that bandwidth at their disposal.
 
Pete said:
[Compared to GF4...] it has twice the texturing units, but the same number of pixel pipes (4x4 vs. 4x2).

It boils down to what one texturing unit can do per cycle! (Trilinear or bilinear.)

After that it boils down to less visible architectural details and optimisations and, very importantly, the drivers, of course.
 
Of course I understand that. (Except I'm not positive the GF4 is that bandwidth limited.)

You don't think the Geforce 4 would be significantly faster if it had, say, 20GB/s of memory bandwidth rather than 11GB/s?

No need for the elementary lecture in this instance, pretty please.

Ok, it just seemed like you needed that elementary lesson because of your first comment, and after all I don't know you, so I don't know the extent of your knowledge.

Correct me if I'm ass-backwards... but Matrox says Parhelia's 4 TMUs per pipe can do (bilinear quad-texturing or) trilinear dual-texturing. (Respectively, 4 pipes x 4 textures x 4 samples, or 4 pipes x 2 textures x 8 samples, for the total 64-samples figure they advertise.) As trilinear is the minimum for *me*, Parhelia is effectively dual-texturing capable from my personal point of view!

I'm not sure about that; however, I do remember hearing that the Geforce 4's TUs are bilinear... are they not?

Yes, I'm sure Parhelia sustains that fillrate very well. All in all, I just thought it could sustain a bit more just as well, and that's why I wondered why the clock speed ended up lower than even the most cautious estimates I saw (rates between 250 MHz and 375 MHz were aired).

Well, as has been said, it does have 80 million transistors; that's around 25% more transistors than a Geforce 4. The core is clocked around 25% slower than the Geforce 4 (if it is indeed 220MHz), so it seems about right to me.
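Quick sums, taking the GF4 at roughly 63 million transistors and a 300MHz core (those two figures are from memory, so correct me if they're off):

```python
# Rough check of the "~25% more transistors, ~25% slower" observation.
parhelia_trans, gf4_trans = 80e6, 63e6  # GF4 count is from memory
parhelia_mhz, gf4_mhz = 220, 300

print(parhelia_trans / gf4_trans - 1)  # ~0.27: about 27% more transistors
print(1 - parhelia_mhz / gf4_mhz)      # ~0.27: about 27% lower clock
```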
 
Well, as has been said, it does have 80 million transistors; that's around 25% more transistors than a Geforce 4. The core is clocked around 25% slower than the Geforce 4 (if it is indeed 220MHz), so it seems about right to me.

That would imply that the clock speed is limited by the temperature it runs at, not the IPC.

A GF4 Ultra, anyone? Wouldn't surprise me.
 
DaveBaumann said:
That's still a fairly alpha board by the look of things - there's some rework going on there.

That rework just looks like they had trouble with the clock generator circuit, and decided to fit an oscillator rather than a crystal. It ups the cost by a few pence, but it's much more stable.

If that's it...
 
Teasy,

I said I'm not positive. I don't know how much extra bandwidth it could use before computational/fillrate limitations kick in. I simply haven't seen a good investigation of the GF4's bandwidth limitedness. (Even Reverend's VisionTek 4600 article left this vague.) So give me linkage to some instead of your own sentiments. I certainly don't know you.

Hopefully somebody can enlighten us on the GF4 TMUs (bilinear, or trilinear for free).

Dave B, don't ask me. Maybe memory bandwidth, or maybe every IMR is held back for other reasons. Rev's article mentions a 3DMark fillrate result of 1073 out of a theoretical 1200, which isn't a miserable result. Yeah, I know it's only an uber-simplistic, ideal-case raw fillrate test, but it goes to show that how much it is held back depends on the situation. Linkage to show how much it is held back and when? (I would appreciate the read, actually, don't get me wrong.)

Back on topic: maybe I just became too hopeful when nobody seemed to object to those 250 to 375 MHz figures, what with me not knowing Matrox's architecture/fabbing intimately. Hafta say I can't remember you guys considering them too high for 80 million transistors at 0.15 µm, either... If R300 is 107 million at 0.15 µm, is 165 MHz a good achievement for it? Do you expect that clock speed -- or more?
 
I said I'm not positive. I don't know how much extra bandwidth it could use before computational/fillrate limitations kick in.

Well, for instance, in SS:SE the Geforce 4 Ti 4600 hits 700Mpixels/s (if you want a link for that, I think it was in the NV17 and NV20 preview at Anandtech). That's just over 50% of its peak fillrate. It might not be exactly the same in all games, but that game is a good indicator of the sort of fillrate a Geforce 4 Ti 4600 hits. So surely if you doubled its bandwidth from 10GB/s to 20GB/s it would get a lot closer to that peak fillrate of 1200Mpixels/s.
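Putting that in numbers (using only the figures above, nothing more):

```python
# The SS:SE efficiency point, using only the figures quoted above.
observed, peak = 700, 1200   # Mpixels/s: GF4 Ti 4600 in SS:SE vs theoretical
print(observed / peak)       # ~0.58: just over half of peak fillrate

# If the observed rate really scales with bandwidth, doubling ~10GB/s to
# 20GB/s would run into the 1200 Mpixels/s theoretical ceiling instead.
print(min(observed * 2, peak))  # 1200
```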

Maybe memory bandwidth, or maybe every IMR is held back for other reasons.

Like what? If you can't think of any other reasons, then the most obvious explanation surely has to be accepted... and that is memory bandwidth.

Anyway, obviously Matrox feel all that bandwidth is very worthwhile, even with an 880Mpixels/s fillrate. Who wants to bet now that the Parhelia will hit its full 880Mpixels/s fillrate in the SS:SE fillrate test?

Anyway, my main point is that 880Mpixels/s of real fillrate is actually extremely high. So there's no reason to be disappointed or to think that this won't be enough to drive your 21-inch monitor at high res. Because if it hits that full 880Mpixels/s fillrate (which is likely, given all that bandwidth) it will have more real fillrate than any card currently on the market.

Also, we still don't know that it will be 220MHz; it may end up at 240MHz or something.
 
I checked Anand's SS:SE findings. BTW, the link:
( http://www.anandtech.com/showdoc.html?i=1583&p=10 )

Select picks:
GF4 Ti 4400 (core/mem: 300/350): MPps: 765, MTps: 9.78
GF4 Ti 4400 (core/mem: 275/275): MPps: 646, MTps: 9.07

Juxtaposing those two, because it's the same core at different settings: the fillrate ratio (765:646 ≈ 1.18) falls between the core clock ratio (300:275 ≈ 1.09) and the memory clock ratio (350:275 ≈ 1.27). So fillrate here isn't tracking core clock alone; the extra bandwidth is doing a lot of the work.

The triangle rate ratio (9.78:9.07 ≈ 1.08) corresponds to the core clock ratio (300:275 ≈ 1.09). Triangle rate scales linearly with core clock.
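The same ratio arithmetic as a sketch, for anyone who wants to rerun it:

```python
# Redoing the ratio arithmetic on Anand's two Ti 4400 configurations.
core = (300, 275)    # MHz
mem  = (350, 275)    # MHz
mpps = (765, 646)    # fillrate, Mpixels/s
mtps = (9.78, 9.07)  # triangle rate, Mtriangles/s

print(mpps[0] / mpps[1])  # ~1.18: fillrate ratio
print(mem[0] / mem[1])    # ~1.27: memory clock ratio
print(core[0] / core[1])  # ~1.09: core clock ratio
print(mtps[0] / mtps[1])  # ~1.08: triangle rate, tracking core clock
```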

I'd like to know what resolution these tests were run at, and whether they are two separate runs or one. The results of the fps tests (below) would suggest the fillrate tests run at a higher res than the triangle rate tests. How is it?

The benchmarked fps ratios (4600:4400) are: @1024x768 109.7:99.7 ≈ 1.10, @1280x1024 76:67 ≈ 1.13, @1600x1200 55:47.5 ≈ 1.16.

So as resolution goes up, the fps ratio between the two cards climbs from the triangle rate ratio nearly up to the fillrate ratio -- so at least the 4400 becomes bandwidth limited along the way. But what about the 4600?

We can assume that @1024x768 the 4600 isn't bandwidth limited, as in the triangle rate test it didn't benefit from its (dis)proportionally higher bandwidth -- the 4600:4400 difference was all core speed there.

Now, if the 4600 isn't bandwidth limited @1024x768, let's see how it did in the (fps) benchmark as resolution went up:

Screen size: @1600x1200 = 1.920 MP, @1024x768 = 0.786 MP
4600's benchmark results: @1024x768 = 109.7 fps, @1600x1200 = 55 fps

1.920:0.786= 2.44 ( 1600x1200 is 2.44 times bigger )
109.7:55= 1.99 ( the 4600 is 1.99 times slower at 1600x1200 )

If the 4600 were fillrate/bandwidth limited at both resolutions, it should be a full 2.44 times slower @1600x1200. At only 1.99 times slower, something other than pixel pushing is still setting the pace for part of that range.
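The same check as a sketch:

```python
# A card that were fillrate/bandwidth limited at both resolutions should
# slow down by the full resolution ratio; the Ti 4600 doesn't.
pixels = {"1024x768": 1024 * 768, "1600x1200": 1600 * 1200}
fps    = {"1024x768": 109.7,      "1600x1200": 55.0}

res_ratio = pixels["1600x1200"] / pixels["1024x768"]  # ~2.44
fps_ratio = fps["1024x768"] / fps["1600x1200"]        # ~1.99
print(res_ratio, fps_ratio)
```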

Now it's your turn to twist the numbers (no wonder they're called that :p ), your way this time. 8)

[Edit: typos]
 