|
|
#1 |
|
Senior Member
Join Date: Mar 2002
Posts: 3,779
|
Even though we found out that Parhelia will only have a 220 MHz clock speed, I was still expecting decent performance out of it. Let's take Quake3 at 1600x1200x32 for example.
Counting pixels, a GF4 Ti4600 can do 1.2 billion / (1600x1200) = 625 fps. Taking overdraw to be 3.3 (some STMicro dude said that, and it's about right), we're down to 190 fps. Because GF4 has some HSR, that will go back up to about 230 fps or so. Now what brings it down to the ~150 fps that it actually scores?
1. Texture bandwidth. In some areas of the screen, the textures can require upwards of 50 bits per pixel (trilinear base map at 1 texel/pixel + light map + cache inefficiencies), although mostly it is much less.
2. Color bandwidth. When doing alpha blending, that extra read (32 bits extra) can sometimes really put bandwidth requirements over the top.
3. Trilinear filtering. Closer pixels are magnifying the biggest mipmap, so it's the same as bilinear filtering, but after a certain point you'll need to get two bilinear samples per texture, requiring an extra clock cycle and reducing fillrate to 2 pixels per clock.
4. Drivers, CPU, and T&L. While you can argue that the test is no longer CPU limited at 1600x1200, it is close enough to the CPU limit of ~210 fps (as seen in lower resolutions) that there will be some moments when the CPU is holding the GF4 back. T&L is also minimal, as Quake 3 is hardly polygon intensive.
Overall, the GF4's fillrate is effectively 3 pixels per clock instead of 4+HSR, which is actually extremely efficient - better than any other card today, and it trounces that POS known as GF2.
Now let's look at Parhelia. At 220 MHz, its fillrate leads to a speed of 880 million / (1600 x 1200 x 3.3) = 139 fps. However, it has no excuses, as the above does not apply:
1, 2: With a 256-bit bus and a much higher memory clock than core clock, Parhelia has more than twice the bandwidth per pixel of the GF4 (160 vs. 69 bits per pixel per clock). Very rarely would there be pixels requiring this much bandwidth.
3: With 4 texture units per pipe, Matrox has no excuse for extra cycles in trilinear filtering.
4: The Parhelia's score for 1024x768 is more than twice the score at 1600x1200 (151 vs 70 fps, or 2.2x). This is almost entirely due to the increase in pixels on the screen at 1600x1200 (2.4x), meaning that there are very few driver, CPU and T&L related bottlenecks.
Parhelia really should be getting close to 130 fps, but gets a truly pathetic 70 fps.
Well, there you have it. A mathematical proof of how much the Parhelia blows. Matrox's G400 was a very good card at the time it came out, but the Parhelia is a horrible effort considering they had 3 years since the G400 to focus on one chip, and other manufacturers put out 2 generations of cards in that time, with a third coming soon. They do not have a very talented hardware design team at all. |
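For anyone who wants to replay the arithmetic in this post, here is a rough Python sketch. The function names are made up for illustration. The GF4 Ti4600 clocks (300 MHz core, 325 MHz DDR memory on a 128-bit bus) and the Parhelia's 275 MHz DDR memory clock are assumptions taken from commonly published specs, not from this post. The 3.3 overdraw is the figure quoted above, and HSR is not modelled at all.

```python
def fill_limited_fps(pixels_per_sec, width, height, overdraw=1.0):
    """Upper bound on frame rate if raw pixel fillrate were the only limit."""
    return pixels_per_sec / (width * height * overdraw)


def bits_per_pixel_per_clock(bus_bits, mem_mhz, core_mhz, pipes, ddr=True):
    """Memory bandwidth available to each pixel pipe on every core clock."""
    transfers_per_clock = 2 if ddr else 1
    return bus_bits * mem_mhz * transfers_per_clock / (core_mhz * pipes)


# Assumed specs: GF4 Ti4600 = 4 pipes at 300 MHz, Parhelia = 4 pipes at 220 MHz.
gf4_raw = fill_limited_fps(300e6 * 4, 1600, 1200)             # no overdraw
gf4_overdraw = fill_limited_fps(300e6 * 4, 1600, 1200, 3.3)   # overdraw of 3.3
parhelia_overdraw = fill_limited_fps(220e6 * 4, 1600, 1200, 3.3)

# Assumed memory: GF4 = 128-bit DDR at 325 MHz, Parhelia = 256-bit DDR at 275 MHz.
gf4_bw = bits_per_pixel_per_clock(128, 325, 300, 4)
parhelia_bw = bits_per_pixel_per_clock(256, 275, 220, 4)

print(f"GF4 Ti4600: {gf4_raw:.0f} fps raw, {gf4_overdraw:.0f} fps with overdraw")
print(f"Parhelia:   {parhelia_overdraw:.0f} fps with overdraw")
print(f"Bandwidth per pixel per clock: {parhelia_bw:.0f} vs {gf4_bw:.0f} bits")
```

This only gives a fillrate ceiling; the texture, blending, trilinear and CPU effects listed above all pull real scores below it.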
|
|
|
|
#2 |
|
Member
|
I guess if the reviews so far have said anything, it's that NVIDIA and the GeForce4 are pretty darn optimized. All the best engineers are there, for one reason or another.
|
|
|
|
|
#3 |
|
Member
Join Date: Feb 2002
Location: Germany
Posts: 845
|
IMHO the math works a little bit differently.
Looking at Quake3: at a resolution of 1024x768 you need 786432 x 3 texels (3dfx-style) of fillrate. The 3 texels come from multitexturing and alpha blending. On top of that you have an overdraw of around 1.2-1.5 (from memory only). So you need an effective fillrate of 3538944 texels (3dfx-style; overdraw 1.5) for every frame.
Bandwidth demand for this fillrate (with 2-pass rendering):
Textures: 3.5 million x 32 bit x 4 (bilinear filtering) x 0.33 (cache miss rate) / 8 = ~19 MB/frame
Z-buffer: 3.5 million x 32 x 2 x 2 (2 passes) / 8 = ~56 MB/frame (the GF4 can save up to 75% of this bandwidth)
Framebuffer: 3.5 million x 32 x 2 (2 passes) / 8 = ~28 MB/frame
So the GF4 4200 can theoretically do around: 1000 / (3.5 x 4/3, because of the pipeline) = 214 fps
Bandwidth demand: 214 x (19 + 0.25 x 56 + 28) = 13054 MB/sec
The GF4 4200 has only 8 GB/sec of bandwidth, so it reaches only around 60% of the theoretical figure. This gives 214 x 0.6 = 130 fps. As you see, this figure is too low, but that is because it is only a rough estimate that does not take into account 8-bit alpha textures, S3TC, the real cache miss rate, etc...
The Parhelia, on the other hand, would have the following speed: 880 / (3.5 x 4/3, because of the pipeline) = 188 fps
Bandwidth demand: 188 x (19 + 56 + 28) = 19364 MB/sec
Theoretically the bandwidth demand for the Z-buffer and the framebuffer would be only half the amount given, because the Parhelia has 4 TMUs per pipeline, but most games are written with 2 TMUs per pipeline in mind, so the extra units sit idle (but can be used for trilinear or anisotropic filtering).
Real bandwidth: 275 x 256 x 2 / 8 = 17.6 GB/sec
So the speed of the Parhelia should be 188 x (17600/19364) = 170 fps. The real speed should be even higher, as we see with the GF4 4200. So in the end, I agree the Parhelia is very inefficient at the moment; BUT this can be corrected with drivers, more or less (unless Matrox has built a really inefficient chip). |
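The estimate in this post can be written down as a small Python sketch, which makes it easier to try other miss rates or Z savings. The function names are hypothetical, and the model is only the one described in the post (32-bit texels, 4 samples for bilinear, 33% cache miss rate, 2 passes, the 4/3 pipeline factor), not a claim about how either chip really behaves. It uses the unrounded texel count, so the results land close to, not exactly on, the 130 and 170 fps above.

```python
def per_frame_traffic_mb(mtexels, z_saving=0.0):
    """Per-frame memory traffic in MB for `mtexels` million texels.

    Follows the post's model: 32-bit texels, 4 samples for bilinear
    filtering, a 33% texture-cache miss rate, and two rendering passes.
    `z_saving` is the fraction of Z traffic the chip is assumed to avoid.
    """
    textures = mtexels * 32 * 4 * 0.33 / 8
    z_buffer = mtexels * 32 * 2 * 2 / 8 * (1.0 - z_saving)
    framebuffer = mtexels * 32 * 2 / 8
    return textures + z_buffer + framebuffer


def estimated_fps(mtexel_rate, mem_mb_per_sec, mtexels, z_saving=0.0):
    """Fillrate-limited fps, scaled down if memory bandwidth falls short."""
    fill_fps = mtexel_rate / (mtexels * 4 / 3)   # 4/3 pipeline factor from the post
    demand = fill_fps * per_frame_traffic_mb(mtexels, z_saving)
    return fill_fps * min(1.0, mem_mb_per_sec / demand)


# 1024x768, 3 texels per pixel, overdraw 1.5 -> about 3.54 million texels
mtexels = 1024 * 768 * 3 * 1.5 / 1e6

gf4_4200 = estimated_fps(1000, 8000, mtexels, z_saving=0.75)
parhelia = estimated_fps(880, 275 * 256 * 2 / 8, mtexels)

print(f"GF4 4200 estimate: {gf4_4200:.0f} fps")
print(f"Parhelia estimate: {parhelia:.0f} fps")
```

Both cards come out bandwidth-bound in this model, so the result is effectively memory bandwidth divided by traffic per frame.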
|
|
|
|
#4 |
|
Senior Daddy
Join Date: Feb 2002
Location: London
Posts: 1,869
|
I don't know the theoretical maths, but even I would have expected the raw speed to at least equal GF4 Ti4200/4400 performance, and then to see it pull well ahead as the IQ was turned on.
|
|
|
|
|
#5 |
|
Senior Member
Join Date: Jan 2002
Location: Abbots Langley
Posts: 732
|
Has anybody seen any single-texture results from 3DMark? All I can find are the multitexture ones, where Matrox wins. I wonder about their pixel throughput rate...
K~ |
|
|
|
|
#6 |
|
Junior Member
Join Date: Feb 2002
Posts: 14
|
Single Texture was around 700ish.
|
|
|
|
|
#7 |
|
Junior Member
Join Date: Jun 2002
Posts: 49
|
|
|
|
|
|
#8 |
|
Senior Member
Join Date: Jan 2002
Location: Abbots Langley
Posts: 732
|
OK, 880 theoretical and 750.7 in a fairly cache-friendly test, with twice the bandwidth per pixel that the others have... efficiency rate: 85%. You'd think they would at least hit 100% efficiency on that one... I wonder if they have any ability in the drivers to tweak their memory interface.
K~ |
|
|
|
|
#9 |
|
Member
|
Maybe they need those old Voodoo5 drivers with the HSR.
|
|
|
|
|
#10 | |
|
lp0 On Fire!
|
Quote:
I have been reading their white papers and started to wonder: if their FAA unit knows which pixels are totally covered, why not add something like Pixel Skip to it? It would not have taken much more room and would have helped quite a lot.
__________________
Nappe1 of Division & Future Vision Founder of AF3DE |
|
|
|
|
|
#11 |
|
Member
Join Date: May 2002
Posts: 116
|
hmmm...
I used a much rougher calculation for Quake III, using a single pixel per pipe and HQ filtering:
880000000 / (1280 * 1024 * 6)
the 6 being 2 textures * 2 overdraw + 2 more for alpha and dynamic lights.
That works out to 112, a far cry from the mid-70s reported in some reviews. It's impossible to guesstimate the impact of special effects in Q3. A multiplier of 7 or 8 may be more accurate. |
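The same rough calculation can be run in both directions with a few lines of Python: forward to see what a given multiplier predicts, and backward to see what multiplier a review score would imply. The function names are invented for this sketch, and 75 fps is only an assumed stand-in for "mid-70s".

```python
def fill_limited_fps(pixel_rate, width, height, cost_multiplier):
    """Frame rate if every screen pixel costs `cost_multiplier` pipeline slots."""
    return pixel_rate / (width * height * cost_multiplier)


def implied_multiplier(pixel_rate, width, height, observed_fps):
    """Per-pixel cost that would explain an observed frame rate."""
    return pixel_rate / (width * height * observed_fps)


PARHELIA_PIXEL_RATE = 880_000_000   # 4 pipes x 220 MHz

# Forward: predicted fps for the multipliers mentioned in the post.
for multiplier in (6, 7, 8):
    fps = fill_limited_fps(PARHELIA_PIXEL_RATE, 1280, 1024, multiplier)
    print(f"multiplier {multiplier}: {fps:.0f} fps")

# Backward: 75 fps is an assumed stand-in for the "mid-70s" review scores.
needed = implied_multiplier(PARHELIA_PIXEL_RATE, 1280, 1024, 75)
print(f"multiplier implied by 75 fps: {needed:.1f}")
```

Under that assumed 75 fps, the implied multiplier comes out a little above 8, so even 7 or 8 would still leave part of the gap unexplained.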
|
|
|
|
#12 | |
|
Senior Member
Join Date: Mar 2002
Posts: 3,779
|
Quote:
As far as I know, Parhelia can combine two pipelines together for 8 textures per pass (unless they can only combine the pixel shader part), so we're talking 320 bits of bandwidth per pixel with 8 textures. The textures are very low resolution and cache friendly, as you said, so the texture bandwidth for 8 textures will be very little, maybe ~20 bits per pixel at most (even lower at higher resolutions). Add in an alpha read, a color buffer write, and no Z (I believe Z is disabled for this test, hence the good scores of Radeon 8500 and GeForce4), and you get only ~85 bits per pixel.
They have SO much bandwidth to spare, it isn't funny. Even if they can only do 4 textures per pass, there is plenty of bandwidth to spare (160 bits per pixel of bandwidth available and ~75 bits required).
I can understand the multitexture rate being slightly less than 4 times the single-texture rate, but to drop down to about 3 times is pretty bad. Look at the Radeon 8500 - 76% efficient in single texturing and 93% efficient in multitexturing. To hit 70% efficiency in multitexturing with over twice the bandwidth per pixel per clock is just abhorrent. |
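To put numbers on the bandwidth budget argued above, here is a minimal Python sketch. The helper name is hypothetical. The ~20 bits of texture traffic for 8 textures is the post's own guess, and the 10 bits used for the 4-texture case is just an assumption (half of that guess). Z traffic is left out because the post believes Z is disabled in this test.

```python
COLOR_BITS = 32   # 32-bit colour buffer


def required_bits_per_pixel(texture_bits, alpha_read=True):
    """Rough per-pixel memory traffic: texture fetches, colour write, optional blend read."""
    bits = texture_bits + COLOR_BITS      # texture traffic plus the colour write
    if alpha_read:
        bits += COLOR_BITS                # destination colour read for alpha blending
    return bits


# (available bits per pixel per clock, guessed texture bits per pixel)
cases = {
    "8 textures, two pipes combined": (320, 20),
    "4 textures, single pipe": (160, 10),   # 10 bits is an assumed guess
}

for name, (available, texture_bits) in cases.items():
    needed = required_bits_per_pixel(texture_bits)
    print(f"{name}: need ~{needed} bits, have {available} "
          f"({available / needed:.1f}x headroom)")
```

Either way the sketch leaves the card with more than twice the bandwidth it needs per pixel, which is the point being made about the multitexture result.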
|
|
|
|
|
#13 |
|
Senior Member
Join Date: Feb 2002
Location: gjethus, Norway
Posts: 1,256
|
It appears to me so far that Matrox, when designing this chip, just assumed that raw memory bandwidth was the only big bottleneck holding back GPU performance, and that a 256-bit bus alone would therefore be enough to beat Nvidia/ATI's 128-bit solutions. Two things point in that direction. First, the lack of significant performance optimizations in the chip (like fast/hierarchical Z tests, Z-compression, crossbar memory controllers, etc). Second, the generally low efficiency of the architecture: the 70% efficiency when multi-texturing and the performance hit taken when doing anisotropic filtering point to a badly optimized texture cache, and the 4 vertex shaders also deliver less than impressive performance.
|
|
|
|
|
#14 | ||||||||||
|
Senior Member
Join Date: Mar 2002
Posts: 3,779
|
Quote:
Quote:
Quote:
Quote:
Quote:
Quote:
Quote:
fps = [texel rate] / [# texels on the screen], which means (250*4*2)/(3.5) = 571 fps. But your texel count is wrong - it's 1024x768 x 2 textures x 3.3 overdraw = 5.2 million. Then you get about 385 fps. Quote:
Quote:
Quote:
Anyway, this method of calculating theoretical framerate has more holes than a block of Swiss cheese. |
||||||||||
|
|
|
|
#15 | |
|
Senior Member
Join Date: Feb 2002
Posts: 2,019
|
Quote:
A driver comment unrelated to my first paragraph. I think the first chart on the following page at Anandtech might show that the drivers are creating some CPU inefficiencies. I find it odd that Parhelia trails significantly when the other cards are practically identical. http://www.anandtech.com/video/showdoc.html?i=1645&p=9 |
|
|
|
|
|
#16 | ||
|
Senior Member
Join Date: Mar 2002
Posts: 3,779
|
Quote:
Also, raising the resolution in Quake 3 increases the number of pixels without raising the number of render state changes, polygons, or CPU work per frame. Still, the Parhelia's scores are scaling almost exactly with the increase in fillrate demands (demonstrated above - the 2.2x vs 2.4x thing), suggesting inefficient fillrate, not any other reason.
I'm not talking about a 5-10% increase, I'm talking about an 80%+ increase (at least in Q3 at 1600x1200) needed for performance to be where it is expected. Unless they have some problem with the card, such as limiting it to only 128 bit due to a timing bug, or haven't paid any attention to getting performance reasonable (this is a far bigger problem than just "tweaking"), there really doesn't seem to be hope that performance will reach expected levels, or at least not what I'm expecting as explained above.
With over double the bandwidth per clock and double the texture units, we should see at least a 20% increase in efficiency PER CLOCK (i.e. if Parhelia were to run at GF4 speeds) compared to Radeon 8500 or GF4. Instead, its efficiency is actually significantly lower. |
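One way to put a number on the scaling argument is a two-point fit, sketched below in Python. It assumes frame time is a fixed per-frame cost (CPU, drivers, T&L) plus a cost proportional to pixel count, which is a simplification and not something measured. The 151 and 70 fps scores are the ones quoted earlier in the thread, and the function name is made up.

```python
def split_frame_time(fps_low, pixels_low, fps_high, pixels_high):
    """Fit frame_time_ms = fixed + per_pixel * pixels through two benchmark points."""
    t_low = 1000.0 / fps_low      # ms per frame at the lower resolution
    t_high = 1000.0 / fps_high    # ms per frame at the higher resolution
    per_pixel = (t_high - t_low) / (pixels_high - pixels_low)
    fixed = t_low - per_pixel * pixels_low
    return fixed, per_pixel


# Parhelia Quake 3 scores quoted in the thread: 151 fps and 70 fps.
fixed_ms, ms_per_pixel = split_frame_time(151, 1024 * 768, 70, 1600 * 1200)

fill_ms = ms_per_pixel * 1600 * 1200
fill_share = fill_ms / (fixed_ms + fill_ms)

print(f"fixed cost per frame: {fixed_ms:.1f} ms")
print(f"pixel-dependent cost at 1600x1200: {fill_ms:.1f} ms")
print(f"share of frame time that scales with pixels: {fill_share:.0%}")
```

Under this model roughly nine tenths of the 1600x1200 frame time scales with pixel count, which fits the claim that the card is fill-limited and not held back by CPU or driver overhead.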
||
|
|