Anand on Parhelia vs NV30

SteveG

Newcomer
I'm sure we've all had a chance to read the Parhelia previews by now. Wondering what everyone thought about Anand's thinly (and not-so-thinly) veiled allusions to NV30 performance? There's been plenty of discussion in other threads re: Parhelia vs. NV30, with many insisting that Parhelia will beat NV30 in raw performance simply due to its 256-bit memory bus. Anand's preview of Parhelia clearly suggests that this is a poor assumption. Keep in mind that Anand is in a unique position (as a trusted journalist) where he probably has as good an idea as anyone of the comparative strengths of the upcoming chips - he's not just "guessing" like many of us. He doesn't come right out and say NV30 will be faster, but it's not hard to read between the lines. From his Parhelia preview:

"The Parhelia-512 has the ability to take the short-term performance crown away from NVIDIA."

"In the end, the Parhelia-512 has the potential of being the king of the hill between now and the release of NV30."

And:

"The lack of any serious Z-occlusion culling technology is a major disappointment. If you've noticed, occlusion culling is something that ATI and NVIDIA are continuing to improve on. The next-generation Radeon and NVIDIA's NV30 will both have extremely sophisticated forms of occlusion culling built into the hardware. This tradeoff can become a killer for Matrox in situations where complex pixel shader programs are applied to pixels that should have been occluded."

Comments?
 
We really can't comment on either until we see how drivers, stability and real-world performance clock in with these two pieces of hardware.

What we *think* we know from hints, rumors and early product specs:
* The NV30 will not be a 256-bit part
* The Parhelia will be 256-bit with 20 GB/s of memory bandwidth.

The only argument Anand has brought forth to sour the Parhelia is the lack of Z-occlusion, which helps remove *some* memory bandwidth bottlenecks today, but not nearly to the degree that pure bandwidth does.

I think if any sane, logical and reasonable person had the choice between 20 GB/s of bandwidth, or say 9.2 GB/s of "marketing" bandwidth (i.e. after z-occlusion on back-to-front, overdraw-10x style conditions that fit the z-occlusion perfectly, 100% utopian "light speed" optimized accesses and superbly perfect texturing conditions) - I'd say the 20 GB/s is the real desire.
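A quick back-of-the-envelope sketch of that trade-off (all numbers below are made up purely for illustration, not measurements of any of these chips) - rough per-frame memory traffic for a colour + Z pass, with and without occlusion culling rejecting overdrawn fragments:

def frame_traffic_gb(width, height, overdraw, texels_per_pixel, reject_rate):
    """Very rough per-frame traffic (GB): Z reads/writes, colour writes, texel fetches."""
    fragments = width * height * overdraw           # total fragments incl. overdraw
    shaded = fragments * (1.0 - reject_rate)        # fragments that survive culling
    z_bytes = fragments * 4 + shaded * 4            # every fragment reads Z, only survivors write it
    colour_bytes = shaded * 4                       # 32-bit colour write
    texture_bytes = shaded * texels_per_pixel * 4   # 32-bit texels
    return (z_bytes + colour_bytes + texture_bytes) / 1e9

# 1024x768, overdraw of 3, two texture layers:
for label, reject in (("no culling", 0.0), ("half the overdraw rejected", 0.5)):
    gb = frame_traffic_gb(1024, 768, 3, 2, reject)
    print(f"{label}: {gb*1000:.0f} MB/frame -> "
          f"{20.0/gb:.0f} fps at 20 GB/s, {10.0/gb:.0f} fps at 10 GB/s")

Either way the wider bus wins in this toy model; the open question is how much of the gap the culling closes in real scenes.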

So all this leads to questions of just what the NV30 will have in terms of occlusion technology and raw, usable bandwidth. In Anand's opinion, its "average case" would easily exceed a 20 GB/s part, so that builds pretty high expectations, especially for a part already being touted as not being 256-bit.

All of this also totally ignores driver quality and stability as usable products. It's anyone's guess whether the NV30 or Parhelia will ever get to show a fraction of their potential as far as drivers go before they become obsolete and are superseded by the next product line.
 
I think... that Parhelia-512 should compete OK with NV30 and R300... the main selling point for Matrox will of course be totally incomparable IQ though.
 
I think... that Parhelia-512 should compete OK with NV30 and R300... the main selling point for Matrox will of course be totally incomparable IQ though.

Exactly. The Q&A with Matrox does spell this out. Whether or not that pans out, we will have to wait and see.

But at this point... it does appear that, at least in the case of the NV30, nVidia is banking on a superior occlusion/memory architecture rather than a straight-up wider bus.

Perhaps, in the end, it will all be a wash... but more economical (for nVidia). Maybe, just maybe, what they have in store is efficient to the point where there won't be any justification for that larger bandwidth... or their final analysis was that, if they had gone the 256-bit route, they would have had to chop a bunch of stuff from the architecture in order to hit the desired yields/clocks.

Me wonders just how long we will have to wait until they do a paper launch...or if they will simply forego that and wait until they can get hardware out to reviewers.
 
We can certainly say that there are at least 4 players in the race, each with their good and bad sides.

The most interesting thing will be to see whether nVidia truly goes the more powerful Oculsion Culling route and ATI takes a wider bus with 'normal' Oculsion Culling, while Matrox had to compromise between transistor count and no Oculsion Culling at all. Of course then we have 3DLabs, but their VPU is so flexible to program that it is almost impossible to say how efficiently it will fit the gamers' market. (They have pretty good chances in any case.)

And don't forget SiS... I think the SiS Xabre series is the main reason why nVidia has geared up NV18 development.

Do you agree with my comment that "this will be the most interesting time since the Rage128/G400/Voodoo3/TNT2 launches"? :)
 
Nappe, agree. Now the only thing missing is the 'boys with an announcement... [Yeah, I know they'll shut up until they have product to show. BTW, if you don't mind, it's "Occlusion"... them tongue/finger-twisting latin loan words again ;) ]

***

In Matrox' Parhelia diagram there was a "Depth Acceleration Unit" and a "Depth Cache". Are those standard items nowadays, or something new and special (for Z-testing) -- any savings on bandwidth?

And, needless to say, Matrox' edge AA method combines (hopefully) high image quality with low extra bandwidth usage -- the first solution to give both?
 
According to Tom's Hardware, Matrox has at least implemented fast Z-clear, which can be a good saving of fillrate.
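For a rough sense of what that's worth (the numbers are just an example, not Parhelia figures): a conventional clear rewrites the whole Z/stencil buffer every frame, while a fast clear only flags tiles as cleared.

# Illustrative only: cost of the brute-force Z/stencil clear that a fast clear avoids.
width, height, bytes_per_z = 1024, 768, 4       # 24-bit Z + 8-bit stencil
clear_bytes = width * height * bytes_per_z      # ~3.1 MB written per frame
fps = 100
print(f"{clear_bytes * fps / 1e9:.2f} GB/s spent just clearing Z")   # ~0.31 GB/s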
 
Gunhead said:
...Yeah, I know they'll shut up until they have product to show. BTW, if you don't mind, it's "Occlusion"... them tongue/finger-twisting latin loan words again ;)

Hehe :) thanks for the correction. I knew there was something wrong with that word, but not being a native speaker of English, I couldn't find what was wrong.

And for the 'Boys, being quiet is the best decision at the moment, so don't expect them to even update their pages any time soon. Anyway, as anyone working at a gfx company knows, only 10 to 15% of what goes on leaks or is said in public. In the case of BB, I think that number is even smaller.

Those who know what has been going on since last August would definitely agree with JF_Aidan_Pryde. He said on the temporary boards: "I think Bitboys is cursed."

Still, I am one of those rare people who still believes in their tech and their ability to do their job, and I must admit that I am kind of a fanboy. (And yes, it is hard to be enthusiastic about something that has failed because of unbelievable bad luck. Bashers don't make this situation any easier. Still, I am following their projects, but I will not talk about them anymore. That basically means that if there are some presentations at this year's Assembly, it is up to others to tell about them... after last year, I am not willing to do that again.)

But let's forget them for a while and talk when there is something to talk about. There is so much more to follow right now that it would be a shame to waste that.
 
I think if any sane, logical and reasonable person had the choice between 20 GB/s of bandwidth, or say 9.2 GB/s of "marketing" bandwidth (i.e. after z-occlusion on back-to-front, overdraw-10x style conditions that fit the z-occlusion perfectly, 100% utopian "light speed" optimized accesses and superbly perfect texturing conditions) - I'd say the 20 GB/s is the real desire.
You mention logical, sane and reasonable, yet rant about and ridicule many bandwidth saving methods implemented in current technology. What is so wrong with trying to use actual bandwidth more effectively? Those 20 GB/s are also theoretical and will NEVER be achieved in a real-world scenario, so they are a "marketing" number too. Don't get me wrong, it's certainly good to have all that theoretical bandwidth, and yes, in a black and white world where *only* specs and theory count it would always be preferable to have 20 GB/s of bandwidth over 10 GB/s, nobody would question that, but it's not that simple.

In my world, and I guess this is the same world many other people live in, cost and effectiveness actually play a big role too - and assuming ATI and Nvidia stick to their current execution, there will be several more interesting cards available in a couple of months. They might only have ~2/3 of Parhelia's raw bandwidth, but with their advanced bandwidth saving features I would be very surprised if their "high-end" models couldn't reach and/or exceed Parhelia's performance/features at a comparable price! There will be mid- and low-range models too, though, that will probably offer a reasonable amount of performance (maybe 70-90% of Parhelia?) at a *considerably* lower price (like about 50%) - those are the cards *I* am waiting for.

That said, I am really impressed by Parhelia after what I've read about it so far; it's far more than I would ever have expected from Matrox and a very pleasant surprise, but I really doubt it will "rock my world" when the first cards hit the stores! The bandwidth may be impressive, as are the features and the IQ, but the price is prohibitive! Looks almost like the V5 6000 to me, impressive but veeery expensive and thus not really interesting for over 90% of the crowd. Considering that in this case the chip is a big factor in board costs, it won't be easy to create an LE or MX type board either. Clearly Matrox wasn't very interested in that kind of market anyway, but it's what most people would actually care about - a Parhelia card in the sub-$300 range - now that would truly get me excited! As it is now, I'm just happy that someone finally brings some life into this market again (same goes for the P10 and even the new SiS chips), and with an impressive piece of engineering too...

Funny though, years ago people here were already discussing whether someone (usually Nvidia was mentioned, I think) would go for a 256-bit path, and most were dismissing it as impractical and less-than-elegant brute force - now that someone's actually doing it, it's not that bad anymore, eh? ;)
 
You mention logical, sane and reasonable, yet rant about and ridicule many bandwidth saving methods implemented in current technology. What is so wrong with trying to use actual bandwidth more effectively? Those 20 GB/s are also theoretical and will NEVER be achieved in a real-world scenario, so they are a "marketing" number too. Don't get me wrong, it's certainly good to have all that theoretical bandwidth, and yes, in a black and white world where *only* specs and theory count it would always be preferable to have 20 GB/s of bandwidth over 10 GB/s, nobody would question that, but it's not that simple.
I agree with you, we need to see some benchmarks. I am waiting for them.

IMHO $400 is not absurd for a 256-bit card like Parhelia. Absurd is $400 for a GF4 Ti4600.

In my world, and I guess this is the same world many other people live in, cost and effectiveness actually play a big role too - and assuming ATI and Nvidia stick to their current execution, there will be several more interesting cards available in a couple of months. They might only have ~2/3 of Parhelia's raw bandwidth, but with their advanced bandwidth saving features I would be very surprised if their "high-end" models couldn't reach and/or exceed Parhelia's performance/features at a comparable price! There will be mid- and low-range models too, though, that will probably offer a reasonable amount of performance (maybe 70-90% of Parhelia?) at a *considerably* lower price (like about 50%) - those are the cards *I* am waiting for.
To have 2/3 of 22.4 GB/s it would need roughly 15-16 GB/s, which means DDR-II 1000 on a 128-bit bus. I don't see that this year. Probably an NV30 with 12.8 GB/s from DDR-II 800 will cost $400 and will have to compete with Parhelia.
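For reference, here's where those numbers come from (the memory clocks are assumptions based on the rumors above, nothing official) - peak bandwidth is just bus width times effective data rate:

def peak_gb_per_s(bus_bits, effective_mhz):
    # bytes per transfer * transfers per second, in (decimal) GB/s
    return bus_bits / 8 * effective_mhz * 1e6 / 1e9

print(peak_gb_per_s(256, 700))    # 256-bit @ 350 MHz DDR  -> 22.4 GB/s (Parhelia-class)
print(peak_gb_per_s(128, 1000))   # 128-bit @ DDR-II 1000  -> 16.0 GB/s
print(peak_gb_per_s(128, 800))    # 128-bit @ DDR-II 800   -> 12.8 GB/s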
 
Funny though, years ago people here were already discussing whether someone (usually Nvidia was mentioned, I think) would go for a 256-bit path, and most were dismissing it as impractical and less-than-elegant brute force - now that someone's actually doing it, it's not that bad anymore, eh?

Heh, actually it was fairly recently that I and a few others were pondering the "next step" in bandwidth technology, and I personally was expecting a 256-bit bus to be the next big thing. ;)

edit: Ahhh...hindsight: http://64.246.22.60/~admin61/forum/viewtopic.php?p=156

The only "complaint" about a wider bus was it's supposed extremely high cost. The fact that it's being introduced at what should be the same cost as the "typical" top of the line gamers card have been going for the past couple years, is what is causing it to be "accepted".

Sure, the real tech/math heads here typically frown on "brute force" approaches for their lack of "elegance." I prefer to marvel at them as engineering achievements in their own right.

Most others saw 256 bits as something not viable for anything other than $600+ boards. I was looking at it from the "it's just the next step in the logical progression of the memory bus" angle.
 
The most interesting thing will be to see whether nVidia truly goes the more powerful Oculsion Culling route and ATI takes a wider bus with 'normal' Oculsion Culling

Indeed... Whatever route ATi takes, they are going to be driving 8 pixel pipes this time around. They will either *have* to have huge raw bandwidth, or extremely advanced occlusion culling. My guess, though, is a mixture of both. Perhaps they will have dual memory controllers and separate busses, maybe splitting the load? Or simply the highest bandwidth they can muster and whatever HyperZ 3 ends up being...

I had not thought of this before, but perhaps R300 is a dual-chip design, similar to the rumors about Nvidia having a separate T&L unit.
 
The most interesting thing will be to see whether nVidia truly goes the more powerful Oculsion Culling route and ATI takes a wider bus with 'normal' Oculsion Culling, while Matrox had to compromise between transistor count and no Oculsion Culling at all. Of course then we have 3DLabs, but their VPU is so flexible to program that it is almost impossible to say how efficiently it will fit the gamers' market. (They have pretty good chances in any case.)

What a sweet fall/winter this will be for all us 3D enthusiasts.

New high-end 3D technology from at least 4 different companies (Ati, Nvidia, Matrox, 3D Labs). And maybe more, who knows :)

Not only do we have a lot of new cards, they all seem to take a different approach to get to the next level of performance/IQ. Well, we don't know that much, if anything, about the R300 and NV30, but it seems like they will differ from both the Parhelia and P10, which makes it even more interesting.

And we have new "standard" benchmarks coming up: the new Unreal engine and probably a Doom3 test (maybe even the actual game).

Sweeeet !!!
 
Matrox have been noted for stating that the Parhelia is "not a card for the masses". They don't expect it to sell in millions. They wanted a very high quality card, with high speed and lots of features, all for a price that would sell in their market.

I think that they have achieved, even exceeded, all their goals with Parhelia.

If you read their forums and their tech documents, you can tell that the Matrox developers are very pleased with how Parhelia turned out.

I don't think that they will make a "cut down" version of the card for a lower price. It just does not fit the Parhelia "criteria".
 
Nappe1 said:
I think the SiS Xabre series is the main reason why nVidia has geared up NV18 development.

Do you agree with my comment that "this will be the most interesting time since the Rage128/G400/Voodoo3/TNT2 launches"? :)

With each release Nvidia have devised ever more elaborate cooling solutions to dissipate heat and keep power fluctuations in check. The regulators on the GF4 bear this out and show the current GPU is at the limit for the process. SiS appear to have a core which uses very few regulators at similar bandwidth, and they are already working on a DX9 part. Can't wait! :LOL:
 
Sharkfood said:
The only argument Anand has brought forth to sour the Parhelia is the lack of Z-occlusion, which helps remove *some* memory bandwidth bottlenecks today, but not nearly to the degree that pure bandwidth does.

I think if any sane, logical and reasonable person had the choice between 20 GB/s of bandwidth, or say 9.2 GB/s of "marketing" bandwidth (i.e. after z-occlusion on back-to-front, overdraw-10x style conditions that fit the z-occlusion perfectly, 100% utopian "light speed" optimized accesses and superbly perfect texturing conditions) - I'd say the 20 GB/s is the real desire.

Raw bandwidth alone is a bad indicator of performance, as has been talked about for years at B3D. With a 256-bit bus, you are talking about a large penalty for every missed cycle, stall, cache miss, dependent lookup latency, etc. Maximizing memory throughput, preventing stalls and keeping everything fed is vitally important. It's very easy to waste over 50% of your bandwidth if you're not careful. This applies to everything - CPUs and networking as well. I'm not saying Matrox engineers are stupid, but if they didn't put enough effort into bandwidth efficiency, their 20 GB/s could be more like 12-15 GB/s in practice.
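In other words (the efficiency figures here are assumptions for illustration, not measurements of any real bus): sustained bandwidth is peak bandwidth times bus utilisation, and mediocre utilisation eats the headline number fast.

peak_gb_per_s = 20.0
for utilisation in (1.0, 0.75, 0.6):
    print(f"{utilisation:.0%} efficient bus -> {peak_gb_per_s * utilisation:.1f} GB/s usable")
# 60-75% utilisation of a 20 GB/s bus lands right in that 12-15 GB/s range.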


Every pixel rendered by the GPU has a memory overhead, and some of it can't be erased by bandwidth alone or even cache tricks - dependent texture lookups, for example. It is better not to render that pixel at all than to rely on more bandwidth, since the problem with dependent lookups is not just bandwidth but latency as well. Of course, deferred rendering is the holy grail here.


I must admit, I am very surprised by this Matrox card. I thought Matrox was dead. They don't have the resources of NVidia or ATI, yet they delivered a pretty feature-rich, high performance card, which proves that smaller players can still afford to stay in (or enter) the market. That's refreshing, since it means more novel approaches. If the two big companies kill everyone off, it will be a two-horse race like AMD and Intel, which is boring, predictable incrementalism.

I have a feeling that Nvidia is going to go more towards using Ned Greene's occlusion culling techniques than just using Gigapixel's directly. Some sort of fusion of the two techniques may be possible. Tiling obviously makes very high AA levels possible without needing a gigabyte of RAM on your card.
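To put a rough number on that AA point (resolution and sample count are made up for illustration, not any particular chip's spec): a brute-force supersample buffer has to live in card memory, while a tiler only keeps one tile's worth of samples on chip at a time.

# Illustrative: full-screen 16x supersample buffer vs. a single on-chip tile.
width, height, samples, bytes_per_sample = 1600, 1200, 16, 8    # 32-bit colour + 32-bit Z per sample
full_buffer_mb = width * height * samples * bytes_per_sample / 2**20
tile_kb = 32 * 32 * samples * bytes_per_sample / 2**10          # one 32x32-pixel tile
print(f"full-screen buffer: {full_buffer_mb:.0f} MB, one tile: {tile_kb:.0f} KB")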
 
Nappe, amen. [BTW, saw them at ASM01...]

Hellbinder, LOL @ Nvidia's separate T&L chip! Someone is having royal fun watching how far that proposition has spread :p

It seems nobody is willing to put ATi and deferred rendering together?
 
McElvis said:
I don't think that they will make a "cut down" version of the card for a lower price. It just does not fit the Parhelia "criteria".

Probably more like it does not fit the Parhelia "technology", since they decided to go for a 256-bit bus and little or no bandwidth saving technology, at least compared to what their competition is doing (going by Anand's hints about the NV30, for example). A cut-down version with, say, a 128-bit bus would probably be rather slow because of this.
 
No, because ATi just does not currently seem to be interested in, or heading towards, the tile/deferred method at all.

I do however wonder if they might be using a TCU-style architecture similar to the Flipper chip. With 8 pipes, it may be faster and save bandwidth to have a deferred-ish rendering process... in the sense of a single or dual TCU (4 pipes feeding each) that can process multiple texture stages the same way Kyro and Flipper apply up to 8 textures in a single pass.

GameCube has proven quite capable of high quality and sustained FPS going this route. A really beefed-up version might prove most formidable.

{note: This is all just a mental adventure on my part}
 
Gunhead said:
...BTW, saw them at ASM01...

hey... Coming this year? :)
Our small group of all kinds of enthusiasts (from demo coding to lame gaming) has 8 computer places reserved at table D14 for Asm '02.
 