I'd like to get some opinions on this.. (please)

This is from the conclusion of Anand's new Gigabyte 9700 article...

Quite possibly more important than R300 boards by ATI's Taiwanese partners, is the incredible overclocking success we had with the Radeon 9700 Pro. The fact that we were able to overclock the very first revisions of the shipping R300 core to speeds as high as 400MHz leads us to believe that there's much more potential in the R300 that ATI has yet to expose. The main limitation at this point seems to be memory bandwidth and die size; the lack of additional memory bandwidth and die size constraints prevented ATI from going to two texture units per pipeline, which would definitely increase performance across the board.

We can see ATI's 0.13-micron successor to the R300 implementing that precious second texture unit, but if the need should arise we can also see ATI releasing a slightly faster version of the Radeon 9700 Pro in the interim. At higher resolutions, a bump up to a 400MHz core clock and a 674MHz memory clock gave us as much as a 15% performance boost.

ATI's board partners may also want to take it upon themselves to release their own overclocked versions of the Radeon 9700 Pro akin to what NVIDIA partners did during the days of the TNT2. Although NVIDIA has tightened up what they allow their board partners to do when it comes to shipping products overclocked, it may be in ATI's best interest to give a little more freedom in this respect to really get their partners' feet off the ground.

Seriously... why the focus on the second TMU? Overclocking the R300 to 400MHz like he did offered a 15% increase in speed using initial-release drivers. Would a second TMU, given the design of the R300, increase performance by much? How much, versus a single-TMU design (given the added bandwidth of faster RAM)?

It seems to me that the R300 is designed to perform well without a second TMU.
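
Rough numbers, assuming stock 9700 Pro clocks of 325MHz core / 620MHz (effective) memory: 400/325 is about a 23% core overclock, while 674/620 is only about a 9% memory overclock. A ~15% gain at high resolution lands between the two, which to me says the card is already partly bandwidth-limited even with one TMU per pipe. Just my back-of-envelope, of course.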
 
How much more bandwidth would you need to keep that second TMU busy? Just wondering....
 
Maybe the addition of the second TMU will come along with the change to a 0.13-micron process and DDR II. In its current state I can't really see an extra TMU making any difference; there would not be enough bandwidth to support it.
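
To put a very rough number on it (ignoring texture caching and compression, which cut this down a lot): 8 pipes x 2 texels/clock x 4 bytes (32-bit textures) x 325MHz is about 20.8GB/s of raw texture reads alone, before any colour or Z traffic, against the 9700 Pro's ~19.8GB/s total (256-bit at 620MHz effective). The real demand is much smaller thanks to the caches, but it gives an idea of why a second TMU would want more bandwidth to go with it.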
 
BoardBonobo said:
Maybe the addition of the second TMU will come along with the change to a 0.13-micron process and DDR II. In its current state I can't really see an extra TMU making any difference; there would not be enough bandwidth to support it.

I agree.

That's why a second TMU & DDR II are important parts of NV30.
 
Too bad the fast DDR II won't quite make up for the 128-bit memory interface. If it does have 2 TMUs it will be a seriously unbalanced, bandwidth-constrained card.
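
Quick math, assuming 500MHz (1GHz effective) DDR II on a 128-bit bus: 16 bytes x 1000MHz = 16GB/s, versus the 9700 Pro's 32 bytes x 620MHz = ~19.8GB/s. So even best-case DDR II on 128 bits comes in under what ATI already ships on plain DDR.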
 
antlers4 said:
Too bad the fast DDR II won't quite make up for the 128-bit memory interface. If it does have 2 TMUs it will be a seriously unbalanced, bandwidth-constrained card.

bingo.

I'm quite sure now the NV30 will have a 128-bit bus... Nvidia must have hella effective HSR on the chip if it's going to be able to keep the pipes fed.
 
It will have a 256-bit bus (every chip since the GeForce 256 has had that). That's what all the rumors say, and I believe them. It's just the external memory interface that will be 128-bit.
 
You guys can keep guessing, but NV30 will have a 256-bit memory interface (like R300), DDR II memory, and will be 8x2 (8 pipelines and 16 texture blocks).

Beyond that, it's possible that the memory interface will be arranged as 32x8 (eight 32-bit channels) rather than 64x4 (four 64-bit channels), and that's a more effective arrangement.
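
The reason the finer-grained arrangement helps, as I understand it (I'm guessing on the exact burst lengths): every DRAM access has a minimum burst, so with four 64-bit channels the smallest useful fetch is around 32 bytes per channel, while eight 32-bit channels can get away with about 16 bytes. Small, scattered accesses (tiny triangles, partial cache-line fills) waste less of the bus, so more of the theoretical bandwidth is actually usable.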
 
antlers4 said:
It will have a 256-bit bus (every chip since the GeForce 256 has had that). That's what all the rumors say, and I believe them. It's just the external memory interface that will be 128-bit.

Pardon me... that's what I meant. :oops:
 
Supposedly the NV30 will indeed have a 256-bit bus.

Regardless, I had heard from these boards that the Radeon 9700 can process four bilinear samples per pixel pipeline per clock. This is apparently the #1 reason why the 9700 has good anisotropic performance, as well as good quality.

If the pixel pipelines were modified so that the texture filtering power was the same, but could instead address two different textures per clock (similar to the move from the GeForce DDR to the GeForce2 GTS), then the majority of the benefit would be felt when anisotropic filtering and FSAA are disabled. That is, anisotropic filtering would often already be using those extra samples per clock, which may limit how effectively the second texture per clock could be put to use.

For that reason, I'm not so sure that the move to an 8x2 architecture, given the same texture filtering power, is a good one. While it is true that it is possible to make better use of the texture filtering power available, it would require a much more flexible design than the GeForce4 has. Otherwise it wouldn't help anisotropic filtering at all, and if it doesn't help anisotropic filtering, with the fillrate that these cards have, it's pretty much useless to have the second TMU.

If nVidia indeed has a second TMU, I'm really hopeful that they made the pixel pipelines more flexible than the GeForce4's.
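
To put hypothetical numbers on that: plain trilinear needs 2 bilinear samples per texture, so a pipe that can take 4 bilinear samples per clock could in principle handle 2 trilinear textures per clock if it can also address 2 textures. But with something like 8-tap aniso plus trilinear you're looking at up to 16 bilinear samples per texture, i.e. roughly 4 clocks of filtering for a single texture; the sampler is already saturated, so being able to address a second texture that clock buys you almost nothing unless the pipeline is flexible enough to reschedule the work. That's the kind of flexibility I mean.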
 
The fact that Anand insists that NV30 will be faster "on paper" and has talked about a 2nd texture unit does, indeed, leave one with the impression that NV30 will have 2 TMUs per pipeline.

I'm really not sure why it's believed that NV30 will run on a 128-bit memory interface, given the amount of discussion that has taken place over the last 8 weeks or so.

Honestly... other than specifics/particulars, the general NV30 specifications should be pretty well understood at this point in time. About the only things that are clearly unknown are things like "Accuview II", "Lightspeed IV", etc. (the marketing names will very likely change to account for a new-generation architecture, however).
 
From all the info I supplied previously, here is a short rundown on the specs of NV30:

0.13-micron process
120 million transistors
8 pipelines
16 texture blocks (as opposed to 8 on R300, but the same number as on Parhelia 512)
That sums up to 8x2 (as opposed to 8x1 on R300 and 4x4 on Parhelia 512)
400MHz core clock speed (the number may change, but it's highly unlikely)
Fillrate: 6400 Mtexel/s (as opposed to 2600 Mtexel/s on the 325MHz R300)
Memory Bus: 256-bit (it's currently unknown whether it's 64x4 as on R300, or 32x8, which I'm currently leaning towards and which is effectively a much better solution)
Memory Type: DDR II
Memory Speed: 400-500MHz (note: it all depends on the model; Samsung will probably be unable to deliver 500MHz to Nvidia in time, so 480MHz for the top model is a big possibility, my source says).
Bandwidth: 26-32GB/s (depending on what the memory speed eventually turns out to be; quick check after this list)
Effective Bandwidth: 48GB/s (taking into account HSR, etc.)
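
Quick check on those numbers (my arithmetic, assuming the clocks above hold and that 400-500MHz is the real, pre-DDR clock): fillrate is 8 pipes x 2 TMUs x 400MHz = 6400 Mtexel/s. Bandwidth is 256 bits = 32 bytes per transfer, so 32 bytes x 800MHz effective = 25.6GB/s at the low end, up to 32 bytes x 1000MHz = 32GB/s at the top, which is where the 26-32GB/s range comes from. The 48GB/s "effective" figure depends entirely on how well the HSR and compression work, so treat it as a marketing number for now.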

The mysterious per-primitive processor won't make its appearance in NV30 just yet...

The AA & aniso methods employed in NV30 are currently a mystery to both my source and many other people, except Nvidia people of course :D

Don't expect any info about this till the card is announced.

Again, I'm 99.9% sure of this info. Why not 100%? Because nothing is certain till the card is announced.
 
Typedef Enum said:
I'm really not sure why it's believed that NV30 will run on a 128-bit memory interface, given the amount of discussion that has taken place over the last 8 weeks or so.

Because none of the information indicating otherwise has come from verifiable, official sources. Meanwhile, there has been information (somewhat old, but not predating the NV30 design process) from verifiable (if not official) sources that nVidia considered a 256-bit memory interface "overkill" for this generation of GPUs. There has also been marketing spin that this generation of GPUs would be limited more by computational power than by other constraints, such as bandwidth. There is also nVidia's history of producing bandwidth-starved GPUs, especially in the first version of a new architecture.
 
Having 16 texture units does not automatically mean that they will be arranged in an 8x2 architecture.

No, it doesn't, but I do know it's 8x2, so in that case it's justified.

It's pretty weird actually how the 4x4 arrangement on Parhelia 512 is so ineffective...
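
Then again, the math makes it less surprising (my own reasoning, not from my source): a 4x4 layout only reaches its peak texel rate when four textures are applied in a single pass. With the far more common one or two textures per pass, Parhelia is effectively running 4x1 or 4x2, i.e. 4 or 8 texels per clock out of its 16 TMUs, while an 8x2 design would still deliver 8 or 16. Add in Parhelia's fairly low core clock and the paper fillrate rarely shows up in games.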
 
There is also nVidia's history of producing bandwidth-starved GPUs, especially in the first version of a new architecture.

Amen. Milk the early adopters, then release a follow-up a la GeForce DDR. Hope it works for 'em this time tho; shoppers learn from experience, and the f@nboy base won't be enough to support 'em this time around.
 
ram said:
alexsok said:
No, it doesn't, but I do know it's 8x2, so in that case it's justified.

Doh, then there would be no way the part can dynamically allocate TMUs...

Actually, as I said previously, it did at one point have the ability to dynamically allocate TMUs (yes, with 8x2), but it no longer has it.
 
If anything, bandwidth-limited, over-fillrated graphics cards have been pretty successful. If in the end they are still faster, consumers will buy them and not give a damn about whether the architecture is -balanced- or not.
 
The issue here is supposed to be about the R300.

What I want to know is: with the R300's pipeline design... once they add higher-clocked DDR II... will the lack of a second TMU really hold them back???

It seems to me that Anand has no idea what he is talking about, and he is only saying things like that because he *assumes* that the NV30's 2-TMU design is inherently faster, and thus that even a higher-clocked R300 will be at a disadvantage.

I just don't think this will be the case 99% of the time. Of course it depends a great deal (as far as comparisons go) on what the NV30 ends up with for bandwidth, and on how flexible its pipelines are.

I just want someone with the technical knowledge to address the issue, just from the Radeon's side: will a second TMU, given increased bandwidth, give the R300 a noticeable performance gain over the same bandwidth/clock increase with only one TMU?
 