Are you ready? (NV30 256-bit bus?)

Vince said:
Um, that's my point. If nVidia's new architecture in fact does not require an increase in bandwidth, like the other IHVs, to feed nothing more than a brute-force architecture that's just more parallelization and concurrency - then why design it around something that's not necessary at this point?

Of course, reading between the lines of certain comments from nVidia representatives recently regarding DDR2 and GDDR3 memories, the implication is that they are aiming for a memory subsystem running at speeds approaching 500MHz. This would naturally be a significant increase in bandwidth from their previous 'brute-force' generation... unless you choose to believe that they are also moving to a 64- or 32-bit bus... ;)

If they are increasing their bandwidth then this implies that their new architecture requires the increased bandwidth... unless you also want to believe that they are putting the extra bandwidth on for no reason at all and just feel like wasting some money...
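
To put rough numbers on that, here's a minimal back-of-envelope sketch; the 128-bit/500MHz DDR-II configuration is the rumored one implied above, while the 256-bit/310MHz figures are the Radeon 9700 Pro's shipping spec:

```python
# Raw bandwidth comparison: rumored 128-bit NV30 with ~500MHz DDR-II
# vs. the shipping Radeon 9700 Pro (256-bit, 310MHz DDR).
# DDR transfers data twice per clock in both cases.

def raw_bandwidth_gb_s(bus_bits: int, clock_mhz: float) -> float:
    """Raw bandwidth in GB/s: bytes per transfer x transfers per second."""
    return (bus_bits / 8) * (clock_mhz * 2) / 1000

print(f"128-bit @ 500MHz DDR-II: {raw_bandwidth_gb_s(128, 500):.1f} GB/s")  # ~16.0
print(f"256-bit @ 310MHz DDR:    {raw_bandwidth_gb_s(256, 310):.1f} GB/s")  # ~19.8
```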
 
Chalnoth said:
If nVidia can indeed deliver a card that outperforms the Radeon 9700 with a 128-bit bus (even in the higher FSAA modes... which wouldn't be easy...), then nVidia is in a far, far better position with the NV30 architecture than ATI is with the R300 architecture. Since nVidia sells chips, they will have an easier time getting board designers to pay more for them, since board designers won't have to pay as much to put out a similarly-performing product. Compound this with the cheaper packaging for the nVidia chip, and you have a win-win situation all around.

While it does appear that nVidia's choice to go with .13 micron was bad in the short term, it almost certainly was a good idea in the long run. That is, ATI will have to do much of the work it did with the R300 all over again when it moves to .13 micron.

Everything comes at a cost. What will it take nV to do that? Super-fast DDR2 RAM? How much does that add to the cost? What about die cost? Are you sure that a single NV30 is cheaper than a single R300? What about the loss of time to market? ATI has a 4-month head start on DX9 parts and will soon launch more "value/midstream" products, which are all based off the same or a similar R300 core. The R300 seems to be somewhat scalable. What's the cost for NV to match them? Will they have cheaper/slower memory? Will they even have any other products initially, other than the NV30? Yeah, lots of unknowns here. No way to tell for sure until we see how some of this plays out.

Also, just because nV had issues with the .13u process does not mean jack about ATI bringing an R300 design to .13u. Yes, when ATI does that they will have some road blocks. Heck, they could even have more issues than nV did. One thing that is fairly certain is that some of the process issues of building a part with 110+ million transistors on a .13u die are something they have less of a chance to encounter. Also, you/we have no idea how well the part will shrink. If you design your part well enough, then you can get close with a die shrink without "major" tweaking. Once again, too many unknowns. And yes, nV is OK, as they seem to have gotten over this hurdle, which is a good thing.
 
DaveBaumann said:
‘Efficiency’ doesn’t have to be left purely to the crossbar and ‘the gods’; you can control efficiency to some degree with caching – why do you think KYRO’s or P10’s tile sizes are the size they are? They’re caching the pixels to be optimal for their memory bus.
We are talking about very different things here; a TBR can just burst out its frame buffer tile with very high efficiency, and I don't think memory access granularity has such a big impact when the hw is just writing a tile to memory. I would consider the memory page size a much more important parameter in that case.
On an IMR things are much more difficult, and yes, caching (and coalescing reads/writes to RAM) helps a lot... but the reality is that you're wasting tons of bandwidth if the hw reads 32 bytes at a time when you'd need much less.
GF1/2 read data at 16 x 2 = 32 bytes at a time... and I know you are well aware of the big impact that multiple and concurrent memory accesses with better granularity had with the GF3 introduction.

Likewise your texture caches can also be optimal to make efficient use of the available memory bus.
Of course, that's why we don't want to read tons of texels that will never be reused. The texture mapping memory access pattern can be well known in advance by the hw, and a texture cache needs to be HUGE to make a real difference over a little cache, 'cause you can't reuse texels more than a given number of times (depending on the filter/reconstruction kernel)... with the exception of dependent texture accesses.
My thought is simple: increased bandwidth is good, but good granularity on memory accesses is good too, and when you increase the first without keeping the second 'small', the bandwidth used for real is going to be far from the max bandwidth. Probably the typical amount of data the hw needs to fetch/store (in a single random access) for DX9-style applications is bigger than the current one, so things are not so bad... so let's wait and see :)

ciao,
Marco
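
As a rough illustration of nAo's granularity point, here's a minimal sketch; the 32-byte figure comes from his GF1/2 example, while the 'useful bytes' values are illustrative assumptions:

```python
# How much of each memory burst is actually useful when the minimum
# transaction is 32 bytes (the 16 x 2 GF1/2 figure cited above) but the
# hw needs less. Wasted burst bytes are bandwidth spent for nothing.

GRANULARITY = 32  # minimum bytes moved per access

def burst_efficiency(useful_bytes: int, granularity: int = GRANULARITY) -> float:
    """Fraction of the transferred bytes that were actually needed."""
    bursts = -(-useful_bytes // granularity)  # ceiling division
    return useful_bytes / (bursts * granularity)

# Illustrative access sizes: from a lone 4-byte Z read up to a full burst.
for useful in (4, 8, 16, 32):
    print(f"{useful:2d} useful bytes -> {burst_efficiency(useful):.0%} of the burst used")
```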
 
Vince said:
T2k said:
Ridiculous - just like yours. :p
Nobody is talking about your rendering methods - I don't see, if something faster is available (in terms of pricing, etc.), why we should stay with the old-school one and rather do only some crazy optimizations. Imagine when optimization and the faster bus come together: plenty of room to do something! But you'd say no, thanks - congratulations.
This idea is REALLY ridiculous to me. :devilish:

Wow, you're not comprehending this. What I'm saying is that architectural and computational efficiency can minimize the necessary bandwidth requirements by several times. If you're rendering, and perhaps accessing, only what you see - bus bandwidth drops quite steeply.

Cool. And?
As I said: if you can get more room to grow (grow = new features, added functions), you'd say 'no, thanks'?

Congratulations, you're basically saying that fundamentally changing how an IMR renders is somehow "old-school" while doing a brute-force doubling of the pin count for the memory bus is... let me guess... "new-school" or revolutionary? I find this quite an odd PoV.

:rolleyes:
My second sentence started like this: 'Nobody is talking about your rendering methods' - so why do you act like a child?
'Old-school' means 128-bit.
I hope now it's clear.

And yours? All that would do is slow down the evolution of generations. Thanks for that.

What? Slow down the evolution? The move from a 128- to a 256-bit bus is not an evolution with a true future. Are they going to move to 512-bit in a year? And then 1024-bit the year after? And then 2048-bit in 2005?

Maybe it hurts you, but it's still true: the move from a 128- to a 256-bit bus IS an evolution with a true future.

As I see it, you can't agree with this. Why??

On the other hand, DDR-II will evolve - and quite rapidly - to provide ever-increasing bandwidth... not just a one-time 2X increase. Lithography (RAM) advances quite a bit faster than manufacturing (packaging) ability.

Yes, it will evolve - on a 256-bit bus, believe me.

You know what? I bet the next NV chip will sport a 256-bit bus. And do you bet on 128-bit? :p :LOL:

Interesting. Will NV now prefer 'elegance'? Wow. That totally contradicts its history.

Times change; they're no longer restricted by a legacy architecture, and they've had an infusion of engineers with differing perspectives on the entire rendering 'pipeline' - I think it's possible.

Yes, it's true. It'll just be a little bit hard to explain why we suddenly prefer these numbers... ;) ;)
 
Vince said:
On the other hand, DDR-II will evolve - and quite rapidly - to provide ever-increasing bandwidth... not just a one-time 2X increase. Lithography (RAM) advances quite a bit faster than manufacturing (packaging) ability.

You are correct. DDR-II will not provide a one-time 2X increase. It won't even provide 2X at all. The current roadmap for DDR-II tops out around 550-600MHz. At that point you run into I/O signaling issues and clock uncertainty issues with the bidirectional strobe. Since DDR-I is already past 300MHz, and can go higher (vendors have 375MHz-rated DDR-I parts), you're looking at maybe a 1.5X increase, total. Add into the mix the relatively horrible parameters of DDR-II (higher latency, longer bus turnaround, longer row cycle time), and you need higher frequencies just to match the effective performance of a DDR-I device.
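
A minimal sketch of the "1.5X, total" arithmetic, using only the clock figures quoted above (a 128-bit bus is assumed, but the ratio holds for any fixed width):

```python
# The DDR-II headroom argument in numbers: fastest rated DDR-I (375MHz)
# vs. the ~550-600MHz top of the DDR-II roadmap, same bus width.

BUS_BITS = 128  # assumed width; the ratio is the same for any width

def raw_bandwidth_gb_s(clock_mhz: float) -> float:
    """DDR-style bus: two transfers per clock."""
    return (BUS_BITS / 8) * (clock_mhz * 2) / 1000

ddr1 = raw_bandwidth_gb_s(375)  # 375MHz-rated DDR-I parts cited above
ddr2 = raw_bandwidth_gb_s(575)  # midpoint of the 550-600MHz DDR-II roadmap

print(f"DDR-I  @ 375MHz: {ddr1:.1f} GB/s")
print(f"DDR-II @ 575MHz: {ddr2:.1f} GB/s")
print(f"Headroom: {ddr2 / ddr1:.2f}X")  # ~1.5X, before DDR-II's latency penalties
```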
 
nAo said:
We are talking about very different things here; a TBR can just burst out its frame buffer tile with very high efficiency, and I don't think memory access granularity has such a big impact when the hw is just writing a tile to memory. I would consider the memory page size a much more important parameter in that case.
On an IMR things are much more difficult, and yes, caching (and coalescing reads/writes to RAM) helps a lot... but the reality is that you're wasting tons of bandwidth if the hw reads 32 bytes at a time when you'd need much less.
GF1/2 read data at 16 x 2 = 32 bytes at a time... and I know you are well aware of the big impact that multiple and concurrent memory accesses with better granularity had with the GF3 introduction.

P10 is an IMR, but operates on an 8x8 tile basis.
 
DaveBaumann said:
P10 is an IMR, but operates on an 8x8 tile basis.
Yeah, I know. According to some nVidia patents, even nVidia's ROPs work on 8x8 pixel tiles. (I addressed that when I wrote about coalesced reads/writes.) Do you believe an IMR would fill an 8x8 tile with 'good' data all the time? I don't.

ciao,
Marco
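
To make nAo's doubt concrete, here's a crude Monte Carlo sketch; it stands in small axis-aligned pixel blocks for primitive fragments, which is a loose assumption, not how a rasterizer actually behaves:

```python
# Estimate how much of each touched 8x8 tile a small, randomly placed
# block of pixels actually covers. Small triangles behave even worse,
# since they are not axis-aligned rectangles.
import random

TILE = 8  # ROP tile dimension, per the patents mentioned above

def avg_tile_coverage(w: int, h: int, trials: int = 10_000) -> float:
    """Average useful fraction of the 8x8 tiles touched by a w x h block."""
    total = 0.0
    for _ in range(trials):
        x, y = random.randrange(256), random.randrange(256)
        tiles_x = (x + w - 1) // TILE - x // TILE + 1
        tiles_y = (y + h - 1) // TILE - y // TILE + 1
        total += (w * h) / (tiles_x * tiles_y * TILE * TILE)
    return total / trials

for size in (4, 8, 16):
    print(f"{size}x{size} block: ~{avg_tile_coverage(size, size):.0%} of each touched tile is useful")
```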
 
Vince said:
Moving to a 256-bit bus - while you're probably right that the overhead is significantly diluted compared to where it once was - has NO long-term future. You get a 2X bandwidth advantage [ideally] and that's it - DDR-II will scale and provide them with added bandwidth for quite some time.

So, whatcha think?

I think ATI's R300 core is capable of using DDR-II memory as well. So, no matter what speed DDR-II memory nVidia decides to use, ATi can always buy the same memory and still have a 2X raw bandwidth advantage, if nVidia actually has chosen to go the 128-bit route.

In fact, I think this is a perfect example of your "pay now reap benefits later" philosophy.
 
Bigus Dickus said:
I think ATI's R300 core is capable of using DDR-II memory as well. So, no matter what speed DDR-II memory nVidia decides to use, ATi can always buy the same memory and still have a 2X raw bandwidth advantage, if nVidia actually has chosen to go the 128-bit route.

Wow, come on now, are you totally missing my point?

When an advanced group sits down and maps out a next-generation architecture, if they're planning around an architecture which can possibly cut bandwidth traffic by a factor of four, then why would they even think of incorporating a 256-bit bus?

If nVidia has architectural superiority in the NV3x, in computational and drawing/accessing efficiency, then they don't need memory subsystem parity with ATI to maintain a higher level of performance.

What's so difficult to see? I mean, if you were designing a new architecture and mapped out some IMR with advanced raster functionality that used temporal and spatial coherency, or deferred shading/rasterization, or whatever Ned Greene and the 3dfx/GP boys can dream up, why would you implement something like a 256-bit bus that's totally unnecessary, since your chip isn't as dependent upon memory access as a competitor's architecture?


Why should someone buy an SUV/truck to transport 5 board members (who have much general knowledge that's not needed) to a meeting if they can instead keep their present car and just move 1 adult (with all the knowledge needed for the meeting) around? So they can show off their new SUV? Hype up their new toy? Play the nomenclature game?

When the company expands in power sufficiently, the added capacity that's needed can be added - but why now?
 
Testiculus Giganticus said:
A 256-bit bus is a step forward, no matter how you look at it. All you have to ask yourself is: is it a big enough one? 8)

Given that no one has had an answer to the 9700 for the past 3 months, and probably won't for at least the next three, there is only one answer to that for now.
 
When an advanced group sits down and maps out a next-generation architecture, if they're planning around an architecture which can possibly cut bandwidth traffic by a factor of four, then why would they even think of incorporating a 256-bit bus?

Well, as people have been pointing out here, it's obvious they do need a buttload of bandwidth if they are using 500MHz RAM - why aren't you listening?

Of course, here's another one for ya - when that advanced group sits down and starts thrashing around all the possibilities for their architecture 2 years down the line, isn't it possible they may have just made the wrong decision as far as RAM goes? After all, their process decision wasn't that great...
 
Vince, you've completely changed your argument. Which is it?

You started by saying that DDR-II gives nVidia the advantage because they can scale to faster and faster memories in the future, instead of being stuck with slower DDR on a "one time 2X bang" 256-bit bus.

When shown that your argument doesn't hold water, you then switch and try to tell us that you were claiming that nVidia just doesn't need 256-bit (which you weren't, but nVidia does in any case).

As pointed out, the fact that the NV30 will likely use very fast DDR-II and have raw bandwidth within earshot of the R300 proves that it isn't designed to only need a 128-bit bus with mediocre memory bandwidth. nV simply chose fast DDR-II to get the bandwidth that the chip needed (assuming it is actually 128-bit, which it may not be, but that is what this whole discussion assumes).
 
I think ATI's R300 core is capable of using DDR-II memory as well. So, no matter what speed DDR-II memory nVidia decides to use, ATi can always buy the same memory and still have a 2X raw bandwidth advantage, if nVidia actually has chosen to go the 128-bit route.

In fact, I think this is a perfect example of your "pay now reap benefits later" philosophy.

well said :D
 
Hey, wouldn't it be something of a shocker if NV30 had a 512-bit bus?
Parhelia has a 512-bit bus, I believe, does it not? But it's not a main memory one, as Parhelia's memory bus is 256-bit.
 
Megadrive1988 said:
Hey, wouldn't it be something of a shocker if NV30 had a 512-bit bus?
Parhelia has a 512-bit bus, I believe, does it not? But it's not a main memory one, as Parhelia's memory bus is 256-bit.
What's novel about a 512-bit bus? Parhelia's 512-bit bus is just a 256-bit DDR interface. Remember, since DDR is giving you twice as much data per cycle, you need to be able to absorb twice as much data on the chip side as well.

P.S. Radeon 9700 also has a 512-bit bus.
P.P.S. There's no way NV30 will have a 512-bit external bus, i.e. 512-bit DDR.
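
A one-liner version of that point, as a sketch: with DDR's two transfers per clock, the on-chip datapath must be twice the external pin width to absorb the data at the core clock.

```python
# Internal datapath width needed to absorb a DDR interface at 1:1 clocks.

def internal_bits(external_bits: int, transfers_per_clock: int = 2) -> int:
    """DDR moves two words per clock, so the chip-side path doubles."""
    return external_bits * transfers_per_clock

print(internal_bits(256))  # 512: Parhelia / Radeon 9700 style 256-bit DDR
print(internal_bits(128))  # 256: what a 128-bit DDR interface implies on-chip
```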
 
Bigus Dickus said:
Vince, you've completely changed your argument. Which is it?

Actually, I haven't: both points are complementary and have been present throughout the argument:

If the NV30 implementation of CineFX does indeed have a deferred or otherwise exotic architecture, I find this whole 128- vs. 256-bit debate kind of ridiculous. Back in the day, 3dfx/GP anticipated a 10X reduction in bandwidth needs by using a region-based deferred rendering scheme - this is my only starting point. Even if nVidia can only achieve 40% of that, using the same underlying/baseline architecture, they'd achieve NOTHING by moving to a 256-bit bus except for added costs and having their supporters toting around higher numbers/nomenclature. - Vince, Page 3

And then on Page 4, in response to Dave Baumann:

Moving to a 256-bit bus - while you're probably right that the overhead is significantly diluted compared to where it once was - has NO long-term future. You get a 2X bandwidth advantage [ideally] and that's it - DDR-II will scale and provide them with added bandwidth for quite some time.

You started by saying that DDR-II gives nVidia the advantage because they can scale to faster and faster memories in the future, instead of being stuck with slower DDR on a "one time 2X bang" 256-bit bus.

When shown that your argument doesn't hold water, you then switch and try to tell us that you were claiming that nVidia just doesn't need 256-bit (which you weren't, but nVidia does in any case).

Totally false; I started with the OTHER argument - as seen above, on page 3 - that nVidia will have an exotic architecture that doesn't need the 2X bandwidth increase (at a cost) that a 256-bit bus provides.

Only after talking with Dave did I state that I can see why an advanced development group at nVidia would rely on DDR-II rather than the "one-time 2X bang", at a cost, that a 256-bit bus provides, when DDR or DDR-II provides enough bandwidth.

Where do you people get this from?

As pointed out, the fact that the NV30 will likely use very fast DDR-II and have raw bandwidth within earshot of the R300 proves that it isn't designed to only need a 128-bit bus with mediocre memory bandwidth.

Um, or could nVidia have just seen that DDR-II is the future of memory? As their history shows, they ALWAYS jump on the bleeding-edge bandwagon. If not with DDR, then by pushing lithography to the extreme (0.22, 0.18, 0.15, and now 0.13um).

Beyond the reasons I already stated, has anyone thought of technical reasons why they would choose not to use a 256-bit bus? Just curious. If they remain at 4 main pixel pipelines, is it worth increasing the crossbar granularity to 4 ways x 64 bits? Is there any use in keeping it at 8 x 32 if they remain at 4 pipes?

I mean, you people are great at bitching, but you still haven't answered why a 256-bit bus is necessary (especially in the face of an architectural departure from the typical IMR), or even explored why they wouldn't use it, other than bitching at nVidia for incompetence.
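
On the crossbar question above, a minimal sketch of the trade-off; burst length 2 is assumed for illustration (the DDR minimum), and the configurations are the two hypothetical 256-bit splits named above:

```python
# Minimum transaction size for two ways of splitting a 256-bit bus:
# fewer, wider controllers mean coarser accesses; more, narrower ones
# keep granularity fine (the GF3 crossbar idea, scaled up).

BURST_LENGTH = 2  # DDR minimum burst, assumed for illustration

def min_access_bytes(controller_bits: int) -> int:
    """Smallest transfer one controller can make: width x burst length."""
    return (controller_bits // 8) * BURST_LENGTH

for controllers, width in ((4, 64), (8, 32)):
    print(f"{controllers} x {width}-bit = {controllers * width}-bit bus: "
          f"{min_access_bytes(width)}-byte minimum access")
```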
 
There are many different technologies that 3D vendors can use to address memory bandwidth. A wide 256-bit bus is currently a good one. There are many others.

Hardware designers take a look at their memory bandwidth needs, look at the most suitable technologies to meet those needs in the given time frame of their design, choose one, and create the design.

At any given point in time, different vendors will choose different approaches, not just on memory bandwidth solutions, but on many design decisions. It is not as though engineers design a chip that is starved for memory bandwidth, or that has far more than necessary.

Chip performance tends to be mostly limited by transistor count and frequency. As those increase, 3d vendors pick what they think will be the most appropriate set of technologies to meet the necessary memory bandwidth and use that.

That said, a sudden step function of double the memory bandwidth is nice to have in a design.
 