NV50 specs?

What do you think will have the bigger jump in performance/technology?

  • NV30-NV40?
  • NV40-NV50?
  • R9700-R9900?

  • Total voters
    299
Uttar said:
AFAIK, NVIDIA targets the NV60, and not the NV50, to have a 512-bit memory bus.
Oh, and completely OT, talking of huge amounts of external memory: what I'd like to see, personally, is varying speeds of memory for different types of data. For example, you could have 512MB of low-end mainstream DDR for textures, 128MB of super-mega-high-end GDDR3 for the Z buffer maybe, and then another 128MB for the frame buffer.

Or, you could just go 3DLabs' way: varying memory bus widths (example: http://www.3dlabs.com/product/wildcat4/specs.htm)


Uttar

I have to agree with Uttar here and don't really understand why this hasn't been implemented yet.

This would allow you to sell a card that performs almost identically to a top-end card at a mainstream price point: 64MB of super-fast memory for frame/Z buffering etc., and the balance as cheaper memory for textures.

This would help out far more than fancy pixel/vertex shaders do IMO, and would help IQ too (more texture memory means more textures and higher-quality textures). Textures are the primary way of making a 3D game look good, and I don't see that changing significantly for a long time to come. The more memory available for texturing onboard the card, the better.
 
The main problem with segmenting memory into frame, Z, vertex and texture memories on separate buses is that it makes render-to-texture (as well as a number of OpenGL 2.0 super-buffer techniques) annoyingly slow. Also, you lose the ability to dynamically distribute memory bandwidth load between different functions on demand (e.g. if you are doing Z-only rendering for Doom3-style shadows, you will want as much bandwidth as possible to the Z-buffer, whereas color buffer and texture bandwidth are less important; if you are doing 16x anisotropic mapping or FP32 textures, you will want all the texture bandwidth you can get, with framebuffer bandwidth being less important; and so on).
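
A rough back-of-envelope sketch of that load-balancing point (all the clock and bus-split numbers below are assumptions for illustration, not figures for any real card): with a single unified bus a Z-only pass can spend nearly the whole bus on Z traffic, while with fixed per-function buses two of the three buses sit mostly idle.

```python
# Illustration of unified vs. fixed-partition memory bandwidth.
# All figures are assumptions chosen purely for illustration.

TOTAL_BW = 30.0  # GB/s, hypothetical total memory bandwidth

# Hypothetical fixed split: dedicated buses for color, Z and textures
fixed_split = {"color": 10.0, "z": 10.0, "texture": 10.0}

# Demand during a Z-only shadow pass (Doom3-style): almost all traffic is Z
z_pass = {"color": 0.5, "z": 25.0, "texture": 1.0}

# Demand during a texture-heavy pass (e.g. 16x aniso, FP32 textures)
tex_pass = {"color": 5.0, "z": 4.0, "texture": 24.0}

def usable(demand, split=None):
    """Bandwidth actually usable: capped per bus if split, else by the total."""
    if split is None:                      # unified bus: only the total matters
        return min(sum(demand.values()), TOTAL_BW)
    return sum(min(demand[k], split[k]) for k in demand)

for name, demand in [("Z-only pass", z_pass), ("texture-heavy pass", tex_pass)]:
    print(f"{name}: unified {usable(demand):.1f} GB/s, "
          f"fixed split {usable(demand, fixed_split):.1f} GB/s")
# Z-only pass: unified 26.5 GB/s, fixed split 11.5 GB/s
# texture-heavy pass: unified 30.0 GB/s, fixed split 19.0 GB/s
```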
 
This would allow you to sell a card that performs almost identically to a top-end card at a mainstream price point.

If Nvidia or Ati ever did such a thing they would be filing for Chapter 7 Liquidation Protection within months. A good chunk of income for both firms comes from high-end cards (low volume albeit high margin), and that would kill those sales.
 
kenneth9265_3 said:
IMHO, NV50/R500 won't even see DX10 or 4.0 shaders; that pleasure will belong to the NV60/R600. Everything is pointing to a 2006/7 launch date (Xbox 2/DX10/Longhorn). I predict the Xbox 2 will have a GPU with performance between that of an R500 and an R600: faster than an R500, but with DX10 features. Plus I bet the R600 will be called Radeon X (don't ask how I would know that, you'd laugh :LOL: ).

PS... All of this is based on guesses and a lot of surfing of the internet, none of it on actual facts, and I am not an expert, just a gamer. :)

Oversimplified and concentrating on shaders alone, 4.0 is more or less PS/VS3.0 with a unified grid and unlimited resources.

I rather doubt that either R500 or NV50 won't be DX-Next compliant, or at least damn close to it. That's not the real question either; the real question to me is whether Microsoft will follow a similar policy with DX-Next as with DX9.0. If the API once again spans more than one hardware generation, then it boils down to what level of DX-Next compliance we'd actually be talking about.

IMHO the main focus of the future API is, or will be, in the topology/tessellation department; here I personally expect requirements and capabilities to scale between generations more than anything else.
 
Megadrive1988 said:
while NV50/R500 development continues, Nvidia and ATI now have 1 free team each, for working on NV60 / R600. 8)
After reading DB's messages, I thought that there was only one team working on PC parts at ATI right now, with the other working on Xbox 2. Am I wrong?
 
arjan de lumens said:
The main problem with segmenting memory into frame, Z, vertex and texture memories on separate buses is that it makes render-to-texture (as well as a number of OpenGL 2.0 super-buffer techniques) annoyingly slow. Also, you lose the ability to dynamically distribute memory bandwidth load between different functions on demand (e.g. if you are doing Z-only rendering for Doom3-style shadows, you will want as much bandwidth as possible to the Z-buffer, whereas color buffer and texture bandwidth are less important; if you are doing 16x anisotropic mapping or FP32 textures, you will want all the texture bandwidth you can get, with framebuffer bandwidth being less important; and so on).

You would reserve 4-8 MB of the high-speed RAM as a texture cache and do render-to-texture etc. there.

akira888 said:
This would allow you to sell a card that performs almost identically to a top-end card at a mainstream price point.

If Nvidia or Ati ever did such a thing they would be filing for Chapter 7 Liquidation Protection within months. A good chunk of income for both firms comes from high-end cards (low volume albeit high margin), and that would kill those sales.
Hardly. Both ATi and nVidia get most of their profits from the mainstream/value market. High-end products barely earn them anything in comparison. If anything, they would end up more profitable.
 
radar1200gs said:
arjan de lumens said:
The main problem with segmenting memory into frame, Z, vertex and texture memories on separate buses is that it makes render-to-texture (as well as a number of OpenGL 2.0 super-buffer techniques) annoyingly slow. Also, you lose the ability to dynamically distribute memory bandwidth load between different functions on demand (e.g. if you are doing Z-only rendering for Doom3-style shadows, you will want as much bandwidth as possible to the Z-buffer, whereas color buffer and texture bandwidth are less important; if you are doing 16x anisotropic mapping or FP32 textures, you will want all the texture bandwidth you can get, with framebuffer bandwidth being less important; and so on).
You would reserve 4-8 MB of the high-speed RAM as a texture cache and do render-to-texture etc. there.
And what if "4-8 MB" is not enough? Performance suddenly falls off a cliff. If you don't use the "4-8 MB" then you're wasting memory.
This would allow you to sell a card that performs almost identically to a top-end card at a mainstream price point.
A chip with 3 memory controllers would be cheap? And how much would the packaging cost? What about the extra board costs?
 
OpenGL guy said:
radar1200gs said:
arjan de lumens said:
The main problem with segmenting memory into frame, Z, vertex and texture memories on separate buses is that it makes render-to-texture (as well as a number of OpenGL 2.0 super-buffer techniques) annoyingly slow. Also, you lose the ability to dynamically distribute memory bandwidth load between different functions on demand (e.g. if you are doing Z-only rendering for Doom3-style shadows, you will want as much bandwidth as possible to the Z-buffer, whereas color buffer and texture bandwidth are less important; if you are doing 16x anisotropic mapping or FP32 textures, you will want all the texture bandwidth you can get, with framebuffer bandwidth being less important; and so on).
You would reserve 4-8 MB of the high-speed RAM as a texture cache and do render-to-texture etc. there.
And what if "4-8 MB" is not enough? Performance suddenly falls off a cliff. If you don't use the "4-8 MB" then you're wasting memory.
This would allow you to sell a card that performs almost identically to a top-end card at a mainstream price point.
A chip with 3 memory controllers would be cheap? And how much would the packaging cost? What about the extra board costs?

4-8 MB should be enough most of the time. The memory would be dynamically allocated. Perhaps there should be 96MB of fast memory, giving a 32MB fast texture cache. That would certainly be plenty.

I doubt the memory controllers would increase cost that much. It isn't that different from CPUs with L1 -> L2 -> main memory controllers, and they manage just fine.

If the board is intelligently laid out, I'd say the production costs would go down somewhat rather than up, since it's likely the portions of the board using the slower memory could be implemented with only 4 layers, rather than the 6 or more that high-end circuits require.
 
radar1200gs said:
I doubt the memory controllers would increase cost that much. It isn't that different from CPUs with L1 -> L2 -> main memory controllers, and they manage just fine.
Current CPUs have L1 and L2 on-die, and older CPUs with off-die memory used to have much narrower buses and 1-2 orders of magnitude less bandwidth than today's high-end GPUs. Hardly a good comparison.
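
As a rough sanity check on that scale difference (the figures below are approximate and chosen only to show the order of magnitude): a Socket 7 front-side bus was 64 bits wide at 66MHz, while a 2004-era high-end GPU runs a 256-bit DDR bus at around 500MHz.

```python
# Approximate peak-bandwidth comparison; figures are rough, for scale only.

def bandwidth_gb_s(bus_bits, clock_mhz, transfers_per_clock=1):
    """Peak bandwidth in GB/s for a simple synchronous bus."""
    return bus_bits / 8 * clock_mhz * 1e6 * transfers_per_clock / 1e9

socket7_fsb = bandwidth_gb_s(64, 66)        # ~0.5 GB/s to off-die cache/DRAM
highend_gpu = bandwidth_gb_s(256, 500, 2)   # ~32 GB/s, 256-bit DDR @ ~500MHz

print(f"Socket 7 FSB : {socket7_fsb:.2f} GB/s")
print(f"High-end GPU : {highend_gpu:.1f} GB/s")
print(f"Ratio        : {highend_gpu / socket7_fsb:.0f}x")  # roughly 60x
```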
If the board is intelligently laid out, I'd say the production costs would go down somewhat rather than up, since it's likely the portions of the board using the slower memory could be implemented with only 4 layers, rather than the 6 or more that high-end circuits require.
No way. The board layer count is constant throughout the entire board. If you have any bus fast/noisy/wide/dense enough to require 12 layers, over no matter how small area, then the entire board will be 12 layers. No more, no less.

Other than that, the main factor that determines layer count is the number of routing layers and power supply layers needed directly beneath the GPU itself - the number of layers needed is about proportional to GPU pin count, which in turn is proportional to the sum of the width of all its buses + a factor dependent on its power draw.
 
It really doesn't matter if the caches are on or off die. Back in Socket 7 days and earlier, L2 cache was on the mainboard. It still did its job. The crossbar controller part shouldn't matter either - nearly all modern motherboards use a twin-bank (two-bank crossbar) memory controller. I think the CPU comparison is quite valid.

Re: the layers, yes, currently if boards require a maximum of, say, 12 layers then the whole board is made with 12 layers, but it doesn't have to be that way; it can be changed - someone just has to do it, that's all.
 
radar1200gs said:
It really doesn't matter if the caches are on or off die. Back in Socket 7 days and earlier, L2 cache was on the mainboard. It still did its job. The crossbar controller part shouldn't matter either - nearly all modern motherboards use a twin-bank (two-bank crossbar) memory controller. I think the CPU comparison is quite valid.
Hardly. There's a huge difference between internal and external RAM. Probably the biggest factor is latency. Imagine how much latency you need to hide for each of these controllers you are proposing. It's less of an issue with on chip RAM because you can make that as fast as you please.
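
A quick bandwidth-delay sketch of that latency point (the latency and bandwidth numbers are assumptions for illustration): the amount of data a controller must keep in flight to stay busy scales with round-trip latency, which is why external memory pools are much harder to feed than on-die RAM, and why every extra off-chip controller needs its own share of queues and buffering.

```python
# Bandwidth-delay product: outstanding data needed to hide memory latency.
# Latency and bandwidth figures are illustrative assumptions.

def in_flight_bytes(bandwidth_gb_s, latency_ns):
    """Bytes that must be outstanding to keep a bus fully utilized."""
    return bandwidth_gb_s * latency_ns  # GB/s * ns conveniently equals bytes

on_die   = in_flight_bytes(30.0, 10)    # on-die RAM, ~10 ns round trip
external = in_flight_bytes(30.0, 200)   # external DRAM, ~200 ns round trip

print(f"On-die RAM    : {on_die:.0f} bytes in flight")    # ~300 bytes
print(f"External DRAM : {external:.0f} bytes in flight")  # ~6000 bytes
```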
Re: the layers, yes, currently if boards require a maximum of, say, 12 layers then the whole board is made with 12 layers, but it doesn't have to be that way; it can be changed - someone just has to do it, that's all.
Good luck with that.
 
This is all about affordable performance, not outright performance. If you want outright performance you buy the high-end product with expensive RAM throughout. I'll guarantee that, given the choice, 90%+ of consumers won't, though.

A little latency won't matter that much, and latency can be covered, as Intel has shown with the P4.

There is nothing difficult or mystical about reducing the layers of a PCB - the boards are essentially laminated together; you just don't physically put layers where they aren't required. And no, it's not that difficult to line everything up properly - if you can line up the copper traces of the layers, you can certainly line up the layers themselves.
 
You're still talking about more total bits of memory for the same performance. That won't be cheap.
 
I'd imagine such a board would use GDDR3 for the fast memory and GF3-class DDR memory for texturing. Densities on plain DDR should be pretty good by now, so you wouldn't need many chips, depending on how much texturing memory you wanted.

Also consider that this would be a mainstream product with the volumes and resulting economies of scale that go with mainstream products. That will lower the cost all by itself.
 
radar1200gs said:
Re: the layers, yes, currently if boards require a maximum of, say, 12 layers then the whole board is made with 12 layers, but it doesn't have to be that way; it can be changed - someone just has to do it, that's all.

Um... Perhaps you should do some research on how circuit boards are manufactured. Simply put: a given board is X layers. There are no parts that have fewer than X layers and no parts that have more than X layers.

OTOH, if you think you know of a way to do it, then patent it and make loads of money.

Aaron Spink
speaking for myself inc.
 
radar1200gs said:
There is nothing difficult or mystical about reducing the layers of a PCB - the boards are essentially laminated together; you just don't physically put layers where they aren't required. And no, it's not that difficult to line everything up properly - if you can line up the copper traces of the layers, you can certainly line up the layers themselves.

Except you'll take 2 steps back wrt critical dimensions, you'll have bad impedance matching, you'll be using a non-standard process, it will cost 4-10x more money... Do I need to go on? There is a reason PCBs are manufactured the way they are.

Aaron Spink
speaking for myself inc
 
radar1200gs said:
I'd imagine such a board would use GDDR3 for the fast memory and GF3-class DDR memory for texturing. Densities on plain DDR should be pretty good by now, so you wouldn't need many chips, depending on how much texturing memory you wanted.

Also consider that this would be a mainstream product with the volumes and resulting economies of scale that go with mainstream products. That will lower the cost all by itself.
Except you never considered how much more the packaging would cost. Each interface requires physical pins on the chip. If you have too many pins, then you have to increase the chip's area to fit them all, which means you are increasing the chip's cost. Also, I believe there are some restrictions on how large a package can physically be.

Everything you are suggesting would increase costs, not reduce them.
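
A rough pin-count sketch of that packaging point (the per-interface overheads below are assumptions, not real package data): every additional independent memory interface brings not only its data pins but its own address/command pins and a proportional share of power/ground pins, so several narrow buses tend to cost more pins than one wide bus of the same total width.

```python
# Rough pin-count comparison: one wide bus vs. three narrow independent buses.
# Per-interface overhead figures are assumptions for illustration only.

ADDR_CMD_PINS = 30     # assumed address/command/control pins per interface
PWR_GND_FACTOR = 0.5   # assumed extra power/ground pins per signal pin

def interface_pins(data_bits):
    """Estimate total pins for one memory interface."""
    signal_pins = data_bits + ADDR_CMD_PINS
    return int(signal_pins * (1 + PWR_GND_FACTOR))

one_wide = interface_pins(256)                               # single 256-bit bus
three_split = sum(interface_pins(w) for w in (128, 64, 64))  # frame/Z/texture split

print(f"One 256-bit interface  : ~{one_wide} pins")     # ~429
print(f"Three split interfaces : ~{three_split} pins")  # ~519
```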
 
Because you can always interleave the data onto memory channels, it eventually comes down to how much total bandwidth you are going to get.

With one 600MHz GDDR3 8Mx32 chip and six 300MHz GDDR 16Mx16 chips you get the same bandwidth as with eight 375MHz GDDR 16Mx16 chips, and a helluva lot more headache balancing it as a side dish. Simply not worth it.
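
A quick check of that arithmetic (assuming DDR-style memory, i.e. two transfers per clock at the quoted clocks): both configurations come out to the same 12 GB/s, which is exactly the point about the split buying nothing but complexity.

```python
# Bandwidth check for the two configurations mentioned above.
# Assumes DDR-style memory: two transfers per clock at the quoted clocks.

def chip_bw_gb_s(clock_mhz, width_bits, transfers_per_clock=2):
    """Peak bandwidth of a single memory chip in GB/s."""
    return clock_mhz * 1e6 * transfers_per_clock * width_bits / 8 / 1e9

split_config   = chip_bw_gb_s(600, 32) + 6 * chip_bw_gb_s(300, 16)  # 1x GDDR3 + 6x GDDR
uniform_config = 8 * chip_bw_gb_s(375, 16)                          # 8x 375MHz GDDR

print(f"Split configuration   : {split_config:.1f} GB/s")   # 4.8 + 7.2 = 12.0
print(f"Uniform configuration : {uniform_config:.1f} GB/s")  # 12.0
```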
 
OpenGL guy said:
radar1200gs said:
I'd imagine such a board would use GDDR3 for the fast memory and GF3-class DDR memory for texturing. Densities on plain DDR should be pretty good by now, so you wouldn't need many chips, depending on how much texturing memory you wanted.

Also consider that this would be a mainstream product with the volumes and resulting economies of scale that go with mainstream products. That will lower the cost all by itself.
Except you never considered how much more the packaging would cost. Each interface requires physical pins on the chip. If you have too many pins, then you have to increase the chip's area to fit them all, which means you are increasing the chip's cost. Also, I believe there are some restrictions on how large a package can physically be.

Everything you are suggesting would increase costs, not reduce them.

Well, I don't know about the Radeons, but the 5700 supports GDDR3, DDR2 and DDR all off of the same chip, so I doubt it's quite the problem you are making it out to be.
 
Um, supporting the different memory interfaces isn't the problem. Having to route traces for multiple separate memory buses is.
 