Next-gen specs

Dave

I've been doing some thinking lately about next-gen 3D chips. I'm curious what everyone thinks they should include, not only in general specs but also in advanced features. Now, I'm not asking what everyone wants to see, I'm just asking what everyone expects to see. Be specific too, even if it's something at the core architecture level.
 
Hrm, does anyone know how difficult it would be to design a chip that can do a per-frame switch between TBR + OD, IM + OD (HyperZ style) and plain IM? That'd be interesting. If the T&L is being done on the GPU, it could do a quick and dirty calculation to figure out which mode would be faster; if it's on the CPU, the switch could be supported via an extension. Hopefully it wouldn't be hard to implement. If some game had compatibility issues, the user could force a certain mode of operation or stop the card from using whichever mode is causing the problem. I just thought it'd be a neat idea.
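To show the shape of what I mean, here's a toy mode-selection heuristic in C++; every type, name and threshold in it is invented, it's just an illustration, not how any real driver works:

```cpp
// Hypothetical sketch of picking a rasterization mode per frame from
// coarse scene statistics. All names and thresholds are made up.
enum class RenderMode { TBR_OD, IMR_OD, IMR };

struct FrameStats {
    int   triangleCount;   // triangles submitted last frame
    float avgOverdraw;     // estimated depth complexity
    bool  appUsesReadback; // mid-frame framebuffer reads break deferred binning
};

RenderMode chooseMode(const FrameStats& s, RenderMode forced, bool userForced)
{
    if (userForced)            // let the user pin a mode for broken titles
        return forced;
    if (s.appUsesReadback)     // binning would have to flush constantly
        return RenderMode::IMR;
    if (s.avgOverdraw > 3.0f)  // lots of hidden pixels: binning pays off
        return RenderMode::TBR_OD;
    if (s.avgOverdraw > 1.5f)  // moderate overdraw: coarse Z reject is enough
        return RenderMode::IMR_OD;
    return RenderMode::IMR;    // nearly no overdraw: plain immediate mode
}
```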

I would also like to see at least 32 bits of internal rendering precision.

I would really, really like to see an OPEN STANDARD for hardware texture compression/decompression (bring back FXT1 if possible, seeing as it's already done) and vertex compression/decompression.
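For reference, the openly documented S3TC/DXT1 block layout already shows how simple such a standard is to decode (FXT1 uses bigger 8x4 blocks with more modes, I believe). A rough C++ sketch of decoding one DXT1 block:

```cpp
#include <cstdint>

// Sketch of decoding one 4x4 S3TC/DXT1 block: 8 bytes holding two
// little-endian RGB565 endpoints followed by 16 2-bit palette indices.
struct RGBA { uint8_t r, g, b, a; };

static RGBA expand565(uint16_t c) {
    RGBA out;
    out.r = (c >> 11) * 255 / 31;
    out.g = ((c >> 5) & 0x3F) * 255 / 63;
    out.b = (c & 0x1F) * 255 / 31;
    out.a = 255;
    return out;
}

void decodeDXT1Block(const uint8_t block[8], RGBA out[16]) {
    uint16_t c0 = block[0] | (block[1] << 8);
    uint16_t c1 = block[2] | (block[3] << 8);
    RGBA pal[4] = { expand565(c0), expand565(c1), {}, {} };
    if (c0 > c1) {  // 4-color mode: two interpolated middle entries
        pal[2] = { uint8_t((2*pal[0].r + pal[1].r) / 3),
                   uint8_t((2*pal[0].g + pal[1].g) / 3),
                   uint8_t((2*pal[0].b + pal[1].b) / 3), 255 };
        pal[3] = { uint8_t((pal[0].r + 2*pal[1].r) / 3),
                   uint8_t((pal[0].g + 2*pal[1].g) / 3),
                   uint8_t((pal[0].b + 2*pal[1].b) / 3), 255 };
    } else {        // 3-color mode: midpoint plus transparent black
        pal[2] = { uint8_t((pal[0].r + pal[1].r) / 2),
                   uint8_t((pal[0].g + pal[1].g) / 2),
                   uint8_t((pal[0].b + pal[1].b) / 2), 255 };
        pal[3] = { 0, 0, 0, 0 };
    }
    for (int i = 0; i < 16; ++i) {
        int idx = (block[4 + i/4] >> ((i % 4) * 2)) & 0x3;
        out[i] = pal[idx];
    }
}
```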

To tell you the truth, right now I'm more interested in bettering older features before moving on to new ones; I'm not concerned about shaders. I would like to see fill rate go up through better use of available bandwidth, and I want to see compression schemes that free up bandwidth to allow better FSAA. I'd also like to see more aggressive filtering methods. As soon as these happen, bring on more features.

Also, what's the feasibility (in terms of minimal increase in price and additional performance hit relative to current FSAA methods taking an equal number of samples) of implementing a supersampling (SS) FSAA method that changes the number of samples and their orientation on a 4*4 grid based on depth and location on the triangle? I know SmoothVision comes close to this, but I believe its sampling pattern doesn't adapt to location on the triangle, and I don't think it reduces the number of samples based on distance.
 
> ...what's the feasibility of implementing a SS FSAA method that will change the number of samples and their orientation on a 4*4 grid based on depth and location on the triangle?

I don't see how that could work at all. For normal supersampling operation you would, for a given pixel, use the same set of sample points regardless of depth. If you don't do this, you end up Z-testing and rendering different sets of sample points for different polygons touching the pixel, which causes rendering errors. In particular, if two polygons share a common edge on the screen and you don't strictly force them to use the same sample pattern for each given pixel, you get a rather nasty-looking seam between them: if the sample point sets are not identical for both polygons, you end up with some sample points covered by both polygons and others covered by neither. The only time you can safely decide the sample points for a pixel is before you render to it, and at that time you cannot know the depth of the pixel.
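To make the edge case concrete, here's a toy model in C++ (invented sample positions, nothing hardware-specific): two polygons meet at x = 0.5 inside one pixel, and each polygon flags a sample slot as covered if its own position for that slot lands on its side of the edge.

```cpp
#include <cstdio>

struct Sample { float x, y; };

// Coverage mask over 4 sample slots for the polygon left or right of x=0.5.
static unsigned coverageMask(const Sample s[4], bool leftPoly) {
    unsigned mask = 0;
    for (int i = 0; i < 4; ++i)
        if (leftPoly ? (s[i].x < 0.5f) : (s[i].x >= 0.5f))
            mask |= 1u << i;
    return mask;
}

int main() {
    const Sample fixed[4] = {{.25f,.25f},{.75f,.25f},{.25f,.75f},{.75f,.75f}};
    const Sample other[4] = {{.55f,.20f},{.60f,.30f},{.45f,.70f},{.40f,.80f}};

    // Same sample set for both polygons: the masks are exact complements,
    // every slot is hit exactly once, and the edge resolves watertight.
    unsigned l = coverageMask(fixed, true), r = coverageMask(fixed, false);
    printf("shared set: left=%x right=%x overlap=%x hole=%x\n",
           l, r, l & r, ~(l | r) & 0xF);

    // Right polygon brings its own set: slot 0 is now claimed by both
    // polygons and slot 3 by neither, so background bleeds through the seam.
    r = coverageMask(other, false);
    printf("mixed sets: left=%x right=%x overlap=%x hole=%x\n",
           l, r, l & r, ~(l | r) & 0xF);
    return 0;
}
```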

 
I expect a next-gen graphics processor to give me very high quality anisotropic filtering and pretty good AA quality at a very low cost in speed (0-10%) at 1024*768*32 at 85Hz under virtually all circumstances.

I would expect the core to be much faster than what we have today, with much more processing power for geometry and pixels. Extreme numbers mean nothing if they're for a simple transform with a single texture.

I would also expect memory bandwidth to go much higher than today, possibly with a 256-bit memory bus or larger. Integrating the memory die near the GPU core die in a single chip package would be great, and I expect it to happen.
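To put a rough number on it (my own back-of-envelope arithmetic, not a vendor figure): a 256-bit bus at 300MHz DDR moves 32 bytes * 600M transfers/s, about 19.2 GB/s, roughly double what a 128-bit part delivers at the same clock.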

Like someone else said, I expect current features to become more usable than they are today. The rest should come from much more flexible and programmable shaders.

Frankly, I'm disappointed with the performance of current high-end graphics cards; they are not worth the money.
 
According to the many rumours floating around the net, we're not going to see a deferred renderer among the upcoming next-generation GPUs, so I'm assuming we're stuck with IMRs.

I expect more bandwidth.
By that I mean two things: more raw bandwidth and more usable bandwidth (better efficiency).
More raw bandwidth could be provided in a couple of ways:
1) increasing external data bus width
2) large on-chip memory

Point one could be accomplished by doubling the external data bus from 128 to 256 bits, or with multichip configurations. It seems an expensive solution at this time, so I don't expect it. Point two requires non-standard design and cells, and requires special foundry libraries and processes. It could be very expensive too if it significantly increases die area.
I believe the only two candidates able to release a DX9 compliant part in a short time, ATI and NVIDIA, are not going to go down either of these routes.
(I'll be happy if someone tells me I'm wrong.)
I expect them to virtualize every single pool of bits allocated in external memory. They should cache everything on chip, so I see a large die area devoted to SRAM (they should move to a 0.13 micron process, which should provide some spare die area to play with). The problem here is to increase data access locality in space (better efficiency on memory) and time (better use of on-chip caches). How? I don't know :smile:
It seems both ATI and NVIDIA use some kind of hierarchical z-buffer, so I expect better use of it; maybe it would even be possible to extract and store information about scene depth and other useful things from a previous frame, to make some useful speculation about the current frame.
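The core test behind a hierarchical z-buffer is simple enough to sketch. This is only a toy version with invented names; the real HyperZ-style details aren't public:

```cpp
#include <algorithm>
#include <vector>

// Coarse hierarchical-Z: one conservative max-depth value per screen tile
// (smaller depth = nearer). A triangle can be rejected for a whole tile
// when its nearest point is still behind everything already drawn there.
struct HiZ {
    int tilesX, tilesY;
    std::vector<float> maxDepth; // farthest depth stored in each tile

    bool tileRejects(int tx, int ty, float triMinDepth) const {
        return triMinDepth > maxDepth[ty * tilesX + tx];
    }

    // After rendering into a tile, the tile's max depth can only shrink
    // (an occluder moved in front) -- updating keeps the test conservative.
    void update(int tx, int ty, float newTileMaxDepth) {
        float& d = maxDepth[ty * tilesX + tx];
        d = std::min(d, newTileMaxDepth);
    }
};
```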

Obviously we'll have more efficient and effective AA and anisotropic filtering, without a big hit on performance.

The hardware will be way more programmable, with more complex (and faster) vertex and pixel shaders. As I wrote in another thread, I believe the pixel pipeline as a concept should be abandoned and replaced with a lot of independent (fully pipelined) functional units plus a smart control/dispatch/issue unit (almost like a modern CPU, but way more specialized in its tasks).
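In rough C++, with every name invented, the control side of that idea might look like this: pending pixel ops wait for a free unit of the right class instead of marching through fixed stages.

```cpp
#include <queue>
#include <vector>

// Toy "pool of functional units plus dispatcher" model. Names made up.
enum class UnitClass { TextureSample, MulAdd, Reciprocal };

struct PixelOp { int pixelId; UnitClass needs; };

struct Dispatcher {
    std::vector<int> freeUnits; // free-unit count per class, indexed by enum
    std::queue<PixelOp> ready;  // ops whose inputs are available

    // Issue as many ready ops as there are free matching units this cycle;
    // blocked ops stay queued, keeping the other unit classes busy.
    void issueCycle(std::queue<PixelOp>& inFlight) {
        std::queue<PixelOp> stillWaiting;
        while (!ready.empty()) {
            PixelOp op = ready.front(); ready.pop();
            int& avail = freeUnits[static_cast<int>(op.needs)];
            if (avail > 0) { --avail; inFlight.push(op); }
            else            stillWaiting.push(op);
        }
        ready.swap(stillWaiting);
    }
};
```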

ciao,
Marco
 
A 10+ GFLOPS fully programmable geometry engine. Optimized for vertex shaders, but it should be general enough to be able to traverse scene graphs, tessellate subdivision surfaces, etc. Something like Imagine Stream, but with 4-way SIMD FP, or BOPS. Give it a dedicated lighting unit for the standard lighting model, though. It would probably be best to give it its own mechanism for sampling & filtering displacement maps.
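As a sanity check on the 10+ GFLOPS figure: a bare vertex transform is 16 multiply-accumulates, which maps directly onto 4-wide FMACs. A plain C++ sketch (nothing vendor-specific here):

```cpp
// One vertex transform = 4 dot products = 16 multiply-accumulates,
// which a 4-way SIMD engine would issue as 4-wide FMAC operations.
struct Vec4   { float v[4]; };
struct Mat4x4 { float m[4][4]; }; // row-major

Vec4 transform(const Mat4x4& m, const Vec4& in) {
    Vec4 out{};
    for (int row = 0; row < 4; ++row) {
        float acc = 0.0f;
        for (int i = 0; i < 4; ++i)     // maps onto one 4-wide FMAC chain
            acc += m.m[row][i] * in.v[i];
        out.v[row] = acc;
    }
    return out;
}
// At 16 FLOPs per vertex for the transform alone, a 10 GFLOPS engine tops
// out near 600M transformed vertices/s before lighting or clipping.
```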

Still thinking about the rasterizer.
 
Why is everyone so concerned about the speed hit when talking about FSAA and anisotropic filtering?

Raw numbers are obviously more important.

If it's a choice between a GPU that can do
scene 1 (no FSAA and filtering) at 300 fps
scene 2 (full features) at 60 fps

and a 2nd gpu
scene 1 80fps
scene 2 60fps

then count me in for the first, even though the second has a much lower percentage hit.
 
Multichip scalable solution

chip One (service chip):
-AGP interface
-dual 400MHz RAMDAC
-2D video control
-fast crossbar switch
-3 HT links to chips Two, Three and Four
-Its own 32MB DDR
-HDTV
-video capture with mpeg-2
-VR interface
-firewire
-some fancy functions

chip Two (3D GPU):
-256-bit 300MHz DDR interface
-12MB eDRAM
-fully programmable RISC T&L
-fully programmable multipipe RISC rasterizer
-1 HT link to chip One.

Chips Three and Four: the same as Two

Combine it as you wish :cool:

 
> Fred

I don't need 300fps, but I do HATE the jumpy, inconsistent frame rates we have today; a high frame rate usually makes those hitches less frequent or noticeable. Of course it would help if some game programmers put a little more effort into managing data flows. But anyway, you don't get that kind of high frame rate with today's complex engines, so if the price of good AA and aniso is too high, they won't be usable.

Besides, I want next-gen to reach a point where image quality features like good AA and very high quality texture filtering are what everybody expects, just like high resolutions, 32 bits and multitexturing are today. What is 9129341827378253fps worth with low-res, 8-bit, non-textured, flat-shaded polygons, even if card A does it faster than card B?

I would also like to see constant frame rates with vsync; that's an old dream of mine, but it goes against the culture of computer games, and even if it didn't, it's not simple to achieve in an unstable/variable PC environment :cry:
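The application-side half of that dream is at least easy to sketch; the names below are placeholders, and it ignores everything that makes a real PC environment unstable:

```cpp
#include <chrono>
#include <thread>

// Minimal fixed-timestep frame pacer chasing a vsync-locked rate.
// simulate() and render() are hypothetical placeholders.
void frameLoop(bool& running) {
    using clock = std::chrono::steady_clock;
    const auto frameTime = std::chrono::microseconds(11764); // ~85 Hz
    auto next = clock::now();
    while (running) {
        // simulate(frameTime);  // advance game state one fixed step
        // render();             // submit the frame; swap waits on vsync
        next += frameTime;
        std::this_thread::sleep_until(next); // absorb leftover time so a
        // fast frame doesn't pull the next one early and cause judder
    }
}
```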



 
Improving my earlier post.
Multichip scalable solution

chip One (service/image/geometry chip):
-AGP interface
-fast/low latency crossbar switch with multicast
-4 HT links to chip Two, Three, Four and Five
-Its own 256MB 250MHz DDR 128bits
-Virtual memory management
-fully programmable RISC T&L (geometry)
-HDTV
-fully programmable video/image RISC capable to video capture with mpeg-2
-dual 400MHz RAMDAC
-2D video control
-VR interface
-firewire
-some fancy functions

chip Two, Three, Four and Five (3D rasterizer):
-64MB 128bits 300MHz DDR
-12MB eDRAM
-fully programmable multipipe RISC rasterizer
-stochastic multisampling FSAA
-1 HT link to chip One.

Starting with one service/geometry chip and two rasterizers.
 
2D should go where it belongs, and that's on the motherboard. We need dedicated 3D-only cards. I also expect to see the first actual attempts at higher bit precision.
 
The MB is not a good place for a display chip. It is hard to preserve signal quality on a motherboard.
 
How about a dual Kyro II at .15 micron?
It would be only ~25 million transistors, still very small, sharing one 128-bit DDR bus at a 250MHz core.
Maybe they could also add programmable T&L and free high-quality aniso; voilà, the perfect GPU :D
 
Not exactly a feature, but a chip using eDRAM like what BitBoys promised would be a cool way to solve the bandwidth problem.

Higher precision for everything would also be cool.
 
> The MB is not a good place for a display chip. It is hard to preserve signal quality on a motherboard.

After the DAC, you mean. Well, it may not be an issue once DVI gets fully adopted.
 
pascal,

wow, sounds like the PS2 :smile:

EE = fully programmable "service chip"
GS = rasterizer w/eDRAM

I know Sony's design didn't work out exactly as well as they hoped, but they were on the right track. Not bad for 1999 ;)

zurich
 
Maybe like Rampage.

My 1999 card was the TNT2 16MB :smile:

Probably a multichip solution is not viable in the current marketplace. Let's redesign the chip:
- year 2003
- AGP 8X
- 12MB .13 micron eDRAM, 20GB/s
- 300MHz 128-bit DDR, 10GB/s
- 125 million polygons/sec programmable geometry engine
- 4 dual-pixel programmable processor pipelines
- 8-stage loopback
- 64-bit precision
- DX9 and OpenGL 2.0

Well, it is just a dream :rolleyes:
 
The geometry engine should be massively parallel with SIMD FUs - something like 16 FMACs, 4 pipelined FDIVs, and 4 pipelined FSQRT units.

Maybe support simultaneous processing of multiple vertex programs at a time (better FU utilization), or at least support for executing multiple independent instructions simultaneously.

Support for more sophisticated control flow, creating/destroying vertices and triangles, displacement mapping, traversing scene graphs, tessellation, etc...

Output of the geometry processing should go to a special HZ buffer. This buffer would store data ready to render in a spatial hierarchy (very much the way a tiler bins geometry).

The data stored would consist of geometry and associated state information. Geometry entering the buffer could throw out geometry fully occluded by it. Bounding volume occlusion queries would be supported.

The rasterizer would be able to pick geometry to render from the buffer based on required state changes as well as screen location.

The actual buffer would consist of, say, 4MB of eDRAM - a large, intelligent cache between the geometry and rendering processors. It would give some of the benefits of tilers without having to go the fully deferred rendering route.
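The bounding-volume query against such a buffer could be as simple as this C++ sketch (the structure and names are mine, just to show the test):

```cpp
#include <vector>

// Conservative max depth per screen tile, smaller depth = nearer.
// A whole batch of geometry can be skipped when its screen-space bound
// is behind every tile it touches.
struct DepthTiles {
    int tilesX, tilesY;
    std::vector<float> maxDepth; // farthest depth per tile

    bool boundsOccluded(int x0, int y0, int x1, int y1, float minZ) const {
        for (int ty = y0; ty <= y1; ++ty)
            for (int tx = x0; tx <= x1; ++tx)
                if (minZ <= maxDepth[ty * tilesX + tx])
                    return false; // possibly visible in this tile
        return true; // behind everything it overlaps: skip the batch
    }
};
```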

The rasterizer :
- dynamically allocatable TUs and FUs
- high quality anisotropic filtering
- 64-bit FP internal precision for everything
- ability to work on pixels from more than one triangle at a time
- more flexible frame buffer format (as in an F-buffer or R-buffer)
- better FSAA (adaptive number of samples per pixel, jittered sample positions)
- FUs separated from the concept of a pixel pipe. In the programmable model, a pixel pipe would basically consist of state specific to a pixel (like a register file and a program counter, for instance); see the sketch below.
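Taking that last bullet literally, the per-pixel state might be no more than this (the field names are my own invention):

```cpp
#include <cstdint>

// A "pixel pipe" reduced to per-pixel state that any free functional
// unit can pick up and advance, rather than a physical chain of stages.
struct PixelContext {
    uint32_t pc;            // program counter into the pixel shader
    float    regs[16][4];   // register file of 4-wide temporaries
    int      x, y;          // screen position
    int      triangleId;    // pixels from many triangles coexist in flight
    bool     waitingOnTex;  // parked until a texture unit returns a sample
};
```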
 