|
|
#1 |
|
Member
Join Date: Feb 2002
Posts: 100
|
What frame rate would you achieve at a resolution of 1600x1200, a frequency of 325 MHz and just one pixel pipeline if you could actually draw one visible pixel per clock?
|
|
|
|
|
#2 |
|
Member
Join Date: Feb 2002
Posts: 772
|
1600 x 1200 = 1,920,000 pixels on screen = 1,920,000 pixels per frame.
325,000,000 cycles per second x 1 pixel per cycle = 325,000,000 pixels per second.
(325,000,000 pix/sec) / (1,920,000 pix/frame) = 169.27 frames/sec
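For anyone who wants to replay the arithmetic with other numbers, here is a minimal Python sketch. The function and parameter names are made up for illustration, and it assumes the idealised case from the question: every clock produces exactly one visible pixel per pipeline.

```python
def max_frame_rate(width, height, clock_hz, pixels_per_clock=1):
    """Upper bound on frames/sec if every clock yields visible pixels."""
    pixels_per_frame = width * height
    pixels_per_second = clock_hz * pixels_per_clock
    return pixels_per_second / pixels_per_frame


fps = max_frame_rate(1600, 1200, 325_000_000)
print(f"{fps:.2f} frames/sec")  # prints "169.27 frames/sec"
```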
__________________
Looks like it was option "B." Sigh. |
|
|
|
|
#3 |
|
Join Date: May 2002
Location: New York, NY
Posts: 12,678
|
Just don't forget that many modern scenes will apply many textures per pixel, will compute the final color for a pixel in multiple passes, or make use of transparent surfaces.
What all of this means is that if you took a real game scene from, say, Unreal Tournament 2003, and put it through hardware that was capable of outputting each pixel only once, it might still need to use many clocks per pixel just to get the processing done. |
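A rough way to see the effect is to divide the ideal rate by the number of clocks spent on each output pixel. This is only a sketch: the function name is invented, and the value of 4 clocks per pixel is a hypothetical figure chosen for illustration, not a measurement from any real game.

```python
def effective_frame_rate(width, height, clock_hz, clocks_per_pixel):
    """Frames/sec when each output pixel costs several clocks
    (extra textures, multiple passes, blending)."""
    return clock_hz / (width * height * clocks_per_pixel)


# Hypothetical workload: 4 clocks of work per visible pixel.
fps = effective_frame_rate(1600, 1200, 325_000_000, clocks_per_pixel=4)
print(f"{fps:.2f} frames/sec")
```

With one clock per pixel this reduces to the 169.27 frames/sec figure from post #2.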
|
|
|
|
#4 |
|
Member
Join Date: Feb 2002
Posts: 203
|
Ok, point made. What do you suggest? You've been an open proponent of deferred rendering solutions.
__________________
Like my Dad always says, "The day I can't do my job drunk, is the day I hand in my badge and gun." |
|
|
|
|
#5 |
|
Member
Join Date: Feb 2002
Posts: 100
|
The point is that there is still a great deal of inefficiency in today's hardware. Improving rendering efficiency offers a route to better performance that does not necessarily require costly new processes, large numbers of pipelines, etc. Not that these aren't great to have; they are. It is just that there is also still plenty of low-hanging fruit in rendering efficiency.
As an example, you might simply add 8k of frame/depth buffer cache to a standard IMR (about a 32x32 pixel tile's worth), then recommend that developers sort their rendering roughly in tile order and roughly front to back within a tile region. Older titles that did not do this would still see some benefit from the cache, while developers who took full advantage of it would get tiler-like performance from a standard IMR. Developers who wanted to use application-driven deferred rendering could still render the scene twice: once without shading (to set the depth buffer) and then again with shading.
Hierarchical z buffering would add even more benefit, especially if the upper levels were cached on the chip. I would recommend up to 5 levels (for quick elimination of large stencil polys, bounding volume occlusion checks, etc.).
Another step is providing for z occlusion culling using bounding volumes to eliminate unnecessary hidden vertex and pixel processing. This becomes an ever-increasing issue as triangle rates and scene complexity increase. I think it is important to provide this capability as a standard feature across all 3D hardware vendors and APIs. Z occlusion culling works particularly well with 5 or more levels of hierarchical z to quickly determine the visibility of the bounding volumes.
Using more efficient multisampling AA techniques, such as Z3 or another coverage mask approach together with sparse grid sampling, could provide 16x or even 32x near-stochastic AA with little performance impact. It would correctly handle implicit edges and order-independent transparency sorting to boot.
There are still some improvements in both performance and quality that can be made in anisotropic filtering as well. Some of the ideas in the Feline approach would be useful.
There are, of course, many other possibilities. Improving rendering efficiency has only just begun to be tapped, and it offers all the vendors the opportunity for a great deal of performance improvement in the near term.
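To make the cache-plus-sorting suggestion concrete, here is a small Python sketch. Two things in it are assumptions rather than anything stated in the post: the 4 bytes of colour plus 4 bytes of depth per pixel (which is what makes a 32x32 tile come to 8k), and the idea of sorting each draw by the screen position of its centre. All names are hypothetical, and a real application would only need the order to be roughly right.

```python
TILE = 32  # tile edge in pixels, from the "32x32 pixel tile" in the post


def tile_cache_bytes(tile=TILE, color_bytes=4, depth_bytes=4):
    """On-chip bytes needed to hold colour and depth for one tile.
    The 4 + 4 bytes per pixel split is an assumption."""
    return tile * tile * (color_bytes + depth_bytes)


def sort_key(draw, screen_width=1600):
    """draw is (x, y, depth): screen position of the object's centre and
    its view depth (smaller = nearer). Sorts by tile, then front to back."""
    x, y, depth = draw
    tiles_per_row = (screen_width + TILE - 1) // TILE
    tile_index = (y // TILE) * tiles_per_row + (x // TILE)
    return (tile_index, depth)


draws = [(40, 10, 0.9), (5, 5, 0.7), (5, 6, 0.2), (33, 0, 0.1)]
ordered = sorted(draws, key=sort_key)

print(tile_cache_bytes())  # prints 8192
print(ordered)
# prints [(5, 6, 0.2), (5, 5, 0.7), (33, 0, 0.1), (40, 10, 0.9)]
```

Draws that land in the same tile end up adjacent in the submission order, so the cached tile stays hot, and within a tile the nearest surface arrives first, so later hidden pixels fail the depth test before any shading.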
|
|
|
|
|
#6 | |
|
Senior Member
Join Date: Feb 2002
Location: Somewhere not *that* rotten in Denmark
Posts: 1,197
|
Quote:
Anyway, I had this stupid idea recently about doing the sorting between the vertex and pixel level in a big on-chip Z-check buffer before any texels are applied to the pixel (i.e. before any pixels are actually rendered). My lame idea was that you only had to keep each "pre-pixel's" Z-value and thus could build up this pre-pixel data in the buffer and remove all the hidden ones based on their Z-values. When every pre-pixel is either rejected or accepted in the buffer, you would go on to actually render those pixels. But then I realized it doesn't make any bloody sense, because you have to store a lot of data to go with each and every pixel that is about to be drawn.
__________________
Best regards, LeStoffer |
|
|
|
|
|
#7 | |
|
Naughty Boy!
Join Date: Feb 2002
Posts: 1,444
|
Quote:
|
|
|
|
|
|
#8 | |
|
Senior Member
Join Date: Feb 2002
Location: Somewhere not *that* rotten in Denmark
Posts: 1,197
|
Quote:
__________________
Best regards, LeStoffer |
|
|
|
|
|
#9 |
|
Senior Daddy
Join Date: Feb 2002
Location: London
Posts: 1,869
|
Hmm, another one of SA's famous hints?
Z3 AA (which I still don't understand fully even after having looked at the white paper) sounds like a great implementation. |
|
|
|
|
#10 | |
|
Senior Member
Join Date: Jan 2002
Location: Abbots Langley
Posts: 732
|
Quote:
Err... I think that render order is one of the things that the developer should not have to care about... plenty of other things to worry about. We don't want developers to worry about low-level things like optimising per-pixel HSR... this is one of the most basic features of 3D hardware and it should just work efficiently. I am sure that NVIDIA and ATI would prefer that developers start following the absolute basic optimisation rules. Just to give some examples: do a flip rather than a blit from back buffer to front buffer (one is some pointer changes and the other is a full memory copy)... submit more than 2 polygons per draw primitive call... this all sounds trivial, but if there are developers out there who cannot even get this right, god only knows what will happen if you expect them to do the kind of sorting you suggested. Also, I believe that ATI already has some kind of back-end tile-like buffer; IIRC this was promoted a bit by Marketing for the 8500? K- |
|
|
|
|
|
#11 | |
|
Tea maker
Join Date: Feb 2002
Location: In the Island of Sodor, where the steam trains lie
Posts: 4,382
|
Quote:
__________________
"Your work is both good and original. Unfortunately the part that is good is not original and the part that is original is not good." -(attributed to) Samuel Johnson "I invented the term Object-Oriented, and I can tell you I did not have C++ in mind." Alan Kay |
|
|
|
|
|
#12 | ||
|
Junior Member
Join Date: Jul 2002
Posts: 67
|
Quote:
As if it's really hard to come to a conclusion based on all the bits and pieces floating around the internet... |
||
|
|
|
|
#13 |
|
Senior Member
Join Date: Feb 2002
Location: gjethus, Norway
Posts: 1,256
|
I seem to remember an old block diagram of ATI's Rage128 chip with an 8 Kbyte framebuffer cache - if that old chip had it, I would find it likely that newer chips also have it, probably more than 8 KBytes as well. Also, AFAIK, most IMRs today already use tiled framebuffers, typically with 8x8 pixel tiles, presumably caching multiple such tiles.
Using bounding boxes on 3D objects to do optimizations on them is entirely possible, but requires extensive support at both API and application level. Rejecting bounding boxes based on hierarchical Z seems to be doable on IMR architectures only, and you need to sort the objects in front-to-back order (which probably precludes them from being sorted in tile order) to see this kind of benefit. Z3/coverage mask AA methods? Just wondering what the memory usage, performance hit and image quality of these methods are compared to e.g. ATI's multisampling implementation (compressed multisample buffer => fairly small performance hit). I am not really convinced that there is any really low-hanging fruit left to collect (other than perhaps better texture compression methods); for now, it looks like compressing the multisample buffer was the last one that didn't require extensive API support. |
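For readers unfamiliar with the bounding-box rejection being discussed, here is a minimal Python sketch of a hierarchical Z test. It is an illustration under stated assumptions, not how any particular chip does it: the depth buffer is a small square list of lists with a power-of-two side, smaller depth means nearer, each coarser level stores the farthest depth of its 2x2 children, and the function names are invented.

```python
def build_pyramid(depth, levels):
    """Return [level0, level1, ...]; each coarser level holds the
    farthest (max) depth of the 2x2 block beneath it."""
    pyramid = [depth]
    for _ in range(levels - 1):
        prev = pyramid[-1]
        n = len(prev) // 2
        if n == 0:
            break
        pyramid.append([[max(prev[2 * y][2 * x], prev[2 * y][2 * x + 1],
                             prev[2 * y + 1][2 * x], prev[2 * y + 1][2 * x + 1])
                         for x in range(n)] for y in range(n)])
    return pyramid


def box_hidden(pyramid, level, x0, y0, x1, y1, nearest_z):
    """Conservative test on a screen-space box (inclusive pixel bounds).
    True only if the box's nearest depth is not in front of the farthest
    stored depth in every cell it covers at the given level."""
    grid = pyramid[level]
    for cy in range(y0 >> level, (y1 >> level) + 1):
        for cx in range(x0 >> level, (x1 >> level) + 1):
            if nearest_z < grid[cy][cx]:
                return False  # might be visible here
    return True


# 4x4 buffer: a near wall (0.3) on the left half, cleared (1.0) on the right.
depth = [[0.3, 0.3, 1.0, 1.0] for _ in range(4)]
pyr = build_pyramid(depth, levels=3)

print(box_hidden(pyr, 1, 0, 0, 1, 3, nearest_z=0.5))  # prints True
print(box_hidden(pyr, 1, 2, 0, 3, 3, nearest_z=0.5))  # prints False
print(box_hidden(pyr, 2, 0, 0, 1, 3, nearest_z=0.5))  # prints False
```

The last call shows why the test is only conservative: the coarsest level covers both the wall and the cleared area, so it cannot prove the box hidden and a finer level has to be consulted. It also shows why front-to-back order matters, since the wall must already be in the depth buffer for the box behind it to be rejected.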
|
|
|
|
#14 | |
|
Senior Member
Join Date: Feb 2002
Location: Somewhere not *that* rotten in Denmark
Posts: 1,197
|
Quote:
If you're going for big benefits, it would seem that you have to do some kind of sorting of either polygons or pixels into a list instead of just removing some hidden pixels based on a Z-check along the way. And thus the question is: Is there any method where you don't need a full-scale sort?
__________________
Best regards, LeStoffer |
|
|
|
|
|
#15 |
|
Gamerscore Wh...
Join Date: Jan 2002
Posts: 12,950
|
I think SA may be making a point here. If we think back to when the remains of 3dfx were purchased by NVIDIA, you may remember a number of interviews at the time with NV's CEO, and others, stating that they doubted they would fully adopt the GigaPixel deferred rendering approach, but that there may be ways of marrying some of the benefits of the tiling approach with IMRs. Now, what SA is talking about sounds like one of the possibilities that they were talking about at the time.
|
|
|
|
|
#16 |
|
Senior Member
Join Date: Feb 2002
Location: gjethus, Norway
Posts: 1,256
|
I don't quite see how. Either you do immediate-mode rendering, drawing polygons as you receive them, or you do deferred rendering, collecting polygon data for an entire scene before drawing any of it. To me, it would seem that anything between would inherit the disadvantages of both and the advantages of neither.
Sorting objects in near-tile-order gets difficult with objects that are larger than a tile or straddle tile boundaries - it seems to me that at best you get a rather small increase in the framebuffer cache hit rate (this would, in any case, not require changes to modern IMRs) |
|
|
|
|
#17 | ||
|
Epsilon plus three
Join Date: Feb 2002
Location: Chania
Posts: 7,767
|
Quote:
Quote:
(Simon, please correct me if I'm wrong). On a side note, can someone please add some simpler input on the possible advantages of the Feline algorithms? Last time a patent was posted I got lost even trying to read it *ahem*. |
||
|
|
|
|
#18 |
|
Senior Member
Join Date: May 2002
Location: germany
Posts: 1,217
|
arjan de lumens, SA has been carefully hinting that, despite some people believing otherwise, there is still headroom left for performance improvement in current and future hardware accelerators by increasing the rendering pipeline efficiency, which doesn't necessarily mean changing the way polygons are fed to the pipeline by IMRs or TBRs, IMHO. So why not talk about how this could be achieved and go into where these tweaks might be possible? As an old tech lurker here, I was hoping some of the more technically versed people could make some interesting comments to learn from...
|
|
|
|
|
#19 |
|
Senior Member
Join Date: Feb 2002
Location: gjethus, Norway
Posts: 1,256
|
I just do not see that there is all that much efficiency headroom left, at least not in IMR architectures running legacy applications. The tiled framebuffer cache seems to have been around for some time, at least since Radeon8500 and almost certainly much longer (voodoo?); Z3 may look better than 4xRGMS, but requires more per-pixel data for non-edge pixels (=problem in IMR, should work fine in TBR); bounding box optimizations are nice, but require application support; how well does the Feline algorithm perform compared to whatever method it is that ATI uses for anisotropic mapping (assuming that it isn't the very same algorithm)?
There seems to be an idea floating around here about an immediate-mode tiler architecture. Such a beast will require applications/games to be written such that they supply data in tile order. OK so far - here is the difficult part: it needs an efficient method for handling objects that straddle tile boundaries. |
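The straddling problem is easy to see with a few lines of Python. This is only a sketch with an invented function name; it assumes 32x32 pixel tiles and an object described by an inclusive screen-space bounding box.

```python
def tiles_touched(bbox, tile=32):
    """List the (tile_x, tile_y) cells overlapped by an inclusive
    screen-space bounding box (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = bbox
    return [(tx, ty)
            for ty in range(y0 // tile, y1 // tile + 1)
            for tx in range(x0 // tile, x1 // tile + 1)]


print(tiles_touched((4, 4, 20, 20)))    # prints [(0, 0)]
print(tiles_touched((28, 28, 36, 36)))  # prints [(0, 0), (1, 0), (0, 1), (1, 1)]
```

Even a 9x9 pixel object touches four tiles when it sits on a tile corner, so an application feeding data in tile order would have to either submit it once per tile or accept that it pulls extra tiles into the cache.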
|
|
|
|
#20 |
|
Irregular
Join Date: Feb 2002
Posts: 1,170
|
My guess: Z-only first pass...
With proper hw support it could be a killer feature. The question is not the number of pixel pipes, but the number of Z-operations possible per cycle, when the pixel pipelines are not used... |
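A toy model of what a Z-only first pass buys, in Python. The assumptions are mine: opaque surfaces only, distinct depths at a pixel, smaller depth means nearer, and the function names are invented. It counts how many times a single pixel gets shaded.

```python
def shaded_without_prepass(fragments):
    """fragments: depths arriving at one pixel in submission order.
    Every fragment that passes the depth test on arrival gets shaded."""
    nearest = float("inf")
    shaded = 0
    for z in fragments:
        if z < nearest:
            nearest = z
            shaded += 1
    return shaded


def shaded_with_prepass(fragments):
    """Pass 1 writes depth only; pass 2 shades only the fragments that
    match the final depth."""
    if not fragments:
        return 0
    nearest = min(fragments)
    return sum(1 for z in fragments if z == nearest)


back_to_front = [0.9, 0.6, 0.3]
print(shaded_without_prepass(back_to_front))    # prints 3
print(shaded_with_prepass(back_to_front))       # prints 1
print(shaded_without_prepass([0.3, 0.6, 0.9]))  # prints 1
```

The prepass removes the wasted shading whatever the submission order, at the price of one extra depth-only test per fragment, which is why the number of Z operations per cycle becomes the figure that matters. Perfect front-to-back sorting gets the same shading count without the extra pass.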
|
|
|
|
#21 | |
|
Member
Join Date: Jun 2002
Posts: 305
|
Quote:
|
|
|
|
|
|
#22 | |
|
Senior Member
Join Date: Feb 2002
Location: gjethus, Norway
Posts: 1,256
|
Quote:
|
|
|
|
|
|
#23 | |
|
Irregular
Join Date: Feb 2002
Posts: 1,170
|
Quote:
No environment mapping computation, per-pixel lighting precalc, etc. It's quite a big saving. Also, games are still not vertex limited (not even UT2003). They could also increase the vertex processing power to make it possible (note, I said proper hw support.) |
|
|
|
|
|
#24 | |
|
Crazy coder
|
Quote:
|
|
|
|
|
|
#25 | ||
|
Member
Join Date: Jun 2002
Posts: 854
|
Quote:
|
||
|
|