Old 03-Jun-2002, 17:50   #76
MfA
Regular
 
Join Date: Feb 2002
Posts: 5,242
Default

Crossbars take too much area; the future is on-chip switching fabrics.
Old 03-Jun-2002, 17:58   #77
pascal
Senior Member
 
Join Date: Feb 2002
Location: Brasil
Posts: 1,790
Default

What is a switching fabric?
Old 03-Jun-2002, 18:26   #78
MfA
Regular
 
Join Date: Feb 2002
Posts: 5,242
Default

It's the circuitry needed for routing data.

So instead of a crossbar or centralised mailbox system (which can be a register set or embedded memory) as the means of communication between nodes, you give each node its own switching fabric and put them in a network ... the most natural topology for the network being a mesh, of course.
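
A minimal sketch of what routing across such a mesh could look like. The post does not name a routing scheme, so dimension-order (XY) routing is an assumption made here purely for illustration, and the function name `xy_route` is hypothetical. Each node only needs to compare its own coordinates with the destination, which is why no central crossbar is involved.

```python
def xy_route(src, dst):
    """Return the nodes visited from src to dst in a 2D mesh.

    Assumed scheme: dimension-order routing, moving along X first, then Y.
    Nodes are (x, y) tuples; each hop moves to a direct mesh neighbour.
    """
    x, y = src
    path = [(x, y)]
    step_x = 1 if dst[0] > x else -1
    while x != dst[0]:
        x += step_x
        path.append((x, y))
    step_y = 1 if dst[1] > y else -1
    while y != dst[1]:
        y += step_y
        path.append((x, y))
    return path


route = xy_route((0, 0), (2, 1))
hops = len(route) - 1  # number of node-to-node links traversed
print(route, hops)
```

The hop count equals the Manhattan distance between the two nodes, so latency grows with distance in the mesh rather than being uniform as with a crossbar.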
Old 03-Jun-2002, 18:40   #79
pascal
Senior Member
 
Join Date: Feb 2002
Location: Brasil
Posts: 1,790
Default

I think I understand.

There are many possible uses:
- Have the multiple processors work like a virtual systolic system with cell-to-cell communication (using the switch fabric) and minimal centralized communication (a possible bottleneck).
- Have each processor work on a different tile.
- Some multigroup multicast capability could be useful for transmitting data and programs, saving bandwidth and latency too.

It is all programming 8)
Old 03-Jun-2002, 20:54   #80
mboeller
Member
 
Join Date: Feb 2002
Location: Germany
Posts: 846
Default

I just searched for polygon compression too (because of Kristof's comments about Kyro). Here are the links I found:

http://www.gvu.gatech.edu/gvu/people...abstracts.html
http://www-grail.usc.edu/pubs.html
http://www.comp.nus.edu.sg/~tulika/publications.htm
http://staff.ncst.ernet.in/~dinesh/r.../geomComp.html (with link to Java3D and VRML compression )
http://www.cc.gatech.edu/gvu/modeling/compression.html
Old 04-Jun-2002, 01:30   #81
arjan de lumens
Senior Member
 
Join Date: Feb 2002
Location: gjethus, Norway
Posts: 1,256
Default

Some thoughts around the sea-of-DSPs approaches that people here suggest:
  • You still need hardwired texturing units for acceptable 3D performance. Breaking the texturing operation (with trilinear interpolation, perspective division, texture coordinate clamping and scaling, compressed texture unpacking, etc) into a sequence of instructions in a standard RISC/CISC/VLIW instruction set will quickly degrade performance by a factor of 15-30 relative to what is achievable with a similarly clocked hardwired texture mapper. Even SIMD instructions won't help this. One possible solution is to let each DSP have its own texture mapper, accessible through specialized instructions, but this still leaves the problem that the texture mapper, while fully pipelined, has a quite high latency (perhaps ~10-20 clocks) and that you really want it to be utilized as much of the time as possible.
  • There are numerous small tasks performed in a standard fixed-function renderer (rasterization, setup of gouraud colors/texture coordinates, alpha test, fog, stencil test, dithering, polygon Z offsetting etc) which will quickly add up to a substantial number of instructions needed per pixel if done by DSPs. This overhead, along with texturing, is what kills the 3D performance of a pure sea-of-DSPs approach compared to the traditional hardwired pipeline. Again, SIMD doesn't really help very much.
  • Distribution of program code to each DSP can be ugly, unless each of them has a large enough instruction cache. Perhaps not a big problem, but caches cost transistors.
  • Unless you set the DSPs up to function as a systolic array of some sort, the direct interconnects between them probably aren't particularly important for performance. At most, a simple mesh interconnect should suffice for most practical uses. If one DSP produces lots of data that another DSP consumes, it is probably appropriate to let them share a high-bandwidth local memory block.
  • Off-chip memory access patterns would likely be extremely irregular, producing lots of DRAM page breaks all the time. So a crossbar DRAM controller would be absolutely necessary for half-decent performance. Also, the way the DSPs access memory data must be strictly controlled at all times, otherwise you will run into a total data coherency nightmare.
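
To give a feel for the first point, here is a rough sketch of a software texture fetch. It covers only bilinear filtering of one greyscale channel from a single MIP level with clamped coordinates. Trilinear filtering would run this twice and blend the results, and perspective division and compressed-texture unpacking are left out entirely. The function name and the list-of-lists texture layout are assumptions for illustration, not anything described in the post.

```python
import math


def bilinear_sample(texture, u, v):
    """Sample a greyscale texture (list of rows) at normalized (u, v)."""
    h = len(texture)
    w = len(texture[0])
    # Scale normalized coordinates to texel space (texel centres at +0.5).
    x = u * w - 0.5
    y = v * h - 0.5
    x0 = math.floor(x)
    y0 = math.floor(y)
    fx = x - x0  # horizontal blend weight
    fy = y - y0  # vertical blend weight

    def clamp(i, n):
        # Clamp-to-edge addressing.
        return max(0, min(n - 1, i))

    x0c, x1c = clamp(x0, w), clamp(x0 + 1, w)
    y0c, y1c = clamp(y0, h), clamp(y0 + 1, h)
    # Four texel fetches followed by three linear interpolations.
    top = texture[y0c][x0c] * (1 - fx) + texture[y0c][x1c] * fx
    bottom = texture[y1c][x0c] * (1 - fx) + texture[y1c][x1c] * fx
    return top * (1 - fy) + bottom * fy


tex = [[0.0, 1.0],
       [1.0, 0.0]]
centre = bilinear_sample(tex, 0.5, 0.5)   # equal blend of all four texels
corner = bilinear_sample(tex, 0.25, 0.25)  # exactly on the top-left texel centre
print(centre, corner)
```

Even this stripped-down version needs scaling, flooring, clamping, four fetches and three blends per sample, and a real RGBA fetch repeats the blends per channel. That is the kind of per-pixel instruction count the post argues a hardwired unit avoids.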
Old 04-Jun-2002, 01:40   #82
MfA
Regular
 
Join Date: Feb 2002
Posts: 5,242
Default

Stream processing can deal quite well with high latencies. Locality from a global point of view is mostly dependent on the algorithm ... a naive memory interface would jump from here to there, but by accumulating and reordering/batching memory accesses you can remove that problem and only worry about the algorithmic side (a crossbar would not suffice IMO).
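
A toy illustration of the accumulate-and-reorder idea, under loudly stated assumptions: a single open DRAM page, a 2048-byte page size, and a fixed reordering window. All of these figures and both function names are made up for the sketch and describe no particular memory controller.

```python
PAGE_SIZE = 2048  # bytes per DRAM page (illustrative assumption)


def count_page_opens(addresses, page_size=PAGE_SIZE):
    """Count page activations, assuming only one page can be open at a time."""
    opens = 0
    open_page = None
    for addr in addresses:
        page = addr // page_size
        if page != open_page:
            opens += 1
            open_page = page
    return opens


def batch_by_page(addresses, window, page_size=PAGE_SIZE):
    """Reorder requests inside fixed-size windows so same-page accesses are adjacent."""
    out = []
    for start in range(0, len(addresses), window):
        chunk = addresses[start:start + window]
        out.extend(sorted(chunk, key=lambda a: a // page_size))
    return out


# Two streams interleaved: one walks page 0, the other walks page 8.
naive = []
for i in range(8):
    naive.append(i * 4)
    naive.append(8 * PAGE_SIZE + i * 4)

naive_opens = count_page_opens(naive)
batched_opens = count_page_opens(batch_by_page(naive, window=8))
print(naive_opens, batched_opens)
```

In this toy case every naive access switches pages, while batching over a window of eight requests cuts the activations to a quarter. The price is the extra latency of holding requests in the window, which is the latency stream processing is said to tolerate.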
Old 04-Jun-2002, 02:31   #83
Ailuros
Epsilon plus three
 
Join Date: Feb 2002
Location: Chania
Posts: 7,769
Default

Dave, ushac, mboeller,

Thank you very much indeed. Those should keep me busy for a couple of days.
Old 04-Jun-2002, 08:03   #84
Vince
 
Join Date: Apr 2002
Posts: 2,158
Default

Here's one on Gamasutra about dense meshes (>80K) and how to represent and store them. (It actually discusses wavelet compression.)

http://www.gamasutra.com/features/20...ickhill_01.htm


http://www.vrml.org/WorkingGroups/vrml-cbf/cbfwg.htm


Quote:
You still need hardwired texturing units for acceptable 3D performance. Breaking the texturing operation (with trilinear interpolation, perspective division, texture coordinate clamping and scaling, compressed texture unpacking, etc) into a sequence of instructions in a standard RISC/CISC/VLIW instruction set will quickly degrade performance by a factor of 15-30 relative to what is achievable with a similarly clocked hardwired texture mapper. Even SIMD instructions won't help this. One possible solution is to let each DSP have its own texture mapper, accessible through specialized instructions, but this still leaves the problem that the texture mapper, while fully pipelined, has a quite high latency (perhaps ~10-20 clocks) and that you really want it to be utilized as much of the time as possible.
It raises the question: at what point will textures cease to be necessary? Texture mapping is an approximation/substitute for geometric detail because the processing power wasn't there. With the advances in lithography and the huge increase in transistor counts they afford, when, if ever, will textures be replaced by geometry and high-level vertex shading (if a triangle is a pixel in size and has sub-pixel accuracy, why fragment shade)? Disregarding overdraw, you need to sustain around 80M triangles a second to cover a 1280*1024 screen at 60 Hz. If you can solve the bandwidth and storage problems, when, if ever, will this happen?
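
The 80M figure follows from one pixel-sized triangle per screen pixel per frame. A quick check of the arithmetic is below. The bandwidth line at the end rests on an assumed 36 bytes per triangle (three unshared vertices with three 4-byte floats each), a number chosen here for illustration and not taken from the post.

```python
width, height, refresh_hz = 1280, 1024, 60

# One pixel-sized triangle per pixel, no overdraw.
pixels_per_frame = width * height
triangles_per_second = pixels_per_frame * refresh_hz

# Assumption for illustration only: positions only, no vertex sharing.
bytes_per_triangle = 3 * 3 * 4
bytes_per_second = triangles_per_second * bytes_per_triangle

print(triangles_per_second)    # 78643200, i.e. roughly 80M
print(bytes_per_second / 1e9)  # about 2.83 GB/s of raw position data
```

Any overdraw, or any per-vertex attributes beyond position, scales both numbers up proportionally.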

EDIT: Oh yeah, you can't do the TCU thing because it's a catch-22. You're drastically increasing transistor count, which means that (assuming you're bound to a set process, say 0.13um) it'll come at the sacrifice of your array, and you lose programmability. Programmable power is directly related to transistor count, and thus to lithography limits, and as such is ultimately controlled by Moore's law, unless you can break through it with technological advancement or multichip designs.
Old 04-Jun-2002, 08:30   #85
MfA
Regular
 
Join Date: Feb 2002
Posts: 5,242
Default

If you stop using textures, it's time to stop using explicit surface representations altogether; they are joined at the hip ... when you go that far, it's time to switch to point clouds.
Old 04-Jun-2002, 08:55   #86
gking
Member
 
Join Date: Feb 2002
Posts: 130
Default

Quote:
It raises the question: at what point will textures cease to be necessary? Texture mapping is an approximation/substitute for geometric detail because the processing power wasn't there
Probably never. There are enough uses for texture mapping that would require a wasteful amount of geometry power to emulate using just geometry that you are unlikely to ever see it go away.

If you ignore real-time raytracing, the only way to get good real-time reflections is with texture mapping. And, even if a good way for doing geometric reflections were discovered, doing blurry reflections purely in object space would be virtually impossible (you could supersample the reflections; however, that would have all the same problems associated with accumulation-buffer based depth of field and motion blur algorithms). On the other hand, MIP mapping textures is an ideal solution to the problem.
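
As a small illustration of why MIP mapping suits blurry lookups, the sketch below builds a MIP chain by repeated 2x2 box filtering. The box filter and the function names are assumptions made for the example. Sampling a coarser level returns a pre-blurred value for the cost of one lookup, instead of averaging many samples at render time.

```python
def downsample(level):
    """Halve a square power-of-two greyscale image by averaging each 2x2 block."""
    n = len(level) // 2
    return [
        [
            (level[2 * y][2 * x] + level[2 * y][2 * x + 1]
             + level[2 * y + 1][2 * x] + level[2 * y + 1][2 * x + 1]) / 4.0
            for x in range(n)
        ]
        for y in range(n)
    ]


def build_mip_chain(base):
    """Return [base, half-size, quarter-size, ...] down to a single texel."""
    chain = [base]
    while len(chain[-1]) > 1:
        chain.append(downsample(chain[-1]))
    return chain


# A 4x4 checkerboard: the sharpest possible detail at level 0.
base = [[0.0, 1.0, 0.0, 1.0],
        [1.0, 0.0, 1.0, 0.0],
        [0.0, 1.0, 0.0, 1.0],
        [1.0, 0.0, 1.0, 0.0]]
chain = build_mip_chain(base)
print(len(chain), chain[1], chain[2])
```

The checkerboard collapses to uniform grey one level down, which is the blur a reflection lookup would pick up by selecting a coarser level.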

I think another part of the problem is a mentality that most (all) programmers share -- don't be wasteful. Even if a machine had infinite resources, and could handle everything in object space, you would still be likely to see image-space algorithms and texturing featured prominently, because they do exactly what you want easily and cheaply. Texturing isn't broken, so we probably shouldn't focus on fixing it.
Old 04-Jun-2002, 12:47   #87
pascal
Senior Member
 
Join Date: Feb 2002
Location: Brasil
Posts: 1,790
Default

Quote:
Originally Posted by arjan de lumens
Some thoughts around the sea-of-DSPs approaches that people here suggest:
  • You still need hardwired texturing units for acceptable 3D performance. Breaking the texturing operation (with trilinear interpolation, perspective division, texture coordinate clamping and scaling, compressed texture unpacking, etc) into a sequence of instructions in a standard RISC/CISC/VLIW instruction set will quickly degrade performance by a factor of 15-30 relative to what is achievable with a similarly clocked hardwired texture mapper. Even SIMD instructions won't help this. One possible solution is to let each DSP have its own texture mapper, accessible through specialized instructions, but this still leaves the problem that the texture mapper, while fully pipelined, has a quite high latency (perhaps ~10-20 clocks) and that you really want it to be utilized as much of the time as possible.
  • There are numerous small tasks performed in a standard fixed-function renderer (rasterization, setup of gouraud colors/texture coordinates, alpha test, fog, stencil test, dithering, polygon Z offsetting etc) which will quickly add up to a substantial number of instructions needed per pixel if done by DSPs. This overhead, along with texturing, is what kills the 3D performance of a pure sea-of-DSPs approach compared to the traditional hardwired pipeline. Again, SIMD doesn't really help very much.
  • Distribution of program code to each DSP can be ugly, unless each of them has a large enough instruction cache. Perhaps not a big problem, but caches cost transistors.
  • Unless you set the DSPs up to function as a systolic array of some sort, the direct interconnects between them probably aren't particularly important for performance. At most, a simple mesh interconnect should suffice for most practical uses. If one DSP produces lots of data that another DSP consumes, it is probably appropriate to let them share a high-bandwidth local memory block.
  • Off-chip memory access patterns would likely be extremely irregular, producing lots of DRAM page breaks all the time. So a crossbar DRAM controller would be absolutely necessary for half-decent performance. Also, the way the DSPs access memory data must be strictly controlled at all times, otherwise you will run into a total data coherency nightmare.
You are right, let me try to address your concerns:
1- It need not be a pure DSP-like RISC. It could be a graphics-specialized RISC (microcoded RISC). Wasn't the Vérité V1000 a RISC-like processor?
2- Each processor could have a small local 1T-SRAM memory (64KB or 128KB) to store programs and some data.
3- Tasks could be distributed by a task scheduler processor.
4- It would work like a graphics pipeline, where each processor (or group of processors) runs a specialized program assigned by the scheduler.
5- The processors in the pipeline would communicate with each other through a switching fabric, which is necessary for flexibility. Some multicast capability could be useful.
6- An internal 4MB 1T-SRAM with high bandwidth could be used as a large cache for the RISC farm.

Well, I am not a graphics expert but I think some of the people here could think/design something better.
Old 04-Jun-2002, 13:27   #88
ram
Member
 
Join Date: Feb 2002
Location: Switzerland
Posts: 218
Default

Quote:
Originally Posted by arjan de lumens
I still find the results they claim hard to believe, since their methods are similar to the methods PNG uses (delta predictors+Huffman for both approaches) yet the compression ratio they claim is far beyond what PNG achieves, even though a PNG encoder, unlike a hardware framebuffer compressor, has practically infinite time available to compress the data.
They tested 24-bit RGB and compressed each color channel separately. The reason they give for the high compression ratio is that one of their test scenes, the "Atlantis" part, is all shades of blue, so they get an unusually high compression ratio for this scene, which raises the average. The other scenes gave compression ratios between 2:1 and 3:1.

Quote:
Also, Huffman encoding/decoding is a highly serial process which is difficult and really expensive to parallelize (due to variable-length symbols, you cannot even begin to encode/decode a symbol correctly until you are done encoding/decoding all preceding symbols), and so is rather badly suited to hardware implementation.
There are other compression algorithms too which should yield similar compression ratios. Nvidia and ATI said they used DDPCM and an algorithm with lossless 4:1 Z-compression. It would be interesting to see what kind of compression you get when using this algorithm on RGB data. I still think color buffer compression is a viable option for the future, even if it is expensive to parallelize. Available silicon area is growing much faster than memory bandwidth, so "wasting" logic even on relatively small gains like 2:1 to 3:1 compression for RGB is IMO worth it.
Old 04-Jun-2002, 13:29   #89
ram
Member
 
Join Date: Feb 2002
Location: Switzerland
Posts: 218
Default

Quote:
Originally Posted by pascal
You are right, let me try to address your concerns:
1- It need not be a pure DSP-like RISC. It could be a graphics-specialized RISC (microcoded RISC). Wasn't the Vérité V1000 a RISC-like processor?
It had a RISC core, yes. IIRC it was used for triangle setup stuff and the rendering pipe was hardwired.
Old 04-Jun-2002, 13:32   #90
Gunhead
Member
 
Join Date: Mar 2002
Location: a vertex
Posts: 354
Default

I agree with gking about textures.

I'd like to add about the "hack" nature of texturing: although, say, a bumpmap emulates "true" geometry and a lightmap emulates "true" lighting, there are other cases where a texture doesn't emulate anything but simply represents the original colouring of the object's surface. So I believe texturing won't need to go away.

And a highly usable (but "simple" to employ) function like render-to-texture would be difficult to replace with something else, I guess?
Old 04-Jun-2002, 14:47   #91
arjan de lumens
Senior Member
 
Join Date: Feb 2002
Location: gjethus, Norway
Posts: 1,256
Default

Quote:
Originally Posted by ram
They tested 24-bit RGB and compressed each color channel separately. The reason they give for the high compression ratio is that one of their test scenes, the "Atlantis" part, is all shades of blue, so they get an unusually high compression ratio for this scene, which raises the average. The other scenes gave compression ratios between 2:1 and 3:1.
For such a scenario as "Atlantis", depending on how smoothly the color changed, compression rates of 4:1 to 10:1 may possibly be attainable; the other scenarios sound substantially harder to get good compression ratios out of, but 2:1 might be attainable with substantial effort.
Quote:
There are other compression algorithms too which should yield similar compression ratios. Nvidia and ATI said they used DDPCM and an algorithm with lossless 4:1 Z-compression. It would be interesting to see what kind of compression you get when using this algorithm on RGB data. I still think color buffer compression is a viable option for the future, even if it is expensive to parallelize. Available silicon area is growing much faster than memory bandwidth, so "wasting" logic even on relatively small gains like 2:1 to 3:1 compression for RGB is IMO worth it.
The 4:1 compression NV/ATI claim for Z compression is a best-case number; AFAIK, the typical compression ratio they reach is closer to 2:1 for most practical scenes. DDPCM works well on untextured surfaces or textures with smooth gradients or large uniformly colored patches; for highly detailed textures, it breaks down. If RGB compression comes into use, I would expect the first methods used to be a bit cruder than Huffman, in order to allow parallelism more easily.
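
A small sketch of why differential coding behaves this way. It reads DDPCM as second-order differencing, which is an assumption about the scheme and not a description of what either vendor ships, and the sample values are invented. On a linear gradient the residuals vanish; on detailed data they can be larger than the original 8-bit values, so nothing is saved.

```python
def second_differences(values):
    """Residuals after predicting each sample by linear extrapolation
    from the two samples before it (the first two samples are seeds)."""
    return [values[i] - 2 * values[i - 1] + values[i - 2]
            for i in range(2, len(values))]


# One scanline of a single colour channel (made-up data).
smooth = [10, 12, 14, 16, 18, 20, 22, 24]       # linear gradient
detailed = [10, 200, 35, 180, 5, 220, 90, 15]   # high-frequency texture

smooth_residuals = second_differences(smooth)
detailed_residuals = second_differences(detailed)
print(smooth_residuals)
print(max(abs(r) for r in detailed_residuals))
```

The gradient produces only zero residuals, which any entropy coder or fixed small-width packing handles cheaply. The detailed line produces residuals whose magnitude exceeds 255, so they need more bits than the raw samples.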

A full parallel Huffman decoder capable of decompressing, say, 256 bits per clock (which is needed if 64-bit color or 8 pipelines are desired) is doable with a technique called parallel prefix computation, but such a circuit takes tens of millions of transistors. You want it? Pay up.
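
The serial dependency can be seen in a toy decoder. The four-symbol prefix code below is hypothetical and chosen only for the example. The point is that the start position of each codeword is known only after every earlier codeword has been decoded, and that is the quantity a parallel-prefix circuit would have to compute for all positions at once.

```python
# Hypothetical prefix code, for illustration only.
CODE = {"0": "a", "10": "b", "110": "c", "111": "d"}


def decode(bits):
    """Decode a bit string; also return where each codeword started."""
    symbols = []
    starts = []
    pos = 0
    while pos < len(bits):
        starts.append(pos)
        for length in (1, 2, 3):
            word = bits[pos:pos + length]
            if word in CODE:
                symbols.append(CODE[word])
                pos += length  # the next start depends on this symbol's length
                break
        else:
            raise ValueError("invalid or truncated bitstream")
    return symbols, starts


symbols, starts = decode("0101100111")
print(symbols, starts)
```

The start offsets come out irregular, so a decoder cannot simply hand fixed slices of the stream to independent units without first resolving every preceding codeword length.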

 
