Dynamic Range and the Human Eye

Discussion in 'Architecture and Products' started by Dave Baumann, Feb 29, 2004.

  1. bloodbob

    bloodbob Trollipop
    Veteran

    Joined:
    May 23, 2003
    Messages:
    1,630
    Likes Received:
    27
    Location:
    Australia
    Maybe my eyesight is really good, or maybe 8-bit components aren't enough, but I can see the rectangles, and you can see banding much better when it moves.

    *Removed image it was too small*

    It's not so much that we need more than 16 million colours (we don't); it's that we need more resolution in the luminance. You couldn't easily tell the difference if I had slightly altered the luminance.
     
  2. Rolf N

    Rolf N Recurring Membmare
    Veteran

    Joined:
    Aug 18, 2003
    Messages:
    2,494
    Likes Received:
    55
    Location:
    yes
    These slides do nothing but demonstrate just how full of it the peeps at Microsoft are these days.

    Is high dynamic range beneficial for graphics? Yes, certainly. Does Microsoft in any way show an understanding of why this is so? No. Whoever authored these slides doesn't have the slightest clue. He/she/it/them doesn't even seem to understand the difference between range and precision. Duh.
     
  3. rwolf

    rwolf Rock Star
    Regular

    Joined:
    Oct 25, 2002
    Messages:
    968
    Likes Received:
    54
    Location:
    Canada
  4. DemoCoder

    Veteran

    Joined:
    Feb 9, 2002
    Messages:
    4,733
    Likes Received:
    81
    Location:
    California
    The sRGB color space, irrespective of dynamic range, cannot represent the full gamut of the human visual system, and most of our display devices are curtailed in their color reproduction ability.

    It's not the total number of colors, it's the right colors. We actually see fewer than 16 million colors.
     
  5. Reverend

    Banned

    Joined:
    Jan 31, 2002
    Messages:
    3,266
    Likes Received:
    24
    I'll have something to say about this but I'm in a hurry to meet a client now... later.
     
  6. Magic-Sim

    Newcomer

    Joined:
    Nov 14, 2003
    Messages:
    99
    Likes Received:
    0
    Location:
    Calais (France)
    LeGreg: You should definitely play with this demo ( http://www.daionet.gr.jp/~masa/rthdribl/ ) with graphics hardware that doesn't revert to FP16 for any reason based on the age of the captain ;)


    FP24 computations with FX8 output just render things correctly and accurately. Moreover, the pupil effect is quite realistic.

    I just wonder what 10 bits/component output would do in a real-world scenario...
     
  7. Captain Chickenpants

    Regular

    Joined:
    Feb 6, 2002
    Messages:
    446
    Likes Received:
    14
    Location:
    Kings Langley
    Slightly off the rendering issues, but standard CRT monitors are analog; there is no 8-bit-per-channel limit on CRTs.
    The precision that a monitor can resolve will be limited by the behaviour of the electron guns and the support circuitry of the monitor. Generally I suspect there will be some kind of filters to suppress noise, which may reduce precision. The big issue will be when the noise overrides the precision. Given that the VGA signal voltage is 0.7V, when noise reaches 1/256 of that voltage (at a high enough current) it will start affecting the output value.
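
    To put rough numbers on that noise threshold, here is a minimal sketch. It assumes the 0.7 V full-scale swing quoted above and an ideal DAC with 2^n evenly spaced steps; the function name is made up for illustration.

```python
# Size of one code step on an analog VGA line.
# Assumption: 0.7 V full scale divided into 2**bits equal steps.
FULL_SCALE_VOLTS = 0.7

def step_millivolts(bits: int) -> float:
    """Voltage of one code step (full scale / 2**bits), in millivolts."""
    return FULL_SCALE_VOLTS / (2 ** bits) * 1000.0

for bits in (8, 10, 12):
    print(f"{bits:2d} bits: one step = {step_millivolts(bits):.3f} mV")
```

    At 8 bits one step is roughly 2.7 mV, so noise of that size is already comparable to the least significant bit; each extra bit halves the margin.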

    CC

     
  8. JohnH

    Regular

    Joined:
    Mar 18, 2002
    Messages:
    595
    Likes Received:
    18
    Location:
    UK
    Well, good CRT projection tubes are capable of contrast ratios of around 15000:1, which is roughly equivalent to 14 bits, although most of this comes from the fact that CRTs emit no light for black (unlike most "digital" display technologies). On a side note, the perception of banding at 8 bits isn't actually that great, and generally speaking the difference between, say, 8 and 10 bpc is not perceivable unless you go specifically looking for it.
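
    The "15000:1 is roughly 14 bits" figure is just a base-2 logarithm. A minimal sketch, assuming a linear encoding in which the contrast ratio is the ratio between the brightest level and the smallest non-zero step (the function name is hypothetical):

```python
import math

def contrast_to_bits(ratio: float) -> float:
    """Bits needed to span a contrast ratio with a linear encoding."""
    return math.log2(ratio)

# A 15000:1 tube lands just under 14 bits; 256:1 is exactly 8 bits.
print(round(contrast_to_bits(15000), 2))
print(contrast_to_bits(256))
```

    This only measures range. It says nothing about how finely the steps inside that range are spaced, which is the precision side of the argument.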

    Generally speaking, as far as final output colour is concerned, I'm reasonably convinced that greater precision (= more fractional bits) gains little, and what is really required is the ability to go brighter than "white" in order to produce more realism. This can't be achieved by changing the digital representation alone...

    John.
     
  9. Magic-Sim

    Newcomer

    Joined:
    Nov 14, 2003
    Messages:
    99
    Likes Received:
    0
    Location:
    Calais (France)
    It is indeed hard to see the banding effect on final output if the computation is good enough.
     
  10. Dio

    Dio
    Veteran

    Joined:
    Jul 1, 2002
    Messages:
    1,758
    Likes Received:
    8
    Location:
    UK
    Actually, that's quite handy, because it means you don't need to read the back buffer and check the results immediately (therefore forcing a pipeline sync).
     
  11. Xmas

    Xmas Porous
    Veteran Subscriber

    Joined:
    Feb 6, 2002
    Messages:
    3,344
    Likes Received:
    176
    Location:
    On the path to wisdom
    I wonder, will we see some "global" registers with shader 4.0? For things like min/max/avg/std deviation over a whole frame.
     
  12. LeGreg

    Newcomer

    Joined:
    Nov 1, 2003
    Messages:
    239
    Likes Received:
    3
    [OT]
    Totally. That's why you have some latitude on how you turn the "brightness/contrast" of your monitor. If it was 8 bits straight, you would have a lot of artefacts by tuning those or you would have a "fixed" setting.
    I'm more concerned that the whole chain is really 8 bits (sometimes less if it is not correctly tuned, or if you display on a projector or a bad-quality TV).

    [/OT]

    Sorry for off topic

    LeGreg
     
  13. KimB

    Legend

    Joined:
    May 28, 2002
    Messages:
    12,928
    Likes Received:
    230
    Location:
    Seattle, WA
    Global registers would be very cool. A general way to do this would be to have optional calculations performed on the framebuffer at scanout or buffer swap (doing it at buffer swap would probably be easier). I'm just not sure it would readily fit into the shader paradigm.
     
  14. DemoCoder

    Veteran

    Joined:
    Feb 9, 2002
    Messages:
    4,733
    Likes Received:
    81
    Location:
    California
    Some scratchpad RAM would be nice. But you will need monitors, semaphores, or other paradigms to allow some algorithms to serialize access, plus some atomically guaranteed operations, like compareANDSet, etc. Of course, this will cause slowdowns when used, but hopefully it would be used sparingly.

    Another option is to forgo fixed memory, and implement a Linda-like globally shared FIFO stream, where pipelines can construct n-tuples, place them into the FIFO, and other pipelines can consume one n-tuple, do an operation, and put it back. This fits more with the GPU stream processing paradigm and avoids the deadlocks and hazards of dealing with monitors, semaphores, etc.
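
    A single-threaded sketch of the shared-FIFO idea, purely for illustration. It is an assumption-laden stand-in: a `deque` plays the shared FIFO, the loop body plays a "pipeline", and the merge consumes two tuples at a time rather than one. The function name and the (lo, hi) tuple layout are made up, and the input is assumed to be non-empty.

```python
from collections import deque

def luminance_range_via_fifo(values):
    """Reduce per-pixel values to one (lo, hi) tuple through a shared FIFO."""
    # Each pixel starts out as its own 2-tuple in the FIFO.
    fifo = deque((v, v) for v in values)
    # A "pipeline" consumes two tuples, merges them, and puts one back.
    while len(fifo) > 1:
        lo_a, hi_a = fifo.popleft()
        lo_b, hi_b = fifo.popleft()
        fifo.append((min(lo_a, lo_b), max(hi_a, hi_b)))
    return fifo[0]

print(luminance_range_via_fifo([0.12, 0.80, 0.05, 3.5, 0.33]))
```

    No pipeline ever holds a lock on shared state here; the only contention point is the FIFO itself, which is what makes the approach attractive for stream-style hardware.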
     
  15. Basic

    Regular

    Joined:
    Feb 8, 2002
    Messages:
    846
    Likes Received:
    13
    Location:
    Linköping, Sweden
    You've already got a scratchpad - the PS register bank. Just add "global registers" in HLSL, so a register can keep its value from one pixel to the next. The syntax should be such that you can keep the value from one shader to the next.

    When you're done rendering, you've got the value spread across registers for as many pixels as you can have in flight simultaneously. Flush them into a render target with a shader that just writes the special register. Then merge those values in a last stage. It should be few enough values to do it on the CPU now, if it's too hard to do it on the GPU.

    If the IHVs don't want to reveal how many pixels they have in flight, and how they are ordered, they could provide a way to hide the final merging, where the developer just says what function to use.

    This will of course only work for associative operations, but you'd have a hard time making any solution for non-associative operations when there are lots of pixel shaders running in parallel. The good thing is that most operations you'd want to do in this global manner are associative (min, max, sum, ...).
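
    The flush-and-merge scheme described above can be sketched on the CPU as follows. This is only a simulation under stated assumptions: pixels are handed to pipelines round-robin (real hardware ordering is not known), and the function name and parameters are hypothetical.

```python
from functools import reduce

def render_with_global_register(pixels, num_pipelines, op, identity):
    """Simulate one persistent register per pipeline, then merge them."""
    # One "global register" per pipeline; it keeps its value between pixels.
    registers = [identity] * num_pipelines
    for i, value in enumerate(pixels):
        lane = i % num_pipelines  # assumed round-robin pixel distribution
        registers[lane] = op(registers[lane], value)
    # "Flush" the registers to a render target, then merge in a last stage.
    flushed = list(registers)
    return reduce(op, flushed, identity)

luminance = [0.12, 0.80, 0.05, 3.5, 0.33, 0.9, 1.7, 0.02, 0.6, 2.2]
print(render_with_global_register(luminance, 4, max, float("-inf")))
print(render_with_global_register(luminance, 4, min, float("inf")))
```

    The result does not depend on `num_pipelines`, which is the property that would let an IHV hide how many pixels are in flight.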
     
  16. MfA

    MfA
    Legend

    Joined:
    Feb 6, 2002
    Messages:
    7,610
    Likes Received:
    825
    Personally I'd say put OutVertex and OutFragment instructions in the vertex shader ... the rest follows naturally, although not easily :)
     
  17. DemoCoder

    Veteran

    Joined:
    Feb 9, 2002
    Messages:
    4,733
    Likes Received:
    81
    Location:
    California
    Problem is, the "merging" function could be arbitrarily complex. Just consider AVG(). Each pipeline would have to keep a running sum and a counter. At the end, the sums would have to be summed, as well as the counts, plus a division. Thus, you'd have to introduce a whole new shading syntax for the final combiner, one that doesn't let you refer to individual registers in each pipeline, but has aggregate functions.

    Let Gn, n=0..16, be a global register bank as you described. Let G0 hold a running sum, and G1 hold a running count.

    The final combine would have to work something like return SUM(G0)/SUM(G1) (implicit in the final combiner is that SUM(...) is an aggregate function that operates over ALL G0 registers, no matter how many separate ones there are).

    This is a nice easy addition and I'd add it, but IMHO, it doesn't go far enough because it severely restricts the algorithms that can be run.
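
    The AVG() case above, written out as a small simulation. G0 and G1 follow the naming in the post; the round-robin assignment of pixels to pipelines and the function name are assumptions for illustration only.

```python
def average_via_global_registers(pixels, num_pipelines):
    """Per-pipeline running sum (G0) and count (G1), then SUM(G0)/SUM(G1)."""
    g0 = [0.0] * num_pipelines  # running sum, one register per pipeline
    g1 = [0] * num_pipelines    # running count, one register per pipeline
    for i, value in enumerate(pixels):
        lane = i % num_pipelines  # assumed round-robin pixel distribution
        g0[lane] += value
        g1[lane] += 1
    # Final combiner: aggregate over ALL copies of each register.
    return sum(g0) / sum(g1)

print(average_via_global_registers([1, 2, 3, 4, 5, 6], num_pipelines=4))
```

    Note that the combiner never names an individual pipeline's register; it only sees the aggregates, which is the restricted syntax being described.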
     
  18. Basic

    Regular

    Joined:
    Feb 8, 2002
    Messages:
    846
    Likes Received:
    13
    Location:
    Linköping, Sweden
    Yes, it puts a limit on what "merging" you can do, but you can't do much better with a scratchpad. The problem is that you've got a number of parallel processes that want to access this scratchpad. Serializing the access would be very costly. The separate processes run in sync, so every access to the scratchpad would get the penalty of waiting for all the processes. The best you could expect is a performance hit of <#-of-pipelines> times <latency-of-FPU-op>, for the region that accesses the scratchpad.

    I don't know the typical FPU latency in a GPU, but since there's little incentive to reduce it, it could be a few clocks, let's say 4. On an 8-pipeline GPU, that would mean that each of the instructions from the first scratchpad read to the final write to it (depending on the read) would cost 32 clocks. That's a lot.
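
    The arithmetic behind that estimate, as a tiny sketch. The 4-clock latency is the guess from the post, not a measured figure, and the function name is hypothetical.

```python
def serialized_cost_clocks(num_pipelines: int, fpu_latency_clocks: int) -> int:
    """Best-case cost per instruction when every pipeline must take its turn."""
    return num_pipelines * fpu_latency_clocks

# 8 pipelines, assumed 4-clock FPU latency.
print(serialized_cost_clocks(8, 4))
```

    The cost scales linearly with pipeline count, so the penalty gets worse on wider parts.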

    So you'd pretty much need to restrict the scratchpad to "per thread use", which means that it's almost the same thing as I proposed. The only difference being that the scratchpad could be a bit smaller than the sum of all registers made global.

    And best of all, the way I proposed would be pretty much free hardware-wise.
     
  19. KimB

    Legend

    Joined:
    May 28, 2002
    Messages:
    12,928
    Likes Received:
    230
    Location:
    Seattle, WA
    But how would you decide to merge the values? You'd need to dramatically limit the available instructions that could operate on the global register (I guess you'd have to limit it to an "add" instruction). There may also be issues with multisampling, since the number of times a pixel shader is executed is not equal to the number of pixels on screen (or even an integer multiple of the number of pixels on screen).
     
  20. arjan de lumens

    Veteran

    Joined:
    Feb 10, 2002
    Messages:
    1,274
    Likes Received:
    50
    Location:
    gjethus, Norway
    For summation/averaging over an entire frame, the fastest you can do today is, AFAIK, render to a pbuffer, run auto-mipmap generation on the pbuffer after rendering and then apply the resulting 1x1 mipmap.
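
    A CPU-side sketch of why the 1x1 mipmap holds the frame average. It assumes a square, power-of-two image and a plain 2x2 box filter; actual auto-mipmap filtering is up to the driver and hardware, and the function names are made up.

```python
def downsample_2x2(image):
    """Halve each dimension by averaging 2x2 blocks (box filter)."""
    size = len(image) // 2
    return [
        [
            (image[2 * y][2 * x] + image[2 * y][2 * x + 1]
             + image[2 * y + 1][2 * x] + image[2 * y + 1][2 * x + 1]) / 4.0
            for x in range(size)
        ]
        for y in range(size)
    ]

def average_by_mipmapping(image):
    """Build successive mip levels until only the 1x1 level remains."""
    level = image
    while len(level) > 1:
        level = downsample_2x2(level)
    return level[0][0]

# 4x4 test frame holding the values 0..15.
frame = [[float(4 * y + x) for x in range(4)] for y in range(4)]
print(average_by_mipmapping(frame))
```

    Each level averages equal-sized groups, so the final texel equals the mean of the whole frame; multiplying by the pixel count recovers the sum.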

    Global registers are generally difficult to implement efficiently; you basically break the pixel parallelism in the pixel shader. If you want to preserve pixel parallelism with something close to 'global registers', you are restricted to global operations with certain limiting characteristics: You calculate one or more numbers per pixel, which are then fed to a combine operator. This operator may combine either pixel values or earlier combine results in a recursive manner. The order in which values are combined MUST be allowed to be unspecified, even nondeterministic, which requires the combine operator to be commutative and associative, or else you get meaningless results. This allows the operator to work first within each pixel pipeline, with full parallelism available, then do a final combine pass afterwards. Some simple examples of useful combine operators would be e.g. SUM, XOR, MIN, MAX; more complex operators are possible, but potentially difficult to implement support for. A very limited version of this functionality already exists in modern immediate-mode renderers with occlusion queries, where the per-pixel computed value is the number of samples covered by the pixel, and the operator is SUM. Also, the histogram and minmax functions in the OpenGL pixel transfer pipeline may be implemented with this kind of functionality.
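
    The ordering requirement can be checked with a small experiment: combine the same values in many random pairings and see whether the answer ever changes. This is a sketch with made-up names, using integers so that SUM is exact; subtraction is included only as a counterexample.

```python
import operator
import random

def combine_in_random_order(values, op, seed):
    """Repeatedly merge two randomly chosen entries until one remains."""
    pool = list(values)
    rng = random.Random(seed)
    while len(pool) > 1:
        a = pool.pop(rng.randrange(len(pool)))
        b = pool.pop(rng.randrange(len(pool)))
        pool.append(op(a, b))
    return pool[0]

samples = [7, 3, 12, 5, 9, 1, 14, 8]
operators = [
    ("SUM", operator.add),
    ("XOR", operator.xor),
    ("MIN", min),
    ("MAX", max),
    ("SUB", operator.sub),  # neither commutative nor associative
]
for name, op in operators:
    results = {combine_in_random_order(samples, op, seed) for seed in range(20)}
    print(name, sorted(results))
```

    SUM, XOR, MIN and MAX each yield a single value regardless of pairing order, whereas subtraction can produce different answers from run to run, which is the "meaningless results" case.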
     