Awesome graphics papers thread

rpg.314

Design and Novel Uses of Higher-Dimensional Rasterization
Abstract said:
This paper assumes the availability of a very fast higher-dimensional rasterizer in future graphics processors. Working in up to five dimensions, i.e., adding time and lens parameters, it is well-known that this can be used to render scenes with both motion blur and depth of field. Our hypothesis is that such a rasterizer can also be used as a flexible tool for other, less conventional, usage areas, similar to how the two-dimensional rasterizer in contemporary graphics processors has been used for widely different purposes other than the original intent. We show six such examples, namely, continuous collision detection, caustics rendering, higher-dimensional sampling, glossy reflections and refractions, motion blurred soft shadows, and finally multi-view rendering. The insights gained from these examples are used to put together a coherent model for what a future graphics pipeline that supports these and other use cases should look like. Our work intends to provide inspiration and motivation for hardware and API design, as well as continued research in higher-dimensional rasterization and its uses.

This paper is seriously awesome. Just look at what they do with a higher-dimensional rasterizer. There has been a lot of work in the recent past on stochastic rendering with defocus and motion blur. Of course, doing those two while shading in image space means that you need decoupled sampling as well, so it's a pretty big change. But a lot of work has gone into this field in a short period of time, and considering the possibilities sketched out, this *just might* be turned into hw. Then again, it's entirely possible that MS won't care unless the xbox 720 has anything substantially more than dx11.
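
To make the "up to five dimensions" bit concrete: every visibility sample carries lens and time coordinates on top of its subpixel position, and the rasterizer has to evaluate each primitive at those coordinates before the inside test. A rough sketch of that per-sample vertex evaluation under a thin-lens camera model (my own illustration, not code from the paper; all names are made up):

Code:
// Illustrative only -- not from the paper. A 5D visibility sample and a
// per-sample vertex evaluation with motion blur and depth of field.
struct Sample5D {
    float x, y;   // subpixel position
    float u, v;   // lens coordinates in the unit disk
    float t;      // shutter time in [0, 1)
};

struct MovingVertex { float3 p0, p1; };  // position at shutter open/close

__device__ float3 lerp3(float3 a, float3 b, float t)
{
    return make_float3(a.x + t * (b.x - a.x),
                       a.y + t * (b.y - a.y),
                       a.z + t * (b.z - a.z));
}

// Motion blur: interpolate the vertex in time. Depth of field: shear x/y
// by the lens sample scaled by the signed circle of confusion at this depth.
__device__ float3 evalVertex(MovingVertex v, Sample5D s,
                             float focusDepth, float lensRadius)
{
    float3 p = lerp3(v.p0, v.p1, s.t);
    float coc = lensRadius * (p.z - focusDepth) / p.z;
    p.x += s.u * coc;
    p.y += s.v * coc;
    return p;
}

The inside test then runs against the triangle built from these per-sample vertex positions, which is exactly why a fast hardware path for this is non-trivial.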


EDIT: Recent papers only, PLEASE.
 
@rpg: Thanks. Interesting paper.

BTW, is anyone here going to be attending HPG?
 
The 3D display device of the invention is based on an octree structure of data pertaining to an object to be displayed. This structure is memorized in a memory associated with a cache memory sending blocks of data on a bus to which a geometrical processor and an image-generating circuit are connected. The geometrical processor generates the visible part of another octree corresponding to a target universe which may be positioned in any way in relation to the object universe (a cube enclosing all the data to be represented).
(http://www.google.com/patents/US5123084).
In this paper, we describe a parallel volume ray caster that eliminates thrashing by efficiently advancing a ray-front in a front-to-back manner. The method adopts an image-order approach, but capitalizes on the advantages of object-order algorithms as well to almost eliminate the communication overheads. Unlike previous algorithms, we have successfully preserved the thrashless property across a number of incrementally changing screen positions also.
(http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.56.1295).
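
Neither abstract spells out the traversal order, but the "front-to-back manner" both rely on maps onto a standard octree trick: XOR the child index with a mask built from the ray direction signs, and plain index order becomes near-to-far. A minimal sketch (my illustration, not the patent's or the paper's code):

Code:
// Standard front-to-back child ordering for an octree (illustrative).
// Octant bit k set means the child lies on the positive side of axis k;
// XORing the loop index with the direction-sign mask yields a valid
// near-to-far visiting order for this ray.
struct Vec3 { float x, y, z; };

struct OctNode {
    OctNode* child[8];   // null for empty octants
    bool     isLeaf;
};

void traverseFrontToBack(const OctNode* n, Vec3 rayDir,
                         void (*visitLeaf)(const OctNode*))
{
    if (n->isLeaf) { visitLeaf(n); return; }
    unsigned mask = (rayDir.x < 0.0f ? 1u : 0u)
                  | (rayDir.y < 0.0f ? 2u : 0u)
                  | (rayDir.z < 0.0f ? 4u : 0u);
    for (unsigned i = 0; i < 8; ++i) {
        const OctNode* c = n->child[i ^ mask];  // near-to-far for this ray
        if (c) traverseFrontToBack(c, rayDir, visitLeaf);
    }
}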
 
GPU Accelerated Path Rendering

Abstract said:
For thirty years, resolution-independent 2D standards (e.g. PostScript, SVG) have depended on CPU-based algorithms for the filling and stroking of paths. Advances in graphics hardware have largely ignored accelerating resolution-independent 2D graphics rendered from paths.

We introduce a two-step “Stencil, then Cover” (StC) programming interface. Our GPU-based approach builds upon existing techniques for curve rendering using the stencil buffer, but we explicitly decouple in our programming interface the stencil step to determine a path’s filled or stroked coverage from the subsequent cover step to rasterize conservative geometry intended to test and reset the coverage determinations of the first step while shading color samples within the path. Our goals are completeness, correctness, quality, and performance—yet we go further to unify path rendering with OpenGL’s established 3D and shading pipeline. We have built and productized our approach to accelerate path rendering as an OpenGL extension.

http://developer.nvidia.com/game/gpu-accelerated-path-rendering
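
For the curious, the two steps map onto the extension's API quite directly. A minimal fill-only sketch (my own; error handling, transforms, and the stroking path are omitted, and the triangle is just an example):

Code:
#include <GL/glew.h>   // or any loader exposing NV_path_rendering

void fillPathStC(void)
{
    static const GLubyte cmds[]   = { GL_MOVE_TO_NV, GL_LINE_TO_NV,
                                      GL_LINE_TO_NV, GL_CLOSE_PATH_NV };
    static const GLfloat coords[] = { 100, 100, 300, 100, 200, 300 };

    GLuint path = glGenPathsNV(1);
    glPathCommandsNV(path, 4, cmds, 6, GL_FLOAT, coords);

    // "Stencil": rasterize the path's winding counts into the stencil
    // buffer; nothing is written to color.
    glStencilFillPathNV(path, GL_COUNT_UP_NV, 0xFF);

    // "Cover": draw conservative geometry over the path, shade only the
    // samples whose stencil test passes, and reset the stencil.
    glEnable(GL_STENCIL_TEST);
    glStencilFunc(GL_NOTEQUAL, 0, 0xFF);
    glStencilOp(GL_KEEP, GL_KEEP, GL_ZERO);
    glCoverFillPathNV(path, GL_BOUNDING_BOX_NV);

    glDeletePathsNV(path, 1);
}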

IMO, a TBDR is much better suited for 2D graphics. On an IMR, a compute-based implementation would be far better in terms of bandwidth.

It's telling that nvidia chose to implement a proprietary OpenGL extension instead of just implementing OpenVG and sharing buffers with OGL.
 
Softshell: Dynamic Scheduling on GPUs

Abstract said:
In this paper we present Softshell, a novel execution model for devices composed of multiple processing cores operating in a single instruction, multiple data fashion, such as graphics processing units (GPUs). The Softshell model is intuitive and more flexible than the kernel-based adaption of the stream processing model, which is currently the dominant model for general purpose GPU computation. Using the Softshell model, algorithms with a relatively low local degree of parallelism can execute efficiently on massively parallel architectures. Softshell has the following distinct advantages: (1) work can be dynamically issued directly on the device, eliminating the need for synchronization with an external source, i.e., the CPU; (2) its three-tier dynamic scheduler supports arbitrary scheduling strategies, including dynamic priorities and real-time scheduling; and (3) the user can influence, pause, and cancel work already submitted for parallel execution. The Softshell processing model thus brings capabilities to GPU architectures that were previously only known from operating-system designs and reserved for CPU programming. As a proof of our claims, we present a publicly available implementation of the Softshell processing model realized on top of CUDA. The benchmarks of this implementation demonstrate that our processing model is easy to use and also performs substantially better than the state-of-the-art kernel-based processing model for problems that have been difficult to parallelize in the past.

http://www.icg.tugraz.at/Members/steinber/softshell-1/

Nice to see new kinds of scheduling being proposed for more efficient execution.
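
The enabling mechanism for this kind of device-side scheduling is persistent threads: blocks that stay resident and pull work from a queue in GPU memory instead of bouncing back to the CPU. A bare-bones sketch (my simplification, not the Softshell code; a real scheduler needs proper memory ordering and should block rather than retire on a momentarily empty queue):

Code:
// Persistent-threads work loop (illustrative). Work items live in a
// device-side queue, so new work can be issued on the GPU itself with
// no CPU round trip.
struct WorkItem { int payload; };

__device__ int      g_head = 0;            // next item to claim
__device__ int      g_tail = 0;            // one past the last queued item
__device__ WorkItem g_queue[1 << 20];

__device__ void process(const WorkItem& w) { /* user-defined work */ }

__global__ void persistentWorker()
{
    for (;;) {
        int idx = atomicAdd(&g_head, 1);   // claim the next item
        if (idx >= g_tail) return;         // queue drained: retire
        process(g_queue[idx]);
        // process() may itself enqueue new items via atomicAdd(&g_tail, 1),
        // which is how work spawns more work without CPU involvement.
    }
}

Launch it with just enough blocks to fill the machine and they keep draining the queue for as long as work arrives; schedulers in this family are built on top of this kind of loop.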
 
A Sort-based Deferred Shading Architecture for Decoupled Sampling

Intel said:
Stochastic sampling in time and over the lens is essential to produce photo-realistic images, and it has the potential to revolutionize real-time graphics. In this paper, we take an architectural view of the problem and propose a novel hardware architecture for efficient shading in the context of stochastic rendering. We replace previous caching mechanisms by a sorting step to extract coherence, thereby ensuring that only non-occluded samples are shaded. The memory bandwidth is kept at a minimum by operating on tiles and using new buffer compression methods. Our architecture has several unique benefits not traditionally associated with deferred shading. First, shading is performed in primitive order, which enables late shading of vertex attributes and avoids the need to generate a G-buffer of pre-interpolated vertex attributes. Second, we support state changes, e.g., change of shaders and resources in the deferred shading pass, avoiding the need for a single uber-shader. We perform an extensive architectural simulation to quantify the benefits of our algorithm on real workloads.

This is amazing. Basically, Intel took PowerVR, removed the unbounded parameter buffer, added defocus and motion blur in hw, and dropped the memory bandwidth by 2x.

Some of Intel's past papers also propose hw extensions for defocus and motion blur. They already have one in production for OIT.

Let's hope MS gets the wake up call and puts it in DX12.
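
To make "a sorting step to extract coherence" concrete: after stochastic visibility sampling, each surviving sample in a tile records which primitive it hit, and sorting the tile by primitive ID restores coherence so shading can run in primitive order. A host-side sketch (my simplification, not Intel's simulator):

Code:
#include <algorithm>
#include <cstddef>
#include <vector>

struct VisSample {
    unsigned primId;  // primitive that won the depth test at this sample
    unsigned pixel;   // destination sample location in the tile
    float    u, v;    // barycentrics, for late attribute interpolation
};

void shadeTile(std::vector<VisSample>& tile)
{
    // Group the tile's surviving (non-occluded) samples by primitive.
    std::sort(tile.begin(), tile.end(),
              [](const VisSample& a, const VisSample& b)
              { return a.primId < b.primId; });

    // Samples of each primitive are now contiguous: vertex attributes can
    // be fetched once per primitive and interpolated late, and shader or
    // resource changes happen only at run boundaries.
    for (std::size_t i = 0; i < tile.size();) {
        std::size_t j = i;
        while (j < tile.size() && tile[j].primId == tile[i].primId) ++j;
        // shadeRun(&tile[i], j - i);  // one primitive's samples, one state
        i = j;
    }
}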
 
Where does PowerVR need unbounded geometry memory?

I meant for the parameter buffer. Should have been more clear about that.

Sure, you can cap the memory usage by periodically running the backend, but do it often enough and the bw savings begin to go away.
 
I meant for the parameter buffer. Should have been more clear about that.

Sure, you can cap the memory usage by periodically running the backend, but do it often enough and the bw savings begin to go away.

And so you set the parameter buffer to be a typically useful size. After all, if you want, say, 60hz rendering, there's only so much geometry you can put through a finite system and hit that target. If it (occasionally) goes over, then some tiles do get rendered using more than one pass. It's not a big deal. <shrug>
 
And so you set the parameter buffer to be a typically useful size. After all, if you want, say, 60hz rendering, there's only so much geometry you can put through a finite system and hit that target. If it (occasionally) goes over, then some tiles do get rendered using more than one pass. It's not a big deal. <shrug>

But the advantage of this design is that now you can use caching to save a lot more bw. After all, triangles in a mesh usually have spatial coherence.

I don't think such data has ever been made public, but I am guessing that this approach will come out ahead. I think there is a good chance that these authors tried that approach as well.
 
But the advantage of this design is that now you can use caching to save a lot more bw. After all, triangles in a mesh usually have spatial coherence.
Caching of what? Pixels or vertex data?

I don't think such data has ever been made public, but I am guessing that this approach will come out ahead. I think there is a good chance that these authors tried that approach as well.
I'm afraid you've lost me.

BTW, the sorting step in that paper is not unfamiliar.
 
Someone ban the above user

ps:
QuakeCon 2013
John Carmack will return to the stage for a lecture-style presentation focusing on light transport and rendering.

Titled “The Physics of Light and Rendering”, the talk will take place on Friday, August 2nd at 5:00pm CST. Learn how light behaves in the real world, and the approximations and compromises that are involved in simulating the behavior with computers. This lecture will be geared towards those interested in the interactions of light in the world and how computer software simulates it. Note: Not for the technically faint at heart.
 