New ATI patent applications

rwolf

http://appft1.uspto.gov/netacgi/nph...25".PGNR.&OS=DN/20060164425&RS=DN/20060164425

Virtual memory patent.

Methods and apparatus for updating a memory address remapping table


Abstract
Methods and apparatus for updating a memory address remapping table using a graphics processing circuitry are disclosed. The methods include assembling a command sequence of commands executable by the graphics processing circuit, the sequence configured to include one or more memory address remapping table updates for one or more page entries in a memory address remapping table. The command sequence is then communicated to the graphics processing circuit for execution by the graphics processing circuit. Execution of the command sequence with the graphics processing circuit includes executing the one or more memory address remapping table updates causing the graphics processing circuit to update the one or more page entries in the memory address remapping table.
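As a rough illustration of the abstract, here is a toy model where the driver only assembles a command stream and the GPU itself applies the page-entry updates in order with the rest of the work. The command types (`UpdatePageEntry`, `Draw`) and `GpuModel` are hypothetical names for this sketch, not anything taken from the patent text.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple, Union


@dataclass
class UpdatePageEntry:
    """Hypothetical command: point virtual page `page` at physical frame `frame`."""
    page: int
    frame: int


@dataclass
class Draw:
    """Hypothetical stand-in for ordinary rendering work in the same stream."""
    name: str


Command = Union[UpdatePageEntry, Draw]


@dataclass
class GpuModel:
    remap_table: Dict[int, int] = field(default_factory=dict)
    # Records which mappings each draw saw when it executed.
    trace: List[Tuple[str, Dict[int, int]]] = field(default_factory=list)

    def execute(self, commands: List[Command]) -> None:
        # Commands run strictly in order, so a table update lands exactly
        # between the draws that surround it in the stream.
        for cmd in commands:
            if isinstance(cmd, UpdatePageEntry):
                self.remap_table[cmd.page] = cmd.frame
            else:
                self.trace.append((cmd.name, dict(self.remap_table)))


# The CPU side only assembles the sequence; the GPU applies the updates.
sequence = [UpdatePageEntry(0, 7), Draw("pass_a"), UpdatePageEntry(0, 9), Draw("pass_b")]
gpu = GpuModel()
gpu.execute(sequence)
```

The point of the ordering is that `pass_a` sees page 0 mapped to frame 7 and `pass_b` sees frame 9, without the CPU having to stall and poke the table between the two.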


http://appft1.uspto.gov/netacgi/nph...p=1&u=/netahtml/PTO/srchnum.html&r=0&f=S&l=50

Method and apparatus for rasterizer interpolation


Abstract
The present invention relates to a rasterizer interpolator. In one embodiment, a setup unit is used to distribute graphics primitive instructions to multiple parallel rasterizers. To increase efficiency, the setup unit calculates the polygon data and checks it against one or more tiles prior to distribution. An output screen is divided into a number of regions, with a number of assignment configurations possible for various numbers of rasterizer pipelines. For instance, the screen is sub-divided into four regions and one of four rasterizers is granted ownership of one quarter of the screen. To reduce time spent on processing empty tiles, a problem in prior art implementations, the present invention reduces empty tiles by the process of coarse grain tiling. This process occurs by a series of iterations performed in parallel. Each region undergoes an iterative calculation/tiling process where coverage of the primitive is deduced at a successively more detailed level.
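One plausible reading of "coarse grain tiling" is sketched below: each of four rasterizers owns a screen quadrant, and each refines its quadrant level by level, throwing away tiles that lie entirely outside one of the triangle's edges before going finer. The edge-function rejection test, power-of-two tile sizes and counter-clockwise winding are all assumptions of this sketch; the abstract doesn't say which coverage test is used.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]
Tile = Tuple[int, int, int]  # (x, y, size) in pixels


def edge(a: Point, b: Point, p: Point) -> float:
    # Signed area test: >= 0 when p is on the inner side of edge a->b,
    # assuming counter-clockwise winding.
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def tile_may_overlap(tri: Sequence[Point], x: int, y: int, size: int) -> bool:
    corners = [(x, y), (x + size, y), (x, y + size), (x + size, y + size)]
    for i in range(3):
        a, b = tri[i], tri[(i + 1) % 3]
        if all(edge(a, b, c) < 0 for c in corners):
            return False  # every corner is outside this edge: tile is empty
    return True  # conservative: the tile may contain part of the triangle


def coarse_tiles(tri: Sequence[Point], rx: int, ry: int, rsize: int, min_size: int) -> List[Tile]:
    """Refine one region level by level, discarding empty tiles early."""
    level = [(rx, ry, rsize)] if tile_may_overlap(tri, rx, ry, rsize) else []
    while level and level[0][2] > min_size:
        finer = []
        for x, y, s in level:
            h = s // 2
            for dx in (0, h):
                for dy in (0, h):
                    if tile_may_overlap(tri, x + dx, y + dy, h):
                        finer.append((x + dx, y + dy, h))
        level = finer
    return level


def distribute(tri: Sequence[Point], screen: int, min_size: int) -> Dict[int, List[Tile]]:
    # Four rasterizers, each owning one quarter of a square screen.
    half = screen // 2
    origins = [(0, 0), (half, 0), (0, half), (half, half)]
    return {r: coarse_tiles(tri, ox, oy, half, min_size)
            for r, (ox, oy) in enumerate(origins)}


work = distribute([(1, 1), (6, 1), (1, 6)], screen=16, min_size=2)
```

Here the triangle sits in the top-left quadrant, so rasterizers 1 to 3 reject their whole quadrant in a single test, and rasterizer 0 never descends into the empty corner tiles of its own quadrant.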

SIMD processor executing min/max instructions


Abstract
A SIMD processor responds to a single min/max instruction to find the minimum or maximum valued data unit in an array of data units. The determined minimum/maximum value and an associated index value thereto may be output. Alternatively, the value of a data unit in another array may be output at a corresponding location. A further single instruction executable by the SIMD processor, may be applied to results obtained using such a single min/max instruction, to allow such instructions to operate on two dimensional arrays.
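A software emulation of what the abstract describes might look like the following: one "instruction" returns the winning value and its lane index (or the value at that index from a companion array), and a second pass over the per-row results extends it to a 2D array. Tie-breaking to the lowest index and the two-pass structure are assumptions; the abstract only says a further single instruction is applied to the results.

```python
from typing import List, Optional, Sequence, Tuple


def simd_minmax(lanes: Sequence[int], find_max: bool = False,
                companion: Optional[Sequence[int]] = None) -> Tuple[int, int]:
    """Emulate one min/max instruction across the lanes of a register.

    Returns (value, index). If `companion` is given, the value returned is
    the companion array's entry at the winning index instead.
    Ties resolve to the lowest index (an assumption).
    """
    best = 0
    for i in range(1, len(lanes)):
        better = lanes[i] > lanes[best] if find_max else lanes[i] < lanes[best]
        if better:
            best = i
    source = lanes if companion is None else companion
    return source[best], best


def minmax_2d(rows: List[List[int]], find_max: bool = False) -> Tuple[int, int, int]:
    # Pass 1: one instruction per row gives each row's winner and its column.
    per_row = [simd_minmax(r, find_max) for r in rows]
    values = [v for v, _ in per_row]
    columns = [c for _, c in per_row]
    # Pass 2: reduce the row winners; the companion array carries the columns.
    col, row = simd_minmax(values, find_max, companion=columns)
    return rows[row][col], row, col
```

For example, `minmax_2d([[4, 7, 1], [8, 0, 6]])` should give `(0, 1, 1)`: value 0 at row 1, column 1.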

SIMD processor having enhanced operand storage interconnects


Abstract
A SIMD processor includes an ALU having data interconnects facilitating the concurrent processing of overlapping data portions of at least one operand store. Such interconnects facilitate the calculation of shift-invariant convolutions, and sum of absolute differences between an operand in the operand store and another operand.
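To show why interconnects that reach overlapping portions of an operand store help, here is a scalar sketch of the two workloads the abstract names. Every output position reuses most of the inputs of its neighbour, which is exactly the overlap that the hardware would presumably feed to the ALU lanes concurrently. The function names are illustrative only.

```python
from typing import List, Sequence


def sad_all_shifts(reference: Sequence[int], block: Sequence[int]) -> List[int]:
    """SAD between `block` and every overlapping window of `reference`."""
    n = len(block)
    return [sum(abs(reference[s + i] - block[i]) for i in range(n))
            for s in range(len(reference) - n + 1)]


def convolve_valid(signal: Sequence[int], kernel: Sequence[int]) -> List[int]:
    """Shift-invariant convolution: same (flipped) kernel at every window."""
    n = len(kernel)
    return [sum(signal[s + i] * kernel[n - 1 - i] for i in range(n))
            for s in range(len(signal) - n + 1)]


costs = sad_all_shifts([1, 2, 3, 4, 5], [3, 4])
best_shift = costs.index(min(costs))  # motion-search style: lowest SAD wins
```

The SAD-then-pick-the-minimum pattern is the core of block-matching motion estimation, which fits the video-encoding reading of these patents.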

SIMD processor and addressing method


Abstract
A single instruction, multiple data (SIMD) processor including a plurality of addressing register sets, used to flexibly calculate effective operand source and destination memory addresses is disclosed. Two or more address generators calculate effective addresses using the register sets. Each register set includes a pointer register, and a scale register. An address generator forms effective addresses from a selected register set's pointer register and scale register; and an offset. For example, the effective memory address may be formed by multiplying the scale value by an offset value and summing the pointer and the scale value multiplied by the offset value.
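The addressing rule in the abstract, effective address = pointer + scale × offset, is simple enough to write down directly. The register-set layout and the example values below are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AddressRegisterSet:
    pointer: int  # base address
    scale: int    # e.g. element size or row pitch (an assumption)


def effective_address(sets: List[AddressRegisterSet], selected: int, offset: int) -> int:
    # EA = pointer + scale * offset, using the selected register set
    r = sets[selected]
    return r.pointer + r.scale * offset


# Two generators can use different sets at once, e.g. source and destination.
sets = [AddressRegisterSet(pointer=0x100, scale=16), AddressRegisterSet(pointer=0x400, scale=4)]
src = effective_address(sets, 0, 3)
dst = effective_address(sets, 1, 5)
```

With these values `src` is 0x130 and `dst` is 0x414, so source and destination can walk memory with different strides.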



Method and apparatus for managing tasks in a multiprocessor system


Abstract
In a multiprocessor system, a task control processor may be placed in the path connecting each execution processor to a system bus. Such task control processors may detect the completion of a first task on an associated execution processor and, responsively, generate commands to lead to the initiation of a second task on the same, or another, execution processor. Such task completion detection and task initiation by the task control processors removes, from a central processor or the execution processors, the burden of performing such tasks, thereby improving the efficiency of the entire system.
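A toy model of the scheme: each execution processor has its own task control processor holding a small program that maps "task X finished" to "launch task Y on processor P". No central processor appears anywhere in the loop. The class names, the program format and the video-flavoured task names are all hypothetical.

```python
from collections import deque
from typing import Dict, List, Tuple

Task = str
Proc = str


class TaskControlProcessor:
    """Sits between one execution processor and the bus (hypothetical model)."""

    def __init__(self, program: Dict[Task, List[Tuple[Task, Proc]]]):
        # program: completed task -> tasks to launch next, and where
        self.program = program

    def on_complete(self, task: Task) -> List[Tuple[Task, Proc]]:
        return self.program.get(task, [])


def run(start: Tuple[Task, Proc], tcps: Dict[Proc, TaskControlProcessor]) -> List[Tuple[Proc, Task]]:
    pending = deque([start])
    trace = []
    while pending:
        task, proc = pending.popleft()
        trace.append((proc, task))                    # execution processor runs the task
        pending.extend(tcps[proc].on_complete(task))  # its TCP launches the successors
    return trace


tcps = {
    "EP0": TaskControlProcessor({"decode": [("idct", "EP1")]}),
    "EP1": TaskControlProcessor({"idct": [("motion_comp", "EP0")]}),
}
trace = run(("decode", "EP0"), tcps)
```

Reprogramming the per-processor tables changes the flow of tasks between processors, which is roughly what a "network" of these units would buy you.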

Method and apparatus for generating compressed stencil test information


Abstract
A method for rendering pixels for display includes generating stencil values on a per pixel basis for storage in stencil buffer memory; selecting a group of stencil values that represent a block of pixels; generating compressed stencil data associated with the group of stencil values; and performing stencil testing on a corresponding incoming block of pixels using the compressed stencil data.
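The abstract doesn't say what the compression is, so this sketch assumes the simplest case: a block whose stencil values are all equal is stored as a single value, and then one comparison resolves the stencil test for the whole incoming block. The compare follows the usual OpenGL-style `(ref & mask) func (stencil & mask)` form; everything else is hypothetical.

```python
import operator
from typing import Callable, List, NamedTuple


class CompressedStencil(NamedTuple):
    uniform: bool
    values: List[int]  # one entry if uniform, else one per pixel
    count: int


def compress(block: List[int]) -> CompressedStencil:
    if all(v == block[0] for v in block):
        return CompressedStencil(True, [block[0]], len(block))
    return CompressedStencil(False, list(block), len(block))


def stencil_test(c: CompressedStencil, ref: int,
                 func: Callable[[int, int], bool] = operator.eq,
                 mask: int = 0xFF) -> List[bool]:
    if c.uniform:
        # One comparison resolves the whole block.
        return [func(ref & mask, c.values[0] & mask)] * c.count
    # Fall back to per-pixel tests for blocks that didn't compress.
    return [func(ref & mask, v & mask) for v in c.values]
```

For instance, `stencil_test(compress([3, 3, 3, 3]), 3)` passes all four pixels after a single compare, while a mixed block like `[1, 2, 1, 2]` falls back to four separate tests.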

Method and apparatus for generating hierarchical depth culling characteristics


Abstract
A method and apparatus for generating hierarchical depth culling characteristics includes determining a first minimum depth value and a first maximum depth value for a first graphical element. The graphical element may be a primitive. The first minimum depth value may be a minimum Z-plane depth of a pixel within the primitive and a first maximum depth value is a maximum Z-plane value for a pixel within the primitive. The method and apparatus further includes determining a second minimum depth value and a second maximum depth value for a second graphical element, which may be a tile. The method and apparatus further includes calculating an intersection depth range having an intersection minimum depth value and an intersection maximum depth value based on the intersection of the first minimum depth value and the first maximum depth value and the second minimum depth value and the second maximum depth value.
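The intersection step reduces to taking the larger of the two minimums and the smaller of the two maximums. An empty result means the primitive's depths and the tile's depths don't overlap at all. This sketch doesn't try to guess what the patent does with the intersection range afterwards.

```python
from typing import Optional, Sequence, Tuple

Range = Tuple[float, float]


def depth_range(depths: Sequence[float]) -> Range:
    return min(depths), max(depths)


def intersect(a: Range, b: Range) -> Optional[Range]:
    # The intersection runs from the larger minimum to the smaller maximum.
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None


primitive = depth_range([0.2, 0.45, 0.6])  # per-pixel Z of the primitive
tile = (0.5, 0.9)                          # Z range stored for the tile
overlap = intersect(primitive, tile)
```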
 
Methods and apparatus for updating a memory address remapping table

http://appft1.uspto.gov/netacgi/nph...25".PGNR.&OS=DN/20060164425&RS=DN/20060164425

Methods and apparatus for updating a memory address remapping table using a graphics processing circuitry are disclosed. The methods include assembling a command sequence of commands executable by the graphics processing circuit, the sequence configured to include one or more memory address remapping table updates for one or more page entries in a memory address remapping table. The command sequence is then communicated to the graphics processing circuit for execution by the graphics processing circuit. Execution of the command sequence with the graphics processing circuit includes executing the one or more memory address remapping table updates causing the graphics processing circuit to update the one or more page entries in the memory address remapping table.
Is this evidence that complete support for D3D10 virtual memory will be in R600?

The patent implies that in existing GPUs, memory remapping table updates are performed directly by the CPU. This, I guess, is simply a consequence of VRAM appearing within the CPU's address space.

So if the GPU can manage remapping itself, is this all that's required to support D3D10 virtual memory? I think I'm missing something...

Jawed
 
Video and/or stream-processing SIMD

The three patents relating to SIMD seem to be about video or stream processing.

The third of them:

SIMD processor and addressing method

appears to describe how a pair of dual-ported blocks of on-die memory (2 banks, each 32x 128-bit) can be arbitrarily addressed - i.e. not being forced to address data by word boundary.

Additionally, and this seems to be the key, the blocks of memory can be used to implement FIFO buffers, so that a shader can access data in this memory using arbitrary stride, for both reads and writes, with the stride length for writes not necessarily being equal to the stride length for reads.

The access logic is also designed either to "auto-increment" through the memory (i.e. when it's used as a FIFO buffer) or to let the memory act as a normal "register file".
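A sketch of one bank working in the two modes described above: FIFO mode, with read and write pointers that auto-increment by independent strides, and register-file mode, with plain indexed access. The 32-entry size comes from the description above. The wrap-around behaviour and the lack of overflow checks are assumptions.

```python
from typing import List


class StridedFifoMemory:
    """One bank of on-die storage usable as a FIFO or as a register file.

    Wrap-around and stride semantics are assumptions for illustration.
    """

    def __init__(self, entries: int, read_stride: int = 1, write_stride: int = 1):
        self.mem: List[int] = [0] * entries
        self.read_ptr = 0
        self.write_ptr = 0
        self.read_stride = read_stride
        self.write_stride = write_stride

    # FIFO mode: pointers auto-increment by independent strides.
    def push(self, value: int) -> None:
        self.mem[self.write_ptr % len(self.mem)] = value
        self.write_ptr += self.write_stride

    def pop(self) -> int:
        value = self.mem[self.read_ptr % len(self.mem)]
        self.read_ptr += self.read_stride
        return value

    # Register-file mode: plain random access by index.
    def load(self, index: int) -> int:
        return self.mem[index % len(self.mem)]


bank = StridedFifoMemory(entries=32, read_stride=2, write_stride=1)
for v in range(8):
    bank.push(v)
evens = [bank.pop() for _ in range(4)]
```

Writing with stride 1 and reading with stride 2 picks out every other element (0, 2, 4, 6), which is the kind of read/write stride mismatch described above.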

So, this patent appears to be the footing for the other patents:

SIMD processor executing min/max instructions

SIMD processor having enhanced operand storage interconnects

So, as a trio of patents, they look like they're well suited to doing things like video-encoding, with blocks of data occupying the register file.

Apart from that, I bet the GPGPU guys could get pretty excited by this kind of architecture. I guess it means it would be dramatically easier to perform random reads and writes to sizable areas of memory, rather than to have to virtualise a block of memory via a set of registers (e.g. R0 to R63).

Jawed
 
I wonder where I was when this thread was posted? Oh well, nice finds rwolf.

That hier-z patent certainly provides some ammo for believing that stencil got some love lately. . . (Edit: Doh, not to mention there's another specifically for stencil)

Anybody notice two of the filing dates are from 2006, and one from 2005? Why so quick to issue, I wonder, compared to what we usually see?

The "rasterizer interpolation" one. . . would that be ATI decoupling the ROPs?
 
Anybody notice two of the filing dates are from 2006, and one from 2005? Why so quick to issue, I wonder, compared to what we usually see?
They're all patent applications, actually.

I've been lazy and not fully qualified them as such, just calling them "patents".

Jawed
 
Ah, think I was looking at some of the dates wrong. Looking again, about 18 months from file to. . .whatever that other date represents. . .is roughly the norm.
 
The "rasterizer interpolation" one. . . would that be ATI decoupling the ROPs?

To me, it sounded more like something applicable to Crossfire. In fact, it seems like supertiling, or whatever that checkerboard implementation of CF is.
 
To me, it sounded more like something applicable to Crossfire. In fact, it seems like supertiling, or whatever that checkerboard implementation of CF is.

Well, it does mention supertiling specifically. It's certainly some kind of load balancing. I wonder if they previously filed a patent for their implementation of tiling and super-tiling? Maybe they recently came to the conclusion it was patentable, and there isn't really anything new (i.e. that they haven't been doing for some time) here at all?

Hrrm, in fact it refers to it as a "continuation application" from one filed on Nov 18, 2003, and "claims priority" to a third one from 2002. "Claims priority" is what? "We got there first"?
 
Supertiling goes back to R300, originally introduced for high levels of AA.

What's interesting about R420 is that it introduced variable-size tiles.

I haven't read this new patent yet... But the gist of it seems to be that, in concert with the new hierarchical-Z patent, tiling changes on the fly across the screen. Or, rather, that the depth of the Z-hierarchy changes (and, correspondingly, the size of the tile being tested/rasterised). Or, erm, something like that.

Jawed
 
A refinement of the Ultra Threaded Despatch Processor?

Method and apparatus for managing tasks in a multiprocessor system

This patent is rather elusive in its applicability within a GPU, beyond the kind of functionality that we already know is performed by the ultra threaded despatch processors in R580.

There's one big hint in the text that there's something more going on:

[0120] Advantageously, aspects of the present invention allow for a group of the programmable task control processors 220 to be connected together in a network, so that the network can be programmed to implement any given flow-of-tasks among execution processors 118 in the multiprocessor system 200.
which, I guess, relates to the earlier video-encoding/stream-processing patents (hey, it's the same guys!). The implication is that a streaming network configuration can be constructed across a set of execution pipelines, where work is only handed off from one pipeline to the next when specific data conditions are met (e.g. a block of work has been finished). One of the worries I have about this idea is that execution pipelines, being pipelined, shouldn't be signalling that they're idle (which is a key concept in the patent). They should have a constant stream of work lined up!

(Though it's possible to conceive of shader code with very few texturing instructions in it, in which case the texture pipe could fall idle for significant periods.)

Alternatively, it could just be simpler to read this as a programmable thread control system, very similar in functionality to the Sequencer (SEQ) unit in Xenos - the unit that controls despatch of clauses of code to shader arrays, vertex fetch pipes and texture pipes. What's interesting about SEQ in Xenos is that the developer can program it. The same is seemingly true of the thread control processor at the heart of this patent.

The major difference is that this patent describes a distributed network of SEQ units (one per execution pipeline), not the centralised one seen in Xenos.

Since I'm convinced that R580 has four distributed ultra threaded despatch processors (one per pixel pipeline of: 3x ALU quads and 1x TMU quad), I'm still not really getting it.

I had wondered, for example, whether one pipeline, A (out of A, B, C and D), could ask pipeline C to do some texturing for it, because C has broadcast to A, B and D that its texture pipe is now idle. The trouble with this is that the overall architecture of ATI's PC GPUs is based upon screen-space tiling, which directly impacts texture caching. So if pipeline C starts doing texturing work for pipeline A, there's a significant loss in locality of cached texture data for pipeline A's work. A's texture caches may well hold most of the required data, but C's hold none of it. So there's a bandwidth hit against external memory, unless the pipelines all share a common L2 (or L3?) cache. Sigh...

Jawed
 
I haven't read this new patent yet... But the gist of it seems to be that, in concert with the new hierarchical-Z patent, tiling changes on the fly across the screen. Or, rather, that the depth of the Z-hierarchy changes (and, correspondingly, the size of the tile being tested/rasterised). Or, erm, something like that.
Yeah, it's fairly simple.

Since Hierarchical-Z uses ever finer-grained tilings to record Z, it makes sense for the rasteriser to work down this hierarchy before performing rasterisation. That way it can cull quads from its rasterisation list as soon as possible, rather than waiting until all the quads in an 8x8 tile have been rasterised and then navigating through Hierarchical-Z to find out whether they're visible.
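The descent described above might look like this in software: start at the coarsest Hi-Z tile, cull the whole tile if the primitive can't be in front of anything in it, and only recurse into children that survive. It assumes a LESS depth test, Hi-Z storing the farthest depth per tile, and the primitive's nearest depth used as a single conservative test value. All of this is a simplification, not ATI's actual hardware.

```python
from typing import Dict, List, Tuple

# hz[level][(x, y)] = farthest depth already stored anywhere in that tile.
# Level 0 is the finest level (one entry per quad); each level up halves x and y.
HierZ = Dict[int, Dict[Tuple[int, int], float]]


def visible_quads(hz: HierZ, level: int, x: int, y: int, prim_zmin: float) -> List[Tuple[int, int]]:
    """Descend the hierarchy, culling whole tiles before any quad is rasterised."""
    if prim_zmin >= hz[level][(x, y)]:
        return []  # nothing in the primitive can pass a LESS test here: cull now
    if level == 0:
        return [(x, y)]  # this quad survives and goes on to rasterisation
    out: List[Tuple[int, int]] = []
    for dx in (0, 1):
        for dy in (0, 1):
            out += visible_quads(hz, level - 1, 2 * x + dx, 2 * y + dy, prim_zmin)
    return out


hz = {
    1: {(0, 0): 0.9},
    0: {(0, 0): 0.3, (1, 0): 0.9, (0, 1): 0.9, (1, 1): 0.2},
}
survivors = visible_quads(hz, 1, 0, 0, prim_zmin=0.5)
```

Only the two quads whose stored depth is farther than 0.5 reach rasterisation, and a primitive at 0.95 is rejected at the top level without touching any quad.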

Jawed
 
Well, it does mention supertiling specifically. It's certainly some kind of load balancing. I wonder if they previously filed a patent for their implementation of tiling and super-tiling?
Is this any different to what was in Naomi 2/Elan?
 
While we're at it:
http://v3.espacenet.com/textdoc?DB=EPODOC&IDX=US2006202941&F=0&QPN=US2006202941

System and method for determining illumination of a pixel by shadow planes


A graphics processing circuit includes a pixel shader operative to provide pixel color information in response to image data representing a scene to be rendered; a texture circuit, coupled to the pixel shader, operative to determine a luminance value to be applied to a pixel of interest based on the luminance values of the pixels that define a plane including the pixel of interest; and a render back end circuit, coupled to the texture circuit, operative to compute the luminance values from a shadow map that specifies the distance from the light source of the nearest object at a plurality of locations. A pixel illumination method includes receiving color information for a pixel to be rendered, defining a plane containing at least one pixel of interest, the plane including a plurality of planar values; comparing the plurality of planar values to a corresponding set of distance values; determining a luminance value for the at least one pixel of interest; and applying the luminance value to the at least one pixel of interest.
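A loose sketch of the pixel-illumination method: the pixel of interest carries a depth plane in shadow-map space, the planar depth is evaluated at neighbouring shadow-map texels and compared against the stored nearest-occluder distances, and the luminance is taken as the fraction of samples that are lit. Treating luminance as a lit fraction, the 3x3 neighbourhood and the plane parameterisation are all assumptions. This resembles percentage-closer filtering with a receiver plane, which may or may not be what the patent intends.

```python
from typing import List, Tuple

Plane = Tuple[float, float, float]  # depth = a*u + b*v + c in shadow-map texels


def plane_depth(plane: Plane, u: int, v: int) -> float:
    a, b, c = plane
    return a * u + b * v + c


def shadow_luminance(shadow_map: List[List[float]], plane: Plane,
                     u0: int, v0: int, radius: int = 1) -> float:
    """Fraction of neighbouring samples where the plane is not behind the occluder."""
    lit = total = 0
    for v in range(v0 - radius, v0 + radius + 1):
        for u in range(u0 - radius, u0 + radius + 1):
            if not (0 <= v < len(shadow_map) and 0 <= u < len(shadow_map[0])):
                continue  # skip samples off the edge of the map
            total += 1
            if plane_depth(plane, u, v) <= shadow_map[v][u]:
                lit += 1
    return lit / total if total else 1.0


smap = [[10.0] * 3 for _ in range(3)]
smap[0][0] = 1.0  # one texel has an occluder nearer the light than the plane
lum = shadow_luminance(smap, (0.0, 0.0, 5.0), 1, 1)
```

With one occluded texel out of nine, the pixel comes out at 8/9 of full brightness rather than hard-switching between lit and shadowed.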
 