ATILA GPU simulator source code released

RoOoBo

I have released the source code for the GPU simulator I have been working on.

You can find it here:

ATILA source code
ATILA x86 binaries
Example Doom3 trace file
A few small traces for testing

Features supported by the ATILA GPU simulation framework:

A 'modern' GPU (at least if you think that a cut-down R300 in terms of feature set is modern) with a unified shader architecture (a non-unified architecture is also supported). Angle-dependent anisotropic filtering is implemented. No antialiasing techniques are implemented.

A subset of the OpenGL v1.5 API, implemented as the ATILA OpenGL library. It has the bare minimum needed to simulate traces from these five games: UT2004, Doom3, Quake 4, The Chronicles of Riddick and Prey (I only tested the demo a couple of weeks ago, and the only reason it works is that it is Yet Another Doom3 Engine Based Game).

What has been released?

The source code for the whole ATILA GPU simulator (both the functional and the timing simulation) and a compiled version (in library form) of the ATILA OpenGL library.

Compiled binaries for the ATILA simulator and the OpenGL trace capturing tools GLInterceptor and GLPlayer (the simulator wouldn't be very useful without something to simulate).

What has not been released?

The latest bug fixes and changes in the simulator (the released version is a month old or so). The source code for the OpenGL tools and library. A new detailed GDDR memory controller that Carlos has been developing on his own. The signal trace visualizer (STV) tool (you don't even need to know what this is).

And most important of all: NO DOCUMENTATION. Well, at least for a while. I will be working on writing it this summer.

If you don't know what this 'simulator' thing is you may want to read this paper:

ATTILA: A Cycle-Level Execution-Driven Simulator for Modern GPU Architectures
 
The paper doesn't spell my name correctly :rolleyes:, but I'll still take a look. :smile:

[Additions]
Some quick questions re comments in the paper:
  • "and the recursive rasterization algorithm described by McCool [15]"
I don't have that paper, but although I could see that it would give great pixel locality, surely a Hilbert curve must be relatively expensive to implement in HW?
  • "The Z cache implements a lossless compression algorithm with 1:2 and 1:4 ratios to reduce bandwidth usage."
I presume this means typical compression ratios as there is no way to guarantee a level of compression.
 
Simon F said:
The paper doesn't spell my name correctly :rolleyes:, but I'll still take a look. :smile:

Oops, sorry! I can change the PDF version on my page, but I think it's going to be difficult to change the version in the IEEE library. I'm not very good at proofreading papers; they always end up with many errors even after multiple revisions.

I don't implement the Hilbert curve part, just the recursive part. OK, but then how does it work? Well, in fact, when I tried it, it didn't work very well ... so I just implemented a software algorithm that is unlikely ever to be implementable in hardware (it uses queues of tiles for n tile sizes or 'depths'), and I implemented the timing part to produce enough fragments per cycle. After all, my research was centered on the shader part of the GPU. If someone else wants to do research on rasterizers, they can change the Fragment Generator and Triangle Setup to whatever they think would be implementable in HW.
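
For readers who want a feel for the "queues of tiles per depth" idea, here is a minimal software sketch. It is not the ATTILA code: the function names, the conservative tile test and the single root tile at (0, 0) are all assumptions made for illustration, and nothing about timing or fragments per cycle is modelled.

```python
from collections import deque

def edge(a, b, p):
    """Edge function: positive when p lies on the interior side of edge a->b
    for the vertex ordering used below."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def tile_may_overlap(tri, x, y, size):
    """Conservative test: reject a tile only when all four corners are
    strictly outside the same triangle edge."""
    corners = [(x, y), (x + size, y), (x, y + size), (x + size, y + size)]
    for i in range(3):
        a, b = tri[i], tri[(i + 1) % 3]
        if all(edge(a, b, c) < 0 for c in corners):
            return False
    return True

def rasterize(tri, top_size):
    """Subdivide a top_size x top_size root tile (power of two, origin 0,0)
    down to single pixels, keeping one queue of tiles per tile size."""
    levels = top_size.bit_length()            # e.g. 8 -> tile sizes 8, 4, 2, 1
    queues = [deque() for _ in range(levels)]
    queues[0].append((0, 0))
    fragments = []
    for depth in range(levels):
        size = top_size >> depth
        while queues[depth]:
            x, y = queues[depth].popleft()
            if size == 1:
                # Pixel level: test the pixel centre against the three edges.
                center = (x + 0.5, y + 0.5)
                if all(edge(tri[i], tri[(i + 1) % 3], center) >= 0 for i in range(3)):
                    fragments.append((x, y))
                continue
            if not tile_may_overlap(tri, x, y, size):
                continue
            half = size // 2
            for dy in (0, half):
                for dx in (0, half):
                    queues[depth + 1].append((x + dx, y + dy))
    return fragments
```

Calling `rasterize([(0, 0), (8, 0), (0, 8)], 8)` yields the 36 pixels whose centres fall inside that triangle. Because each queue is drained in order, the fragments come out grouped by tile, which is where the pixel locality comes from even without a Hilbert curve.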

The Z compression algorithm has the four typical modes: clear, no compression, compressed to half of the line size, compressed to a quarter of the line size. The compression method is copied from an ATI patent. It's the one described in section 3.5 of the GH2006 paper about depth buffer compression by Hasselgren and Akenine-Möller (Depth Offset Compression). Compression is performed on 8x8 pixel blocks (so the cache line has the insane size of 256 bytes) but the code already implemented could easily work on 4x4 pixel blocks because it's fully parametrized (in fact I plan to do so next week).
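
To make the four modes concrete, here is a rough sketch of how a block could be classified. The 8x8 block and the 256-byte line (so 4 bytes per depth value) come from the description above; the bit budget (two 32-bit reference values plus one selector bit per pixel, the rest split into per-pixel offsets) and all names are assumptions for illustration, not the layout used by hiloCompress or by the ATI patent.

```python
BLOCK_PIXELS = 64                          # 8x8 pixel block
DEPTH_BITS = 32                            # 4 bytes per pixel -> 256-byte line
LINE_BITS = BLOCK_PIXELS * DEPTH_BITS      # 2048 bits uncompressed

def offset_bits(budget_bits):
    """Offset bits per pixel left after storing two reference depths and
    one min/max selector bit per pixel (assumed layout)."""
    return (budget_bits - 2 * DEPTH_BITS) // BLOCK_PIXELS - 1

def fits(block, k):
    """True if every depth is within k bits of the block minimum or maximum."""
    lo, hi = min(block), max(block)
    limit = 1 << k
    return all(z - lo < limit or hi - z < limit for z in block)

def choose_mode(block, clear_value):
    """Pick the cheapest of the four modes that can hold the block losslessly."""
    if all(z == clear_value for z in block):
        return "clear"
    if fits(block, offset_bits(LINE_BITS // 4)):   # quarter of the line size
        return "1:4"
    if fits(block, offset_bits(LINE_BITS // 2)):   # half of the line size
        return "1:2"
    return "uncompressed"
```

Under these assumed budgets a quarter-size line leaves 6 offset bits per pixel and a half-size line leaves 14. A block whose depths spread too widely around both references falls through to "uncompressed", so the 1:2 and 1:4 ratios are the available modes rather than a guaranteed level of compression.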
 
I have updated the simulator source code and binaries to fix a problem that I found when compiling under Visual C 2005 (the ATTILA OpenGL library was making the simulator crash at initialization). The problem may or may not affect the other platforms (cygwin, mingw, linux).

ATILA-rei-source.7z

ATILA-rei-binaries.7z

I also posted a paper by another guy in the group about GPU workloads in 3D games. It will be presented at the IISWC symposium later this month.

Workload Characterization of 3D Games
 
I read the paper "ATTILA: A Cycle-Level Execution-Driven Simulator for Modern GPU Architectures". It says "the Z cache implements a lossless compression algorithm with 1:2 and 1:4 ratios".

How is the lossless Z compression algorithm implemented? Are there any docs about lossless Z compression? I cannot find any references on this issue.
 
Those numbers are somewhat old, and the implementation was based on an old (1999?) presentation from ATI and some related patents. Current GPUs, not even counting the massive compression ratios for MSAA, have better algorithms. The best reference on the matter is this paper from Akenine-Möller's group:

Efficient Depth Compression

I think they also worked on compression for floating-point depth buffers.

Akenine-Möller's Lund University website is one of the best sources I know of for 3D hardware rendering, thanks to his group's papers and (if they are still there) his course presentations.

While looking around I also found this Master's thesis, which looks like a good summary of the matter:

Depth Buffer Compression

The C code for the implementation in ATTILA can be found in 'src/emul/FragmentOpEmulator.cpp', function hiloCompress, and the documents used as a base are listed in the reference section of one of our papers. In any case, the links above provide much more up-to-date and complete information on the matter.
 
Yes, I have read Möller's paper and all the slides on his website for the course EDA075. I have also read the Master's thesis "Depth Buffer Compression".
Your work is great, and thank you very much!
 
We have decided to release the source code for the OpenGL library and trace capturing tools that came as compiled libraries or binaries in the ATTILA open source version we released in 2007.

You can now get the package with the whole ATTILA open source simulator and OpenGL framework from the download section of the ATTILA website.

There is no new code here, just some very old code that comes out of the old ages of coding history.

The juicier code for the ATTILA 2010 version, which can render D3D9 games, may or may not be released in the future :D.
 
The other PhDs working on it have their own right to decide about the exclusivity of their code and parts of the common code. Try also to guess who is 'the shadow' that really controls the group. It may not be that obvious at first sight.
Hope they've finished their PhDs safely :p :smile:

Thanks for your efforts and for the good news.
 

Now that we finally have a few new games to play with in ATTILA (other than the old triad of IDTech4 based OpenGL games), I have decided to publish, remembering the good old times when I was much younger ;), utilization graphs (that some of you may still remember) for the D3D9 games that the simulator can already render (more or less correctly).

ATTILA D3D9 Game Frames Utilization

I will post the utilization graphs for three other games later today (the cluster I'm using for simulation is taking its time today ...).

Keep in mind that any resemblance between our simulator and 'reATIly' may be pure coincidence.
 