Why? Look at most of the games released on the PC today: they are console ports. Why would this trend reverse?
Well, at least we know that *today* we don't need complex pixel shaders to sell well.
I haven't got the slightest idea of what will happen in the computer market...
Will they? Considering that the fastest-selling console uses a glorified DX7-level chip, I wouldn't be so sure of it. With current engines and techniques the art production pipeline is the largest money sink in many games, and I'm unsure how far it can be taken without making game development...
Is that the main cause for the quick demise of GDDR4? I was under the impression that it was more of a mix of factors, including the lack of support from nVidia, a relatively short lifetime before the introduction of GDDR5, and higher-than-predicted scaling of GDDR3. BTW I was surprised to find...
Actually it can have more instructions than that in flight; think for example of the branch waiting on a result from memory: the core can happily fill the entire ROB (128 entries IIRC). It can also have many outstanding speculated branches, not just one (can't remember how many though).
Texture decompression also comes to mind, the way it's done in current graphics hardware would fare very poorly in software. Different approaches would work well without specific decompression hardware and provide the same compression ratios as well as equal or better fidelity. Vector...
Not really, after a ridiculous number of iterations SSEx remains terribly non-orthogonal. Heck, there's a lot of stuff which was in AltiVec in '99 which is not yet in SSEx and instead we got all kind of horizontal operations which are useless except for a couple of applications which end up in...
The first Larrabee paper stated that communication among the four hardware threads of a core went through a queue updated with the CMPXCHG instruction without using the LOCK prefix. This is possible because the four logical threads running on the hardware context (1 FE and 3 BE using Intel's...
That's interesting, so there's more to it than the use of a forwarding network and those could be real registers after all. I stand corrected :) BTW as a compiler writer I'd love to see the algorithm they are using in the shader compiler for register allocation. Modeling those 'registers' in...
If the instruction scheduling is completely static and predictable then it's not a 'trick', it's a natural consequence of the hardware design. The ISA is actually exposing the fact that you can read your operands right out of the forwarding network in a predictable manner instead of reading them...
Those aren't registers; it's the forwarding network. AMD can use it because the instruction scheduling inside a clause is completely predictable, and so a value can be pulled straight out of the forwarding network w/o having it written to a register. The fact that it is presented in the assembler code as...
Only the memory cells of 256 KiB of L2 using 6T SRAM would be over 12 million transistors:
256 * 1024 (bytes) * 8 (bits) * 6 (transistors) ~= 12.6 million transistors
That's for a non-ECC-protected, non-redundant L2. In practice you cannot do without some kind of data protection and...
Larrabee seems able to execute one scalar instruction or vector store in the first pipe and one vector instruction (which might be a load or load+op instruction) in the second pipe. As you guessed for purely scalar code it's a single-issue x86.