Recent content by crystall

  1. C

    The End of The GPU Roadmap

    Why? Look at most of the games that get released today on the PC: they are console ports, so why would this trend revert? Well, at least we know that *today* a game doesn't need to use complex pixel shaders to sell well. I haven't got the slightest idea of what will happen in the computer market...
  2. C

    The End of The GPU Roadmap

    Will they? Considering that the fastest selling console uses a glorified DX7-level chip I wouldn't be quite sure of it. With current engines and techniques the art production pipeline is the largest money sink in many games and I'm unsure how far it can be taken without making game development...
  3. C

    Nvidia GT300 core: Speculation

    Is that the main cause for the quick demise of GDDR4? I was under the impression that it was more of a mix of various factors including the lack of support from nVidia, relatively short lifetime before GDDR5 introduction and higher than predicted scaling of GDDR3. BTW I was surprised to find...
  4. C

    Fast software renderer

    Actually it can have more instructions than that in flight; think for example if the branch is waiting on a result from memory, it can happily fill the entire ROB (128 entries IIRC). It can also have many outstanding speculated branches, not just one (can't remember how many though).
  5. C

    Fast software renderer

    Texture decompression also comes to mind, the way it's done in current graphics hardware would fare very poorly in software. Different approaches would work well without specific decompression hardware and provide the same compression ratios as well as equal or better fidelity. Vector...
  6. C

    LRB - ditching x86?

    Not really, after a ridiculous number of iterations SSEx remains terribly non-orthogonal. Heck, there's a lot of stuff which was in AltiVec in '99 which is not yet in SSEx and instead we got all kinds of horizontal operations which are useless except for a couple of applications which end up in...
  7. C

    Nvidia GT300 core: Speculation

    The first Larrabee paper stated that communication among the four hardware threads of a core went through a queue updated with the CMPXCHG instruction without using the LOCK prefix. This is possible because the four logical threads running on the hardware context (1 FE and 3 BE using Intel's...
  8. C

    AMD: R8xx Speculation

    1200 ALUs means 15 SIMDs (each one 5x16), that doesn't match well with 48 TUs.
  9. C

    AMD: R8xx Speculation

    That's interesting, so there's more to it than the use of a forwarding network and those could be real registers after all. I stand corrected :) BTW as a compiler writer I'd love to see the algorithm they are using in the shader compiler for register allocation. Modeling those 'registers' in...
  10. C

    AMD: R8xx Speculation

    If the instruction scheduling is completely static and predictable then it's not a 'trick', it's a natural consequence of the hardware design. The ISA is actually exposing the fact that you can read your operands right out of the forwarding network in a predictable manner instead of reading them...
  11. C

    AMD: R8xx Speculation

    Those aren't registers, it's the forwarding network. AMD can use it because the instruction scheduling inside a clause is completely predictable, so a value can be pulled straight out of the forwarding network w/o having it written to a register. The fact that it is presented in the assembler code as...
  12. C

    traditional subtracting bright pass filter and LDR

    That should be compiled as a branchless conditional select operation even on modern hardware and should work on any SM2+ hardware AFAIK.
  13. C

    Larrabee at GDC 09

    The memory cells alone of 256 KiB of L2 using 6T SRAM would be over 12 million transistors: 256 * 1024 (bytes) * 8 (bits) * 6 (transistors) ~= 12.6 million transistors. That's for a non-ECC-protected, non-redundant L2. In practice you cannot do without some kind of data protection and...
  14. C

    ATI RV740 review/preview

    That's not FUD, that's marketing.
  15. C

    New blog from Nvidias new chief engineer.

    Larrabee seems able to execute one scalar instruction or vector store in the first pipe and one vector instruction (which might be a load or load+op instruction) in the second pipe. As you guessed, for purely scalar code it's a single-issue x86.
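
The back-of-the-envelope numbers in items 8 and 13 above are easy to check. Here is a minimal sketch, not taken from the posts themselves: the function names `sram_cell_transistors` and `simd_count` are made up for illustration, and the only inputs are the figures quoted in the snippets (6T cells, 256 KiB, 1200 ALUs, 5x16 ALUs per SIMD, 48 TUs).

```python
def sram_cell_transistors(kib, transistors_per_bit=6):
    """Transistors in the data cells only (hypothetical helper).

    Ignores tags, ECC, redundancy and peripheral logic, exactly like
    the estimate quoted in the post.
    """
    return kib * 1024 * 8 * transistors_per_bit


def simd_count(total_alus, lanes=16, alus_per_lane=5):
    """Split an ALU total into SIMDs of lanes x alus_per_lane (hypothetical helper).

    Returns (number of whole SIMDs, leftover ALUs).
    """
    return divmod(total_alus, lanes * alus_per_lane)


l2_cells = sram_cell_transistors(256)
simds, leftover = simd_count(1200)

print(l2_cells)    # 12582912, i.e. roughly 12.6 million
print(simds)       # 15
print(48 % simds)  # 3, so 48 TUs do not divide evenly across 15 SIMDs
```

The nonzero remainder in the last line is one way to read the "doesn't match well" remark: 48 texture units cannot be split evenly over 15 SIMDs.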