Why? Look at most of the games released on the PC today: they are console ports. Why would this trend reverse?
Well, at least we know that *today* we don't need complex pixel shaders to sell well.
I haven't got the slightest idea of what will happen in the computer market...
Will they? Considering that the fastest-selling console uses a glorified DX7-level chip, I wouldn't be so sure of it. With current engines and techniques the art production pipeline is the largest money sink in many games, and I'm unsure how far it can be taken without making game development...
Is that the main cause for the quick demise of GDDR4? I was under the impression that it was more of a mix of factors, including the lack of support from nVidia, a relatively short lifetime before the introduction of GDDR5, and higher-than-predicted scaling of GDDR3. BTW I was surprised to find...
Actually it can have more instructions than that in flight; think for example of the branch waiting on a result from memory: the core can happily fill the entire ROB (128 entries IIRC). It can also have many outstanding speculated branches, not just one (can't remember how many though).
Texture decompression also comes to mind, the way it's done in current graphics hardware would fare very poorly in software. Different approaches would work well without specific decompression hardware and provide the same compression ratios as well as equal or better fidelity. Vector...
Not really, after a ridiculous number of iterations SSEx remains terribly non-orthogonal. Heck, there's a lot of stuff which was in AltiVec in '99 which is not yet in SSEx and instead we got all kind of horizontal operations which are useless except for a couple of applications which end up in...
The first Larrabee paper stated that communication among the four hardware threads of a core went through a queue updated with the CMPXCHG instruction without using the LOCK prefix. This is possible because the four logical threads running on the hardware context (1 FE and 3 BE using Intel's...
That's interesting, so there's more to it than the use of a forwarding network and those could be real registers after all. I stand corrected :) BTW as a compiler writer I'd love to see the algorithm they are using in the shader compiler for register allocation. Modeling those 'registers' in...
If the instruction scheduling is completely static and predictable then it's not a 'trick', it's a natural consequence of the hardware design. The ISA is actually exposing the fact that you can read your operands right out of the forwarding network in a predictable manner instead of reading them...
Those aren't registers; it's the forwarding network. AMD can use it because the instruction scheduling inside a clause is completely predictable, and so a value can be pulled straight out of the forwarding network w/o having it written to a register. The fact that it is presented in the assembler code as...
Only the memory cells of 256 KiB of L2 using 6T SRAM would be over 12 million transistors:
256 * 1024 (bytes) * 8 (bits) * 6 (transistors) ~= 12.6 million transistors
That's for a non-ECC-protected, non-redundant L2. In practice you cannot do without some kind of data protection and...
Larrabee seems able to execute one scalar instruction or vector store in the first pipe and one vector instruction (which might be a load or load+op instruction) in the second pipe. As you guessed for purely scalar code it's a single-issue x86.