#751
Regular
I think the problem lies elsewhere: e.g. a $2 billion yearly TAM, say, for performance/enthusiast discrete just isn't worth chasing compared with server/cloud/HPC. Also, life's simpler for Intel if it doesn't have to write D3D drivers. There was always a question hanging over the architecture about how long it would take Intel to get a game's performance right, with worrying statements that it could take months after a game's release. (AMD doesn't seem to have a much different attitude, though.)
__________________
Can it play WoW?

#752
Senior Member

#753
Regular
The serialisation is actually per tile-pixel (or more granular, e.g. per tile quad), and it is local to a single core, since tiles in rasterisation (the stages after setup through to the back-end) don't span cores.
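A minimal sketch of what per-tile serialisation looks like in a binned renderer, assuming a simple bounding-box binner: each triangle's index is appended, in submission order, to the bin of every screen tile its bounding box touches, and each tile is owned by exactly one core. Ordering then only has to hold inside a bin. The 64-pixel tile size and the `core_for_tile` mapping are hypothetical choices for illustration, not Larrabee's actual parameters.

```python
from collections import defaultdict

TILE = 64  # hypothetical tile size in pixels


def tiles_touched(tri, tile=TILE):
    """Tiles (tx, ty) overlapped by a triangle's screen-space bounding box."""
    xs = [v[0] for v in tri]
    ys = [v[1] for v in tri]
    x0, x1 = int(min(xs) // tile), int(max(xs) // tile)
    y0, y1 = int(min(ys) // tile), int(max(ys) // tile)
    return [(tx, ty) for ty in range(y0, y1 + 1) for tx in range(x0, x1 + 1)]


def bin_triangles(tris, tile=TILE):
    """Append each triangle index to every bin it touches, in submission order."""
    bins = defaultdict(list)
    for idx, tri in enumerate(tris):
        for t in tiles_touched(tri, tile):
            bins[t].append(idx)
    return bins


def core_for_tile(t, num_cores):
    """Hypothetical static mapping: every tile belongs to exactly one core,
    so a tile's bin is consumed (and serialised) on that core only."""
    tx, ty = t
    return (tx * 31 + ty) % num_cores


if __name__ == "__main__":
    tris = [((10, 10), (50, 10), (10, 50)),   # fits inside tile (0, 0)
            ((60, 10), (100, 10), (60, 40))]  # straddles tiles (0, 0) and (1, 0)
    bins = bin_triangles(tris)
    for t, idxs in sorted(bins.items()):
        print(t, "core", core_for_tile(t, 4), "triangles", idxs)
```

Triangle 1 lands in two bins, but each copy only needs to be ordered relative to the other triangles in that same bin, so no ordering has to be enforced across cores.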

#754
Senior Member
Join Date: Sep 2003
Location: Well within 3d
Posts: 4,071
Is that assuming the implementation placed the tessellation stage in the front-end and not the back-end?
It would have been an interesting exercise to see what numbers Larrabee could have pulled in Heaven, its applicability to current workloads aside, and assuming the software renderer had been coded to be functionally compliant with the DX11 spec. This latest Intel statement is far more negative on Larrabee graphics than anything I've seen so far, and a noticeable step down from a position I already considered rather lukewarm. I suppose Tim Sweeney will need to wait a little longer for his software rendering dream to come true.
__________________
Dreaming of a .065 micron etch-a-sketch.

#755
Regular
Tessellation was very much an open question; I don't remember any of Intel's materials covering it.

#756
Senior Member
The option existed to run the tessellation stages either in the front-end or in the back-end.
Whether there was ever an implementation of it for Larrabee is something I do not know, but Intel did discuss the possibility. If a primitive is allocated to a bin and the back-end is responsible for performing tessellation, the triangles generated on one core could cross the bin's tile boundaries.
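To illustrate the bin-crossing problem, here is a sketch assuming the binner assigned a patch to tiles using the bounding box of its undisplaced control triangle, and that a back-end tessellation step then subdivides the triangle uniformly and applies a displacement standing in for a domain shader. `tessellate`, the displacement function and the 64-pixel tile are hypothetical; the only point is that generated triangles can land in tiles the patch was never binned to.

```python
TILE = 64  # hypothetical tile size in pixels


def tiles_touched(tri, tile=TILE):
    """Set of tiles (tx, ty) overlapped by a triangle's bounding box."""
    xs = [v[0] for v in tri]
    ys = [v[1] for v in tri]
    return {(tx, ty)
            for ty in range(int(min(ys) // tile), int(max(ys) // tile) + 1)
            for tx in range(int(min(xs) // tile), int(max(xs) // tile) + 1)}


def tessellate(tri, level):
    """Uniformly subdivide a triangle into level * level sub-triangles."""
    a, b, c = tri

    def point(i, j):
        # Barycentric grid point: i steps along a->b, j steps along a->c.
        u, v = i / level, j / level
        return (a[0] + u * (b[0] - a[0]) + v * (c[0] - a[0]),
                a[1] + u * (b[1] - a[1]) + v * (c[1] - a[1]))

    out = []
    for i in range(level):
        for j in range(level - i):
            out.append((point(i, j), point(i + 1, j), point(i, j + 1)))
            if i + j < level - 1:  # the second ("downward") triangle of this grid cell
                out.append((point(i + 1, j), point(i + 1, j + 1), point(i, j + 1)))
    return out


def escaping_triangles(patch, level, displace, tile=TILE):
    """Displaced sub-triangles that touch a tile outside the patch's original bin."""
    bin_tiles = tiles_touched(patch, tile)
    escaped = []
    for tri in tessellate(patch, level):
        moved = tuple(displace(p) for p in tri)
        if not tiles_touched(moved, tile) <= bin_tiles:
            escaped.append(moved)
    return escaped


if __name__ == "__main__":
    patch = ((4.0, 4.0), (40.0, 4.0), (4.0, 40.0))  # binned to tile (0, 0) only
    bump = lambda p: (p[0], p[1] + 1.5 * p[0])     # hypothetical displacement
    print(len(escaping_triangles(patch, 4, bump)), "of", len(tessellate(patch, 4)), "escape")
```

With no displacement every sub-triangle stays inside the original bin; it is the displacement applied after binning that pushes geometry across tile boundaries.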

#757
Unknown.
Join Date: Aug 2002
Location: UK
Posts: 4,877
Logarithmic shadow maps: now even more of a pipe dream! Oh well.
Anyway... if you look at communications, the vast majority of software-centric architectures still do problematic algorithms like Turbo coding and Viterbi decoding in hardware blocks. But there are exceptions that do those very efficiently in software; the trick is that their architecture is incredibly unusual and very different from a traditional processor, even though afaict it could rightfully be called Turing complete (as long as you look at a large enough piece of it rather than just a subsystem).

The basic problem with graphics is that the number of blocks that would benefit from such exotic architectures is actually very small, and their data flow is very complex (rasterisation being the poster child). Going down that route would also create a lot of complexity in the compiler for more normal shading workloads, so overall it just doesn't make sense, and the best approach remains fixed-function.

The one thing Larrabee did provide above and beyond any current desktop GPU architecture is scalar/MIMD, and interestingly on-core rather than as a separate on-chip block. I'm honestly unsure whether on-core SIMD+MIMD offers much benefit in either graphics or GPGPU compared to separate SIMD and MIMD cores, but a frequent problem with the latter in 80s/90s architectures was the lack of bandwidth between the scalar and vector parts. With the power cost of data communication, even on-chip, rising to dramatic levels, there might be something to be said for integrating the two on-core not (just?) at the software level but at the hardware level. Some sort of close coupling would make sense, at least. Of course, ideally we'd all go pure MIMD. Rys, can I haz Series6?
__________________
Focusing on non-graphics projects in 2013 (but I still love triangles) "[...]; the kind of variation which ensues depending in most cases in a far higher degree on the nature or constitution of the being, than on the nature of the changed conditions."

#758
Regular
The patches should be able to run in parallel through VS/HS to generate input to TS and DS. Ordering of the triangles coming out of DS should be keyed by patch ID, I presume (with TS generating a sub-patch triangle ID). Is there a serialisation I'm missing? Anyway, screen-space tiling for binning of triangles involved in tessellation (input or output) would be done post-GS.
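The keying presumed above can be sketched as follows: patches are processed concurrently and may finish in any order, and API order is restored by sorting on (patch ID, sub-patch triangle ID). `tessellate_patch` is a hypothetical stand-in for the VS/HS/TS/DS work on one patch, and the sketch assumes sub-patch IDs are generated deterministically within each patch; it says nothing about how real D3D11 hardware orders its output.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def tessellate_patch(patch_id, num_tris):
    """Hypothetical VS/HS/TS/DS stand-in: emits (patch_id, sub_id, triangle)."""
    time.sleep(random.random() * 0.002)  # jitter so patches complete out of order
    return [(patch_id, sub_id, f"tri{patch_id}.{sub_id}") for sub_id in range(num_tris)]


def run_patches(patch_sizes, workers=4):
    """Run all patches in parallel, then restore API order by (patch ID, sub-patch ID)."""
    results = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(tessellate_patch, pid, n) for pid, n in enumerate(patch_sizes)]
        for fut in as_completed(futures):  # arbitrary completion order
            results.extend(fut.result())
    results.sort(key=lambda r: (r[0], r[1]))
    return [tri for _, _, tri in results]


if __name__ == "__main__":
    print(run_patches([2, 3, 1]))
```

The full sort is just the simplest correct choice for a sketch; a streaming version could instead release a patch's triangles as soon as every earlier patch has completed.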

#759
Senior Member
Tom Forsyth's SIGGRAPH 2008 presentation touted the flexibility of assigning stages to either the front-end or the back-end.
Included in that set are GS and tessellation.

#760
Regular
Maybe there are some other uses of GS that are amenable to delayed execution (e.g. generating attributes)? --- By the way, the term "rasteriser" is often used to describe all of these stages: setup->rasterisation->pixel shading->output merger (ROP). So it's possible to interpret the statement about the lack of a fixed-function rasteriser as actually describing the lack of "setup->rasterisation->pixel shading->output merger". To be honest, I think this is very likely the correct interpretation. I pretty much always thought it would be years before Intel was competitive at the enthusiast end, but that process technology would eventually let it catch up. A major question for the other IHVs is what proportion of die space ends up being programmable compute: the higher that rises, the more competitive Intel becomes.

#761
Senior Member
A good amount of the uncore would need to scale as well, otherwise the compute portion would be strangled. x86 penalty aside, the decision to use full cores for that 2/3 of the die also contributed to the size and power concerns. There can be programmable processing units either way, but past a certain number of fully fledged CPU cores, the utility of adding even more would have diminished. There was a lot of front-end and support silicon for the amount of vector resources one got per core.

#762
Epsilon plus three
Join Date: Feb 2002
Location: Chania
Posts: 7,762
Stupid acronyms aside, he won't be telling you, but my old sniffing nose tells me that one of the aces of S6 made Intel sit up when they first saw it. Wrong thread, bad timing, and I urgently need some coffee.
__________________
People are more violently opposed to fur than leather; because it's easier to harass rich ladies than motorcycle gangs.

#763
Member
Join Date: Mar 2010
Posts: 195
http://www.semiaccurate.com/forums/s...ead.php?t=3361 On topic, it's a bit sad that Intel pushed Larrabee back indefinitely; it would be nice to have a third player (or even a second, since NV's future in the mass-market GPU business is a bit unclear atm), even though in the beginning they would be behind the competition. But since GPUs are becoming more and more programmable, IMO it's just a matter of time until they release it, even if it will be Larrabee's 10th incarnation. P.S. I can almost see Tim Sweeney crying somewhere.

#764
yes, i'm drunk
Apparently Knights Ferry, despite being marketed a bit differently, is the first iteration of Larrabee as-is, so there might be a chance for future development to enter graphics markets too?
__________________
I'm nothing but a shattered soul... Been ravaged by the chaotic beauty... Ruined by the unreal temptations... I was betrayed by my own beliefs...

#765
Member
http://www.pcworld.com/businesscente..._32_cores.html

#766
Senior Member
Didn't feel like making a new thread for this.
So cleaner than LRB1. What might be considered ugly about LRB1's scatter/gather? It seemed very nice to me. O(1) time gather for any datum in L1, unlike LRB1, which had O(n).
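A rough model of the O(n) behaviour, assuming a gather that services every pending lane falling into one cache line per iteration and then clears those lanes from its mask. The 4-byte elements and 64-byte lines are assumptions, and `masked_gather` is a hypothetical emulation, not the actual LRBni instruction: contiguous indices finish in one step, while a stride of one element per cache line takes one step per lane.

```python
LINE_BYTES = 64  # assumed cache-line size
ELEM_BYTES = 4   # assumed 32-bit elements


def masked_gather(memory, indices, line_bytes=LINE_BYTES, elem_bytes=ELEM_BYTES):
    """Emulate a per-cache-line gather loop; returns (gathered values, iterations)."""
    pending = set(range(len(indices)))
    result = [None] * len(indices)
    steps = 0
    while pending:
        # Take the lowest pending lane and service every lane in its cache line.
        line = indices[min(pending)] * elem_bytes // line_bytes
        for lane in sorted(pending):
            if indices[lane] * elem_bytes // line_bytes == line:
                result[lane] = memory[indices[lane]]
                pending.discard(lane)
        steps += 1
    return result, steps


if __name__ == "__main__":
    mem = list(range(1000))
    print(masked_gather(mem, list(range(16)))[1])             # unit stride
    print(masked_gather(mem, [16 * i for i in range(16)])[1])  # one lane per line
```

In this model the cost is simply the number of distinct cache lines the lanes touch, so the worst case grows linearly with vector width.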

#767
Senior Member
I'm not sure how to interpret "LRB'3".
LRB's or LRB3? Edit: "Clean" may mean done in hardware, without microcode or software-with-hardware-assist.

#768
Senior Member
IIRC, Charlie has often referred to the second coming of LRB as LRB3.0. Apparently LRB1.0 was uncovered and canned, and LRB2.0 went with it. The third version was supposed to be the game changer. We'll see.

#769
Senior Member
The award was for 2009, while the cancellation of the first LRB took place in 2010.
I find the time frame interesting, since it would imply significant progress in implementing the design by the time the shuffle was reported.

#770
Junior Member
Join Date: Aug 2003
Posts: 64
That is one of the authors of a paper I linked to back in 2008.
http://forum.beyond3d.com/showpost.p...2&postcount=31 "Atomic Vector Operations on Chip Multiprocessors" http://doi.acm.org/10.1145/1394608.1382154

#771
Regular
Generally impossible without a generic n-ported cache, but you might be able to implement an n-banked cache (like the local data stores in GPUs) cheaply to get it down to O(log n) for the general case of cache hits (or even better with some buffering of accesses), and O(1) for any access that hits each bank at most once. Being able to service cache accesses that fast can interact in annoying ways with coherency, though (up to n times the traffic).
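A banked cache can be modelled in a few lines. This sketch assumes word-interleaved banks that each deliver one distinct word per cycle, with lanes reading the same word sharing it by broadcast, so the cost of a gather is the largest number of distinct words any single bank must supply. `banked_gather_cycles`, the 16 banks and 4-byte words are hypothetical parameters. It shows the best case (one cycle when every lane hits a different bank) and the worst case (n cycles when all lanes hit different words in one bank); it does not model the O(log n) general-case figure, access buffering, or the coherency traffic.

```python
from collections import defaultdict


def banked_gather_cycles(addresses, num_banks=16, word_bytes=4):
    """Cycles for one gather on a hypothetical n-banked cache."""
    words_per_bank = defaultdict(set)
    for addr in addresses:
        word = addr // word_bytes
        words_per_bank[word % num_banks].add(word)  # same word twice = broadcast
    return max((len(w) for w in words_per_bank.values()), default=0)


if __name__ == "__main__":
    print(banked_gather_cycles([4 * i for i in range(16)]))   # unit stride
    print(banked_gather_cycles([64 * i for i in range(16)]))  # every lane in bank 0
```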
__________________
Cinematic is the new streamlined.