#1 |
|
Senior Member
Join Date: Jan 2003
Location: Ottawa, Ontario
Posts: 1,783
|
Hi all,
Since Intel's 22 nm FinFET process technology will be production-ready at about the same time as TSMC's 28 nm process, I was wondering if this means Intel is actually two generations ahead now.
I think this could give them the opportunity to launch an improved Larrabee product. The inherent inefficiency of such a highly generic architecture at running legacy games could be compensated for by the sheer process advantage. Other applications and games could potentially be leaps ahead of those running on existing GPU architectures (ray tracing being just one example out of thousands).
For consoles in particular this could be revolutionary. They need a lot of flexibility to last for many years, and the software always has to be rewritten from scratch anyway, so it can make direct use of Larrabee's capabilities (instead of taking detours through restrictive APIs).
It seems to me that the best way for AMD and NVIDIA to counter this is to create their own fully generic architecture based on a more efficient ISA.
Thoughts?
Nicolas
#2 |
|
Beyond3d isn't defined yet
Join Date: Jan 2008
Location: New Zealand
Posts: 3,037
|
Maybe we'll see a rebirth of 'Larrabee in consoles'?
__________________
It all makes sense now: Gay marriage legalized on the same day as marijuana makes perfect biblical sense. Leviticus 20:13 "A man who lays with another man should be stoned". Our interpretation has been wrong all these years! |
#3 |
|
Junior Member
Join Date: May 2005
Posts: 26
|
This is the latest Knights Ferry tech demo I could find: real-time ray tracing of Wolfenstein running on four Knights Ferry servers, and to be honest it still looks like shit...
http://www.youtube.com/watch?v=XVZDH15TRro
Based on that, I don't think Intel offers a viable solution for next-gen consoles. 22nm won't help that much, I think.
#4 |
|
Senior Member
Join Date: Sep 2003
Location: Well within 3d
Posts: 4,071
|
Larrabee in this context would probably be compared in terms of its rasterization rates, not ray-tracing. The ROI for ray-tracing at this point would be an exercise in how many stumbling blocks you can put in the way of a good process.
Using Larrabee primarily as a software rasterizer would probably give more competitive results, given the workloads it would likely encounter.
The FinFET's benefits are interesting to consider. At the same process node for a low-power device, the 20-30% gain in power efficiency could negate the ~20% inefficiency of being x86 versus some other, less cumbersome ISA. GPUs would probably sit in a higher-voltage realm, where the benefits are over 18% but probably less than the maximum 50% improvement over 32nm.
Density-wise, it would be an improvement. Historically, though, I would not characterize Intel's density in this particular segment as an advantage, even with a node advantage. Cayman, for example, is much denser than Sandy Bridge. Larrabee's density was pretty bad, but this may have been due to a lack of optimization in physical design. Without knowing how much Intel would try to optimize, 22nm would still leave Larrabee at a marked disadvantage. I would be mildly curious whether it would beat the densest 40nm GPUs.
As an aside, Intel will have the distinction of having the first 22nm GPU in IB.
The power advantages of the process would be notable if Larrabee were facing off against a similar generic manycore with only a different ISA. While the ISA probably contributed a measurable deficit to the power and performance gap, I have stated my suspicions before that it's really not the biggest factor.
The possible longer maturation period for the novel process may delay the deployment of a chip of Larrabee's size.
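One rough way to sanity-check the FinFET-versus-x86 trade-off mentioned above is to treat both effects as multiplicative factors on performance per watt. That multiplicative model is an assumption of this sketch rather than something stated in the post, and the function name is purely illustrative.

```python
def net_perf_per_watt(process_gain: float, isa_penalty: float) -> float:
    """Relative perf/W versus a non-x86 design without the process gain.

    process_gain: fractional perf/W improvement from the process (0.20 = 20%).
    isa_penalty:  fractional perf/W loss attributed to x86 (0.20 = 20%).
    Assumption: the two effects are independent and simply multiply.
    """
    return (1.0 + process_gain) * (1.0 - isa_penalty)


# The 20-30% low-power gain and ~20% x86 penalty quoted in the post.
for gain in (0.20, 0.30):
    ratio = net_perf_per_watt(gain, 0.20)
    print(f"process gain {gain:.0%}: net perf/W = {ratio:.2f}x")
```

Under these assumptions the result lands between roughly 0.96x and 1.04x, which is about break-even and consistent with the "could negate" wording.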
__________________
Dreaming of a .065 micron etch-a-sketch. |
#5 |
|
Senior Member
|
It is not at all obvious that a 22nm LRB will be a straightforward scale-up of the 45nm LRB. They may very well choose to constrain the architecture or ditch x86 for the next rev.
#6 |
|
Senior Member
Join Date: Sep 2003
Location: Well within 3d
Posts: 4,071
|
Larrabee 3 could be very different.
If there is to be a GPU card based on Larrabee, I'm not sure it would depart from Intel's "x86 above all else" mantra, and the returns on a new ASIC taking on established titans may not be too great.
My question is whether Intel even wants to make a discrete card anymore. It still seems to be pitting the onboard GPUs in its current and future CPUs against what should have been the introduction of on-die Larrabee(ish?) cores. A lack of consistency and support could lead to a repeat of the original embarrassment.
#7 | ||
|
Senior Member
|
Quote:
Quote:
#8 |
|
Member
Join Date: Jun 2007
Posts: 263
|
In my opinion, Intel has completely abandoned the idea of producing Larrabee GPUs. If Larrabee were a viable architecture, Intel would have used it in Ivy Bridge, but they didn't.
#9 |
|
Member
Join Date: Jun 2007
Posts: 263
|
And personally I would have loved for it to be used in Ivy Bridge, because for us rendering specialists it is a dream.
#10 | |
|
Senior Member
Join Date: Jan 2003
Location: Ottawa, Ontario
Posts: 1,783
|
Quote:
AVX is specified to support register widths up to 1024 bits. So they could relatively easily execute 1024-bit vector operations on the currently present 256-bit execution units, in 4 cycles (throughput). The obvious benefit to this is power efficiency. Then all that's left to add is gather/scatter support and the IGP can be eliminated, leaving a fully generic architecture that is both low latency and high throughput. Larrabee in your CPU socket, without compromises. |
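The "1024-bit operation on a 256-bit unit in 4 cycles" idea can be pictured with a toy model that pushes a wide vector through a narrower execution unit one slice at a time. This is only an illustrative sketch: the function name and parameters are made up here, and it says nothing about how real hardware would sequence the work.

```python
from typing import Callable, List, Tuple


def execute_wide_op(
    a: List[float],
    b: List[float],
    op: Callable[[float, float], float],
    unit_bits: int = 256,
    lane_bits: int = 32,
) -> Tuple[List[float], int]:
    """Apply op lane by lane, handling only unit_bits worth of lanes per pass.

    Returns the full result and the number of passes (a stand-in for cycles
    of throughput in this simplified model).
    """
    if len(a) != len(b):
        raise ValueError("operands must have the same number of lanes")
    lanes_per_pass = unit_bits // lane_bits  # 8 single-precision lanes for 256 bits
    result: List[float] = []
    passes = 0
    for start in range(0, len(a), lanes_per_pass):
        chunk_a = a[start:start + lanes_per_pass]
        chunk_b = b[start:start + lanes_per_pass]
        result.extend(op(x, y) for x, y in zip(chunk_a, chunk_b))
        passes += 1
    return result, passes


# A 1024-bit register holds 32 single-precision lanes.
a = [float(i) for i in range(32)]
b = [1.0] * 32
total, passes = execute_wide_op(a, b, lambda x, y: x + y)
print(passes)
```

With 32 single-precision lanes and eight lanes per pass, the model takes four passes, matching the "4 cycles (throughput)" figure in the post.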
#11 |
|
Senior Member
|
They did claim a 50+ core part would be out on 22nm.
#12 |
|
Junior Member
Join Date: Jul 2010
Posts: 31
|
>They did claim a 50+ core part would be out on 22nm.
But it's not a GPU as such. It's purely a computing accelerator to compete with Nvidia Teslas. I don't see how that can be commercially viable without a mass-market GPU product line to pay for the development costs.
#13 |
|
Senior Member
|
Entirely depends on just how competitive their renderer is.
#14 |
|
Member
Join Date: Jun 2007
Posts: 263
|
Also depends on how deep their pockets are, think Itanium.
#15 |
|
Senior Member
|
Well, then they would definitely release a discrete part.
#16 |
|
Member
Join Date: Mar 2004
Posts: 751
|
Didn't they basically say that they carved it up as experience for future IGP designs?
However, a dedicated card that could be used as a GPGPU for professionals would be grand. Many professional applications (video editing, photography, etc.) could really use the power. Nothing I hate more than waiting for a render to complete.
__________________
Never Argue With An Idiot. They'll Lower You To Their Level And Then Beat You With Experience! |
#17 |
|
Senior Member
|
The way you're putting it makes it sound so easy…
__________________
English is not my native tongue. Before flaming, please consider the possibility that I did not mean to say what you might have read from my posts.
Work | Recreation
Warning! This posting may contain unhealthy doses of gross humor, sarcastic remarks and exaggeration!
#18 |
|
Ohio frog
Join Date: Jun 2005
Location: Ohio, USA
Posts: 4,172
|
I have a couple of "honest" questions. Some here are real software developers, like Nick; others seem to know their fair share about hardware/microelectronics and software. I'm just a geek, so no offence.
First, in regard to the comparison between SwiftShader and the Intel HD 3000: there's a 5x difference in the 3DMark06 score. OK.
* What is the cost of running SwiftShader "itself" on the CPU? Is it in the same ballpark as running the HD 3000 drivers? Or higher, and if so, significantly?
* In regard to power consumption, what is the usual power consumption of a CPU running 3DMark06 with a discrete GPU? I think it would be fair to account for the incompressible/fixed CPU cost of running something like 3DMark.
* Overall, can we consider the total cost (in power and compute) of SwiftShader to be in the same ballpark as the drivers?
* Another thing: the HD 3000 is not tiny by any means. If this floorplan is correct, it looks more like the equivalent of ~2 cores:
[floorplan image]
Overall it would be fairer to compare a quad-core to a dual-core + IGP. From a customer's POV, which serves best: a quad-core, or a dual-core + (shitty anyway) IGP? In regard to power, how does an HD 3000 compare to two SnB cores? I guess that's tough to find out. Anyway, the IGP is likely way better in performance per watt, by quite a healthy margin.
Some questions more specifically aimed at you, Nick:
* Is SwiftShader optimized for AVX already?
* What are your expectations for, say, 3DMark06 if it were implemented, if not straight to the metal, then using various libraries? How close do you think it would come to the IGP/HD 3000?
* Say a benchmark or game were designed with a CPU as the hardware target: how close do you think the end result would come to an IGP (the HD 3000 can serve as a reference)? Say you skip some calculations and use more complex, bigger data structures with more precomputed values, and make sacrifices or use clever tricks elsewhere. Devs could count on 4 GB or more of RAM, lots of cache, etc. Basically, do you think it would be possible for a quad-core to achieve the "same" result as an IGP + dual-core?
__________________
What's trying to be a bunch of presentations PS360 youtube channel Sebbbi about virtual texturing Tuned EADGCF and liking it :) Last edited by liolio; 13-May-2011 at 15:16. |
#19 |
|
Senior Member
|
It's more like 1.5. You have to count the L3 cache per core as well.
#20 |
|
Member
|
#21 |
|
Senior Member
Join Date: Sep 2003
Location: Well within 3d
Posts: 4,071
|
The GPU has access to the L3. The L3's dimensions are determined by the cores in SB. The tiny L2 on SB and its advanced power gating rely on there being an L3 tile per core.
#22 | |||||
|
Senior Member
Join Date: Jan 2003
Location: Ottawa, Ontario
Posts: 1,783
|
Quote:
Quote:
That said, some reviews report that Intel puts a lot of load on the CPU while rendering 3D graphics: CPU Usage in Graphics. Some even claim all geometry shaders execute on the CPU. In any case, to objectively compare pure software rendering against the IGP, I don't think we can neglect the many roles the CPU still plays in assisting the IGP. Unfortunately I don't have a Sandy Bridge system myself, so I can't provide any accurate numbers.
Quote:
Quote:
Just look at the sheer computing power. An i7-2600 can do 218 GFLOPS (not counting any turbo mode). At 800x600, that's a staggering 450,000 floating-point operations per pixel per second, or a budget of 15,000 operations per pixel at 30 frames per second. Currently a lot of this power goes to waste, though, because of the lack of gather/scatter (forcing some memory accesses to be serial scalar operations), and because the API demands certain detours.
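The per-pixel budget quoted above can be checked with a few lines of arithmetic. The breakdown of the peak figure (4 cores at a 3.4 GHz base clock, each issuing one 8-wide single-precision add and one 8-wide multiply per cycle) is an assumption of this sketch about where the 218 GFLOPS comes from; the post itself only gives the total, and the function name is illustrative.

```python
# Assumed breakdown of the i7-2600 peak figure (not stated in the post).
CORES = 4
BASE_CLOCK_HZ = 3.4e9
FLOPS_PER_CYCLE = 16  # 8-wide AVX add + 8-wide AVX multiply, single precision


def per_pixel_budget(peak_flops: float, width: int, height: int, fps: float):
    """Return (operations per pixel per second, operations per pixel per frame)."""
    pixels = width * height
    per_second = peak_flops / pixels
    per_frame = per_second / fps
    return per_second, per_frame


peak_flops = CORES * BASE_CLOCK_HZ * FLOPS_PER_CYCLE
per_second, per_frame = per_pixel_budget(peak_flops, 800, 600, 30)

print(f"peak: {peak_flops / 1e9:.1f} GFLOPS")
print(f"per pixel per second: {per_second:,.0f}")
print(f"per pixel per frame at 30 fps: {per_frame:,.0f}")
```

This gives about 217.6 GFLOPS, roughly 453,000 operations per pixel per second and roughly 15,100 per pixel per frame, in line with the rounded figures in the post. It is a theoretical peak, so real workloads would see less.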
#23 |
|
Member
|
So the more cores, the more L3. And the GPU has access.
Are there any tests showing the same GPU in 2- vs 4-core Sandy Bridge configurations?
#24 |
|
Senior Member
|
The additional SB cores can also access other cores' tiles, but they have to go the long way, increasing latency. It's not like the IGP suddenly has more memory for itself.
#25 |
|
Senior Member
|
AFAIK, that access is read-only.