Beyond3D Forum > Core Forums > 3D Architectures & Chips

Old 09-Jun-2009, 18:21   #1
rpg.314
Senior Member
 
Join Date: Jul 2008
Location: India
Posts: 1,294
Intel LRB - ditching x86?

Here

Quote:
One critical point we were told was that 1st and 2nd generation Larrabee GPUs will not be compatible with 3rd generation Larrabee.
Quote:
According to the data, Intel's 3rd generation part will have an emulation mode for backwards compatibility.
and here
Quote:
We were told that Larrabee is currently only capable of performance levels similar to Nvidia's GeForce GTX 285.
If it's the 600 mm2 part, who the hell do they expect to pay for this spectacular piece of crap?

EDIT: Perhaps then, they are holding out to release it on 32 nm.
__________________
My blog

Eigen2 : simd done right.
Old 09-Jun-2009, 18:32   #2
3dilettante
Senior Member
 
Join Date: Sep 2003
Location: Well within 3d
Posts: 2,535

It might be the other way around.

LRB new instructions are not compatible with the main line of x86 ISA extensions.
Intel is possibly holding back from deploying Larrabee more widely to avoid fracturing the field with another x86 extension.

If the third generation of Larrabee has cores that can be used in consumer CPUs, then it would probably happen after Larrabee and the x86 main lines hit some convergent ISA extension.
__________________
Dreaming of a .065 micron etch-a-sketch.
Old 09-Jun-2009, 18:58   #3
rpg.314
Senior Member
 
Join Date: Jul 2008
Location: India
Posts: 1,294

Well, if they wanted to introduce LRBni to the desktop, surely, they would have thought of it before they forked the x86 ISA. While they can emulate LRBni using AVX2 or some such thing, they may be actually trying to emulate the legacy x86 crap.
Old 09-Jun-2009, 19:16   #4
Panajev2001a
Senior Member
 
Join Date: Mar 2002
Posts: 3,124

Quote:
Originally Posted by 3dilettante View Post
It might be the other way around.

LRB new instructions are not compatible with the main line of x86 ISA extensions.
Intel is possibly holding back from deploying Larrabee more widely to avoid fracturing the field with another x86 extension.

If the third generation of Larrabee has cores that can be used in consumer CPUs, then it would probably happen after Larrabee and the x86 main lines hit some convergent ISA extension.
Please do correct me, because this might be very naive of me and I'd like feedback. Historically, the x86 front-end tax became a non-issue against the major RISC players because, as CPUs got larger and transistor budgets ballooned, a growing share of the chip's area went to cache, execution units, branch prediction, out-of-order issue and execution logic, etc. The x86 decoding part (essentially decoupled from the rest of the processor pipeline after the Pentium Pro days) became almost a non-factor in terms of CPU cost and chip real estate.

In the many-core era does not that problem get worse and worse?

Say that feature X costs you 0.01% of your total core's budget and you deploy 32 cores on a single chip or more... that seemed small for a single core, but over 32+ cores that does add up.

I like the idea of x86 compatibility, but I feel that technology wise ARM (Intel still has a license) could have helped make the chip smaller without sacrificing performance.

What do you think?
__________________
"Any idea worth a damn is already patented... twice" -Mfa
Old 09-Jun-2009, 19:18   #5
3dilettante
Senior Member
 
Join Date: Sep 2003
Location: Well within 3d
Posts: 2,535

Quote:
Originally Posted by rpg.314 View Post
Well, if they wanted to introduce LRBni to the desktop, surely, they would have thought of it before they forked the x86 ISA. While they can emulate LRBni using AVX2 or some such thing, they may be actually trying to emulate the legacy x86 crap.
Larrabee's somewhat of a bastard child of x86 that was conceived as a thing apart and not welcome even now.

Its support of x86 past the original Pentium is zero, and other elements of Intel do not want to expose it as a general programming target.

We'll also have to see where in the opcode space LRBni is allocated. Given the weak coordination between the main line and the Larrabee team, we don't know which LRB encodings might already be taken up by existing x86 extensions.
Old 09-Jun-2009, 19:41   #6
3dilettante
Senior Member
 
Join Date: Sep 2003
Location: Well within 3d
Posts: 2,535

Quote:
Originally Posted by Panajev2001a View Post
In the many-core era does not that problem get worse and worse?
Manycore does force things back to how they were before bloat hid the cost of ISA complexity, and its primary way of scaling means that the proportion does not shrink with transistor budgets.
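To put rough numbers on the two scaling regimes, here is a minimal Python sketch. The unit costs are made up (decode logic costing 1 unit inside a 10-unit simple core), and it assumes the decoder stays the same size while a single core grows, which is a simplification rather than anything measured.

```python
def decode_share_bigger_core(decode_cost, core_budget):
    """Share of the chip spent on decode when the budget goes into one larger core."""
    return decode_cost / core_budget


def decode_share_many_core(decode_cost, core_budget, n_cores):
    """Share of the chip spent on decode when the same small core is replicated."""
    return (n_cores * decode_cost) / (n_cores * core_budget)


# Hypothetical units: decode logic costs 1, a simple in-order core costs 10 in total.
DECODE, SMALL_CORE = 1.0, 10.0

# Single-core scaling: the core grows 32x (cache, OoO logic, ...), decode stays fixed.
single = decode_share_bigger_core(DECODE, SMALL_CORE * 32)

# Many-core scaling: 32 copies of the small core, each with its own decoder.
many = decode_share_many_core(DECODE, SMALL_CORE, 32)

print(f"one big core: {single:.2%} of the budget, 32 small cores: {many:.2%}")
```

With these made-up figures the decode share drops to a fraction of a percent in the single-core case, but stays at the original one tenth no matter how many cores are added.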

Quote:
I like the idea of x86 compatibility, but I feel that technology wise ARM (Intel still has a license) could have helped make the chip smaller without sacrificing performance.

What do you think?
A while back I tried to guess at what the cost could have been.
Comparing a Pentium to a roughly contemporaneous RISC led to an estimated 12-16% penalty in die area.
Some things have come to light that make the guess less applicable, such as the fact that Intel has narrowed the standard x86 issue capability to 1 instruction, with a possible commensurate reduction in the amount of hardware on the x86 side of the core.

edit:
Other confounding factors are that at the time I didn't know if there would be texturing hardware or what proportion of the die would be taken up by things other than cores, which reduces the proportion. It might be something like 10% in aggregate.
I don't have the data to calculate the power penalty, and power would be dominated by the vector unit anyway.
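The dilution argument is just a product of two fractions. A rough sketch in Python, treating the 12-16% estimate as a fraction of core area; the 70% core share of the die is an assumed figure picked for illustration, not something taken from a die shot.

```python
def aggregate_penalty(core_penalty, core_share_of_die):
    """Extra die area attributable to x86, as a fraction of the whole die.

    core_penalty: extra area of the x86 core, as a fraction of core area
    core_share_of_die: fraction of the die occupied by cores (assumed)
    """
    return core_penalty * core_share_of_die


# 12-16% is the per-core estimate from the post; 0.70 is a hypothetical core share.
ASSUMED_CORE_SHARE = 0.70

for penalty in (0.12, 0.16):
    total = aggregate_penalty(penalty, ASSUMED_CORE_SHARE)
    print(f"{penalty:.0%} per core -> {total:.1%} of the die")
```

Under that assumption the aggregate lands between roughly 8% and 11%, the same ballpark as the 10% guess. A smaller core share would push it lower.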

We wouldn't really know without Nvidia or somebody springing an ARM Larrabee on us.

edit edit:
Although, the die shots show that the L2 is smaller than 1/4 of the core+cache tile area, which means with the vector unit taking up 1/3, the area that is x86 is actually somewhat bigger than what I guessed at.
Last edited by 3dilettante; 09-Jun-2009 at 19:54.
Old 09-Jun-2009, 20:16   #7
TimothyFarrar
Member
 
Join Date: Nov 2007
Location: Santa Clara, CA
Posts: 427

"One critical point we were told was that 1st and 2nd generation Larrabee GPUs will not be compatible with 3rd generation Larrabee."

I'm missing why (even if this is true) one wouldn't expect something like this anyway. C++ with vector intrinsics isn't exactly designed to be ideal even within the same architecture over generations; we've got DX11/OpenGL/OpenCL for that (and even those aren't ideal). Given AMD's introduction of 64-bit extensions, different cacheline sizes, and all the changes to SSE over the years, x86 has never been future-safe. Even with x86's backwards compatibility, it has still been a good idea to recode performance-critical assembly or intrinsics code for different generations of x86 arch, so who cares if the Larrabee ISA changes with different generations!
__________________
Timothy Farrar :: blog
Old 09-Jun-2009, 20:25   #8
3dilettante
Senior Member
 
Join Date: Sep 2003
Location: Well within 3d
Posts: 2,535

There are subtle differences between new and older x86 chips in various places that can cause problems, even with backwards compatibility.

To state that there is an actual break in compatibility is indicative of something more substantial than a tightened specification or differing instruction corner cases.
It means something more significant might happen at that point for Larrabee, possibly more widespread use.
Old 09-Jun-2009, 20:40   #9
MfA
Senior Member
 
Join Date: Feb 2002
Posts: 3,656

Quote:
Originally Posted by 3dilettante View Post
If the third generation of Larrabee has cores that can be used in consumer CPUs, then it would probably happen after Larrabee and the x86 main lines hit some convergent ISA extension.
It would be a bit depressing to see the Larrabee ISA hobbled with all the superfluous legacy of umpteen generations of x86.
Old 09-Jun-2009, 20:50   #10
rpg.314
Senior Member
 
Join Date: Jul 2008
Location: India
Posts: 1,294

Quote:
I like the idea of x86 compatibility, but I feel that technology wise ARM (Intel still has a license) could have helped make the chip smaller without sacrificing performance.

I don't like the idea of x86 compatibility, but yes, an ARM LRB would be great. But even there, there are many questions. I.e., would you like to have Jazelle, Thumb, NEON, etc.?

Quote:
Although, the die shots show that the L2 is smaller than 1/4 of the core+cache tile area, which means with the vector unit taking up 1/3, the area that is x86 is actually somewhat bigger than what I guessed at.
OK, 1/4 cache + 1/3 VPU ~ 58% is useful area. The rest, the x86 part, takes up 42%, which is massive. Minimizing this waste could help LRB catch up. Remember, they (apparently) need 600 mm2 on 45 nm to catch up with a ~480 mm2 chip on 55 nm.
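For anyone who wants to check the arithmetic, a quick Python sketch using exact fractions. The 1/4 and 1/3 shares are the rough die-shot estimates quoted above (the L2 was described as smaller than 1/4, so the remainder is if anything a bit larger), and the die sizes are the rumoured figures from this thread, not confirmed numbers.

```python
from fractions import Fraction

l2_cache = Fraction(1, 4)      # upper bound from the die-shot estimate
vector_unit = Fraction(1, 3)   # VPU share of the core+cache tile

useful = l2_cache + vector_unit   # cache plus vector unit
rest = 1 - useful                 # everything else in the tile

print(f"useful: {useful} = {float(useful):.1%}")
print(f"rest:   {rest} = {float(rest):.1%}")

# Rumoured die sizes from the thread: Larrabee at 45 nm vs. the GTX 285 at 55 nm.
lrb_mm2, gtx285_mm2 = 600, 480
area_ratio = lrb_mm2 / gtx285_mm2
print(f"area ratio: {area_ratio:.2f}x")
```

So the split is 7/12 against 5/12, and the area ratio comes out at 1.25x before accounting for the process difference.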

And it would probably need 300W to stay alive.
Old 09-Jun-2009, 20:53   #11
rpg.314
Senior Member
 
Join Date: Jul 2008
Location: India
Posts: 1,294

Quote:
Originally Posted by MfA View Post
It would be a bit depressing to see the Larrabee ISA hobbled with all the superfluous legacy of umpteen generations of x86.
Hmm, let's see. MMX for float2, SSEx for float4, AVX for float8 and LRBni for float16.

Spaghetti, anyone?
Old 09-Jun-2009, 20:53   #12
3dilettante
Senior Member
 
Join Date: Sep 2003
Location: Well within 3d
Posts: 2,535

Quote:
Originally Posted by MfA View Post
It would be a bit depressing to see the Larrabee ISA hobbled with all the superfluous legacy of umpteen generations of x86.
I am currently at a loss for alternatives that would match making an incompatible x86 Larrabee chip.

Shifting the Larrabee instruction set so that it can be used more widely might justify breaking compatibility with the insular acceleration board variants.
There hasn't been any comment about something frightfully wrong with the current instructions, so why remove compatibility on a whim?
Old 09-Jun-2009, 21:00   #13
MfA
Senior Member
 
Join Date: Feb 2002
Posts: 3,656

They might go narrower
Old 09-Jun-2009, 22:44   #14
Nick
Senior Member
 
Join Date: Jan 2003
Location: Ghent, Belgium
Posts: 1,311

Quote:
Originally Posted by rpg.314 View Post
Hmm, let's see. MMX for float2, SSEx for float4, AVX for float8 and LRBni for float16.

Spaghetti, anyone?
You're confusing MMX with 3DNow! And AVX can be extended up to float32.
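The "floatN" shorthand is just register width divided by 32 bits. A small Python sketch of the mapping; MMX itself is integer-only, and 3DNow! is what put two packed floats in the same 64-bit registers. The 1024-bit entry is only the hypothetical extended width implied by "float32", not a shipping instruction set.

```python
FLOAT_BITS = 32  # single-precision float

# Register widths in bits. The last entry is hypothetical.
register_bits = {
    "3DNow!": 64,
    "SSE": 128,
    "AVX": 256,
    "LRBni": 512,
    "AVX (extended, hypothetical)": 1024,
}

# Number of single-precision lanes per register.
lanes = {name: bits // FLOAT_BITS for name, bits in register_bits.items()}

for name, count in lanes.items():
    print(f"{name}: float{count}")
```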
Old 09-Jun-2009, 22:59   #15
PhilTaylor
Junior Member
 
Join Date: Oct 2008
Location: Redmond area
Posts: 25

Quote:
Originally Posted by rpg.314 View Post
I'd take some of this reporting from Tom's with a grain of salt. As I said here, a few of their latest articles contradict each other.

Until Intel states what the ISA compatibility story is from LRB1 to LRB2 and LRB2 to LRB3, this story is just speculation.
__________________
http://www.futuregpu.net
ex-D3D/Flight Sim/MS
Old 09-Jun-2009, 23:57   #16
rpg.314
Senior Member
 
Join Date: Jul 2008
Location: India
Posts: 1,294

Quote:
Originally Posted by Nick View Post
You're confusing MMX with 3DNow! And AVX can be extended up to float32.
In a LRBni compatible way? I mean with all the masks and swizzles etc.?
Old 10-Jun-2009, 02:17   #17
Barbarian
Senior Member
 
Join Date: Jun 2005
Location: California, USA
Posts: 187

Quote:
Originally Posted by rpg.314 View Post
And it would probably need 300W to stay alive.
Now, now, this is pure speculation. Let's wait for the actual hardware before declaring winners.
I personally would be very impressed if they get a Larrabee implementation matching GTX 285, with a completely software DX10 driver, mind you.
If this kind of performance can be achieved while emulating a hardware oriented API, I can't wait to see what direct low level access can get you in terms of performance, quality and features.
Old 10-Jun-2009, 02:57   #18
rpg.314
Senior Member
 
Join Date: Jul 2008
Location: India
Posts: 1,294

Quote:
Originally Posted by Barbarian View Post
Now, now, this is pure speculation. Let's wait for the actual hardware before declaring winners.
Fair enough.
Quote:
I personally would be very impressed if they get a Larrabee implementation matching GTX 285, with a completely software DX10 driver, mind you.
To each his own. I'd be disappointed if they brought a product to market with ~1.3x the area on a smaller and more mature process. And let's face it, what can LRB provide you as a programmer that GT200 can't (from a programmability POV)? I mean, what the hell is the point of a super Core i7 with some nice vector ISA if it doesn't deliver kick-ass perf? Do you really care if it is x86 compliant? Are you going to call mmap() or the BCD arithmetic instructions from a shader (or from a CUDA/OpenCL kernel)? It doesn't do (or can't do) anything that its competitors can't.


To be fair though, I am really impressed with the automatic load balancing and the multi-LRB scaling possibility.
Quote:
If this kind of performance can be achieved while emulating a hardware oriented API, I can't wait to see what direct low level access can get you in terms of performance, quality and features.
This perf is from direct low-level access. After all, Abrash and co. were involved in their graphics pipeline implementation.
Old 10-Jun-2009, 05:09   #19
Davros
Darlek ******
 
Join Date: Jun 2004
Posts: 5,335

A noob question if I may
Would I buy Larrabee to replace my GPU, my CPU, or both?
__________________
Guardian of the "Sacred Terabyte of Gaming Goodness™"
Old 10-Jun-2009, 06:05   #20
rpg.314
Senior Member
 
Join Date: Jul 2008
Location: India
Posts: 1,294

Good for you then. I'd prefer to have something which has better perf/price. I really don't care if my GPU can run MS Word.
Old 10-Jun-2009, 06:48   #21
dkanter
Senior Member
 
Join Date: Jan 2008
Posts: 194

RPG - LRB is about programmability. In theory, a pure accumulator architecture can run the same shaders as GT200 or LRB. In practice...

I see no reason to remove x86 compatibility from LRB, although it'd be a good idea to shrink the overhead as much as possible.

Also, LRB has some pretty big handicaps compared to NV's GPU efforts:
1. First discrete GPU by Intel (in a long time)
2. First time the design team has worked together
3. First time the driver team has worked together
etc. etc.

Intel is coming at this from a big disadvantage and hopefully they can get close to NV.

DK
__________________
www.realworldtech.com
Old 10-Jun-2009, 07:11   #22
hoho
Member
 
Join Date: Aug 2007
Location: Estonia
Posts: 189

Quote:
Originally Posted by rpg.314 View Post
OK, 1/4 cache + 1/3 VPU ~ 58% is useful area. The rest, the x86 part, takes up 42%, which is massive.
Does that 42% area also contain ringbus connection, scalar core or any other "useless" things?
Quote:
Originally Posted by rpg.314
And let's face it, what can LRB provide you as a programmer that GT200 can't (from a programmability POV).
Depends on what you are after. For regular DX/GL stuff it doesn't provide much but LRB is not limited to those areas.
Old 10-Jun-2009, 13:40   #23
3dilettante
Senior Member
 
Join Date: Sep 2003
Location: Well within 3d
Posts: 2,535

Quote:
Originally Posted by hoho View Post
Does that 42% area also contain ringbus connection, scalar core or any other "useless" things?
The 42% area given is the scalar core (don't know about the exact number, the actual die shots show differing ratios), which if going by comparisons of Pentium-era chips is possibly a third too large. The original guesswork didn't go beyond rough fractions because too much wasn't known.
Now that we have die shots of Larrabee, a lot more is known about how much area is taken up by the cores, cache, and other hardware, and this dilutes the x86 penalty further.

The dominant die penalty as far as current graphics applications are concerned might not be x86, so much as Intel's insistence on using a fully featured CPU core 32 times over.
x86 might have made the solution in the ballpark of 10% larger than it otherwise would have been--although there are more modern examples of even slimmer cores than the earlier RISC I used as a baseline (and some really tiny embedded cores).
Until more exotic methods of rendering become available for a good comparison, we can see from the Larrabee rasterizer description that a decent chunk of the instructions and much of the general capability of the x86 cores is not used.
A general-purpose RISC core would still have this functionality, and while it might be slimmer, it would still not be used and multiplied 32 times over.

I cannot really figure on what power penalties there might be, as there is no good way to compare without physical implementations, and too much can vary between manufacturers and processes. The vector units themselves will likely swamp the TDP calculations during peak loads.
Last edited by 3dilettante; 10-Jun-2009 at 13:47.
Old 10-Jun-2009, 14:53   #24
Nick
Senior Member
 
Join Date: Jan 2003
Location: Ghent, Belgium
Posts: 1,311

Quote:
Originally Posted by rpg.314 View Post
In a LRBni compatible way? I mean with all the masks and swizzles etc.?
No. Although masking and swizzling support is pretty good, AVX sorely lacks scatter/gather. As vectors get wider, collecting elements from different memory locations becomes a huge bottleneck. Also lacking are any reasonably fast exp/log instructions.

On the other hand, AVX does support vectors of small integers. But with 512-bit registers I don't think that actually matters much anyway.
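To illustrate why a missing gather hurts more as vectors widen: without hardware support, every active lane needs its own scalar load, so the cost grows linearly with the width. A minimal Python sketch of that emulation follows. The function name masked_gather and the test data are made up for illustration, and real code would use intrinsics rather than Python lists.

```python
def masked_gather(memory, indices, mask, passthrough):
    """Emulate a masked vector gather with one scalar load per active lane.

    Lanes whose mask bit is False keep the value from passthrough.
    Returns the gathered vector and the number of scalar loads issued.
    """
    result = list(passthrough)
    loads = 0
    for lane, (index, active) in enumerate(zip(indices, mask)):
        if active:
            result[lane] = memory[index]  # one scalar load and insert per lane
            loads += 1
    return result, loads


memory = [i * 0.5 for i in range(64)]

for width in (4, 8, 16):
    # Scattered, non-contiguous addresses, one per lane.
    indices = [(7 * lane) % len(memory) for lane in range(width)]
    mask = [True] * width
    values, loads = masked_gather(memory, indices, mask, [0.0] * width)
    print(f"float{width}: {loads} scalar loads to fill one vector")
```

A contiguous vector load costs one instruction regardless of width, which is why the gap between the two grows as registers go from float4 to float16.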
Old 10-Jun-2009, 16:02   #25
PhilTaylor
Junior Member
 
Join Date: Oct 2008
Location: Redmond area
Posts: 25

Quote:
Originally Posted by Davros View Post
A noob question if I may
Would I buy Larrabee to replace my GPU, my CPU, or both?
LRB1 will be a discrete add-in card, i.e. a GPU.
All times are GMT +1.

