Old 11-Aug-2006, 21:16   #1
Arun
One Mind, One Goal
 
Join Date: Aug 2002
Posts: 4,085
Open G965 drivers released - Exclusive hardware details found

Two days ago, Intel released a new version of their open source Linux IGP driver and opened a new website for IGP driver development and discussion. Intel's IGP drivers have been open source for a long time now, but the big news in this new release is the full implementation of Mesa 3D, and therefore OpenGL, for the G965, Intel's next-generation D3D10 IGP. By examining the source code, we have managed to extract some interesting and previously unknown information about this new chip. Here's what we can say about the G965 at the moment:
  • All programmable shading is handled in unified execution units, codenamed "GEN4 EU". Fixed function subsystems "call" those units.
  • All programmable ALUs are scalar to maximize usage efficiency. That means they work on a single component at a time, not vectors.
  • Triangle setup and related operations are also done in the EUs. In traditional architectures, a special-purpose unit would exist for it.
  • Fog and alpha testing are implemented as parts of the pixel shader, which is expected of all DX10 architectures.
  • Math functions (EXP, LOG, SIN, etc.) are implemented in a 16-way "Mathbox" external unit with both full and partial precision.
  • Taylor expansions are sometimes faster than the Mathbox, because they don't require values to be in such specific bounds.
  • The geometry shader is already used to implement some OpenGL functionality, including wireframe rendering.
  • The pixel shader works on blocks of 16 pixels. It is unknown whether that is also the case for vertices and primitives.
This driver was apparently developed on a simulator and later tested on revision c1 silicon, where a few issues came up - no cause for alarm from a chip's first revision. Finally, it seems likely that G965 is based on an array of 16 scalar ALUs and (4?) dedicated texture processors. But the drivers never state that explicitly, so this conclusion should be taken with a grain of salt until officially confirmed. We look forward to bringing you a more detailed review--including performance profiling--as actual chips become available.
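
To illustrate the Taylor-expansion bullet above, here is a minimal sketch in plain C of a sine approximation built only from multiplies and adds, the kind of operations the EUs handle natively. It is not taken from the driver: the name `taylor_sin`, the number of terms, and the Horner-style evaluation are all assumptions made for the example, and no range reduction is attempted.

```c
/* Hypothetical helper, not taken from the i965 driver: approximates sin(x)
 * with the odd Taylor terms up to x^7, evaluated in Horner form so that it
 * needs nothing but multiplies and adds. Accuracy degrades as |x| grows;
 * it is reasonable for roughly |x| <= pi/2. */
float taylor_sin(float x)
{
    const float x2 = x * x;

    /* sin(x) ~= x * (1 - x^2/3! + x^4/5! - x^6/7!) */
    float p = -1.0f / 5040.0f;
    p = p * x2 + 1.0f / 120.0f;
    p = p * x2 - 1.0f / 6.0f;
    p = p * x2 + 1.0f;
    return p * x;
}
```

Each step is a single multiply-add, so the whole polynomial costs a handful of ALU instructions. Whether that beats a round trip to the Mathbox would depend on latencies that the driver source does not spell out.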
Old 11-Aug-2006, 21:32   #2
Arun
One Mind, One Goal
 
Join Date: Aug 2002
Posts: 4,085
Default

And for programmers interested in looking at the Mesa 3D driver source code themselves, it can easily be downloaded through CVS with the following command:
cvs -d :pserver:anoncvs@anoncvs.freedesktop.org:/cvs/mesa checkout Mesa/src/mesa/drivers/dri/i965

Uttar
Old 11-Aug-2006, 21:34   #3
Tim Murray
whoops
 
Join Date: May 2003
Location: Santa Clara, CA
Posts: 3,266
Default

Any chance you could give us some code statistics so I don't have to check it out from CVS? You know, basic stuff like lines of code, number of files, stupid meaningless crap like that?
Old 11-Aug-2006, 22:33   #4
Demirug
Senior Member
 
Join Date: Dec 2002
Posts: 1,253
Default

The Mesa driver (the hardware-specific part) has around 100 files and 1 MB of code.
__________________
GPU blog
Old 12-Aug-2006, 03:05   #5
zed
Member
 
Join Date: Dec 2005
Posts: 1,278
Default

Excellent news.
If you don't want to use CVS, you can download the latest stable release of Mesa from their website in a single download.
Old 12-Aug-2006, 07:32   #6
Arun
One Mind, One Goal
 
Join Date: Aug 2002
Posts: 4,085
Default

Quote:
Originally Posted by zed View Post
Excellent news.
If you don't want to use CVS, you can download the latest stable release of Mesa from their website in a single download.
That's too old to include the G965's stuff though, afaik.

Uttar
Old 12-Aug-2006, 08:06   #7
Killer-Kris
Member
 
Join Date: May 2003
Posts: 540
Default

Thank you very much for sharing your insight.

Quote:
Originally Posted by Uttar View Post
We look forward to bringing you a more detailed review--including performance profiling--as actual chips become available.

And I very much look forward to your findings!
Old 12-Aug-2006, 08:47   #8
Demirug
Senior Member
 
Join Date: Dec 2002
Posts: 1,253
Default

Quote:
Originally Posted by Uttar View Post
Finally, it seems likely that G965 is based on an array of 16 scalar ALUs and (4?) dedicated texture processors.
May I ask what the conclusion of 16 scalar ALUs is based on? I am still looking at the code on my own, but from what I have seen so far I have the feeling that they use vector units.
__________________
GPU blog
Old 12-Aug-2006, 09:11   #9
Arun
One Mind, One Goal
 
Join Date: Aug 2002
Posts: 4,085
Default

There ya go

Code:
static void emit_mad( struct brw_compile *p, 
		      const struct brw_reg *dst,
		      GLuint mask,
		      const struct brw_reg *arg0,
		      const struct brw_reg *arg1,
		      const struct brw_reg *arg2 )
{
   GLuint i;

   for (i = 0; i < 4; i++) {
      if (mask & (1 << i)) {
	 brw_MUL(p, dst[i], arg0[i], arg1[i]);

	 brw_set_saturate(p, (mask & SATURATE) ? 1 : 0);
	 brw_ADD(p, dst[i], dst[i], arg2[i]);
	 brw_set_saturate(p, 0);
      }
   }
}
And, even more obvious:
Code:
static void emit_dp3( struct brw_compile *p, 
		      const struct brw_reg *dst,
		      GLuint mask,
		      const struct brw_reg *arg0,
		      const struct brw_reg *arg1 )
{
   assert((mask & WRITEMASK_XYZW) == WRITEMASK_X);

   brw_MUL(p, brw_null_reg(), arg0[0], arg1[0]);
   brw_MAC(p, brw_null_reg(), arg0[1], arg1[1]);

   brw_set_saturate(p, (mask & SATURATE) ? 1 : 0);
   brw_MAC(p, dst[0], arg0[2], arg1[2]);
   brw_set_saturate(p, 0);
}
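
To spell out what the two snippets suggest, here is a rough C model (not driver code) of the instruction sequence `emit_dp3` appears to generate: one MUL followed by two MACs, each touching a single component of its operands, with the partial sums held in an implicit accumulator. The names `alu_model`, `model_mul`, `model_mac` and `model_dp3` are made up for this sketch, and the assumption that MUL also writes the accumulator is inferred only from the null destination register used in the snippet.

```c
/* Toy model of one ALU lane with an implicit accumulator. Every name here is
 * hypothetical; the code only mirrors the MUL/MAC/MAC sequence quoted above. */
struct alu_model {
    float acc; /* implicit accumulator, assumed to be written by MUL and MAC */
};

/* MUL: acc = a * b. Returns the value that would go to the destination. */
float model_mul(struct alu_model *alu, float a, float b)
{
    alu->acc = a * b;
    return alu->acc;
}

/* MAC: acc = acc + a * b. Only two explicit sources are needed. */
float model_mac(struct alu_model *alu, float a, float b)
{
    alu->acc += a * b;
    return alu->acc;
}

/* Three scalar instructions for one 3-component dot product. */
float model_dp3(const float a[3], const float b[3])
{
    struct alu_model alu = { 0.0f };

    (void)model_mul(&alu, a[0], b[0]);  /* result discarded (null register) */
    (void)model_mac(&alu, a[1], b[1]);  /* result discarded (null register) */
    return model_mac(&alu, a[2], b[2]); /* final sum, written to dst.x */
}
```

A 4-wide vector ALU would do this in one DP3 instruction; here the cost is three instructions, but no lane sits idle when a shader only needs one or two components.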
Uttar
Old 12-Aug-2006, 10:23   #10
Demirug
Senior Member
 
Join Date: Dec 2002
Posts: 1,253
Default

I know this part of the code, and it looks similar to the part of my own code that I want to use for a D3D10 CPU SSE device. I believe the confusion comes from the definition of a vector versus a scalar ALU/FPU. It seems you are talking about vector ALUs that work on a logical vector such as an XYZW or RGBA value. I was talking about vectors of values where the ALU performs the same operation on every value in the vector, like 16 R values from 16 different pixels.

Anyway, it's interesting to see Intel take the same approach as 3DLabs when it comes to shader/program execution. The only thing that surprised me a little at first is that they seem to use a 16-channel math unit but an 8-channel general register file. Maybe they can also use the eight 32-bit entries there as sixteen 16-bit values.
Hopefully they will add the new OpenGL extensions for the "D3D10 features" soon. But this will require an update to Mesa too.
__________________
GPU blog
Old 12-Aug-2006, 11:37   #11
Arun
One Mind, One Goal
 
Join Date: Aug 2002
Posts: 4,085
Default

Quote:
Originally Posted by Demirug View Post
I know this part of the code, and it looks similar to the part of my own code that I want to use for a D3D10 CPU SSE device. I believe the confusion comes from the definition of a vector versus a scalar ALU/FPU. It seems you are talking about vector ALUs that work on a logical vector such as an XYZW or RGBA value. I was talking about vectors of values where the ALU performs the same operation on every value in the vector, like 16 R values from 16 different pixels.
Yup, that was definitely the confusion. So fundamentally their ALUs are 16x SIMD, but for each pixel/vertex/whatever, they only execute one scalar op per cycle. At least, apparently. So what I meant is that they're scalar from a shader programmer's perspective.
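
The two meanings of "vector" can be made concrete with a small C sketch. It assumes a structure-of-arrays layout in which one register holds the same component for 16 pixels, which is how the discussion above reads the driver; the type `simd16` and the functions `simd16_mad` and `rgba_mad` are hypothetical names, not anything from Mesa.

```c
#define BLOCK 16 /* pixels processed together, per the first post */

/* One register as the hardware would apparently see it: the same component
 * (say, R) for 16 different pixels. Hypothetical layout. */
typedef struct {
    float lane[BLOCK];
} simd16;

/* dst = a * b + c for ONE component of 16 pixels:
 * scalar per pixel, 16-wide SIMD across pixels. */
void simd16_mad(simd16 *dst, const simd16 *a, const simd16 *b, const simd16 *c)
{
    for (int i = 0; i < BLOCK; i++)
        dst->lane[i] = a->lane[i] * b->lane[i] + c->lane[i];
}

/* A full RGBA MAD is then up to four such instructions, one per enabled
 * component, in the same spirit as the write-mask loop in emit_mad. */
void rgba_mad(simd16 dst[4], const simd16 a[4], const simd16 b[4],
              const simd16 c[4], unsigned mask)
{
    for (int comp = 0; comp < 4; comp++)
        if (mask & (1u << comp))
            simd16_mad(&dst[comp], &a[comp], &b[comp], &c[comp]);
}
```

From the shader programmer's side each instruction handles one component, so it looks scalar. From the hardware side each instruction is a 16-wide vector operation, which is the sense Demirug had in mind.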
Quote:
Anyway, it's interesting to see Intel take the same approach as 3DLabs when it comes to shader/program execution.
I don't think they're the only ones taking such an approach for the D3D10 generation, but this is not today's subject...
Quote:
The only thing that surprised me a little at first is that they seem to use a 16-channel math unit but an 8-channel general register file. Maybe they can also use the eight 32-bit entries there as sixteen 16-bit values.
I'm really not sure about that. Maybe they have multiple register pools depending on how much latency they can hide, and they allocate them in blocks of 16xFP16 or 2x8xFP32. Of course, I am probably thinking of this too much from an NV40 perspective.
Quote:
Hopefully they will add the new OpenGL extensions for the "D3D10 features" soon. But this will require an update to Mesa too.
Indeed, and I certainly hope the G80 and R600 come with their full functionality exposed in OpenGL on release day! I mean, I do have plans for these babies, after all


Uttar
Old 13-Aug-2006, 01:22   #12
mczak
Senior Member
 
Join Date: Oct 2002
Posts: 1,297
Default

Quote:
Originally Posted by The Baron View Post
Any chance you could give us some code statistics so I don't have to check it out from CVS? You know, basic stuff like lines of code, number of files, stupid meaningless crap like that?
You could browse Mesa CVS online, though it doesn't really show basic stats about the files: http://cvsweb.freedesktop.org/mesa/M...vers/dri/i965/
(Does anyone know what that omnipresent "brw" abbreviation means?)
(edit: D'oh! Broadwater...)

Quote:
Originally Posted by Uttar
Finally, it seems likely that G965 is based on an array of 16 scalar ALUs and (4?) dedicated texture processors.
I can't really see where you got that number from either; the hardware could handle more physical EUs fully transparently. That said, 16 scalar ALUs isn't that much, though they should be clocked fairly high according to rumours (667 MHz - wasn't that even in that whitepaper?). Compared to the ATI Xpress 200, which has "the raw power equivalent of only 8 scalar ALUs" (3+1 component/2 pipes) in the pixel shader and which is clocked lower, this doesn't sound too bad, though of course those units in the Xpress 200 have nothing else to do than pixel shading.
On paper, I think I really like the GMA X3000 so far. It looks like a very flexible architecture. Of course, performance might not be there... (I want a review!!!)

Last edited by mczak; 13-Aug-2006 at 21:13.
Old 13-Aug-2006, 03:57   #13
Dave Baumann
Gamerscore Wh...
 
Join Date: Jan 2002
Posts: 12,196
Default

XPRESS 200 has a Vector MADD and Vector ADD per pipe, both 3+1.
__________________
Expand. Accelerate. Dominate.
ATI Radeon HD 5800 Series Graphics Cards - Designed by the Community
Old 13-Aug-2006, 14:44   #14
mczak
Senior Member
 
Join Date: Oct 2002
Posts: 1,297
Default

Quote:
Originally Posted by Dave Baumann View Post
XPRESS 200 has a Vector MADD and Vector ADD per pipe, both 3+1.
Ah right. Still, if you look at ALU power in the pixel shader alone, a GMA X3000 with 16 scalar ALUs, each capable of a single MAD at 667 MHz, would be quite a bit more powerful than even an Xpress 1150 (at 400 MHz) with its 2*(3+1) MAD + ADD. Then again, RS600/690 is supposed to be more powerful, but it's not here yet. Rumours say, though, that the X3000 is slower than the old GMA 950 for some reason. It doesn't need to be a spectacular performer, but that would be awful.

(edit: actually that single MAD the GMA X3000 can do is only a MAC, since it looks like the ALU only ever takes 2 source arguments and 1 destination argument. Thus the 3rd argument (for the add part) needs to be the same as the destination argument, otherwise you need to split a MAD into two instructions.)
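
The comparison above can be checked with a few lines of C. This is only back-of-the-envelope arithmetic using the rumoured figures quoted in this thread (16 ALUs at 667 MHz against 2 pipes of 3+1 components at 400 MHz); the function name `peak_mflops` and the convention of counting a MAD/MAC as 2 operations and an ADD as 1 are assumptions of the sketch, not official numbers.

```c
/* Peak pixel-shader arithmetic in millions of floating-point operations per
 * second. Convention assumed here: a MAD or MAC counts as 2 operations, a
 * plain ADD as 1. All inputs are rumoured or quoted figures, not measurements. */
unsigned long peak_mflops(unsigned lanes, unsigned ops_per_lane_per_clock,
                          unsigned mhz)
{
    return (unsigned long)lanes * ops_per_lane_per_clock * mhz;
}
```

With those figures, `peak_mflops(16, 2, 667)` gives 21344 for the X3000 and `peak_mflops(2 * 4, 2 + 1, 400)` gives 9600 for the Xpress 1150, a bit more than a factor of two on paper. It says nothing about how many of the X3000's cycles go to setup and other work that fixed-function hardware does elsewhere.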

Last edited by mczak; 14-Aug-2006 at 23:16.
Old 13-Aug-2006, 15:10   #15
Dave Baumann
Gamerscore Wh...
 
Join Date: Jan 2002
Posts: 12,196
Default

From the sounds of it, the ALUs in G965 are tasked with a lot of stuff that other graphics processors have fixed-function processors for, which probably explains the speed differential.
__________________
Expand. Accelerate. Dominate.
ATI Radeon HD 5800 Series Graphics Cards - Designed by the Community
Old 13-Aug-2006, 17:52   #16
Geo
Mostly Harmless
 
Join Date: Apr 2002
Location: Uffda-land
Posts: 9,156
Default

Quote:
Originally Posted by Dave Baumann View Post
From the sounds of it, the ALUs in G965 are tasked with a lot of stuff that other graphics processors have fixed-function processors for, which probably explains the speed differential.
What, unified is slower than fixed function? [geo runs away very quickly, giggling]
__________________
"We'll thrash them --absolutely thrash them."--Richard Huddy on Larrabee
"Our multi-decade old 3D graphics rendering architecture that's based on a rasterization approach is no longer scalable and suitable for the demands of the future." --Pat Gelsinger, Intel
". . .its taking us longer than we would have liked to get a [Crossfire game] profiling system out there" --Terry Makedon, ATI, July 2006
"Christ, this is Beyond3D; just get rid of any f**ker talking about patterned chihuahuas! Can the dog write GLSL? No. Then it can f**k off." --Da Boss
Old 13-Aug-2006, 17:57   #17
Demirug
Senior Member
 
Join Date: Dec 2002
Posts: 1,253
Default

Quote:
Originally Posted by geo View Post
What, unified is slower than fixed function? [geo runs away very quickly, giggling]
Programmable is slower than fixed. As Dave mentioned, it looks like Intel used their unified "shader" unit for more than just the typical shader jobs.
__________________
GPU blog
Old 13-Aug-2006, 21:21   #18
mczak
Senior Member
 
Join Date: Oct 2002
Posts: 1,297
Default

Quote:
Originally Posted by Demirug View Post
Programmable is slower than fixed. As Dave mentioned, it looks like Intel used their unified "shader" unit for more than just the typical shader jobs.
It is still interesting to see that, in terms of functionality, Intel seems to have the most advanced graphics "chip" (almost) available on the market. Of course, it might lack the performance to make that functionality really usable (that is, it could be underpowered like the FX5200 was with respect to PS 2.0 functionality, to the point that applications treat it as an older-generation part).
Old 13-Aug-2006, 21:40   #19
Demirug
Senior Member
 
Join Date: Dec 2002
Posts: 1,253
Default

Quote:
Originally Posted by mczak View Post
It is still interesting to see that, in terms of functionality, Intel seems to have the most advanced graphics "chip" (almost) available on the market. Of course, it might lack the performance to make that functionality really usable (that is, it could be underpowered like the FX5200 was with respect to PS 2.0 functionality, to the point that applications treat it as an older-generation part).
Yes, and even with this slow performance they could win over many developers if they provide a beta D3D10 driver for Vista.
__________________
GPU blog
Old 14-Aug-2006, 03:32   #20
Dave Baumann
Gamerscore Wh...
 
Join Date: Jan 2002
Posts: 12,196
Default

Quote:
Originally Posted by geo View Post
What, unified is slower than fixed function? [geo runs away very quickly, giggling]
Programmable, at any one thing, is slower than fixed function, yes. Bear in mind that, for the most part, programmable shaders are an addition to the fixed function pipeline, and haven't really replaced that much overall.
__________________
Expand. Accelerate. Dominate.
ATI Radeon HD 5800 Series Graphics Cards - Designed by the Community
Old 14-Aug-2006, 05:51   #21
DudeMiester
Member
 
Join Date: Aug 2004
Location: GTA, Ontario, Canada
Posts: 630
Default

Quote:
Originally Posted by mczak View Post
It is still interesting to see that, in terms of functionality, Intel seems to have the most advanced graphics "chip" (almost) available on the market.
Well, I could throw any one of a number of software renderers on my CPU and say the same thing. It's not just the features that matter, it's the performance.
__________________
Forums are the Opiate of the Masses
Old 14-Aug-2006, 13:50   #22
Alci
Registered
 
Join Date: Aug 2006
Posts: 1
Default

Quote:
Originally Posted by Dave Baumann View Post
Programmable, at any one thing, is slower than fixed function, yes. Bear in mind that, for the most part, programmable shaders are an addition to the fixed function pipeline, and haven't really replaced that much overall.
Do you mean this purely in this market segment (i.e. low end), or am I just buying into the marketing hype at the higher end about the extent to which programmable shaders have taken over much of the traditional pipeline?
Old 14-Aug-2006, 23:10   #23
mczak
Senior Member
 
Join Date: Oct 2002
Posts: 1,297
Default

Even for something as old as the R200 (Radeon 8500), the arithmetic part of fragment programs completely replaced the equivalent fixed-function functionality (roughly texture environments in OpenGL) in hardware. But that's really all; the rest of the fixed-function functionality remained. Even with something like R520 you still have dedicated hardware for things like triangle setup, but this, as Uttar mentioned, seems to be no longer the case for the new Intel IGP. It does more than "simply" unify vertex and pixel shaders.
Can't say I especially like the idea of the external Mathbox though; it kind of goes against the idea of having generic execution units for everything.

Edit: actually the G965 still has some fixed function units for things like blending. There is some rough explanation of it at the beginning of brw_context.h: http://cvsweb.freedesktop.org/mesa/M....1&view=markup

Last edited by mczak; 15-Aug-2006 at 00:36.
Old 21-Aug-2006, 12:33   #24
neliz
Senile Member
 
Join Date: Mar 2005
Location: Walsh Avenue, Santa Clara
Posts: 3,319
Default

Quote:
Originally Posted by Demirug View Post
Anyway, it's interesting to see Intel take the same approach as 3DLabs when it comes to shader/program execution.
I'm not sure if this was noted yet, but according to l'Inq, Intel took over most engineers from 3DLabs last February.
That would certainly give your hunch a solid basis.
__________________
Have a foot in the Stirrup

Last edited by neliz; 21-Aug-2006 at 12:39.
Old 22-Aug-2006, 02:42   #25
3dcgi
Senior Member
 
Join Date: Feb 2002
Posts: 1,426
Default

Quote:
Originally Posted by neliz View Post
I'm not sure if this was noted yet, but according to l'Inq, Intel took over most engineers from 3DLabs last February.
That would certainly give your hunch a solid basis.
Once again I think the Inquirer is not quite accurate. Fort Collins was but one of two US sites that closed when 3dlabs closed the workstation business and I believe they were mostly if not all software engineers. Many from the Huntsville site now work for Nvidia.

Acquiring engineers 6 months ago does not leave enough time for them to make a significant impact on the architecture of a product that should ship soon.
__________________
http://www.3dcgi.com/

Last edited by 3dcgi; 22-Aug-2006 at 02:44.