pegisys said:and I think in an ATI interview they say Microsoft had a big hand in designing the GPU (might have been the article that started this post)
In the CPU too.
tema said:Xenos is an unrefined lower-clocked R580 with edram.
They found a way to use this Z only pass to assist with tiling the screen to optimise the eDRAM utilisation. During the Z only rendering pass the maximum extents of each object within screen space are calculated and saved, so that the geometry does not have to be calculated multiple times. Each command is tagged with a header listing the screen tile(s) it will affect. After the Z only rendering pass the Hierarchical Z Buffer is fully populated for the entire screen, which means render order is no longer an issue. When rendering a particular tile the command fetching processor looks at the header that was applied in the Z only rendering pass to see whether the command's resultant data will fall into the tile it is currently processing; if so it will queue it, if not it will discard it until the next tile is ready to render. This process is repeated for each tile that requires rendering. Once the first tile has been fully rendered the tile can be resolved (FSAA down-sample) and that tile of the back-buffer data can be written to system RAM; the next tile can begin rendering whilst the first is still being resolved. In essence this process has similarities with tile based deferred rendering, except that it is not deferring for a frame and that the "tile" it is operating on is orders of magnitude larger than most other tilers have utilised before.
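The tagging-and-filtering step described above can be sketched in a few lines of Python. This is only an illustration, not how the hardware or any SDK does it: the `Command` type, the function names, the 1280x720 screen and the three 1280x240 tiles are all assumptions made up for the example. The bounding boxes stand in for the extents saved during the Z only pass.

```python
from dataclasses import dataclass


@dataclass
class Command:
    name: str
    # Hypothetical screen-space bounding box (x0, y0, x1, y1), with x1/y1
    # exclusive, standing in for the extents saved during the Z-only pass.
    extent: tuple


def tiles_for(extent, tile_w, tile_h, screen_w, screen_h):
    """Return the set of (col, row) tiles that a screen-space extent overlaps."""
    x0, y0, x1, y1 = extent
    # Clamp the box to the screen so off-screen parts do not create tiles.
    x0, y0 = max(x0, 0), max(y0, 0)
    x1, y1 = min(x1, screen_w), min(y1, screen_h)
    if x0 >= x1 or y0 >= y1:
        return set()
    cols = range(x0 // tile_w, (x1 - 1) // tile_w + 1)
    rows = range(y0 // tile_h, (y1 - 1) // tile_h + 1)
    return {(c, r) for c in cols for r in rows}


def schedule_tiles(commands, tile_w, tile_h, screen_w, screen_h):
    """Tag every command once, then build the per-tile command queues."""
    # "Header" step: each command is tagged with the tiles it touches.
    tagged = [(cmd, tiles_for(cmd.extent, tile_w, tile_h, screen_w, screen_h))
              for cmd in commands]
    n_cols = -(-screen_w // tile_w)  # ceiling division
    n_rows = -(-screen_h // tile_h)
    schedule = {}
    for r in range(n_rows):
        for c in range(n_cols):
            # Per-tile step: queue commands whose header names this tile,
            # skip the rest until a later tile.
            schedule[(c, r)] = [cmd.name for cmd, tiles in tagged
                                if (c, r) in tiles]
    return schedule


commands = [
    Command("sky", (0, 0, 1280, 200)),
    Command("character", (500, 200, 700, 500)),
    Command("ground", (0, 480, 1280, 720)),
]
schedule = schedule_tiles(commands, 1280, 240, 1280, 720)
for tile, names in schedule.items():
    print(tile, names)
```

Note that the "character" command ends up queued for all three tiles, so geometry that straddles tile boundaries is submitted more than once; only the extent calculation is shared.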
inefficient said:Well of course they had a big hand, they put up all the money and the product was for them.
But as far as the actual engineering and architecting of microprocessors, MS has little to nothing to contribute. The best engineers in these fields do work for hardware companies not software companies.
Microsoft chip veterans Larry Yang, Jeff Andrews and Nick Baker -- former 3DO hardware engineers who joined Microsoft through the WebTV acquisition -- told IBM exactly what they needed the chip to do. In fact, that's why IBM's top engineer on the project is considered the "chief engineer," while Microsoft's Jeff Andrews is considered the chief architect.
"At engineering meetings, it was impossible to tell who were the Microsoft engineers and who were the IBM engineers," Comfort said.
ATI Engineering said:You could claim either are correct depending on what you want to call a FLOP.
It is not as clean (i.e. the scalar engine cannot do a MUL-ADD) which would give us a straightforward 240 GFlops.
It really comes down to how you want to term operations like LOG(x), 1/X, 1/SQRT(X), SIN(X), COS(X). If we want to say that each of these is only a single FLOP, then 216 GFlops is the “correct” number, however, these operations are NOT single flops on any standard CPU (they take 3-6x as long on an Intel CPU as standard MUL or ADD) and are actually comprised in our implementation of multiple floating point muls and adds to achieve them, so you could claim as many as perhaps 6 flops for these operations (although some are not pure IEEE floating point operations).
So, 216 is the absolute lowest number and really does not do the scalar engine justice, but in a world where people only want to talk about muls and adds, is simple.
On the other end, a number as high as 6 flops for the scalar engine would give you a total of 336 GFlops.
And Nvidia used the same 'trick' to inflate their flops figures IIRC
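For anyone wanting to check the three figures in the ATI quote, the arithmetic can be reproduced with a short Python sketch. The ALU count (48) and clock (500 MHz) are not stated in this thread; they are assumed here because they are the commonly cited Xenos figures and they reproduce the quoted numbers. A vec4 MUL-ADD is counted as 8 flops, and only the flop value assigned to the scalar op changes.

```python
# Assumptions (not stated in the thread): 48 shader ALUs at 500 MHz.
ALUS = 48
CLOCK_GHZ = 0.5
VECTOR_FLOPS = 4 * 2  # vec4 multiply-add: 4 components x (mul + add)


def peak_gflops(scalar_flops):
    """Peak GFlops when each scalar op is counted as `scalar_flops` flops."""
    return ALUS * CLOCK_GHZ * (VECTOR_FLOPS + scalar_flops)


cases = [
    ("scalar op counted as 1 flop", 1),
    ("scalar MUL-ADD, if it existed (2 flops)", 2),
    ("scalar op counted as 6 flops", 6),
]
for label, scalar_flops in cases:
    print(f"{label}: {peak_gflops(scalar_flops):.0f} GFlops")
```

The whole 216 vs 240 vs 336 argument is therefore just a question of which number you plug in for the scalar unit.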
Dave Baumann said:OK, asked ATI over XMas if the Scalar processor was a duplicate of each of the components of the vector processor, hence whether the FLOPS rating was 216 or 240, and it appears not - in fact, it seems to be more of a special function processor:
Dave Baumann said:The Scalar ALU will act as a co-issue ALU for the functions that it supports, which includes ADD / MUL (much the same as NVIDIA's Vector ALU will co-issue two instructions when it can).
Lysander said:No, edram does not have only post-rendering gfx functionality
What post-rendering functionality would that be?
Lysander said:but it is also embedded frame buffer for tile-based-rendering. R580 will not do that (I think). TBR is very BW efficient.
What R580 will do (or not) hasn't been announced yet. And xenos isn't any more bandwidth efficient than any other current GPU, it can't do hidden surface removal through deferred rendering, instead it (optionally) uses a Z-only rendering pass, which can be done (and IS done in some current PC game titles) on any 3D accelerator.
Guden Oden said:...
(4k bits aggregate bus width?)
...
Guden Oden said:What post-rendering functionality would that be? The eDRAM has no rendering functionality at all
Yea, I talked about core logic on daughter die. "Post-rendering" in a sense to apply hdr and aa after tile was already rendered on shaders.
Ati Engineering said:these operations are NOT single flops on any standard CPU (they take 3-6x as long on an Intel CPU as standard MUL or ADD)
nAo said:And Nvidia used the same 'trick' to inflate their flops figures IIRC
Indeed. Sony should take note from Ati and NVidia and revise the Flop rating of a certain CPU to 11.4GFlops.
Jaws said:Dave,
Can you clarify when Xenos is pixel shading, if the ALUs can co-issue/issue like below?
1) vec3 (madds) + scalar (non-madds) ~ 7 flops/ALU
2) vec2 (madds) + vec2 (madds) ~ 8 flops/ALU
3) vec4 (madds) ~ 8 flops/ALU
I would think 1 is fine but not sure about 2 and 3?
Dave Baumann said:Co-issue means two instructions can be issued in parallel of the same cycle. Xenos's structure is different from current pixel shaders in that while most PC pixel shader pipelines are only Vector (that can optionally "co-issue" some non-vector combinations) Xenos can co-issue a full vector with a scalar instruction. i.e. its combinations will be Vec4 + Scalar, Vec3 + scalar, Vec2 + Scalar, Scalar + Scalar (with the vector ALU being on the left of the +) irrespective of pixel or vertex operations. AFAIK full vector instructions are by far the most frequent, with Vec3 and scalar after, Vec2 isn't very frequent at all.
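As a rough sketch of what those combinations mean in flops per ALU per cycle, here is a small Python tally. The counting convention is an assumption borrowed from Jaws's question: the vector side is all MADDs (2 flops per component) and the co-issued scalar op counts as 1 flop. Treating the left-hand "Scalar" in "Scalar + Scalar" as a 1-wide MADD on the vector ALU is also an assumption.

```python
def flops_per_cycle(vector_width, scalar_flops=1):
    """Flops for one co-issued pair: a `vector_width`-wide MADD plus one scalar op."""
    return vector_width * 2 + scalar_flops


# Vector ALU operand width on the left of the "+", as in the quoted list.
combos = {
    "Vec4 + Scalar": flops_per_cycle(4),
    "Vec3 + Scalar": flops_per_cycle(3),
    "Vec2 + Scalar": flops_per_cycle(2),
    "Scalar + Scalar": flops_per_cycle(1),
}
for name, flops in combos.items():
    print(f"{name}: {flops} flops/ALU/cycle")
```

Under this convention "Vec3 + Scalar" comes out at the ~7 flops of Jaws's case 1, while a vec2 + vec2 pairing (case 2) is simply not one of the listed combinations.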
–Can co-issue 1 vector4 and 1 scalar op per cycle
mul r0,r1,r2 // vector operation
+ rsq r3.x,r4.x // scalar operation
–Special “_prev” scalar operations use results of previous scalar operations:
rsq r3._,r0.x // scalar result is retained
mul r0,r1,r2
+ adds_prev r4.x,r5.x // Adds result of rsq to r5.x
To be quite honest I don't understand why the "_prev" modifier is so noteworthy.