Qualcomm shows working MSM8x60 at CES.

rpg.314 · Aug 19, 2011

It's actually funny that some are saying that deferred rendering in hw means one thing and deferred rendering in sw means something else.

Lazy8s · Aug 19, 2011

In Falanx's old "Competitive Advantages..." flyer, they talk about being an IMR; subsequent marketing is what confused the matter.

I imagine Adreno works a little like Xenos, with obvious exceptions for the eDRAM, and Dave wrote a good article here on that.

JohnH · Aug 19, 2011

rpg.314 said:
It's actually funny that some are saying that deferred rendering in hw means one thing and deferred rendering in sw means something else.

Tell me about it. It's the usual marketing ploy of generating confusion around any features or technical advantages your competitors may have by giving something different the same name, often ignoring or rewriting history in the process. Another amusing example is NV calling Tegra's shading pipes "cores" so that they can "claim" that they're multi-core.

Lazy8s · Aug 19, 2011

... Which is really strange on nVidia's part since they'd be better off just boasting about the advantage of not including the overhead of MP cores.

JohnH · Aug 19, 2011

Lazy8s said:
In Falanx's old "Competitive Advantages..." flyer, they talk about being an IMR; subsequent marketing is what confused the matter.

That's another classic: they tried to claim they were somehow a hybrid IMR/TBR and also claimed that this meant they were "more" compatible than "traditional" tile based rendering solutions, when in fact they were just taking on all the difficulties associated with TBR while removing some of the advantages of true TBDR.

Lazy8s · Aug 19, 2011

nVidia will have to change its marketing again next generation when they're up against SoCs like the A9600. Getting beat in performance when they're claiming 32+ or however many GPU "cores" versus their competitor which is still using two cores would be very awkward.

Exophase · Aug 19, 2011

Lazy8s said:
I imagine Adreno works a little like Xenos, with obvious exceptions for the eDRAM, and Dave wrote a good article here on that.

Yes, it's kind of like Xenos stripped down to one each of its most atomic units. The i.MX5x SoCs license it (got it from AMD before the Qualcomm purchase) and contain a good bit of information in their user manuals.

The Adreno 200, the earliest and most basic in the line, contains:

- 1 VLIW ALU capable of 2 vec4 FP32 operations and 1 scalar operation
- 1 TMU w/anisotropic filtering
- 1 ROP
- Triangle setup of 6 pixels/clock
- Early-Z rate of 4 pixels/clock

Instead of eDRAM it has embedded SRAM in 4 banks of 32 bits each, allowing for the specified 4x early-Z. The amount of SRAM is configurable; for instance, i.MX51x has 128KB and i.MX53x has 256KB. It's a given that substantial tiling should be done to get decent performance, unlike with Xenos where you didn't always need it.

I don't know much about the Adreno 200 successors (205, 220, etc), except that they obviously add more of these units.

3dcgi · Aug 20, 2011

The first Qualcomm design, which came from Ati, was quite different from Xenos' binning, which was not a TBDR, and my understanding is that the first post-Ati design is exclusively a TBDR.

I assume the definition is to bin the scene and not start rasterization until binning is completed, though it seems JohnH is using a subtly different definition which I don't understand. Is there a document from Imagination explaining their definition? Does Imagination render depth only first or sort prior to final rasterization and this is the distinction? I think I remember talk about sorting in the past.

Xenos had very coarse grained binning, but Qualcomm uses much smaller tiles so the packet based approach was no good and required a different design.
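To make the "bin the whole scene first, rasterize later" definition concrete, here is a rough sketch. It is not how Xenos, Adreno or any IMG part actually does it; the function name `bin_triangles`, the bounding-box overlap test and the tile size are all assumptions chosen for illustration.

```python
from collections import defaultdict

def bin_triangles(triangles, tile_w, tile_h, screen_w, screen_h):
    """Map each tile (tx, ty) to the indices of triangles whose bounding box overlaps it."""
    bins = defaultdict(list)
    max_tx = (screen_w - 1) // tile_w
    max_ty = (screen_h - 1) // tile_h
    for index, tri in enumerate(triangles):
        xs = [x for x, _ in tri]
        ys = [y for _, y in tri]
        # Skip triangles that lie entirely off-screen.
        if max(xs) < 0 or max(ys) < 0 or min(xs) >= screen_w or min(ys) >= screen_h:
            continue
        # Conservative overlap: every tile touched by the bounding box gets the triangle.
        tx0 = max(0, min(xs) // tile_w)
        ty0 = max(0, min(ys) // tile_h)
        tx1 = min(max_tx, max(xs) // tile_w)
        ty1 = min(max_ty, max(ys) // tile_h)
        for ty in range(ty0, ty1 + 1):
            for tx in range(tx0, tx1 + 1):
                bins[(tx, ty)].append(index)
    return bins

# Hypothetical scene: a 64x64 screen split into 32x32 tiles, integer pixel coordinates.
triangles = [
    [(2, 2), (10, 2), (2, 10)],       # fits inside tile (0, 0)
    [(20, 20), (50, 20), (20, 50)],   # bounding box straddles all four tiles
]
bins = bin_triangles(triangles, 32, 32, 64, 64)

# Rasterization would only start here, one tile at a time, after all binning is done.
for tile in sorted(bins):
    print(tile, bins[tile])
```

The point of the sketch is only the ordering: nothing is rasterized until every triangle has been assigned to its tiles. Whether that step is done by hardware, by the driver, or by a mix of both is exactly what is being debated in this thread.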

Rys · Aug 20, 2011

3dcgi said:
Does Imagination render depth only first or sort prior to final rasterization and this is the distinction? I think I remember talk about sorting in the past.

We depth test and only rasterise visible geometry.

darkblu · Aug 20, 2011

JohnH said:
That's another classic: they tried to claim they were somehow a hybrid IMR/TBR and also claimed that this meant they were "more" compatible than "traditional" tile based rendering solutions, when in fact they were just taking on all the difficulties associated with TBR while removing some of the advantages of true TBDR.

Come on, John, TBDR have their own difficulties which Yamato/Adreno did not inherit precisely because it's an IMR with early-z and binning, and not a deferred shading gpu. Outside of binning, the two architectures are as different as IMR is from TBDR. And I feel awkward by even mentioning this to you : )

Ailuros · Aug 20, 2011

I don't care what Qualcomm calls its Adreno GPU cores, but although the 220 does a lot better than former cores, from what I've seen in public benchmarks it's barely competitive at the moment with an SGX540, ULP GF, etc.

Lazy8s · Aug 20, 2011

JohnH was referring to Mali there in the comment about the "hybrid IMR/TBR".

All of the designers in the mobile GPU space have presented really strong offerings, actually. So, their classification amounts to little more than a label; a rose by any other name and all that.

I'm more intrigued by the gap by which other video solutions seem to trail VXD/VXE in efficiency/power consumption.

JohnH · Aug 20, 2011

darkblu said:
Come on, John, TBDR have their own difficulties which Yamato/Adreno did not inherit precisely because it's an IMR with early-z and binning, and not a deferred shading gpu. Outside of binning, the two architectures are as different as IMR is from TBDR. And I feel awkward by even mentioning this to you : )

Eh? Comment was directed at Mali. However I would suggest that any TBR/TBDR/binning etc. solution inherently carries some or all of the primary problems that its detractors generally point to, irrespective of the origins of the design and the, often large, differences in core architectures. The solutions to those problems (if the architecture even includes a solution) are also somewhat variable.

3dcgi, I think there's a link to a paper from our devtech guys that describes something of the differences in the architectures.

That aside, as I said above I think it's pretty pointless debating the terminology used here as marketing departments will always twist and turn irrespective of historical use or even technical correctness!

Ailuros · Aug 21, 2011

Lazy8s said:
JohnH was referring to Mali there in the comment about the "hybrid IMR/TBR".

Well, as he says, marketing will always be there to twist terminology any way it suits. Although the gentlemen who decide for large semis when licensing or buying X technology are human too and can easily fall for any marketing twist, it's still a fact that a LOT of other factors count for more than what each IHV calls its core: stuff like perf/W, perf/mm2, perf/$, experience, driver stability and a whole lot of other things.

Remember, winning N design wins once is definitely nice, but the real stake is to keep that N number alive in the future and struggle to constantly increase it. Gaining a design win because you've called it Borgo Sphinxter won't take you very far if the semi has bad or mediocre experiences with it.

Lazy8s said:
All of the designers in the mobile GPU space have presented really strong offerings, actually. So, their classification amounts to little more than a label; a rose by any other name and all that.

That's the nice thing about competition. The even nicer aspect of it is that it helps reduce prices constantly. Until recently smart-phones were mostly high end luxury products.

Exophase · Aug 22, 2011

3dcgi said:
The first Qualcomm design, which came from Ati, was quite different from Xenos' binning, which was not a TBDR, and my understanding is that the first post-Ati design is exclusively a TBDR.

I assume the definition is to bin the scene and not start rasterization until binning is completed, though it seems JohnH is using a subtly different definition which I don't understand. Is there a document from Imagination explaining their definition? Does Imagination render depth only first or sort prior to final rasterization and this is the distinction? I think I remember talk about sorting in the past.

Xenos had very coarse grained binning, but Qualcomm uses much smaller tiles so the packet based approach was no good and required a different design.

The tiles are smaller than Xenos', but they're also much larger than the ones you'd find in SGX or Mali designs. I couldn't say exactly how large they are in Adreno implementations; I just know that it's up to 256KB for i.MX53, which is equivalent to the original z430/Adreno 200. Rendering an 800x480 or so framebuffer with 32bpp color and 32-bit depth/stencil would take about 12 tiles.
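As a quick check of that arithmetic, here is a back-of-the-envelope calculation. It assumes the whole SRAM is available for color plus depth/stencil and ignores any tile alignment or padding the real hardware may require; `tiles_needed` is just a helper name made up for this sketch.

```python
import math

def tiles_needed(width, height, color_bytes, depth_bytes, tile_mem_bytes):
    """Minimum tile count if each tile's color + depth/stencil must fit in on-chip memory."""
    bytes_per_pixel = color_bytes + depth_bytes
    framebuffer_bytes = width * height * bytes_per_pixel
    return math.ceil(framebuffer_bytes / tile_mem_bytes)

# 800x480, 32bpp color (4 bytes) and 32-bit depth/stencil (4 bytes).
tiles_256k = tiles_needed(800, 480, 4, 4, 256 * 1024)  # i.MX53-sized SRAM
tiles_128k = tiles_needed(800, 480, 4, 4, 128 * 1024)  # i.MX51-sized SRAM
print(tiles_256k)  # 12
print(tiles_128k)  # 24
```

800 x 480 x 8 bytes is 3,072,000 bytes, which is about 11.7 times 256KB, hence 12 tiles. With the 128KB configuration the same framebuffer would need twice as many.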

I was under the impression that the approach still has rasterization operating per-tile and not with full scene binning first, and I wasn't aware of any sort of hardware binner to assist with this. I guess the question is exactly what the drivers are doing in order to decouple vertex shading from rasterization in order to avoid having to perform multiple passes on the vertexes per-tile.

IMG coined the term "TBDR" and have never used the "D" to mean scene-grabbing, but deferred shading. It's accomplished by performing depth/stencil first, but presumably also outputting a primitive ID per-pixel as well instead of relying on a second early-Z test in order to determine ownership of the pixel. So yes, the end-result is per-pixel sorting.
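To illustrate the idea only: the sketch below resolves visibility for one tile first, storing a primitive ID per pixel, and then shades each covered pixel exactly once. This is a toy model for opaque geometry with a less-than depth test, not a description of IMG's hardware; the function names, the fragment tuples and the shader table are all assumptions.

```python
def resolve_visibility(fragments, tile_w, tile_h):
    """Pass 1: depth-test every fragment and keep the nearest primitive ID per pixel."""
    depth = [[float("inf")] * tile_w for _ in range(tile_h)]
    owner = [[None] * tile_w for _ in range(tile_h)]
    for x, y, z, prim_id in fragments:
        if z < depth[y][x]:          # assumed depth function: LESS
            depth[y][x] = z
            owner[y][x] = prim_id
    return owner

def shade_tile(owner, shaders):
    """Pass 2: run one shader invocation per covered pixel, using the stored owner ID."""
    invocations = 0
    color = [[None] * len(row) for row in owner]
    for y, row in enumerate(owner):
        for x, prim_id in enumerate(row):
            if prim_id is not None:
                color[y][x] = shaders[prim_id](x, y)
                invocations += 1
    return color, invocations

# Hypothetical 2x2 tile: primitive 0 covers every pixel at depth 0.5,
# primitive 1 covers the left column at depth 0.2 (nearer).
fragments = [(x, y, 0.5, 0) for y in range(2) for x in range(2)]
fragments += [(0, y, 0.2, 1) for y in range(2)]

shaders = {0: lambda x, y: "red", 1: lambda x, y: "blue"}
owner = resolve_visibility(fragments, 2, 2)
color, invocations = shade_tile(owner, shaders)
print(owner)        # [[1, 0], [1, 0]]
print(invocations)  # 4
```

Six fragments go in, but only four shader invocations come out, one per pixel, regardless of submission order. That is the "per-pixel sorting" end result described above.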

3dcgi · Aug 24, 2011

Exophase said:
The tiles are smaller than Xenos', but they're also much larger than the ones you'd find in SGX or Mali designs. I couldn't say exactly how large they are in Adreno implementations; I just know that it's up to 256KB for i.MX53, which is equivalent to the original z430/Adreno 200. Rendering an 800x480 or so framebuffer with 32bpp color and 32-bit depth/stencil would take about 12 tiles.

I was under the impression that the approach still has rasterization operating per-tile and not with full scene binning first, and I wasn't aware of any sort of hardware binner to assist with this. I guess the question is exactly what the drivers are doing in order to decouple vertex shading from rasterization in order to avoid having to perform multiple passes on the vertexes per-tile.

Yamato (Ati) had some hardware assist for binning and Yoda (Qualcomm) takes it further. I don't know how much driver involvement is necessary relative to IMG's designs.

Exophase · Aug 24, 2011

3dcgi said:
Yamato (Ati) had some hardware assist for binning and Yoda (Qualcomm) takes it further. I don't know how much driver involvement is necessary relative to IMG's designs.

Any source for this?

JohnH · Aug 24, 2011

3dcgi said:
I don't know how much driver involvement is necessary relative to IMG's designs.

IMG tiling is entirely HW based. This was a lesson we learned from early implementations, where we had SW based and mixed HW+SW before moving to fully HW based in S3 (although CLX was also HW based).

John.

3dcgi · Aug 25, 2011

Exophase said:
Any source for this?

The Yamato spec (prior to it being acquired by Qualcomm) is the source for the first. I don't remember who told me about Yoda though I couldn't quote or link anything anyway.

Exophase · Aug 25, 2011

3dcgi said:
The Yamato spec (prior to it being acquired by Qualcomm) is the source for the first. I don't remember who told me about Yoda though I couldn't quote or link anything anyway.

No link for a Yamato spec claiming this?
