Old 30-May-2011, 21:25   #526
Ailuros
Epsilon plus three
 
Join Date: Feb 2002
Location: Chania
Posts: 7,767
Default

Quote:
Originally Posted by tangey View Post
ummmm...shake it ?
That's not what I understand by overflow; nice try nonetheless.
__________________
People are more violently opposed to fur than leather; because it's easier to harass rich ladies than motorcycle gangs.
Old 30-May-2011, 21:29   #527
Npl
Senior Member
 
Join Date: Dec 2004
Posts: 1,746
Default

Quote:
Originally Posted by Ailuros View Post
How do you get a bucket to overflow when you always empty it in time?
You can't, but that's an analogy for immediate-mode renderers, I guess.
With a TBDR you can't render a pixel until you know there's no further triangle that "hits" it, which means you can't empty your bucket until you've poured all the water in (finished the scene). That assumes the TBDR has to operate in a single pass; if not, it needs to create an incomplete picture (plus some information about Z values), and then it can empty the bucket before accepting more water.
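To make the bucket analogy concrete, here is a minimal Python sketch comparing how many fragments get shaded when shading happens on arrival versus after the whole scene is known. It is only an illustration under simplifying assumptions: opaque fragments only, a plain less-than depth test, and made-up helper names (`immediate_shaded`, `deferred_shaded`) that do not correspond to any real hardware or driver interface.

```python
def immediate_shaded(fragments):
    """Immediate-mode sketch: shade every fragment that passes the depth test on arrival."""
    depth = {}
    shaded = 0
    for pixel, z in fragments:
        if z < depth.get(pixel, float("inf")):
            depth[pixel] = z
            shaded += 1  # shaded now, even if a nearer fragment arrives later
    return shaded


def deferred_shaded(fragments):
    """Deferred sketch: collect the whole scene first, then shade each covered pixel once."""
    nearest = {}
    for pixel, z in fragments:
        if z < nearest.get(pixel, float("inf")):
            nearest[pixel] = z
    return len(nearest)  # one shaded fragment per covered pixel


# Hypothetical scene: three fragments land on pixel (0, 0) back to front, one on (1, 0).
fragments = [((0, 0), 0.9), ((0, 0), 0.5), ((0, 0), 0.1), ((1, 0), 0.4)]
print(immediate_shaded(fragments), deferred_shaded(fragments))  # 4 2
```

The deferred count is only valid once every fragment has been seen, which is the "pour all the water in first" constraint described above.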
Old 30-May-2011, 21:40   #528
Ailuros
Epsilon plus three
 
Join Date: Feb 2002
Location: Chania
Posts: 7,767
Default

Quote:
Originally Posted by Npl View Post
You can't, but that's an analogy for immediate-mode renderers, I guess.
With a TBDR you can't render a pixel until you know there's no further triangle that "hits" it, which means you can't empty your bucket until you've poured all the water in (finished the scene). That assumes the TBDR has to operate in a single pass; if not, it needs to create an incomplete picture (plus some information about Z values), and then it can empty the bucket before accepting more water.
The 4 cores operate in quite a complicated fashion when it comes to macro- and micro-tiling.

http://worldwide.espacenet.com/publi...068895A1&KC=A1

http://worldwide.espacenet.com/publi...068895A1&KC=A1

Or display list related patents like that one: http://worldwide.espacenet.com/publi...115778A1&KC=A1
Old 30-May-2011, 21:58   #529
Npl
Senior Member
 
Join Date: Dec 2004
Posts: 1,746
Default

Quote:
Originally Posted by Ailuros View Post
The 4 cores operate in quite a complicated fashion when it comes to macro- and micro-tiling.

http://worldwide.espacenet.com/publi...068895A1&KC=A1

http://worldwide.espacenet.com/publi...068895A1&KC=A1

Or display list related patents like that one: http://worldwide.espacenet.com/publi...115778A1&KC=A1
And how is multicore related to the problem of overflowing buffers?
The problem is that you can't start (fragment) processing a single tile unless you know there is nothing, say a translucent triangle above the ones you have in your display list, that affects the outcome.
You are limited in the amount of information you can store before you begin rendering, so either you drop something and hope no one notices, or you render what you have and then accept new data (the extreme example being immediate-mode renderers, or some "hybrid" renderer that only defers as long as there is space).
Old 30-May-2011, 22:35   #530
Ailuros
Epsilon plus three
 
Join Date: Feb 2002
Location: Chania
Posts: 7,767
Default

Quote:
Originally Posted by Npl View Post
and how is multicore related to the problems of overflowing buffers?
How is it unrelated in this particular thread in the first place? You've got 4 GPU cores in the NGP, so what would overflow, how and why? Do you have same-sized or dynamically sized macro tiles for all 4 cores, and one or multiple display lists? I can only imagine they use multi-level display lists (or buffers, or whatever one wants to call them), compress the hell out of them and store only what's absolutely necessary.

Quote:
The problem is that you can't start (fragment) processing a single tile unless you know there is nothing, say a translucent triangle above the ones you have in your display list, that affects the outcome.
You are limited in the amount of information you can store before you begin rendering, so either you drop something and hope no one notices, or you render what you have and then accept new data (the extreme example being immediate-mode renderers, or some "hybrid" renderer that only defers as long as there is space).
Don't you think the engineers have taken those considerations into account? Heck, IMG has more than a few patents covering display list control, compression, etc. I can't imagine the display list(s) are so small that they can be overflowed that easily, and I doubt even more that, if you did manage to overflow one, an IMR or hybrid wouldn't get into the exact same theoretical trouble.
Old 30-May-2011, 23:32   #531
Exophase
Senior Member
 
Join Date: Mar 2010
Location: Cleveland, OH
Posts: 1,567
Default

IMG's GPUs do support multi-pass rendering; they have no problem loading a framebuffer from memory (and storing one to memory) if necessary. I'm almost certain that they'll stop binning and render what they have if they run out of space (including the space for the depth/stencil buffer they now need).
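A rough Python sketch of that "stop binning and render what you have" behaviour, purely as an illustration of the idea and not a description of how the hardware or driver actually does it. The function name `bin_and_render` and the notion of capacity counted in triangles are assumptions made up for the example; real parameter buffers are sized in bytes and managed per tile.

```python
def bin_and_render(triangles, capacity):
    """Bin triangles into a fixed-size display list; flush with a partial render when it fills.

    Returns the list of render passes, each holding the triangles rendered in that pass.
    """
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    display_list = []
    passes = []
    for tri in triangles:
        if len(display_list) == capacity:
            # Out of space: render what is binned so far. Colour and depth/stencil
            # would be stored to memory here and reloaded for the next pass.
            passes.append(list(display_list))
            display_list.clear()
        display_list.append(tri)
    if display_list:
        passes.append(list(display_list))  # final pass at end of scene
    return passes


print(bin_and_render(list(range(5)), 2))  # [[0, 1], [2, 3], [4]]
```

With enough capacity the whole scene goes through in one pass; every extra pass costs a framebuffer and depth/stencil round trip to memory.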
Old 31-May-2011, 00:10   #532
Npl
Senior Member
 
Join Date: Dec 2004
Posts: 1,746
Default

Quote:
Originally Posted by Ailuros View Post
How is it unrelated in this particular thread in the first place? You've got 4 GPU cores in the NGP, so what would overflow, how and why? Do you have same-sized or dynamically sized macro tiles for all 4 cores, and one or multiple display lists? I can only imagine they use multi-level display lists (or buffers, or whatever one wants to call them), compress the hell out of them and store only what's absolutely necessary.
4 cores only means 4 times the buffer size; that doesn't change a thing for me. A fixed size is a fixed size, and a dynamic workload can exceed it.

Quote:
Originally Posted by Ailuros View Post
Don't you think the engineers have taken those considerations into account? Heck, IMG has more than a few patents covering display list control, compression, etc. I can't imagine the display list(s) are so small that they can be overflowed that easily, and I doubt even more that, if you did manage to overflow one, an IMR or hybrid wouldn't get into the exact same theoretical trouble.
I'm pretty sure IMG has chosen a way to deal with this, however unlikely a problem it might be. Certainly devs on a closed platform will be wary of any limits (which are likely high enough that you don't reach them in realtime graphics), but a generic OpenGL driver surely has to take such things into account. I just asked which way they deal with the problem (as I'm curious about the implications).

@Exophase: Thanks. I guess this could be a reason recent PowerVR GPUs can't guarantee order-independent transparency: the output might depend on where rendering is halted, and the stored intermediate result is possibly truncated (less accurate than one-pass rendering).
Old 31-May-2011, 04:59   #533
Aeoniss
Member
 
Join Date: Mar 2007
Location: Nebraska
Posts: 451
Default



Well then..
Old 31-May-2011, 11:04   #534
Ailuros
Epsilon plus three
 
Join Date: Feb 2002
Location: Chania
Posts: 7,767
Default

Quote:
Originally Posted by Npl View Post
4 cores only means 4 times the buffer size; that doesn't change a thing for me. A fixed size is a fixed size, and a dynamic workload can exceed it.
Not necessarily 4x the buffer size; unfortunately details are quite sparse and it takes a lot of guesswork to figure out what exactly they're doing. However, assuming that all 4 cores are completely available at a given moment, if the scene gets split up into 4 viewports/macro tiles, why would you need 4x the buffer size? It's an honest question. Note that the relevant patent for MPs doesn't specify whether one or multiple display lists will be used; it's an either/or option.

Quote:
I'm pretty sure IMG has chosen a way to deal with this, however unlikely a problem it might be. Certainly devs on a closed platform will be wary of any limits (which are likely high enough that you don't reach them in realtime graphics), but a generic OpenGL driver surely has to take such things into account. I just asked which way they deal with the problem (as I'm curious about the implications).
I'd be interested too in an as simple as possible explanation (in order to understand it myself) but I suspect they keep it under wraps as some sort of secret sauce.

In any case, if there are cases where for whatever reason a DR is forced to operate as an IMR (always in a highly relative sense), one of the advantages it would lose, IMO, is effective fill-rate. But since GPUs in the embedded space generally don't have excessive fill-rates, I don't see it as a problem. If the NGP GPU is clocked at 200MHz as rumored, then it has raw fill-rates of 400 MTexels/s and 3.2 GPixels/s z/stencil per core.
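For anyone wanting to check the arithmetic, a small Python sketch follows. The 200MHz clock is the rumored figure quoted above, the per-clock rates are simply what the quoted fill-rates imply at that clock, and the 4-core totals assume the per-core rates add up linearly, which is an assumption for illustration rather than a published number.

```python
CLOCK_HZ = 200_000_000                  # rumored clock, from the post
TEXEL_RATE_PER_CORE = 400_000_000       # 400 MTexels/s per core, from the post
ZSTENCIL_RATE_PER_CORE = 3_200_000_000  # 3.2 GPixels/s z/stencil per core, from the post
CORES = 4                               # NGP core count mentioned in the thread


def per_clock(rate_per_second, clock_hz):
    """Units processed per clock cycle implied by a per-second rate."""
    return rate_per_second // clock_hz


def aggregate(rate_per_core, cores):
    """Total raw rate, assuming per-core rates simply add up (an assumption)."""
    return rate_per_core * cores


print(per_clock(TEXEL_RATE_PER_CORE, CLOCK_HZ))      # 2 texels per clock per core
print(per_clock(ZSTENCIL_RATE_PER_CORE, CLOCK_HZ))   # 16 z/stencil tests per clock per core
print(aggregate(TEXEL_RATE_PER_CORE, CORES))         # 1600000000
print(aggregate(ZSTENCIL_RATE_PER_CORE, CORES))      # 12800000000
```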

Besides, as Arun already noted, senior Simon's 2bpp & 4bpp PVRTC are a blessing, among other things.
Old 31-May-2011, 12:11   #535
Rys
Tiled
 
Join Date: Oct 2003
Location: Kings Langley, UK
Posts: 2,675
Default

Quote:
Originally Posted by Ailuros View Post
I'd be interested too in an as simple as possible explanation (in order to understand it myself) but I suspect they keep it under wraps as some sort of secret sauce.
It's not necessarily secret sauce as such, but you're right that we haven't talked about it much in public yet. I'll see about changing that, so there's a bit more information about how MP works at the work distribution and memory costs level.
__________________
A major redesign of the core ALU pineapple boomerang fortress.
Old 31-May-2011, 17:44   #536
JohnH
Member
 
Join Date: Mar 2002
Location: UK
Posts: 570
Default

Actually I think we can be specific in saying that there is no significant change in memory cost associated with multi-core, I'm not sure why anyone would think there was.

Pretty certain there's been a public talk on multi-core by Tony King Smith that explained its operation pretty well.

John.
Old 31-May-2011, 20:33   #537
ToTTenTranz
Senior Member
 
Join Date: Jul 2008
Posts: 2,157
Default

Quote:
Originally Posted by ToTTenTranz View Post
Doesn't the GPU-dedicated RAM need to increase in bandwidth as you increase the number of cores? Not theoretically of course, but practically.

You couldn't get a "high-end" version of your GPUs to "infinitely" scale linearly with increasing the number of cores without increasing memory bandwidth, right?
Quote:
Originally Posted by Rys View Post
Right, but the bandwidth requirement for extra cores is low.
Quote:
Originally Posted by JohnH View Post
Actually I think we can be specific in saying that there is no significant change in memory cost associated with multi-core, I'm not sure why anyone would think there was.

I'm not trying to start a flamewar between comrades or anything.. but where are we standing exactly?

Is it "so low" that you consider it "non significant"?
What ratios are we talking about? For each 100% increase in cores, you'll need 10% increase in memory bandwidth? More? Less? Not allowed to specify?
Old 31-May-2011, 20:41   #538
Ailuros
Epsilon plus three
 
Join Date: Feb 2002
Location: Chania
Posts: 7,767
Default

Series5XT scales up to 16 cores and not more. Ditto though for Series6/Rogue.

In any case, if they officially claim themselves that going past 16 doesn't make sense anymore, then it obviously isn't worth bothering with more than 16 cores, even in pure theory.

As for where you're standing, uhmm trust the more experienced one out of the two
Old 31-May-2011, 20:54   #539
ToTTenTranz
Senior Member
 
Join Date: Jul 2008
Posts: 2,157
Default

There's a cap on the number of cores for Series 6?
Old 31-May-2011, 21:15   #540
Rys
Tiled
 
Join Date: Oct 2003
Location: Kings Langley, UK
Posts: 2,675
Default

Quote:
Originally Posted by ToTTenTranz
I'm not trying to start a flamewar between comrades or anything.. but where are we standing exactly?
We said the same thing, just in different words.

Quote:
Is it "so low" that you consider it "non significant"?
What ratios are we talking about? For each 100% increase in cores, you'll need 10% increase in memory bandwidth? More? Less? Not allowed to specify?
No MP config I know of has considered adjusting memory config to increase bandwidth to help performance, versus single core. We obviously don't want to give away figures (sadly, I'd love to walk you through a profiled frame or frames and discuss the consumers).
Old 31-May-2011, 21:18   #541
Rys
Tiled
 
Join Date: Oct 2003
Location: Kings Langley, UK
Posts: 2,675
Default

Quote:
Originally Posted by Ailuros
As for where you're standing, uhmm trust the more experienced one out of the two
Where did we say different things?
Old 31-May-2011, 21:40   #542
JohnH
Member
 
Join Date: Mar 2002
Location: UK
Posts: 570
Default

Quote:
Originally Posted by ToTTenTranz View Post
I'm not trying to start a flamewar between comrades or anything.. but where are we standing exactly?

Is it "so low" that you consider it "non significant"?
What ratios are we talking about? For each 100% increase in cores, you'll need 10% increase in memory bandwidth? More? Less? Not allowed to specify?
Hmm, here I'll say it again for you,
Quote:
Actually I think we can be specific in saying that there is no significant change in memory cost associated with multi-core, I'm not sure why anyone would think there was
Old 31-May-2011, 23:37   #543
Entropy
Senior Member
 
Join Date: Feb 2002
Posts: 1,865
Default

Quote:
Originally Posted by ToTTenTranz View Post
I'm not trying to start a flamewar between comrades or anything.. but where are we standing exactly?

Is it "so low" that you consider it "non significant"?
What ratios are we talking about? For each 100% increase in cores, you'll need 10% increase in memory bandwidth? More? Less? Not allowed to specify?
This is getting strange - the amount of bandwidth needed per core will depend on what you're trying to do with it. What the guys from IMG are saying is that if, for example, Apple wanted to quadruple the number of pixels on the iPad3 and therefore (hypothetically) used a 543MP8 configuration, or four times the number of cores, those cores would use four times the bandwidth to do the same job; they wouldn't incur any extra penalty.

The real difficulty lies in predicting what types of code your customer wants to run and what resources will be spent on memory paths, and in designing for maximum efficiency in terms of gates/power/cost. If you over-engineer, then your design will be bloated with baggage that goes largely unused, and you leave an open window for your competitors to do more with less. On the other hand, obviously you want to provide the capabilities that the customer may want, as well as provide juicy new IP to license. So IMG offers both cores with different levels of complexity and the possibility to widen many of these to fit perceived need.

What I would like to see is bandwidth usage data for different but typical tasks, for, say, IMG, Mali, and Tegra respectively.
Old 31-May-2011, 23:52   #544
Shifty Geezer
Grumpy Mod
 
Join Date: Dec 2004
Location: In a pretty pink padded cell
Posts: 26,045
Default

Quote:
Originally Posted by Entropy View Post
This is getting strange - the amount of bandwidth needed per core will depend on what you're trying to do with it. What the guys from IMG are saying is that if, for example, Apple wanted to quadruple the number of pixels on the iPad3 and therefore (hypothetically) used a 543MP8 configuration, or four times the number of cores, those cores would use four times the bandwidth to do the same job; they wouldn't incur any extra penalty.
Yes. And I think some were suggesting that as you introduce more cores, there's an overhead, so instead of bandwidth use being linear it grows faster than linearly. E.g. linear scaling would have 1 core using 5 GB/s, say; 2 cores 10 GB/s; 4 cores 20 GB/s. Whereas the suggestion is that, with overhead, where 1 core consumes 5 GB/s, a second would push the total up to 12 GB/s, and quad-core would go to 28 GB/s.

Both Rys and JohnH are telling us that memory usage is linear, based on workload. You'll clearly need X times as much memory to drive X number of cores as they all consume data, but there's no additional penalty.
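The two scenarios can be put side by side with a few lines of Python. The GB/s figures are the made-up example numbers from this post, not measurements, and `overhead_fraction` is just a hypothetical helper that reports how far each total sits above perfectly linear scaling from the single-core figure.

```python
# Example bandwidth totals in GB/s, keyed by core count (illustrative numbers only).
linear = {1: 5.0, 2: 10.0, 4: 20.0}
with_overhead = {1: 5.0, 2: 12.0, 4: 28.0}


def overhead_fraction(total_by_cores):
    """Fraction by which total bandwidth exceeds linear scaling of the 1-core figure."""
    base = total_by_cores[1]
    return {n: total / (base * n) - 1.0 for n, total in total_by_cores.items()}


for name, table in (("linear", linear), ("with overhead", with_overhead)):
    fractions = overhead_fraction(table)
    print(name, {n: round(f, 2) for n, f in fractions.items()})
```

In the overhead scenario the penalty grows with core count (about 20% at two cores and 40% at four), whereas the linear scenario stays at zero throughout.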
__________________
Shifty Geezer
...

Tolerance for internet moronism is exhausted. Anyone talking about people's attitudes in the Console fora, rather than games and technology, will feel my wrath. Read the FAQ to remind yourself how to behave and avoid unsightly incidents.
Old 01-Jun-2011, 01:04   #545
Rys
Tiled
 
Join Date: Oct 2003
Location: Kings Langley, UK
Posts: 2,675
Default

For the same number of pixels, bandwidth goes up almost negligibly when you add cores to work on them. So it's not linear at all (for us anyway) when doing MP.
Old 01-Jun-2011, 01:20   #546
Exophase
Senior Member
 
Join Date: Mar 2010
Location: Cleveland, OH
Posts: 1,567
Default

Quote:
Originally Posted by Shifty Geezer View Post
Yes. And I think some were suggesting that as you introduce more cores, there's an overhead, so instead of bandwidth use being linear it grows faster than linearly. E.g. linear scaling would have 1 core using 5 GB/s, say; 2 cores 10 GB/s; 4 cores 20 GB/s. Whereas the suggestion is that, with overhead, where 1 core consumes 5 GB/s, a second would push the total up to 12 GB/s, and quad-core would go to 28 GB/s.

Both Rys and JohnH are telling us that memory usage is linear, based on workload. You'll clearly need X times as much memory to drive X number of cores as they all consume data, but there's no additional penalty.
Why would you need X times as much memory for X cores, assuming that their job is to render a scene X times faster than a single core would? The amount of data assets wouldn't increase; same framebuffer size, same textures, and as far as binning is concerned I expect that to be the same too - you'd just be distributing the bins between different cores.

As far as bandwidth goes, there'd be no increase in outgoing to render targets since this is subdivided between cores with no overlap. There may be some instances where the same data needs to be loaded into separate cores where it would have been retained in the cache of a single core, but in that case it'll either stay in the cache of all the cores that loaded it or it wouldn't have stayed in the cache of the single core. In fact, the multiple cores will texture cache better because they'll have smaller working sets but the same amount of cache each (presumably). And if they have a shared L2 cache that's even better; I'd fully expect bandwidth requirements to go down after this, not up. Maybe someone can tell me if Series5XT MP has anything like this (like Mali-400MP does)

(then again, someone please tell me if there's some glaring flaw in my reasoning)
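Here is a quick Python sketch of the render-target part of that argument: if tiles are handed out to cores with no overlap, the total bytes written to the framebuffer do not depend on the core count. Everything specific in it is an assumption for illustration only: the 960x544 resolution, the 32x32 tile size, 4 bytes per pixel, and the round-robin tile assignment are not taken from any IMG documentation.

```python
def framebuffer_write_bytes(width, height, tile, bytes_per_pixel, cores):
    """Bytes each core writes to the render target when tiles are dealt out round-robin."""
    tiles_x = -(-width // tile)   # ceiling division
    tiles_y = -(-height // tile)
    per_core = [0] * cores
    for index in range(tiles_x * tiles_y):
        tx, ty = index % tiles_x, index // tiles_x
        # Edge tiles may be clipped by the render target bounds.
        w = min(tile, width - tx * tile)
        h = min(tile, height - ty * tile)
        per_core[index % cores] += w * h * bytes_per_pixel
    return per_core


for cores in (1, 2, 4):
    per_core = framebuffer_write_bytes(960, 544, 32, 4, cores)
    print(cores, sum(per_core))  # the total is the same for every core count
```

Each core writes a smaller share as cores are added, but the sum stays at width x height x bytes per pixel, which is the "no increase in outgoing bandwidth" point made above. Texture and parameter fetches are not modelled here.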
Old 01-Jun-2011, 07:43   #547
Ailuros
Epsilon plus three
 
Join Date: Feb 2002
Location: Chania
Posts: 7,767
Default

Quote:
Originally Posted by Rys View Post
For the same number of pixels, bandwidth goes up almost negligibly when you add cores to work on them. So it's not linear at all (for us anyway) when doing MP.
Yes, but if the workload increases for a config with N more cores, then naturally the bandwidth requirements should increase too. I know it's a dumb clarification since it's self-explanatory, but some of us unfortunately need a KISS approach to comprehend it more easily.

Otherwise, for workload X it is irrelevant in theory whether you have N pipelines in a hypothetical single core or the same N pipelines spread over Y cores: the bandwidth requirements are fairly similar, yes?
Old 01-Jun-2011, 09:21   #548
Shifty Geezer
Grumpy Mod
 
Join Date: Dec 2004
Location: In a pretty pink padded cell
Posts: 26,045
Default

Quote:
Originally Posted by Exophase View Post
Why would you need X times as much memory for X cores,
I missed a 'bandwidth' there.

Quote:
Originally Posted by Ailuros View Post
Quote:
Originally Posted by Rys View Post
For the same number of pixels, bandwidth goes up almost negligibly when you add cores to work on them. So it's not linear at all (for us anyway) when doing MP.
Yes, but if the workload increases for a config with N more cores, then naturally the bandwidth requirements should increase too.
Yeah, that's what I was getting at. If you are targeting a more powerful GPU, you'll feed it more data needing more BW. If you have the same graphics workload and start adding more cores, it won't change the bandwidth usage as it's the same amount of data, just being processed faster. Clearly the choice to go with a four or eight core SGX in a device would have to be coupled with a choice to increase RAM BW accordingly to feed them, but not with any overhead that means more cores = less memory efficiency.
Old 01-Jun-2011, 09:34   #549
RudeCurve
Senior Member
 
Join Date: Jun 2008
Posts: 1,747
Default

Isn't the main benefit of TBDRs that you don't need to go off-chip the way IMRs do, meaning you need less bandwidth for a given workload?
Old 01-Jun-2011, 10:40   #550
ToTTenTranz
Senior Member
 
Join Date: Jul 2008
Posts: 2,157
Default

Quote:
Originally Posted by Shifty Geezer View Post
Clearly the choice to go with a four or eight core SGX in a device would have to be coupled with a choice to increase RAM BW accordingly to feed them, but not with any overhead that means more cores = less memory efficiency.
That was exactly the info I was looking for.

By increasing the number of cores, one assumes the purpose is also to increase the number of rendered pixels, the amount of geometry, post-processing effects, texture resolution, etc.
And by doing so, the SGX5 architecture would naturally also need more memory bandwidth for the whole GPU (not more memory bandwidth per core), or the graphics subsystem would eventually face a bottleneck.


Maybe because I wasn't using the right terms, the answers I was getting were not answers to the question I asked, hence all the confusion.

Tags
543mp4, dreamcast, imgtec, sgx, tbdr, vindicated
