Tiling on the Xbox 360: Status and Should Future Consoles Use eDRAM?

While I was searching for the proper name of the NVIDIA technology, i.e. Quincunx, I found a review of the GeForce 6800 stating that Quincunx was no longer available.

Is Quincunx available on the GeForce 7xxx series? Is it disabled in the drivers?

From what I found, Quincunx mode offered almost 4xAA quality at the performance cost of 2xAA.
 
Quincunx is a rather crap technology. It's basically just a blur of one of the two samples in 2xAA. It looks fine for black polygons on a light background, as per NVidia's PR material, but for textured materials it sucks. ATI's wide tent filter in R600 sort of blows as well for the same reason.

NVidia probably disabled it because it was looked down upon in the community, and didn't really enhance the image much. Rendering a slightly lower resolution with 4xAA gives better edge quality and a sharper picture, IMO.
 
Pardon the sarcasm but...
Quincunx is a rather crap technology. It's basically just a blur of one of the two samples in 2xAA. It looks fine for black polygons on a light background, as per NVidia's PR material, but for textured materials it sucks. ATI's wide tent filter in R600 sort of blows as well for the same reason.
.... Rendering a slightly lower resolution with 4xAA gives better edge quality and a sharper picture, IMO.
It seems to me that you are telling us the startling news that using ~4x SAA is better than 2x SAA. I never would have guessed.

You should be comparing 2x SAA with a 1x1-pixel-wide box filter versus 2x SAA with a 2x2-pixel-wide tent filter. I think you will find the latter is better, at least for FSAA. (For MSAA, I suspect that some small adjustments should be made to the mip map level selection.)
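To make the filter trade-off concrete: Quincunx resolves each pixel as its own sample at weight 1/2 plus four neighbouring samples at 1/8 each. Here is a minimal numpy sketch that treats that kernel as a plain post-filter on a single-sample image (an approximation for illustration, not how the hardware actually operates on the 2x sample grid); it shows why fine texture detail suffers:

```python
import numpy as np

# Quincunx-style 5-tap kernel: centre sample at 1/2, diagonal taps at 1/8.
QUINCUNX = np.array([[0.125, 0.0, 0.125],
                     [0.0,   0.5, 0.0  ],
                     [0.125, 0.0, 0.125]])

def filter2d(img, kernel):
    """Naive 2D convolution with clamped edges."""
    pad = kernel.shape[0] // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.empty_like(img)
    for y in range(img.shape[0]):
        for x in range(img.shape[1]):
            out[y, x] = np.sum(padded[y:y + 3, x:x + 3] * kernel)
    return out

# One-texel-wide horizontal stripes stand in for fine texture detail.
stripes = np.tile(np.arange(8.0) % 2.0, (8, 1)).T
print("contrast before:", stripes.max() - stripes.min())   # 1.0
filtered = filter2d(stripes, QUINCUNX)
print("contrast after: ", filtered.max() - filtered.min()) # 0.5
# Interior pixels all land on 0.5: the stripe detail is averaged away,
# which is exactly the texture blur being complained about here.
```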
 
Google's Cached Posts for the remainder of Page 1:

ShootMyMonkey said:
How so? What's so intensive about going through the scene graph multiple times? Even if it was, why can't you just keep a buffer of pointers to each object to be drawn with a flag for each tile, so that you only have to traverse it once? Even with hundreds of thousands of objects on screen, it'll be under 1MB.
The way the original quote was stated made it sound like you were thinking of going through the entire graph multiple times. The data size isn't the problem, but the pointer chasing is. So what if the math is only a few tens of cycles? That doesn't mean much if your data organization is such a mess to begin with that getting to an object stalls you for several thousand (I've actually seen a few cases where people reported hitting 750,000-cycle stalls -- how they got there, I have no idea, but apparently it can happen).
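As a sketch of the flat-buffer idea being floated here (the record layout and the three-tile split are illustrative, not any real engine's API): walk the scene graph once, emit a compact record per visible object with a per-tile bitmask, and each tile's submission becomes a linear scan with no pointer chasing.

```python
from dataclasses import dataclass

@dataclass
class DrawRecord:
    draw_id: int    # stand-in for a pointer to an already-prepared draw call
    tile_mask: int  # bit i set => object's screen bounds touch tile i

def rects_overlap(a, b):
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    return ax0 < bx1 and bx0 < ax1 and ay0 < by1 and by0 < ay1

def build_draw_list(scene_objects, tile_rects):
    """Single traversal of the scene; everything after this is flat data."""
    records = []
    for obj in scene_objects:
        mask = 0
        for i, tile in enumerate(tile_rects):
            if rects_overlap(obj["bounds"], tile):
                mask |= 1 << i
        if mask:
            records.append(DrawRecord(obj["id"], mask))
    return records

def submit_tile(records, tile_index):
    """Per-tile pass: a cache-friendly linear scan, no graph walking."""
    return [r.draw_id for r in records if r.tile_mask & (1 << tile_index)]

# Three horizontal bands of a 720p frame (the 4xAA tiling case):
tiles = [(0, 0, 1280, 240), (0, 240, 1280, 480), (0, 480, 1280, 720)]
scene = [{"id": 1, "bounds": (100, 50, 300, 200)},
         {"id": 2, "bounds": (0, 100, 1280, 700)}]  # spans all three tiles
records = build_draw_list(scene, tiles)
print([submit_tile(records, t) for t in range(3)])  # [[1, 2], [2], [2]]
```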

Putting the issuing of a job as something that gets deferred into tile queues awaiting actual issue isn't quite as bad, but in most cases, you can't "just add" that.
Any examples?
Probably the canonical example that comes to mind is (offline) pre-built command buffers that make it easy for you to draw a static object in the form of set-your-constants-and-issue-the-buffer. This is a common leftover for a lot of people transitioning from previous gen, though it certainly isn't something you can use anymore (tiling and SPU culling both fly in the face of it). It kind of hurts when tiling because the idea back then was that too fine a granularity to your scene chunking was a bad thing. With tiling, you could end up with a lot of wasted work, though. If a chunk that crosses tile boundaries is simply a handful of big polygons, that's no big deal, but if it's big chunks of lots of little polygons, it really can bite back.

This is one of those things that happens at a lower level, and a lot of studios don't like the idea of revamping something so fundamental to their pipeline no matter what it means. The smaller you are, the more likely, I think. I know at my previous job we were kind of running into this, and all my assertions about sharing so much of your design philosophy for PS2/PSP with your renderpath for 360 seemed to be for naught.
The funny thing is that lack of tiling seems to be even more prevalent on XB360 exclusives.
Hmmm... well, I don't know what to tell you there. But it may well be that devs on exclusives could just feel free to play more cheats that are more specific to the 360.
That seems like a lot more trouble than it's worth. You can't even use the same Z-buffer, can you? Or can you resolve it and load it back in somehow? If not, it would have to be simple compositing.
It's actually not that big a deal; you can at least share a Z-prepass, since it is possible to share the resolved multisample Z. We do it as well, though in our case we're compositing separate scenes entirely (and between those scenes we clear the Z-buffer anyway).
I guess the best solution is to have a fallback of traditional rendering straight to RAM. Since BW of RAM seems to increase at a much slower pace than transistor density, there's a good chance that by next gen the ROP cost to make this possible would be almost negligible. I think this also means that EDRAM will become a necessity to get all you can from the billion-transistor chips we'll see next gen.
I've kind of liked the idea of having caches of much smaller working tiles. You can basically get your bandwidth but hide the tiling nature of the work. As long as there's a large enough cache so that you're not flushing tiles a million times, it can be relatively effective. Hardly perfect, but nothing ever could be.
-------------------------------------------------------------------------------------------------------------------------------------
Jesus2006 said:
Doesn't "no tiling" give you even more advantages through the EDRAM (bandwidth wise)? Maybe that's also a reason why so many (even first partys) don't do it and just keep a "steady" framebuffer in there.

Mintmaster said:
It doesn't give you "even more" advantage, but yes, obviously even without tiling the 360 is much faster than an equivalent 128-bit machine w/o EDRAM, and possibly faster than a 256-bit one.

The issue with tiling is that the work needed to implement it discourages MSAA. They're taking the free boost they get with <10MB framebuffers, and not bothering with the even bigger boost they get for larger framebuffers when tiling is used.

ShootMyMonkey said:
I don't know about "bandwidth-wise" since you have to write the resolved framebuffer out to RAM anyway... There are a few things you can't use while tiling (e.g. MEMEXPORT), but then you can always use those things in passes that don't rely on tiling anyway (assuming you can afford those passes).

But in general, I don't think there's a single person out there who'd say that "free" with some sacrifices is as good as "free" and orthogonal to anything else. I doubt anyone will ever look upon tiling on the 360 as a positive... only as a necessary evil to get a certain feature.

Mintmaster said:
Obviously, but that's a pretty stupid comparison to make. Free with some sacrifices is generally better than halving performance or worse.

MS wanted a 128-bit bus for cost scaling reasons. Given that constraint, it's hard to imagine any non-EDRAM design approaching the performance of Xenos.

ShootMyMonkey said:
Sure, but that doesn't say anything about why developers would elect to use lower resolution and upscaling or various other hacks to avoid tiling. All you're saying is that eDRAM at any cost vs. no eDRAM at all is mostly an improvement.

My only thing with this whole mess is that the wrong combination of constraints were pushed. If they were going to make 720p w/ 2xMSAA mandatory, then they should have put 14 MB of eDRAM rather than 10. Let tiling become an issue for those who choose to *exceed* the minimum requirements... not those who try to meet it. What would have happened if instead of 10 MB, we got 4 MB of eDRAM?

If they were so hard-set at a 10 MB limit (for what was most likely cost reasons), then they should have lifted the requirement for 2xAA or lifted the requirement for HD res. Instead MS gave us the trifecta, and said "we know better than you ever will." If not for that, this thread wouldn't even exist.
-------------------------------------------------------------------------------------------------------------------------------------
 
Barbarian said:
ShootMyMonkey said:
My only thing with this whole mess is that the wrong combination of constraints were pushed. If they were going to make 720p w/ 2xMSAA mandatory, then they should have put 14 MB of eDRAM rather than 10. Let tiling become an issue for those who choose to *exceed* the minimum requirements... not those who try to meet it. What would have happened if instead of 10 MB, we got 4 MB of eDRAM?
That's my exact thinking as well. Basically if next gen has enough EDRAM space for 4xMSAA 1080p with no tiling I'll be happy. Additionally I think they need to make that 4xMSAA compressed so that it can fit more than just 4 pixels.
Extra bonus would be to have programmable ROPS or at least read/texturing capability from EDRAM so we can avoid some of these costly resolves (which at 1080p would be a nightmare).
--------------------------------------------------------------------------------------------------------------------------------
Mintmaster said:
ShootMyMonkey said:
My only thing with this whole mess is that the wrong combination of constraints were pushed. If they were going to make 720p w/ 2xMSAA mandatory, then they should have put 14 MB of eDRAM rather than 10. Let tiling become an issue for those who choose to *exceed* the minimum requirements... not those who try to meet it. What would have happened if instead of 10 MB, we got 4 MB of eDRAM?
Just how "mandatory" is it? If all games were meeting that requirement, we wouldn't be having these AA discussions. The two biggest titles on the 360 -- Gears and Halo 3 -- don't have AA, so it's not really mandatory.

I think MS made it a "requirement" because without it, few devs would bother with AA. Just look at what you and joker454 said about how much game studios care about AA given what the average gamer thinks. For everyone else, though, MS is trying to enforce a higher level of image quality to take advantage of the hardware.

I think the amount was chosen because 14MB was too expensive, and 10MB means only 3 tiles for 720p w/ 4xAA (as opposed to 4 for 7MB), which is the standard they're hoping devs will go for. It should be rather easy if you already have 2 tiles going.
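For reference, the tile counts quoted in this thread work out as follows. This is back-of-envelope only: it assumes 4 bytes of colour plus 4 bytes of Z/stencil per sample and ignores the alignment and tile-shape restrictions the real hardware adds.

```python
EDRAM = 10 * 2**20  # 10 MB

def fb_bytes(w, h, samples):
    # 4 bytes of colour + 4 bytes of Z/stencil per sample
    return w * h * samples * (4 + 4)

def tiles_needed(w, h, samples):
    return -(-fb_bytes(w, h, samples) // EDRAM)  # ceiling division

for label, (w, h, s) in {
    "480p, 4xMSAA": (640, 480, 4),
    "576p, 2xMSAA": (1024, 576, 2),
    "720p, no AA ": (1280, 720, 1),
    "720p, 2xMSAA": (1280, 720, 2),
    "720p, 4xMSAA": (1280, 720, 4),
}.items():
    mb = fb_bytes(w, h, s) / 2**20
    print(f"{label}: {mb:5.2f} MB -> {tiles_needed(w, h, s)} tile(s)")
# 480p/4x (9.38 MB), 576p/2x (9.00 MB) and 720p/no-AA (7.03 MB) fit in a
# single tile; 720p/2x (14.06 MB) needs two; 720p/4x (28.13 MB) needs three.
```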
--------------------------------------------------------------------------------------------------------------------------------
ShootMyMonkey said:
Mintmaster said:
Just how "mandatory" is it? If all games were meeting that requirement, we wouldn't be having these AA discussions. The two biggest titles on the 360 -- Gears and Halo 3 -- don't have AA, so it's not really mandatory.
Originally, it was an absolute 100% mandatory rule... over the course of time, every developer on the planet basically said "no, it's out of the question." And there was a whole lot of screaming and gigabytes of emails all roughly saying "hell no." The mandate since then has been "something that can pass for visually equivalent in most cases" with the addendum that they Strongly Prefer(TM)* you tile and use hardware AA. That change was made a few months before the 360 came out on the market, so few people know of it.

In any case, if not for that change, you wouldn't see people trying to cover in various ways. It would simply mean that pretty much all the early 360 titles would have graphically sucked more than they did. Back when the 2xAA "equivalent" TRC change occurred, you could see people on the newsgroups posting stuff like "yay! that means we can actually hit our target of 500,000 polys per frame!" Which by today's standards, tiling or not, is really quite nothing as you know... Would things be different if we were approached with that requirement now? Maybe, but it's a bit late. Tiling has already been stamped with a poison label.

It might have also been a little better if the original samples and suggested methods that MS had talked about early on were actually halfway decent.

Mintmaster said:
For everyone else, though, MS is trying to enforce a higher level of image quality to take advantage of the hardware.
Correction : "as a selling point for the hardware." Saying "ours is better because all the games have AA." Of course, it kind of put Sony on the hotseat as well, and now they both suck for it.

Mintmaster said:
I think the amount was chosen because 14MB was too expensive, and 10MB means only 3 tiles for 720p w/ 4xAA (as opposed to 4 for 7MB), which is the standard they're hoping devs will go for. It should be rather easy if you already have 2 tiles going.
10 MB is fine for *something*... it's a matter of what else you want to put on the devs' tables.

Say for instance, they didn't require any AA, but still required output at 720p... 10 MB is fine. Say they required 2xMSAA at 1024*576p... again 10MB is fine. Similarly, if they'd foot the bill for 14 MB, no one would have a problem with 720p with 2xMSAA being made an absolute requirement either.

Perhaps I should have stated it as... there are cheats and avoidance measures partly because the original requirement was there and partly because the current requirement is vague by nature.


* Note that this is the Microsoft definition of "strongly prefer", which means they'll beat your ass for not abiding by their preference and make concessions on various other things if you do.
--------------------------------------------------------------------------------------------------------------------------------
ninelven said:
Should Future Consoles Use eDRAM?
/not a dev

The way I see the above question, we are discussing a fixed target for the future (1080p). The question that needs to be asked is what problem eDRAM currently addresses, and whether that will still be an issue in the future. The primary benefit of eDRAM appears to me to be bandwidth, so the question becomes: is memory bandwidth going to be an issue for consoles rendering at 1080p in 2009+?

I'm going to go out on a limb and guess no. I would also hope that fp16 blending with AA is standard by then, so if a console uses eDRAM it better have lots of it (which I would think would be cost prohibitive but I have no idea really).
--------------------------------------------------------------------------------------------------------------------------------
Cyan said:
I think you people have made some great contributions to this topic. Love this thread.
Mintmaster said:
I think this also means that EDRAM will become a necessity to get all you can from the billion-transistor chips we'll see next gen.
Fran said:
Mixed feelings here: one day I love EDRAM, the next day I would just get rid of it altogether. As I wrote before, EDRAM makes some things extremely fast and some problems just disappear (bandwidth to the framebuffer can be considered, for all practical purposes, infinite).
Love it or leave it... but EDRAM is here to stay (I hope so). What would happen with backwards compatibility then? On the GPU side, the unified architecture seems to be the future of graphics, and is therefore easy to emulate. The EDRAM, on the contrary, is nearly impossible to emulate because of the logic built into it.
ninelven said:
I'm going to go out on a limb and guess no. I would also hope that fp16 blending with AA is standard by then, so if a console uses eDRAM it better have lots of it (which I would think would be cost prohibitive but I have no idea really).
I scarcely need to say that I concur with you almost to the fullest extent (fp10 is doing quite fine this gen), and if they go with tiling again they had better find a decent solution to the problem before a new console comes out (unlike with the X360, as ShootMyMonkey already pointed out), or it's going to be a huge disappointment.

Anyway, it looks like they found a viable solution to the blunder a couple of months ago, as has been mentioned before. 14 MB should be a must, and 28 MB just perfect, this gen...
--------------------------------------------------------------------------------------------------------------------------------
nelg said:
Just how expensive is eDRAM? Is the cost reasonably static or is it scaling like other semiconductors?
Shifty Geezer said:
As I understand it, it's just like a CPU. The more you add, the bigger the die, the more faults and the lower the yields. On a 300 million transistor GPU, you could divide the transistor budget between 100 million on 10MB eDRAM and 200 million on shader arrays, or 150 million on 15MB eDRAM and 150 on the ALUs, or any such combination in a balancing act (easy example figures, not accurate!). The reason XB360 hasn't got more eDRAM is price. MS (or ATI?) decided to balance it how they did.
Gubbi said:
Building in redundancy to improve yield is ubiquitous. DRAM dies yield very well. That means cost is linear with size (twice the size, half the dies per wafer).

Cheers
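A toy model of that difference, using the standard Poisson yield formula with made-up wafer cost and defect density (DRAM's row/column redundancy lets most defective dies be repaired, so its yield stays roughly flat with area, while logic yield falls off as the die grows):

```python
import math

WAFER_COST = 5000.0              # $ per wafer (made-up figure)
WAFER_AREA = math.pi * 150 ** 2  # 300 mm wafer, in mm^2
DEFECTS = 0.005                  # defects per mm^2 (made-up figure)

def cost_per_good_die(area_mm2, repairable):
    dies = WAFER_AREA / area_mm2  # ignoring edge losses
    # Repairable (DRAM-like) dies keep a ~flat yield; logic follows Poisson.
    yield_ = 0.95 if repairable else math.exp(-DEFECTS * area_mm2)
    return WAFER_COST / (dies * yield_)

for area in (50, 100, 200):
    print(f"{area:3d} mm^2: DRAM-like ${cost_per_good_die(area, True):6.2f}, "
          f"logic-like ${cost_per_good_die(area, False):6.2f}")
# DRAM-like cost doubles as the area doubles (linear, per Gubbi); logic-like
# cost grows faster than linear because yield drops with die size.
```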
Shifty Geezer said:
That's clearly true, now you mention it, although the issue is still one of balancing costs. If eDRAM cost scaling is linear, though, why did MS not choose 14 MB as the best compromise?
--------------------------------------------------------------------------------------------------------------------------------
Mintmaster said:
ShootMyMonkey said:
Correction : "as a selling point for the hardware." Saying "ours is better because all the games have AA."
Of course! I thought it was sort of implied. Everything Microsoft does is to sell more hardware.
Say for instance, they didn't require any AA, but still required output at 720p... 10 MB is fine. Say they required 2xMSAA at 1024*576p... again 10MB is fine. Similarly, if they'd foot the bill for 14 MB, no one would have a problem with 720p with 2xMSAA being made an absolute requirement either.
Like I said, they really want you to do 4xAA. Once people get two tiles going, three is a small step. Give them 14MB and almost nobody does 4xAA; relax the TRC and again nobody tiles. I also have a feeling that MS wants tiling to become ubiquitous so that they don't need 64-128MB of EDRAM next gen to avoid tiling. If something can be done in software that's sort of painless (in the grand scheme of things) to simplify hardware, it should be done. They're trying to make you tile without having good devs/titles abandon the platform because they don't want to.

Looking at posts from joker454 and Fafalada as well as the number of MP titles using tiling on Quaz51's list in the upscaling thread, it can't be that bad. Thinking about it, it really shouldn't be.

Your opposition is mostly related to legacy assets and code, right? Where you bunch similar objects together into large groups to avoid renderstate changes and extra draw calls? Didn't this really hurt your frustum culling ability? Smaller pieces would be quite useful even without tiling. As long as your nominal piece size isn't much bigger than 10,000 polys, tiling should have minimal impact on your total poly budget.
--------------------------------------------------------------------------------------------------------------------------------
Shifty Geezer said:
Mintmaster said:
Like I said, they really want you to do 4xAA. Once people get two tiles going, three is a small step. Give them 14MB and almost nobody does 4xAA.
That makes a lot of sense. MS wanted 4xMSAA and designed the system for that, only to hit protestations from developers, seemingly due in part to issues with using the hardware as MS intended. MS had to relax their ideas or they'd get no games at all, and now developers are used to having more of a choice, rather than agreeing to MS's plan of 4xAA throughout.

I can buy that reasoning.
--------------------------------------------------------------------------------------------------------------------------------
AlStrong said:
Warning: what follows is wild speculation that could get me institutionalized!


Could it have been some diabolical plot to solve bandwidth issues in the PC arena? Suppose for a minute that tiling were introduced and executed properly to result in widespread adoption. With all the cross-platform development going on, it would have been an easy fit.

AMD could branch out to producing PC parts with eDRAM, and it would be fabbed in their own facilities. And because of tiling, they could vary the amount of eDRAM and beef up the triangle power to compensate. At the same time they wouldn't have to resort to giant 512-bit memory buses, and then they wouldn't need expensive GDDR4 or ultra-high-frequency memory chips.

For games that aren't compatible with tiling, let the GPU write out the framebuffer to the GPU's traditional local memory - say a 128 or 256-bit memory bus.

Of course, that could still happen in the future... maybe.

(I really need to get out of this lab)
Mintmaster said:
Definitely not! If anything, the conspiracy is the other way around.

For the PC, the IHVs want devs to use the most inefficient rendering method possible. They want FP16 to be standard, and they want people to use high-res displays. Their whole business is based on gamers continually upgrading their cards.

With consoles, Sony and MS have a huge incentive to reduce hardware cost as much as possible, and they have exclusive titles to take advantage of every feature they put in. Radical departures on the PC side, however, are impossible.

The goals and impetus behind graphics and games development on the console side are very different from those on the PC side.
--------------------------------------------------------------------------------------------------------------------------------
3dcgi said:
mboeller said:
Also, ATi had access to the Flipquad MSAA patents from Bitboys (I'm not sure if they had already acquired Bitboys back then, but it was close), and so 2xMSAA / Flipquad could have been possible too. But that's a completely different story (except that, again, only the eDRAM chip is involved).
I don't remember the date, but Bitboys was bought way too late to affect Xenos.

mboeller said:
I used only a normal back buffer:

back buffer: 1280 x 720 x 32bit

Z-Buffer: 1280 x 720 x 32bit x 2 (2x MSAA)

I thought that would be enough.

Thanks
Nope. You need to store a color for each subpixel as you can't resolve until the frame is complete.
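Running the numbers on that correction, using the same 32-bit colour and 32-bit Z/stencil per sample assumed in the quoted budget:

```python
W, H, S, MB = 1280, 720, 2, 2 ** 20

proposed = W * H * 4 + W * H * 4 * S      # 1x colour + 2x Z, as quoted above
required = W * H * 4 * S + W * H * 4 * S  # 2x colour + 2x Z
print(f"proposed: {proposed / MB:.2f} MB, required: {required / MB:.2f} MB")
# proposed: 10.55 MB, required: 14.06 MB. The colour buffer must be kept
# per-sample too, since the resolve can only run once the frame is finished.
```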
--------------------------------------------------------------------------------------------------------------------------------
V3 said:
I think you have to look at it in terms of the cost of tiling (in shader work, bandwidth, CPU cycles, etc.) versus how many shader units would need to be given up to make room for more eDRAM. They do have a limited budget, after all. Going from 10 MB to 14 MB of eDRAM might have cost MS another $2 billion for all we know.

But I think in terms of reducing cost, eDRAM is the sure way. The way the Xbox 360 solved its bandwidth problem is cheaper in the long run compared to the PS3's, and that will be important when they get to the $150 price point.
--------------------------------------------------------------------------------------------------------------------------------
Fafalada said:
Mintmaster said:
it can't be that bad.
It's not really a question of "that bad" though.
Last gen on the PS2, Sony always "recommended" that you do extra work on aliasing issues, especially texture aliasing, which was 100% fixable (i.e. you get the equivalent of PC GPU mipmapping) if you perform per-polygon inclination correction for mipmaps. The method was perfectly doable, and didn't cost much. In the grand scheme of things, there were probably only a handful of titles in total that bothered with it, and I don't know if any 3rd party ever did - and that was on a market-controlling platform.
Not to mention that the highest-selling 1st party titles were some of the worst offenders in terms of texture IQ - reminds you of something?

IMO it's an issue of development priorities; additional resources/time purely for aliasing fixes just seem harder to justify than other stuff (once you're already at the "par" level with most of the market in that regard).
Squeak said:
Fafalada said:
if you perform per-polygon inclination correction for mipmaps. The method was perfectly doable, and didn't cost much. In the grand scheme of things, there were probably only a handful of titles in total that bothered with it,
What were those titles? I don't remember ever seeing a PS2 game without mip map inclination issues. :)
--------------------------------------------------------------------------------------------------------------------------------
Joshua Luna said:
Shifty Geezer said:
As I understand it, it's just like a CPU. The more you add, the bigger the die, the more faults and the lower the yields. On a 300 million transistor GPU, you could divide the transistor budget between 100 million on 10MB eDRAM and 200 million on shader arrays, or 150 million on 15MB eDRAM and 150 on the ALUs, or any such combination in a balancing act (easy example figures, not accurate!). The reason XB360 hasn't got more eDRAM is price. MS (or ATI?) decided to balance it how they did.
Just to toss it out... another factor is that 640x480 with 4xMSAA fits nicely into eDRAM. Obviously tiling was in the plans based on the hardware capabilities (ROP performance, tiling commands, etc.), but I wonder if early on the thinking was rougher, aimed at hitting these desired targets:

a.) 480p with 4xMSAA (no tiling)
b.) 720p w/o MSAA (no tiling)
c.) 720p (and above) w/ MSAA (tiling)

(a.) and (b.) being natively possible within the 10MB eDRAM limit, with (c.) being possible with a little work. So, a compromise all around. As the market matured and the hardware came to fruition, market forces came into view (namely HDTV market penetration), and MS/ATI began building their marketing bullet points as well as setting up TRCs for developers.

If you were designing in 2002/2003 for a first tapeout in November 2004, the necessity of 720p w/ MSAA as standard may not have been at all clear.
liolio said:
Interesting note Joshua.

Does somebody has some clue about witch percentage of 360/ps3 owners actually own a HDTV?

It's very possible that even by the end of this gen Hd support will a waste for most of the users.
The sad part is that the SDTV users could have been offer better game on a technical point of view. DVD like resolution would allow 4xAA maybe better textures filtering AF (I don't know where is the limiting factor on this one) more complex pixel shader.


Anyway it seems that MS didn't want to let hd bullet point ot Sony even if tiling is a late solution to match Sony marketting point (ok sony wanted to push BD but Sony is in the same situation as MS in regard to the hdtv penetration).

Joshua, I feel like you usually do very well with pools, do you think that these propositions could be "worse it" for a pool.
the subject would be "do you think that hd gaming was way to anticipated in regard to hdtv adoption" (nb could be something more fancy).
Choices could be :

1)HD gaming is too anticipated even really late in this gen most users (more than a half)will still use sdtv, and I feel like game could have been better.

2)I reckon that HD gaming is too anticipated even really late in this gen most users (more than a half)will still use sdtv, I don't care as I own a HDTV.

3)HD gaming is anticipated but by the end of this gen hd user will come slightly on top in user based (ie manufacturers were right)

I guess questions could be reformulated, but that could give us some answers.

My opinion on this matter is that even by the end of this gen most users will still use SDTV, and the marketting wars will have prevent them to put theirs hands on prettier games.
--------------------------------------------------------------------------------------------------------------------------------
 
All I could find of Page 3:

Mintmaster said:
Fafalada said:
IMO it's an issue of development priorities; additional resources/time purely for aliasing fixes just seem harder to justify than other stuff (once you're already at the "par" level with most of the market in that regard).
Right. When I said it's not that bad, I mean that the percentage of a dev budget needed to implement effective tiling is probably minuscule. Your point, on the other hand, is that the percentage of gamers who really care is probably minuscule too. Hence the priority dilemma.
---------------------------------------------------------------------------------------------------------------------------------
ShootMyMonkey said:
Mintmaster said:
Like I said, they really want you to do 4xAA. Once people get two tiles going, three is a small step. Give them 14MB and almost nobody does 4xAA; relax the TRC and again nobody tiles. I also have a feeling that MS wants tiling to become ubiquitous so that they don't need 64-128MB of EDRAM next gen to avoid tiling.
Somehow, I'm less likely to think that Microsoft would be that concerned about the next hardware iteration in light of the current one. For the most part, they've shown that their concerns are almost always on the here and now (the CPU is a prime example of that). For all we know at this point, 5 years from now, we may not need a huge eDRAM block anyway, and tiling wouldn't have to be something that the developer would have to worry about. As far as the future of PC GPUs is concerned, for the very same reasons you mentioned, I don't foresee big eDRAM blocks coming to the PC anyway, so I wouldn't believe they'd be concerned with getting people used to tiling there either...
Mintmaster said:
it can't be that bad. Thinking about it, it really shouldn't be.
It's not really that bad. It's not good either. And of course, it's low priority. The other thing is that even if you can say that you take only a 5% hit for it, it's really quite often the case that 5% is many times larger than the available margin (and that's again related to the fact that other things stand at higher priority). 1% slower is all it takes to go from 60 to 30.
Mintmaster said:
Your opposition is mostly related to legacy assets and code, right? Where you bunch similar objects together into large groups to avoid renderstate changes and extra draw calls? Didn't this really hurt your frustum culling ability? Smaller pieces would be quite useful even without tiling. As long as your nominal piece size isn't much bigger than 10,000 polys, tiling should have minimal impact on your total poly budget.
It's not that big a problem for us where I currently am. I've seen it become a problem at other places, and even there, batching large jobs wasn't a big deal for frustum culling at all (but then it certainly could have been if they were moving million-poly scenes and all). For us here, it's not that big of a problem because batches aren't that big to begin with what with all the material and shader variance artists put into the scenes these days (and since I've worked on tools that make it all the easier for them to do that, they do go a little wild at times).

For other studios, there are simply a host of problems with multiplatform, not just because you need a transparent API, but because you also want to be able to share assets, and the same batching which is amenable to tiling is never going to be friendly to PS3 (I say "never" because bad luck is the perpetual king of the hill), where just 100 redundant verts is as bad as drawing an extra character on the screen (at least, this is how it's treated).
---------------------------------------------------------------------------------------------------------------------------------
Mintmaster said:
ShootMyMonkey said:
For all we know at this point, 5 years from now, we may not need a huge eDRAM block anyway, and tiling wouldn't have to be something that the developer would have to worry about.
Well, I'm just saying it gives them options. Should EDRAM be a big benefit for a future console (which IMO is likely), then if devs are comfortable with tiling, MS can reduce the cost and price of their next console.
It's not really that bad. It's not good either. And of course, it's low priority. The other thing is that even if you can say that you take only a 5% hit for it, it's really quite often the case that 5% is many times larger than the available margin (and that's again related to the fact that other things stand at higher priority). 1% slower is all it takes to go from 60 to 30.
Part of what I was saying is that 4xAA at a slightly lower resolution upscaled looks better than no AA at native res. Reduce resolution 5-10% in each direction and you'll probably get a much bigger perf boost than the cost you're talking about.
For us here, it's not that big of a problem because batches aren't that big to begin with what with all the material and shader variance artists put into the scenes these days (and since I've worked on tools that make it all the easier for them to do that, they do go a little wild at times).
That's the sort of situation I've always envisioned. Just a few things like terrain could be high-poly batches, but I'd expect devs to chop it up decently to avoid sending a crapload of hidden polys to the card.
For other studios, there are simply a host of problems with multiplatform, not just because you need a transparent API, but because you also want to be able to share assets, and the same batching which is amenable to tiling is never going to be friendly to PS3 (I say "never" because bad luck is the perpetual king of the hill), where just 100 redundant verts is as bad as drawing an extra character on the screen (at least, this is how it's treated).
I see. That makes a lot of sense. Still, wasn't PS2 quite amenable to small batches and frequent state changes? I always thought this was a common strength of consoles vs. PC parts. Is RSX a bit of a black sheep in this respect?


In any case, you don't have to have that small batches to make tiling feasible. It's only the really big batches that would cause a notable perf hit. If your largest batch is, say, 5% of your average scene throughput, then I can't imagine tiling having much of a perf hit, as it would be pretty bizarre for a large percentage of batches to cross tile boundaries.
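A quick Monte Carlo sketch of that intuition, with three 240-pixel-tall tiles and batches dropped at uniformly random vertical positions (entirely illustrative):

```python
import random

def expected_tiles(bbox_h, tiles=3, tile_h=240, trials=100_000):
    """Average number of tiles a batch of the given screen height touches."""
    total = 0
    for _ in range(trials):
        y0 = random.uniform(0, tiles * tile_h - bbox_h)
        first = int(y0 // tile_h)
        last = min(int((y0 + bbox_h) // tile_h), tiles - 1)
        total += last - first + 1
    return total / trials

for h in (30, 120, 480):
    print(f"batch height {h:3d}px -> {expected_tiles(h):.2f} tiles on average")
# Small batches are almost never submitted twice; screen-tall ones always are.
```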
---------------------------------------------------------------------------------------------------------------------------------
Fafalada said:
Joshua Luna said:
If you were designing in 2002/2003 for a first tapeout in November 2004, the necessity of 720p w/ MSAA as standard may not have been at all clear.
I think the more likely scenario is that they originally intended to have more eDRAM but the process/cost didn't pan out. Similar things happened to the PS2 and GC, which AFAIK both received cuts to eDRAM in the final stages of console design.
And in the case of the PS2, it probably even affected the long-term strategy of the platform: while the chip was designed with HDTV support, the memory ultimately didn't live up to it.
Squeak said:
What were those titles? I don't remember ever seeing a PS2 game without mip map inclination issues.
I know Jak2/3 did it; not sure about others (if any) right now.
Squeak said:
Fafalada said:
I know Jak2/3 did it; not sure about others (if any) right now.
OK, but that was kind of cheating, as the camera was always at a fixed angle with regard to the ground. I.e. the ground, at least, could always be predicted with almost no calculation.
---------------------------------------------------------------------------------------------------------------------------------
ShootMyMonkey said:
Mintmaster said:
Well, I'm just saying it gives them options. Should EDRAM be a big benefit for a future console (which IMO is likely), then if devs are comfortable with tiling, MS can reduce the cost and price of their next console.
I guess.
Mintmaster said:
That's the sort of situation I've always envisioned. Just a few things like terrain could be high-poly batches, but I'd expect devs to chop it up decently to avoid sending a crapload of hidden polys to the card.
Yeah, and even terrain in many cases isn't that huge because people often try to vary materials and detail textures to hide all the texture repetition... that's on top of things like chunking streamable blocks and so on, which I'd expect a lot of people to do.
Mintmaster said:
I see. That makes a lot of sense. Still, wasn't PS2 quite amenable to small batches and frequent state changes? I always thought this was a common strength of consoles vs. PC parts. Is RSX a bit of a black sheep in this respect?
State changes aren't so much of an issue (it is a console after all). There are simply things you worry about between batches when using SPU vertex processing (I'm not sure what it was like prior to our using that because I would only test with little debug levels where you have less than 50k polys in the entire scene -- that, and stuff which was never meant to be realtime in the first place). With lots of small batches, you end up with little sync drops in the command stream for all the times the SPUs are still busy... those take up a fair bit of time even when you don't actually have to wait (thousands of GPU cycles). Getting rid of those often means making batches of batches (state changes and all included) in larger display lists. Immediate mode rendering is half-dead, I guess you could say.
---------------------------------------------------------------------------------------------------------------------------------
Fafalada said:
Squeak said:
OK, but that was kind of cheating, as the camera was always at a fixed angle with regard to the ground. I.e. the ground, at least, could always be predicted with almost no calculation.
Fair enough, but the calculation overhead should be pretty minimal for the general case anyway; all you need is the ratio between triangle screen and UV area (and the second could optionally be precomputed). The real 'annoyance' is having an additional attribute per output triangle, which means fewer vertices per VU batch and more GIF traffic.
Although on the flip side, the GS should actually be thankful. :)
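For anyone who wants that ratio written out: a hypothetical helper (not anyone's shipping code) that derives a per-triangle mip level from the triangle's texel-space versus screen-space area, so the inclination correction falls out automatically:

```python
import math

def triangle_area(p0, p1, p2):
    """Unsigned 2D triangle area via the cross product."""
    return abs((p1[0] - p0[0]) * (p2[1] - p0[1])
               - (p2[0] - p0[0]) * (p1[1] - p0[1])) / 2.0

def per_triangle_lod(screen_pts, uv_pts, tex_w, tex_h):
    texel_pts = [(u * tex_w, v * tex_h) for u, v in uv_pts]
    a_screen = triangle_area(*screen_pts)
    a_texel = triangle_area(*texel_pts)
    if a_screen <= 0.0 or a_texel <= 0.0:
        return 0.0
    # One mip level per 4x change in the texel:pixel area ratio.
    return max(0.0, 0.5 * math.log2(a_texel / a_screen))

# Same UVs, but the second triangle is seen nearly edge-on: it covers less
# screen area for the same texture area, so it selects a higher (blurrier)
# mip, which is exactly the inclination effect being corrected for.
flat   = per_triangle_lod([(0, 0), (64, 0), (0, 64)],
                          [(0, 0), (0.25, 0), (0, 0.25)], 256, 256)
tilted = per_triangle_lod([(0, 0), (64, 0), (0, 8)],
                          [(0, 0), (0.25, 0), (0, 0.25)], 256, 256)
print(flat, tilted)  # 0.0 vs 1.5
```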
---------------------------------------------------------------------------------------------------------------------------------
 
(just stating my opinion :) )
I am more for just adding more RAM.
It doesn't have to be embedded. If they could just add more RAM for the equivalent cost of the embedded RAM, I think it would be more beneficial in terms of flexibility of RAM use.

Although I prefer a UMA design (less copying, AFAIK), I don't mind having two pools of memory, since they have their advantages as well (they might offer higher bandwidth).
 
First, thanks to the mods for saving some really interesting posts from the abyss!

I posted something about this, Lunchbox (which got deleted, but it wasn't worth much compared to the saved posts ;) ).

MS designed the 360 with cost reduction in mind, so I guess they spent quite some time on the pros/cons of:
a 128-bit bus + eDRAM
a 256-bit bus, no eDRAM

One can really wonder if that was the way to go, but it would prove difficult to prove either way.

Maybe the 360 could have done well with 44.8 GB/s of bandwidth, as the PS3 does well with almost the same bandwidth available; moreover, it would have allowed far better use of Xenos's special functions.
But who knows; I think (again) that MS thought about it.

But there's an idea in my mind that proves difficult for me to dismiss: was MS really aiming for HD rendering?
Especially as, without Mark Rein's efforts, MS seemed to want only 256MB of RAM.
 
I'd say that with shader calculations rapidly growing, and 4xAA mostly 'enough' for the wide market, the best way would be to remove the eDRAM altogether, dedicate those transistors to GPU logic, and give the GPU at least a 256-bit external memory bus.

But considering that wide memory buses aren't very cost-effective for console hardware (they don't get cheaper to produce with time), I guess eDRAM will still be relevant in some way or form.

Just my 2c :)
 
The memory bus may not have to be widened; memory has gotten a whole lot faster. Of course, all that bandwidth could be better used for other things, so eDRAM could still serve a purpose. Something I wonder about is whether it would be better to separate the memory controller from the GPU, or even have two or more memory controllers that can talk to the CPU directly without the GPU's involvement.
 
I'd say that with shader calculations rapidly growing, and 4xAA mostly 'enough' for the wide market, the best way would be to remove the eDRAM altogether, dedicate those transistors to GPU logic, and give the GPU at least a 256-bit external memory bus.
4xAA is enough, IMO, but we want 1080p and extra detail on top of today's games. Things like grass and leaves look great when you have fillrate to burn.

I think devs can make good use of 3-4x the fillrate of 360 for next gen, i.e. 1TB/s uncompressed. A 256-bit bus is not close to enough for that, and you want to save most of that BW for texturing and CPU usage (the latter only applying in the case of a UMA, of course).

Remember that RAM speed does not grow anywhere near as fast as computational speed, and much wider buses are tough to pull off for a system that you want to scale below $150.

If flip-chip bonding technology improves to the point that we can fit lots of pads in a small area and don't waste gobs of die space, then yes, a wide bus is definitely the right way to go. I just don't see such a technology developing and becoming suitable for high-volume, low-cost applications in the time we have left.
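To put a rough number on that fillrate target (decimal units, the same 8 bytes per sample used elsewhere in the thread; purely back-of-envelope):

```python
frame_bytes = 1920 * 1080 * 4 * 8    # one full 1080p 4xAA repaint, colour + Z
print(frame_bytes / 2 ** 20)         # ~63.3 MB per repaint
print(1e12 / (frame_bytes * 60))     # ~251 repaints per frame at 60 fps
# That kind of overdraw headroom is what layered grass and foliage eat.
```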
 
Was MS really aiming for HD rendering? Especially as, without Mark Rein's efforts, MS seemed to want only 256MB of RAM.

Very true!
It seems maybe the reason the embedded RAM is at 10MB is that MS was aiming for 256MB in the first place.

So is tiling more of an afterthought???
 
So is tiling more of an afterthought???
I doubt it. HD resolutions were obviously expected this gen, and I think Xenos has special features devoted to tiling as well, like determining screen bounds on an object in hardware and streaming that info back to the API.
 
I doubt it. HD resolutions were obviously expected this gen, and I think Xenos has special features devoted to tiling as well, like determining screen bounds on an object in hardware and streaming that info back to the API.
OK, really interesting; this takes away some of my doubts ;)

I have a question that someone could maybe answer in this thread.

GT5 Prologue runs at 1280x1080 with 2xAA and at 1280x720 with 4xAA (info coming from Quaz51 and his really interesting calculations, and originally from One's investigations; credit where credit is due).

How come devs on the PS3 manage to do this with 22.4 GB/s of bandwidth? The bandwidth requirements seem huge even with compression; without any compression, MS thought 256 GB/s was needed for 4xAA at 720p.
 
The bandwidth requirements take overdraw/particles/fillrate into account. For a racing game, I wouldn't expect that to be much of an issue.
 
GT5 Prologue runs at 1280x1080 with 2xAA and at 1280x720 with 4xAA... How come devs on the PS3 manage to do this with 22.4 GB/s of bandwidth?

I think it also has to do with the type of compression techniques that are used in the memory system. It reminds me how a GeForce4 completely crushed a Matrox Parhelia GPU even though the Parhelia had more than twice the VRAM bandwidth (~10 GB/s vs ~20 GB/s).

http://en.wikipedia.org/wiki/Matrox_Parhelia
Parhelia was also crippled by poor bandwidth saving technologies, while ATI had their 3rd generation HyperZ in Radeon 9700 and NVIDIA had their Lightning Memory Architecture 2 in GeForce 4. So, while the Parhelia had formidable memory bandwidth, much of it was wasted because the card didn't have the ability to efficiently prevent overdraw or compress z-buffer data, among other inefficiencies. Parhelia was also believed to have a crippled triangle-setup engine that starved the rest of the chip in typical 3D rendering tasks.

But the PS3 has two memory buses at ~22.4 GB/s each, although I don't know about the penalties for using both buses for graphics. Nevertheless, wasn't the eDRAM supposed to do more than just AA? :smile:
 
GT5 Prologue runs at 1280x1080 with 2xAA and at 1280x720 with 4xAA... How come devs on the PS3 manage to do this with 22.4 GB/s of bandwidth? The bandwidth requirements seem huge even with compression; without any compression, MS thought 256 GB/s was needed for 4xAA at 720p.

Excuse me, but do you know what you are talking about? What does "seem huge"?

1280 (width) x 720 (height) x 4 (samples) x 8 (bytes, ARGB + Z/stencil) x 60 (fps) ≈ 1.64 GB/s, and 22.4 / 1.64 ≈ 13.65.

If there were no bandwidth losses due to alpha-blending, Z-compare, etc. (and no gains due to color and Z compression, of course), this means that at 1280x720 with 4xAA you can paint the screen 13.65 times per frame and still fit in 60 fps. Great as the GT5 shots are, they don't look like they do complex multipass stuff.

I don't think MS have claimed that 256 GB/s is *needed* for 4xAA @ 720p; this is simply the theoretical number for their hardware.
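The same arithmetic, parameterised for both GT5 Prologue modes mentioned earlier (decimal units throughout, which is why the 720p figure lands at ~12.7 rather than 13.65; the conclusion is unchanged):

```python
BUS = 22.4e9  # bytes/s, nominal

def repaints_per_frame(w, h, samples, fps=60, bytes_per_sample=8):
    return BUS / (w * h * samples * bytes_per_sample * fps)

print(round(repaints_per_frame(1280, 720, 4), 1))   # ~12.7 at 720p 4xAA
print(round(repaints_per_frame(1280, 1080, 2), 1))  # ~16.9 at 1280x1080 2xAA
```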
 
I don't think MS have claimed that 256 GB/s is *needed* for 4xAA @ 720p; this is simply the theoretical number for their hardware.

I don't know that much; that's why I asked ;)

Anyway, I thank you and the others!
As I said in the "upscaling" thread, my goal was just to understand, not to start a flame war or question Quaz51's conclusions.
Just wondering ;)
 