Can the RSX framebuffer be split between GDDR3 and XDR RAM?

Fafalada said:
They had 360 kits before PS3 ones (obviously they no longer do now).
Besides, guys like nAo, Deano and a few others here make it a point to keep informed about what technologies are out there, even if we don't work with them directly.
Sticking your head in the sand is never beneficial in this industry (though I've seen people still do it regardless).

Fafalada, you quoted the wrong guy. spdistro made the original statement. :(

_phil_ said:
it's hm.. pretty easy to get full X360 documentation...You don't have to be an official x360 dev to get some curiosity satisfied...

Granted, I am interested in XB360 hands-on experiences because it's difficult to know some of the implications without actually doing it.

PS3 is one of the few systems where the developer has full control over the entire run-time. It's relatively easy (for me) to get an intuitive feel for Cell's real-world performance without coding on it. Too bad RSX remains a mystery, since access is limited.

XB360 is more interesting to profile due to its UMA architecture and unified shaders. I have no doubt that Xenos is fast. I'm keen to find out how all the various (good and not-so-great) pieces fit "end-to-end" in practice to maximize its efficiency.
 
scificube said:
My question is this: as I take it, ALU-op-heavy shaders are something that works well on Xenos; is it your opinion that doing the same on RSX is not, and if so, why do you feel that way? Or could you elaborate on why it may not be such a bad thing on RSX but is significantly better on Xenos as far as final performance goes?

Although I worked a lot with the NV4X and G7X architectures during the development of BW2, my experience with the RSX has been very limited so far, so I'm not in a position to give you a good answer on this or make good comparisons between the two GPUs when it comes to ALU-heavy shaders.

nAo or Faf can answer this much better than me.

What I can say is that my current post-processing pipeline is an evolution of the BW2 HDR pipeline, and it tends to run noticeably faster on 360 than on a high-end PC with a G71. But on 360 I use an fp10 framebuffer and filterable fp16 textures as input, while on PC it's full fp16 throughout, thus consuming more bandwidth. We obviously can't directly compare a PC with a G71 to a PS3 with the RSX.

I can also add that when it comes to post-processing and HDR rendering, the R500 offers very neat support, with a 32-bit HDR format (fp10) handled natively by the GPU and a filterable fp16 format that can be used as input. Fp10 doesn't require any shader instructions to encode/decode the HDR color; it just works. It's all handled by the GPU, and during the resolve operation you can apply a constant exponent to further scale all your colors for free. Very handy.
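For readers unfamiliar with fp10: it's usually described as a 10-10-10-2 layout where each 10-bit color channel is a tiny unsigned float, often written 7e3 (7-bit mantissa, 3-bit exponent). A toy Python round-trip, assuming the commonly quoted details (bias of 3, implicit leading one, denormals at exponent zero, maximum value 31.875), which aren't spelled out in this thread:

```python
import math

BIAS = 3  # assumed exponent bias for the 7e3 layout

def encode_7e3(v):
    """Encode a non-negative float into a 10-bit 7e3 channel (truncating)."""
    v = max(0.0, min(v, 31.875))           # clamp to the representable range
    if v < 2.0 ** (1 - BIAS):              # denormal range: no implicit one
        return int(v * 128.0 * 2.0 ** (BIAS - 1))  # exponent field = 0
    e = int(math.floor(math.log2(v))) + BIAS
    m = int((v / 2.0 ** (e - BIAS) - 1.0) * 128.0)
    return (e << 7) | m

def decode_7e3(bits):
    """Decode a 10-bit 7e3 channel back to a float."""
    e, m = bits >> 7, bits & 0x7F
    if e == 0:                             # denormal
        return m / 128.0 * 2.0 ** (1 - BIAS)
    return (1.0 + m / 128.0) * 2.0 ** (e - BIAS)
```

Note how the precision is magnitude-dependent: the step size doubles with every power of two, which is why smooth, slowly varying gradients (like a sky) are where the 7-bit mantissa shows first.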

Fran/Fable2
 
Fran said:
Although I worked a lot with the NV4X and G7X architectures during the development of BW2, my experience with the RSX has been very limited so far, so I'm not in a position to give you a good answer on this or make good comparisons between the two GPUs when it comes to ALU-heavy shaders.

nAo or Faf can answer this much better than me.

What I can say is that my current post-processing pipeline is an evolution of the BW2 HDR pipeline, and it tends to run noticeably faster on 360 than on a high-end PC with a G71. But on 360 I use an fp10 framebuffer and filterable fp16 textures as input, while on PC it's full fp16 throughout, thus consuming more bandwidth. We obviously can't directly compare a PC with a G71 to a PS3 with the RSX.

I can also add that when it comes to post-processing and HDR rendering, the R500 offers very neat support, with a 32-bit HDR format (fp10) handled natively by the GPU and a filterable fp16 format that can be used as input. Fp10 doesn't require any shader instructions to encode/decode the HDR color; it just works. It's all handled by the GPU, and during the resolve operation you can apply a constant exponent to further scale all your colors for free. Very handy.

Fran/Fable2

Thank you for your thoughts :)

The FP10 format does indeed seem a very nice feature to have on Xenos: as you describe, it's straightforward to use and less expensive than FP16. I do of course also appreciate nAo's contribution to HDR on the PS3 with the NAO32 format. True, it has a cost, but what you get in exchange for a few shader ops is well worth it to me, and, well... to Ninja Theory too, it seems!

Would you care to comment on the merit of using NAO32 on Xenos? Would it not be of benefit to use a less costly but effective format for, say, opaque geometry, and then a more costly format such as FP16 for blending, or would you prefer to use FP10 all the way through?

Have you encountered any IQ problems with FP10 given it doesn't have the same range as say FP16?

Would you care to comment on whether a better color space in general is needed for HDR, one that gives preference to luminosity, for instance?

It's more than understandable if you elect not to answer any of these questions, but it's hard to resist asking given how rarely X360 devs speak up around Beyond3D, if they're here at all.
 
scificube said:
Would you care to comment on the merit of using NAO32 on Xenos? Would it not be of benefit to use a less costly but effective format for, say, opaque geometry, and then a more costly format such as FP16 for blending, or would you prefer to use FP10 all the way through?

Having native support for fp10, I can't see a compelling reason to use anything different at the moment. I tend to favor simplicity as a design goal unless very specific and clear needs ask for something else. At the moment fp10 is perfectly fine for what I need and very, very simple to use. Rendering to fp10 is full speed with no blending or with additive blending, and half speed with other blending modes. In the latter situation (high-precision blending) the color from the framebuffer is automatically expanded by the GPU to fp16, blending is computed at fp16, and the result is written as fp10, effectively doing something similar to what you suggest.
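A toy model of that half-speed blend path, with a simple uniform quantizer standing in for fp10 storage (the real format is a small float with magnitude-dependent precision, so this only illustrates where the single requantization happens):

```python
def to_fp10(v, step=0.25 / 128.0):
    """Stand-in for fp10 storage: snap to a fixed step. The real format's
    precision varies with magnitude; a uniform step keeps the sketch simple."""
    return round(v / step) * step

def blend_high_precision(framebuffer_value, src, src_alpha):
    """The stored fp10 color is expanded losslessly for the blend, the blend
    itself runs at fp16-like (here: full) precision, and only the final
    write requantizes back to fp10."""
    dst = framebuffer_value                              # fp10 -> fp16: exact
    blended = src * src_alpha + dst * (1.0 - src_alpha)  # high-precision blend
    return to_fp10(blended)                              # one quantization
```

The point is that each blended write costs exactly one quantization to fp10, rather than accumulating error through the blend arithmetic itself.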

Have you encountered any IQ problems with FP10 given it doesn't have the same range as say FP16?

Yes, the gradient coming out of the atmospheric scattering solution in the sky renderer tends to suffer from the limited precision, causing some banding.

Would you care to comment on whether a better color space in general is needed for HDR, one that gives preference to luminosity, for instance?

I think it depends heavily on the scene you are trying to render. After all, HDR rendering is an artistic process and requires a lot of input from the artists: I don't think anything like the "best format" or the "best tonemapper" exists. At the moment our artists don't need extra precision for luminance. Maybe the sky would benefit from it, though.
 
Fran said:
the latter situation (high-precision blending) the color from the framebuffer is automatically expanded by the GPU to fp16, blending is computed at fp16, and the result is written as fp10, effectively doing something similar to what you suggest.

Actually, it's not my suggestion: nAo and DeanoC here at Beyond3D spoke to what I was saying in a previous discussion about NAO32 vs. straight FP16 rendering. Just to be clear, these ideas originate with them, not me.

I understand why anyone would like simplicity in implementation if they could have it, so I get where you're coming from. There is of course a difference between "need to" and "can do".

What you speak of in the quoted section above seems a bit different from what I was describing. You describe something the GPU does automatically, whereas what I was suggesting was something more explicit: for instance, using an FP16 buffer for alpha-blended transparencies and the like, and then a less costly format (int8, rgb8, fp10, whatever) for what is completely opaque in the scene. This way you get the results you want and gain back some performance, since FP16 isn't used for everything even when it's not needed.

What you describe seemingly doesn't save on how much work you do in FP16, but rather on how large your buffer is after you've finished blending... a good thing considering the eDRAM is a finite resource.

If I misunderstand you, please correct me :)

If not, there's no need for you to opine any further, as you've already stated that you value simplicity here and that things are fine enough for what you're doing right now.

Oh... and thank you again for stopping in to chat with us :)
 
Fran said:
Yes, the gradient coming out of the atmospheric scattering solution in the sky renderer tends to suffer from the limited precision, causing some banding.
Would it be reasonable to add some sort of noise or dithering to rearrange the banding enough to make it imperceptible?
 
Alstrong said:
hm.. ok, maybe this was answered in the jungle of technobabble I don't understand above:

Q: What (reasonable) hardware features/extensions to eDRAM would developers/you want for assisting post-processing?
Don't worry, it wasn't answered.

PS2 was one example where the eDRAM could be used like addressable memory. You could texture from it, and it had very high bandwidth, especially for the time. The main reason is that it sat on the same die as the GPU rather than on a separate chip, so there were far fewer design restrictions.

With Xenos, there's only a 32GB/s connection between the chips. To keep everything simple, the types of data transfer between them are very limited. Pixels are streamed in 2x2 packets, and the data transfer is unidirectional. Copying the framebuffer out to GDDR3 is likewise a simple unidirectional transfer, achieved with a single command from Xenos, for which latency is completely irrelevant. To texture from eDRAM while rendering you'd need random access with decent latency, the ability to interleave read and write commands, more complicated interface logic, a connection to the memory controller which would also need to handle higher throughput, etc.

As I said before, if the texture cache is working properly then bandwidth might not even be a problem for post-processing, especially with the long shaders nAo and Fran are talking about. I can understand why the XB360 didn't go the PS2 route.
 
Mintmaster said:
Don't worry, it wasn't answered.

...

Thank you. :)

hm... you make it sound like the addressable eDRAM is something we won't see again. Or... how about having two different eDRAM pools? :???: But I guess that defeats the purpose (post-processing)...


(I'd spread the green, but I need a wider target range apparently. ;) )
 
Fran, thanks for sharing ! :D

If you were to write a game to showcase next-gen gaming (ignoring budget and time constraints), what would you do ? How would it change your current pipeline ?
 
scificube said:
What you speak of in the quoted section above seems a bit different from what I was describing. You describe something the GPU does automatically, whereas what I was suggesting was something more explicit: for instance, using an FP16 buffer for alpha-blended transparencies and the like, and then a less costly format (int8, rgb8, fp10, whatever) for what is completely opaque in the scene. This way you get the results you want and gain back some performance, since FP16 isn't used for everything even when it's not needed.

What you describe seemingly doesn't save on how much work you do in FP16, but rather on how large your buffer is after you've finished blending... a good thing considering the eDRAM is a finite resource.

If I misunderstand you, please correct me :)

You are correct. The difference is that what you describe is handled automatically by the GPU when an fp10 render target and high-precision blending are used. Another difference is that the result of the blending operation is stored as fp10, losing some precision. It's the drawback for the simplicity of having it handled automatically.

Sky dithering: yes, it's a solution to minimize the problem.
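A toy sketch of that dither-before-quantize idea (the level count and white noise here are arbitrary choices for illustration; a real renderer would more likely use an ordered or blue-noise pattern):

```python
import random

LEVELS = 32  # toy bit depth; banding is obvious at this few levels

def quantize(v, levels=LEVELS):
    """Plain quantization of v in [0, 1]: the source of visible banding."""
    return round(v * (levels - 1)) / (levels - 1)

def dithered_quantize(v, levels=LEVELS, rng=random.random):
    """Add up to +/- half a quantization step of noise before quantizing,
    trading smooth bands for fine-grained noise the eye averages out."""
    step = 1.0 / (levels - 1)
    noisy = min(1.0, max(0.0, v + (rng() - 0.5) * step))
    return quantize(noisy, levels)
```

Each dithered pixel is still within one quantization step of the true value, but neighboring pixels no longer snap to the same band, so the average over an area tracks the original gradient.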

patsu said:
Fran, thanks for sharing ! :D

If you were to write a game to showcase next-gen gaming (ignoring budget and time constraints), what would you do ? How would it change your current pipeline ?

If I were to write a game ignoring budget and time constraints, development would last forever :)
So we can't ignore them, but it's not that bad after all: to showcase next-gen, I would write exactly the tech the artists ask for, as simple as possible for them to use effectively. After all, they are the bright and creative people who make things shine.
 
Dave Baumann said:
I know Deano's early HS development and discussion were based on the Radeon 9800 and X800, but I don't recall anything centering around final dev kits. Do you know if they ever got those?
We only got to play with Alphas (a Mac with a standard Radeon 9800 in it); however, we did have docs.
However, it's worth noting that it's someone like Marco's job to be able to extrapolate from docs; in almost every case things only get worse relative to the theoretical docs/hardware specs, almost never better. So being able to take a doc and work out how to solve a problem is what we do...

For the record both chips are very powerful, just in different ways. Which makes cross-platform stuff hard but exclusives on both platforms easy.
The entire pipeline would be radically different between the two platforms because the bottleneck is likely to be in a different place.
 
DeanoC said:
For the record both chips are very powerful, just in different ways. Which makes cross-platform stuff hard but exclusives on both platforms easy.
I guess from NT's POV, becoming a PS3 exclusive was a huge boon? Because you can focus on the one platform, not splitting resources and not struggling to make the most of two or three.
 
DeanoC said:
We only got to play with Alphas (a Mac with a standard Radeon 9800 in it); however, we did have docs.
However, it's worth noting that it's someone like Marco's job to be able to extrapolate from docs; in almost every case things only get worse relative to the theoretical docs/hardware specs, almost never better. So being able to take a doc and work out how to solve a problem is what we do...
Absolutely. And I'm sure that's exactly what Microsoft's developers are doing as well. But nothing can replace practical experience, can it?
 
Fran said:
If I were to write a game ignoring budget and time constraints, development would last forever :)
So we can't ignore them, but it's not that bad after all: to showcase next-gen, I would write exactly the tech the artists ask for, as simple as possible for them to use effectively. After all, they are the bright and creative people who make things shine.

Fair enough. I was hoping for more forward-looking, exploratory, dream-up-the-possibilities type answers ;-)

DeanoC said:
However, it's worth noting that it's someone like Marco's job to be able to extrapolate from docs; in almost every case things only get worse relative to the theoretical docs/hardware specs, almost never better. So being able to take a doc and work out how to solve a problem is what we do...

For the record both chips are very powerful, just in different ways. Which makes cross-platform stuff hard but exclusives on both platforms easy.
The entire pipeline would be radically different between the two platforms because the bottleneck is likely to be in a different place.

Definitely. It's just that an XB360 developer would be staring at Xenos' inherent potential (or problems) long enough to want to do something about it... kinda like how Marco invented NAO32? :)

That said, I did not find anything unfair or inaccurate about Marco's "framebuffer effect" comment in the original interview. It's an excellent article.
 
Dave Baumann said:
Absolutely. And I'm sure that's exactly what Microsoft's developers are doing as well. But nothing can replace practical experience, can it?

Indeed so. We programmers are historically very bad at guessing bottlenecks in what we write without actually profiling it. Hence the mantra: "make it work, make it nice, make it fast (after profiling)". I think practical experience is priceless.
 
Dave Baumann said:
Absolutely. And I'm sure that's exactly what Microsoft's developers are doing as well. But nothing can replace practical experience, can it?
I should hope so :) Me I'd be doing some crazy shit with memexport if I had a chance...


But given an inherent design choice that's explained in a doc, you would simply choose not to do some things on one platform. I wouldn't think of faking memexport on RSX, and I wouldn't use an algorithm that uses both buses as R/W on R500.
 
Great interview; kudos to all involved. :D

And, en passant thread browsing comment, welcome to Fran --don't be a stranger, y'hear?

I understand it's a console site where the interview appeared, but one small knuckle-rap to Marco for not addressing part of a question:

PSINext: The advantages of reduced space and preserved quality seem like they would have merit in a number of environments. Do you think there may be a place for NAO32 on the desktop, or even on Microsoft or Nintendo's offerings?

Marco: Dunno about Nintendo's offering, but it might have merit on Microsoft's console if developers wanted something that takes the same storage space as an FP10 render target, but with a much higher level of quality. NAO32 on Xenos would cost developers shading power relative to FP10, however, and they would lose the ability to use the eDRAM for blending as well. So at this time, I believe something like NAO32 makes more sense on RSX than on Xenos.

The question actually focuses on the desktop, with an "or even" bringing competing consoles into the picture almost as an afterthought... but the answer focuses on that afterthought and doesn't address the main question about the desktop at all.

Yes, this would be desktop bigot (that would be me) concern. :p
 
geo said:
Great interview; kudos to all involved. :D

And, en passant thread browsing comment, welcome to Fran --don't be a stranger, y'hear?

I understand it's a console site where the interview appeared, but one small knuckle-rap to Marco for not addressing part of a question:

The question actually focuses on the desktop, with an "or even" bringing competing consoles into the picture almost as an afterthought... but the answer focuses on that afterthought and doesn't address the main question about the desktop at all.

Yes, this would be desktop bigot (that would be me) concern. :p

I don't know what Nao/Marco thinks about it, but having tried a similar colour space (I'm not 100% sure what Marco does to CIE before it becomes NAO32, but we just used the ordinary CIE colour space and tweaked the values to make the most of the 8-bit range of the framebuffer) I'll give my tuppence worth...

I think it could offer advantages in any environment where FP16 isn't close to being "free" compared to 8-bit. It offers FP16-like quality at the bandwidth and storage cost of int8, with the ability to leverage any 8-bit-specific features (such as AA) of a particular chipset. What you lose is blending... which may or may not be a serious problem.

So it's a trade-off: using it introduces a bunch of new problems to address, and whether it's an outright win is application-dependent. However, in many cases much of the geometry is opaque, and blending can be (and usually is) done in a later pass, enabling the best of both worlds.

Were I writing a PC application right now, I'd probably go with a solution like this.
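For the curious, a generic LogLuv-style pack in the spirit of what's described above (after Greg Ward's LogLuv; every constant here is an illustrative choice, not NAO32's actual mapping, which wasn't published in this thread):

```python
import math

def rgb_to_logluv32(r, g, b):
    """Pack linear HDR RGB into 32 bits: 16-bit log luminance plus 8+8-bit
    CIE (u', v') chromaticity. Scales and ranges are illustrative choices."""
    # Linear RGB -> CIE XYZ (Rec.709 primaries, D65 white point)
    X = 0.4124 * r + 0.3576 * g + 0.1805 * b
    Y = 0.2126 * r + 0.7152 * g + 0.0722 * b
    Z = 0.0193 * r + 0.1192 * g + 0.9505 * b
    denom = X + 15.0 * Y + 3.0 * Z
    if denom == 0.0:
        return 0                                 # pure black
    u = 4.0 * X / denom                          # u' chromaticity
    v = 9.0 * Y / denom                          # v' chromaticity
    # Log-encode luminance: 16 bits spanning roughly [2^-16, 2^16)
    Le = int(max(0.0, min(65535.0, (math.log2(Y) + 16.0) * 2048.0)))
    ue = min(255, max(0, round(u * 410.0)))      # 410 ~ 255 / max u'
    ve = min(255, max(0, round(v * 410.0)))
    return (Le << 16) | (ue << 8) | ve

def logluv32_to_luminance(bits):
    """Recover luminance from the packed value (chromaticity decode omitted)."""
    return 2.0 ** ((bits >> 16) / 2048.0 - 16.0)
```

The luminance channel gets 16 bits of log-spaced precision, which is why a format like this holds smooth HDR gradients so much better than fp10's 7-bit mantissas, at the cost of encode/decode shader math and the loss of hardware blending.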
 