Playstation 3 RSX Graphics is NV47 Based

Discussion in 'Beyond3D News' started by Dave Baumann, Mar 30, 2006.

Thread Status:
Not open for further replies.
  1. predicate

    Newcomer

    Joined:
    Jan 6, 2006
    Messages:
    128
    Likes Received:
    2
    I'm sorry, I just can't view this as anything but ball-fumbling on Sony's part. If they're releasing a year later with hardware that is not only less powerful, but potentially up to half as powerful because of a single bottleneck, then they've fumbled as far as I'm concerned. And I think the average Joe consumer will notice that, if third-generation 360 games look substantially better than anything the PS3 can put out.
     
  2. j^aws

    Veteran

    Joined:
    Jun 1, 2004
    Messages:
    1,909
    Likes Received:
    8
    The "only" part hey?

    Procedural textures says hi. Good thing you don't develop any PS3 games to make use of only 10% of FlexIO... I'm sure there are devs with better imaginations...

    Your opinion. I'll wait for final RSX details before making stealth Xenos > 7800 GTX > RSX comments...
     
  3. BenSkywalker

    Regular

    Joined:
    Feb 6, 2002
    Messages:
    823
    Likes Received:
    5
    So you are saying that the 7600GT should get free - no, not free - but actually improved performance in bandwidth-limited situations simply by increasing the bandwidth demands, because of their compression implementation? This is in stark contrast to the performance hit that ATi's parts take - http://www.beyond3d.com/reviews/ati/r580/index.php?p=10. It seems that you are saying that the PS3 should be able to enable 2x MSAA with less of a performance hit than Xenos - clearly eDRAM was a poor choice if they just needed to improve their compression techniques a bit.

    Cell can access GDDR3 also, and bandwidth utilization from GDDR3 to Cell isn't going to be too high in terms of graphics load (Cell would likely be better off storing certain types of data in GDDR3 due to reduced latency). There are a lot of differences between coding for a PC and a console - when utilizing a console you do whatever works. Polyphony streams texture data off an optical drive for use in real time in a game already released (GT4's Nürburgring track). Implying they won't store texture data in one pool of RAM because somehow it will make the other pool of RAM useless doesn't make any sense.

    GT4 looks comparable to Forza - one running on a rasterizer that is in many ways inferior to a Voodoo1, the other running a fully shader-capable GPU. The devs that think in a PC-centric way get slaughtered in the console market.

    You slice up the memory on both sides - memory that requires frequent reads/writes you store in GDDR3, and you dump the rest in XDR. We also still don't know the amount of cache on chip for RSX, which could change things considerably, and given the hypothetical layout we are discussing here, if nothing else it would imply that the PS3 would be vastly superior to Xenos at utilizing AF - to date this has been a major stumbling point for the 360 and gives the appearance of a severe weakness. The heavy angle dependency and the very poor filtering in general (it looks no better than 4x at best, with 0x being applied frequently) are certainly a major issue that shouldn't have been overlooked. Maybe this will improve with later titles, but I own all of the supposedly best-looking titles for the 360 to date and they all have the same problem (in Kameo it is likely the least bothersome, in Oblivion the most).

    So far we haven't seen anything that indicates the 360 can compete with what the 7600GT is capable of in a theoretical sense - of course this is due to devs not having had a lot of hands-on time with final hardware (in relative terms), and the same will be true of the PS3. One thing is certain: if 360 games are not looking a lot better than the PS3 launch titles when it hits, then all of the talk of Xenos's superiority will be an amusing memory.
     
  4. Dave Baumann

    Dave Baumann Gamerscore Wh...
    Moderator Legend

    Joined:
    Jan 29, 2002
    Messages:
    14,079
    Likes Received:
    648
    Location:
    O Canada!
    How do you derive anything about Xenos's fillrate with MSAA from any of those figures? All the MSAA samples are produced on the daughter die, which has enough bandwidth to support them uncompressed. Xenos produces 4 MSAA colour samples per cycle, whereas all desktop parts are limited to 2, and it maintains its double Z rate in all AA operations (i.e. 8 Z samples per cycle with 4x MSAA), which no other part does except X1600 (but, again, that's limited to two-cycle AA for 4x AA, so the maximum number of Z samples produced per cycle is 4).
     
  5. BenSkywalker

    Regular

    Joined:
    Feb 6, 2002
    Messages:
    823
    Likes Received:
    5
    When did I say anything about Xenos's fillrate? I was talking about this incredible compression technique that nVidia is supposed to have in the 7600GT that allows it to perform better with 2x MSAA enabled versus no AA in an entirely bandwidth-limited situation. If this were in fact reality, it would have been a much better utilization of resources to go with this incredible compression technique instead of eDRAM. The only way this wouldn't be obvious to you is if nVidia didn't actually have a compression technique that makes bandwidth limitations go down while the amount of bandwidth required goes up. My comment about relative performance assumes that ATi was accurate in their assertion about the overhead of AA due to tiling (5% IIRC) versus the increase in performance the provided numbers demonstrate for the 7600GT.
     
  6. Dave Baumann

    Dave Baumann Gamerscore Wh...
    Moderator Legend

    Joined:
    Jan 29, 2002
    Messages:
    14,079
    Likes Received:
    648
    Location:
    O Canada!
    Bear in mind the test is a synthetic one, and in no way represents a real world scenario. The method by which it is rendering is fairly optimal for pixel compression purposes in this case. Bear in mind as well, that this is just for 2x AA - only Xenos, at the moment, produces 4x at full speed, everything else halves the fillrate for 4x (and G7x also quarters the Z fill).
     
  7. BenSkywalker

    Regular

    Joined:
    Feb 6, 2002
    Messages:
    823
    Likes Received:
    5
    Taking all of that into consideration, the fact is that performance is increasing while the bandwidth requirements go up by at least a marginal amount. If the performance were to stay exactly the same, then it would certainly be reasonable to assume a perfect compression match could produce ideal numbers, but with it showing an increase there has to be another factor influencing the results. Clearly the 7600GT is not entirely bandwidth limited in that particular test.
     
  8. Mintmaster

    Veteran

    Joined:
    Mar 31, 2002
    Messages:
    3,897
    Likes Received:
    87
    BenSkywalker, I have no idea what you're going on about here. The 4% increase with 2xAA? Big deal. That could be a memory controller tuned for 2xAA access patterns.

    That is an ideal test. Fullscreen quads will compress just about perfectly, and give absolutely no vertex shader or setup bottleneck. There are no textures. There are no pixel shaders. This test tells us an upper bound on real-world fillrate. My point is 2.9 GPix/s is the best case for PS3. Bandwidth is the only possibility for not reaching the rate suggested by its core. I don't know which program Dave used, but you can ask him and get someone with a 7600GT to over/underclock the mem if you're still not convinced.
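
    (Editor's note: a back-of-envelope sketch, in Python, of the bandwidth ceiling being described here. The 22.4 GB/s figure is the 128-bit, 700 MHz GDDR3 bus of the 7600GT (and reportedly RSX); the per-pixel byte costs are illustrative assumptions, not measurements from the test.)

        BANDWIDTH = 22.4e9  # bytes/s: 128-bit GDDR3 at 700 MHz

        # Assumed effective memory traffic per pixel written, after compression.
        scenarios = {
            "colour write only (4 B/pixel)": 4,
            "colour + Z read/write, uncompressed (12 B/pixel)": 12,
            "colour + Z, well compressed (~8 B/pixel)": 8,
        }

        for name, bytes_per_pixel in scenarios.items():
            ceiling = BANDWIDTH / bytes_per_pixel / 1e9
            print(f"{name}: ~{ceiling:.1f} GPix/s bandwidth-limited ceiling")

    At around 8 effective bytes per pixel the ceiling comes out near 2.8 GPix/s, in the same neighbourhood as the 2.9 GPix/s measured in the synthetic test.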

    In any real-world situation, the 7600GT will show a decrease in performance with 2xAA, and especially with 4xAA.

    If the frequent reads/writes are in GDDR3 (which I agree with), then how is an appreciable amount of the bandwidth load going to be with XDR? Answer: It won't be.

    The thing you have to understand about cache is it helps you reach your theoretical rate, not surpass it, except possibly for small textures in repeat addressing mode (not exactly the hallmark of cutting edge graphics). In a GPU, cache is not a substitute for bandwidth; rather, it just helps you avoid wasting it. G71 does not waste very much as it is.

    For AF, you're absolutely right. The texturing rate of RSX is superior to Xenos, and you'd think it would be better at AF. The only thing keeping me from declaring outright victory for RSX in this matter is the ability of ATI's X1K series to keep up with NVIDIA despite a heavy texturing rate deficit, with or without AF (X1600 vs. 6600GT, X1900 vs. 7900). Mike Houston (of GPGPU fame at Stanford) has shown that ATI's architectures have always been better at absorbing extra latency for texture fetches, so this could be the reason. As for current games lacking AF, I don't think we can declare it a hardware problem yet. Console developers have never worried about AF in the past. The vast majority of console gamers don't know what AF is, and don't seem to care much. I doubt AF is much of a priority right now, but hopefully that'll change. Nonetheless, the possibility remains that you're right and this is a hardware limitation.

    I disagree. For example, I doubt two equally skilled developers working on a 7600GT and a 7800GT (which is ~40% faster in tests) would produce graphics that look "a lot better" on the 7800GT. Comparing visual quality at the same framerate is very different from comparing framerate with identical workloads. Like I said before, these are closed platforms, so we can't do apples-to-apples tests. Not even the best graphics expert in the world can simply look at the output of a game and quantify the GPU's power to within a factor of 1.5. It's all about the software, i.e. coder and artist talent. The hardware is close enough that it'll get lost in the huge variation of developer capability across ISVs.
     
  9. Mintmaster

    Veteran

    Joined:
    Mar 31, 2002
    Messages:
    3,897
    Likes Received:
    87
    Jaws, why do you enter arguments with me about 3D graphics hardware and software when you have no experience in either?

    Procedurally generated textures by CELL will be stored in memory - either XDR or GDDR3 - if they are to be used by RSX. First of all, RSX can't make a memory request for a block of texels and have Cell then reinterpret that request in a way to generate texels on the fly to fulfill that request. Secondly, even if this was possible, computationally CELL cannot procedurally generate texels at a rate that would come even close to the speed at which you can simply load texels from XDR. Finally, procedural textures which save bandwidth (usually they're used to avoid repetition) have never been even within an order of magnitude as fast as loading normal textures, so you will degrade performance no matter what.
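
    (Editor's note: a purely illustrative comparison of the two rates being argued about here. The SPE cost per procedural texel and the other figures below are assumptions chosen to show the order-of-magnitude gap being described, not measurements.)

        XDR_BW = 25.6e9          # bytes/s, nominal XDR bandwidth (assumed)
        SPE_COUNT = 7            # SPEs assumed available to a game
        SPE_CLOCK = 3.2e9        # Hz
        CYCLES_PER_TEXEL = 100   # assumed cost of a non-trivial procedural texel

        load_rate = XDR_BW / 4                                # 4-byte RGBA8 texels read from memory
        gen_rate = SPE_COUNT * SPE_CLOCK / CYCLES_PER_TEXEL   # texels generated procedurally

        print(f"loaded from XDR:   ~{load_rate / 1e9:.1f} Gtexels/s")
        print(f"generated on SPEs: ~{gen_rate / 1e9:.2f} Gtexels/s")

    Under these assumptions the gap is well over an order of magnitude, which is the point being made; compressed texture formats would widen it further.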

    What pisses me off most is the condescending way in which you said this. You're clearly attempting to point out that I am incapable of considering the most obvious answers. I did think of procedural textures, but immediately dismissed them because they will never come close to saving you rendering time. You don't know anything about procedural texture generation, so WTF? If you were curious, a simple question like "what about procedural textures?" would suffice.
    Therein lies the problem. Final RSX details won't do jack for you in making that judgement. Regardless, you and millions of others think paper numbers are all you need.
     
    Acert93, Jawed and nelg like this.
  10. predicate

    Newcomer

    Joined:
    Jan 6, 2006
    Messages:
    128
    Likes Received:
    2
    I am the first to admit that I know precisely diddly squat about working on graphics, so my question is:
    Where do the volumetric ray-traced clouds of Warhawk fit into the picture?
    Ultimately, isn't it the games that will give the best indication?
     
    #90 predicate, Apr 10, 2006
    Last edited by a moderator: Apr 10, 2006
  11. Mintmaster

    Veteran

    Joined:
    Mar 31, 2002
    Messages:
    3,897
    Likes Received:
    87
    First of all, we're only talking about the GPU. Secondly, we already knew that it's "potentially" even slower than half. The 7600GT, bandwidth restricted, can only get 1.5 GPix/s fillrate with 4xAA (colour+Z). Throw in alpha blending and it'll probably halve again (no hard data, unfortunately), down to under 1 GPix/s. Xenos gets 4 GPix/s regardless of the situation.
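
    (Editor's note: a sketch of where those two figures come from. The commonly cited Xenos specs put 8 ROPs at 500 MHz on the eDRAM daughter die, so AA samples never touch main memory; a 7600GT/RSX-class part has to fit colour and Z traffic into 22.4 GB/s, and the effective bytes per pixel at 4xAA below is an assumption for illustration.)

        # Xenos: ROP throughput is set by the daughter die, not main memory.
        xenos_fill = 8 * 500e6              # 8 ROPs * 500 MHz = 4.0 GPix/s, with or without 4xAA

        # G7x-class part: fill is capped by GDDR3 bandwidth once AA inflates traffic.
        BANDWIDTH = 22.4e9                  # bytes/s
        BYTES_PER_PIXEL_4XAA = 15           # assumed effective colour+Z traffic after compression
        g7x_fill = BANDWIDTH / BYTES_PER_PIXEL_4XAA

        print(f"Xenos:          {xenos_fill / 1e9:.1f} GPix/s")
        print(f"G7x-class 4xAA: ~{g7x_fill / 1e9:.1f} GPix/s")
        # Alpha blending adds a colour read per sample, roughly doubling colour
        # traffic, which is why that figure would roughly halve again.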

    This is just one aspect of rendering, though. As BenSkywalker pointed out, texturing speed is superior on RSX (when bandwidth isn't an issue, which can be the case in a shader heavy with compressed textures). Overall, RSX won't be only half as powerful. RSX just about doubles the 7600GT everywhere, except bandwidth stays the same. That doesn't mean performance stays the same; it just means it won't double.

    Just look at the 7600GT vs. the 6800GS. You'd expect the 7600GT to be 31% faster than the 6800GS by looking at the core's capability (it has 2.6x the MADD rate too), but the reduced bandwidth keeps it only ~10-15% faster in the B3D review. Nonetheless, it didn't drop in performance overall despite 30% less bandwidth.

    Does anyone here have a G71? We should be able to clock the memory at 350MHz (700MHz DDR) with RivaTuner and the core at 550MHz. 47% less bandwidth, 22% higher core compared to the 7900GT (which is ~20% faster than the 7800GT). Then we can see the impact on performance. My guess is it'll slow down a bit, maybe 10%.
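
    (Editor's note: the percentages in the last two paragraphs follow from the nominal retail clocks, sketched below; the retail specs are assumed from memory rather than re-checked.)

        # 7600GT vs 6800GS: same 12-pipe layout, different clocks and bus widths.
        core_7600gt, core_6800gs = 560e6, 425e6
        bw_7600gt = 128 / 8 * 2 * 700e6     # 128-bit GDDR3 at 700 MHz -> 22.4 GB/s
        bw_6800gs = 256 / 8 * 2 * 500e6     # 256-bit GDDR3 at 500 MHz -> 32.0 GB/s
        print(f"core advantage:    {core_7600gt / core_6800gs - 1:.0%}")   # ~31-32%
        print(f"bandwidth deficit: {1 - bw_7600gt / bw_6800gs:.0%}")       # ~30%

        # Proposed G71 experiment vs a stock 7900GT (450 MHz core, 660 MHz memory, 256-bit).
        bw_stock = 256 / 8 * 2 * 660e6      # ~42.2 GB/s
        bw_test  = 256 / 8 * 2 * 350e6      # ~22.4 GB/s
        print(f"memory cut for the test: {1 - bw_test / bw_stock:.0%}")    # ~47%
        print(f"core boost for the test: {550e6 / 450e6 - 1:.0%}")         # ~22%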
     
  12. Mintmaster

    Veteran

    Joined:
    Mar 31, 2002
    Messages:
    3,897
    Likes Received:
    87
    This is very hard to assess. First of all, we don't know how it's being ray traced. We don't know what sort of simulation is being done. Most importantly, we don't know how it's being composited into the final image.

    I can make a guess that CELL is outputting the ray-tracing into a buffer in XDR that can be treated as a texture, and while it's doing this RSX is rendering the rest of the scene with CELL also running the game code. For the sake of making bandwidth as large as possible, assume the result is a 4-channel FP32 texture that fills the screen (14.7MB) and is generated at 60fps. When compositing this texture on the rest of the scene, it can hopefully be transferred at a rate of 20GB/s through FlexIO, but it still only adds up to 0.9GB/s overall.
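
    (Editor's note: the 14.7 MB and 0.9 GB/s figures follow directly from the stated assumptions - a full 1280x720 screen, four FP32 channels, 60 fps - as this quick check shows.)

        WIDTH, HEIGHT = 1280, 720     # assumed 720p target
        BYTES_PER_PIXEL = 4 * 4       # four channels of FP32
        FPS = 60

        buffer_size = WIDTH * HEIGHT * BYTES_PER_PIXEL   # ~14.7 MB
        sustained = buffer_size * FPS                    # ~0.9 GB/s

        print(f"buffer: {buffer_size / 1e6:.1f} MB, sustained transfer: {sustained / 1e9:.2f} GB/s")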

    Games are what matters in the end, yes. That's why a less-than-optimal solution doesn't make much difference in the console world. However, games won't give you a very good indication of a system's power, because software makes a bigger difference than hardware, as I mentioned earlier.
     
  13. BenSkywalker

    Regular

    Joined:
    Feb 6, 2002
    Messages:
    823
    Likes Received:
    5
    With a better than 50% increase in effectiveness? There has to be another factor involved. The same factors that make it an ideal candidate for optimal compression with AA apply to non-AA performance also.

    Unmodified texture data can be streamed via XDR.

    It helps mask latency. With a memory controller tuned for accessing XDR for texture data, given the pattern of memory access, a reasonable cache will allow them to offload basic texture data to XDR and free up considerable amounts of bandwidth for GDDR3.

    I would strongly disagree with this. The eDRAM of Xenos is in essence a cache. Obviously we are not going to see anything near that size on RSX, but it certainly can help you exceed the theoretical bandwidth peak of your local RAM if there is enough of it.

    Nor will they ever need to know what it is. When we start seeing some cross-platform titles and gamers are watching them run side by side at their local EB, if the PS3 has 16x AF running, what they are going to see is how much crisper the PS3 looks. They'll likely attribute it to the PS3's capability of running at higher resolutions (even though both will almost certainly be running on 720p displays), but they will see the sizeable benefit of it whether they know what it is or not.

    But give the dev an extra year working with the 7800GT and you will see a very large rift in performance. That is what we are supposedly talking about if Xenos ends up having an edge over RSX.
     
  14. j^aws

    Veteran

    Joined:
    Jun 1, 2004
    Messages:
    1,909
    Likes Received:
    8
    I'm flattered that you single me out. You might want to take that chip off your shoulder before it crushes you and stops you from typing. Claiming that you've worked for ATI doesn't give you infallible authority or a free pass to post without contention. And you certainly don't need that experience to smell BS...

    And they will NEED to pass through FlexIO at some stage for RSX to use. So vertex data is NOT the "only part"...

    Your opinion and you don't know for a *fact* what performance hit will be incurred. The point being that CELL will excel at procedural texture generation and in order for RSX to consume this, FlexIO bandwidth *WILL* be used and so vertex data is not the "only part" like you claimed.

    Play down FlexIO, but there *WILL* be better devs than you, with more than a PC-centric view, to take advantage of that b/w, especially in a closed console environment...

    Nah, expecting 10% FlexIO usage, yet expecting RSX to be b/w limited is laughable...

    Oh and I was pointing out that NV, Sony, Rambus engineers > you, and PS3 devs > you, because of this...

    Maybe if you had put a qualifier like "IMO" on it, then perhaps I would've asked the question differently, instead of you claiming authority from working for ATI and then subsequently saying,

    "I'm not spreading BS here."... or "Texturing needs direct access to memory, so the link speed is meaningless."... or " The only part of the 3D software pipeline that Cell can generate dynamic data for and feed directly to RSX via FlexIO is vertex data,"...

    ... and the final, stealthy and apologetic, Xenos>7800GTX>RSX... etc, etc...

    Perhaps not everyone is as presumptuous as you and needs more than a few high-level numbers for RSX, especially when, in comparison, we know the world about Xenos...
     
  15. Titanio

    Legend

    Joined:
    Dec 1, 2004
    Messages:
    5,670
    Likes Received:
    51
    Err..my point was that RSX does not have a direct twin in the PC space. Not to mention the context it is going into is very different. That doesn't reflect on the workloads or end results, but how one might go about them. Console and PC programming are quite different, in that in the console space you have the opportunity to mould your software around the hardware to extract the best possible performance. That does often lead to different approaches than one might take with PC systems.

    It simply does NOT make sense for there to be a 35GB/s pipe between Cell and RSX if it is not going to be used for generous memory access on either chip's part into the other's pool. Why else do you think it's there? It's pretty clear that that is the intention. Furthermore, I think it speaks volumes as to how that bandwidth is split in each direction, with most of it going from Cell to RSX.

    It simply doesn't make sense to complain about GDDR3 bandwidth and talk about the NEEDS of a game while completely ignoring what the other bandwidth can do for you. If a game really needs more bandwidth, I don't think it'll turn up its nose at XDR access. And if you don't think it's useful for anything beyond vertex access, evidently the guys at Sony and nVidia disagree with you, as do a number of people actually working with the system whom I've seen you enter into arguments with. You have little idea of the characteristics and behaviour of RSX's access to XDR, except that it'll be higher latency. But you don't really know what Sony and nVidia have done to accommodate that, and judging by hints from various people, a lot of the changes in RSX versus a regular G70 relate to that, and to how latency may be hidden.

    I agree the RSX's buffers will reside in GDDR3, or at least primary ones, but anything else is probably fair game IMO, depending on your requirements.

    Oh, and a side note on Cell involvement - the things it can do, and is doing, in games like Warhawk: while the actual bandwidth required to send Cell's buffer over to RSX, or vice versa, for final composition is small, you're neglecting the larger amount of main memory bandwidth such usage can save and buy you for other things.
     
  16. Mintmaster

    Veteran

    Joined:
    Mar 31, 2002
    Messages:
    3,897
    Likes Received:
    87
    Well, believe it or not, you've just proved yourself wrong with that statement, and pretty much made the rest of your post irrelevant. I said "generate dynamically and feed directly through FlexIO". Generating a procedural texture and storing it in memory doesn't save bandwidth, and is no different a case for consideration than a normal texture in memory. Try again, Jaws.

    For vertex data, you can amplify data. A height field, for example, only needs one FP32 value (or even FP16) per vertex. An SPE can read these 4 bytes and generate position (X,Y,Z), texture coordinates (U,V), normals (using neighboring heights), tangents, binormals, etc., to make a 50+ byte vertex all of which is needed by RSX. If you want to talk about an additional 35 GB/s being available to RSX, then this is the type of thing you need. However, as mentioned previously, vertex bandwidth is the smallest piece of the pie.
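
    (Editor's note: a minimal sketch, in Python, of the kind of amplification described above. The attribute layout and the 4-byte input come from the post; the tiny height field and the helper are purely illustrative.)

        import struct

        SIDE = 3
        heights = [0.0, 0.1, 0.2,
                   0.1, 0.3, 0.2,
                   0.0, 0.1, 0.0]    # one FP32 height per vertex: 4 bytes in

        def expand(x, z):
            """Turn one 4-byte height sample into a full ~56-byte vertex."""
            h = heights[z * SIDE + x]
            # Forward differences against neighbouring heights for the surface normal.
            hx = heights[z * SIDE + min(x + 1, SIDE - 1)] - h
            hz = heights[min(z + 1, SIDE - 1) * SIDE + x] - h
            position = (float(x), h, float(z))                 # 12 bytes
            uv       = (x / (SIDE - 1), z / (SIDE - 1))        # 8 bytes
            normal   = (-hx, 1.0, -hz)                         # 12 bytes (unnormalised)
            tangent  = (1.0, hx, 0.0)                          # 12 bytes
            binormal = (0.0, hz, 1.0)                          # 12 bytes
            return struct.pack("14f", *position, *uv, *normal, *tangent, *binormal)

        vertex = expand(1, 1)
        print(f"{len(vertex)} bytes out per 4 bytes in -> {len(vertex) // 4}x amplification")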

    The point I made earlier was that over 200MB of space, other than the framebuffer, is in the GDDR3. No dev in their right mind would waste that, so it'll be filled with textures and maybe vertex data. So if the color buffer is in GDDR3, the Z-buffer is in GDDR3, and most textures are in GDDR3, how much of the rendering bandwidth can be over FlexIO? Especially if devs are saying there's a penalty for XDR texturing?
     
    #96 Mintmaster, Apr 10, 2006
    Last edited by a moderator: Apr 10, 2006
  17. Mintmaster

    Veteran

    Joined:
    Mar 31, 2002
    Messages:
    3,897
    Likes Received:
    87
    So, do I think Sony wasn't thinking in making FlexIO so fast? Not at all.

    FlexIO bandwidth is 35 GB/s because it handles bursts well. Average usage will be waaay lower than that, but when it is used, it'll be as fast as possible. Clipped and culled primitives will give a burst of vertex traffic. Framebuffer copies will be done quickly when needed. A jpg can be quickly decompressed by CELL and copied to GDDR3 for use as a texture in the upcoming scene. For example, if in each frame you transfer 50MB with such tasks, it's 3GB/s average. On FlexIO these tasks take ~10% of your available rendering time. On a 10GB/s connection, it takes 33% of your available rendering time. That's why FlexIO is there, Titanio.
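
    (Editor's note: a sketch of the frame-budget arithmetic behind those percentages, assuming a 60 fps target; it lands in the same ballpark as the figures quoted above.)

        FRAME_TIME = 1 / 60              # seconds per frame at 60 fps
        DATA_PER_FRAME = 50e6            # the 50 MB of burst transfers from the example

        for name, link_bw in [("FlexIO, 35 GB/s", 35e9), ("10 GB/s link", 10e9)]:
            share = (DATA_PER_FRAME / link_bw) / FRAME_TIME
            print(f"{name}: ~{share:.0%} of the frame spent transferring")

        print(f"average demand: {DATA_PER_FRAME * 60 / 1e9:.0f} GB/s")   # 3 GB/s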

    Before you think all texturing will be done over XDR, read the last paragraph of my previous post. Now, consider a game at 60fps. To saturate FlexIO you need to transfer 333MB to RSX and 250MB from RSX every frame, and it must be a continuous, steady load. If the primary buffers are in GDDR3, just what can you possibly do to saturate FlexIO even half the time? Even a FP16 HDR framebuffer copy for CELL post-processing is only 7.3MB.
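
    (Editor's note: the per-frame figures above imply a FlexIO split of roughly 20 GB/s from Cell to RSX and 15 GB/s back; dividing those rates across 60 frames, and sizing a 720p FP16 buffer, reproduces the numbers quoted.)

        FPS = 60
        TO_RSX_BW, FROM_RSX_BW = 20e9, 15e9   # bytes/s, split implied by the 333 MB / 250 MB figures

        print(f"to RSX per frame:   {TO_RSX_BW / FPS / 1e6:.0f} MB")    # ~333 MB
        print(f"from RSX per frame: {FROM_RSX_BW / FPS / 1e6:.0f} MB")  # ~250 MB

        # A 1280x720 FP16 HDR framebuffer (8 bytes per pixel) for Cell post-processing:
        fb = 1280 * 720 * 8
        print(f"FP16 framebuffer copy: {fb / 1e6:.2f} MB per frame")    # ~7.3-7.4 MB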

    Regarding the bandwidth NEEDS of a game, you have colour, Z, texturing, and vertex data. That's it. The first two form the majority (especially when the third is optimized with compression) and sit in GDDR3. Most of the third will be in GDDR3 (see prev. post). The fourth is the smallest. While it would be great not to ignore XDR, what choice is there under these circumstances?
     
    #97 Mintmaster, Apr 10, 2006
    Last edited by a moderator: Apr 10, 2006
  18. Titanio

    Legend

    Joined:
    Dec 1, 2004
    Messages:
    5,670
    Likes Received:
    51
    Memory footprint doesn't necessarily tell us anything about its bandwidth requirements. You may have a much smaller footprint of textures or vertices in XDR requiring proportionally more bandwidth than the larger pool in GDDR3.

    There's a penalty relative to the registers available with a smaller pool of threads on RSX, but that doesn't tell us anything about how the typical number of registers per thread at a higher level of threading on RSX would compare to any of the chips you so often like to compare it to. We've had heavy suggestions of various increases in other local memories that could be linked to support for higher threading, such that it would not surprise me if register counts also received a boost.

    Besides, this is no doubt completely under the developer's control. If they need that bandwidth, they can work out how much to trade off in terms of available registers per thread, before they start to need something else more than that bandwidth.

    I'm not sure about the behaviour of FlexIO with bursts versus continuous access, but you could apply the same logic to any memory bus. Does the fact that 50MB of data will take a much smaller proportion of frame-time coming over a 22.4GB/s bus for example, compared to 100MB, mean that I'll never use the latter? Or would never want to? Or would never exert the bus beyond single-digit and low double-digit percentages of use? It might take a smaller proportion of my frametime to transfer less data (obviously), but that doesn't mean it would be a dealbreaker to transfer more.

    Maybe you should read mine, period, before coming to such conclusions. I never suggested all texturing would or should be done from XDR. I think, and said, some of it.
     
    #98 Titanio, Apr 10, 2006
    Last edited by a moderator: Apr 10, 2006
  19. j^aws

    Veteran

    Joined:
    Jun 1, 2004
    Messages:
    1,909
    Likes Received:
    8
    Errr... Nope. SPUs work with local stores, so in order for RSX to consume data generated there, it HAS to pass through FlexIO... therefore FlexIO is used for more than "only" vertex data...

    See above.

    I'm not even disagreeing with the above. You made an absolute statement by saying "only", and went on to say that you expect 10% FlexIO usage. Feel free to disagree, if you think FlexIO is "useless". We'll agree to disagree...
     
  20. Mintmaster

    Veteran

    Joined:
    Mar 31, 2002
    Messages:
    3,897
    Likes Received:
    87
    What? You're trying to say RSX can texture directly from a local store? Find me a source for this, because I don't believe it for a second.

    Also, I never said FlexIO was only for vertex data. This is what I said:
    "The only part of the 3D software pipeline that Cell can generate dynamic data for and feed directly to RSX via FlexIO is vertex data"
    I said it in reply to this:
    "It has a 22.4GB/s local bus, and 35GB/s to the CPU and XDR"
    Basically, I'm looking for a way in which 35GB/s becomes meaningful as opposed to leftover XDR bandwidth. To exceed the latter, you need to generate stuff on Cell and feed it to RSX. Vertex traffic is the only candidate here. Procedural textures don't fit the bill, for all the reasons mentioned previously.

    A) If you agree with me, then 10% is very reasonable.
    B) Go read my recent posts. I just explained why it isn't useless.
     