PlayStation 4 (codename Orbis) technical hardware investigation (news and rumours)

Discussion in 'Console Technology' started by Love_In_Rio, Jan 28, 2013.

Thread Status:
Not open for further replies.
  1. DRS

    DRS
    Newcomer

    Joined:
    May 22, 2009
    Messages:
    135
    Likes Received:
    0
    Not sure where you're heading here; I'm missing the punchline. BTW, the depth-check bandwidth I was referring to could be reduced by using hierarchical Z instead of per-pixel Z checks, which only requires 26 GB/s if I'm not mistaken. So it doesn't need to be that bad. For alpha blending I don't know of any alternative, so that will still cut the fill rate to half its maximum.
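    A rough way to see why blending hurts: a blended pixel has to read the existing color and write the new one, so it moves twice the bytes of an opaque color write. The sketch below is a hypothetical toy model, assuming the widely reported 32 ROPs at 800 MHz and 176 GB/s, 4-byte color, and no depth traffic at all. The exact "half" figure depends on which reads and writes you count, so this model does not reproduce that number on its own.

```python
# Hedged sketch: peak ROP fill rate vs. what DRAM bandwidth can sustain.
# ASSUMED figures (widely reported, not from this post): 32 ROPs at 800 MHz, 176 GB/s.
# Units: ROPs x MHz = Mpixels/s; bandwidth in MB/s; traffic in bytes per pixel.

def peak_fill_mpix(rops: int, clock_mhz: int) -> int:
    """One pixel per ROP per clock."""
    return rops * clock_mhz


def bandwidth_fill_mpix(bandwidth_mb_s: int, bytes_per_pixel: int) -> int:
    """Pixel rate the memory bus can feed for a given per-pixel traffic."""
    return bandwidth_mb_s // bytes_per_pixel


def effective_fill_mpix(rops: int, clock_mhz: int,
                        bandwidth_mb_s: int, bytes_per_pixel: int) -> int:
    """Whichever limit is lower wins."""
    return min(peak_fill_mpix(rops, clock_mhz),
               bandwidth_fill_mpix(bandwidth_mb_s, bytes_per_pixel))


COLOR_BYTES = 4  # RGBA8; depth traffic deliberately left out of this toy model

opaque = effective_fill_mpix(32, 800, 176_000, COLOR_BYTES)       # color write only
blended = effective_fill_mpix(32, 800, 176_000, 2 * COLOR_BYTES)  # color read + write
print(opaque, blended)  # 25600 22000
```

    Adding depth reads/writes (or removing them with hierarchical Z) just changes bytes_per_pixel in this model.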
     
  2. KSterson

    Newcomer

    Joined:
    Mar 5, 2013
    Messages:
    29
    Likes Received:
    0
    Location:
    Paris
    I wasn't saying you were wrong, just that, using your own calculation, Liverpool has a bandwidth advantage even compared to an HD 7870 (176 GB/s vs 143 GB/s). And even if you can't fully exploit the raster pipeline's capabilities, it's a better situation. DF (Digital Foundry) is wrong in stating that 16 ROPs are enough for 1080p; every 7790 benchmark proves otherwise.

    (It does make a lot less sense without half of the post)
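    One way to read "bandwidth advantage" is as the bytes of memory traffic available per pixel when the ROPs run flat out. A minimal sketch, assuming 32 ROPs at 800 MHz for Liverpool and 32 ROPs at 1000 MHz for the HD 7870, with the bandwidth figures quoted in the post:

```python
from fractions import Fraction


def bytes_per_pixel_at_peak(bandwidth_gb_s: int, rops: int, clock_mhz: int) -> Fraction:
    """Memory bytes available per pixel when every ROP writes one pixel per clock."""
    pixels_per_s = rops * clock_mhz * 10**6
    return Fraction(bandwidth_gb_s * 10**9, pixels_per_s)


liverpool = bytes_per_pixel_at_peak(176, 32, 800)  # ASSUMED: 32 ROPs @ 800 MHz
hd7870 = bytes_per_pixel_at_peak(143, 32, 1000)    # 32 ROPs @ 1000 MHz, 143 GB/s as quoted
print(float(liverpool), float(hd7870))  # 6.875 4.46875
```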
     
  3. Rangers

    Legend

    Joined:
    Aug 4, 2006
    Messages:
    12,791
    Likes Received:
    1,596
    How do we know it's the ROPs, and not the bandwidth or the 7790's 1 GB framebuffer (vs 2 GB on the 7850), that cripples the 7790 relative to the 7850?

    A 660 Ti with 24 ROPs outperforms a GTX 580 with 48 ROPs at 1080p. Where's the sweet spot? Dunno offhand. We'd need to benchmark similar cards with 16 and 32 ROPs, which I can't see happening (they would be from different vendors, at least).
     
  4. Averagejoe

    Regular

    Joined:
    Jan 20, 2013
    Messages:
    328
    Likes Received:
    0

    Sony applied for a content-switching patent. I don't know if that is relevant to my theory, or whether the patent is for some other purpose and could or couldn't be used for that.
     
  5. Averagejoe

    Regular

    Joined:
    Jan 20, 2013
    Messages:
    328
    Likes Received:
    0

    If the 1 GB framebuffer is the problem, it should be easy to spot: in most games there will be no difference, except in those that require a lot of RAM, if I'm not mistaken.

    http://www.anandtech.com/show/6359/the-nvidia-geforce-gtx-650-ti-review/6

    Look at the 650 Ti review: the 1 GB version runs all games like the 2 GB version, but in Skyrim the 2 GB version almost doubles the 1 GB version's performance.

    The same goes for the 7850 1 GB vs 2 GB models.
     
  6. KSterson

    Newcomer

    Joined:
    Mar 5, 2013
    Messages:
    29
    Likes Received:
    0
    Location:
    Paris
    That's a different architecture (and GPU generation), and the 660 Ti has better pixel (higher clock) and texel capabilities.

    As for the memory limitation, we can check this month when the upcoming 2 GB versions are out, but I have tried overclocking the memory up to 1650 MHz on a 7790, and you don't get any boost at all in non-VRAM-limited games (less than 1024 MB used in Afterburner).
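    For reference, GDDR5 bandwidth follows directly from bus width and memory clock, since the effective data rate is four times the quoted clock. A sketch assuming the HD 7790's reference 128-bit bus at 1500 MHz; the 1650 MHz overclock then works out to roughly the "+10 GB/s" mentioned later in the thread:

```python
def gddr5_bandwidth_mb_s(bus_width_bits: int, mem_clock_mhz: int) -> int:
    """GDDR5's effective data rate is 4x the quoted memory clock."""
    bytes_per_transfer = bus_width_bits // 8
    return bytes_per_transfer * mem_clock_mhz * 4


stock = gddr5_bandwidth_mb_s(128, 1500)  # ASSUMED HD 7790 reference: 128-bit, 1500 MHz
oc = gddr5_bandwidth_mb_s(128, 1650)     # the overclock described above
print(stock, oc, oc - stock)  # 96000 105600 9600
```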

    Core clock overclocking does work. I have little doubt that the GPU is crippled by its pixel fill rate: it still manages only 17.2 Gpixels/s even with a 1075 MHz core OC, while a reference 7850 has 27.5 Gpixels/s at 860 MHz. That's a really big difference.
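    Those fill-rate figures are simply ROPs × core clock (one pixel per ROP per clock). A minimal sketch reproducing them, using the 16 and 32 ROP counts discussed in the thread:

```python
def pixel_fill_gpix(rops: int, core_mhz: int) -> float:
    """Peak pixel fill rate in Gpixels/s: one pixel per ROP per core clock."""
    return rops * core_mhz / 1000


hd7790_oc = pixel_fill_gpix(16, 1075)  # 16 ROPs, overclocked core
hd7850_ref = pixel_fill_gpix(32, 860)  # 32 ROPs, reference clock
print(hd7790_oc, hd7850_ref)  # 17.2 27.52
```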
     
  7. Rangers

    Legend

    Joined:
    Aug 4, 2006
    Messages:
    12,791
    Likes Received:
    1,596
    I think that's sort of the point, lol. Of course there's no difference in non-VRAM-limited games!

    I did look into the 7850 1 GB vs 2 GB a little bit, and there doesn't seem to be a difference in most titles, but in some there is.

    It's impossible to narrow down completely, but for some reason I suspect memory bandwidth is the biggest issue. It should be mostly down to that or the ROPs (unknown which), with some effect from the 1 GB framebuffer.

    It also comes into play, IMO, that PC is an arena where hardware bends around software and not vice versa, though there are limits to that.

    If you suddenly doubled the shaders on a 7850 and left everything else the same, so it's now 3.6 TF, PC titles would not see much speedup, only in the cases that were shader-limited to some extent. But if you did the same in a console, programmers would use up all those new shaders, bending the software to the hardware.
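    The "only shader-limited cases speed up" argument is essentially Amdahl's law. A simplified sketch; the shader-bound fractions are made-up examples, and real frames overlap work in ways this model ignores:

```python
from fractions import Fraction


def speedup(shader_bound_fraction, shader_speedup=2):
    """Amdahl's law: only the shader-bound share of frame time gets faster."""
    p = Fraction(shader_bound_fraction)
    return 1 / ((1 - p) + p / shader_speedup)


# Hypothetical shader-bound shares of frame time for a PC title.
for p in ("0.1", "0.3", "0.8"):
    print(p, float(speedup(p)))
```

    A console title written to use the extra shaders effectively pushes p toward 1, which is where doubling the shaders approaches a 2x gain.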
     
  8. KSterson

    Newcomer

    Joined:
    Mar 5, 2013
    Messages:
    29
    Likes Received:
    0
    Location:
    Paris
    What I meant was that VRAM-limited or not makes no difference here, because in this particular case (no VRAM limitation) it's not about the amount of memory, which just leaves ROPs or bandwidth (again, a 0.0% increase with +10 GB/s on memory alone, no core clock change; but maybe at +20/+30 GB/s it makes a sudden magical jump...). Core clock has a direct influence on fill rate, as you're certainly aware, and the performance boost is there: slight, but it's there.

    Of course it's a PC environment; maybe the conclusions are not as clear-cut, or can't be fully applied to a console environment, but in the end I think it gives a good idea of the obvious bottlenecks.
     
  9. Rangers

    Legend

    Joined:
    Aug 4, 2006
    Messages:
    12,791
    Likes Received:
    1,596
    But the bottleneck behind the speedup with core clock could be the shaders, or anything else too. Hard to isolate, as you say.

    I guess you could start to isolate memory bandwidth if you could get those benches, BW OC vs core OC, on titles you know aren't VRAM-limited. Starting to get pretty complex, though.

    It seems that without fail, when I see an OC test comparing memory OC vs core OC, the core OC is more effective. Dunno why that is, as it doesn't seem logical. The 650 Ti BOOST, for example, seems to get most of its performance bump from its extra bandwidth.
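    One way to compare such benches is a scaling efficiency: the fraction of a clock increase that shows up as frame rate. In the sketch below the memory numbers follow the thread (1500 to 1650 MHz, 0.0% gain); the 1000 MHz reference core clock is assumed, and the fps values are hypothetical placeholders:

```python
def scaling_efficiency(base_clock: float, oc_clock: float,
                       base_fps: float, oc_fps: float) -> float:
    """Fraction of a clock increase that shows up as extra frame rate.

    Near 1.0: performance tracks this clock (a likely bottleneck).
    Near 0.0: the clock increase bought nothing.
    """
    clock_gain = oc_clock / base_clock - 1
    perf_gain = oc_fps / base_fps - 1
    return perf_gain / clock_gain


# Memory OC from the thread: +10% clock, 0.0% frame-rate change (60 fps is a placeholder).
mem = scaling_efficiency(1500, 1650, 60.0, 60.0)
# Core OC 1000 -> 1075 MHz; the 60 -> 62 fps numbers are HYPOTHETICAL.
core = scaling_efficiency(1000, 1075, 60.0, 62.0)
print(mem, round(core, 3))
```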
     
  10. KSterson

    Newcomer

    Joined:
    Mar 5, 2013
    Messages:
    29
    Likes Received:
    0
    Location:
    Paris
    To me, shader performance is mostly related to the normalized GFLOPS value (true enough if you compare within the same unified architecture), along with practical scheduling efficiency.

    The 7790 has 2 fewer CUs than a 7850, but the theoretical peak is about the same at reference clocks: 1792 GFLOPS vs 1761 (the 7790 is even higher, and also higher in Mtriangles/s and slightly in Texels/s).
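    Those peak numbers come from CUs × 64 ALU lanes × 2 FLOPs per fused multiply-add × clock. A quick sketch, assuming 14 CUs at 1000 MHz for the 7790 and 16 CUs at 860 MHz for the 7850:

```python
def gcn_peak_gflops(compute_units: int, clock_mhz: int) -> float:
    """Peak single-precision GFLOPS for a GCN GPU."""
    lanes_per_cu = 64    # each GCN CU has 64 ALU lanes
    flops_per_lane = 2   # a fused multiply-add counts as 2 FLOPs
    return compute_units * lanes_per_cu * flops_per_lane * clock_mhz / 1000


hd7790 = gcn_peak_gflops(14, 1000)
hd7850 = gcn_peak_gflops(16, 860)
print(hd7790, hd7850)  # 1792.0 1761.28
```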
     
  11. DrJay24

    Veteran

    Joined:
    May 16, 2008
    Messages:
    3,894
    Likes Received:
    634
    Location:
    Internet
    I would guess AMD knows the sweet spot. We can try to reason it out, but ultimately it is their GCN tech and they make 16 and 32 ROP parts. It is safe to assume they know the bottlenecks and would advise Sony accordingly. I'd love to reverse engineer the technical reasons, but like you said you would need the same part with 16 and 32 and they don't make it. They have a line in the sand, low end has 16 and high end 32, other things also change when moving up and down the performance scale.
     
  12. patsu

    Legend

    Joined:
    Jun 25, 2005
    Messages:
    27,709
    Likes Received:
    145
    Jeff Rigby PM'ed me on GAF. I will just relay a short message.

    He thinks the Flash RAM will come in handy in low-power mode. In this case, the ARM CPU will remain awake and do its work without spinning up the HDD. Apparently there will be strict EU guidelines on low-power mode.

    I haven't done any reading in this area.
     
  13. ERP

    ERP
    Veteran

    Joined:
    Feb 11, 2002
    Messages:
    3,669
    Likes Received:
    49
    Location:
    Redmond, WA
    I think the EU guideline is 0.5 watts, but it allows preprogrammed wake-ups and higher power draw when doing something like background downloading.
     
  14. MrFox

    MrFox Deludedly Fantastic
    Legend

    Joined:
    Jan 7, 2012
    Messages:
    6,488
    Likes Received:
    5,996
    I'm still not sure I understand it completely, but it looks like the directive can almost be ignored for both the PS4 and the 720. As long as the console has a reason to be up, it's not considered standby. That even seems to cover keeping the GDDR5 refreshed.
    http://ec.europa.eu/energy/efficien...tion/guidelines_for_smes_1275_2008_okt_09.pdf
     
  15. Wynix

    Veteran

    Joined:
    Feb 23, 2013
    Messages:
    1,052
    Likes Received:
    57
    I think he is referring to the new law for gaming consoles, which is something like 45 watts peak when the consoles are not being used for gaming; I believe the 0.5-watt limit you mentioned applies only to standby mode.

    /speculation/ It's also quite possibly part of the reason Nintendo went low power with the Wii U: they jumped the gun in conforming to the new law before it was finalised, then Sony and Microsoft got the law changed to apply only when the consoles are not gaming, allowing them to keep their ~200-watt consoles.
     
  16. tunafish

    Regular

    Joined:
    Aug 19, 2011
    Messages:
    627
    Likes Received:
    414
    Actually, since modern ROPs are typically optimized for 64-bit color and more than one sample, they should be capable of 2-4 times more throughput than that.

    However, they also have the GPU L2 caches between them and the memory interface. These often provide more than 2x amplification of the memory bandwidth, depending a lot on the load, of course.
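    A simplified way to think about that amplification: if a fraction h of accesses hits in cache and only misses go to DRAM, clients see up to DRAM bandwidth / (1 − h), so a 50% hit rate gives 2x. This is a toy model (it ignores write-backs and assumes the cache itself is never the limit); 176 GB/s is just an example input:

```python
def effective_bandwidth_gb_s(dram_gb_s: float, hit_rate: float) -> float:
    """Toy model: only cache misses reach DRAM, so traffic is amplified by 1 / (1 - h)."""
    if not 0.0 <= hit_rate < 1.0:
        raise ValueError("hit_rate must be in [0, 1)")
    return dram_gb_s / (1.0 - hit_rate)


for h in (0.0, 0.5, 0.75):
    print(h, effective_bandwidth_gb_s(176.0, h))  # 176.0, 352.0, 704.0
```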
     
  17. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
    Is that taking into account the color and depth caches per render backend? The GCN diagrams don't clearly show the L2 being an intermediate step for ROP output.

    The whitepaper indicates that, at least for MSAA, writes go to the framebuffer via the memory controllers.

    http://www.amd.com/us/Documents/GCN_Architecture_whitepaper.pdf
     
  18. DRS

    DRS
    Newcomer

    Joined:
    May 22, 2009
    Messages:
    135
    Likes Received:
    0
    I understand; the memory isn't fast enough for that amount of throughput, though. This thread did make me realize that a fine-granularity ROP setup isn't that bad at all; blending or multisampling isn't needed all the time, so keeping resources idle in those cases is a waste of silicon.

    Isn't that amplification only applicable to local (and more or less discrete) reads/writes? I mean, these cache sizes are far from the framebuffer sizes, right? So I can see it helping with DOF, but not when multipassing the way DOOM 3 did (blending each light into the framebuffer separately).

    I didn't read through the entire paper, but aren't Z$ and C$ cache blocks?
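    To put the size gap in numbers: a single 1080p RGBA8 color target is about 8 MB before MSAA. The cache size below is a hypothetical placeholder for scale, not a spec of Liverpool or of the GCN color/depth caches:

```python
def render_target_bytes(width: int, height: int,
                        bytes_per_pixel: int, samples: int = 1) -> int:
    """Storage for one render target (color or depth), ignoring compression."""
    return width * height * bytes_per_pixel * samples


color_1080p = render_target_bytes(1920, 1080, 4)            # RGBA8, no MSAA
color_1080p_4xmsaa = render_target_bytes(1920, 1080, 4, 4)  # RGBA8, 4x MSAA

HYPOTHETICAL_CACHE_BYTES = 512 * 1024  # placeholder size, for scale only

print(color_1080p, color_1080p_4xmsaa)         # 8294400 33177600
print(color_1080p / HYPOTHETICAL_CACHE_BYTES)  # 15.8203125
```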
     
    #1278 DRS, Apr 7, 2013
    Last edited by a moderator: Apr 7, 2013
  19. ultragpu

    Banned

    Joined:
    Apr 21, 2004
    Messages:
    6,242
    Likes Received:
    2,306
    Location:
    Australia
    Michiel Van Der Leeuw, technical director at Guerrilla Games, gives his own take on the system's efficiency.
    http://www.videogamer.com/ps4/killz...ce_bottlenecks_claims_killzone_developer.html
    So the 8 GB of GDDR5 RAM is not excessive after all. I wonder what crazy things they'll do with it that 4 GB couldn't.
     
  20. Cjail

    Cjail Fool
    Veteran

    Joined:
    Feb 1, 2013
    Messages:
    2,027
    Likes Received:
    211
    GG, Santa Monica, Naughty Dog and Sony's studios in general amazed me with just 512 MB.
    With 8GB they will scare me.

    I wonder if, with an "easier to use/code" console, more devs will be able to achieve the quality that, on PS3, was only achieved by Sony's studios.
     
    #1280 Cjail, Apr 8, 2013
    Last edited by a moderator: Apr 8, 2013

  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.