Could this gen have started any earlier?

Discussion in 'Console Technology' started by Shifty Geezer, Nov 20, 2015.

  1. Rangers

    Legend

    Joined:
    Aug 4, 2006
    Messages:
    12,579
    Likes Received:
    1,394

    I currently have a hand-me-down HD 6970 in my PC... IIRC it's like 1600 SPs. A lot of brute force but not nearly as smart as GCN. I suspect on the whole it's significantly "more powerful" than the GPU in the One. A quick Google search shows the HD 6970 was released in late 2010.

    Card is gigantic, looks like it could bludgeon a man to death.

    I'm still using my ancient Q6600 and just 4GB of RAM as well. My PC might be a good test case against a possible 2011 console, heh. I'm interested to see somebody above running Witcher 3 on a Q6600 and a 6950. I did not think my CPU was enough to really run modern games, although I have little interest in PC gaming anyway.
     
  2. Davros

    Legend

    Joined:
    Jun 7, 2004
    Messages:
    15,855
    Likes Received:
    3,059
    did you see my post ?
     
  3. liolio

    liolio Aquoiboniste
    Legend

    Joined:
    Jun 28, 2005
    Messages:
    5,724
    Likes Received:
    194
    Location:
    Stateless
    I suspect the card still performs quite well. There are really few reference points for the TeraScale 3 GPU and no direct comparison; the closest in SP count is the HD 7870 XT based on Tahiti LE (a review where both cards are tested). It bests both the XB1 and the PS4.
    I've always thought that the VLIW4 architecture did not have the life it deserved. GCN brought greater compute performance at the cost of quite a few transistors; in a constrained environment such as a console, VLIW4 would have been great.
    Honestly 4GB is fine for a PC if you are not big on multitasking; for a console 8GB is overkill. Your PC, while burning power like a mofo, is outperforming both consoles on both CPU and GPU, and RAM won't be a bottleneck either.
    As for using it as a reference for a 2010 or 2011 "time paradox" type of console, I would say it is not a fit. It is a bit like people expecting 4GB this gen. Your CPU TDP is 105 Watts if I checked right, the GPU is 250 Watts, and then you have to add the other parts.
     
    #23 liolio, Nov 25, 2015
    Last edited: Nov 25, 2015
  4. Rangers

    Legend

    Joined:
    Aug 4, 2006
    Messages:
    12,579
    Likes Received:
    1,394
    I was referring to your post!
     
  5. Davros

    Legend

    Joined:
    Jun 7, 2004
    Messages:
    15,855
    Likes Received:
    3,059
  6. HMBR

    Regular

    Joined:
    Mar 24, 2009
    Messages:
    417
    Likes Received:
    105
    4890 was still on 55nm and from the early GDDR5 days; a 40nm 800SPs card from 2011 would be A LOT more power efficient. if you look at the 5770 from 2009/2010 (850MHz, 800SPs, 40nm, 128bit memory) it was already pretty OK in terms of power draw, and although the mobile versions were running lower clocks (700MHz) it was good enough for laptops.
    so I think it's more than realistic for a console in 2011, perhaps they could have gone even further, near 1000SPs on 40nm

    and it's possible that they would have used VLIW4, but until mid 2012 all APUs were VLIW5... GCN I don't know, the 7970 was available (limited) for purchase in January 2012, probably too soon.

    I don't think comparing a PC with "2011" specs would do much good in current games. there is a serious lack of optimization for old Radeons going on (driver support ended officially like 2 days ago, but realistically it's been almost nothing for the past few years, with bugs accumulating and not a lot of optimization going on for non-GCN cards) and limited ram capacity, while in a console developers would still be heavily optimizing for the architecture they were locked to. I think comparing the 6870 with the 7870 in pre-2013 games is a decent way of seeing the difference:
    http://anandtech.com/bench/product/780?vs=857

    7870 was a 210mm2 GPU, the 6870 a 250mm2 GPU, both with 256bit GDDR5 but higher clocks (and TDP!) for the 7870

    or you can compare something closer to the actual PS4 perf (the 7850, I think) against a card with the number of SPs I mentioned earlier, but also less than half the memory bandwidth:
    http://anandtech.com/bench/product/538?vs=549

    looking at the benchmarks, I would think a VLIW5 console with not too far off memory bandwidth (and amount of SPs/ROPs/TMUs) could be doing OK with a Witcher 3 or Fallout 4, with some quality reduction but with the same gameplay and mostly the same visual style. still, some situations could have less than half the performance.

    the Wii U, from a post earlier, is just something else, with a rumored 320 or fewer SPs and 64bit DDR3. I think even in 2009 or 2010 a PS4 would have easily had over 2x the Wii U GPU specs. like the original Wii specs, I don't think it's of any relevance for the other consoles' specs.
     
  7. Allandor

    Regular Newcomer

    Joined:
    Oct 6, 2013
    Messages:
    375
    Likes Received:
    178
    You're forgetting the cooling system and the price of the console.
    At that time, the 4890 alone was no cheaper than a whole console is right now. Also the cooling drove people mad, and the card's power consumption alone was higher than that of the current consoles.

    as I wrote before, yes there were many things possible, to get almost the same performance but at what cost
    - noise
    - power consumption
    - price

    the Wii U, yeah, it is not that performant, but it is a really small device. I wish it were a little bit bigger and had an almost silent cooling solution.
     
    liolio likes this.
  8. liolio

    liolio Aquoiboniste
    Legend

    Joined:
    Jun 28, 2005
    Messages:
    5,724
    Likes Received:
    194
    Location:
    Stateless
    Indeed that topic is a remake of the predicting last gen thread, with the same high expectations that did not pan out: even the PS4 is at the bottom end, maybe a little below that, of what the enthusiasts here (so most people) were expecting. Power was a big issue back then, and so was memory chip size.

    Something that is sinking in as time passes is that whereas UMA designs (or close to it, à la 360) are attractive, they set too many constraints on manufacturers. I do get the cost-saving aspect but I believe the benefits are not worth it. Looking at both the XB1 and the PS4, the pursuit of the APU approach came with crazy complications in the case of Microsoft and a significant cost overhead for Sony (even if it is transparent to customers). Once the "ideology" is left aside, what I'm left thinking from a comparison with the PC world is that NUMA is the way (one chip or not) to reconcile the need for a significant amount of RAM with the need for some fast RAM.

    I would add that so far no manufacturer has agreed to the compromises that come with a PC-like NUMA system: really slow main RAM (which does not impact CPU performance that much) and fast VRAM. One could say Sony got close with the PS3, but it put too much focus on sustaining the SPUs' max performance (as well as including too many of them) and ended up with a tiny pool of fast memory on the CPU side.

    If I were to do a more cutting-edge design than the one I laid out a page before, I would go with something like this:
    Same Xenon II connected to 2GB of DDR3 RAM (single channel).
    A custom TeraScale 3 GPU, 8CU, 16 ROPs, conservative clock speed, connected through a 128-bit bus to 1GB of GDDR5.
    4x BRD player, HDD, ships fall 2010 for $399, software compatible on a per-title basis.
    It would definitely have nicely bridged the PS360 and what would have come next.
     
    milk likes this.
  9. HMBR

    Regular

    Joined:
    Mar 24, 2009
    Messages:
    417
    Likes Received:
    105
    as I said, I'm not considering an 800SPs VLIW5 55nm GPU like the 4870 or 4890 realistic for 2008 consoles (when the 4800s were high end), but for 2011 I think it would be realistic in terms of cost, power and so on

    the 5870 mobility was a lower clocked 5770 (700MHz 800SPs, 128bit GDDR5) with around 50W TDP in 2010
    the regular desktop 5770 was under $150 by mid 2010, with 1GB GDDR5, so a similar design with a few more SPs, sounds possible

    by 2011 their Llano APU had 400SPs VLIW5, like their 2014 Kaveri GCN APU had 512SPs, both stuck with slow ram (128bit DDR3)
     
  10. ToTTenTranz

    Legend Veteran Subscriber

    Joined:
    Jul 7, 2008
    Messages:
    11,034
    Likes Received:
    5,576
    Well in 2011 the safe manufacturing node was at 40nm. Given how the XBone and PS4 are ~350mm^2 APUs and AMD was already making APUs at the time, perhaps a good estimation would be to see what APU we could get with 350mm^2 at TSMC's 40nm, using 2011's technology.

    350mm^2 would be enough for a 255mm^2 GPU like Barts (DX11 featureset, 4*64bit GDDR5 + 14 SIMDs + 56 TMUs + 32 ROPs) plus 8 Bobcat cores, which at 9mm^2 each (with 512KB L2 cache) would come to 72mm^2, plus a southbridge (Brazos' southbridge FCH was 28mm^2) and whatever glue logic is needed.
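
    The area budget in this post can be tallied with a few lines of Python. This is only a sketch of the post's own arithmetic: the constant and function names are illustrative, the figures are the estimates quoted above, and glue logic is left out because no number is given for it.

```python
# Die-area budget for the hypothetical 2011 APU described in the post.
# All figures are the post's estimates in mm^2; the names are illustrative.
BARTS_GPU_MM2 = 255      # Barts-class GPU
BOBCAT_CORE_MM2 = 9      # one Bobcat core with 512KB L2
BOBCAT_CORES = 8
SOUTHBRIDGE_MM2 = 28     # Brazos FCH southbridge
TARGET_MM2 = 350         # approximate XBone/PS4 APU size


def apu_area_mm2(gpu, core, cores, southbridge):
    """Sum the listed blocks; glue logic is not included."""
    return gpu + core * cores + southbridge


total = apu_area_mm2(BARTS_GPU_MM2, BOBCAT_CORE_MM2, BOBCAT_CORES, SOUTHBRIDGE_MM2)
print(f"CPU cores: {BOBCAT_CORE_MM2 * BOBCAT_CORES} mm^2")
print(f"Total before glue logic: {total} mm^2 (target ~{TARGET_MM2} mm^2)")
```

    With these inputs the listed blocks already add up to roughly the 350mm^2 target before any glue logic, so the estimate leaves essentially no slack.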


    If AMD wanted to go with VLIW4 (better for compute), that would be 14 SIMDs * 64 shaders = 896 shader processors. Even at a modest 600MHz for the GPU, that would result in >1 TFLOP of theoretical compute performance (>4x Xenos in X360).
    Also, at the time we had 4Gbit GDDR5 chips working at 1GHz, so 4GB on a 256bit bus at 4000MT/s -> 128GB/s.
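
    The two headline numbers above follow from standard peak-rate formulas, sketched here in Python. The function names are illustrative, and two things are assumptions rather than figures from the post: each shader is counted as one multiply-add (2 FLOPs) per clock, and Xenos is taken as 240 ALU lanes at 500MHz.

```python
def peak_gflops(shaders, clock_mhz, flops_per_shader_per_clock=2):
    """Theoretical single-precision peak, counting one multiply-add as 2 FLOPs."""
    return shaders * flops_per_shader_per_clock * clock_mhz / 1000.0


def bandwidth_gb_s(bus_width_bits, transfer_rate_mt_s):
    """Peak memory bandwidth in GB/s: bytes per transfer times transfers per second."""
    return bus_width_bits / 8 * transfer_rate_mt_s / 1000.0


shaders = 14 * 64                  # 14 SIMDs * 64 shaders, as in the post
gpu = peak_gflops(shaders, 600)    # proposed 600MHz GPU clock
xenos = peak_gflops(240, 500)      # assumption: Xenos as 240 ALU lanes at 500MHz
mem = bandwidth_gb_s(256, 4000)    # 256bit bus at 4000MT/s

print(f"{shaders} SPs at 600MHz: {gpu:.1f} GFLOPS")
print(f"Ratio to assumed Xenos peak: {gpu / xenos:.2f}x")
print(f"Memory bandwidth: {mem:.0f} GB/s")
```

    These are theoretical peaks only; they say nothing about how much of that throughput a VLIW design would sustain in real workloads.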

    Of course, we wouldn't get anything like asynchronous compute and the HSA capabilities wouldn't be as refined, but with a 1TFLOP GPU, 8*1.6GHz Bobcat cores and 4GB RAM I believe the developers could reach pretty damn good visuals, probably reaching what we've seen in the first generation of XBone and PS4 titles like Killzone Shadow Fall, Tomb Raider 2013, Battlefield 4 and Ryse at 900p.
     
    #30 ToTTenTranz, Nov 27, 2015
    Last edited: Nov 27, 2015
  11. Esrever

    Regular Newcomer

    Joined:
    Feb 6, 2013
    Messages:
    766
    Likes Received:
    527
    GCN at 28nm brought in a 40% increase in power efficiency over VLIW4, and I assume even more over VLIW5. The absolute best case would be about 70% of the performance of the current consoles with 4GB of either DDR3 or GDDR5. Also, looking at mobile chips is pointless as they are binned and can't be used to compare power efficiency. The 6870/6850 used more power than the 7850 (the performance of the PS4). At most you'd end up with a 6770-level GPU.

    That is if you aren't looking at memory or price. 4GB of GDDR5 would be expensive. DDR3 would require something like ESRAM to be effective and would then also be expensive. Going with a discrete GPU with dedicated memory would just be shooting yourself in the foot and be expensive. To not be memory limited with DDR3, and not increase the cost by using ESRAM, it's likely the console would simply be scaled back. 50GB/s from DDR3 with a 6670-level GPU would be the cheap and effective solution. You could see one maker going with the cheap solution with ~700GFLOPS and DDR3 at $300 and one maker with ~1.2TFLOPS and GDDR5 at $450. The performance difference would be a lot on paper but neither would be able to do native 1080p. And even the expensive solution would only be about half as powerful as the PS4. Also, since the difference would probably be less than Xbox vs PS2, I would imagine the cheaper console would sell more. If both console makers were looking at these prices, they would probably be unlikely to want to go the expensive route.
     
  12. liolio

    liolio Aquoiboniste
    Legend

    Joined:
    Jun 28, 2005
    Messages:
    5,724
    Likes Received:
    194
    Location:
    Stateless
    Well, that is the compound effect of GCN and 28nm lithography; we don't know how things are split. The review I linked below is the only fair comparison I've seen between the old VLIW and GCN (not even in its first rendition). The difference between the two cards is not as much as 40%.
    The GCN card's base frequency is actually lower than that of the Cayman card, but it can 'turbo' up to 975MHz. Then AMD has iterated a couple of times on its 28nm physical implementation (or whatever the more correct wording is); it never did on VLIW4.
    Actually I remember arguing about the merits of VLIW4 at a time when there was no proper material to make a sound comparison. Now, or actually a couple of years ago, the topic was no longer hot, as by the time Tahiti LE shipped GCN was confirmed to be inside the new generation of consoles.
    As I think about it again, well, I believe that AMD should have stuck with it; I have gone back to my original "gut driven" opinion.
     
  13. ToTTenTranz

    Legend Veteran Subscriber

    Joined:
    Jul 7, 2008
    Messages:
    11,034
    Likes Received:
    5,576
    Regarding gaming performance with engines that don't access compute features very often, you don't really know how much of this comes from the 40-28nm transition.
    Which is why I made a comparison of what could be done with the same die sizes using 40nm instead of 28nm, which was still too fresh in 2011.

    I don't think you have the power consumption numbers right.
    The TDP difference between the 6870 and the 7850 is 21W. Between 6850 and 7850 it's 7W. And those GPUs are clocked between 800 and 900MHz.
    The raw numbers I suggested were for 600MHz, so they would consume less than that. For example, the mobile HD 6990M is rated at 75W, GDDR5 included. It's a Barts with all 14 SIMDs clocked at 715MHz. Lower the target core clocks to increase yields and you get an even more feasible GPU.

    I don't think 4GB of 4000MT/s GDDR5 in 2011 would be more expensive than 8GB of 5500MT/s in 2013. The PS4 came with that.
     
  14. Esrever

    Regular Newcomer

    Joined:
    Feb 6, 2013
    Messages:
    766
    Likes Received:
    527
    Mobile chips are binned. You can't feasibly sell a console with only binned parts unless you want to have no yields. The most efficient desktop GPU at the time was the 6850. The 6850 however could use as much power as a 7870 in practice, and lowering the clock would simply decrease performance. This was also a time when the CPU used more power too, so if you want 8 Bobcat cores, you'd need to double the CPU power draw compared to the current consoles.
    2011 was back when the 580 only came with 1.5GB of vram. And when the 6950 1GB was $40 cheaper than the 6950 2GB.
     
  15. ToTTenTranz

    Legend Veteran Subscriber

    Joined:
    Jul 7, 2008
    Messages:
    11,034
    Likes Received:
    5,576
    You don't say.. And what would happen to said "no yields" if, say, the target clocks were lowered?

    No, it couldn't. In every possible scenario except idling, it wouldn't.


    And if electronics and physics are anything to go by, lower clocks for the same chip will also require lower voltage and lower power consumption.
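
    The disagreement here turns on whether voltage drops along with the clock. A rough sketch, assuming the textbook approximation that dynamic power scales with C * V^2 * f and ignoring static leakage entirely: the clocks are the ones discussed in the thread, but the voltages are hypothetical placeholders, not measured values, and the function name is illustrative.

```python
def relative_dynamic_power(f_old_mhz, f_new_mhz, v_old, v_new):
    """Dynamic power scales roughly with C * V^2 * f; C cancels for the same chip."""
    return (f_new_mhz / f_old_mhz) * (v_new / v_old) ** 2


# Clocks from the thread (desktop ~900MHz vs the proposed 600MHz).
# The voltages below are hypothetical placeholders, not measured values.
same_voltage = relative_dynamic_power(900, 600, 1.15, 1.15)
lower_voltage = relative_dynamic_power(900, 600, 1.15, 1.00)

print(f"600MHz at the same voltage: {same_voltage:.2f}x power")
print(f"600MHz at a lower voltage:  {lower_voltage:.2f}x power")
# Throughput scales with clock (600/900), so performance per watt only
# improves in this model when the voltage drops as well.
print(f"perf/W change with lower voltage: {(600 / 900) / lower_voltage:.2f}x")
```

    Under this model, lowering the clock alone cuts power and performance in the same proportion, while any accompanying voltage reduction is what improves efficiency.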


    You'll need to provide a source claiming that Jaguar has twice the performance/watt of Bobcat.
    If actual benchmarks between the E-350 (Bobcat) and the E1-2500 (Jaguar) are considered, that assumption couldn't be any further from the truth.

    And nowadays the GTX 960 4GB costs $40 more than the GTX960 2GB. Does that make you think the extra 2GB of GDDR5 costs $40 more to the OEMs? Do you think Sony is paying $160 for 8GB of GDDR5?
     
  16. Esrever

    Regular Newcomer

    Joined:
    Feb 6, 2013
    Messages:
    766
    Likes Received:
    527
    And it will lower performance but the power efficiency wouldn't necessarily change.
    Ok, well I guess that is true and I must have misremembered, but it still used more than the 7850, which is where the cutoff is.
    yes but at a lower performance, which is what I meant, it wouldn't magically make more performance out of the same power envelope
    You are comparing binned and cut-down chips vs full chips. Compare the 4 core Jaguar to the 2 core Bobcat, both fully enabled. They use almost the same power. Thus 8 Bobcat cores = 16 Jaguar cores' worth of power consumption.
    But back then, they didn't start at 2 GB trying to move up to 4GB, they actually spent R&D trying to lower the price of the card by cutting down the vram by redesigning the PCB. I don't see how that is comparable to trying to get people to spend more on more vram. They had the MSRP of the 2GB card and tried to cut down the price by getting rid of the vram.
     
  17. TheAlSpark

    TheAlSpark Moderator
    Moderator Legend

    Joined:
    Feb 29, 2004
    Messages:
    21,578
    Likes Received:
    7,130
    Location:
    ಠ_ಠ
    Bobcat vector perf would be an even bigger step down too?
     
  18. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,363
    Likes Received:
    3,944
    Location:
    Well within 3d
    Bobcat's physical vector resources were half the width of Jaguar.
    Another unknown is how well it could have been scaled in core count.
    Jaguar's shared inclusive L2 led to easy coherence between 4 cores before taking a serious inter-module coherence hit if data was resident in the other module.
    Bobcat didn't have that, and I am struggling to recall any AMD APU solution that has had more than two CPU-coherent units/modules plugged into the uncore. I have not seen measurements of Bobcat's remote L2 sharing penalties, but I think AMD's various choices with Jaguar's L2 led to the job of core scaling past 2 cores being easier.

    The two-module solution in the consoles is not the greatest, but Bobcat implementations had even less core scaling in mind. A desktop core had better prospects in terms of what could be plugged into an APU, in terms of throughput and performance.
     
    iroboto and TheAlSpark like this.
  19. ToTTenTranz

    Legend Veteran Subscriber

    Joined:
    Jul 7, 2008
    Messages:
    11,034
    Likes Received:
    5,576
    No one in this thread claimed that a 40nm Terascale GPU would be comparable to a 28nm GCN one in power efficiency, so there's no point in trying to counter an argument that never existed.


    But exactly how important is inter-core cache coherency for a videogame? For example, I remember Kentsfield and Yorkfield (Core 2 Quad) not having any L3 cache across the 4 cores, only L2 per pair, and their scaling in games, compared to higher-clocked dual-cores on the same architecture, was pretty good.

    Obviously, 4 dual-core modules of Bobcat wouldn't scale as well as 2 four-core modules of Jaguar. I'm not ever going to suggest that. But unlike the PC, in a console the developers would have enough control to separate the tasks between modules in order to prevent the exchange of data between different modules. E.g. one 2-core module for O.S., network; one module for feeding the GPU; one module for anything non-graphics like physics, A.I., sound, etc. and one last module for the developer to do whatever he pleases like hosting a multiplayer server, video encoding, etc.

    The biggest difference here is that while a 4-module Bobcat at 1.5GHz would consume well below 20W of the power budget, two dual-core modules of Stars cores (what AMD had at the time..) at 2GHz would probably consume about 3 times that. Not to mention that 4 Stars cores occupied almost half of Llano's 228mm^2 area at 32nm.

    That's easily 80mm^2 for the CPU cores (+L2) alone, on GF's 32nm. Do that on TSMC's 40nm and you might get close to 100mm^2 for the CPU+L2.


    Bobcat + Terascale would never, ever reach the utilization ratios of Jaguar + GCN. 8*Bobcat + 14 VLIW4/5 SIMDs + GDDR5 would perform well below what the XBOne can do. But IMO it would be a damn worthy upgrade to the PS360 back in 2011.
     
    #39 ToTTenTranz, Dec 3, 2015
    Last edited: Dec 3, 2015
  20. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,363
    Likes Received:
    3,944
    Location:
    Well within 3d
    Bobcat did not have modules. It had two cores with their private L2 caches. It's a comparison of 2 quad-Jaguar modules with bad inter-module sharing penalties against 2 single-core Bobcat cores with unknown penalties. If AMD's modern uncore performance is any indication, it would have been pretty poor, and it's harder to not share when all you get to work with are two cores.
    The Intel CPUs have better cores capable of tolerating latency, and even with Core2 Quad their sharing penalties were better than modern AMD architectures--which have gotten worse the more the GPU has integrated.

    This assumes that could be done. My point in mentioning that we've only seen APUs with two CPU-coherent domains is that for Jaguar such a domain is 4 cores, but for Bobcat it is 1.
    The capacity for AMD's crossbar interconnect to expand to arbitrary module/core counts has shown itself to be a limiting factor.
     

  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.