Next Generation Hardware Speculation with a Technical Spin [2018]

Discussion in 'Console Technology' started by Tkumpathenurpahl, Jan 19, 2018.

Thread Status:
Not open for further replies.
  1. iroboto

    iroboto Daft Funk
    Legend Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    14,833
    Likes Received:
    18,633
    Location:
    The North
    Hmm... is Navi supposed to be the 9th?
     
  2. MrFox

    MrFox Deludedly Fantastic
    Legend

    Joined:
    Jan 7, 2012
    Messages:
    6,488
    Likes Received:
    5,996
    GFX9 is Vega
     
    Shoujoboy, iroboto and BRiT like this.
  3. BRiT

    BRiT (>• •)>⌐■-■ (⌐■-■)
    Moderator Legend Alpha

    Joined:
    Feb 7, 2002
    Messages:
    20,511
    Likes Received:
    24,411
    Seems more like naming convention change for the AMDGPU unit tests.

    CIVI-DAG became CIVI-NEXT.
    GFX9-DAG became GFX9-NEXT.
    Added HAWAII-NEXT.
    Added FIJI-NEXT.
     
  4. chris1515

    Legend

    Joined:
    Jul 24, 2005
    Messages:
    7,157
    Likes Received:
    7,965
    Location:
    Barcelona Spain

    What is CIVI?
     
  5. iroboto

    iroboto Daft Funk
    Legend Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    14,833
    Likes Received:
    18,633
    Location:
    The North
    lol I was going to ask this next.
    Is the implication here that this could be a lead that Vega could be part of PS5?
     
  6. chris1515

    Legend

    Joined:
    Jul 24, 2005
    Messages:
    7,157
    Likes Received:
    7,965
    Location:
    Barcelona Spain
    Probably a new AMD GPU architecture
     
  7. BRiT

    BRiT (>• •)>⌐■-■ (⌐■-■)
    Moderator Legend Alpha

    Joined:
    Feb 7, 2002
    Messages:
    20,511
    Likes Received:
    24,411
    Someone's joke of CM? Once upon a time at my work, someone responsible for setting up new accounts misread "ADAM" as "A D A I V I". Since then we've always forced Adam to use ADAIVI.
     
  8. chris1515

    Legend

    Joined:
    Jul 24, 2005
    Messages:
    7,157
    Likes Received:
    7,965
    Location:
    Barcelona Spain
    Maybe Navi
     
  9. iroboto

    iroboto Daft Funk
    Legend Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    14,833
    Likes Received:
    18,633
    Location:
    The North
    if a turned C becomes an N
    and the IVI is an A V I

    then yea... sure ;)
    my most scientific post of the day. I need to go back to work. The posts came too fast and furious today, hard to keep up
     
    TheAlSpark likes this.
  10. MrFox

    MrFox Deludedly Fantastic
    Legend

    Joined:
    Jan 7, 2012
    Messages:
    6,488
    Likes Received:
    5,996
    CI, VI, and CIVI have been there for a while.
    VI would be volcanic island?
     
    anexanhume, Shoujoboy, BRiT and 2 others like this.
  11. vipa899

    Regular

    Joined:
    Mar 31, 2017
    Messages:
    922
    Likes Received:
    354
    Location:
    Sweden
    So where does that put an 8-core Jaguar? An 8-core Jaguar must be very efficient if it can outperform an FX8350 @ 4 GHz with its 2 GHz?
     
  12. see colon

    see colon All Ham & No Potatos
    Veteran

    Joined:
    Oct 22, 2003
    Messages:
    2,756
    Likes Received:
    2,206
    So those numbers for Ryzen and Jaguar are per core, and the Bulldozer ones are per module. Does Jaguar really have a performance penalty for DP? Don't Jaguar and Bulldozer both have the same FMAC, only Bulldozer has one per module and Jaguar has 1 per core (or 4 per 4-core module)?

    According to those numbers an 8-core Jaguar would do 24 DP FLOPS per cycle or 64 SP FLOPS per cycle, while an 8-core Bulldozer would be at 32 DP FLOPS or 64 SP FLOPS per cycle. I was under the impression that each Jaguar core used the same FPU as each Bulldozer module, but apparently not. Still, Bulldozer has a pretty long pipeline, which can hold back IPC.
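
    The totals above can be reproduced with a few lines of arithmetic. This is only a sketch: the per-unit figures (3 DP / 8 SP per Jaguar core, 8 DP / 16 SP per Bulldozer module) are back-derived from the totals quoted in this post rather than taken from an AMD document, and the names `PER_UNIT` and `peak_flops_per_cycle` are made up for illustration.

```python
# Peak FLOPs per cycle per scheduling unit, back-derived from the totals in
# the post above (assumption: Jaguar is counted per core, Bulldozer per
# 2-core module).
PER_UNIT = {
    "jaguar": {"cores_per_unit": 1, "dp": 3, "sp": 8},
    "bulldozer": {"cores_per_unit": 2, "dp": 8, "sp": 16},
}


def peak_flops_per_cycle(arch: str, cores: int) -> dict:
    """Return chip-wide peak DP and SP FLOPs per cycle for `cores` cores."""
    unit = PER_UNIT[arch]
    units = cores // unit["cores_per_unit"]  # cores -> cores or modules
    return {"dp": units * unit["dp"], "sp": units * unit["sp"]}


if __name__ == "__main__":
    for arch in PER_UNIT:
        print(arch, peak_flops_per_cycle(arch, 8))
```

    With 8 cores each, both designs land on the same SP peak per cycle and differ only in DP, which is the gap being asked about.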
     
    vipa899 likes this.
  13. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
    It's likely a combination of CI and VI. Sea Islands and Volcanic Islands. Since there was already a Southern Islands, SI was taken.

    For multiplication, double precision takes an extra iteration on Jaguar that blocks additional multiplications. The extra gap in cycles reduces throughput further than the half rate expected for going to double precision.

    The architectures are different. For one, Jaguar doesn't have an FMAC and it lacks a fair number of extensions supported by the Bulldozer line. Bulldozer also has a higher priority for double-precision, while Jaguar saved hardware by reducing throughput for that data type.

    The Bulldozer module would presumably not be running at the same clock as the Jaguar one, and likely would target something close to twice the clock speed while only having half as many cores.
    Per leaks about the early PS4 architecture, Sony almost decided on a 2-module Steamroller APU. Throughput would have been generally equivalent, but single-threaded performance would have favored the Steamroller one. By that revision of Bulldozer, some notable shortcomings relative to a Jaguar implementation would have been improved (wider decoders, better branch prediction, etc).
    Whether power consumption, Steamroller's dependence on Globalfoundries, or some other factor put Sony off is unclear.
     
  14. vipa899

    Regular

    Joined:
    Mar 31, 2017
    Messages:
    922
    Likes Received:
    354
    Location:
    Sweden
    @3dilettante
    So an FX8350 at its stock speed is faster than the 8-core Jaguar found in consoles?
    How is that clock for clock?
     
  15. see colon

    see colon All Ham & No Potatos
    Veteran

    Joined:
    Oct 22, 2003
    Messages:
    2,756
    Likes Received:
    2,206
    Ahhh... I'd read that Jaguar's FPUs were double or two-way 128-bit FPUs, which I equated as being the same as the FMAC in Bulldozer. So is Jaguar 128-bit per pipe with a performance penalty to combine 2 pipes into DP, and Bulldozer is 2x128-bit per module (2 cores) with no performance penalty for DP?
     
  16. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
    The FX8350 is a 220W (edit: correction--that's the 9590, 125W for Vishera) processor running from 4 to 4.2 GHz versus the 1.6 GHz Jaguar cores.
    For non FPU workloads, Vishera had as many cores as the PS4 running over twice as fast. That's vastly better multithreaded throughput and single-threaded performance.
    The shared FPUs made it so that there were 4 FPUs, but they were individually much more heavyweight than Jaguar. The exact mixes each design favored do not align all the time (FMA can be better generally, but in a few spots worse versus separate MUL and ADD), but Vishera's FPU could hit the same peak numbers as two Jaguar FPUs and supported a more robust set of shuffle and vector integer operations, even without considering it was running over twice as fast.
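
    As a rough illustration of the peak-rate argument, clock speed can be folded in as follows. This is a sketch under stated assumptions: the 8 SP FLOPs per cycle per Jaguar FPU comes from the per-core figure quoted earlier in the thread, a Vishera FPU is taken as exactly two Jaguar FPUs at peak, and 4.0 GHz is the low end of the range given above. `peak_gflops` is a hypothetical helper, and peak numbers say nothing about sustained performance.

```python
def peak_gflops(fpus: int, flops_per_fpu_cycle: int, clock_ghz: float) -> float:
    """Theoretical peak in GFLOPS: FPU count x FLOPs per cycle x clock (GHz)."""
    return fpus * flops_per_fpu_cycle * clock_ghz


# Assumed SP peak per Jaguar FPU per cycle (per-core figure from earlier posts).
JAGUAR_SP_PER_FPU = 8

# Console Jaguar: 8 cores, one FPU each, 1.6 GHz.
jaguar = peak_gflops(8, JAGUAR_SP_PER_FPU, 1.6)

# Vishera: 4 shared FPUs, each assumed to match two Jaguar FPUs, at 4.0 GHz.
vishera = peak_gflops(4, 2 * JAGUAR_SP_PER_FPU, 4.0)

ratio = vishera / jaguar

if __name__ == "__main__":
    print(f"Jaguar  peak SP: {jaguar:.1f} GFLOPS")
    print(f"Vishera peak SP: {vishera:.1f} GFLOPS")
    print(f"Vishera / Jaguar: {ratio:.2f}x")
```

    Under these assumptions the per-cycle peaks are equal, so the whole gap comes from the clock ratio of 4.0 / 1.6.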

    Then there's the data paths and cache subsystem, which were wider and faster versus the power-conscious Jaguar.

    However, Vishera was a vastly bigger investment in terms of power and silicon, so being much faster doesn't mean it was necessarily an order of magnitude faster than the more modest Jaguar.

    Overall, the per-clock and other efficiency measures would have left the first two Bulldozer core variants out. However, by the time Steamroller came about, process changes and architectural changes fixed bugs and improved per-clock performance measurably. Steamroller at half the cores but twice the clock speed of Jaguar was what Sony may have nearly gone with for Orbis.

    Jaguar has one 128-bit pipe for addition, and one 128-bit pipe for multiplication. Miscellaneous operations and vector integer ops are distributed among those two ports.
    Bulldozer has 4 FPU pipes, two for FMA. Permutes, moves, and integer ops were spread among all four.
    Steamroller and later had 3 FPU pipes, where some of the integer operations and miscellaneous instructions were moved onto the remaining three.
    There's a strong mixture of extensions and operations supported by Bulldozer versus Jaguar, so a lot of the other less glamorous elements of floating point workloads could be better handled by Bulldozer without fighting for cycles on Jaguar's smaller number of ports.

    For DP, the Bulldozer line could get half-rate. The FMA pipe is counted as performing 2 operations and provides additional flexibility in the pure addition or pure multiplication case. It's not as strong if there's an equal mix of additions and multiplications that cannot be chained together into an FMA.
    For Jaguar, the addition pipe could perform a DP addition at half rate. For DP multiplication, the unit could produce the expected half-rate per instruction, but it then had to loop back through the smaller multiplier for an extra cycle--cutting performance further.
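
    A toy model of the Jaguar case, to show where the DP penalty comes from. The assumptions here are mine and deliberately simplified: one 128-bit add pipe and one 128-bit multiply pipe per core, and the extra pass through the multiplier modelled as a new DP multiply being able to issue only every second cycle. The function names are hypothetical.

```python
from fractions import Fraction

PIPE_BITS = 128  # width of each Jaguar FP pipe, per the post above


def lanes(element_bits: int) -> int:
    """Number of elements a 128-bit pipe processes per instruction."""
    return PIPE_BITS // element_bits


def jaguar_flops_per_cycle(element_bits: int, mul_issue_interval: int) -> Fraction:
    """Peak FLOPs per cycle per core: add pipe plus multiply pipe.

    mul_issue_interval is the number of cycles between multiply issues
    (1 = fully pipelined, 2 = one extra pass blocks the next multiply).
    """
    add = Fraction(lanes(element_bits), 1)
    mul = Fraction(lanes(element_bits), mul_issue_interval)
    return add + mul


sp = jaguar_flops_per_cycle(32, mul_issue_interval=1)  # 4 adds + 4 muls
dp = jaguar_flops_per_cycle(64, mul_issue_interval=2)  # 2 adds + 2 muls / 2

if __name__ == "__main__":
    print("SP FLOPs/cycle/core:", sp)
    print("DP FLOPs/cycle/core:", dp)
    print("DP as a fraction of SP:", dp / sp)
```

    In this model DP comes out below the plain half rate, and the per-core results line up with the 8 SP and 3 DP per cycle implied by the totals quoted earlier in the thread.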

    For the consoles, the DP case isn't all that important, however.
     
    #3736 3dilettante, Nov 22, 2018
    Last edited: Nov 22, 2018
  17. Magnum_Force

    Newcomer

    Joined:
    Mar 12, 2008
    Messages:
    104
    Likes Received:
    70
    It's a shame that Microsoft at least didn't put a 'dozer core in the Xbox One X.

    The size and power consumption of the last of the Bulldozer line, Excavator, had been reduced significantly, so much so that on 28nm 2 Excavator modules would have been only slightly bigger than a Jaguar 4-core complex. They could have theoretically fit 8 Excavator cores in a similar silicon budget to what they have now. The FX-9800P also manages to run 2 modules/4 cores at 2.7 GHz base in just a 15 W power budget.

    It would have made the possibility of repurposing the One X as a low-end next-gen machine a bit more palatable.
     
    vipa899 likes this.
  18. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
    The transistor budget would likely have been larger for Excavator. It was a later microarchitecture that took advantage of a tweaked process and higher-density libraries to cram more transistors into a similar area to Jaguar. Whether Jaguar had some level of similar compaction, or could have been redesigned in the same way is unclear, although at that point in time AMD was in no position to continue revamping the Jaguar line as it did Bulldozer.

    Whether Excavator's density gains versus a more stagnant Jaguar line would have carried over to 16nm is not clear; the node jump would have reset the architectures to a similar starting point.
     
    BRiT likes this.
  19. Magnum_Force

    Newcomer

    Joined:
    Mar 12, 2008
    Messages:
    104
    Likes Received:
    70
    I thought the high-density libraries were more a trade-off of clock speed for a smaller size, as they switched to a different metal stack to enable greater density. You see this on the power curve: at the top end of the clock range, Steamroller is actually more efficient than Excavator, at least according to AMD's slides.

    As to why it was not chosen for the consoles, most likely it was the fact that Excavator was built on GF's process and not TSMC's, as you've mentioned.
     
  20. see colon

    see colon All Ham & No Potatos
    Veteran

    Joined:
    Oct 22, 2003
    Messages:
    2,756
    Likes Received:
    2,206
    Is this the case for games in general? I had an old A10 laptop and a Phenom 2 940 and for most games the A10 was GPU bound, but older games like Quake 1 at lower resolutions ran better on it IIRC.
     
    vipa899 likes this.

  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.