AMD: Pirate Islands (R* 3** series) Speculation/Rumor Thread

Discussion in 'Architecture and Products' started by iMacmatician, Apr 10, 2014.

  1. CNCAddict

    CNCAddict Regular

  2. pjbliverpool

    pjbliverpool B3D Scallywag Legend

    Does anyone else think it's cool that we're back to the R3xx series? I wonder if this one will be as good as the last. If so, count me in.
     
  3. Wynix

    Wynix Veteran

  4. kotakaja

    kotakaja Banned

    Check M. Mantor's slides on SlideShare:
    the R9 290X is Sea Islands.

    From CodeXL 1.4:
    http://i.imgur.com/8LjI06P.jpg

    Hawaii is Sea Islands.
    Volcanic Islands is something else.
     
  5. Alexko

    Alexko Veteran Subscriber

    So what's in that Volcanic Islands menu?

    And by the way, what's Kalindi?
     
  6. Nemo

    Nemo Newcomer

    Tonga? :lol:
     
  7. kotakaja

    kotakaja Banned

    Kalindi is the Sea Islands GPU in Kabini/Temash.
    I am not so sure about Beema/Mullins.
     
  8. kotakaja

    kotakaja Banned

    Sea Islands CU: 4 tiles, 16 ALUs per tile (4 threads); branching is expensive.

    From AMD's Heterogeneous System Coherence material:
    Slide: http://i.imgur.com/nkBqAIi.jpg

    1 CU = 16 tiles, each tile controlling 4 ALUs (or more): more threads, more efficient branching.
    So the ALU count per CU won't be the same for every GPU type:
    one GPU could have 64 ALUs per CU,
    another could have 128 ALUs per CU, or a CU could have 64 tiles.
    (64 vs. 128 ALUs makes little difference in die area; SRAM is what takes up most of the die.)
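
    For reference, the tile arithmetic in this post can be sketched as below. This is only a rough illustration: the 16 x SIMD4 and 128-ALU layouts are the rumored ones from the post, the "16 x SIMD8" split is a hypothetical way to reach 128 ALUs, and keeping the wavefront at 64 work-items for the narrower tiles is an assumption, not something the slide states.

```python
# Illustrative only: these layouts are the ones floated in the post,
# not confirmed hardware specs.
WAVEFRONT = 64  # GCN wavefront size in work-items


def alus_per_cu(tiles: int, alus_per_tile: int) -> int:
    """Total ALU lanes in one CU for a given tile layout."""
    return tiles * alus_per_tile


def cycles_per_wavefront(alus_per_tile: int, wavefront: int = WAVEFRONT) -> int:
    """Cycles one tile needs to issue one instruction for a whole wavefront."""
    return -(-wavefront // alus_per_tile)  # ceiling division


layouts = {
    "4 x SIMD16 (Sea Islands)": (4, 16),
    "16 x SIMD4 (rumored)": (16, 4),
    "16 x SIMD8 (hypothetical 128-ALU CU)": (16, 8),
}

for name, (tiles, width) in layouts.items():
    print(f"{name}: {alus_per_cu(tiles, width)} ALUs/CU, "
          f"{cycles_per_wavefront(width)} cycles per 64-wide wavefront")
```

    Both 4 x SIMD16 and 16 x SIMD4 come out at 64 ALUs per CU; what changes is how many independent instruction streams share them.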

    Slide: http://i.imgur.com/XKGOFal.jpg
    It shows the Heterogeneous System Coherence spec; the GPU is fully coherent.
    The DDR is stacked DDR3 with 16 channels; bandwidth is like Pascal's, 700-800 GB/s.
    L3 = embedded memory, not eSRAM.

    There is also one console with a fully coherent system; it certainly won't use GDDR5.
     
  9. AlexV

    AlexV Heteroscedasticitate Moderator Veteran

    That (if present) might not mean what you think it means. There's more than one way to tile a workload. Just saying.
     
  10. Alexko

    Alexko Veteran Subscriber

    Well, there's definitely some truth to the rumors:

    [IMG]
     
  11. kotakaja

    kotakaja Banned

    Of course, but each Ex unit representing 4 ALUs is a hardware view, not a software view.

    Even in the software view, you are bound by hardware capability.
    Sea Islands is 4x SIMD16; you cannot tile that into 16x SIMD4.

    That is also present in Mike Mantor's 2013 paper:
    they try to increase the thread count but reduce the ALUs per thread.
     
  12. mosen

    mosen Regular

    You are speculating, right?!
     
  13. kotakaja

    kotakaja Banned

    About what?

    16 Ex units are clearly on that slide,

    and the GCN CU, as far as we know so far, has only
    4 Ex units x SIMD16.
     
  14. mosen

    mosen Regular


    Please provide me a link to the full paper. Also, your last sentence (that the console with a fully coherent system certainly won't use GDDR5) contrasts with what we know about the PS4, which has hUMA and accidentally (!) uses GDDR5.
     
  15. AlexV

    AlexV Heteroscedasticitate Moderator Veteran

    Sometimes there are only so many little boxes that can fit on a single slide.
     
  16. DmitryKo

    DmitryKo Regular

    Looks like Iceland will replace low-end Oland and Cape Verde (i.e. R5 250 and R7 240/250), while Tonga should replace higher-end Curacao and Tahiti (R7 265/270/270X and R9 280/280X), bringing them up to the GCN 1.1 spec of the R7 260 and R9 29x.

    I'd imagine these new desktop parts could be R5 255/255X and R7 245/245X for Iceland, and R7 275/275X and R9 285/285X for Tonga.
     
    Last edited by a moderator: May 19, 2014
  17. Anteru

    Anteru Newcomer

    FWIW, you can actually run analyze and look at the generated ISA.

    For a ray-tracing kernel, I observed the following:

    • Instruction encoding with Tonga is longer compared to Hawaii (2152 bytes on Hawaii vs. 2232 bytes on Tonga). This is (as far as I can tell) only due to buffer loads, which are twice the size with the offset being now a separate 32-bit dword.
    • A nop after 3 loads which was required on Hawaii is missing on Tonga.
    • Scalar register usage is way up: 42 on Hawaii vs. 94 on Tonga. This won't fly if they don't expand the number of SGPRs.
    • Some mac instructions got replaced by mad (no mac anymore, maybe).
    • Some very minor control flow changes. A complex loop is set up slightly differently (mostly, instructions have been shuffled).
    • There's a v_add_u32 which was not present on Hawaii, and a v_mul_lo_u32 (possibly even more)
    Overall, the changes look pretty minor, except for the load instruction encoding and the scalar register usage.
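
    To put the two headline numbers in proportion, here is a rough sketch. The code sizes and SGPR counts are the ones listed above; the 512-entry SGPR file per SIMD and the 10-wave cap are assumptions based on existing GCN parts, and register allocation granularity is ignored, so treat the wave counts as an upper-bound estimate rather than what the hardware would actually do.

```python
def pct_increase(old: float, new: float) -> float:
    """Relative growth from old to new, in percent."""
    return (new - old) / old * 100.0


def waves_per_simd(sgprs_per_wave: int, sgpr_file: int = 512,
                   max_waves: int = 10) -> int:
    """Waves one SIMD could hold if SGPRs were the only limit.

    sgpr_file and max_waves are assumed values for current GCN parts;
    allocation granularity is deliberately ignored.
    """
    return min(max_waves, sgpr_file // sgprs_per_wave)


code_growth = pct_increase(2152, 2232)   # kernel size, Hawaii -> Tonga
hawaii_waves = waves_per_simd(42)        # SGPRs per wave on Hawaii
tonga_waves = waves_per_simd(94)         # SGPRs per wave on Tonga

print(f"code size growth: {code_growth:.1f}%")
print(f"SGPR-limited waves per SIMD: Hawaii {hawaii_waves}, Tonga {tonga_waves}")
```

    Under those assumptions the encoding grows by under 4%, while the SGPR usage alone would halve the achievable occupancy on an unchanged register file.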
     
  18. kotakaja

    kotakaja Banned

    The PS4 certainly is not fully coherent.


    The paper is from MICRO-46 (Dec 2013),
    called "Heterogeneous System Coherence",
    and it uses stacked DDR3 + EmbRAM/eSRAM (depending on which term you like to use).


    And from some console architecture panel:
     
  19. iroboto

    iroboto Daft Funk Legend Subscriber

    The article you are referring to is here: http://research.cs.wisc.edu/multifacet/papers/micro13_hsc.pdf

    Though there is a table indicating the setup used for their testing,
    a) I don't think that means a hardware design must mimic their test setup to obtain full coherence, and
    b) I still don't see that their setup uses both stacked DDR3 and embedded RAM. I've interpreted this as _just_ stacked DDR3 and a large 16 MB L3 cache, unless I'm missing something.

    edit: the one thing I found really interesting about the article is that
    700 GB/s of bandwidth was required to not bottleneck 32 CUs; or, as they wrote, with 700 GB/s of bandwidth available they were able to perform all tasks within the limitations of 32 CUs.

    I don't necessarily want to go off topic here, but here comes some painfully inaccurate math:
    12 CU / 32 CU is 37.5%
    37.5% of 700 GB/s is about 262 GB/s

    X1 total system theoretical bandwidth is 204 (102 both ways) + 67.8 = ~272 GB/s
    or 192 (MS release #) + 68 = ~260 GB/s
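
    The same back-of-the-envelope numbers as a small sketch. It assumes, as the post does, that required bandwidth scales linearly with CU count, which is a simplification and not a claim made by the paper; the function and variable names are just illustrative.

```python
def scaled_bandwidth(cus: int, ref_cus: int = 32,
                     ref_bw_gbs: float = 700.0) -> float:
    """Bandwidth (GB/s) for `cus` CUs, assuming linear scaling from the
    paper's 32-CU / 700 GB/s configuration (a crude assumption)."""
    return cus / ref_cus * ref_bw_gbs


needed = scaled_bandwidth(12)   # 12 CUs, as on X1
x1_peak = 204 + 67.8            # eSRAM peak (102 both ways) + DDR3
x1_ms = 192 + 68                # eSRAM figure from the MS release + DDR3

print(f"scaled requirement: {needed:.1f} GB/s")
print(f"X1 theoretical: {x1_peak:.1f} GB/s, X1 per MS figures: {x1_ms} GB/s")
```

    Under that linear assumption, 12 CUs would want roughly 262.5 GB/s, which sits between the two X1 totals.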
     
    Last edited by a moderator: May 22, 2014
  20. pTmdfx

    pTmdfx Regular

    The most important bit in the paper is the cache coherence protocol that allows a region to be temporarily removed from the coherence domain, along with the design concepts of the related facilities that make the protocol work. The setup is not really important after all, nor is the architecture used in the paper, though combined with other papers it shows the research trend (or even the architecture trend) at AMD. QuickRelease (2014) is another paper from the same team of researchers.

    For PlayStation 4, if one assumes it is architecturally the same as Kaveri, it should be "fully coherent", or more specifically it is capable of accessing the coherent system memory by bypassing all GPU caches. Still fully coherent, yah?
     