AMD: Pirate Islands (R* 3** series) Speculation/Rumor Thread

Discussion in 'Architecture and Products' started by iMacmatician, Apr 10, 2014.

  1. pjbliverpool

    pjbliverpool B3D Scallywag
    Legend

    Joined:
    May 8, 2005
    Messages:
    7,583
    Likes Received:
    703
    Location:
    Guess...
    Does anyone else think it's cool that we're back to the R3xx series? I wonder if this one will be as good as the last? If so, count me in.
     
  2. Wynix

    Veteran Regular

    Joined:
    Feb 23, 2013
    Messages:
    1,052
    Likes Received:
    57
  3. kotakaja

    Banned

    Joined:
    Apr 30, 2014
    Messages:
    22
    Likes Received:
    0
    Check M. Mantor's slides on Slideshare.
    The R9 290X is Sea Islands.

    From CodeXL 1.4:
    http://i.imgur.com/8LjI06P.jpg

    Hawaii is Sea Islands.
    Volcanic Islands is something else.
     
  4. Alexko

    Veteran Subscriber

    Joined:
    Aug 31, 2009
    Messages:
    4,496
    Likes Received:
    910
    So what's in that Volcanic Islands menu?

    And by the way, what's Kalindi?
     
  5. Nemo

    Newcomer

    Joined:
    Sep 15, 2012
    Messages:
    125
    Likes Received:
    23
    Tonga? :lol:
     
  6. kotakaja

    Banned

    Joined:
    Apr 30, 2014
    Messages:
    22
    Likes Received:
    0
    Kalindi is the Sea Islands part in Kabini/Temash.
    I am not so sure about Beema/Mullins.
     
  7. kotakaja

    Banned

    Joined:
    Apr 30, 2014
    Messages:
    22
    Likes Received:
    0
    Sea Islands CU: 4 tiles, 16 ALUs per tile (4 threads); branching is expensive.

    From AMD, Heterogeneous System Coherence
    Slide: http://i.imgur.com/nkBqAIi.jpg

    1 CU = 16 tiles, with each tile controlling 4 ALUs (or more): more threads, more efficient branching.
    So the ALU count per CU won't be the same for every GPU type:
    one GPU could have 64 ALUs per CU,
    another could have 128 ALUs per CU, or a CU could have 64 tiles.
    (64 vs. 128 ALUs doesn't make much difference in die area; SRAM is what takes up most of it.)
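
To make the "branching is expensive" versus "branching more efficient" claim concrete, here is a toy model, not a description of any real AMD hardware. It assumes that a group of work-items sharing one program counter must execute both sides of an if/else whenever its members disagree, and it compares hypothetical group widths of 64, 16 and 4. The function names and the branch pattern are made up for illustration.

```python
def branch_passes(taken, group_width):
    """Count the passes needed to run an if/else over all work-items.

    taken: list of booleans, one per work-item (True = takes the 'if' side).
    group_width: number of work-items that share one program counter.
    A group whose members all agree runs one side; a mixed group runs both.
    Assumes len(taken) is a multiple of group_width.
    """
    passes = 0
    for start in range(0, len(taken), group_width):
        group = taken[start:start + group_width]
        mixed = any(group) and not all(group)
        passes += 2 if mixed else 1
    return passes


def utilization(taken, group_width):
    """Fraction of issued lane-slots that do useful work."""
    useful = len(taken)  # every work-item executes exactly one side
    issued = branch_passes(taken, group_width) * group_width
    return useful / issued


# Hypothetical pattern: 64 work-items, alternating sides in blocks of 8.
pattern = [(i // 8) % 2 == 0 for i in range(64)]

for width in (64, 16, 4):
    print(width, branch_passes(pattern, width), utilization(pattern, width))
```

With this particular pattern the 64-wide and 16-wide groups are all mixed and waste half their lane-slots, while the 4-wide groups never diverge. Real results depend entirely on how the branch pattern lines up with the group boundaries.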

    Slide: http://i.imgur.com/XKGOFal.jpg
    It shows the Heterogeneous System Coherence spec: the GPU is fully coherent.
    The DRAM is stacked, type DDR3, 16 channels, with Pascal-like bandwidth of 700-800 GB/s.
    L3 = embedded memory, not eSRAM.

    There is also one console with a fully coherent system, and it certainly won't use GDDR5.
     
  8. AlexV

    AlexV Heteroscedasticitate
    Moderator Veteran

    Joined:
    Mar 15, 2005
    Messages:
    2,528
    Likes Received:
    107
    That (if present) might not mean what you think it means. There's more than one way to tile a workload. Just saying.
     
  9. Alexko

    Veteran Subscriber

    Joined:
    Aug 31, 2009
    Messages:
    4,496
    Likes Received:
    910
    Well, there's definitely some truth to the rumors:

    [​IMG]
     
  10. kotakaja

    Banned

    Joined:
    Apr 30, 2014
    Messages:
    22
    Likes Received:
    0
    Of course, but an Ex unit representing 4 ALUs is the hardware view, not a software view.

    Even in the software view, you are bound by hardware capability.
    Sea Islands is 4x SIMD16; you cannot tile that into 16x SIMD4.

    That is also presented in Mike Mantor's 2013 paper:
    they try to increase the thread count but reduce the ALUs per thread.
     
  11. mosen

    Regular

    Joined:
    Mar 30, 2013
    Messages:
    452
    Likes Received:
    152
    You are speculating, right?!
     
  12. kotakaja

    Banned

    Joined:
    Apr 30, 2014
    Messages:
    22
    Likes Received:
    0
    About what?

    16 EX units are clearly on that slide,

    and the GCN CU, as far as we know so far, has only
    4 Ex units x SIMD16.
     
  13. mosen

    Regular

    Joined:
    Mar 30, 2013
    Messages:
    452
    Likes Received:
    152

    Please provide me a link to the full paper. Also, your last sentence

    contrasts with what we know about the PS4, which has hUMA and accidentally (!) uses GDDR5.
     
  14. AlexV

    AlexV Heteroscedasticitate
    Moderator Veteran

    Joined:
    Mar 15, 2005
    Messages:
    2,528
    Likes Received:
    107
    Sometimes there are only so many little boxes that can fit on a single slide.
     
  15. DmitryKo

    Regular

    Joined:
    Feb 26, 2002
    Messages:
    696
    Likes Received:
    580
    Location:
    55°38′33″ N, 37°28′37″ E
    Looks like Iceland will replace low-end Oland and Cape Verde (i.e. R5 250 and R7 240/250), while Tonga should replace higher-end Curacao and Tahiti (R7 265/270/270X and R9 280/280X) to bring them up to the GCN 1.1 spec of the R7 260 and R9 29x.

    I'd imagine these new desktop parts could be R5 255/255X and R7 245/245X for Iceland, and R7 275/275X and R9 285/285X for Tonga.
     
    #96 DmitryKo, May 18, 2014
    Last edited by a moderator: May 19, 2014
  16. Anteru

    Newcomer

    Joined:
    Jul 4, 2004
    Messages:
    114
    Likes Received:
    3
    FWIW, you can actually run analyze and look at the generated ISA.

    For a ray-tracing kernel, I observed the following:

    • Instruction encoding with Tonga is longer compared to Hawaii (2152 bytes on Hawaii vs. 2232 bytes on Tonga). This is (as far as I can tell) only due to buffer loads, which are twice the size with the offset being now a separate 32-bit dword.
    • A nop after 3 loads which was required on Hawaii is missing on Tonga.
    • Scalar register usage is way up: 42 on Hawaii vs. 94 on Tonga. This won't fly if they don't expand the number of SGPRs.
    • Some mac instructions got replaced by mad (maybe there's no mac anymore).
    • Some very minor control flow changes. A complex loop is set up slightly differently (mostly, instructions have been shuffled).
    • There's a v_add_u32 which was not present on Hawaii, and a v_mul_lo_u32 (possibly even more)
    Overall, the changes look pretty minor, except for the load instruction encoding and the scalar register usage.
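
For scale, the two headline numbers above can be put in relative terms with a few lines of arithmetic. This is only a sketch over the figures quoted in this post; the dictionary keys and the helper name are made up for illustration.

```python
# Figures quoted in the post for the same ray-tracing kernel.
hawaii = {"code_bytes": 2152, "sgprs": 42}
tonga = {"code_bytes": 2232, "sgprs": 94}


def relative_change(old, new):
    """Return the change from old to new as a fraction of old."""
    return (new - old) / old


extra_bytes = tonga["code_bytes"] - hawaii["code_bytes"]
extra_dwords = extra_bytes // 4  # one dword is 32 bits = 4 bytes
code_growth = relative_change(hawaii["code_bytes"], tonga["code_bytes"])
sgpr_ratio = tonga["sgprs"] / hawaii["sgprs"]

print(f"code size: +{extra_bytes} bytes ({extra_dwords} dwords, {code_growth:.1%})")
print(f"SGPR usage: {sgpr_ratio:.2f}x")
```

So the encoding grows by 80 bytes, a few percent, while scalar register usage more than doubles.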
     
  17. kotakaja

    Banned

    Joined:
    Apr 30, 2014
    Messages:
    22
    Likes Received:
    0
    PS4 certainly is not fully coherent.


    The paper is from MICRO-46 (Dec 2013),
    called "Heterogeneous System Coherence",
    and it uses stacked DDR3 + EmbRAM/eSRAM (depending on which term you like to use).


    And from some console architecture panel:
     
  18. iroboto

    iroboto Daft Funk
    Legend Regular Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    7,905
    Likes Received:
    6,188
    The article you are referring to is here: http://research.cs.wisc.edu/multifacet/papers/micro13_hsc.pdf

    Though there is a table indicating the setup used for their testing, I don't think that ensures that
    a) a hardware design must mimic the setup they used for testing to obtain full coherence
    b) I still don't see that their setup uses both stacked DDR3 + embedded RAM; I've interpreted this as _just_ stacked DDR3 and a large 16 MB L3 cache, unless I'm missing something.

    edit: if there is one thing I found really interesting about the article, it is that
    700 GB/s of bandwidth was required to not bottleneck 32 CUs; or, as they wrote, with 700 GB/s of bandwidth available they were able to perform all tasks within the limitations of 32 CUs.

    I don't necessarily want to go off topic here, but here comes some painfully inaccurate math:
    12 CU / 32 CU is 37.5%
    37.5% of 700 GB/s is 262.5 GB/s

    X1 total theoretical system bandwidth is 204 (102 each way) + 67.8 ≈ 272 GB/s
    or 192 (MS release #) + 68 = 260 GB/s
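
The back-of-the-envelope scaling above can be written out as a short sketch. It assumes, as the post does, that required bandwidth scales linearly with CU count, which is a simplification and not something the paper claims. The constant and function names are made up for illustration.

```python
PAPER_CUS = 32        # CU count in the paper's simulated setup
PAPER_BW_GBS = 700.0  # bandwidth in that setup, GB/s
X1_CUS = 12           # CU count used in the post


def scaled_bandwidth(cus, ref_cus=PAPER_CUS, ref_bw=PAPER_BW_GBS):
    """Linearly scale the reference bandwidth to a different CU count."""
    return ref_bw * cus / ref_cus


needed = scaled_bandwidth(X1_CUS)

# The two system totals from the post, in GB/s.
total_peak = 204 + 67.8  # 102 each way, plus 67.8
total_ms = 192 + 68      # the MS release figure, plus 68

print(f"scaled requirement: {needed:.1f} GB/s")
print(f"system totals: {total_peak:.1f} and {total_ms} GB/s")
```

Under that linear assumption the scaled figure lands between the two totals quoted in the post.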
     
    #99 iroboto, May 22, 2014
    Last edited by a moderator: May 22, 2014
  19. pTmdfx

    Newcomer

    Joined:
    May 27, 2014
    Messages:
    249
    Likes Received:
    129
    The most important bit in the paper is the cache coherence protocol that allows a region to be temporarily removed from the coherence domain, along with the design concepts of the related facilities that make the protocol work. The setup is not really important after all, nor is the architecture used in the paper, though combined with other papers it shows the research trend (or even the architecture trend) at AMD. QuickRelease (2014) is another paper from the same team of researchers.

    For PlayStation 4, if one assumes it is architecturally the same as Kaveri, it should be "fully coherent", or more specifically it is capable of accessing the coherent system memory by bypassing all GPU caches. Still fully coherent, yah?
     