AMD: Navi Speculation, Rumours and Discussion [2019-2020]

Discussion in 'Architecture and Products' started by Kaotik, Jan 2, 2019.

Thread Status:
Not open for further replies.
  1. Bondrewd

    Veteran

    Joined:
    Sep 16, 2017
    Messages:
    1,682
    Likes Received:
    846
    No, AMD isn't playing that game anymore.
    I'm sorry but AMD CPUs are accidentally popular once more.
     
    Lightman likes this.
  2. eastmen

    Legend Subscriber

    Joined:
    Mar 17, 2008
    Messages:
    13,878
    Likes Received:
    4,724
    I have $800 on the side for a new card. I'd love to get a new AMD card. Last time I went with Vega 56 for $375 on launch day. If we can double that price and I can get 50% more performance over a 2080 Ti, especially with ray tracing, and it has comparable features, I'm down with it.
     
    Lightman likes this.
  3. Frenetic Pony

    Regular

    Joined:
    Nov 12, 2011
    Messages:
    807
    Likes Received:
    478
    Makes total sense, I'd expect non-matrix FP16 operations to also win. The A100 is clearly designed to maximize matrix multiplication for machine learning, and unless AMD redesigned the GPU entirely I'd expect it to just be a fairly linear advancement over Vega.

    I wonder how much demand there is for such a GPU?
     
    Lightman likes this.
  4. techuse

    Veteran

    Joined:
    Feb 19, 2013
    Messages:
    1,426
    Likes Received:
    909
    A1xLLcqAgt0qc2RyMz0y and Konan65 like this.
  5. CarstenS

    Legend Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,800
    Likes Received:
    3,920
    Location:
    Germany
    In their earlier article, the facts of which they re-confirmed in the most recent one, they had an alleged MI100 slide showing a roughly quarter-rate FP64 at 9.5 TFLOPS and 42 TFLOPS for FP32. But what I don't get is how they would generate 150 TFLOPS of FP16 out of it and how MI100 would not be named MI150 then.

    Then in their last slide shown above, there's a "2.4x better FP32 performance" mentioned (which, sans marketing, probably is rather +140%), yet in the other slides from the same article, it shows a marginal advantage in "delivered SGEMM" capped at 300 Watt.

    Either I need more coffee, or I just cannot make much sense out of these numbers.

    FWIW, with standard GCN-CUs, you'd need somewhat north of 2210 MHz in a 120CU part for 34 TFLOPS of FP32. Giving it a 95% efficiency for SGEMM, it's something around 2330 MHz.
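
    The clock estimate above can be reproduced with a few lines of arithmetic. This is only a sketch: it assumes the usual GCN figure of 64 FP32 lanes per CU and counts one FMA as 2 FLOPs; the function and constant names are made up for illustration.

```python
# Peak FP32 rate for a GCN-style part:
#   FLOPS = CUs * lanes per CU * 2 (one FMA = 2 FLOPs) * clock
LANES_PER_CU = 64    # assumption: standard GCN CU (4 x SIMD16)
FLOPS_PER_FMA = 2


def required_clock_mhz(target_tflops, num_cus, efficiency=1.0):
    """Clock in MHz needed to deliver target_tflops at the given efficiency."""
    flops_per_clock = num_cus * LANES_PER_CU * FLOPS_PER_FMA
    return target_tflops * 1e12 / (flops_per_clock * efficiency) / 1e6


peak_mhz = required_clock_mhz(34, 120)                     # 34 TFLOPS peak, 120 CUs
sgemm_mhz = required_clock_mhz(34, 120, efficiency=0.95)   # same, at 95% SGEMM efficiency
print(f"{peak_mhz:.0f} MHz peak, {sgemm_mhz:.0f} MHz at 95% efficiency")
```

    That lands at roughly 2214 MHz and 2330 MHz, matching the figures in the post.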
     
  6. Bondrewd

    Veteran

    Joined:
    Sep 16, 2017
    Messages:
    1,682
    Likes Received:
    846
    It's 42TF peak for SGEMM FP32, not for the usual ops.
    They just name them bigger numbers now, even pretending MI60 doesn't exist and all.
     
  7. Rootax

    Veteran

    Joined:
    Jan 2, 2006
    Messages:
    2,401
    Likes Received:
    1,845
    Location:
    France
    In an AdoredTV video (a discussion between 3 people), some guy said the FP32 number is "tricky", as they have finally figured out how that number is obtained. But he didn't reveal why.
     
  8. Bondrewd

    Veteran

    Joined:
    Sep 16, 2017
    Messages:
    1,682
    Likes Received:
    846
    By finally figuring out the thing has GEMM engines, yes.
     
  9. CarstenS

    Legend Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,800
    Likes Received:
    3,920
    Location:
    Germany
    Then maybe they just forgot to mention and footnote that little factoid here. Funny, because in some AMD slide decks, there is more space for footnotes than for actual content.
     
    Rootax and nnunn like this.
  10. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    10,244
    Likes Received:
    4,465
    Location:
    Finland
    Not sure why these are even discussed in Ampere thread, but that's not an AMD slide.
     
    pharma likes this.
  11. CarstenS

    Legend Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,800
    Likes Received:
    3,920
    Location:
    Germany
    I realize they are not AMD-branded, but "OEM system availability" seems to indicate it's not a slide from a specific manufacturer. But whose are they?
     
  12. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    10,244
    Likes Received:
    4,465
    Location:
    Finland
    They're quite surely just Adored's own fabrications, I mean, what company would ever write stuff like "but not much else" on a slide?
    upload_2020-7-31_12-54-30.png

    edit:
    To be clear, I'm not saying it's impossible for him to have the real slide deck too, but those are not from AMD or any other company and I wouldn't bet the others are real either.

    Like this one:
    upload_2020-7-31_12-59-5.png

    The AMD portion is copy/pasted from another source and someone forgot to fill in the same background color within letters like "0" and "O" etc
     
    #2392 Kaotik, Jul 31, 2020
    Last edited: Jul 31, 2020
  13. Bondrewd

    Veteran

    Joined:
    Sep 16, 2017
    Messages:
    1,682
    Likes Received:
    846
    He has the slides but can't even read them properly, unfortunately.
     
    A1xLLcqAgt0qc2RyMz0y likes this.
  14. DmitryKo

    Regular

    Joined:
    Feb 26, 2002
    Messages:
    967
    Likes Received:
    1,223
    Location:
    55°38′33″ N, 37°28′37″ E
    US10706609B1 Efficient data path for ray triangle intersection
    https://patents.google.com/patent/US10706609B1/en

    A division unit would be useful for a straightforward implementation of a certain type of ray-triangle intersection test that is useful in ray tracing operations. This certain type of ray-triangle intersection test includes a step that transforms the coordinate system into the viewspace of the ray, thereby reducing the problem of intersection to one of 2D triangle rasterization. However, a straightforward implementation of this transformation requires floating point division, as the transformation utilizes a shear operation to set the coordinate system such that the magnitudes of the ray direction on two of the axes are zero. This shear operation, when applied to the vertices of the triangle, requires multiplication by a ratio of the ray direction magnitude in one axis to the ray direction magnitude in another axis, which requires division. Instead of using the most straightforward implementation of this transform, the technique described herein scales the entire coordinate system by the magnitude of the ray direction in the axis that is the denominator of the shear ratio, which removes division.
    ...
    Conceptually, the ray-triangle test involves projecting the triangle into the viewspace of the ray so that it is possible to perform a simpler test similar to testing for coverage in two dimensional rasterization of a triangle as is commonly performed in graphics processing pipelines. More specifically, projecting the triangle into the viewspace of the ray transforms the coordinate system so that the ray points downwards in the z direction and the x and y components of the ray are 0 ... The vertices of the triangle are transformed into this coordinate system. Such a transform allows the test for intersection to be made by simply asking whether the x, y coordinates of the ray fall within the triangle defined by the x, y coordinates of the vertices of the triangle, which is the rasterization operation described above.
    ...
    The ray-triangle intersection unit 702 does not include a divider. Not including a divider is beneficial because a divider consumes a large amount of computer chip die area and power. Thus, not including a divider improves the amount of die area taken up by the ray intersection unit 139.
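
    To make the quoted idea concrete, here is a minimal software sketch of a shear-based ray-triangle test where the shear is multiplied through by the ray direction's dominant component, so no division is needed. This is not the patent's datapath, just an illustration under assumptions: the function name, the choice of the largest direction component as the "z" axis, and returning the hit distance as a numerator/denominator pair are all my own.

```python
def ray_triangle_division_free(origin, direction, v0, v1, v2):
    """Return (t_num, t_den, (u, v, w)) for a hit in front of the ray, else None.

    The hit distance is t = t_num / t_den; that division is left to the caller.
    u, v, w are unnormalized barycentric weights for v0, v1, v2.
    """
    # Treat the axis with the largest |direction| component as the ray's "z" axis.
    kz = max(range(3), key=lambda i: abs(direction[i]))
    kx, ky = (kz + 1) % 3, (kz + 2) % 3
    dx, dy, dz = direction[kx], direction[ky], direction[kz]
    if dz == 0:
        return None  # zero-length direction

    def to_ray_space(vertex):
        # Translate so the ray origin sits at (0, 0, 0).
        px = vertex[kx] - origin[kx]
        py = vertex[ky] - origin[ky]
        pz = vertex[kz] - origin[kz]
        # The plain shear is px - (dx / dz) * pz. Scaling by dz removes the divide.
        return px * dz - dx * pz, py * dz - dy * pz, pz

    ax, ay, az = to_ray_space(v0)
    bx, by, bz = to_ray_space(v1)
    cx, cy, cz = to_ray_space(v2)

    # 2D edge functions against the ray, which now sits at x = y = 0.
    u = cx * by - cy * bx
    v = ax * cy - ay * cx
    w = bx * ay - by * ax

    # Inside only if the edge functions do not have mixed signs.
    if (u < 0 or v < 0 or w < 0) and (u > 0 or v > 0 or w > 0):
        return None
    det = u + v + w
    if det == 0:
        return None  # triangle is edge-on to the ray or degenerate

    # The dz^2 scale on u, v, w cancels against det, leaving one factor of dz.
    t_num = u * az + v * bz + w * cz
    t_den = det * dz
    if t_num * t_den <= 0:
        return None  # intersection is at or behind the ray origin
    return t_num, t_den, (u, v, w)


tri = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
hit = ray_triangle_division_free((0.25, 0.25, -1.0), (0.0, 0.0, 2.0), *tri)
miss = ray_triangle_division_free((2.0, 2.0, -1.0), (0.0, 0.0, 1.0), *tri)
```

    Because every edge function picks up the same positive factor (the square of the dominant component), the sign tests give the same answer as the sheared version would.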
     
    Krteq likes this.
  15. Rootax

    Veteran

    Joined:
    Jan 2, 2006
    Messages:
    2,401
    Likes Received:
    1,845
    Location:
    France
    With the patents about AMD RT, do we already know if this method can do things that Turing can't, or vice versa?
     
  16. DmitryKo

    Regular

    Joined:
    Feb 26, 2002
    Messages:
    967
    Likes Received:
    1,223
    Location:
    55°38′33″ N, 37°28′37″ E
    I'm lost in model numbers. They say that Big Navi is just barely faster than RTX 2080 Ti, and that it should target RTX 3080 and not RTX 3090. How is that different from previous assumptions of ~2x the number of CUs in 5700 XT, and how do they know the performance of RTX 3000-series?
     
  17. Bondrewd

    Veteran

    Joined:
    Sep 16, 2017
    Messages:
    1,682
    Likes Received:
    846
    They're throwing shit at the wall and seeing what sticks.
    Whichever sticks will win and let the specific e-beggar claim the fame.
     
  18. pjbliverpool

    pjbliverpool B3D Scallywag
    Legend

    Joined:
    May 8, 2005
    Messages:
    9,236
    Likes Received:
    4,259
    Location:
    Guess...
    Yes, I totally agree the best place for a hardware decompression block (or at least the most likely place) if the PC were to get one is on the GPU. I'd edge more towards it being a shader-based implementation though, if it were to exist at all. But if we do get that then that's most of the problem solved as far as I can see. The buses in a normal setup are still plenty fast enough to match or exceed even PS5 throughput if a similar level of compression is being used and the software IO overheads are reduced compared with today through, for example, unbuffered data transfers, GPUDirect Storage-style DMA, or whatever magic DirectStorage brings to the table.

    That said, as long as it could be used like a regular SSD too, then an SSD directly connected to the GPU could have some big advantages, especially if it used 8 lanes. That would both allow it to double the speed of regular setups (assuming any drives supported it) and place it perfectly for a GPU-based decompression solution to decompress all data coming off the disk rather than just that intended for the GPU itself. It'd also bypass any technical limitations in DMA'ing data directly from the SSD to the GPU.

    I'm not sure how it would compare performance-wise but I was more looking at it as a way to offload the CPU. PC GPUs will have plenty of horsepower to spare next gen in relation to consoles but it'll be a long time before CPUs can spare the equivalent of 5x Zen2 cores just to spend on decompression.

    Agreed. I suggest this as a way to sidestep the decompression bottleneck rather than speed up IO. Raw bandwidth isn't really an issue given that 7GB/s drives are incoming, which even without compression are well into next gen console territory (a little ahead of XSX, a little behind PS5).

    I'm not sure loading times would have to be especially long. You wouldn't need to fill up your main RAM with cached data just to start the game, that could stream in in the background. Let's assume you can start the game with "only" 8GB in VRAM. On a 7GB/s drive that's going to be ready in under 2 seconds. Even if you pre-cache another 120GB DRAM in the background that'd take less than 20 seconds of in game streaming and then you'd likely have the entire game cached in RAM!
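
    As a quick sanity check of those load-time figures, here is a back-of-the-envelope sketch. It assumes the drive sustains its full 7GB/s sequential rate with no decompression or filesystem overhead, which is optimistic; the helper name is made up.

```python
DRIVE_GB_PER_S = 7.0   # sequential read rate quoted in the post


def transfer_seconds(size_gb, rate_gb_per_s=DRIVE_GB_PER_S):
    """Best-case time to read size_gb at a sustained rate."""
    return size_gb / rate_gb_per_s


initial_load = transfer_seconds(8)       # fill 8GB of VRAM before the game starts
background_fill = transfer_seconds(120)  # stream another 120GB into DRAM afterwards
print(f"initial: {initial_load:.2f} s, background: {background_fill:.2f} s")
```

    That works out to a bit over 1 second for the initial 8GB and a bit over 17 seconds for the 120GB background fill, so both estimates hold under those assumptions.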
     
  19. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    10,244
    Likes Received:
    4,465
    Location:
    Finland
    Rogame, who has a good track record so far, claims it's 80 CUs for Big Navi

    Edit: komachi ensaka, also with a good record, says the same
     
    chris1515 likes this.
  20. Bondrewd

    Veteran

    Joined:
    Sep 16, 2017
    Messages:
    1,682
    Likes Received:
    846
    It is valid.
    Notice the smaller number of RBEs per SE.
    :^)
     
    Lightman likes this.

  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.