Nvidia Pascal Announcement

Discussion in 'Architecture and Products' started by huebie, Apr 5, 2016.

  1. Berek

    Regular

    Joined:
    Oct 17, 2004
    Messages:
    271
    Likes Received:
    4
    Location:
    Houston, TX
    Did anyone catch which DisplayPort version they are using? I assume at least 1.3, but 1.4 just came out and could be integrated as well, though it's perhaps a bit early yet for 1.4.
     
  2. Razor1

    Veteran

    Joined:
    Jul 24, 2004
    Messages:
    4,232
    Likes Received:
    749
    Location:
    NY, NY
    Nah they didn't mention any of that.
     
  3. Ext3h

    Regular Newcomer

    Joined:
    Sep 4, 2015
    Messages:
    337
    Likes Received:
    294
    I don't think GP100, or rather the boards, even HAVE a DisplayPort, or any display output at all.
     
  4. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,798
    Likes Received:
    2,056
    Location:
    Germany
    I'm having trouble following your train of thought as well.

    To make things clear:
    - I’m talking about the reason why the GeForce GTX Titan is not allowed to boost in DP mode and has a lower base clock there as well
    - I’m arguing that this is more a precautionary measure than an actual problem with power consumption in DP mode
    - My reasoning is that in DP mode the chip has to feed only 1/3rd the number of ALUs compared to SP mode, over twice-as-wide data paths
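    To put rough numbers on the clock difference, here is a back-of-envelope sketch. The unit counts and clocks (2688 SP ALUs, 896 DP ALUs, 837 MHz SP base, 725 MHz DP-mode clock) are the commonly cited GTX Titan figures, used here as assumptions rather than confirmed values for every board revision:

    ```python
    # Rough peak-throughput sketch for a GK110-class Titan in SP vs. DP mode.
    # All figures are commonly cited approximations, not vendor-confirmed.

    def peak_gflops(alus, clock_mhz, flops_per_alu_per_clock=2):
        """FMA counts as 2 FLOPs per ALU per clock."""
        return alus * clock_mhz * flops_per_alu_per_clock / 1000.0

    sp_alus, dp_alus = 2688, 896    # 3:1 SP:DP unit ratio on GK110
    sp_clock, dp_clock = 837, 725   # MHz; DP mode disables boost, lowers base

    sp_peak = peak_gflops(sp_alus, sp_clock)   # ~4500 GFLOPS FP32
    dp_peak = peak_gflops(dp_alus, dp_clock)   # ~1300 GFLOPS FP64

    print(f"SP peak: {sp_peak:.0f} GFLOPS, DP peak: {dp_peak:.0f} GFLOPS")
    ```

    Even with 1/3rd the units and a lower clock, DP mode still delivers just under a third of SP throughput, which is why the clock cap reads more like margin than necessity.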
     
  5. Ext3h

    Regular Newcomer

    Joined:
    Sep 4, 2015
    Messages:
    337
    Likes Received:
    294
    Might as well be just the delicate timing when feeding the DP ALUs serially, which, as long as the hardware doesn't have 64-bit-wide data paths, takes twice as long as usual.
    Or just a critical-length path in the DP ALUs that doesn't respond well to even slightly exceeding boost speeds. If the DP-mode clock were actually just a divided-down SP clock, that would explain it.
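    As an illustration of the serial-feeding idea: a 64-bit operand on a 32-bit datapath has to move as two halves, which is where the factor of two comes from. A minimal Python sketch of that split (purely illustrative, not how the hardware actually schedules transfers):

    ```python
    import struct

    # A 64-bit operand crossing a 32-bit datapath needs two transfers:
    # here we just split an IEEE-754 double into its two 32-bit words.

    def split_double(x):
        """Return (low_word, high_word) of a double as 32-bit integers."""
        raw = struct.unpack("<Q", struct.pack("<d", x))[0]
        return raw & 0xFFFFFFFF, raw >> 32

    def join_double(lo, hi):
        """Reassemble the double from its two 32-bit words."""
        raw = (hi << 32) | lo
        return struct.unpack("<d", struct.pack("<Q", raw))[0]

    lo, hi = split_double(3.141592653589793)
    assert join_double(lo, hi) == 3.141592653589793
    ```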
     
  6. dnavas

    Regular

    Joined:
    Apr 12, 2004
    Messages:
    375
    Likes Received:
    7
    Yeah, the reason I asked is that it seemed ... interesting that the number of fp32 units per SM went down, while the number of (functionally) fp16 units per SM remained the same -- the picture would be essentially the same had the SM diagram shown twice the number of fp16 units and called out issue rates as full, half, and quarter, except that that may not be true if what we have is VLIW2 or some other register-layout-driven sharing arrangement. Is the issue width the same for fp16 and fp32? Does one issue a full-width VLIW2 warp, a double-sized warp with a single op across both "halves" of each fp32 register, or two full-width warps each using half-sized registers?

    And on a different tack, what does it mean that we have a dedicated fp64 unit, but no dedicated fp16 units? Should I look at the delayed Volta and conclude that Pascal is a dry run of a unified-ALU architecture, aimed at the easier fp16/fp32 "split"? [What else is Volta going to bring, if not that?]
     
  7. Berek

    Regular

    Joined:
    Oct 17, 2004
    Messages:
    271
    Likes Received:
    4
    Location:
    Houston, TX
    That's a good point, I think you're right. I'm getting ahead of myself here on the products for consumers I suppose. I was excited to see something about that at GTC, but I guess we'll be waiting for Computex. Has to be soon though with the schedule they've been talking about.
     
  8. Benetanegia

    Newcomer

    Joined:
    Sep 4, 2015
    Messages:
    222
    Likes Received:
    136
    But do you see how your posts conflict? For example, you insist on this line:

    But it has been made abundantly clear to you, and it's a point that has been made multiple times here on B3D by multiple posters, that Kepler can't actually use all of its ALUs most of the time. And when it can, it's because it can do so without increasing the data being moved around. Hence Kepler's ALU utilization is limited entirely by how much data can be moved around. Since 1/3rd of the ALUs are idling most of the time, for all intents and purposes the chip in DP mode is feeding half the number of ALUs compared to SP mode, over twice-as-wide data paths. So the amount of work done and data moved is at the very least the same in both SP and DP. Now, correct me if I'm wrong, but DP ALUs consume a little more than 2x as much as SP ALUs; factor in a bit of extra memory access due to decreased locality, and I don't see how that wouldn't result in slightly higher power consumption...
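    The shape of this argument can be sketched numerically. Every figure below (per-op energies, data-movement power, clock, ALU counts) is invented purely for illustration; the point is only that if data movement dominates and is equal in both modes, the two modes land at nearly the same power, with DP slightly higher:

    ```python
    # Toy power model: watts = ops/sec * energy-per-op + data-movement power.
    # All numbers are made up for illustration; none are measured values.

    CLOCK_HZ = 837e6  # assume the same clock in both modes for simplicity

    def mode_power(n_alus, energy_per_op_j, data_power_w):
        ops_per_sec = n_alus * CLOCK_HZ
        return ops_per_sec * energy_per_op_j + data_power_w

    # SP mode: ~2/3 of 2688 ALUs busy; DP mode: 896 ALUs at ~2.2x per-op
    # energy; identical (assumed dominant) data-movement power of 150 W.
    sp_w = mode_power(1792, 10e-12, 150)
    dp_w = mode_power(896, 22e-12, 150)

    print(f"SP mode ~{sp_w:.0f} W, DP mode ~{dp_w:.0f} W")
    ```

    Under these assumptions the ALU term is a small slice of the total, so doubling per-op energy while halving the active-ALU count moves the needle only slightly.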
     
  9. fellix

    fellix Hey, You!
    Veteran

    Joined:
    Dec 4, 2004
    Messages:
    3,489
    Likes Received:
    400
    Location:
    Varna, Bulgaria
    The way Pascal handles FP16 data is probably no different from what Nvidia implemented in Tegra X1 -- the compiler packs together two independent ops of the same type (add, mul), statically scheduled for execution as a single op on the FP32 ALUs.
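    A rough Python model of that packing scheme: two independent FP16 values share one 32-bit register, and a single packed op processes both lanes at once. The function names here are made up for illustration; the real mechanism is the compiler emitting packed vec2 instructions:

    ```python
    import struct

    # Model of "vec2" FP16 packing: two halves share one 32-bit word,
    # and one packed op covers both lanes. Names are illustrative only.

    def pack_half2(a, b):
        """Pack two FP16 values into one 32-bit word (a in the low half)."""
        return int.from_bytes(struct.pack("<ee", a, b), "little")

    def unpack_half2(word):
        """Return the two FP16 lanes of a packed 32-bit word."""
        return struct.unpack("<ee", word.to_bytes(4, "little"))

    def hadd2(x, y):
        """Packed add: one 'instruction' adds both FP16 lanes."""
        a0, a1 = unpack_half2(x)
        b0, b1 = unpack_half2(y)
        return pack_half2(a0 + b0, a1 + b1)

    r = hadd2(pack_half2(1.5, 2.0), pack_half2(0.5, 3.0))
    print(unpack_half2(r))  # (2.0, 5.0)
    ```

    This is why the peak FP16 rate can be 2x the FP32 rate without any dedicated FP16 hardware: the packing doubles work per instruction without moving any more register data.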
     
    dnavas and Jawed like this.
  10. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    10,430
    Likes Received:
    433
    Location:
    New York
    Good point.
     
  11. fellix

    fellix Hey, You!
    Veteran

    Joined:
    Dec 4, 2004
    Messages:
    3,489
    Likes Received:
    400
    Location:
    Varna, Bulgaria
  12. Razor1

    Veteran

    Joined:
    Jul 24, 2004
    Messages:
    4,232
    Likes Received:
    749
    Location:
    NY, NY
    second link not workin ;)
     
  13. OlegSH

    Regular Newcomer

    Joined:
    Jan 10, 2010
    Messages:
    360
    Likes Received:
    252
    Razor1 likes this.
  14. Razor1

    Veteran

    Joined:
    Jul 24, 2004
    Messages:
    4,232
    Likes Received:
    749
    Location:
    NY, NY
  15. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,798
    Likes Received:
    2,056
    Location:
    Germany
    No, and that line alone is without the context I already gave multiple times, which you first acknowledge in the part below only to immediately discard it.

    What's true is that Kepler is not able to feed more than 2/3rds of its SP FP ALUs in the worst case. You, otoh, make this the default behaviour („for all intents and purposes ...“), which it definitely is not.
    True, the DP FP ALUs should be (a bit?) more power hungry than the SP ones. I'll throw another thing into the mix then: rasterizers, geometry, tessellators, ROPs. All mostly unused in what's a typical application with DP FP, I'd wager. Plus I was quite positive that Ld/St and SFUs cannot be tasked in parallel with the DP FP ALUs. Edit: This is probably not true, since the Kepler whitepaper says: "Unlike Fermi, which did not permit double precision instructions to be paired with other instructions, Kepler GK110/210 allows double precision instructions to be paired with other instructions" (just not necessarily all other instructions). I don't know off the top of my head whether or not the INT functions might require larger-than-minimally-viable adders and multipliers, as well as data paths from the SP FP ALUs, but I would tend to believe so.

    And then there's more: SP mode not only employs all that extra circuitry quite regularly, it does so at a higher frequency, requiring a higher voltage, which in turn pushes power consumption up even more.

    I frankly admit that I do not have exact power numbers for all this, but the odds are increasingly thin that 1/3rd of the units (albeit utilized more heavily than their number alone suggests), making much less frequent use of other resources in the chip while causing (slightly?) higher data movement, would be power limited below the normal operation mode's base frequency.

    Having said this, I'm quite open to other aspects of this discussion, things we have not yet touched. But just repeating the same thing over and over again seems quite pointless.
     
    #176 CarstenS, Apr 6, 2016
    Last edited: Apr 6, 2016
  16. Benetanegia

    Newcomer

    Joined:
    Sep 4, 2015
    Messages:
    222
    Likes Received:
    136
    Still, "file not found" on the architecture one. :-(
    I make it the default behavior because, by your own words, for all intents and purposes, and in regards to power consumption, it is the default behavior. Unless what you said here is false, that is:

    So in the worst case, only 2/3rds of peak throughput. In the best case, full throughput, but with the exact same data movement as in the worst case. Since the grand majority of power consumption comes from moving data around, both cases are similar in regards to power consumption.

    How are rasterizers, geometry, tessellators, ROPs, etc. at all relevant? We are talking about FP32 compute vs. FP64 compute here, afaik. What are those used for in FP32 computing that couldn't be used when computing FP64??

    One of the biggest problems is that you dismiss Nvidia's explanation while not being able to provide any valid alternative. The closest you came to an explanation was:

    DP mode probably means something scientific 99% of the time. But the opposite is not true. SP mode does not prevent scientific code from being run, and does not in any way ensure a less "constant and first and foremost much longer load on the ALUs instead of the highly variable load of a game or normal application." Whatever "normal application" means to you here. If the applications being run are the "problem", there's hardly any differentiation between FP32 and FP64, unless for whatever reason FP64 is less efficient.

    I'd also like to know, even if you were right (or Ext3h, or any other alternative), why you think that claiming TDP was the reason is in any way better than stating the actual reason. Especially given what a PR nightmare Fermi was, precisely because of TDP. It makes no sense to me. If, for example, the reason was what Ext3h said, why wouldn't they simply say so? Even mentioning TDP spells trouble with general consumers. A more technical reason, on the other hand, something the general public doesn't even know about, would be a much better explanation. Or even excuse.
     
  17. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,420
    Likes Received:
    179
    Location:
    Chania
    You may excuse the interruption, but Mr. Triolet @ hardware.fr also seems certain that there will be something along the lines of a "GP102":

    https://translate.googleusercontent...0.html&usg=ALkJrhiJmqpGFprnBDes2q7JStRXfIRhUg

     
  18. ninelven

    Veteran

    Joined:
    Dec 27, 2002
    Messages:
    1,702
    Likes Received:
    117
    Not sure I buy this.... I mean why bother with texturing at all if it is strictly an HPC part.
     
  19. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    10,873
    Likes Received:
    767
    Location:
    London
    I think the non-HBM2 cards are HBM1 prototype units. I suspect that NVidia decided to "pack" the chips like that so that they could apply active cooling according to the spec for HBM2.

    HBM2 modules are taller than HBM1, and the packaging for the GPU die looks designed to take account of that, so that the GP100 and HBM2 dies have top surfaces that match up. So I'm guessing that the HBM1 prototypes were packaged up in this peculiar way to make a mechanically sounder mating with the heatsink.

    The wrinkle in this theory is the idea that HBM2 is taller. It may not be if only 4-high HBM2 is used, instead of 8-high.

    What's potentially of greater concern is that there are so few GP100s available that the majority of them shown were non-HBM2 variants. Well, to be honest, if NVidia's talking about this being in full production near the end of the year, this is arguably just a reflection of productisation.
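    For scale, a quick sketch of what the stack-height choice means, using nominal spec figures (1024-bit interface per stack, 8 Gbit HBM2 dies, ~2 Gbps/pin nominal HBM2 rate). These are spec-sheet assumptions; shipping parts may clock lower:

    ```python
    # Nominal per-stack HBM arithmetic; real products may run below spec.

    def stack_bandwidth_gbs(pin_rate_gbps, bus_width_bits=1024):
        """Per-stack bandwidth in GB/s: pins * per-pin rate / 8."""
        return bus_width_bits * pin_rate_gbps / 8

    def stack_capacity_gb(dies_high, die_gbit=8):
        """Per-stack capacity in GB from stack height and die density."""
        return dies_high * die_gbit / 8

    print(f"4-high HBM2 stack: {stack_capacity_gb(4):.0f} GB, "
          f"8-high: {stack_capacity_gb(8):.0f} GB")
    print(f"HBM2 stack @ 2 Gbps/pin: {stack_bandwidth_gbs(2.0):.0f} GB/s; "
          f"4 stacks: {4 * stack_bandwidth_gbs(2.0):.0f} GB/s")
    ```

    So a 4-high-only configuration caps a 4-stack board at 16 GB, which would also remove the height delta over HBM1 that the packaging theory leans on.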
     