Nvidia Pascal Speculation Thread

Discussion in 'Architecture and Products' started by DSC, Mar 25, 2014.

Thread Status:
Not open for further replies.
  1. silent_guy

    Veteran Subscriber

    Joined:
    Mar 7, 2006
    Messages:
    3,754
    Likes Received:
    1,382
    DevKits are created to develop applications on top of a chip. Simulations are used to verify the core features of a chip. They have nothing to do with each other and they target completely different audiences. If you can abstract the new core features behind an API and package them as a (large) performance improvement of the same thing, then a devkit with a previous chip is totally fine. You unblock the software developers, who will need years before they have something production ready. You unblock the mechanical developers, who are today developing the car that will be on the road 5 years from now.

    It's not that a company like Nvidia couldn't afford to make another chip, it's that doing so would provide no benefit to them. Not the benefits that you imagine there to be. It would tie up core development resources that could be spent more effectively on something else. Icera was needed to break into the mobile phone market: a strategic investment for future growth that was later abandoned. Shit happens. Shield is a similar strategic investment. It remains to be seen whether it will pay off at some point, but it's a product that's supposed to sell in volume.

    A 28nm Pascal would be a tactical move that's part of a strategic push into automobile computing, but with a high cost and low value and no volume.
     
  2. Benetanegia

    Regular

    Joined:
    Sep 4, 2015
    Messages:
    394
    Likes Received:
    425
    Which is why you would want them to start as soon as possible.

    I don't think nV are the only ones offering software. And it's still a full year away from being released anyway, whereas with the first Drive PX there was only a quarter between announcement and actual release. A lot of things can happen in a year, especially when you don't know where your competitors are. A lack of announcements doesn't mean nV doesn't have competition out there waiting to be unleashed.

    Speaking of the compute market, I don't think nV hesitated one bit spending $30M and then some in order to ensure they'd snatch that market. In fact, I remember reading figures of over $1B for the development of Fermi, which iirc was said to be nearly twice as much as previous generations. And that on top of designing the architecture in a way that risked jeopardizing their competitiveness in the gaming market. It was a huge risk, a huge bet that nV didn't bat an eye calling. So I fail to see a $30M investment as an excess, considering the potential returns. It's obvious we have a different view on this, and I don't think we are going to agree, so it's probably better to leave it at that.
     
  3. Benetanegia

    Regular

    Joined:
    Sep 4, 2015
    Messages:
    394
    Likes Received:
    425
    And what if you can't? Can you honestly say that is always possible?

    I understand that it's not necessary, but I'm still having a hard time believing that it wouldn't help at all. And as I said, it's not only about offering that minor help; it shows commitment. I return to the Phoenix platform for that purpose. It was clear Nvidia wouldn't take the phone market by storm that late in the game no matter what. The phone market is huge, but nV's possibilities with or without the Phoenix platform weren't very different. Still, the platform helped bring T4i to the market, and it was an endeavor nV deemed worth taking. Even a 3-6 month difference in the automotive market right now could mark the difference between being just one small player holding 10% of it or being the dominant player. The "pot" in this case is much, much bigger, and hence so are the bets you should be calling.

    Anyway, and remember we're speaking hypothetically, what do you think of the possibility of actually releasing a 28nm Pascal chip early this year? There was about a 1 month window for the 750 Ti between its existence becoming known and its launch.
     
  4. Benetanegia

    Regular

    Joined:
    Sep 4, 2015
    Messages:
    394
    Likes Received:
    425
    I don't see how this is so different, except that it's the other way around and that the potential reward is much higher this time around, because nV's opportunities in the phone market were already almost nil by then. A never-sold platform to help sell silicon, versus never-sold silicon to help sell a platform. Silicon is more expensive, but the potential reward is also much higher. And as you say, maybe the silicon could be sold anyway.
     
  5. silent_guy

    Veteran Subscriber

    Joined:
    Mar 7, 2006
    Messages:
    3,754
    Likes Received:
    1,382
    You're welcome to discuss general hypotheticals somewhere else, but for this particular case, there is a very high likelihood that you can.

    Nobody is expecting Nvidia to release some ground breaking new deep learning invention. Just some architectural improvements that will improve speed on some workloads.
     
  6. Benetanegia

    Regular

    Joined:
    Sep 4, 2015
    Messages:
    394
    Likes Received:
    425
    No one is talking about revolutionary, but 3 ops out of something that's only supposed to be able to execute 2xFP16 or 2xINT8? How do you figure that is achieved? And could it be satisfactorily emulated in a way that makes performance on the final product predictable for all or most use cases?

    For instance, and it's unrelated, but can you really emulate FMAD instructions on silicon that is only capable of MUL and ADD in a way that would be completely useful to the programmer? It's my understanding that the emulation would consume significantly more bandwidth, aside from requiring twice as many cycles to execute. And I don't think it would behave in a totally predictable way in all or most cases.
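
    Aside from the bandwidth and cycle cost, there is also a numerical wrinkle: a fused multiply-add rounds once, while an emulated mul-then-add rounds twice, so results can diverge in the last bit. A minimal sketch with NumPy float32 (the float64 product here is just a stand-in for the fused path's unrounded intermediate; the operand choice is contrived to force a visible difference):

    ```python
    import numpy as np

    # Operands chosen so a*b lands exactly halfway between two float32
    # values: the separate-rounding path loses a low bit that a fused
    # path would keep.
    a = np.float32(1.0 + 2.0**-12)
    b = np.float32(1.0 + 2.0**-12)
    c = np.float32(-1.0)

    # Emulated path: round the product to float32, then round the sum.
    split = np.float32(np.float32(a * b) + c)

    # Fused path approximated by forming the product exactly in float64
    # and rounding only once at the end (exact here, since a*b fits).
    fused = np.float32(np.float64(a) * np.float64(b) + np.float64(c))

    print(split, fused, split == fused)
    ```

    The two paths disagree by one unit in the last place, which is exactly the kind of not-quite-bit-compatible behavior at issue.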
     
  7. silent_guy

    Veteran Subscriber

    Joined:
    Mar 7, 2006
    Messages:
    3,754
    Likes Received:
    1,382
    It. Doesn't. Matter. It's still just a performance optimization.

    And if it's emulated, it doesn't even have to be bit compatible, because, for deep learning, it doesn't matter. That's why you can use FP16 in the first place.
     
  8. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
    If you drop back to a design that has been shoehorned into 28nm rather than 14/16nm, we're likely already talking about a factor of 2.x performance regression. And if particular low-level differences crop up between different implementations of an architecture at the same node (much less between wholly different ones), there's no guarantee that the whole range of outcomes is going to be applicable.

    As far as development ahead of the hardware, this has worked for the consoles for a number of generations despite some rather glaring architectural differences.
    If this is merely a development seeding project, there's a raft of non-process related workarounds that can give the necessary compensation for missing hardware.
    Can't get a native INT8 instruction and lose by a factor of 4? This isn't a product, so just brute-force it with extra hardware in a tower case. The code doesn't care.

    Turing-complete is what it is, so generally yes. And if the use is for validation, what more does the programmer need?
    There are types of validation that need very low-level details, but this is running into an area where a separate physical implementation at 28nm runs the risk of not carrying forward to the target node.
     
  9. Benetanegia

    Regular

    Joined:
    Sep 4, 2015
    Messages:
    394
    Likes Received:
    425
  10. CarstenS

    Legend Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,800
    Likes Received:
    3,920
    Location:
    Germany
    Thanks for doing the work! Good news for G5X then, as it might actually get used. :) It's then only the board-level design that needs to be redone.
     
  11. Arun

    Arun Unknown.
    Legend

    Joined:
    Aug 28, 2002
    Messages:
    5,023
    Likes Received:
    302
    Location:
    UK
    Thanks silent_guy, good to know! :)

    So it's QDR... Does anyone know the burst length required to get optimal efficiency for GDDR5 vs 5X then - is it 64B vs 128B? or 256B?!
     
  12. silent_guy

    Veteran Subscriber

    Joined:
    Mar 7, 2006
    Messages:
    3,754
    Likes Received:
    1,382
    In QDR, the prefetch size is 512 bits and the burst length is 16. GDDR5X allows for seamless switching between QDR and DDR, but I don't know if that's useful in practice.
    An interesting new feature is pseudo channels, where you can select 2 columns of 256 bits each from the same bank/row combo instead of a monolithic 512 bits. In DDR that becomes 2x 128 bits.
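
    To put numbers on Arun's burst-size question, assuming a x32 device interface (GDDR5X also supports a x16 mode, so this is just the common case), the per-access granularity works out as:

    ```python
    # Back-of-envelope GDDR5X access granularity, based on the figures
    # above: 512-bit prefetch, burst length 16 in QDR.
    io_width_bits = 32                     # assuming a x32 chip interface

    qdr_burst_len = 16
    qdr_access_bits = io_width_bits * qdr_burst_len    # 512 bits
    qdr_access_bytes = qdr_access_bits // 8            # 64 bytes

    # DDR mode halves the prefetch: burst length 8, as in plain GDDR5.
    ddr_burst_len = 8
    ddr_access_bytes = io_width_bits * ddr_burst_len // 8   # 32 bytes

    # Pseudo-channel mode splits the 512-bit access into two 256-bit
    # halves (2x 128 bits in DDR, per the post above).
    pseudo_channel_bytes = qdr_access_bits // 2 // 8        # 32 bytes

    print(qdr_access_bytes, ddr_access_bytes, pseudo_channel_bytes)
    ```

    So per chip the answer to "64B vs 128B?" is 64B in QDR, same 32B as GDDR5 in DDR mode, and the controller-visible granularity then scales with however many chips share a channel.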
     
    pharma, Razor1 and homerdog like this.
  13. nnunn

    Newcomer

    Joined:
    Nov 27, 2014
    Messages:
    40
    Likes Received:
    31
    Just wondering, wasn't the "Maxwell-we-got" really the "Paxwell-we-had-to-have" because 20nm got scrapped? Hence juicy stuff getting stripped out, leaving a SP gamer tweak instead of HPC innovation?

    Meanwhile there must have been some full-fat contracts in place for automotive/robotics... and robust discussions. Maybe some branch project? How much might the first self-driving robo-cop be worth?
     
  14. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
    I have seen some discussion of Maxwell being a result of doing an additional architectural shift at 28nm with planned 20nm ideas feeding into it.
    That situation is still gated by what Nvidia decided was worth manufacturing, which in this case also is what Nvidia thought was worth selling. This seems to follow given the cost of deciding the former.

    The scenario I asked that question about has the "good enough to build" question already answered, which would then raise the question why the usually strong link between building and selling wouldn't apply.

    That doesn't mean there haven't been projects that manufacture chips which are never productized. Sometimes it's a research project with limited presence outside the project itself. There are examples where it happens for other reasons, but usually because problems or outside factors force the abandonment of making it a product, rather than because anyone went through that much of the process with no intent to sell it.
     
  15. RecessionCone

    Regular Subscriber

    Joined:
    Feb 27, 2010
    Messages:
    505
    Likes Received:
    189
    Occam's Razor says you're probably just misinterpreting PowerPoint slides. Less likely that magic is happening.
     
  16. Benetanegia

    Regular

    Joined:
    Sep 4, 2015
    Messages:
    394
    Likes Received:
    425
    I didn't misinterpret anything myself.

    http://www.anandtech.com/show/9903/nvidia-announces-drive-px-2-pascal-power-for-selfdriving-cars

    "Curiously, NVIDIA also used the event to introduce a new unit of measurement – the Deep Learning Tera-Op, or DL TOPS – which at 24 is an unusual 3x higher than PX 2’s FP32 performance. Based on everything disclosed by NVIDIA about Pascal so far, we don’t have any reason to believe FP16 performance is more than 2x Pascal’s FP32 performance. So where the extra performance comes from is a mystery at the moment. NVIDIA quoted this and not FP16 FLOPS, so it may include a special case operation (ala the Fused Multiply-Add), or even including the performance of the Denver CPU cores."
     
  17. Benetanegia

    Regular

    Joined:
    Sep 4, 2015
    Messages:
    394
    Likes Received:
    425
    Following the Source link from the Anandtech article, Nvidia says this:

    If you have a theory* about what "specialized instruction" means, I'd really like to hear it.

    * If you actually know what it means, I'd appreciate it even more.
     
  18. dnavas

    Regular

    Joined:
    Apr 12, 2004
    Messages:
    375
    Likes Received:
    7
    I thought that 24 DLOPS was just AVG(16 16-bit ops, 32 8-bit ops) or some nonsense like that?
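
    For what it's worth, the arithmetic does line up with that guess. The Anandtech quote says 24 DL TOPS is 3x PX 2's FP32 rate, implying 8 TFLOPS FP32; a 2x FP16 rate is disclosed, and a 4x INT8 rate is an assumption made purely so the averaging guess can be checked:

    ```python
    # 24 DL TOPS quoted as 3x FP32 throughput -> 8 TFLOPS FP32.
    fp32_tflops = 24 / 3

    # Hypothetical ratios: 2x FP16 is disclosed; 4x INT8 is assumed.
    fp16_tops = 2 * fp32_tflops    # 16.0
    int8_tops = 4 * fp32_tflops    # 32.0

    avg_tops = (fp16_tops + int8_tops) / 2
    print(avg_tops)    # 24.0 -- matches the quoted DL TOPS figure
    ```

    So AVG(16, 32) does reproduce the 24 figure, though that only shows the numbers are consistent with the guess, not that it's what Nvidia actually did.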
     
  19. Pixel

    Veteran

    Joined:
    Sep 16, 2013
    Messages:
    1,008
    Likes Received:
    477
    Will gamers, even at 4K, really see any noticeable improvement in the performance or visuals of games using HBM2 over the upcoming GDDR5X?
    Could Nvidia perhaps implement forms of AA themselves if developers don't take advantage of this gargantuan amount of memory bandwidth?

    We already are in a console era where big publisher games are designed around the limited memory bandwidth of consoles.

    Assuming there is a wide memory interface in GP104 and GDDR5X is used as the shipping manifest suggested, plus color compression, shouldn't that be more than enough?
    http://www.tweaktown.com/news/49578...l-gp104-gpu-spotted-feature-gddr5x/index.html

    http://www.techpowerup.com/forums/threads/article-just-how-important-is-gpu-memory-bandwidth.209053/
     
    #679 Pixel, Jan 16, 2016
    Last edited: Jan 16, 2016
  20. silent_guy

    Veteran Subscriber

    Joined:
    Mar 7, 2006
    Messages:
    3,754
    Likes Received:
    1,382
    Voxilla already posted a link to an EETimes article that says it can do 8 bit operations. Isn't that specialized enough?
    Google "deep neural net 8 bit" and you'll find plenty of articles claiming that 8 bit is enough for deep learning. What more do you want to hear?
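
    To illustrate the "8 bit is enough" claim: symmetric linear quantization bounds each weight's error by half a quantization step, which deep nets tolerate well. A toy sketch, not any particular framework's scheme:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    weights = rng.standard_normal(1000).astype(np.float32)

    # Symmetric linear quantization to int8: one scale per tensor,
    # mapping the largest magnitude to +/-127.
    scale = float(np.abs(weights).max()) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)

    # Dequantize and measure the worst-case absolute error.
    deq = q.astype(np.float32) * scale
    max_err = float(np.abs(weights - deq).max())
    print(max_err, scale / 2)    # error stays within half a step
    ```

    The per-weight error is tiny relative to the weight distribution, and since inference accuracy degrades gracefully with weight noise, INT8 ops at 4x the FP32 rate are a very attractive trade.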
     
  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.