AMD announces new GPGPU card, hints at RV670 specs

Discussion in 'GPGPU Technology & Programming' started by Dave Baumann, Nov 8, 2007.

  1. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    10,641
    Location:
    London
    I thought about MXM for rack-mounting but then decided that "embedded" isn't the same concept as rack-mounting.

    Embedded is something you deliver in a package - it might be a beefy laptop as you argue or it might be "mini-compute" type thing that operates as a desktop/deskside unit. Or something like a medical visualisation unit?

    Jawed
     
  2. Tim Murray

    Tim Murray the Windom Earle of mobile SOCs
    Veteran

    Joined:
    May 25, 2003
    Messages:
    3,278
    Location:
    Mountain View, CA
    Deskside, you might as well use standard PCIe parts, because there aren't really restrictions on the form factor. I think MXM will be for racks, probably not for laptops, because consumer RV670s will work there (I can't imagine that AMD will decide to restrict Brook+/CAL/CTM to FireStream at this point, and double precision for 99.9999999% of laptops seems ridiculous). Then they'll probably just put the FireStream cards in a deskside unit, just like the Tesla deskside thing (probably cheaper and simpler to use external PCIe and standard cards than anything crazy).

    Unless you want to talk about embedded for, say, cars, robots, things like that. That would be intriguing.
     
  3. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    10,641
    Location:
    London
    I gave the example of medical imaging, because that's exactly the kind of device you can buy today as a stand-alone box.

    Jawed
     
  4. Tim Murray

    Tim Murray the Windom Earle of mobile SOCs
    Veteran

    Joined:
    May 25, 2003
    Messages:
    3,278
    Location:
    Mountain View, CA
    Er, I severely doubt that a standalone medical imaging device that uses the 9170 would be ready (from a software standpoint) within the next year or so.
     
  5. MfA

    MfA
    Legend

    Joined:
    Feb 6, 2002
    Messages:
    5,726
    What's the point in ECC? I doubt the difference in soft errors would be even close to a single order of magnitude (that GPU is a huge target, with far more area devoted to actual computation than in the normal systems with ECC). If you can't deal with soft errors anyway, ECC just provides an illusion of reliability.
     
  6. 3vi1

    Newcomer

    Joined:
    Jan 25, 2007
    Messages:
    22
    AMD is not leveraging their product correctly.. Thanks for the link, Jawed..

    In that entire PDF I did not once see "financial calculations" leveraging the DP aspect of their new stream computing line.. Or did I miss something? They need to go after the derivatives market like Nvidia, because that's where the money is..

    I swear, when ATI sold AMD their company for 7 billion, they snatched the balls off that company right along with it..

    Ok.. as far as MXM goes, couldn't this aspect of their presentation be used for biomedical applications like imaging, and for visual systems à la Global Hawk or surveillance?

    I also imagine that stream computing will be a boon for voice recognition systems and other streaming applications coming to PCs. The market is moving more towards mobile applications anyhow; people are abandoning home PCs in favor of mobility - makes sense to me.

    A day late and a dollar short = AMD..

    PS: That PDF had some of the sloppiest image work I have ever seen.. And I am not at all surprised it came from AMD, considering the state of affairs.
     
  7. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    10,641
    Location:
    London
    Chicken and egg. To me this is merely a question of form-factor - MXM is a compact alternative to PCI Express connectivity - even if it's cabled PEG that we've seen demonstrated earlier in the year.

    As for the readiness of the software, well, it's quite possible that a company already has the software running - there's no need to wait for the 9170 to do the R&D.

    Jawed
     
  8. 3dilettante

    Legend

    Joined:
    Sep 15, 2003
    Messages:
    6,748
    Location:
    Well within 3d
    The ECC is for the 2 GiB of RAM.

    IBM's rule of thumb is 1 bit error per month per GiB of memory.

    One such card is going to have 2 errors a month.
    Assuming we try to pack these babies into a compute node, it's 2-4 cards a node.
    At 8 errors a month, that's more than one a week.

    A large system might have hundreds to thousands of nodes.

    At a hundred nodes, the system is going to be hitting a silent data error in video RAM every hour.
    At a thousand, the system is going to hit an error every five minutes, or would if anyone had enough faith in GPGPU to put together a system at that scale.
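
    A quick sketch of that arithmetic, using IBM's rule of thumb (the memory per card, cards per node, and node counts are just the assumptions above):

    ```python
    # Back-of-the-envelope soft error rates, using IBM's rule of thumb of
    # 1 bit error per GiB per month. All other figures are assumptions.

    HOURS_PER_MONTH = 30 * 24  # ~720

    def error_interval_hours(gib_per_card, cards_per_node, nodes):
        """Mean hours between silent bit errors across the whole system."""
        errors_per_month = 1.0 * gib_per_card * cards_per_node * nodes
        return HOURS_PER_MONTH / errors_per_month

    # One node, 4 cards of 2 GiB each: 8 errors/month, i.e. ~2 a week.
    print(error_interval_hours(2, 4, 1))          # ~90 hours
    # A hundred such nodes: roughly one silent error an hour.
    print(error_interval_hours(2, 4, 100))        # ~0.9 hours
    # A thousand nodes: one error every ~5 minutes.
    print(error_interval_hours(2, 4, 1000) * 60)  # ~5.4 minutes
    ```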

    As for ECC on the GPU itself, I've not heard of any such measures for GPUs.
    CPUs and other processors have had such protection since 90nm, to counter the rising error rates inherent to smaller geometries.
    Whether that matters depends on error rates that no GPU designer has disclosed.
     
  9. MfA

    MfA
    Legend

    Joined:
    Feb 6, 2002
    Messages:
    5,726
    You can't determine the value of ECC without making a guess at those error rates.

    PS. I don't think ECC will usually help much for logic errors.
     
  10. 3dilettante

    Legend

    Joined:
    Sep 15, 2003
    Messages:
    6,748
    Location:
    Well within 3d
    ECC is usually used on CPU caches and parity is used on the register files.

    SRAM has a higher error rate than DRAM, and CPUs need such features to keep error rates level as process features shrink.

    CPUs have had a much higher burden placed on them, since they also manage the system.

    Whether GPUs need such measures, given their increasing use of cache and massive register files, is something their designers must evaluate when they push their products into new fields.
     
  11. MfA

    MfA
    Legend

    Joined:
    Feb 6, 2002
    Messages:
    5,726
    Cosmic rays can flip a bit in a latch or on a gate just like they can an SRAM cell ... what makes ECC so effective for DRAM/SRAM in most systems is the simple fact that RAM and caches make up so much of the area. GPUs are special.
     
  12. 3dilettante

    Legend

    Joined:
    Sep 15, 2003
    Messages:
    6,748
    Location:
    Well within 3d
    Error detection in memory is also important because the initial event can persist.

    A cosmic ray hitting a latch in the divider unit won't matter unless there is a divide instruction that just happens to be going through that precise layer of logic at that exact time in the clock cycle.

    A bit flip to a memory cell does not end with the next clock cycle, and any time is a good time for a bit flip to wreak havoc.
    That's why register files have parity, even though the registers themselves are actually rather small in relation to logic.
    Error detection on memory is relatively cheap, compared to logic with built-in error checking.

    Error detection and correction for logic is an active area of research, however.
    Future geometries are expected to reduce reliability to the point that it will no longer be safe to assume any given unit functions correctly to the standards placed on it today.
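
    As a toy illustration of why detection on memory is cheap: one stored parity bit per word flags any single-bit flip, which is roughly what register-file parity buys (full SECDED ECC adds more check bits so single-bit errors can also be corrected). A minimal sketch:

    ```python
    # Toy illustration: even parity over a 64-bit word. One extra stored
    # bit detects (but cannot correct) any single-bit flip.

    def parity(word):
        """Return the even-parity bit of a 64-bit word."""
        word &= (1 << 64) - 1
        p = 0
        while word:
            p ^= word & 1
            word >>= 1
        return p

    stored = 0xDEADBEEFCAFEF00D
    check = parity(stored)

    # A cosmic ray flips bit 17; the recomputed parity no longer matches.
    corrupted = stored ^ (1 << 17)
    assert parity(corrupted) != check  # error detected
    ```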
     
  13. Andrew Lauritzen

    Moderator Veteran

    Joined:
    May 21, 2004
    Messages:
    2,517
    Location:
    British Columbia, Canada
    Sure, it might be nice to have ECC, but keep it in perspective: for the price/performance ratio of GPUs, and even AMD's new card, it's more cost-effective to just buy TWO, perform the computation completely redundantly, and compare the results. This method is also resilient to errors in the chip logic. With a sufficiently abstract computation platform you could even buy one NVIDIA part and one AMD part to get some protection from errors in the hardware/compilers/drivers ;)
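
    A minimal sketch of that kind of high-level redundancy (run_on_gpu is a hypothetical placeholder, not a real Brook+/CAL or CUDA call; here it just sums a list on the CPU so the sketch executes):

    ```python
    # Run the same workload twice (ideally on two different cards) and
    # only accept results that agree bitwise.

    def run_on_gpu(device, workload):
        # Hypothetical stand-in for a kernel launch on `device`.
        return sum(workload)

    def redundant_run(workload, retries=3):
        for _ in range(retries):
            a = run_on_gpu(0, workload)
            b = run_on_gpu(1, workload)
            if a == b:  # bitwise agreement: accept the result
                return a
        raise RuntimeError("results keep diverging; suspect bad hardware")

    print(redundant_run([1.5, 2.25, 3.0]))
    ```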

    I'm not saying that ECC isn't useful, but it's definitely not a show-stopper. It would matter for comparison if - say - NVIDIA supported it and AMD didn't, but right now no GPGPU cards do. Thus the worst you can knock the cards for is doubling their effective price/performance vs. platforms with ECC, and even if you do that they're still worth it by a long shot :)
     
  14. ShaidarHaran

    ShaidarHaran hardware monkey
    Veteran

    Joined:
    Mar 31, 2007
    Messages:
    3,941
    Excellent point. Reminds me of a story...

    Once upon a time I worked for a retail PC outlet. Since it was a big-box store, margins on PCs were extremely slim. Of course, management was always pressuring us to sell "the extended warranty". Since "warranty" is a dirty word in most customers' vocabularies, one of my co-workers would do whatever it took to sell more than just a PC to every customer, even if he couldn't sell them on the warranty. Seeing as how PCs are commodities nowadays, pushing anything more than just the computer out the door got to be quite difficult.
    Anyway, one time a particularly "hard-sell" customer, who only wanted to buy a cheapo $399 PC and "didn't see the value in purchasing a warranty", was sold a second system as a spare in case the first one broke. I could never sell like that; I was too busy reading Ace's & RWT & the like :p
     
  15. 3dilettante

    Legend

    Joined:
    Sep 15, 2003
    Messages:
    6,748
    Location:
    Well within 3d
    Even if ECC RAM cost more than double the price of standard RAM, I don't think that would be true.
    I've seen 1GB DDR2 priced at $40-50 non-ECC non-registered, and ECC registered between $70-80.
    Non-registered ECC 1GB DDR2 RAM exists and I've seen it priced at around $50.

    ECC or some kind of correction on video RAM is likely to catch over 90% of memory errors.
    From a large-system standpoint, a single card with failing memory amongst 4,000 is much easier to spot if ECC errors keep popping up on it.

    Your method is twice the price, twice the power consumption, and half the compute density at a system level, at the same or lesser performance.

    If the system was designed to max out at 4 cards per motherboard, it's doubled the number of PCI-Ex slots needed, and likely every other system component besides the hard disks.

    My computer cluster burns 1 MW and catches 90+% of memory errors.
    Yours burns 2 MW, catches a few percent more errors, takes up twice the floor space, and is slower.

    For large systems, it likely is a show-stopper.
    And I'm sure there are some workloads that would really like the throughput with a modicum of data checking.
     
    #35 3dilettante, Nov 9, 2007
    Last edited by a moderator: Nov 9, 2007
  16. Andrew Lauritzen

    Moderator Veteran

    Joined:
    May 21, 2004
    Messages:
    2,517
    Location:
    British Columbia, Canada
    I think you're missing my point... it's not that GPU+ECC == 2*GPU, as that clearly isn't the case. My point is that even without ECC, GPUs still provide order-of-magnitude better price/performance *and* power/performance than - say - a CPU cluster (for many tasks). Thus the lack of ECC compared to CPU clusters is not critical, as you can afford to introduce high-level redundancy into the system and still be laughing all the way to the bank.

    What are you comparing this to? A mythical GPU with ECC? If that existed, I would agree that it would probably be worth looking at. Since it doesn't, however, the comparison is a bit moot.
     
  17. 3dilettante

    Legend

    Joined:
    Sep 15, 2003
    Messages:
    6,748
    Location:
    Well within 3d
    I misread your statement as a comparison between a GPU solution with ECC and a GPU solution without ECC.
    I did not interpret it as a comparison between a CPU-only system versus a CPU+GPU system.

    I'd agree with the price-performance in single-precision throughput, assuming the workload doesn't have very long runtimes and can tolerate error.

    DP may be a spoiler: a top-bin quad-core Harpertown is expected to reach about 50 GFLOPS DP when it comes out.
    Dual socket puts it at 100 GFLOPS DP.
    At 500 GFLOPS SP, a 1/4 rate puts a single stream processor board at 125 GFLOPS DP.
    That's not quite an order of magnitude.

    In terms of power consumption, Harpertown is rated at 120W per chip.

    A single stream processor board has a TDP of 150 W.
    Since we're doubling hardware, it's 300 W for 125 GFLOPS.

    GPGPU = 125 GFLOPS DP at 300 W
    CPU, dual socket (near future) = 100 GFLOPS DP at 240 W

    Granted, Intel's TDP doesn't match AMD's definition for its CPUs. I don't know how AMD measures it for GPUs.

    In that comparison the GFLOPS/W is actually the same, though I'll give the edge to the GPU, since each board might fall short of 150 W.

    Of course, Harpertown is likely to sell for over $1.5k per processor.
    So pricewise, 2 Harpertowns alone would be $3k.
    AMD is competitively priced at $1,999 per board, and we now need two of them.
    It's $3,998 for 125 GFLOPS.
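
    Putting that arithmetic in one place (the TDP and price figures are the estimates quoted above; the 1/4 DP rate is still an assumption - change dp_ratio to see how the comparison shifts):

    ```python
    # All figures are the estimates from this thread, not vendor specs.
    dp_ratio = 0.25                # assumed DP:SP rate; try 0.5 as well
    sp_gflops = 500.0

    cards = 2                      # doubled for redundancy
    gpu_dp = sp_gflops * dp_ratio  # 125 GFLOPS DP (same work run twice)
    gpu_watts = cards * 150.0      # 300 W
    gpu_price = cards * 1999.0     # $3,998

    cpu_dp = 2 * 50.0              # dual-socket Harpertown, ~100 GFLOPS DP
    cpu_watts = 2 * 120.0          # 240 W
    cpu_price = 2 * 1500.0         # ~$3k for the processors alone

    print(f"GPU: {gpu_dp/gpu_watts:.3f} GFLOPS/W, ${gpu_price/gpu_dp:.0f}/GFLOPS")
    print(f"CPU: {cpu_dp/cpu_watts:.3f} GFLOPS/W, ${cpu_price/cpu_dp:.0f}/GFLOPS")
    # GPU: 0.417 GFLOPS/W, $32/GFLOPS
    # CPU: 0.417 GFLOPS/W, $30/GFLOPS
    ```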

    Considering that a powerful processor and an accompanying platform must still be bought to host and direct the cards, a significant portion of the CPU-only machine's cost carries over to the GPU system as well.

    I figure these considerations are not as important for very small projects, say one or two machines running tasks that don't take too long to run.

    It rules out doubling GPGPUs, at least for big systems with long run times that need DP.

    Single-precision should be better.

    edit:

    On further reflection:

    Memory bandwidth would be an edge for the GPU, if it's around 80 GB/s as on the consumer cards, versus Intel's roughly 20 GB/s.

    Still not quite an order of magnitude, but still workable for small systems.
     
    #37 3dilettante, Nov 9, 2007
    Last edited by a moderator: Nov 9, 2007
  18. Dave Baumann

    Dave Baumann Gamerscore Wh...
    Moderator Legend

    Joined:
    Jan 29, 2002
    Messages:
    14,000
    Location:
    O Canada!
    You're making an assumption that the performance of DP ops is 25% of single precision.
     
  19. 3dilettante

    Legend

    Joined:
    Sep 15, 2003
    Messages:
    6,748
    Location:
    Well within 3d
    That's true - 1/4 was only mentioned somewhere in this thread, not confirmed by AMD.

    The press release found it necessary to make a footnote that the product's GFLOPS rating was for SP throughput.

    If DP were 1/2, my math would be off by a factor of two, and two FireStream cards at $3,998 running the exact same work unit for the sake of redundancy would provide 250 GFLOPS DP.

    Not an order of magnitude improvement over the upcoming Harpertown, but better.

    I'm assuming that DP throughput is not the same as SP, otherwise AMD would not have added that sneaky little footnote. I'm also assuming that it's not greater than SP, since then AMD would have a higher GFLOPS rating in their PDF.

    The cost of the surrounding system capable of running two such cards in tandem - or of just having twice the nodes for the sake of redundancy - might still eat away at the cost advantage, however.
     
  20. Tim Murray

    Tim Murray the Windom Earle of mobile SOCs
    Veteran

    Joined:
    May 25, 2003
    Messages:
    3,278
    Location:
    Mountain View, CA
    If they're not, I'll eat two hats, and I'll make videos to sell on the Internet.
     
