Leaked Intel Nehalem performance projections over AMD Shanghai

Discussion in 'PC Industry' started by Bludd, Feb 25, 2008.

  1. Farhan

    Newcomer

    Joined:
    May 19, 2005
    Messages:
    152
    Likes Received:
    13
    Location:
    in the shade
    I wouldn't be surprised if Nehalem's L2s are around 256KB-512KB. They are probably optimized for bandwidth/latency, given the beefy cores they have to feed. The large L3 will provide coverage.
     
  2. suryad

    Veteran

    Joined:
    Aug 20, 2004
    Messages:
    2,479
    Likes Received:
    16
Is it just me, or are these cores getting out of hand? I mean, I love technology as much as the next guy, but I don't know what the average Joe would be doing with quad-core or octa-core systems! Also, software has not caught up with the hardware evolution. It's not that I am anti-technology, but it would be good to see software performance, parallelization, better programming languages, better compilers, the whole shebang. I know not all tasks can be parallelized. I wonder how much improvement these cores will bring in terms of single-threaded performance.
     
  3. TheAlSpark

    TheAlSpark Moderator
    Moderator Legend

    Joined:
    Feb 29, 2004
    Messages:
    22,146
    Likes Received:
    8,533
    Location:
    ಠ_ಠ
    Perhaps for a consumer, but the topic at hand is dealing with servers. ;)
     
  4. Bludd

    Bludd Experiencing A Significant Gravitas Shortfall
    Veteran

    Joined:
    Oct 26, 2003
    Messages:
    3,794
    Likes Received:
    1,479
    Location:
    Funny, It Worked Last Time...
    If the new micro architecture in Nehalem is to Core 2 what Core 2 was to Pentium 4, well then... :drool:
     
  5. ShaidarHaran

    ShaidarHaran hardware monkey
    Veteran

    Joined:
    Mar 31, 2007
    Messages:
    4,027
    Likes Received:
    90
    Only in a multi-thread environment, unfortunately. Single-thread performance is said to be on the order of 10-25% faster than Penryn at the same clock speed.
     
  6. Bludd

    Bludd Experiencing A Significant Gravitas Shortfall
    Veteran

    Joined:
    Oct 26, 2003
    Messages:
    3,794
    Likes Received:
    1,479
    Location:
    Funny, It Worked Last Time...
    ISVs have to step up to the plate and work hard to get a good multi-thread baseline framework in place.

    Man, I am really excited about Nehalem. It will be good to see the FSB finally buried.
     
  7. ShaidarHaran

    ShaidarHaran hardware monkey
    Veteran

    Joined:
    Mar 31, 2007
    Messages:
    4,027
    Likes Received:
    90
    True, but not all code is more than trivially parallelizable, and some not at all. Plus it's no easy task.

    I also look forward to Nehalem, but am confident my dual-core Penryn will keep me happy until then.
     
  8. Bludd

    Bludd Experiencing A Significant Gravitas Shortfall
    Veteran

    Joined:
    Oct 26, 2003
    Messages:
    3,794
    Likes Received:
    1,479
    Location:
    Funny, It Worked Last Time...
    Yes, but it doesn't mean people shouldn't work hard at solving the problems. There are people who try solving NP-complete problems too. :D
     
  9. ShaidarHaran

    ShaidarHaran hardware monkey
    Veteran

    Joined:
    Mar 31, 2007
    Messages:
    4,027
    Likes Received:
    90
    I didn't mean to imply that the problems facing devs shouldn't be tackled, simply that they are not easily solvable and thus we should not expect solutions overnight.
     
  10. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
    There are now shots of wafers with Shanghai CPUs on them.

    Others have eyeballed the count, and my own glance suggests the following rough number of complete chips along the longest and widest rows of each grid.

    Shanghai on a 300 mm wafer: 20x15
    Nehalem on a 300 mm wafer: 20x14

    I'd like some folks who are more diligent to double-check my skimming, but it seems to indicate that the chips are closer in size than the 20-30% disparity brought up earlier.
    From a candidate-dies-per-wafer perspective, the advantage Shanghai has over Nehalem doesn't seem to match that gap.
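    The eyeballed grid counts above can be turned into a rough candidate-die comparison. This is a sketch only: a plain rectangle product overstates the real count because dies at the round wafer edge are lost, but the ratio between the two wafers is a fair first-order check.

    ```python
    def rect_die_count(cols, rows):
        """Upper-bound die count for a cols x rows grid on one wafer
        (ignores partial dies lost at the round wafer edge)."""
        return cols * rows

    shanghai = rect_die_count(20, 15)  # eyeballed 20x15 grid
    nehalem = rect_die_count(20, 14)   # eyeballed 20x14 grid

    advantage = (shanghai - nehalem) / nehalem * 100
    print(f"Shanghai: {shanghai} dies, Nehalem: {nehalem} dies")
    print(f"Shanghai advantage: {advantage:.1f}%")
    ```

    Under these assumptions the per-wafer advantage comes out around 7%, well short of the rumored 20-30% die-size gap.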
     
  11. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
    http://chip-architect.com/news/Shanghai_Nehalem.jpg

    I'm unsure about the memory channel count labeled on the Nehalem, but then I'm also not sure which variant this is supposed to be.

    I don't think Hans DeVries would be too far off on the numbers I'm interested in:

    The die sizes are almost the same, not the 20-30% difference being rumored.

    Maybe there is some margin of error over which core variant is involved, but the transistor count on the Nehalem seems high enough to be right.

    It's notable just how poor AMD's L3 cache density is, particularly compared to Intel's.

    The L3 is actually the same density as the L2 for AMD, kind of negating a significant part of the reason to have it.
     
  12. The_Wolf_Who_Cried_Boy

    Newcomer

    Joined:
    Feb 18, 2005
    Messages:
    172
    Likes Received:
    9
    Location:
    Floating face down in the stagnant pond of life.
    Not that I'm remotely an expert in such matters, but I'd guess it's a sweet spot for performance/power consumption/leakage(?). If I'm remembering correctly, IBM has demonstrated much higher SRAM cache densities on its POWER series at a given process node relative to AMD, so it's obviously not an innate deficiency of SOI.

    Regarding Nehalem's L3, isn't density-optimised SRAM, as a general rule, at the slower and leakier end of the characteristics spectrum?
     
  13. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
    It might be the sweet spot for power/yield for AMD.

    For an SRAM to operate reliably, whole swaths of cells along the bit lines must all function correctly under a wide range of conditions.

    Manufacturing variation becomes an issue, even with redundancy.

    Yields for chips with a lot of SRAM can be influenced by the voltage the SRAM runs at and the size of the SRAM cells as well.

    Higher voltage means higher tolerance for variation amongst the components, and larger cells are more resistant to variation because they simply have more bulk to tolerate it.

    Higher voltage means higher power consumption that AMD doesn't want, and larger cells means poorer density.

    IBM's SRAM is for chips that are meant for high end servers, where the volumes need not be all that high. IBM can charge an arm and a leg per chip and then charge for system services. It can toss a lot more chips and tolerate way higher TDPs than AMD can allow.

    That means IBM is in a much better position to tolerate poorer manufacturability and higher power draw than AMD is.

    Realworldtech has an article on Barcelona that also indicates that signal integrity forced a design compromise on the cache cells.

    http://www.realworldtech.com/page.cfm?ArticleID=RWT051607033728&p=8

    Significantly, it is apparently the case that SOI actually did hurt AMD's cache density, at least for Barcelona.

    What is telling is that Barcelona also used the same cache cells for the L2 and L3, which means similar compromises may have been made for Shanghai as well, since it too has the same density for both caches.
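    A minimal sketch of why large SRAM arrays are so sensitive to per-cell variation, as described above: assuming independent cell failures and no redundancy (both simplifications; real designs add repair rows/columns), the yield of a many-megabit array collapses unless the per-cell failure probability is driven extremely low.

    ```python
    import math

    def array_yield(cells, p_fail):
        """Probability that every cell in an array works, assuming
        independent per-cell failures and no redundancy/repair."""
        # (1 - p)^n computed in log space to avoid underflow for large n
        return math.exp(cells * math.log1p(-p_fail))

    cells = 6 * 8 * 1024 * 1024  # a 6 MB cache in bits, one cell per bit (illustrative)
    for p in (1e-9, 1e-8, 1e-7):
        print(f"p_fail={p:.0e}: array yield = {array_yield(cells, p):.4f}")
    ```

    Even a one-in-ten-million per-cell failure rate leaves a ~50-megabit array almost certain to contain a bad cell, which is why cell size and operating voltage (both of which buy variation tolerance) matter so much.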
     
  14. The_Wolf_Who_Cried_Boy

    Newcomer

    Joined:
    Feb 18, 2005
    Messages:
    172
    Likes Received:
    9
    Location:
    Floating face down in the stagnant pond of life.
    As a complete mathematical illiterate, I take a 5-sigma variance to mean a tolerance of a fifth of one standard deviation?

    Also curious as to why the current northbridge is so clock limited if the cache cells are identical to the L2.
     
  15. MTd2

    Newcomer

    Joined:
    May 13, 2004
    Messages:
    212
    Likes Received:
    0
    Hmm, are you sure it's lower? On close inspection you can see larger "roads", like avenues, on Shanghai, dividing blocks of higher density, whereas on Nehalem everything is packed. If this thing works like road traffic, Shanghai should have less traffic jams, fewer bottlenecks.

    Sigma means standard deviation:

    http://en.wikipedia.org/wiki/Standard_deviation

    The confidence intervals are as follows:
    σ 68.26894921371%
    2σ 95.44997361036%
    3σ 99.73002039367%
    4σ 99.99366575163%
    5σ 99.99994266969%
    6σ 99.99999980268%
    7σ 99.99999999974%

    That means roughly 1 in every 1.7 million elements, transistors I think, is incorrectly printed, on average.
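    The table above can be reproduced from the normal distribution's error function; the fraction within ±n standard deviations is erf(n/√2), and one minus that gives the outlier rate.

    ```python
    import math

    def within_sigma(n):
        """Fraction of a normal distribution within +/- n standard deviations."""
        return math.erf(n / math.sqrt(2))

    for n in range(1, 8):
        frac = within_sigma(n)
        print(f"{n}σ: {frac * 100:.11f}%  (~1 outlier in {1 / (1 - frac):,.0f})")
    ```

    For 5σ this works out to about 1 outlier in 1.7 million, which is where the transistor estimate comes from.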
     
  16. ShaidarHaran

    ShaidarHaran hardware monkey
    Veteran

    Joined:
    Mar 31, 2007
    Messages:
    4,027
    Likes Received:
    90
    Erm, I think it's a pretty fair assumption that Intel not only has better density than AMD, but fewer defects by any given metric. That's always been the case at every process node. Intel defines MPU manufacturing. Everyone else just tries to keep up (and fails).
     
  17. Farhan

    Newcomer

    Joined:
    May 19, 2005
    Messages:
    152
    Likes Received:
    13
    Location:
    in the shade
    I think it refers to the read error rate, not directly physical defects.

    The single-ended read, instead of a small-swing read, probably means they are not using differential bitlines with sense amps to amplify the small signal between them. So the cells have to be larger and/or the bitlines have to be shorter, because the cells have to drive these (long) wires all the way down to, or close to, a logical 0. That means more I/O overhead, because the cell arrays can't be very large (and that's probably why you see more gaps between the cell blocks in Barcelona/Shanghai).

    In a differential-read cell, a sense amplifier senses a small voltage difference between the bit and _bit lines for each cell and amplifies that signal. The cell only has to swing the long bitline a small amount, usually around 100-200mV. So that means smaller cells and/or longer bitlines (larger cell arrays), and probably lower power (for reads).
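    The read-energy side of that trade-off can be sketched with the usual E ≈ C·Vdd·ΔV per bitline transition. The capacitance and voltage figures below are hypothetical, chosen only to show the ratio between rail-to-rail and small-swing sensing.

    ```python
    # Illustrative comparison of full-swing single-ended sensing versus
    # small-swing differential sensing. All numbers are assumed, not
    # measured from any real part.

    C_BITLINE = 100e-15  # 100 fF bitline capacitance (assumed)
    VDD = 1.1            # supply voltage in volts (assumed)
    SWING = 0.15         # ~150 mV differential swing, per the 100-200mV range above

    full_swing = C_BITLINE * VDD * VDD    # cell drives the bitline rail to rail
    diff_swing = C_BITLINE * VDD * SWING  # cell only develops a small differential

    print(f"full swing:  {full_swing * 1e15:.1f} fJ per bitline")
    print(f"small swing: {diff_swing * 1e15:.1f} fJ per bitline")
    print(f"ratio: {full_swing / diff_swing:.1f}x")
    ```

    Under these assumptions the small-swing read moves roughly 7x less charge per bitline, which is the "probably lower power (for reads)" point above.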
     
    #37 Farhan, Mar 8, 2008
    Last edited by a moderator: Mar 8, 2008
  18. Farhan

    Newcomer

    Joined:
    May 19, 2005
    Messages:
    152
    Likes Received:
    13
    Location:
    in the shade
    Looks like there will probably be 2MB L3 versions of Shanghai, from the L3 layout.
     
  19. MTd2

    Newcomer

    Joined:
    May 13, 2004
    Messages:
    212
    Likes Received:
    0
    Yes! I just saw the sigma error and thought about the imprinting errors! I was very lazy :)

    What you said makes sense.
     
  20. crystall

    Newcomer

    Joined:
    Jul 15, 2004
    Messages:
    149
    Likes Received:
    1
    Location:
    Amsterdam
    It is likely that, just as with Barcelona, the L3 reuses the same cells designed for the L2. This is certainly suboptimal, as L3 cells can usually be optimized for size instead of speed, since L3 read latency is usually dominated by interconnect latency, not cell performance. However, given the cost and time constraints AMD currently faces, that sounds like a good decision. It is also possible that AMD will respin the processor later with a 'better' L3 once the 45nm process matures and its designers are more familiar with it.
     