HD problems in Xbox 360 and PS3 (Zenji Nishikawa article @ Game Watch)

Discussion in 'Console Technology' started by one, Apr 26, 2006.

  1. Mmmkay

    Regular

    Joined:
    Jul 3, 2005
    Messages:
    627
    Likes Received:
    31
    It doesn't in any way contradict the perfectly valid point that you're making, but I think it's actually 4x the density not 6.

    For our 21m SPE, a third of the die is SRAM, weighing in at 14m transistors. So the equivalent total if the whole SPE die were SRAM would be 14/0.33 = 42.42m.

    The logic portion, covering 66%, eats 7m, which would scale up to 7/0.66 = 10.61m for the whole die.

    Therefore the ratio would be 4:1.
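
    The arithmetic above can be checked with a minimal Python sketch. The function name `full_die_equivalent` is made up for illustration; the 14m/7m counts and the 33%/66% area shares are the ones quoted in the post.

```python
def full_die_equivalent(transistors_m, area_fraction):
    """Transistors (in millions) if the whole SPE were built at this block's density."""
    return transistors_m / area_fraction

sram_equiv = full_die_equivalent(14, 0.33)    # SRAM: 14m in ~33% of the area
logic_equiv = full_die_equivalent(7, 0.66)    # logic: 7m in ~66% of the area
density_ratio = sram_equiv / logic_equiv

print(round(sram_equiv, 2), round(logic_equiv, 2), round(density_ratio, 2))
```

    The ratio depends only on the two counts and the two area shares, so changing the assumed SRAM share is the only thing that moves it.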
     
  2. Carl B

    Carl B Friends call me xbd
    Legend

    Joined:
    Feb 20, 2005
    Messages:
    6,266
    Likes Received:
    63
    I don't think we should take the LS/SPE ratios to be what we're working with when it comes to Xenos though. The eDRAM daughter die is roughly one-third the size of the main die, and has a bit under one-half the transistors. A portion of that *is* logic also and not straight memory, but still I think that would be a better place to start for any deconstruction-based estimates, rather than the Cell's SPEs.
     
  3. Laa-Yosh

    Laa-Yosh I can has custom title?
    Legend Subscriber

    Joined:
    Feb 12, 2002
    Messages:
    9,568
    Likes Received:
    1,455
    Location:
    Budapest, Hungary
    Yeah, to me it looks like they have a very low iteration on the fractals and the resulting puffy stuff really wouldn't do as realistic smoke...
     
  4. TurnDragoZeroV2G

    Regular

    Joined:
    Nov 14, 2005
    Messages:
    583
    Likes Received:
    23
    Location:
    Who knows...
    I'd agree with that. I'd imagine there'd be some significant differences in density between eDRAM and 6T SRAM. Not that I have anything to go off at all regarding that.

    Not relevant, but on a side note, any info on whether that rumor that ATI doesn't count non-logic transistors most of the time (i.e., texture/vertex cache and register file for X1K and/or Xenos) for main dies (dice? :grin:) has any merit at all?

    Or Jawed, whether that's ~24K or ~73K registers in Xenos?

    Bah, ignore me, I'm just trying to derail the thread. :wink:
     
  5. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    http://www-03.ibm.com/chips/photolibrary/photo10.nsf/WebViewNumber/ED994790FAECFD6900256FEA0062126B

    I make the RAM 1/4 of an SPE, not 1/3.

    As to the density/count on the EDRAM - there's something fishy going on which makes the comparison with SPE pretty dodgy. 14m transistors for 256KB versus 80m transistors for 10MB indicates that Cell's memory is using 7x the number of transistors per byte of memory.
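
    For reference, the 7x figure follows directly from the two quoted counts. A minimal Python sketch, assuming binary units (1KB = 1024 bytes); the variable names are illustrative:

```python
KB = 1024
MB = 1024 * KB

spe_per_byte = 14e6 / (256 * KB)      # SPE local store: 14m transistors for 256KB
edram_per_byte = 80e6 / (10 * MB)     # eDRAM: 80m transistors for 10MB
ratio = spe_per_byte / edram_per_byte

print(f"{spe_per_byte:.1f} vs {edram_per_byte:.1f} transistors per byte")
print(f"ratio: {ratio:.1f}x")
```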

    So my comparison isn't standing up very well. EDRAM memory is prolly more dense than EDRAM logic, but I need something other than SPE SRAM as starting point :cry:

    Jawed
     
  6. Mmmkay

    Regular

    Joined:
    Jul 3, 2005
    Messages:
    627
    Likes Received:
    31
    Well I was basing my calculations on your numbers ;)

    Based off that picture, it's 30.5% by my observations.

    That would make it 4.5x the density.
     
  7. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    Sorry, I realise how I've misled, now - I mistakenly said that the RAM is ~33% of the SPE, when I meant (and calculated from) the RAM being 33% of the area of the logic, which means that the RAM is ~25% of the total area of the SPE. ARGH. Sorry.
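
    The correction can be written out as a small Python sketch. The helper name `fraction_of_total` is hypothetical; the 14m SRAM / 7m logic split is the one used earlier in the thread.

```python
def fraction_of_total(ratio_to_logic):
    """Convert 'RAM area relative to logic area' into 'RAM area as a share of the whole SPE'."""
    return ratio_to_logic / (1 + ratio_to_logic)

ram_share = fraction_of_total(1 / 3)      # RAM is ~33% of the logic area
logic_share = 1 - ram_share

# Density ratio using the 14m SRAM / 7m logic transistor counts
density_ratio = (14 / ram_share) / (7 / logic_share)

print(round(ram_share, 2), round(density_ratio, 1))
```

    With the RAM at a quarter of the SPE rather than a third, the same counts give a 6:1 density ratio instead of 4:1.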

    Jawed
     
    #247 Jawed, May 2, 2006
    Last edited by a moderator: May 2, 2006
  8. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    Something's going on, compare NV40 and R420:

    http://www.beyond3d.com/misc/chipcomp/?view=chipdetails&id=62
    http://www.beyond3d.com/misc/chipcomp/?view=chipdetails&id=63

    they're both 130nm, but the ATI part is low-k. I think low-k is supposed to allow for more density (or can be traded off against higher clocks - prolly the latter in this case).

    My guess is that there's 1.1MB of register file in Xenos:

    http://www.beyond3d.com/forum/showpost.php?p=723497&postcount=73

    R580 is similarly endowed, but the numbers are different: supposedly 3 FP32s per fragment, with 24576 fragments in flight. Don't know what the numbers are for vertex shading.
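
    A rough Python sketch of what those R580 numbers imply for register file size. It assumes each "FP32" register is a 4-component vector of 32-bit floats (16 bytes); that width is an assumption and is not stated in the post.

```python
BYTES_PER_REGISTER = 4 * 4      # assumption: one register = vec4 of 32-bit floats

fragments_in_flight = 24576
registers_per_fragment = 3

total_registers = fragments_in_flight * registers_per_fragment
register_file_bytes = total_registers * BYTES_PER_REGISTER

print(total_registers, register_file_bytes / (1024 * 1024))
```

    Under that assumption the total is about 73K registers and a little over 1.1MB, in line with the guess for Xenos.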

    Jawed
     
  9. TurnDragoZeroV2G

    Regular

    Joined:
    Nov 14, 2005
    Messages:
    583
    Likes Received:
    23
    Location:
    Who knows...
    If they really did reserve 12 registers per fragment/vertex, then... I've thought about it before, but I still think that's incredible. If that were the case, then that'd certainly mean they were looking pretty far ahead as far as shaders (what's the most that have been used so far? I believe I've heard something along the lines of 8/9 for a shader in Far Cry? Or perhaps that was something else). Well, then again, one of the quotes you had stated performance started dropping off significantly with 16/16+ registers, so it's also an upper limit. Still....

    In any case, if registers and texture/vertex caches use 6T SRAM (would that be standard, or is a cheaper solution used?), and if ATI didn't count such transistors, that's up to ~58M transistors that are never mentioned for these chips.
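
    The post does not say which inputs give ~58M. One combination that lands near it, sketched in Python: 73,728 registers of 16 bytes each plus the 32K of texture cache mentioned elsewhere in the thread, all as plain 6T cells. Both sizes are assumptions here, and decoders, sense amps and other array overhead are ignored.

```python
TRANSISTORS_PER_BIT = 6     # 6T SRAM cell, array overhead ignored

def sram_transistors(size_bytes):
    """Transistors in the cell array alone for a 6T SRAM of the given size."""
    return size_bytes * 8 * TRANSISTORS_PER_BIT

register_file = sram_transistors(73728 * 16)    # assumed: 73,728 vec4 FP32 registers
texture_cache = sram_transistors(32 * 1024)     # assumed: 32KB of cache

total_m = (register_file + texture_cache) / 1e6
print(f"{total_m:.1f}M transistors")
```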

    Where's that quote about not wanting to count your jewels using either ATI or NV's method? :lol:
     
  10. Crossbar

    Veteran

    Joined:
    Feb 8, 2006
    Messages:
    1,821
    Likes Received:
    12
    Considering that DRAM is one transistor and one capacitor per memory cell, and that the SPE SRAM is probably 6 transistors per cell, I think your rough estimate is quite spot on. There is some room for differences in the number of transistors in the address logic, depending on how the memory lines are organised and on the operating frequency. Considering the difference in size, I would anyway expect the SPE memory to have more overhead for the address logic in proportion to the memory size.
     
  11. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    Me too. Like the 128 threads per shader unit in R5xx. It seems like complete overkill. But bear in mind that if you execute a shader with 9 registers, you'll get 1/3 of the threads. It's really about flexibility, in the end.
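
    The trade-off can be modelled with a toy Python sketch. It assumes a fixed register pool sized for 128 threads at 3 registers each (the 3 comes from the R580 figure earlier in the thread); the pool model and all names are illustrative, not a description of the real hardware scheduler.

```python
MAX_THREADS = 128
BASELINE_REGISTERS = 3                        # assumed registers per thread at full occupancy
POOL = MAX_THREADS * BASELINE_REGISTERS       # assumed fixed register pool per shader unit

def threads_in_flight(registers_per_thread):
    """Threads that fit in the pool, capped at the hardware maximum."""
    return min(MAX_THREADS, POOL // registers_per_thread)

for regs in (3, 9, 12):
    print(regs, threads_in_flight(regs))
```

    In this model a 9-register shader keeps 42 threads in flight, roughly a third of the maximum.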

    G71 appears to support 4 FP32s per fragment (no hard evidence that I'm aware of, though), with 6 quads, each running 880 fragments. That's 330KB of register file.
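
    The 330KB figure is reproduced if each FP32 register is taken to be a 4-wide vector (16 bytes). That width is an assumption in this short Python check:

```python
quads = 6
fragments_per_quad = 880
registers_per_fragment = 4
bytes_per_register = 16     # assumption: vec4 of 32-bit floats

size_bytes = quads * fragments_per_quad * registers_per_fragment * bytes_per_register
size_kb = size_bytes / 1024
print(size_kb)
```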

    I've got the four-light shader from Far Cry here, and it uses 10 registers. Though that's an old compilation and it'd prolly work out much less now (I remember newer compilations of the same shader compile to code that runs significantly faster).

    I don't know how we'd find this stuff out. It seems unlikely that the register file is running ultra-fast RAM, because access to it is pipelined. So I presume that means a low transistor count.

    For the cache I guess things are different, you'd prolly want the fastest RAM implementation possible. Quantities are very low - 32K of texture cache in Xenos (prolly about the same in R580).

    Jawed
     
  12. Edge

    Regular

    Joined:
    Apr 26, 2002
    Messages:
    613
    Likes Received:
    10
    That's because the 14m is for the SRAM, DMA, MMU, and bus interface. It should be ~12.3 million for the SRAM (6T), and ~1.7 million for the other stuff.
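
    A quick Python check of the 6T figure, counting only the cell array. The ~12.3 million matches when 256KB is read as 256,000 bytes; with 1024-byte kilobytes the array alone comes to about 12.6 million. The function name is illustrative.

```python
def six_t_transistors(size_bytes):
    """Transistors in a 6T SRAM cell array: 8 bits per byte, 6 transistors per bit."""
    return size_bytes * 8 * 6

decimal_kb_m = six_t_transistors(256 * 1000) / 1e6    # 256KB as 256,000 bytes
binary_kb_m = six_t_transistors(256 * 1024) / 1e6     # 256KB as 262,144 bytes

print(round(decimal_kb_m, 2), round(binary_kb_m, 2))
print(round(14 - decimal_kb_m, 1))                    # left over for DMA, MMU, bus interface
```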
     
    #252 Edge, May 2, 2006
    Last edited by a moderator: May 2, 2006
  13. Mintmaster

    Veteran

    Joined:
    Mar 31, 2002
    Messages:
    3,897
    Likes Received:
    87
    Okay. I wasn't sure about specifics, but I knew they were there. Anyway, the point is there's still a substantial amount of additional logic needed if there was no eDRAM.

    I think I gave the wrong impression here. I fully expect this level of hardware (everything from ~7800GT level and upwards) to put out far better graphics than we're seeing today. A closed platform will help that, hopefully. But he predicted that one day I'll think, "God, how'd they do that?", which to me is the highest tier of awe. :cool:

    I've been thinking about these alternatives for a couple years. For example, I was thinking about doing multiple offset texture accesses per pass instead of multiple single texture passes. The problem is that you drift farther away from the realistic model. When racing games generate smoke/wheelspray/dust at the tires, the sprites intersect with the ground, and you see the discrete polygons. Having more lightly coloured sprites ameliorates this.

    The Warhawk video showed a similar problem when the plane passed through the clouds (although it doesn't matter for that game since plane-cloud intersections are rare and fleeting), with an abrupt transition in colour when crossing the cloud boundary. Looks to me like the final compositing is done with single z, alpha, and colour values per pixel.

    If intersection isn't a problem, though, then this technique does indeed look promising. I'm quite curious to know what exactly they're doing. I wonder if there is any precomputation and thus animation restrictions? From what I've heard around here, they said something about raytracing on CELL. Although I think a realistic scattering simulation is infeasible, they could cast two rays to determine the distance through the cloud in the view direction (for transparency) and sun direction (for shading). That's possible in a low poly cloud, I think.
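
    The two-ray idea can be sketched in Python under heavy simplification. Here the cloud is a single sphere rather than a low poly mesh, and the distances are turned into transparency and shading with a simple exponential falloff. Both choices, the `density` value and all names are assumptions for illustration, not what the Warhawk developers are known to do.

```python
import math

def path_length_in_sphere(origin, direction, center, radius):
    """Length of the part of the ray origin + t*direction (t >= 0) that lies inside the sphere."""
    ox, oy, oz = (origin[i] - center[i] for i in range(3))
    dx, dy, dz = direction
    norm = math.sqrt(dx * dx + dy * dy + dz * dz)
    dx, dy, dz = dx / norm, dy / norm, dz / norm
    b = ox * dx + oy * dy + oz * dz
    c = ox * ox + oy * oy + oz * oz - radius * radius
    disc = b * b - c
    if disc <= 0:
        return 0.0                      # ray misses or only grazes the sphere
    root = math.sqrt(disc)
    t_near, t_far = -b - root, -b + root
    if t_far <= 0:
        return 0.0                      # sphere is entirely behind the ray origin
    return t_far - max(t_near, 0.0)

cloud_center, cloud_radius = (0.0, 0.0, 0.0), 1.0
density = 0.5                           # assumed extinction per unit distance

# Ray 1: camera ray through the cloud, used for transparency
view_len = path_length_in_sphere((0.0, 0.0, -5.0), (0.0, 0.0, 1.0), cloud_center, cloud_radius)
# Ray 2: from a sample point inside the cloud toward the sun, used for shading
sun_len = path_length_in_sphere((0.0, 0.0, 0.0), (0.0, 1.0, 0.0), cloud_center, cloud_radius)

transparency = math.exp(-density * view_len)
sun_light = math.exp(-density * sun_len)
print(view_len, sun_len, round(transparency, 3), round(sun_light, 3))
```

    Swapping the sphere for a closed low poly mesh would only change the intersection routine; the two distances would be used the same way.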

    I like the cleverness of the volumetric fog technique, but IMO it doesn't give you the feel for parallax and the variety that textured alpha layers do.

    I've seen ray marching techniques (i.e. steep parallax mapping and variants) used in fur and grass rendering, but not only is that expensive, it doesn't look as good as alpha blending techniques (like the Tomohide demo). It's not as flexible or accurate either.

    By no means is this list exhaustive, but I'm skeptical that there's a good substitute out there for most alpha effects.

    Okay, that's a very good point. Nonetheless, I find it rather shocking that there would be such low incentive for PC devs to conserve bandwidth.
     
  14. Mintmaster

    Veteran

    Joined:
    Mar 31, 2002
    Messages:
    3,897
    Likes Received:
    87
    Jaws and TurnDragoZeroV2G, interesting discussion about the register file.

    Assuming you're right, I'm thinking that ATI doesn't have an issue with extra transistors here because they're high yielding and high density. The path length of SRAM cells is very short compared to the arithmetic units, so it's probably switching at a quarter the speed it could be. On top of that, I think redundancy for defects is easy to implement on a fine scale. So if there are 30M more transistors compared to a design with fewer register resources, it's around 15% extra transistors. High density maybe means less than 10% extra die space. If it never comes out defective, the net cost may be the same as 6% more logic transistors.

    I too think accommodating 12 registers without penalty is overkill, but if the above is correct, it's probably not a bad decision.
     
  15. DemoCoder

    Veteran

    Joined:
    Feb 9, 2002
    Messages:
    4,733
    Likes Received:
    81
    Location:
    California
    I have the same expectations, but I can still be "wowed". Because "wow" factor is a function of artistry too. I also spend a lot of time reading graphics papers, looking at the latest and greatest techniques, but being aware of an algorithm and seeing it actually put to use are two different things.

    I'm fully aware of what offline renderers can do, but there is a big difference between those tools in the hands of an amateur, and those tools in the hands of WETA FX for example.

    Or take a pencil, or a camera, or a paintbrush and palette. You can read a book on sketching technique, or painting technique, or photography, and be aware of what can be generated using various approaches. But that doesn't inoculate you from being rocked by a truly beautiful work of art.

    In that way, I think game programming is more of an art than a science.
     