HD problems in Xbox 360 and PS3 (Zenji Nishikawa article @ Game Watch)

Jawed said:
On top of that, if you look at the RAM in each Cell SPE, you see it's 6x as dense as the logic. It consumes ~33% of the die area of an SPE, but consists of 2x the number of transistors as the logic does (14m transistors of memory versus 7m of logic). If Xenos EDRAM has anything like the same density, then excluding the 25m transistors of logic, the 80m transistors of EDRAM would translate into about 13m transistors of logic, in terms of area.
It doesn't in any way contradict the perfectly valid point that you're making, but I think it's actually 4x the density, not 6x.

For our 21m-transistor SPE, a third of the die is SRAM, weighing in at 14m transistors. So the equivalent total of SRAM transistors for the whole SPE die area would be 14/.33 = 42.42m.

The logic portion, covering the other 66%, eats 7m, which would resolve to 7/.66 = 10.61m.

Therefore the ratio would be 4:1.
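
A quick sanity check on that arithmetic in Python (same figures as above, nothing new assumed):

Code:
sram_transistors  = 14e6    # LS SRAM transistor count
logic_transistors = 7e6     # SPE logic transistor count
sram_area  = 0.33           # SRAM share of the SPE die area (as assumed above)
logic_area = 0.66           # logic share of the SPE die area

sram_equiv  = sram_transistors / sram_area     # ~42.42m if the whole die were SRAM
logic_equiv = logic_transistors / logic_area   # ~10.61m if the whole die were logic

print(sram_equiv / logic_equiv)                # ~4.0, i.e. the 4:1 ratio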
 
I don't think we should take the LS/SPE ratios to be what we're working with when it comes to Xenos though. The eDRAM daughter die is roughly one-third the size of the main die, and roughly under one-half the transistors. A portion of that *is* logic also and not straight memory, but still I think that would be a better place to go from for any deconstruction-based estimates, rather than the Cell's SPEs.
 
Fafalada said:
I don't necessarily agree with this - Warhawk already presents one interesting way to do quality volume smoke without going overdraw happy. Not to say everything has an alternative - but I do believe we have lots of room to rethink particle approaches with the new machines.

Yeah, to me it looks like they have a very low iteration count on the fractals, and the resulting puffy stuff really wouldn't do as realistic smoke...
 
xbdestroya said:
I don't think we should take the LS/SPE ratios to be what we're working with when it comes to Xenos though. The eDRAM daughter die is roughly one-third the size of the main die, and roughly under one-half the transistors. A portion of that *is* logic also and not straight memory, but still I think that would be a better place to go from for any deconstruction-based estimates, rather than the Cell's SPEs.

I'd agree with that. I'd imagine there'd be some significant differences in density between eDRAM and 6T SRAM. Not that I have anything to go on at all regarding that.

Not relevant, but on a side note: is there any merit at all to that rumor that ATI doesn't count non-logic transistors most of the time (i.e., texture/vertex cache and register file for X1K and/or Xenos) for its main dies (dice? :D)?

Or Jawed, whether that's ~24K or ~73K registers in Xenos?

Bah, ignore me, I'm just trying to derail the thread. ;)
 
http://www-03.ibm.com/chips/photolibrary/photo10.nsf/WebViewNumber/ED994790FAECFD6900256FEA0062126B

I make the RAM 1/4 of an SPE, not 1/3.

As to the density/count on the EDRAM - there's something fishy going on which makes the comparison with the SPE pretty dodgy. 14m transistors for 256KB versus 80m transistors for 10MB indicates that Cell's memory is using 7x the number of transistors per byte of memory.

So my comparison isn't standing up very well. EDRAM memory is prolly more dense than EDRAM logic, but I need something other than SPE SRAM as a starting point :cry:
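
For reference, here's the per-byte arithmetic behind that 7x (figures straight from the posts above):

Code:
spe_ls   = 14e6 / (256 * 1024)        # ~53.4 transistors/byte (SRAM plus local overhead)
xenos_ed = 80e6 / (10 * 1024 * 1024)  # ~7.6 transistors/byte (eDRAM)

print(spe_ls / xenos_ed)              # ~7.0 -- the 7x above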

Jawed
 
Jawed said:
I make the RAM 1/4 of an SPE, not 1/3.

Well, I was basing my calculations on your numbers ;)

Based on that picture, it's 30.5% by my observations.

That would make it 4.5x the density.
 
Sorry, I realise now how I've misled you - I mistakenly said that the RAM is ~33% of the SPE, when I meant (and calculated from) the RAM being 33% of the area of the logic, which means the RAM is ~25% of the total area of the SPE. ARGH. Sorry.

Jawed
 
TurnDragoZeroV2G said:
Not relevant, but on a side note: is there any merit at all to that rumor that ATI doesn't count non-logic transistors most of the time (i.e., texture/vertex cache and register file for X1K and/or Xenos) for its main dies (dice? :D)?
Something's going on; compare NV40 and R420:

http://www.beyond3d.com/misc/chipcomp/?view=chipdetails&id=62
http://www.beyond3d.com/misc/chipcomp/?view=chipdetails&id=63

they're both 130nm, but the ATI part is low-k. I think low-k is supposed to allow for more density (or can be traded off against higher clocks - prolly the latter in this case).

Or Jawed, whether that's ~24K or ~73K registers in Xenos?
My guess is that there's 1.1MB of register file in Xenos:

http://www.beyond3d.com/forum/showpost.php?p=723497&postcount=73

R580 is similarly endowed, but the numbers are different: supposedly 3 FP32s per fragment, with 24576 fragments in flight. Don't know what the numbers are for vertex shading.
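
For what it's worth, those R580 numbers multiply out to the same ~1.1MB if each "register" is a full 4-component FP32 vector (16 bytes) - that's my assumption, nothing confirmed - and the same assumption would answer the ~24K vs ~73K question:

Code:
BYTES_PER_REG = 4 * 4            # assuming one register = vec4 of FP32 (16 bytes)

fragments = 24576                # fragments in flight
regs_each = 3                    # FP32 vec4 registers per fragment

total_bytes = fragments * regs_each * BYTES_PER_REG
print(total_bytes / 1024.0)      # 1152.0 KB ~= 1.1MB
print(fragments * regs_each)     # 73728 -> ~73K registers (~24K looks like the fragment count)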

Jawed
 
If they really did reserve 12 registers per fragment/vertex, then... I've thought about it before, but I still think that's incredible. If that were the case, it would certainly mean they were looking pretty far ahead as far as shaders go (what's the most that's been used so far? I believe I've heard something along the lines of 8/9 for a shader in Far Cry? Or perhaps that was something else). Then again, one of the quotes you had stated that performance started dropping off significantly at 16+ registers, so it's also an upper limit. Still....

In any case, if the registers and texture/vertex caches use 6T SRAM (would that be standard, or is a cheaper solution used?), and if ATI didn't count such transistors, that's up to ~58M transistors that are never mentioned for these chips.
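
(Rough arithmetic behind the ~58M, taking the 1.1MB register file figure from earlier and guessing a few tens of KB for the caches - all 6T:)

Code:
T_PER_BIT = 6                           # 6T SRAM cell

reg_file_bytes = 1.125 * 1024 * 1024    # the ~1.1MB register file
cache_bytes    = 40 * 1024              # texture + vertex caches - a guess

print((reg_file_bytes + cache_bytes) * 8 * T_PER_BIT / 1e6)   # ~58.6M transistors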

Where's that quote about not wanting to count your jewels using either ATI or NV's method? :LOL:
 
Jawed said:
As to the density/count on the EDRAM - there's something fishy going on which makes the comparison with the SPE pretty dodgy. 14m transistors for 256KB versus 80m transistors for 10MB indicates that Cell's memory is using 7x the number of transistors per byte of memory.
Considering that DRAM is one transistor and one capacitor per memory cell, while the SPE SRAM is probably 6 transistors per cell, and given how rough the estimate is, I think you are quite spot on. There is some room for differences in the number of transistors in the address logic, depending on how the memory lines are organised and the operating frequency. Given the difference in size, I would in any case expect the SPE memory to have more address-logic overhead in proportion to its memory size.
 
TurnDragoZeroV2G said:
If they really did reserve 12 registers per fragment/vertex, then... I've thought about it before, but I still think that's incredible.
Me too. Like the 128 threads per shader unit in R5xx, it seems like complete overkill. But bear in mind that if you execute a shader that uses 9 registers, you'll only get 1/3 of the threads. It's really about flexibility, in the end.

G71 appears to support 4 FP32s per fragment (no hard evidence that I'm aware of, though), with 6 quads, each running 880 fragments. That's 330KB of register file.
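
Under the same vec4-FP32 assumption as before, those numbers do multiply out:

Code:
print(4 * 16 * 6 * 880 / 1024.0)   # regs/frag * bytes/reg * quads * frags/quad = 330.0 KB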

If that were the case, it would certainly mean they were looking pretty far ahead as far as shaders go (what's the most that's been used so far? I believe I've heard something along the lines of 8/9 for a shader in Far Cry?
I've got the four-light shader from Far Cry here, and it uses 10 registers. Though that's an old compilation and it'd prolly work out to much less now (I remember newer compilations of the same shader produce code that runs significantly faster).

In any case, if the registers and texture/vertex caches use 6T SRAM (would that be standard, or is a cheaper solution used?), and if ATI didn't count such transistors, that's up to ~58M transistors that are never mentioned for these chips.
I don't know how we'd find this stuff out. It seems unlikely that the register file is running ultra-fast RAM, because access to it is pipelined. So I presume that means a low transistor count.

For the cache I guess things are different - you'd prolly want the fastest RAM implementation possible. Quantities are very low - 32KB of texture cache in Xenos (prolly about the same in R580).

Jawed
 
Jawed said:
As to the density/count on the EDRAM - there's something fishy going on which makes the comparison with the SPE pretty dodgy. 14m transistors for 256KB versus 80m transistors for 10MB indicates that Cell's memory is using 7x the number of transistors per byte of memory.

That's because the 14m is for the SRAM, DMA, MMU, and bus interface. It should be ~12.3 million for the SRAM (6T), and ~1.7 million for the other stuff.
 
Fafalada said:
In current GPUs those are so small they barely register in terms of die space used. Apparently only a few odd freaks like nAo and me think larger sized pixel caches would be useful.:devilish:
Okay. I wasn't sure about the specifics, but I knew they were there. Anyway, the point is there's still a substantial amount of additional logic needed if there were no eDRAM.

Well I do think nAo was also saying you would be surprised given your actual expectations from this hardware. Whether something is wow-factor in absolute terms is mostly a matter of individual perspective anyway.
I think I gave the wrong impression here. I fully expect this level of hardware (everything from ~7800GT level and upwards) to put out far better graphics than we're seeing today. A closed platform will help that, hopefully. But he predicted that one day I'll think, "God, how'd they do that?", which to me is the highest tier of awe. :cool:

I don't necessarily agree with this - Warhawk already presents one interesting way to do quality volume smoke without going overdraw happy. Not to say everything has an alternative - but I do believe we have lots of room to rethink particle approaches with the new machines.
I've been thinking about these alternatives for a couple of years. For example, I was thinking about doing multiple offset texture accesses per pass instead of multiple single-texture passes. The problem is that you drift farther away from the realistic model. When racing games generate smoke/wheelspray/dust at the tires, the sprites intersect with the ground, and you see the discrete polygons. Using more sprites, each more lightly coloured, ameliorates this.

The Warhawk video showed a similar problem when the plane passed through the clouds (although it doesn't matter for that game, since plane-cloud intersections are rare and fleeting), with an abrupt transition in colour when crossing the cloud boundary. It looks to me like the final compositing is done with single z, alpha, and colour values per pixel.

If intersection isn't a problem, though, then this technique does indeed look promising. I'm quite curious to know exactly what they're doing. I wonder if there's any precomputation, and thus animation restrictions. From what I've heard around here, they said something about raytracing on CELL. Although I think a realistic scattering simulation is infeasible, they could cast two rays to determine the distance through the cloud in the view direction (for transparency) and the sun direction (for shading). That should be possible with a low-poly cloud, I think.
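
Just to make the two-ray idea concrete, here's a toy sketch in Python - made-up density field, made-up extinction coefficient, and certainly not what the Warhawk guys are actually doing:

Code:
import math

def density(p):
    # placeholder cloud: a fuzzy sphere of radius 3 centred on the origin
    r = math.sqrt(sum(c * c for c in p))
    return max(0.0, 1.0 - r / 3.0)

def march(origin, direction, steps=32, step_len=0.25):
    # accumulate optical depth along a ray through the density field
    tau = 0.0
    for i in range(steps):
        p = [o + d * step_len * i for o, d in zip(origin, direction)]
        tau += density(p) * step_len
    return tau

def shade(point, view_dir, sun_dir, sigma=2.0):
    # one ray toward the eye for transparency, one toward the sun for shading,
    # both turned into attenuation via Beer-Lambert: exp(-sigma * tau)
    transparency = math.exp(-sigma * march(point, view_dir))
    sun_light    = math.exp(-sigma * march(point, sun_dir))
    return transparency, sun_light

print(shade([0.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 0.0]))

The real thing would presumably replace the marches with two ray-polygon intersections against the low-poly cloud hull, which is far cheaper.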

I like the cleverness of the volumetric fog technique, but IMO it doesn't give you the feel for parallax and the variety that textured alpha layers do.

I've seen ray-marching techniques (e.g. steep parallax mapping and variants) used in fur and grass rendering, but not only is that expensive, it doesn't look as good as alpha-blending techniques (like the Tomohide demo). It's not as flexible or accurate, either.

By no means is this list exhaustive, but I'm skeptical that there's a good substitute out there for most alpha effects.

Well, someone smart once said here that marketing makes most of the hw-design decisions, and I'd say the rest are dictated by the target platform - and closed boxes have quite different requirements than PCs.
For one, you absolutely don't care how your product runs existing/legacy software/benchmarks. ;)
Okay, that's a very good point. Nonetheless, I find it rather shocking that there would be such low incentive for PC devs to conserve bandwidth.
 
Jawed and TurnDragoZeroV2G, interesting discussion about the register file.

Assuming you're right, I'm thinking that ATI doesn't have an issue with the extra transistors here because they're high-yielding and high-density. The path length in SRAM cells is very short compared to the arithmetic units, so the SRAM is probably switching at a quarter of the speed it could. On top of that, I think redundancy for defects is easy to implement at a fine scale. So if there are 30M more transistors compared to a design with fewer register resources, that's around 15% extra transistors. High density maybe means less than 10% extra die space. If it never comes out defective, the net cost may be the same as 6% more logic transistors.

I too think accommodating 12 registers without penalty is overkill, but if the above is correct, it's probably not a bad decision.
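
Putting rough numbers on that (the ~200M base is just what the 15% figure implies; the density and yield discounts are pure guesses on my part):

Code:
extra = 30e6                  # extra register-file transistors
base  = 200e6                 # baseline transistor count implied by "around 15%"

frac = extra / base
print(frac)                   # 0.15  -> "around 15% extra transistors"
print(frac * 0.6)             # 0.09  -> "less than 10% extra die space" (density guess)
print(frac * 0.6 * 0.6)       # 0.054 -> net cost like ~6% more logic (yield guess)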
 
Mintmaster said:
I think I gave the wrong impression here. I fully expect this level of hardware (everything from ~7800GT level and upwards) to put out far better graphics than we're seeing today. A closed platform will help that, hopefully. But he predicted that one day I'll think, "God, how'd they do that?", which to me is the highest tier of awe. :cool:

I have the same expectations, but I can still be "wowed", because the "wow" factor is a function of artistry too. I also spend a lot of time reading graphics papers, looking at the latest and greatest techniques, but being aware of an algorithm and seeing it actually put to use are two different things.

I'm fully aware of what offline renderers can do, but there is a big difference between those tools in the hands of an amateur and those tools in the hands of WETA FX, for example.

Or take a pencil, or a camera, or a paintbrush and palette. You can read a book on sketching technique, or painting technique, or photography, and be aware of what can be produced using various approaches. But that doesn't inoculate you from being rocked by a truly beautiful work of art.

In that way, I think game programming is more of an art than a science.
 