Devourer said:pc999 said:By this spec can anyone estimate the power of this in comparison to CELL
Oh-oh, dangerous question
A rational, empirical estimation ...
Vince said:Chalnoth said:Right. I was trying to state that anything that is currently in development may well differ from that plan. As an example, depending upon engineering concerns, the eDRAM may be reduced or dropped altogether.
Right. And Joe's still right. The designs have basically been locked down and the specs have been in developers' hands for several months now. Any change, especially one as large as dropping the eDRAM, would constitute a major failure. These things are pretty static; Sony has two years of R&D prior to today on MS and possibly a year after today on them, so there is a lot of set-piece thinking when designing a console.
I suppose I should have been more specific about tense.
DemoCoder said:The MSAA buffer *is* the backbuffer.
Anyway, "spilling" the backbuffer into main memory seems to defeat most of the purpose of keeping it in eDRAM. Keeping Z in eDRAM would make sense, since it needs to be read and written to many times. But if you're doing alpha blending, once you spill to main memory, you're performance is effectively bottlenecked by the read/write rate of main memory. When rendering, sure, you'd have a huge FB bandwidth to write to, but then, that FB has to be flushed to main memory at a certain point, and you'll have a stall, since it can't be flushed faster than the pipelines are filling up the edram.
"spilling" buffers only really gains you big boosts if you have a TBDR.
DemoCoder said:Yeah, you need to do the "inner loop" on a tile at a time and finally write it out to main memory once it will never be touched again during that render pass. If it gets touched again in the same pass, that means reading the spilled tiles back into eDRAM.
I think this would be the best way to make use of relatively small amounts of eDRAM. You could have a FIFO "tile cache" that would automatically demote the last-written tile to the end of the list (such that the tile written to longest ago is always going to be the first to be flushed out).
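A minimal sketch of that FIFO tile cache idea (Tile, TILE_CAPACITY, and flush_to_main_memory are hypothetical placeholders; only the eviction order matters here):

```cpp
#include <algorithm>
#include <cstdint>
#include <deque>
#include <unordered_map>

struct Tile { uint32_t id; /* pixel data would live in eDRAM */ };

class TileCache {
    std::deque<uint32_t> fifo_;                  // front = written longest ago
    std::unordered_map<uint32_t, Tile> resident_;
    static constexpr size_t TILE_CAPACITY = 64;  // whatever fits the eDRAM

    void flush_to_main_memory(const Tile&) { /* DMA the tile out */ }

public:
    // Called whenever the rasterizer writes into a tile.
    Tile& touch(uint32_t id) {
        auto it = resident_.find(id);
        if (it != resident_.end()) {
            // Demote: a freshly written tile moves to the back of the list.
            fifo_.erase(std::find(fifo_.begin(), fifo_.end(), id));
        } else {
            if (resident_.size() >= TILE_CAPACITY) {
                // Evict the tile written to longest ago.
                uint32_t victim = fifo_.front();
                fifo_.pop_front();
                flush_to_main_memory(resident_[victim]);
                resident_.erase(victim);
            }
            it = resident_.emplace(id, Tile{id}).first;
        }
        fifo_.push_back(id);
        return it->second;
    }
};
```

The linear std::find is fine for a sketch; real hardware would track each tile's list position directly.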
bloodbob said:Is it just me, or does the external video scaler seem odd? Wouldn't it be better to render the stuff at the native resolution rather than have a separate chip doing the scaling?
I'd see it being put to better use for DVD playback.
DemoCoder said:I'm not clear how copy-on-write is supposed to solve the problem. Copy-on-write saves space when, say, you fork() a process: all memory is shared until you write to a page, and then that page is copied. That saves you the trouble of wasting 2x the memory immediately when fork()ing.
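A minimal POSIX sketch of that fork()/copy-on-write behavior (page contents and size are arbitrary):

```cpp
#include <sys/wait.h>
#include <unistd.h>
#include <cstdio>
#include <cstring>

static char page[4096];                 // one page, shared after fork()

int main() {
    std::strcpy(page, "parent data");
    pid_t pid = fork();                 // child shares all pages, read-only
    if (pid == 0) {
        // Child's first write faults; the kernel copies just this page.
        std::strcpy(page, "child data");
        std::printf("child sees:  %s\n", page);
        return 0;
    }
    waitpid(pid, nullptr, 0);
    std::printf("parent sees: %s\n", page);  // still "parent data"
    return 0;
}
```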
But you still need enough eDRAM to store one complete copy, and it appears to me that there isn't enough to store one HDTV 4xFSAA HDR frame. Are you suggesting that only areas that fail 4:1 compression get "spilled" to main memory? Even with virtualized FB memory, it still seems like it would be a huge hit to performance, since the page-ins and page-outs will happen at main memory speeds.
Virtual memory works so long as you only require a portion of the VM space for your inner loop or hotspot. Virtual memory on a CPU, for example, would not work very well if every app in the system had to touch every part of its code, since you'd be bogged down in page faults. It works because in the vast majority of apps, only a small portion of the code needs to be in main memory. (e.g. what percentage of the Emacs code base is needed to edit a text file?)
Ditto for texture virtualization, if you have a huge texture atlas, but only need a portion of the pixels for any given frame.
With the FB, it is much more likely that a huge number of pixels will be touched more than once, and therefore, the entire FB will be paged in/out at some point, limiting you to the main memory bandwidth.
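A toy illustration of that working-set argument, with a FIFO pager standing in for the OS and arbitrary page counts: an 8-page hotspot faults 8 times no matter how long it runs, while a scan that cycles over 1000 pages with only 64 resident faults on every single access.

```cpp
#include <cstdio>
#include <deque>
#include <unordered_set>
#include <vector>

// Count page faults under FIFO eviction with a fixed residency limit.
static int count_faults(const std::vector<int>& accesses, size_t limit) {
    std::unordered_set<int> resident;
    std::deque<int> fifo;
    int faults = 0;
    for (int page : accesses) {
        if (resident.count(page)) continue;   // hit, no fault
        ++faults;
        if (resident.size() >= limit) {       // evict the oldest page
            resident.erase(fifo.front());
            fifo.pop_front();
        }
        resident.insert(page);
        fifo.push_back(page);
    }
    return faults;
}

int main() {
    std::vector<int> hotspot, scan;
    for (int i = 0; i < 10000; ++i) hotspot.push_back(i % 8);   // tight loop
    for (int i = 0; i < 10000; ++i) scan.push_back(i % 1000);   // touch all
    std::printf("hotspot faults: %d\n", count_faults(hotspot, 64)); // 8
    std::printf("scan faults:    %d\n", count_faults(scan, 64));    // 10000
    return 0;
}
```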
bbot said:Why does everyone assume that three separate cpus are being used? Isn't it more likely, from looking at the diagram, that one tri-core cpu is being used?
DemoCoder said:Yeah, you need to do the "inner loop" on a tile at a time and finally write it out to main memory once it will never be touched again during that render pass. If it gets touched again in the same pass, that means reading the spilled tiles back into eDRAM.
TBR in this case. Since I read this patent, I believe Sony is going to do the same on their Cell-based visualizer.
CPU:
3 identical cores, 64bit PowerPC at 3.5G+
total 6 HW threads (AI, Rendering, Generation/Loading/Unpacking, Collision/Physics, Audio, Graphics)
32K 2way I-cache, 32K 4way D-cache
1MB 8way shared L2 cache
84+ GFlops in one chip (a back-of-envelope check follows this spec list)
GPU:
The GPU can read/write all the system memory, including the CPU L2 Cache.
shader 3.0+, 4096 instr, dynamic branching, unified VS & PS
memory write from shader code
48 ALUs (shader processors), each at 500 MHz
466 GFLOPS
Free 4xFSAA
Frame buffer format:
INT, 2:10:10:10, 8:8:8:8, 16:16:16:16
FP, 11f:11f:10f, 16f:16f:16f:16f, 32f:32f:32f:32f
Pixel rate 4.0 BPix/S
Triangle generate 500 MTri/s
PixelShader 48 Ginstr/s
VertShader 48 Ginstr/s
Tex rate 8 Gpix/s
Tex Bandwidth 22GB/s
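For what it's worth, a couple of those figures check out on the back of an envelope. The 8-FLOPs-per-core-per-cycle assumption (a 4-wide fused multiply-add unit) is mine, not the spec sheet's:

```cpp
#include <cstdio>

int main() {
    // CPU: 3 cores x 3.5 GHz; "84+ GFlops" matches if each core retires
    // a 4-wide FMA per cycle (4 muls + 4 adds = 8 FLOPs) -- an assumption.
    double cpu_gflops = 3 * 3.5e9 * 8 / 1e9;   // = 84.0

    // GPU: 4.0 GPix/s at 500 MHz implies 8 pixels written per clock.
    double pixels_per_clock = 4.0e9 / 500e6;   // = 8.0

    std::printf("CPU: %.0f GFLOPS, GPU: %.0f pixels/clock\n",
                cpu_gflops, pixels_per_clock);
    return 0;
}
```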
aaronspink said:From the diagram, it certainly seems that the GPU uses both the console main memory and the embedded memory together (33 GB/s read stream and 22 GB/s write stream out of GPU to CPU/Main Mem).
Actually, looking closer, the diagram pretty clearly states that the 33 GB/s read is for memory (22) + L2 cache (11) combined, not eDRAM.
Chalnoth said:Anyway, my point was that it's not clear to me whether that file was from before or after "spec lockdown." As a side note: if the console is set to be released by the end of 2005, then spec lockdown may be right about now.
Well, all the little pluses would likely indicate parts that are possibly subject to change, no? Most of the numbers with a '+' next to them seem linked to clock speeds anyhow, which are usually not locked down until the last moment.
aaronspink said:DemoCoder said:With only 10mb of EDRAM, FSAA and HDTV resolutions are not supportable. 1280*720*8bpp*4xfsaa = 29mb (yes, compression can be used, but you cannot depend on a guaranteed compression ratio, you have to allocate a buffer the size of the worst case). If you use 64-bit FB for HDR, it's worse: 1280*720*12bpp*4x = 44mb. Of course, it's all even worse for 1080i.
From the diagram, it certainly seems that the GPU uses both the console main memory and the embedded memory together ( 33 GB/s read stream and 22 GB write stream out of GPU to CPU/Main Mem). If this is true, you could use the embedded dram as the primary buffer in the case of AA if you have confidence that the majority of pixels will compress. You would then add the additional buffers in the main memory to hold the additional samples for those pixels that did not compress.
Aaron Spink
speaking for myself inc.
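DemoCoder's sizes quoted above do check out, reading "8bpp" as 8 bytes per sample (32-bit color plus 32-bit Z/stencil) and "12bpp" as 12 bytes (64-bit HDR color plus 32-bit Z); that per-sample breakdown is my assumption:

```cpp
#include <cstdio>

int main() {
    const double w = 1280, h = 720, samples = 4;    // 720p, 4xFSAA
    double fb32 = w * h * 8  * samples / 1e6;       // ~29.5 MB
    double fb64 = w * h * 12 * samples / 1e6;       // ~44.2 MB
    std::printf("32-bit color + Z: %.1f MB\n", fb32);
    std::printf("64-bit HDR + Z:   %.1f MB\n", fb64);
    return 0;
}
```

Either way, both are well past the 10 MB of eDRAM, which is DemoCoder's point.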
Megadrive1988 said:
pc999 said:By this spec can anyone estimate the power of this in comparison to CELL
no, because CELL is an architecture, not a specific chip. there will be many different processors based on CELL.
it's like asking:
'can anyone estimate the power of this in comparison to X86, MIPS, SuperH, ARM, etc.'
what you might want to ask is:
'how will this (these Xbox 2 specs) compare to the PS3's Cell-based chipset?'