Xenon Development Kits available.... NOW (Alpha version)

Ease the burden? :)

Dude, 51GB/s, if it is indeed that much, will still be QUITE fast in that time-frame. Remember, it's unlikely even a P4 bus will have progressed past 12GB/s... Considering the limited resolutions consoles will work with, 51GB/s will be a lot.
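
(As a quick sanity check on that P4 figure, here's a back-of-envelope sketch, assuming the P4's quad-pumped 64-bit front-side bus; the clock values are just illustrative:)

# Peak FSB bandwidth for a quad-pumped 64-bit bus:
#   bandwidth = base_clock * 4 transfers/clock * 8 bytes/transfer
def fsb_bandwidth_gbs(base_clock_mhz):
    return base_clock_mhz * 1e6 * 4 * 8 / 1e9

print(fsb_bandwidth_gbs(200))  # 800 MT/s bus -> 6.4 GB/s (current P4s)
print(fsb_bandwidth_gbs(400))  # 1600 MT/s bus -> 12.8 GB/s, roughly the 12GB/s ceiling above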
 
Guden Oden said:
Ease the burden? :)

Dude, 51GB/s, if it is indeed that much, will still be QUITE fast in that time-frame. Remember, it's unlikely even a P4 bus will have progressed past 12GB/s... Considering the limited resolutions consoles will work with, 51GB/s will be a lot.

Dude, you have to run a GPU off that in addition to the CPU. It's possible that the latest ATI and nVidia cards will have bandwidth surpassing that... today. I just saw DeFuria post on GDDR3, I think; maybe he remembers, but the number was like 75GB/sec. eDRAM would have bandwidth in the 100s of GB/sec in that timeframe.
 
Deano said:
One wrinkle in this design is that a true segmented architecture would require a fast copy from write-only to read-only for render-to-texture/vertex/index array techniques.
Question is whether you really want to segment that in the first place. When you unify the vertex/fragment resources, you open up some interesting (IMO) avenues for creative use of write-to-read ops (can't very well call them render-to-texture anymore :p ).
But segmentation could put a full stop to that :p

Vince said:
eDRAM would have bandwidth in the 100s of GB/sec in that timeframe.
Blah Vince, is that supposed to be sarcasm? A 1GHz chip with a GS eDRAM bus would have like over 300GB/sec :p
But more in the real world, I seem to remember Sony already demoing a chip with eDRAM that was like 4-8x faster than what they have in the GS? I can't remember the article nor the exact numbers though.
 
Fafalada said:
Deano said:
One wrinkle in this design is that a true segmented architecture would require a fast copy from write-only to read-only for render-to-texture/vertex/index array techniques.
Question is whether you really want to segment that in the first place. When you unify the vertex/fragment resources, you open up some interesting (IMO) avenues for creative use of write-to-read ops (can't very well call them render-to-texture anymore :p ).
But segmentation could put a full stop to that :p

From a software point of view I agree, but from a hardware view maybe it makes sense?

At least I hope it makes sense from a hardware view, else why do it? Personally I want the opposite: read/write access to the framebuffer at will, but ...
 
Fafalada said:
Deano said:
One wrinkle in this design is that a true segmented architecture would require a fast copy from write-only to read-only for render-to-texture/vertex/index array techniques.
Question is whether you really want to segment that in the first place. When you unify the vertex/fragment resources, you open up some interesting (IMO) avenues for creative use of write-to-read ops (can't very well call them render-to-texture anymore :p ).
But segmentation could put a full stop to that :p

IMO the only real limitation from segmentation is not being able to read from the buffer that you're writing to. This doesn't work on most modern GPUs anyway, so it wouldn't be a new restriction (although it'd be a nice feature to have).

Writing to EDRAM and copying to main RAM might be faster for your "render to texture" effects than rendering directly to main RAM.

In fact, on Xbox, rendering to a non-swizzled render target and copying it (with the GPU) to a swizzled texture before using it in subsequent rendering can actually work out faster than just rendering to a swizzled target, or than texturing directly from the non-swizzled one.
 
Somebody asked for an arcane PR or presentation?



Panajev TO THE RESCUEEEEEEEEEEEEEEEEEEEEE!!!!!!!!

Embedded DRAM: When Absolutely Nothing Else Will Do!
One of the most impressive papers I saw at ISSCC 2001 described a 3D graphics engine that integrated 32 Mbyte of high performance embedded DRAM. This device is apparently a development vehicle for a next generation graphics platform from Sony, possibly a follow-on product to the Playstation 2. The device, developed in conjunction with United Memories Inc. of Colorado, is quite ambitious even for the 0.18 µm, 5-level-metal process used. It consists of two groups of eight 15.75 mm² 16 Mb DRAM macros arranged around a common central region containing a 3D graphics engine. The DRAM macro is distinguished by its separate 256 bit wide read and write data paths that operate in double data rate (DDR) mode up to 714 MHz for a data transfer rate of 1.43 Gbps per macro pin. Thus each macro provides a peak bandwidth of almost 46 Gbyte/s of read or write bandwidth.

The entire device incorporates 16 DRAM macros and operates them at 500 MHz for a peak overall bandwidth of 512 GB/s of read or write bandwidth. The DRAM macros also include the capability of performing simultaneous read and write operations. To avoid separate read and write column address paths into the macro, a special first-in, first-out (FIFO) column address buffer is used to store addresses for future write operations. This permits straightforward implementation of read-modify-write (RMW) cycles as shown in Figure 8.
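
(A quick sanity check of the article's figures; this sketch just re-derives them from the bus widths and clocks quoted above, plus the 1.6 GB/s of a Direct Rambus channel:)

# Re-deriving the eDRAM bandwidth numbers quoted in the article.
bus_bits  = 256   # separate 256-bit read and write data paths per macro
macro_mhz = 714   # maximum macro clock, DDR = two transfers per clock
chip_mhz  = 500   # clock used when all 16 macros run together
n_macros  = 16

per_macro_gbs = bus_bits * macro_mhz * 2 / 8 / 1000            # ~45.7 -> "almost 46 Gbyte/s"
total_gbs     = bus_bits * chip_mhz * 2 / 8 / 1000 * n_macros  # 512 GB/s read (or write)

print(per_macro_gbs)
print(total_gbs)
print(total_gbs * 2)    # ~1024 GB/s with reads and writes overlapped (RMW) -> "approach 1000 GB/s"
print(total_gbs / 1.6)  # 320 Direct Rambus channels at 1.6 GB/s each to match the read bandwidth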



Figure 8: Sony Graphics Engine embedded DRAM read-modify-write feature

The use of the read-modify-write feature allows the device to approach 1000 GB/s of effective merged pixel fill bandwidth. This capability shows what a powerful tool architected memories like custom embedded DRAM are for system designers targeting specific applications like 3D graphics. This contrasts sharply with discrete memory solutions that utilize commodity DRAM devices. Not only do commodity DRAMs lack separate read and write data paths, but they also impose a read-to-write and write-to-read bus turnaround penalty that can significantly reduce their effective bandwidth from its peak value. It would nominally take 320 Direct Rambus channels (requiring about 15,000 signal, power, and ground pins!) to match just the read bandwidth of the Sony device. The bandwidth of the Sony device compared to several other game consoles and 3D graphics engines for PC applications is shown in Figure 9.



Figure 9: Comparison of 3D rendering bandwidth

It is obvious from just the size (252 mm²) and power (8 W for reads, 11 W for writes, and 18 W for RMW cycles) of the 16 embedded DRAM macros in this device that it is entirely unsuitable for high volume production, especially for a consumer electronics device like a game console. Perhaps Sony can incorporate the current 0.18 µm device into workstation graphics applications and await a future shrink to a 0.13 µm process to bring it into the cost realm of discretionary consumer products.

http://www.realworldtech.com/page.cfm?ArticleID=RWT022001001645
 
DeanoC said:
From a software point of view I agree, but from a hardware view maybe it makes sense?
I should think it makes sense from a hw point of view, yeah - for one, afaik multiported memory is more complicated and expensive than using multiple single-ported blocks.
But I was talking from my greedy software point of view, where we all agree anyhow, apparently.

ERP said:
IMO the only real limitation from segmentation is not being able to read from the buffer that you're writing to.
Well I guess it depends on how intercommunication between your eDRAM segments is implemented. If data has to go through the main bus from one to the other, it becomes a performance issue how much you want to transfer between them.

I'm not entirely sure what 'swizzled texture' refers to on Xbox, btw - is it at all similar to PS2 swizzled textures? (I used the term 'reordered' myself, but apparently SCE folks thought swizzling sounded better).


Pana, yeah, that's what I was remembering. Anyway, I think I misread Vince's post anyhow; he said "in 100s", which kinda implies a few 100 anyway (damn sleepless nights).
 
Don't quote the RWT article, they mixed up 2 separate presentations. There was a presentation on a graphics chip with 32 MB eDRAM, but it was simply the extended GS.

They never actually manufactured a complete ASIC with the fast new memory macros, and the isolated macro they manufactured didn't perform at the level quoted... that performance was only achieved in simulation.
 
Dude, you have to run a GPU off that in addition to the CPU. It's possible that the latest ATI and nVidia cards will have bandwidth surpassing that... today. I just saw DeFuria post on GDDR3, I think; maybe he remembers, but the number was like 75GB/sec. eDRAM would have bandwidth in the 100s of GB/sec in that timeframe.


thank you, Vince :)


Nvidia and ATI should have over 50 GB/sec cards this year. 2005-2006 cards will no doubt surpass that. So Xbox 2's reported 51 GB/sec bandwidth is looking modest, IF that is for everything, a la Xbox's 6.4 GB/sec.

PS3's eDRAM bandwidth on the Visualizer, and perhaps the Broadband Engine as well, could reach into the 100s of GB/sec, which would greatly ease the strain on a 25 GB/sec Rambus XDR main memory. If PS3's main memory bandwidth is higher than 25 GB/sec, so much the better.
 
MfA said:
Don't quote the RWT article, they mixed up 2 separate presentations. There was a presentation on a graphics chip with 32 MB eDRAM, but it was simply the extended GS.

They never actually manufactured a complete ASIC with the fast new memory macros, and the isolated macro they manufactured didn't perform at the level quoted... that performance was only achieved in simulation.

Are you sure? I have it on good word from a Sony guy (Team Nishi will probably not say anything to you) that Sony had worked on an actual e-DRAM macro with buses 4x as wide as the GS's e-DRAM buses (they jokingly called it GS 1.5 [still, I cannot say that the ISSCC chip was the one using these new macros, but I cannot exclude that eventuality either]).


Also, if you are sure of what you say, please contact Paul DeMone at RWT so that the appropriate research and correction process can start.
 
Fafalada said:
DeanoC said:
From a software point of view I agree, but from a hardware view maybe it makes sense?
I should think it makes sense from a hw point of view, yeah - for one, afaik multiported memory is more complicated and expensive than using multiple single-ported blocks.
But I was talking from my greedy software point of view, where we all agree anyhow, apparently.

ERP said:
IMO the only real limitation from segmentation is not being able to read from the buffer that you're writing to.
Well I guess it depends on how intercommunication between your eDRAM segments is implemented. If data has to go through the main bus from one to the other, it becomes a performance issue how much you want to transfer between them.

I'm not entirely sure what 'swizzled texture' refers to on Xbox, btw - is it at all similar to PS2 swizzled textures? (I used the term 'reordered' myself, but apparently SCE folks thought swizzling sounded better).


Pana, yeah, that's what I was remembering. Anyway, I think I misread Vince's post anyhow; he said "in 100s", which kinda implies a few 100 anyway (damn sleepless nights).


Swizzling is just reordering to get better texture cache coherency. NVidia's reordering is basically an address bit twiddle that re-orders the texture into recursively defined squares of 4-pixel groups.
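
(For the curious: that kind of bit twiddle is usually called Morton or Z-order. A minimal sketch of the idea, not any vendor's actual layout - interleave the x and y bits of the texel address so small squares of texels become contiguous in memory:)

def morton_swizzle(x, y, bits=16):
    # Interleave the bits of x and y (Z-order / Morton order).
    # Neighbouring texels in a 2x2, 4x4, 8x8... square land near each
    # other in the linear address, which is what improves texture
    # cache coherency versus a plain row-major layout.
    addr = 0
    for i in range(bits):
        addr |= ((x >> i) & 1) << (2 * i)      # x bit -> even position
        addr |= ((y >> i) & 1) << (2 * i + 1)  # y bit -> odd position
    return addr

# A 4x4 block: row-major texels map to the recursive Z pattern
for y in range(4):
    print([morton_swizzle(x, y) for x in range(4)])
# [0, 1, 4, 5]
# [2, 3, 6, 7]
# [8, 9, 12, 13]
# [10, 11, 14, 15]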

FWIW I don't think render-to-texture will be an issue, and the benefits of the EDRAM in this case probably outweigh the additional software complexity.
 
Dude, you have to run a GPU off that in addition to the CPU. It's possible that the latest ATI and nVidia cards will have bandwidth surpassing that... today

Firstly, I'd be extremely surprised if ATI/Nvidia's next cards have more than 50GB/s memory bandwidth. Secondly, when Xbox was released there were graphics cards that had already been out for 6 months that had almost 50% more bandwidth than it. Did that stop Xbox being a very powerful console that produced graphics well above the aforementioned cards? Nope. 51GB/s sounds like it should be good for a console in mid to late 2005.
 
Just read the papers which were presented, one was presented by Sony (about the macro).

The other by the partner which designed the 32 MB GS (I don't remember their name). They certainly weren't using 714 MHz double-clocked eDRAM.

Marco

PS. It is perfectly feasible for some parts of the internal pipeline on the 32 MB GS to be 4 times as wide; that is not really what is important, though. The paper mentioning the 714 MHz memory macro is quite clearly entirely separate, and is quite explicit in stating that only a single macro was put on a chip (and it didn't perform all that well).

PPS. I don't feel like getting into a pissing match with Mr. DeMone; read the papers and make up your own mind.
 
Marco, I was not asking you to get into a pissing match with him; I will read the papers and e-mail him myself, then. Nor was I trying to get into a pissing contest with you (sorry if it seemed that way).

I was only asking in order to make the article I linked as correct as possible, as people do use it as a quick and easy reference.

PS. It is perfectly feasible for some parts of the internal pipeline on the 32 MB GS to be 4 times as wide; that is not really what is important, though. The paper mentioning the 714 MHz memory macro is quite clearly entirely separate, and is quite explicit in stating that only a single macro was put on a chip (and it didn't perform all that well).

I think you are right: I was not told about the GS 1.5 having a considerably increased clock frequency, only about the wider buses to the DRAM macros and some unspecified enhancements to the rendering core (minor ones; bigger ones were expected in the GS2, but that project was pre-CELL IIRC).

I want to thank you for reminding me about the performance of the DDR DRAM macro, as I do remember being told that SCE had experimented with DDR data signalling in e-DRAM but came away disappointed by the practical results (you awoke that memory of mine :) ).
 
Megadrive1988 said:
(minor ones; bigger ones were expected in the GS2, but that project was pre-CELL IIRC).

any info on what was planned for GS2, Panajev? 8) or that's it..... :?

That is it. :(

Edit: well, the GS2 was the chip which was supposed to get the first serious revision of the rendering core, but mostly in terms of clock speed IIRC (yes, some new features were considered).
 
That is it.

Edit: well, the GS2 was the chip which was supposed to get the first serious revision of the rendering core, but mostly in terms of clock speed IIRC (yes, some new features were considered).

kinda figured that. oh well :oops:

All I remember is what I read in articles such as those from EETimes.
Both the GS2 and EE2 were supposed to be "enhanced" architectures over the GS/EE.
 
...

1. How does Panajev get hold of an Xbox Next document??? Is he a developer???
2. Can you confirm the CPU count? Is it two, three, or four???
 