More Xenon rumors

Fox5 said:
But a video card can't access system memory over an AGP bus?

Yeah, it can, but it isn't a very flexible mechanism. Basically it supports block transfers, not random access, and is typically used to fetch textures and command lists for rendering the next frame. Sometimes rendered frames are downloaded back to the computer too (screenshot captures, for example).

And by extension couldn't the shaders?

No, there are no shader instructions to access system memory. Heck, I don't even think there are instructions to access GRAPHICS memory...
 
Just thinking out loud here, but wouldn't a 256-bit bus for the main memory increase the cost of producing the PCB? The ideal layer count in motherboard production is four, but a 256-bit bus would require more traces and more layers...

What if it turns out to be 128-bit memory but at silly speeds, 800MHz plus, which would simplify the board layout and require fewer layers, but would push up the voltage the RAM needs.

Increasing the efficiency of MAIN RAM with eDRAM and other such buffers helps enormously (look at GameCube) and it is a known fact that current motherboard MAIN RAM never peaks at its theoretical max.
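Putting rough numbers on that trade-off (back-of-envelope only; the widths and data rates below are just the hypothetical figures being thrown around in this thread, not confirmed specs, and ~25 GB/s is used as an arbitrary target because that's the ballpark being discussed):

```python
# Peak bandwidth = bus width (in bytes) x effective data rate (transfers/s).
# "MT/s" here means the effective DDR transfer rate, as this thread uses "MHz".
def peak_bandwidth_gbs(bus_bits, data_rate_mts):
    return (bus_bits / 8) * data_rate_mts * 1e6 / 1e9

print(peak_bandwidth_gbs(256, 800))   # 256-bit @ 800MT/s  -> 25.6 GB/s
print(peak_bandwidth_gbs(128, 1600))  # 128-bit @ 1600MT/s -> 25.6 GB/s
```

So to reach the same ~25 GB/s, the 128-bit board needs memory running at twice the data rate; it just moves the cost from PCB routing to RAM speed (and voltage).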
 
london-boy said:
Well, my GFFX doesn't get close to using all of its 256MB RAM in most situations anyway. By next year we will have even more RAM on cards, and the games using all of that RAM will come out long after those cards are released.

I don't think PCIe will have such a huge impact, just like AGP 8x didn't over 4x and 2x. Situations where the video card uses the main RAM are not that many. The amount of video RAM is not the bottleneck in video cards. Usually.

In the ATI Ruby demo on my 128MB 9700pro, even at 640x480 with no AA I end up using all the memory on my video card, and have to set my AGP aperture to 512MB before I get decent performance.
 
Fox5 said:
london-boy said:
Well, my GFFX doesn't get close to using all of its 256MB RAM in most situations anyway. By next year we will have even more RAM on cards, and the games using all of that RAM will come out long after those cards are released.

I don't think PCIe will have such a huge impact, just like AGP 8x didn't over 4x and 2x. Situations where the video card uses the main RAM are not that many. The amount of video RAM is not the bottleneck in video cards. Usually.

In the ATI Ruby demo on my 128MB 9700pro, even at 640x480 with no AA I end up using all the memory on my video card, and have to set my AGP aperture to 512MB before I get decent performance.

Is there a way to run that demo on a 5900U?
 
Tahir said:
Just thinking out loud here, but wouldn't a 256-bit bus for the main memory increase the cost of producing the PCB?

PCB cost is a minimal factor according to some in the graphics tech forum. It's just laminated fibreglass after all; I think most people would agree it's unlikely to compare to the cost of buying the 1600-ish MHz memory needed to get the 25+ GB/s bandwidth the proposed spec sheet calls for on a 128-bit bus...

helps enormously (look at GameCube) and it is a known fact that current motherboard MAIN RAM never peaks at its theoretical max.

Current main memory in PCs is held back by many factors: noise, capacitance, reflections and so on from traces, stubs and connectors. It's the fact that there are multiple memory chips sharing the same traces, sitting in several sockets, with connections that total upwards of a decimeter in length. Cut away the flab and the chips can be run with tighter timings at higher speeds. :)
 
Guden Oden said:
Tahir said:
Just thinking out loud here, but wouldn't a 256-bit bus for the main memory increase the cost of producing the PCB?

PCB cost is a minimal factor according to some in the graphics tech forum. It's just laminated fibreglass after all; I think most people would agree it's unlikely to compare to the cost of buying the 1600-ish MHz memory needed to get the 25+ GB/s bandwidth the proposed spec sheet calls for on a 128-bit bus...

helps enormously (look at GameCube) and it is a known fact that current motherboard MAIN RAM never peaks at its theoretical max.

Current main memory in PCs is held back by many factors: noise, capacitance, reflections and so on from traces, stubs and connectors. It's the fact that there are multiple memory chips sharing the same traces, sitting in several sockets, with connections that total upwards of a decimeter in length. Cut away the flab and the chips can be run with tighter timings at higher speeds. :)

Not really related, but would you say one 1GB stick of RAM overclocks better than two 512MB sticks?
 
london-boy said:
Fox5 said:
london-boy said:
Well, my GFFX doesn't get close to using all of its 256MB RAM in most situations anyway. By next year we will have even more RAM on cards, and the games using all of that RAM will come out long after those cards are released.

I don't think PCIe will have such a huge impact, just like AGP 8x didn't over 4x and 2x. Situations where the video card uses the main RAM are not that many. The amount of video RAM is not the bottleneck in video cards. Usually.

In the ATI Ruby demo on my 128MB 9700pro, even at 640x480 with no AA I end up using all the memory on my video card, and have to set my AGP aperture to 512MB before I get decent performance.

Is there a way to run that demo on a 5900U?

Yes, I think it may just be as simple as downloading the demo and running it. You may need some DLLs, though; I'm not sure.
 
Tahir said:
Just thinking out loud here, but wouldn't a 256-bit bus for the main memory increase the cost of producing the PCB? The ideal layer count in motherboard production is four, but a 256-bit bus would require more traces and more layers...

What if it turns out to be 128-bit memory but at silly speeds, 800MHz plus, which would simplify the board layout and require fewer layers, but would push up the voltage the RAM needs.

Increasing the efficiency of MAIN RAM with eDRAM and other such buffers helps enormously (look at GameCube) and it is a known fact that current motherboard MAIN RAM never peaks at its theoretical max.

The main cost difference between a 128-bit and a 256-bit bus is not (completely) in the layers or the PCB but in the design and verification needed for a correct implementation; a wide, high-clock bus is not so easy to develop, and here we are talking about chips that are clocked very high.
Why spend a lot on a 256-bit bus when you can build a very wide bus (like the 2560-bit bus in the PS2 architecture) with eDRAM?
And the point is that you don't need an insane amount of BW between main RAM and the graphics chip... imho.
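For reference, here is the kind of number a very wide on-die bus buys you (a rough sketch; the 2560-bit/150MHz figures are the commonly quoted PS2 GS numbers, nothing to do with any confirmed Xenon spec):

```python
# On-die eDRAM bandwidth = bus width (bytes) x clock.
# PS2's Graphics Synthesizer is usually quoted as 2560 bits at 150MHz.
def edram_bandwidth_gbs(bus_bits, clock_mhz):
    return (bus_bits / 8) * clock_mhz * 1e6 / 1e9

print(edram_bandwidth_gbs(2560, 150))  # -> 48.0 GB/s
```

That's roughly double the 25-ish GB/s being discussed for an external bus, without spending anything on pins or PCB traces.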
 
If the GPU has 10MB of eDRAM, it would mean the total bandwidth to memory is much greater than just the bandwidth to *main* memory.
 
I would vote for more eDRAM. That's about it. The rest of the specs seem fine and on par with what Sony will come out with.
 
jvd said:
The rest of the specs seem fine and on par with what Sony will come out with.

How do you know...?

Especially if MS can deliver a 3.5GHz XCPU2 made on 90nm at the end of 2005, it means Sony has a good chance of delivering the 4GHz "Holy" BE (it needs the 4 PUs, of course) by 2006... And then the prophecy will be fulfilled. :D

j/k, more or less, since I also want the teraflops PS3. BTW, those hypothetical Xenon specs are balanced (except the eDRAM, imo) but they're not mind-blowing compared to the BE (4PU setup, of course).

And that may explain J Allard's comment about a Sony hardware revolution and an MS software revolution... But I'm just thinking out loud.
He didn't talk about a Nintendo revolution, BTW. :LOL:
 
Hey, if Xbox 2 has, say, six 2GHz cores (of whatever CPU is in the G5), how powerful would that be? I mean, it would be pretty powerful, but I've seen Photoshop benchmarks where a single Opteron (or Athlon FX?) performs as well as four of the Mac CPUs. On the Xbox 2 there wouldn't be an OS to worry about, so make that maybe 2-3 cores = 1 Opteron, and if they can offer the performance of 2-3 top-end Opterons, that would still be really powerful. (How about Opterons running in 64-bit mode, though?) Possibly even a dual-core 3GHz Athlon 64 couldn't stand up to that.
 
vliw said:
The main cost difference between a 128-bit and a 256-bit bus is not (completely) in the layers or the PCB but in the design and verification needed for a correct implementation

I think you worry too much. By the time nextbox comes out, ATi will have been doing 256-bit buses for half a decade. They know how to implement this sort of stuff on the board level.

a wide, high-clock bus is not so easy to develop, and here we are talking about chips that are clocked very high.

No, we're not. 25-ish GB/s is not very high; you can achieve more than that with plain DDR memory, and nextbox will have DDR2 or GDDR3 mem.

Besides, it's not as if they'd need to reinvent the wheel or anything. The memory chip traces can pretty much be copy/pasted from a graphics card if they want to.

Why spend a lot on a 256-bit bus

What makes you think they'd need to spend a lot?

when you can build a very wide bus (like the 2560-bit bus in the PS2 architecture) with eDRAM?

eDRAM is typically not very big. It can't hold all geometry/texture data, and the CPU needs plenty of bandwidth too, in addition to networking and other I/O.

And the point is that you don't need an insane amount of BW between main RAM and the graphics chip... imho.

How do you figure that?
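To put a number on the "not very big" part (purely illustrative; the 10MB figure is the rumour from earlier in the thread and the resolution is just an example):

```python
# How far does ~10MB of eDRAM go if you try to keep the render targets on-die?
# Illustrative only: 1280x720 target, 32-bit colour + 32-bit Z/stencil, no AA.
width, height = 1280, 720
bytes_per_pixel = 4 + 4                                    # colour + Z/stencil
framebuffer_mb = width * height * bytes_per_pixel / 2**20
print(round(framebuffer_mb, 1))                            # ~7.0 MB before a single texture
```

Most of the pool is gone before you store one texture or vertex buffer, never mind multisampling, so the traffic to and from main memory still matters.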
 
The slow memory and the low amount make me question the specs.

I can understand them wanting to use slower memory for the main pool, but only having 256 megs doesn't make much sense.

I believe when the system is finally released we will see 512 megs of RAM with 40-50GB/s of bandwidth.
 
jvd said:
The slow memory and the low amount make me question the specs.

I can understand them wanting to use slower memory for the main pool, but only having 256 megs doesn't make much sense.

I believe when the system is finally released we will see 512 megs of RAM with 40-50GB/s of bandwidth.

I think any spec needs to be judged against the release date and price of the console. Release is likely to be Q4 2005, and the price could well be less than $299. It's quite feasible that M$ could release a lower-priced console, similar to what the GC is this gen. When R420s and NV40s are similarly priced today for just the cards, the XB2 console in less than 18 months seems like a bargain... but the danger to M$ is that the PS3 seems likely to be an even bigger bargain!
 
More eDRAM would obviously be nice. TSMC has a MoSys license, so the eDRAM they put into the Xbox 2 GPU will be of a much higher density than what was on the GameCube Flipper chip. Unless Flipper used 1T-SRAM-Q, which I don't think it did.

I really hope they don't use GDDR3 or DDR2 for the main system RAM pool. From a technical point of view I'd like to see Microsoft put in 512MB of RLDRAM II. Highly efficient utilization of bus bandwidth and very low latency makes a lot more sense to me.

With eDRAM providing oodles of bandwidth to the GPU and the CPU cores having L2 cache, why go with a memory technology that gives you bandwidth in exchange for bloated latency? The CPU certainly won't need an overwhelming amount of bandwidth, but a low-latency main RAM pool would be a boon to the entire system.

RLDRAM would give the Xbox 2 the best of both worlds: plenty of bandwidth with low latency.
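A rough illustration of why tRC matters more than peak bandwidth for random accesses (a toy model only; the bus width and data rate are hypothetical thread figures, and the tRC values are ballpark DDR2 and RLDRAM II numbers, not anything measured):

```python
# Toy model: every access hits a closed row in the same bank, so a new access
# can start only every tRC (worst case; bank interleaving improves both parts,
# but the tRC gap is the point).
def random_access_gbs(bus_bits, data_rate_mts, burst_len, trc_ns):
    bytes_per_burst = (bus_bits / 8) * burst_len
    burst_time_ns = burst_len / (data_rate_mts * 1e6) * 1e9
    cycle_ns = max(trc_ns, burst_time_ns)   # can't reopen the row any sooner
    return bytes_per_burst / cycle_ns       # bytes per ns == GB/s

# 128-bit bus at 800MT/s (12.8 GB/s peak), burst length 4:
print(random_access_gbs(128, 800, 4, 55))   # DDR2-class tRC ~55ns  -> ~1.2 GB/s
print(random_access_gbs(128, 800, 4, 20))   # RLDRAM II tRC ~20ns   -> 3.2 GB/s
```

The peak number on the box is the same either way; what changes is how much of it survives a latency-bound access pattern.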
 
Megadrive1988 said:
I absolutely agree, Brimstone, though I was not aware of RLDRAM's advantages over DDR2/GDDR3.

Here is an interesting article on RLDRAM II:

RLDRAM II Offers Full Bus Utilization

For years, an obvious distinction has existed between static random access memory (SRAM) and dynamic random access memory (DRAM). SRAM, comprised of six-transistor (6T) cells, accesses data more quickly but requires a large silicon area per bit. For example, in a 90nm device, a single cell typically spans approximately one square micron but can be accessed in just two or three clock cycles (depending on the perspective) at frequencies such as 250MHz. In contrast, mainstream DRAM is comprised of one-transistor (1T) cells yet has a relatively long data access time. DRAM's silicon area per bit is much smaller, typically 0.065 square microns at 90nm, or 1/15th the area of SRAM. This difference in cell sizes increases with each process generation.

SRAM's last remaining advantage is its ability to randomly access the array. A typical high-speed SRAM is capable of a new random access every clock cycle. The question for DRAM becomes, can relatively slow DRAM mimic faster SRAM and thus eliminate SRAM's final advantage?

The inflection point in the evolution of memory technology is the introduction of a low-latency memory leveraging inexpensive DRAM cells that operate at very high frequencies (e.g. 533MHz clock), very fast request rates (e.g. 2ns), and reasonably good random access repeat rates (e.g. 15ns). This new memory, reduced latency DRAM (RLDRAM), in particular its second generation, RLDRAM II, bridges the gap between SRAM and traditional DRAM.

Overview of Low-Latency DRAMs

Currently, two DRAM families - fast cycle RAM (FCRAM) and RLDRAM - target low random access latencies and fast bank cycle times, though, in the opinion of the author, RLDRAM offers vastly more flexibility and covers a far broader range of practical applications.

The first generation of RLDRAM targets a 600MHz data rate and 25ns tRC. RLDRAM II adds several advanced concepts to numerous features already demonstrated in technologies such as QDR SRAM and DDR SRAM. Micron introduced the first instantiation of this architecture in August 2003. Although published device goals specified an 800MHz data rate and 20ns tRC, a 900MHz data rate and 16ns tRC were demonstrated.

Network switches, routers and line cards need better DRAM solutions. The "trick" is not offering bandwidth - many devices do - but the means to make it sustainable. RLDRAM II's reduced tRC enables higher data availability than standard DRAM.

This large, diverse segment can best be described with a concrete example. A high-definition television (HDTV) is a consumer electronics device under great pricing pressure. HDTV's price will fall rapidly as consumer demand develops, but presently its numerous memory buses make it too expensive for the average consumer. To make its price more attractive, one device must provide the scratchpad memory for all its software functions; act as shadow memory for code; satisfy the high scan rate; and provide memory for processes such as decoding, standards conversion, etc. RLDRAM's low-latency/fast tRC is essential to satisfy these numerous, simultaneous demands. The 36-bit wide architecture is suitable for 3 x 12-bit color.

Performance Comparison Scenario


[Figure: sustainable bus utilization vs. read:write ratio for RLDRAM, FCRAM, DDR2 and GDDR3; not reproduced here]
This operating scenario examines device sensitivity to the read:write ratio. For example at R:W of 4:1 RLDRAM would read from bank 0, then bank 1, then bank 2, then bank 3 and then write to bank 4 with banks always available. For DDR2, the same 4:1 ratio assumes that bursts come from the already open row: activate, read, read, read, read, precharge, activate, write, precharge. Assumptions are those most favorable to the device. Results are shown in the Fig. The curves in the figure are most easily distinguishable at a read:write ratio of 1:1. It is plain that at any given frequency, RLDRAM outperforms any competing solution. For highly data-streamed applications, GDDR3 is the performance winner, but for read:write ratios of 1:1, RLDRAM SIO with 4- or 8-word bursts comes out on top. Further, for most applications having read:write ratios of 2:1 or greater (and 1:2 or less), RLDRAM common I/O (CIO) devices outperform all other solutions, regardless of clock frequency. In this comparison we assumed availability of 333MHz FCRAMs, though this has yet to be introduced into the market.

Many other scenarios have been analyzed which for the sake of brevity cannot be discussed in detail. One such scenario is particularly interesting. The above discussion ignores most internal DRAM resource availability issues such as a bank being busy at the time of a request. When this is accounted for, it can be demonstrated that the 8-bank architecture of the RLDRAM presents additional performance advantages. In a particular scenario of 16-word read followed by 16-word write requests, RLDRAM at 4-word burst outperforms DDR2 SDRAM by 3.8x and outperforms FCRAM by 1.52x at the same clock frequency. This performance margin is extended when maximum frequencies are considered. Comparing 266MHz DDR2 and FCRAM with 400MHz RLDRAM, the RLDRAM outperforms DDR2 by 5.7x and outperforms FCRAM by 2.28x.

DDR2 SDRAM

1.8V HSTL-style receivers provide maximum system performance. They have tighter input specifications relative to Vref than SSTL_18 receivers with compatible voltage and Vref levels. Differential clocks should be employed with DDR2 devices since they cannot tolerate single-ended clocks like RLDRAM II can. The output clock situation is complicated and needs special care. QK/QK# on RLDRAM differs from the optional output clocks RDQS/RDQS# on some versions of DDR2. The bus controller should have low impedance drive capability, compliant to SSTL_18 standards if high loads are expected. Of course, simulations should be performed with the desired topology to verify signal integrity and termination requirements.

An important distinction for systems requiring error detection and/or correction is that RLDRAM is based on nine bits while DDR2 is based on eight. So a 72-bit wide bus requires four x18 RLDRAMs or four x16s plus one x8 DDR2 SDRAM. Four devices are far easier to route than five. RLDRAM performs its work in 17 clock cycles, whereas DDR2 requires 36 clock cycles before the command sequence can be repeated. This scenario is as favorable as a comparison gets for DDR2, because it is assumed that burst operations can continue for an already open resource, whereas for RLDRAM II, it is assumed that the next data comes from a bank that will be available. Statistically, the RLDRAM II assumption is more likely to be true than that of DDR2.

QDR SRAM

Having "invented" the QDR SRAM, we naturally considered its feature set when we created the RLDRAM SIO device. At the time, we intended RLDRAM to be used on QDR-style buses, especially when the required density cannot be achieved with SRAM; when the cost of SRAM is too high; and when the SRAM frequency is inadequate. RLDRAM is available today at 400MHz clock, whereas QDR SRAM is only available up to 250MHz. RLDRAM actually outperforms QDR SRAM when data can be so ordered as to have sufficient availability. This is quite feasible if data is "chunked" into larger groups. Some systems' performances are limited entirely by the random command repeat rate. From that standpoint, with a tRC of 20ns, RLDRAM is equivalent to a 50MHz 4-word burst QDR SRAM. If, however, larger data groups can be used, the tRC limit fades. QDR II needs no assumptions; it will always respond to random requests. The comparison assumes availability of 300MHz QDR II SRAMs, which, so far, do not exist, and shows the slower 300MHz RLDRAMs, although faster parts have been produced. The challenge to the designer is data ordering and request constraining such that full bus utilization can be sustained. If achieved, the rewards are dramatic: 100% bus utilization, increased frequency, and lower cost.

RLDRAM II offers many advantages, including the industry's fastest tRC (16ns to 20ns), full bus utilization at 2-, 4-, and 8-word data burst lengths, and the lowest bus turnaround of any memory device previously produced. It is available in x9, x18, and x36 versions. The RLDRAM SIO version is the only lower-cost alternative to QDR SRAM. An SIO permits 100% bus utilization in situations having balanced (or nearly balanced) read-to-write ratios. The device has a flexible 1.5V/1.8V I/O. The outputs are impedance controlled for wide support of different system and loading topologies. On-die termination provides clean high-frequency operation. RLDRAM is optimized for the lowest system cost. (It also offers the lowest system power consumption, owing to its low 1.8V core voltage, high internal segmentation, high bank count, and smaller data access sizes, as compared with DDR2, for example.) It is scalable to higher frequencies and lower tRC values.

by J. Thomas Pawlowski, Senior Director of Architecture Development, Micron Technology, Inc, USA

(May 2004 Issue, Nikkei Electronics Asia)

http://neasia.nikkeibp.com/nea/200405/mspe_305291.html
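The key trick in the article is the bank interleave: as long as consecutive accesses land on different banks, each bank gets its full tRC back before it is hit again. A quick sanity check of that claim, using the article's own example numbers (the 4-word burst is just an assumption for illustration):

```python
# Sanity check of the 8-bank round-robin argument, using the article's figures
# (400MHz clock, 20ns tRC); the 4-word DDR burst is an illustrative assumption.
clock_mhz   = 400
burst_words = 4                 # words transferred per access
banks       = 8
trc_ns      = 20                # minimum time before the SAME bank can be reused

clock_ns      = 1e3 / clock_mhz               # 2.5 ns per clock
burst_time_ns = (burst_words / 2) * clock_ns  # DDR: 2 words per clock -> 5 ns of bus time

bank_revisit_ns = banks * burst_time_ns       # 40 ns between hits on any one bank
print(bank_revisit_ns >= trc_ns)              # True -> the bus can stay 100% busy
```

Run the same arithmetic with the 4 banks of a typical DDR2 part and a ~55ns-class tRC and the revisit time comes out at 20ns, well short of tRC, which is exactly why DDR2 can't sustain that kind of random-access pattern.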
 
Guden Oden said:
vliw said:
The main cost difference between a 128-bit and a 256-bit bus is not (completely) in the layers or the PCB but in the design and verification needed for a correct implementation

I think you worry too much. By the time nextbox comes out, ATi will have been doing 256-bit buses for half a decade. They know how to implement this sort of stuff on the board level.

a wide, high-clock bus is not so easy to develop, and here we are talking about chips that are clocked very high.

No, we're not. 25-ish GB/s is not very high; you can achieve more than that with plain DDR memory, and nextbox will have DDR2 or GDDR3 mem.

Besides, it's not as if they'd need to reinvent the wheel or anything. The memory chip traces can pretty much be copy/pasted from a graphics card if they want to.

Why spend a lot on a 256-bit bus

What makes you think they'd need to spend a lot?

when you can build a very wide bus (like the 2560-bit bus in the PS2 architecture) with eDRAM?

eDRAM is typically not very big. It can't hold all geometry/texture data, and the CPU needs plenty of bandwidth too, in addition to networking and other I/O.

And the point is that you don't need an insane amount of BW between main RAM and the graphics chip... imho.

How do you figure that?

You worry too much about the BW between main memory and the graphics chip, but if you have eDRAM you care more about the BW between the eDRAM and the graphics chip (in an eDRAM environment you can design a very wide bus and so achieve a lot of BW).
To render 3D graphics you need two things: an insane amount of floating-point power and an insane amount of BW between memory (eDRAM) and the graphics chip (ATI's logic units).
In the PC arena you must use other technologies, like GDDR3 and wide buses, that are not cost effective, but that is the only viable solution... imho.
 
vliw,

You didn't actually respond to my post; you just reiterated the position you'd already made public in your previous post. :?

Of course main memory bandwidth is important. Where else is the CPU going to get instructions and data structures from? Where will the GPU fetch textures from? And so on. One can't just rely on eDRAM; that doesn't solve anything.
 