Xbox 360 CPU Finally Revealed

Why would it have to be a "multiplier"? The bus interface is obviously run asynchronously from the rest of the CPU. I believe the document even states there are separate PLLs on the chip for bus and CPU, which means separate clock signals.
 
Guden Oden said:
Why would it have to be a "multiplier"? The bus interface is obviously run asynchronously from the rest of the CPU. I believe the document even states there are separate PLLs on the chip for bus and CPU, which means separate clock signals.

The reason is that when you have an asynchronous clock boundary crossing you end up spending more hardware and having higher latencies. In general, one would like integer or half-integer gearboxes throughout the system, as it is much more robust and higher performance. Also, having non-asynchronous interfaces makes test/debug a lot easier.

The fact that you have separate PLLs or clocks is an orthoganol issue to whether the clocks in the system are integer or half-integer multipliers of each other.
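To make "integer or half-integer multipliers" concrete, here is a minimal sketch in Python. The 3.2 GHz core clock is the figure used elsewhere in this thread; the bus clocks below are hypothetical values picked only for illustration, not numbers from the IBM document, and the function names are made up for this example.

```python
from fractions import Fraction


def gear_ratio(fast_hz: int, slow_hz: int) -> Fraction:
    """Exact ratio between two clock frequencies given in Hz."""
    return Fraction(fast_hz, slow_hz)


def is_integer_or_half_integer(ratio: Fraction) -> bool:
    """True if ratio is n or n + 1/2, i.e. twice the ratio is a whole number."""
    return (2 * ratio).denominator == 1


CORE_HZ = 3_200_000_000  # 3.2 GHz core clock

# Hypothetical bus clocks, for illustration only.
for bus_hz in (1_600_000_000, 1_280_000_000, 1_350_000_000):
    ratio = gear_ratio(CORE_HZ, bus_hz)
    print(bus_hz, ratio, is_integer_or_half_integer(ratio))
```

Note the check only looks at the frequency ratio. It says nothing about how many PLLs generate the clocks, which is the point being made above.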

Aaron Spink
speaking for myself inc.
 
aaronspink said:
The reason is that when you have an asynchronous clock boundary crossing you end up spending more hardware and having higher latencies.
Considering there'd be I/O buffers at the bus interface in any case, whether it runs synchronously or not, I really doubt a few cycles of extra latency's going to be very noticeable. It is a serial-like interface after all, and those tend to require a bit of buffering anyway. A synchronous interface isn't going to be as flexible, since it makes the FSB's performance subservient to that of the CPU die as a whole. With an asynchronous design the core and FSB speeds can be maxed out independently. I'd think that's the reason IBM/MS went in this direction.

as it is much more robust and higher performance.
Which explains why every single GPU since the TNT has had synchronous bus, GPU and memory clocks. ;)

The fact that you have separate PLLs or clocks is an orthoganol issue to whether the clocks in the system are integer or half-integer multipliers of each other.
When people use big words it's always a big plus if they know how to spell them. :p In any case, it may be orthogonal, but if the buses were synchronized there'd be no point in having two separate PLLs. Obviously the system was designed to be flexible; CPU core, FSB, GPU core and GDDR clocks are all decoupled from each other, and much the same relationship exists in PS3 I might add. Obviously the hardware engineers felt this is a superior design or else they wouldn't have bothered.

For a PC, where you're building a tiered system that to a very large extent builds on previous generations, and where some products are destined for the low end and some for the high end, having everything nice and synchronized might be the way to go. When dealing with what has to be very fast yet cheap hardware whose properties (including cost-effectiveness at any particular performance level) are essentially unknown at the time of design, it wouldn't surprise me if the safer choice is going asynchronous.

So what's best on paper is one thing. When something else entirely was built in reality (more than once, I might add), I tend to trust reality. ;)
 
quick question...

from the IBM link said:
Floating point instructions are sent to a combined VMX/FPU unit, which has available two simultaneous threads for the VMX and two for the FPU. Once again, the delayed-execution issue queue reduces load latency to two cycles. The load/store unit (LSU) might operate out-of-order with respect to the FPU, but the final results are correct. Each stage in the FP/VMX is also 11 FO4. As a result the pipelines are quite deep and result in significant delay for instruction completion. Scalar double-precision floating point operations have 10-cycle latency. VMX operations have four or 14-cycle latency, depending on the operation.

how much is that exactly?
 
LunchBox said:
how much is that exactly?
How much is a double-precision floating point number? It's 64 bits. How much is 10 cycles? Well, at 3.2 GHz a cycle is 0.3125 nanoseconds, so 10 cycles is 3.125 ns.
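The conversion is just cycles divided by clock frequency. A minimal sketch in Python, assuming the 3.2 GHz clock mentioned above and the cycle counts from the IBM quote (10 for scalar double-precision, 4 or 14 for VMX); the function name is made up for this example:

```python
def cycles_to_ns(cycles: int, clock_hz: float = 3.2e9) -> float:
    """Convert a latency in clock cycles to nanoseconds at the given clock."""
    return cycles / clock_hz * 1e9


# Cycle counts taken from the quoted IBM text; 1 is included to show the cycle time.
for label, cycles in (
    ("one cycle", 1),
    ("VMX, short", 4),
    ("scalar double-precision FP", 10),
    ("VMX, long", 14),
):
    print(f"{label}: {cycles} cycles = {cycles_to_ns(cycles):.4f} ns")
```

This only converts pipeline latency to wall-clock time. It says nothing about throughput, since a pipelined unit can still start new operations while earlier ones are in flight.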
 