What happened to SLDRAM?

Deep buffers, prefetching, caching, speculative data loading and really high clock rates will help you overcome many of the latency problems. But to overcome them you'll need bandwidth, because you move data before you need it, not after; ideally it arrives right when you need it.
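
One rough way to see why hiding latency costs bandwidth is Little's law: to keep the memory pipe full, the data in flight has to equal latency times bandwidth. The sketch below is only an illustration; the function names, the 100 ns latency, the 4 GB/s bandwidth and the 64-byte line size are hypothetical values, not figures from this thread.

```python
import math

def bytes_in_flight(latency_ns: float, bandwidth_gb_s: float) -> float:
    """Little's law: bytes that must be outstanding to sustain the bandwidth.

    1 GB/s equals 1 byte/ns, so the product is already in bytes.
    """
    return latency_ns * bandwidth_gb_s

def outstanding_requests(latency_ns: float, bandwidth_gb_s: float,
                         line_bytes: int = 64) -> int:
    """Number of cache-line requests that must be in flight at once."""
    return math.ceil(bytes_in_flight(latency_ns, bandwidth_gb_s) / line_bytes)

# Hypothetical numbers: 100 ns latency, 4 GB/s sustained, 64-byte lines.
print(bytes_in_flight(100, 4.0))
print(outstanding_requests(100, 4.0))
```

The longer the latency, the more requests have to be issued ahead of need, which is where the deep buffers and prefetching come in.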
 
Saem said:
Deep buffers, prefetching, caching, speculative data loading and really high clock rates will help you overcome many of the latency problems. But to overcome them you'll need bandwidth, because you move data before you need it, not after; ideally it arrives right when you need it.
"ideally"

Entropy
 
Fine, fine, "IDEALLY."

Nevertheless, we have to make trade-offs in different places to get the desired performance. Of course, I'm a big believer in the right tool for the job, and if it were up to me, I'd make multiple types of MPUs as functional units within the main MPU and have different memory types to handle different situations. But that's getting expensive. ;)
 
Saem said:
Fine, fine, "IDEALLY."

Nevertheless, we have to make trade-offs in different places to get the desired performance. Of course, I'm a big believer in the right tool for the job, and if it were up to me, I'd make multiple types of MPUs as functional units within the main MPU and have different memory types to handle different situations. But that's getting expensive. ;)

:)
Yup. I've been fortunate in that scientific computing can afford and justify more specialized hardware; most of what is new in PC-space has been used for a long time in more expensive computers. Applications differ though, and microprocessors have had the edge in straight scalar, non-bandwidth-dependent performance for some time. As I'm sure you are aware, all the features you bring up have been used for a long time in scientific/engineering computers, with much larger caches, particularly relative to the core-to-main-memory speed gap. The situation with PC CPUs is much, much worse in terms of the CPUs outpacing the memory systems that feed them.

So I'd consider latency to be quite important along with bandwidth.

It will be interesting to compare the Hammer with its 333MHz bus Athlon siblings, as it doesn't have all that much to gain in IPC over the Athlon according to AMD's presentations, and nominal bandwidth will be identical. Most of the gain we see will be from the improved main memory latency, so it will give us some data to chew on pertaining to the relative importance of different parameters in the memory hierarchy, over a range of applications.

To me, this is known as "fun".
To others, the above sentence might justifiably be called "perverse". :)

Entropy
 
I think it's fun as well.

BTW, there should be significant IPC gains in the way of the FPU on the Hammer, since that unit is horribly imbalanced when you take into consideration the pipe that feeds the processor. An Athlonesque FPU on the P4 would be disgustingly strong. And I think the reason Intel really got rid of the second FPU pipe is because it would have spanked IA64 processors so bad, it's not funny.
 

What I meant to say was that I really don't think I'll see anything interesting from Micron. Maybe it will do well in some specialized market segment or something.

The nForces are 4 layer and run fairly well. Same with the i845s and so on. At least that's my understanding. I'm not sure where you're getting 6 layers from, unless we're heading into workstation and low-end server.

The nForces have stability problems. There are a number of DDR SDRAM boards that are six layer, IIRC.

If latency weren't important, RDRAM would have been much faster than SDRAM from the beginning. Most benchmarks show that latency is really important. Latency means that the CPU is idle for as long as it takes to feed the first data.

Which benchmarks? RIMM4200 can have quite a bit less latency than PC1066, but the gains are meager at best. The P4 masks latency fairly well. Unfortunately, it is hard to mask low BW. If the BW of RAM were anywhere close to the needs of the processor, latency would become more important.

The interface of RDRAM runs at 533 MHz. RDRAM is DDR, like normal DDR-SDRAM.
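
As a rough sanity check on what a 533 MHz double-data-rate interface delivers, peak bandwidth is clock times transfers per clock times bus width. The sketch below assumes a 16-bit (2-byte) PC1066 RDRAM channel; that width and the function name are assumptions for illustration and are not stated in the post above.

```python
def peak_bandwidth_mb_s(clock_mhz: int, transfers_per_clock: int,
                        bus_bytes: int) -> int:
    """Nominal peak bandwidth in MB/s: clock x transfers/clock x bus width."""
    return clock_mhz * transfers_per_clock * bus_bytes

# Assumed: 533 MHz clock, DDR (2 transfers per clock), 2-byte channel.
single_channel = peak_bandwidth_mb_s(533, 2, 2)
print(single_channel)       # one channel
print(2 * single_channel)   # two channels side by side
```

Peak numbers like these say nothing about latency, which is the point under dispute here.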

:rolleyes:

BTW, there should be significant IPC gains in the way of the FPU on the Hammer, since that unit is horribly imbalanced when you take into consideration the pipe that feeds the processor. An Athlonesque FPU on the P4 would be disgustingly strong. And I think the reason Intel really got rid of the second FPU pipe is because it would have spanked IA64 processors so bad, it's not funny.

Have you seen the SPEC scores for the Itanium? The FPU scores are incredible. Even with a second FPU, I would have a tough time seeing the P4 score higher in SPEC.
 
Elmic,

You're kidding me, right? Have you seen the throughput on the FPU in the P4? The extra pipe would have made a fair bit of difference; I'm sure there are enough operations in the SPEC tests that would benefit from the P4 not lagging behind due to a non-pipelined operation. Too bad I can't profile the code, but I'm pretty sure it would have done some serious damage to Itanium. It already beats it, but if the pipeline was there then it would have been a thrashing. Itanium 2 is a slightly different story, the reasons being a very large cache (which would help on the tests with large data sets) and double the system bandwidth of the P4. Even then, the P4 wouldn't have been that far behind, with comparable performance. The Xeons with the extra cache (L3) might have been really powerful.
 
Why are you posting Athlon scores? I've already stated it's being held back by its system bandwidth.

As for the P4, I believe the 2.8 scores in the high 800s. Too lazy to check.
 
Didn't see P4 scores where I was looking. Of course it is kind of silly to compare the two since their market segments won't overlap much.
 