Will L2 cache on X360's CPU be a major hurdle?

I don't think either the PS3 or the 360 has ANY "major" hurdles. Both systems are very capable, and neither has a crippling hardware design flaw (although for some reason some people try to convince others otherwise).

Neither Microshaft nor Sony is a dumb company; they wouldn't design something that would make life extremely difficult for devs.
 
Athlon 64s with 512K of cache have no problems taking out Intel junk with 2MB of cache in games - why should XB360 have any problems?

Jawed
 
Jawed said:
Athlon 64s with 512K of cache have no problems taking out Intel junk with 2MB of cache in games - why should XB360 have any problems?

Jawed

How many PC games currently actually utilise heavy multithreading? I'm not sure if it's a good indicator of what's to come, or something to compare against.
 
My point is that arguments centred on cache size might just as well be arguments centred on "bitness" or GFLOPs.

Gawd, is it that difficult to understand?

Jawed
 
Jawed said:
My point is that arguments centred on cache size might just as well be arguments centred on "bitness" or GFLOPs.

Gawd, is it that difficult to understand?

Jawed

Your point didn't suggest that to me, but you've made it now :)
 
I agree with c0_re.

MS, Sony, nVidia and ATI are all old hands at console development. They also profile existing games, prototypes and test code extensively. While there may be hiccups, I don't (refuse to) believe we will find a major technical mistake.

Back on topic w.r.t. Xenon's L2 cache... I think the PS3 team has similar findings (that hardware cache plays a minor role for large dynamic media data common in game programming).

* http://www.rambus.co.jp/events/Main1_2_SCE_Suzuoki.pdf

* http://arstechnica.com/articles/paedia/cpu/ps2vspc.ars/1 (See "dynamic media applications")

EDIT: Grammar
 
Titanio said:
Jawed said:
Athlon 64s with 512K of cache have no problems taking out Intel junk with 2MB of cache in games - why should XB360 have any problems?

Jawed

How many PC games currently actually utilise heavy multithreading? I'm not sure if it's a good indicator of what's to come, or something to compare against.

The majority of cache requests are going to be served by level 1 cache. Current games have a 98-99% hit rate in the 64KB D$ on the A64.

So the shared level 2 cache is going to take care of the level 1 cache misses from each core. Even if the 32KB D$ of each XeCPU core has a worse hit rate, it won't be by much.
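To put rough numbers on that argument, here is a minimal sketch of the usual average-memory-access-time arithmetic for a two-level cache. The function name `amat` and every latency and L2 hit-rate figure below are illustrative assumptions, not measurements of the A64 or the XeCPU; only the 98-99% L1 hit rate comes from the post above.

```python
def amat(l1_hit_rate, l1_latency, l2_hit_rate, l2_latency, mem_latency):
    """Average memory access time, in cycles, for an L1 + L2 hierarchy.

    Every access pays the L1 latency; L1 misses additionally pay the L2
    latency; L2 misses additionally pay the main-memory latency.
    """
    l1_miss = 1.0 - l1_hit_rate
    l2_miss = 1.0 - l2_hit_rate
    return l1_latency + l1_miss * (l2_latency + l2_miss * mem_latency)


if __name__ == "__main__":
    # Assumed figures for illustration only: 3-cycle L1, 20-cycle L2,
    # 500-cycle main memory, 90% L2 hit rate.
    for l1_hit in (0.99, 0.98, 0.97):
        cycles = amat(l1_hit, 3, 0.90, 20, 500)
        print(f"L1 hit rate {l1_hit:.0%}: about {cycles:.1f} cycles per access")
```

Under these assumed latencies, each percentage point of L1 hit rate lost costs a fixed number of extra cycles per access, so how much a smaller D$ hurts depends on how far the hit rate actually drops and on how well the L2 catches the misses.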

Cheers
Gubbi
 
Guden Oden said:
Oh and by the way, as someone else brought it up: the local store SRAM isn't zero wait-state. No SRAM running at 3.2GHz is going to be zero wait-state, it isn't physically possible, or at least not with current technology. Even SRAM running at a fraction of that speed has wait-states of a couple of cycles.

That is correct. My bad. I wanted to say that the SPE's LS is L1 memory, but then worried about people thinking I meant L1 cache. I chose to call it "0-wait state" memory instead since, from a programmer's perspective, L1 memory is "0-wait"--the execution pipeline is mated with L1 to mask its latency.
 
Shifty Geezer said:
LunchBox said:
Cache = scratch pad with labels
No. You cannot deliberately write to cache.

That's not exactly true any longer. Most cpu ISAs include prefetch instructions that specifically prefetch data into cache.

It's not entirely like memory, as the program still cannot define precisely where the data will be written, and cannot control whether the data will be overwritten prior to when the data is needed.

Some of the cache locking added to Xenon might take care of part of that problem.
 
Though you can't, to the best of my knowledge, say 'read this data from L2, process it, write it back to L2, and come back to it at a later date'. This seems to be something new to the next-gen console CPUs AFAIK, simply because such a feature in a generic PC component would never get used, as a dev could never be sure of it.
 
Shifty Geezer said:
Though you can't, to the best of my knowledge, say 'read this data from L2, process it, write it back to L2, and come back to it at a later date'. This seems to be something new to the next-gen console CPUs AFAIK, simply because such a feature in a generic PC component would never get used, as a dev could never be sure of it.

Not directly, but it can be done, sort of.

This is dependent on whether or not the destination address has already been cached. If so, then the overwrite is done automatically and what you just said will occur transparently.

If you prefetch an address and it is later modified, the change will most likely be in cache, so long as the line is not overwritten or invalidated. It's somewhat risky, since without locking something could come along and overwrite it.
 
Gubbi said:
The majority of cache requests are going to be served by level 1 cache. Current games have a 98-99% hit rate in the 64KB D$ on the A64.

There must be some prefetching involved then. Or if the majority of accesses are within 64K wouldn't that contradict what people have been saying about the 256K of SPE local memory being too small?
 
The majority of cache requests are going to be served by level 1 cache. Current games have a 98-99% hit rate in the 64KB D$ on the A64.

So the shared level 2 cache is going to take care of the level 1 cache misses from each core. Even if the 32KB D$ of each XeCPU core has a worse hit rate, it won't be by much.

You can't compare the Athlon 64 CPU with the CPU in the X360.
The design of this CPU looks more like a Prescott, and that's being nice.
 
seismologist said:
Gubbi said:
The majority of cache requests are going to be served by level 1 cache. Current games have a 98-99% hit rate in the 64KB D$ on the A64.

There must be some prefetching involved then.

No doubt there is. Interestingly miss rates are very low even in the absence of the explicit prefetch hints. The automatic prefetcher must do a fairly good job. I'll try to upload some data on this later.

seismologist said:
Or if the majority of accesses are within 64K wouldn't that contradict what people have been saying about the 256K of SPE local memory being too small?

Wasn't me. :)

My beef with the SPEs is not the size of the local store, but the lack of demand loading and automatic coherence.

Cheers
Gubbi
 
seismologist said:
Gubbi said:
The majority of cache requests are going to be served by level 1 cache. Current games have a 98-99% hit rate in the 64KB D$ on the A64.

There must be some prefetching involved then. Or if the majority of accesses are within 64K wouldn't that contradict what people have been saying about the 256K of SPE local memory being too small?

Caches on CPUs work because, in general, code tends to use a number of relatively simple access patterns, which are extremely coherent over any small time period.

Where the local memories fail is in cases like a tree walk: frame to frame you will generally access a small subset, and almost exactly the same subset, of the data structure, but the data structure itself may be very large in comparison to that subset.

The advantage of the local memory is that it forces you to think about how best to organise your data. The disadvantage is that it forces you to think about how to organise your code and data. If you can't reasonably partition the data, you can't run the algorithm on the SPE. If you can, then you probably should.
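The tree-walk case can be illustrated with a toy simulation. This is a minimal sketch under loud assumptions: a fully associative LRU cache (real caches are set-associative), one node per cache line, and the same "hot" subset of nodes visited in the same order every frame. `LRUCache`, `simulate_frames` and all sizes are hypothetical names and values chosen for illustration.

```python
from collections import OrderedDict
import random


class LRUCache:
    """Toy fully associative cache with least-recently-used replacement."""

    def __init__(self, capacity_lines):
        self.capacity = capacity_lines
        self.lines = OrderedDict()
        self.hits = 0
        self.misses = 0

    def access(self, line):
        if line in self.lines:
            self.lines.move_to_end(line)        # mark as most recently used
            self.hits += 1
        else:
            self.misses += 1
            self.lines[line] = True
            if len(self.lines) > self.capacity:
                self.lines.popitem(last=False)  # evict least recently used

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def simulate_frames(total_nodes=100_000, hot_nodes=400, cache_lines=512,
                    frames=50, seed=1):
    """Visit the same small subset of a large structure once per frame."""
    rng = random.Random(seed)
    hot = rng.sample(range(total_nodes), hot_nodes)  # the per-frame subset
    cache = LRUCache(cache_lines)
    for _ in range(frames):
        for node in hot:
            cache.access(node)
    return cache.hit_rate()


if __name__ == "__main__":
    print("subset fits in cache:   ", simulate_frames(hot_nodes=400))
    print("subset exceeds capacity:", simulate_frames(hot_nodes=600))
```

The cache finds the hot subset on its own after the first frame, without the program knowing in advance which nodes will be touched. A local store gets no such help: the code has to know, or guess, which part of the large structure to bring in.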
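For data that does partition cleanly, the organising work looks roughly like the sketch below. It is only an illustration in Python rather than SPE code: the list slice stands in for a DMA transfer into the local store, and `CODE_BYTES`, `ITEM_BYTES`, `chunk_ranges` and `process_in_chunks` are hypothetical names and values. Only the 256 KB local store size comes from the thread.

```python
LOCAL_STORE_BYTES = 256 * 1024  # SPE local store size quoted in the thread
CODE_BYTES = 64 * 1024          # assumption: space reserved for code and stack
ITEM_BYTES = 16                 # assumption: size of one data element


def chunk_ranges(n_items, buffers=2):
    """Split n_items into (start, end) ranges that fit the data budget.

    buffers=2 leaves room for double buffering, where one chunk is
    processed while the next is being transferred (an assumption here).
    """
    budget = (LOCAL_STORE_BYTES - CODE_BYTES) // buffers
    per_chunk = budget // ITEM_BYTES
    return [(start, min(start + per_chunk, n_items))
            for start in range(0, n_items, per_chunk)]


def process_in_chunks(data, kernel):
    """Apply kernel to every element, one local-store-sized chunk at a time."""
    out = []
    for start, end in chunk_ranges(len(data)):
        local = data[start:end]  # stands in for a DMA copy into local store
        out.extend(kernel(x) for x in local)
    return out


if __name__ == "__main__":
    print(chunk_ranges(10_000))
    print(process_in_chunks(list(range(10)), lambda x: x * 2))
```

A flat array splits this way with no trouble. A pointer-linked tree whose hot subset is scattered across memory has no obvious `chunk_ranges` equivalent, which is the case described above as not reasonably partitionable.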
 
seismologist said:
Gubbi said:
The majority of cache requests are going to be served by level 1 cache. Current games have a 98-99% hit rate in the 64KB D$ on the A64.

There must be some prefetching involved then. Or if the majority of accesses are within 64K wouldn't that contradict what people have been saying about the 256K of SPE local memory being too small?

Well, for one thing the SPE's local storage serves as both data and instruction storage. In the case of the A64, this would up the requirement to 128 kB.

The other issue is that the way SPEs are given chunks of data and instructions to run a task means that piecemeal prefetching could be harmful.
 
overclocked said:
The majority of cache requests are going to be served by level 1 cache. Current games have a 98-99% hit rate in the 64KB D$ on the A64.

You can't compare the Athlon 64 CPU with the CPU in the X360.
The design of this CPU looks more like a Prescott, and that's being nice.

Of course I can. XeCPU has 32KB D$ and 1MB L2 cache. A64 has 64KB D$ and ½MB L2 cache, - fairly similar.

Cheers
Gubbi
 
overclocked said:
Of course I can. XeCPU has 32KB D$ and 1MB L2 cache. A64 has 64KB D$ and ½MB L2 cache, - fairly similar.

Well, you can, but it won't reflect reality.

In what way?

Or do you think D$ hit rate will plummet going from 64KB to 32KB? :rolleyes:

Cheers
Gubbi
 