Will L2 cache on X360's CPU be a major hurdle?

c0_re · Jul 18, 2005

I don't think either the PS3 or the 360 has ANY "major" hurdles. Both systems are very capable, and neither has a crippling hardware design flaw (although for some reason some people try to convince others otherwise).

Neither Microshaft nor Sony is a dumb company; they wouldn't design something if it would make life extremely difficult for devs.

Shifty Geezer · Jul 18, 2005

Some would beg to differ, given the PS2.

Jawed · Jul 18, 2005

Athlon 64s with 512K of cache have no problems taking out Intel junk with 2MB of cache in games - why should XB360 have any problems?

Jawed

Titanio · Jul 18, 2005

Jawed said:
Athlon 64s with 512K of cache have no problems taking out Intel junk with 2MB of cache in games - why should XB360 have any problems?

Jawed

How many PC games currently actually utilise heavy multithreading? I'm not sure if it's a good indicator of what's to come, or something to compare against.

Jawed · Jul 18, 2005

My point is that arguments centred on cache size might just as well be arguments centred on "bitness" or GFLOPs.

Gawd, is it that difficult to understand?

Jawed

Titanio · Jul 18, 2005

Jawed said:
My point is that arguments centred on cache size might just as well be arguments centred on "bitness" or GFLOPs.

Gawd, is it that difficult to understand?

Jawed

Your point didn't suggest that to me, but you've made it now.

patsu · Jul 18, 2005

I agree with c0_re.

MS, Sony, nVidia and ATI are all old hands at console development. They also profile existing games, prototypes and test code extensively. While there may be hiccups, I don't (refuse to) believe we will find a major technical mistake.

Back on topic w.r.t. Xenon's L2 cache... I think the PS3 team has similar findings (that hardware cache plays a minor role for large dynamic media data common in game programming).

* http://www.rambus.co.jp/events/Main1_2_SCE_Suzuoki.pdf

* http://arstechnica.com/articles/paedia/cpu/ps2vspc.ars/1 (See "dynamic media applications")

EDIT: Grammar

Gubbi · Jul 18, 2005

Titanio said:
Jawed said:

Athlon 64s with 512K of cache have no problems taking out Intel junk with 2MB of cache in games - why should XB360 have any problems?

Jawed

How many PC games currently actually utilise heavy multithreading? I'm not sure if it's a good indicator of what's to come, or something to compare against.

The majority of cache requests are going to be served by level 1 cache. Current games have a 98-99% hit rate in the 64KB D$ on the A64.

So the shared level 2 cache is going to take care of the level 1 cache misses from each core. Even if the 32KB D$ of each XeCPU core has a worse hit rate, it won't be by much.
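
As a rough back-of-the-envelope illustration of how L1 hit rate feeds into average access cost when an L2 catches the misses, here is a minimal Python sketch. The latencies (in cycles) and the L2 hit rate are made-up round numbers chosen for illustration, not measured figures for the A64 or XeCPU, and `average_access_cycles` is a hypothetical helper name.

```python
def average_access_cycles(l1_hit_rate, l2_hit_rate,
                          l1_cycles=3, l2_cycles=30, mem_cycles=500):
    """Average memory access time for a two-level cache hierarchy.

    All latencies are illustrative assumptions, not real hardware figures.
    """
    l1_miss_rate = 1.0 - l1_hit_rate
    l2_miss_rate = 1.0 - l2_hit_rate
    # Every access pays the L1 latency; L1 misses additionally pay the L2
    # latency; accesses that miss both levels also pay main-memory latency.
    return l1_cycles + l1_miss_rate * (l2_cycles + l2_miss_rate * mem_cycles)


if __name__ == "__main__":
    for l1_hit_rate in (0.99, 0.98, 0.96):
        cycles = average_access_cycles(l1_hit_rate, l2_hit_rate=0.90)
        print(f"L1 hit rate {l1_hit_rate:.0%}: ~{cycles:.1f} cycles per access")
```

With these assumed numbers, dropping from a 99% to a 98% L1 hit rate moves the average from about 3.8 to about 4.6 cycles per access. How much that matters in practice depends entirely on the real latencies and on how well the L2 absorbs the misses.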

Cheers
Gubbi

phat · Jul 18, 2005

Guden Oden said:
Oh and by the way, as someone else brought it up: the local store SRAM isn't zero wait-state. No SRAM running at 3.2GHz is going to be zero wait-state; it isn't physically possible, or at least not with current technology. Even SRAM running at a fraction of that speed has wait-states of a couple of cycles.

That is correct. My bad. I wanted to say that the SPE's LS is L1 memory, but then worried about people thinking I meant L1 cache. I chose to call it "0-wait state" memory instead since, from a programmer's perspective, L1 memory is "0-wait"--the execution pipeline is mated with L1 to mask its latency.

3dilettante · Jul 18, 2005

Shifty Geezer said:
LunchBox said:

Cache = scratch pad with labels

No. You cannot deliberately write to cache.

That's not exactly true any longer. Most CPU ISAs include prefetch instructions that specifically prefetch data into cache.

It's not entirely like memory, as the program still cannot define precisely where the data will be written, and cannot control whether the data will be overwritten prior to when the data is needed.

Some of the cache locking added to Xenon might take care of part of that problem.

Shifty Geezer · Jul 18, 2005

Though you can't, to the best of my knowledge, say 'read this data from L2, process, write back to L2, come back to it at a later date'. This seems to be something new to the next-gen console CPUs AFAIK, simply because such a feature in a generic PC component would never get used, as a dev could never be sure of it.

3dilettante · Jul 18, 2005

Shifty Geezer said:
Though you can't, to the best of my knowledge, say 'read this data from L2, process, write back to L2, come back to it at a later date'. This seems to be something new to the next-gen console CPUs AFAIK, simply because such a feature in a generic PC component would never get used, as a dev could never be sure of it.

Not directly, but it can be done, sort of.

This is dependent on whether or not the destination address has already been cached. If so, then the overwrite is done automatically and what you just said will occur transparently.

If you prefetch an address and it is later modified, the change will most likely be in cache, so long as the line is not overwritten or invalidated. It's somewhat risky, since without locking something could come along and overwrite it.
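
The eviction risk can be sketched with a toy model. The Python below is only an illustration under stated assumptions: `ToyCache` is a hypothetical, fully associative LRU cache with a made-up `lock` flag, and it does not describe how Xenon's cache locking (or any real CPU's replacement policy) actually works.

```python
from collections import OrderedDict


class ToyCache:
    """Tiny fully associative LRU cache with optional line locking.

    A toy model for illustration only; not a description of real hardware.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()   # address -> locked flag, in LRU order

    def touch(self, addr, lock=False):
        """Access `addr` (load, store, or prefetch); return True on a hit."""
        hit = addr in self.lines
        locked = self.lines.pop(addr, False) or lock
        self.lines[addr] = locked    # re-insert as most recently used
        if len(self.lines) > self.capacity:
            self._evict()
        return hit

    def _evict(self):
        # Pick the least recently used line that is not locked.
        victim = next((a for a, locked in self.lines.items() if not locked),
                      None)
        if victim is None:
            raise RuntimeError("all lines are locked")
        del self.lines[victim]


def survives_traffic(lock):
    """Prefetch one line, run unrelated traffic, then check if it is cached."""
    cache = ToyCache(capacity=4)
    cache.touch("buffer", lock=lock)      # prefetch the line we care about
    for other in range(8):                # unrelated traffic in between
        cache.touch(("other", other))
    return cache.touch("buffer")          # still cached when we come back?


if __name__ == "__main__":
    print("unlocked line survives:", survives_traffic(lock=False))
    print("locked line survives:  ", survives_traffic(lock=True))
```

In this model the unlocked line is pushed out by the intervening accesses, while the locked one is still present when the program comes back to it.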

seismologist · Jul 18, 2005

Gubbi said:
The majority of cache requests are going to be served by level 1 cache. Current games have a 98-99% hit rate in the 64KB D$ on the A64.

There must be some prefetching involved then. Or if the majority of accesses are within 64K wouldn't that contradict what people have been saying about the 256K of SPE local memory being too small?

overclocked · Jul 18, 2005

The majority of cache requests are going to be served by level 1 cache. Current games have a 98-99% hit rate in the 64KB D$ on the A64.

So the shared level 2 cache is going to take care of the level 1 cache misses from each core. Even if the 32KB D$ of each XeCPU core has a worse hit rate, it won't be by much.

You can't compare the Athlon 64 CPU with the CPU in the X360.
The design of this CPU looks more like a Prescott, and that's being nice.

Gubbi · Jul 18, 2005

seismologist said:
Gubbi said:

The majority of cache requests are going to be served by level 1 cache. Current games have a 98-99% hit rate in the 64KB D$ on the A64.

There must be some prefetching involved then.

No doubt there is. Interestingly miss rates are very low even in the absence of the explicit prefetch hints. The automatic prefetcher must do a fairly good job. I'll try to upload some data on this later.

seismologist said:
Or if the majority of accesses are within 64K wouldn't that contradict what people have been saying about the 256K of SPE local memory being too small?

Wasn't me.

My beef with the SPEs is not the size of the local store, but the lack of demand loading and automatic coherence.

Cheers
Gubbi

ERP · Jul 18, 2005

seismologist said:
Gubbi said:

The majority of cache requests are going to be served by level 1 cache. Current games have a 98-99% hit rate in the 64KB D$ on the A64.

There must be some prefetching involved then. Or if the majority of accesses are within 64K wouldn't that contradict what people have been saying about the 256K of SPE local memory being too small?

Caches on CPUs work because, in general, code tends to use a number of relatively simple access patterns, which are extremely coherent over any small time period.

Where the local memories fail is in cases like a tree walk: frame to frame you will generally access a small, almost identical subset of the data structure, but the data structure itself may be very large in comparison to that subset.

The advantage of the local memory is that it forces you to think about how best to organise your data. The disadvantage is that it forces you to think about how to organise your code and data. If you can't reasonably partition the data, you can't run the algorithm on the SPE. If you can, then you probably should.
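
The tree-walk case can be illustrated with a toy simulation. This is a minimal Python sketch under loud assumptions: the workload generator `frame_accesses`, the node counts and the cache capacities are all invented for illustration, and a fully associative LRU cache stands in for real hardware.

```python
from collections import OrderedDict
import random


def lru_hit_rate(accesses, capacity):
    """Replay `accesses` through an LRU cache holding `capacity` entries
    and return the fraction of accesses that hit."""
    cache = OrderedDict()
    hits = 0
    for addr in accesses:
        if addr in cache:
            hits += 1
            cache.move_to_end(addr)        # mark as most recently used
        else:
            cache[addr] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict least recently used
    return hits / len(accesses)


def frame_accesses(num_nodes, subset_size, frames, seed=0):
    """Hypothetical workload: every frame touches the same small subset of
    a large node pool, with one node of the subset replaced per frame."""
    rng = random.Random(seed)
    subset = rng.sample(range(num_nodes), subset_size)
    trace = []
    for _ in range(frames):
        trace.extend(subset)
        subset[rng.randrange(subset_size)] = rng.randrange(num_nodes)
    return trace


if __name__ == "__main__":
    trace = frame_accesses(num_nodes=1_000_000, subset_size=200, frames=100)
    for capacity in (512, 100):
        rate = lru_hit_rate(trace, capacity)
        print(f"capacity {capacity}: hit rate {rate:.3f}")
```

When the cache can hold the per-frame subset, nearly every access hits even though the whole structure is thousands of times larger, and nobody had to say in advance which nodes would be needed. With a local store, the program would have to work out that subset and transfer it explicitly. The smaller capacity shows the flip side: once the working set no longer fits, an LRU cache thrashes.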

3dilettante · Jul 18, 2005

seismologist said:
Gubbi said:

The majority of cache requests are going to be served by level 1 cache. Current games have a 98-99% hit rate in the 64KB D$ on the A64.

There must be some prefetching involved then. Or if the majority of accesses are within 64K wouldn't that contradict what people have been saying about the 256K of SPE local memory being too small?

Well, for one thing the SPE's local storage serves as both data and instruction storage. In the case of the A64, this would up the requirement to 128 kB.

The other issue is that the way SPEs are given chunks of data and instructions to run a task means that piecemeal prefetching could be harmful.

Gubbi · Jul 18, 2005

overclocked said:
The majority of cache requests are going to be served by level 1 cache. Current games have a 98-99% hit rate in the 64KB D$ on the A64.

You can't compare the Athlon 64 CPU with the CPU in the X360.
The design of this CPU looks more like a Prescott, and that's being nice.

Of course I can. XeCPU has 32KB D$ and 1MB L2 cache. A64 has 64KB D$ and ½MB L2 cache - fairly similar.

Cheers
Gubbi

overclocked · Jul 18, 2005

Of course I can. XeCPU has 32KB D$ and 1MB L2 cache. A64 has 64KB D$ and ½MB L2 cache - fairly similar.

Well, you can, but it won't reflect reality.

Gubbi · Jul 18, 2005

overclocked said:
Of course I can. XeCPU has 32KB D$ and 1MB L2 cache. A64 has 64KB D$ and ½MB L2 cache - fairly similar.

Well, you can, but it won't reflect reality.

In what way?

Or do you think the D$ hit rate will plummet going from 64KB to 32KB?

Cheers
Gubbi
