Will L2 cache on X360's CPU be a major hurdle?

Discussion in 'Console Technology' started by Korrupt, Jul 18, 2005.

  1. PC-Engine

    Banned

    Joined:
    Feb 7, 2002
    Messages:
    6,799
    Likes Received:
    12
    You can lock portions of the cache. This is a well-known fact.
     
  2. cobragt

    Newcomer

    Joined:
    May 22, 2005
    Messages:
    31
    Likes Received:
    0
    That doesn't sound right at all to me. The SPEs have their own cache, which is their local memory.

    Edit:
    The 256 KB high-speed local memory of each SPE is like a programmer-controlled cache. There is no cache hardware to keep it synchronized with main memory; it's a scratchpad that the programmer controls. It's better than a regular cache.
     
  3. Bobbler

    Bobbler Shazbot!
    Veteran

    Joined:
    May 22, 2005
    Messages:
    1,827
    Likes Received:
    29
    Location:
    Minneapolis, MN
    It seems to me that each SPE has direct access to RAM through the EIB -- the PPE is no more special than an SPE to the internal bus (EIB). Everything has equal access to the RAM as far as I know... they are all just ports on the EIB "ring". Does it even make sense that the 7 (or 8) SPEs would need to go through the PPE's cache to get anything from RAM?

    Maybe you have some actual info; what you said goes against everything I've heard about how the Cell works.
     
  4. Inane_Dork

    Inane_Dork Rebmem Roines
    Veteran

    Joined:
    Sep 14, 2004
    Messages:
    1,987
    Likes Received:
    46
    Not only is that a huge can of worms, but it's not really on topic. The guy asked about the XeCPU, not Cell.


    Anyway, I know the CPU can lock L2 cache lines for the GPU to grab data from. I don't know that the CPU can lock the cache in any other circumstance, though many people seem to presume it's true.
     
  5. Shifty Geezer

    Shifty Geezer uber-Troll!
    Moderator Legend

    Joined:
    Dec 7, 2004
    Messages:
    44,106
    Likes Received:
    16,898
    Location:
    Under my bridge
    Me?! I'm the last person in the world to trust for accurate stats!

    Cell's LS provides data to the SPEs at the same rate as data is available from L1 cache. What it doesn't do is fetch data itself - the store needs to be managed. The SPU sends requests for data to be fetched from RAM without getting in the way of any other storage, so in effect you have 7 LS and 1 L2 cache independently accessing RAM, as I understand it.

    Looking at XeCPU, we've got 1 MB L2 cache. Between three cores that's 333 KB each, which isn't bad - more than the SPUs get (though really the figures aren't comparable as they're used differently). Adding another 3 hardware threads takes that down a lot.

    However, I don't know if that's really gonna be too bad. Yeah, more cache is great, but the cache can be managed similarly to SPU LS with prefetching commands etc. Also, threads and cores can share data by writing it to the unified cache.

    I think it quite possible that a couple of threads lock down, say, 64 KB for 32+32 KB of buffered streamed data, while another thread works on some material that's stored locally to feed a fourth... three or four threads could probably be quite happily fed on 256 KB, leaving 384 KB each for two major threads (generic processing), which shouldn't be too bad a limiting factor.
     
  6. SanGreal

    Regular

    Joined:
    Jul 15, 2002
    Messages:
    406
    Likes Received:
    1
    Location:
    New Jersey
  7. Guden Oden

    Guden Oden Senior Member
    Legend

    Joined:
    Dec 20, 2003
    Messages:
    6,201
    Likes Received:
    91
    Your point is based on a false premise. Each SPE does have a direct connection to main RAM through its DMA controller.

    You're confused; local store CAN'T function as a cache because it ISN'T a cache. It's just a quite normal piece of random-access memory.

    Caches are divided up into what are known as cache lines, each line equipped with a "tag" which tells the cache controller logic which sequence of addresses in memory is stored in that particular cache line. When a read request comes in, the cache controller reads through its tags to see if that address is stored or not and acts accordingly. If it is, the data is delivered straight to the CPU. If it isn't, a request for it is generated and then the CPU has to wait for it to come in. It all works automatically, and the CPU cannot differentiate between cache and main memory. It's completely transparent, and typically there is no way of telling if a piece of data is present in cache or not, as cache isn't addressable as memory; it's a MIRROR of the memory it caches.

    As local store is just memory, the program has to decide what is stored in the store and what isn't. This doesn't make it into some sort of "software controlled" cache; it ISN'T CACHE plain and simple. It's just a quarter megabyte SRAM memory, that's it. Think of the SPE as a computer of its own equipped with 256 kB memory attached to its motherboard and an I/O controller to bring data in and out of that memory.

    Oh, and by the way, as someone else brought it up: the local store SRAM isn't zero wait-state. No SRAM running at 3.2 GHz is going to be zero wait-state; it isn't physically possible, or at least not with current technology. Even SRAM running at a fraction of that speed has wait-states of a couple of cycles.
     
  8. PC-Engine

    Banned

    Joined:
    Feb 7, 2002
    Messages:
    6,799
    Likes Received:
    12
    Ah ok thanks for the explanation. :)
     
  9. Gubbi

    Veteran

    Joined:
    Feb 8, 2002
    Messages:
    3,661
    Likes Received:
    1,114
    The SPEs have 6 cycles of load-to-use latency (5 "wait-state" cycles).

    Cheers
    Gubbi
     
  10. LunchBox

    Regular

    Joined:
    Mar 13, 2002
    Messages:
    901
    Likes Received:
    8
    Location:
    California
    i am very layman...

    am i getting this straight?

    local store = scratch pad

    Cache = scratch pad with labels

    ?????????
     
  11. nAo

    nAo Nutella Nutellae
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    4,400
    Likes Received:
    440
    Location:
    San Francisco
    It seems local store load/write ops now have a smaller latency (4 cycles), according to Mr. Suzuoki's presentation at the Rambus conference.
     
  12. fouad

    Banned

    Joined:
    Jun 29, 2005
    Messages:
    65
    Likes Received:
    0
     
  13. Gubbi

    Veteran

    Joined:
    Feb 8, 2002
    Messages:
    3,661
    Likes Received:
    1,114
    Link?

    The MPR presentation shows the LS unit of the SPE to provide data after stage FW06.

    Cheers
    Gubbi
     
  14. nAo

    nAo Nutella Nutellae
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    4,400
    Likes Received:
    440
    Location:
    San Francisco
  15. Shifty Geezer

    Shifty Geezer uber-Troll!
    Moderator Legend

    Joined:
    Dec 7, 2004
    Messages:
    44,106
    Likes Received:
    16,898
    Location:
    Under my bridge
    No. You cannot deliberately write to cache. The cache is just like an HDD cache or CD cache. The idea is that it stores recently used data from the storage device (RAM, HDD, CD) so that if that information is accessed again, rather than having to obtain it from the slow source, it can be retrieved from the fast cache.

    I don't know how effective it is in real terms. I always thought it very good, but Deano's blog, posted here by SanGreal, pointed out investigations showing that accessing the same data again doesn't happen very often in games, so cache isn't too useful there. Managed storage doesn't have that problem, but has the faff of the dev having to manage it. XeCPU can prefetch data this way, I believe.
     
  16. Acert93

    Acert93 Artist formerly known as Acert93
    Legend

    Joined:
    Dec 9, 2004
    Messages:
    7,782
    Likes Received:
    162
    Location:
    Seattle
    I believe it was Ars Technica (could be wrong), but I think they hit on the fact that cache is important for a diverse working environment (like using Word, Excel, Outlook, browsing, etc.) but less so for gaming. He used Quake 3 and the old cacheless Celerons to make the point. Quake 3 ran just as well on a Celeron, even though the CPU did T&L. Why? Much of the gaming data was streamed geometry and not reused.

    Obviously games are more complex these days, but then again we are talking about closed box systems as well.

    i.e. One of those "Desktop PCs and Game Consoles are different animals".

    More cache would not hurt (and probably would help) but there is always that tradeoff point. Would 2 cores and 2MB of cache have been better than 3 cores and 1MB of cache? IBM did not think so. Will developers agree? We will find out sooner or later ;)
     
  17. Embedded Sea

    Newcomer

    Joined:
    May 10, 2005
    Messages:
    17
    Likes Received:
    0
    The problem with SPEs can be related to the story of four people examining an elephant in a dark room. They can't figure out what the heck it is: one thinks it's a pillar, one thinks it's a tree, one thinks they're feeling the top of a roof. Having local memory tied to one processing unit in a game essentially forces those four people not to care about the big picture - each only examines one little piece, does their thing on it, and moves on, even if what they conclude is relatively dumb and limited.
     
  18. nAo

    nAo Nutella Nutellae
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    4,400
    Likes Received:
    440
    Location:
    San Francisco
    Most of the time this could be a Good Thing (tm) :)
     
  19. 3roxor

    Regular

    Joined:
    Jun 29, 2005
    Messages:
    671
    Likes Received:
    3
    I think 1 MB is just too little. BTW, anyone know why they didn't use a higher-clocked dual-processor G5 model, since those go up to 2.7 GHz? :?:
     
  20. Shifty Geezer

    Shifty Geezer uber-Troll!
    Moderator Legend

    Joined:
    Dec 7, 2004
    Messages:
    44,106
    Likes Received:
    16,898
    Location:
    Under my bridge
    The SPEs aren't forced to work in isolation. They can share information. And in a dark room where, instead of one elephant, there is a sock, a basketball, an apple, and a piano, all four can work on their own without any problem, which is kinda the point of multithreading.
     