Blogcast Audio of Tech Engineers(?) on PS3 vs Xbox360

Don't get me wrong - I'm not pretending that Sony isn't hyping stuff either.

That said - they're not enticing people with what looks like leaked insider info. MS is using the Major Nelson channel as a tool to make what they're saying look more legit when it's really propaganda. It's one thing to lay your cards on the table at E3, where you can be criticized in a more public manner (which both MS and Sony are being), but Major Nelson is the equivalent of the devil whispering in your ear. He's the equivalent of Son of Sam's dog. :devilish:
 
Shifty Geezer said:
aaaaa00 said:
What I think they're trying to get at is that if you want an SPU to access main memory, you have to schedule a DMA from it to local store.
Thanks for the explanation, aaaaa00. That's actually a good point then. So the SPEs haven't got direct addressing per se? I thought that was implemented.

Doing a direct access without caching would destroy your performance, since main memory access time is (potentially) in the thousands of cycles.

I don't think there's any sane way of doing direct main memory access without a cache of some sort in between to help you.
 
aaaaa00 said:
Shifty Geezer said:
aaaaa00 said:
What I think they're trying to get at is that if you want an SPU to access main memory, you have to schedule a DMA from it to local store.
Thanks for the explanation, aaaaa00. That's actually a good point then. So the SPEs haven't got direct addressing per se? I thought that was implemented.

Doing a direct access without caching would destroy your performance, since main memory access time is (potentially) in the thousands of cycles.

I don't think there's any sane way of doing direct main memory access without a cache of some sort in between to help you.

But then again, they don't suffer from cache misses either. All the local memory is just a very large set of on-board registers to the SPE cores: a register file that can be accessed directly without any latency. And block moves through DMA are much more efficient than loading single values from main memory.
 
Shifty Geezer said:
Also, what's the overhead (transistor count, I guess) for implementing direct memory access that STI felt it was worth going without?

The overhead is all the logic you need to build a cache: to keep it consistent with main memory, with the other SPUs, the tag bits, the searching, etc.

Caching isn't free, it costs transistors. But those transistors aren't wasted, they do useful work.

Considering they're supposed to work on voluminous data streams, not having easy access to main RAM locations seems a bit daft :D

Not necessarily daft.

As a simple example, take a streaming-type algorithm where it's practical to cut your work into chunks that are self-contained and can fit entirely into 1/3 of the LS or less.

You divide LS into three pieces, A, B, and C.

Then you schedule a DMA into chunk A of the LS, work on the stuff already loaded into chunk B, and DMA the results in chunk C out to main memory. Then you swap chunks.

ABC
BCA
CAB
etc.

This is basically streaming processing.

However, doing this relies on making sure everything that you need to do your work fits into the LS. This is not always possible.
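
Purely as a sketch, that rotation in C might look something like the following. The dma_get/dma_put/dma_wait/dma_wait_all and process() names are placeholders I'm making up here, standing in for the real MFC/tag-group calls:

/* Triple-buffered streaming over the local store, as described above.
   The dma_* helpers and process() are placeholders, not a real API. */

#define CHUNK_SIZE (64 * 1024)                 /* roughly 1/3 of the 256 KB LS */

void dma_get(void *ls, unsigned long ea, unsigned size);   /* main memory -> LS */
void dma_put(void *ls, unsigned long ea, unsigned size);   /* LS -> main memory */
void dma_wait(void *ls);                                   /* wait on that buffer */
void dma_wait_all(void);                                   /* wait on everything */
void process(char *chunk);                                 /* the per-chunk work */

static char buf[3][CHUNK_SIZE];                /* the A, B and C pieces of LS */

void stream(unsigned long src, unsigned long dst, int nchunks)
{
    int in = 0, work = 1, out = 2;             /* current role of each piece */

    dma_get(buf[work], src, CHUNK_SIZE);       /* prime the pipe: fetch chunk 0 */

    for (int i = 0; i < nchunks; i++) {
        if (i + 1 < nchunks) {
            dma_wait(buf[in]);                 /* its earlier store must be done */
            dma_get(buf[in], src + (unsigned long)(i + 1) * CHUNK_SIZE, CHUNK_SIZE);
        }

        dma_wait(buf[work]);                   /* chunk i has arrived in LS */
        process(buf[work]);                    /* do the actual work on it */
        dma_put(buf[work], dst + (unsigned long)i * CHUNK_SIZE, CHUNK_SIZE);

        int t = out; out = work; work = in; in = t;   /* ABC -> BCA -> CAB -> ... */
    }

    dma_wait_all();                            /* drain the final stores */
}

While chunk B is being crunched, the load into A and the store out of C are both in flight, so the SPU spends very little time stalled on main memory.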
 
Righto. So a cache basically uses a lot of logic to manage memory access efficiently and remove the burden from the programmer, whereas an SPE needs to have the developer manage memory access with queued DMA transfers.

Will compilers be able to manage this at all, or is it all the skill of the programmer?
 
DiGuru said:
All the local memory is just a very large set of on-board registers to the SPE cores: a register file that can be accessed directly without any latency.

Strictly speaking, not true. LS has latency (and so does cache), just a lot less than main memory. :)

And block moves through DMA are much more efficient than loading single values from main memory.

Depends on how big your data structure is and where the break-even point is for the setup costs of your DMAs.
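
As a back-of-envelope illustration (every number here is made up for the example, not a real measurement):

/* Rough break-even point for one DMA block move vs. n individual loads.
   All constants are assumptions for illustration only. */
#include <stdio.h>

int main(void)
{
    const double load_cost   = 500.0;  /* cycles per single main-memory load (assumed) */
    const double dma_setup   = 200.0;  /* fixed cycles to set up one DMA transfer (assumed) */
    const double dma_per_elt = 1.0;    /* effective cycles per element once streaming (assumed) */

    /* DMA wins once: dma_setup + n * dma_per_elt < n * load_cost */
    double break_even = dma_setup / (load_cost - dma_per_elt);

    printf("DMA pays off beyond roughly %.1f elements\n", break_even);
    return 0;
}

With numbers like these the block move wins almost immediately; the point is just that the answer falls out of the structure size versus the fixed setup cost.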
 
twotonfld said:
Don't get me wrong - I'm not pretending that Sony isn't hyping stuff either.

That said - they're not enticing people with what looks like leaked insider info. MS is using the Major Nelson channel as a tool to make what they're saying look more legit when it's really propaganda. It's one thing to lay your cards on the table at E3, where you can be criticized in a more public manner (which both MS and Sony are being), but Major Nelson is the equivalent of the devil whispering in your ear. He's the equivalent of Son of Sam's dog. :devilish:
It's for the fans. You won't see this in gaming mags or in MS ads. This is for damage control caused by Sony's pre-E3 presser. Both are blatant cases of propaganda for the fanboys on each side to fight over. I applaud the move even though I don't take either side seriously.

The most disappointed will be those who expect some kind of leap in graphical fidelity on either side. If you still believe that the PS3 is 2-3 times more powerful than the Xbox 360, then you're in for a rude awakening. Likewise, if you expect the Xbox 360 to have the same power advantage that the Xbox had over the PS2, then you'll be miserable. The proof of the pudding won't be online articles comparing apples to oranges; it will be on the screen.
 
Shifty Geezer said:
Righto. So a cache basically uses a lot of logic to manage memory access efficiently and remove the burden from the programmer, whereas an SPE needs to have the developer manage memory access with queued DMA transfers.

Will compilers be able to manage this at all, or is it all the skill of the programmer?
A great compiler could manage execution long before a single thread is fired off, but I'm not sure if that could get the "cache" hits up to the 90% sweet-spot. The lack of predictive logic puts the burden on Cell compiler writers.
 
A stream works just like a file that is read or written in blocks. At the start, you read the first block, process it, write the results and request the next block. And you go on doing that until all data is processed.

The interesting thing about a stream is that the whole thing doesn't have to exist at the start. Only the first block. And while the first block is being processed, you can build the next block. And so on, until you are done.

This is great for distributed processing, as you can just have the CPU build the first block, give all the other units a program to execute on their stream, and tell them which unit to contact to get the next block in the stream. And when they have all reached the end of the stream and there are no more blocks left, you're done.

It doesn't even matter very much what size the blocks are; each unit can even use its own size. You only have to send them data when they ask for it, and they will keep asking until their preferred block size is filled and they can start processing.

And of course you can use interrupts and DMA, and mix data streams, to make all that even more seamless and continuous.
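
In code it's nothing more exotic than the classic read-process-write loop. A minimal sketch, with fread/fwrite standing in for whatever block transport (DMA, mailboxes, network) the units really use:

/* Minimal block-stream consumer: read a block, process it, write the
   result, ask for the next. fread/fwrite are stand-ins for the real
   block transport between units. */
#include <stdio.h>

#define BLOCK_SIZE 4096                 /* each unit could pick its own size */

static void process_block(unsigned char *buf, size_t n)
{
    for (size_t i = 0; i < n; i++)
        buf[i] ^= 0xFF;                 /* placeholder transform */
}

int main(void)
{
    unsigned char buf[BLOCK_SIZE];
    size_t n;

    /* The producer only ever has to stay one block ahead of us. */
    while ((n = fread(buf, 1, BLOCK_SIZE, stdin)) > 0) {
        process_block(buf, n);
        fwrite(buf, 1, n, stdout);
    }
    return 0;
}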
 
It's for the fans. You won't see this in gaming mags or in MS ads. This is for damage control caused by Sony's pre-E3 presser.
Who knows, I wouldn't be surprised to see this sort of stuff printed in mags... After all, much of the mainstream media that covers games has already printed Sony's power-advantage statements, and even adopted the "Xbox 1.5" moniker.

However, Sony has put more of a burden on themselves by doing this. People will expect more from them now. I for one was honestly very disappointed in all the next-gen games on the E3 show floor. Not only were they completely uninspired, but they were all technically unimpressive as well (note, Gears of War was only behind closed doors, but even that game suffers from a crappy framerate at this point, and I didn't get to see Alan Wake, which seems like an intriguing and nice-looking game). As I said in the other thread, the next-gen games really should/must look like the demos Sony demonstrated, or there's not much point otherwise. After all, I think that footage is largely achievable, and some of the best-looking stuff was realtime footage anyway (like Heavenly Sword).
 
Alpha_Spartan said:
twotonfld said:
Don't get me wrong - I'm not pretending that Sony isn't hyping stuff either.

That said - they're not enticing people with what looks like leaked insider info. MS is using the Major Nelson channel as a tool to make what they're saying look more legit when it's really propaganda. It's one thing to lay your cards on the table at E3, where you can be criticized in a more public manner (which both MS and Sony are being), but Major Nelson is the equivalent of the devil whispering in your ear. He's the equivalent of Son of Sam's dog. :devilish:
It's for the fans. You won't see this in gaming mags or in MS ads. This is for damage control caused by Sony's pre-E3 presser. Both are blatant cases of propaganda for the fanboys on each side to fight over. I applaud the move even though I don't take either side seriously.

The most disappointed will be those who expect some kind of leap in graphical fidelity on either side. If you still believe that the PS3 is 2-3 times more powerful than the Xbox 360, then you're in for a rude awakening. Likewise, if you expect the Xbox 360 to have the same power advantage that the Xbox had over the PS2, then you'll be miserable. The proof of the pudding won't be online articles comparing apples to oranges; it will be on the screen.

I can't applaud the move - I find it much more disingenuous than concept art and "XBox 1.5". I think the more appropriate move would've been to release tech demos. MS punched themselves in the nuts by showing mostly uninspiring footage. The group that did this forgot that to tout its snowboarding game, MS had to play with Photoshop. MS has to realize that it's not the style of the presentation (MTV *gag*), it's the content of it.

That said - monkeys everywhere are flinging poo these days.

From the next gen, I expect something close to what I saw at Sony's conference from all the consoles. I don't think that's out of reach. That said, I think there are going to be some first-gen letdowns, since everyone (sans N) is rushing to market.
 
Don't expect prerendered CG quality, or you'll be in for a huge disappointment...
 
DiGuru said:
aaaaa00 said:
Shifty Geezer said:
aaaaa00 said:
What I think they're trying to get at is that if you want an SPU to access main memory, you have to schedule a DMA from it to local store.
Thanks for the explanation, aaaaa00. That's actually a good point then. So the SPEs haven't got direct addressing per se? I thought that was implemented.

Doing a direct access without caching would destroy your performance, since main memory access time is (potentially) in the thousands of cycles.

I don't think there's any sane way of doing direct main memory access without a cache of some sort in between to help you.

But then again, they don't suffer from cache misses either. All the local memory is just a very large set of on-board registers to the SPE cores: a register file that can be accessed directly without any latency. And block moves through DMA are much more efficient than loading single values from main memory.


More on the memory access issue from the Patent...

"... For each main-memory access, the processor would have to consult four lookup tables... Three of those tables are in DRAM, which implies slow off-chip memory references; the other table is in the DMA controller’s SRAM. In some cases, the delays caused by the table lookups might eat more clock cycles than reading or writing the actual data. The patent hints that some keys might unlock multiple memory locations or sandboxes, perhaps granting blanket permission for a rapid series of accesses, within certain bounds."
 
Laa-Yosh said:
Don't expect prerendered CG quality, or you'll be in for a huge disappointment...

Why would I expect that? I expect what I've seen from GoW at a real frame rate, what I've seen of Heavenly Sword, and what I've seen of Fight Night.

Those blow away anything I've got at home now, so I'll be content.
 
PC-Engine said:
gmoran said:
PC-Engine said:
SONY started using GFLOPS as a performance metric. MS is only doing it in return. An eye for an eye. ;)

1 TFLOPS isn't being honest, and 2 TFLOPS isn't either. It's just funny that SONY is stuck in this GFLOPS/TFLOPS game that they started. :LOL:

I know you are not being entirely serious here, but Sony's use of GFLOPS was as legitimate as these things can ever be termed legitimate; whereas NV's original use of NV FLOPS, and MS's and Sony's subsequent use of that metric isn't.

I don't know why you're still arguing this. You cannot compare the EE to a P3 because the EE needs to do all the TnL that a P3 does NOT need to do. The TnL is done by the NV2A, so if you want to compare GFLOPS then you need to compare the EE vs the CPU+GPU, understand?

It is very unjust to say this: you are basically making the argument that FP-heavy calculations are just useless on the XCPU, because the GPU does all of them anyway.

The EE is still better than the XCPU at vector math processing, even though a portion of it is taken up by T&L (it is you as a programmer who decides where the emphasis is: matching an equivalent Xbox game by displaying many polygons but with ridiculously little physics processing, or showing fewer polygons on screen but having more advanced interaction between all the elements of the scene). The NV2A was not too good at accelerating many things beyond graphics (not the ideal spot for GPGPU programming), so its GFLOPS were dedicated to one task.

End: I am sure, though, that SEGA never mentioned to anyone that their CPU was capable of 1.4 GFLOPS while PCs at the time were not... ;)
 
Mythos said:
More on the memory access issue from the Patent...

"... For each main-memory access, the processor would have to consult four lookup tables... Three of those tables are in DRAM, which implies slow off-chip memory references; the other table is in the DMA controller’s SRAM. In some cases, the delays caused by the table lookups might eat more clock cycles than reading or writing the actual data. The patent hints that some keys might unlock multiple memory locations or sandboxes, perhaps granting blanket permission for a rapid series of accesses, within certain bounds."

Yes, it's optimized for streams. And most of that look-up overhead is just part of the low-level memory management, like on every processor since the 386 and 68030. It is performed directly in hardware, completely transparent to the developer. Every computer architecture does that to keep track of what is where.
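
For illustration only, the classic two-level walk a 386-style MMU does looks roughly like this in C. This is a generic toy model, not the sandbox/key scheme the patent describes:

/* Toy two-level address translation: every access first costs two table
   lookups before the data itself is touched. Generic model, not Cell's. */
#include <stdint.h>
#include <stddef.h>

#define ENTRIES 1024

typedef struct {
    uintptr_t frame;                    /* physical base of the page */
    int       valid;
} pte_t;

typedef struct {
    pte_t *table;                       /* second-level table, or NULL */
} pde_t;

static pde_t dir[ENTRIES];              /* first-level directory */

uintptr_t translate(uintptr_t va)
{
    size_t d   = (va >> 22) & 0x3FF;    /* top 10 bits: directory index */
    size_t t   = (va >> 12) & 0x3FF;    /* next 10 bits: table index */
    size_t off = va & 0xFFF;            /* low 12 bits: offset in page */

    if (dir[d].table == NULL || !dir[d].table[t].valid)
        return 0;                       /* "page fault" in this toy model */

    return dir[d].table[t].frame + off; /* two lookups per translation */
}

The hardware does that walk for you (and caches it in a TLB), which is why the developer normally never sees it; the patent's scheme just seems to add a few more of those lookups per access.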
 
PC-Engine trolled 3 pages of this thread, and almost each and every time people fed the troll. Congratulations, people, you suck at the internet.

The signal:noise ratio of this thread is ridiculous, like a lot of threads these days thanks to E3. But this thread takes the cake.
 
Yes, I agree. The thread is getting locked, but the reason isn't just pc-engine. You guys all need to cut it out, and some of the new members need to chill out too.
 