23-May-2005, 07:37   #51
TexT
Junior Member

Join Date: Mar 2005
Posts: 26

Sony is at it again...

Quote:
As an example of the advances being made, Pearson noted that Sony's new PlayStation 3 computer games console is 35 times as powerful as the model it replaced, and in terms of processing is "one percent as powerful as a human brain".
http://uk.news.yahoo.com/050522/323/fjiiv.html

23-May-2005, 07:42   #52
Mordecaii
Member

Join Date: May 2005
Posts: 297

Sony had nothing to do with that quote, by the way... Just wanted to mention that before the cries of "OMG TEH EVVIL $ONY HYPE MACHINE!!!11!!1!" This guy is mostly talking about being able to download a person's brain into a computer so you don't "truly" die, and about how it'll be possible by 2050...

23-May-2005, 07:57   #53
Geeforcer
Harmlessly Evil

Join Date: Feb 2002
Posts: 2,027

I wonder what metric they were using. My pocket calculator is 1000 times more powerful than my brain when it comes to solving differential equations within a given time span.

23-May-2005, 09:02   #54
rwolf
Rock Star

Join Date: Oct 2002
Location: Canada
Posts: 961

http://www.extremetech.com/article2/...1818127,00.asp

Quote:
The 48 ALUs are divided into three SIMD groups of 16. When it reaches the final shader pipe, each of the 16 ALUs has the ability to write out two samples to the 10MB of EDRAM. Thus, the chip is capable of writing out a maximum of 32 samples per clock. At 500MHz, that means a peak fill rate of 16 gigasamples. Each of the ALUs can perform 5 floating-point shader operations. Thus, the peak computational power of the shader units is 240 floating-point shader ops per cycle, or 120 billion shader ops per second at 500MHz
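The peak figures in the quote are straight multiplication. A minimal sketch that reproduces them, taking the article's own counting rules as given (500 MHz clock, 16 ALUs writing 2 samples each per clock, 48 ALUs at 5 "shader ops" each); these are ExtremeTech's assumptions, not official specs:

```python
# Reproduce the ExtremeTech peak figures for Xenos from the quoted counting rules.
CLOCK_HZ = 500e6  # 500 MHz, as stated in the article


def peak_rate(units: int, ops_per_unit: int, clock_hz: float = CLOCK_HZ) -> float:
    """Peak operations per second = units * ops per unit per clock * clock."""
    return units * ops_per_unit * clock_hz


samples_per_clock = 16 * 2         # 16 ALUs, 2 samples each -> 32 per clock
fill_rate = peak_rate(16, 2)       # samples per second
shader_ops_per_clock = 48 * 5      # 48 ALUs, 5 "shader ops" each -> 240 per clock
shader_rate = peak_rate(48, 5)     # shader ops per second

print(f"{samples_per_clock} samples/clock -> {fill_rate / 1e9:.0f} Gsamples/s")
print(f"{shader_ops_per_clock} ops/clock -> {shader_rate / 1e9:.0f} G ops/s")
```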
Quote:
The 10MB of EDRAM is actually on a separate die, at least initially. As future process technologies become available, it is possible that it could be on the same piece of silicon as the GPU. Still, the EDRAM resides on the same package, and has a wide bus running at 2GHz to deliver 256GB/sec of bandwidth. That's a true 256GB/sec, not one of those fuzzy counting methods where the 256GB is "effective" bandwidth that accounts for all kinds of compression. The GPU writes the back buffer, Z buffer, and stencil buffer to the EDRAM. When the frame is finally ready to be drawn to the screen, the EDRAM transfers the back buffer to the 512MB of GDDR3 for scan-out. The EDRAM does not store any textures: the full 10MB gets pretty much filled up with 1280x720 HD resolution, including Z, stencil, and anti-aliasing sub-pixel samples.

There's even a little magic that happens at that phase. The EDRAM has built-in logic to perform Z compare, alpha blending, and resolving anti-aliasing samples into pixels. Normally those operations happen on the GPU, and require not only valuable silicon real estate and on-chip caches, but eat into memory bandwidth as data has to go back and forth to the GPU from the main graphics RAM. ATI's solution of building that logic into the EDRAM where the back, Z, and stencil buffers live eliminates a lot of data transfer and saves time and silicon space on the GPU die itself. Because of the bandwidth savings and absolutely massive bandwidth to EDRAM, the Xbox 360 should be able to perform frame buffer effects like motion blur, depth of field, or lens flare with incredible speed.
8)
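For a rough check on "the full 10MB gets pretty much filled up" at 720p, here is a sketch of the render-target arithmetic. The per-sample formats (32-bit colour plus packed 32-bit Z/stencil) and reading "10MB" as 10 MiB are assumptions, not figures from the article:

```python
# Estimate the colour + Z/stencil footprint of a 1280x720 frame in EDRAM.
# ASSUMPTION: 4 bytes colour + 4 bytes packed 24-bit Z / 8-bit stencil per sample.
WIDTH, HEIGHT = 1280, 720
BYTES_PER_SAMPLE = 4 + 4
EDRAM_BYTES = 10 * 1024 * 1024  # "10MB" read as 10 MiB (assumption)


def framebuffer_bytes(width: int, height: int, samples: int) -> int:
    """Bytes needed for colour + Z/stencil at the given AA sample count."""
    return width * height * samples * BYTES_PER_SAMPLE


for samples in (1, 2, 4):
    size = framebuffer_bytes(WIDTH, HEIGHT, samples)
    fits = size <= EDRAM_BYTES
    print(f"{samples}x: {size / 2**20:.1f} MiB, fits in 10 MiB: {fits}")
```

On those assumptions a 1280x720 frame takes about 7 MiB without AA, while 2x and 4x multisampling need roughly 14 MiB and 28 MiB.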

23-May-2005, 10:42   #55
london-boy
Me me me

Join Date: Apr 2002
Posts: 15,367

Quote:
Originally Posted by TexT
Sony is at it again...

Quote:
As an example of the advances being made, Pearson noted that Sony's new PlayStation 3 computer games console is 35 times as powerful as the model it replaced, and in terms of processing is "one percent as powerful as a human brain".
http://uk.news.yahoo.com/050522/323/fjiiv.html
Well if each generation is even just 20 times as powerful as the last one, it will only take about 3 or 4 generations for consoles to become as powerful as our brains, according to the guy... That's about 20-25 years...
And only for consoles.

Before we get there, Blue Gene will have taken over the world five and a half times and we'll all be slaves of the machines.
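Taking the quoted "one percent" figure at face value leaves a 100x gap to close. A small sketch of the compounding; the per-generation multipliers are illustrative inputs, not figures from the article. At 20x per generation, 20² = 400 already exceeds 100, so the loop stops at two generations; three or four generations corresponds to roughly 4-5x per generation:

```python
# Count generations needed to close a performance gap at a fixed multiplier.
def generations_needed(gap: float, per_generation: float) -> int:
    """Smallest n with per_generation**n >= gap (per_generation must exceed 1)."""
    if per_generation <= 1:
        raise ValueError("per_generation must exceed 1")
    n, reached = 0, 1.0
    while reached < gap:
        reached *= per_generation
        n += 1
    return n


GAP = 100.0  # "one percent as powerful as a human brain" -> 100x still to go
for factor in (35, 20, 5, 4):
    print(f"{factor}x per generation: {generations_needed(GAP, factor)} generations")
```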

23-May-2005, 10:46   #56
Vaan
Member

Join Date: Mar 2005
Location: Zaragoza, Aragón, Spain, Europe, World...
Posts: 115

256GB/s effective... between what?
__________________
A 255 character limit on my signature? What is the byte number 256 used for? We want to know all the truth!!

23-May-2005, 11:16   #57
nAo
Nutella Nutellae

Join Date: Feb 2002
Location: San Francisco
Posts: 4,308

If there are 256 GB/s between the GPU and the EDRAM, why did they underdesign their ROPs so that fill rate halves when rendering to 64-bit render targets?
I don't believe the 256 GB/s number is the real bandwidth between the GPU core and the EDRAM.

23-May-2005, 12:19   #58
Jawed
Regular

Join Date: Oct 2004
Location: London
Posts: 9,867

I don't believe there's 256GB/s between GPU and EDRAM, either.

I do believe there's 256GB/s between the ROPs and the back-buffer.

Jawed

23-May-2005, 15:23   #59
JAD
Junior Member

Join Date: May 2005
Location: Netherlands
Posts: 25

Quote:
Originally Posted by Jawed
I don't believe there's 256GB/s between GPU and EDRAM, either.

I do believe there's 256GB/s between the ROPs and the back-buffer.

Jawed
Isn't that already sort of agreed upon, and because of the ROPs and eDram being on the same die you can't really call it bandwidth either?

Or am I missing something?
__________________
There's just so much you can do with any time given.

23-May-2005, 15:46   #60
Jawed
Regular

Join Date: Oct 2004
Location: London
Posts: 9,867

For a while people were saying that the 256GB/s figure inside the EDRAM was fictional...

It was originally described as "effective".

It would appear it's real, not effective.

Though we're still waiting to get hard facts, so I'll continue to "believe" rather than treat it as a hard fact (admittedly, hard to do).

Jawed

23-May-2005, 15:59   #61
Nite_Hawk
Senior Member

Join Date: Feb 2002
Location: Minneapolis, MN
Posts: 1,202

Quote:
Originally Posted by Jawed
For a while people were saying that the 256GB/s figure inside the EDRAM was fictional...

It was originally described as "effective".

It would appear it's real, not effective.

Though we're still waiting to get hard facts, so I'll continue to "believe" rather than treat it as a hard fact (admittedly, hard to do).

Jawed
I think people are getting really mixed up over everything. Between 32GB/s external bus throughput to 256Gb/s external bus throughput to 256GB/s internal bus throughput. People end up mixing up terms and buses. I think most people were of the opinion that the 256GB/s external bus width was probably fictional, but anything on the edram chip internally could talk much faster. Granted, I don't know if people expected the edram "processor" to be able to do blending/aa/etc.

Nite_Hawk
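The b/B distinction is a factor of eight, which accounts for most of the mix-ups above. A minimal conversion sketch; the 2 GHz clock and 256GB/s figure are from the ExtremeTech quote earlier in the thread, and `bits_per_clock` only shows what that description would imply for a single link, not a confirmed bus width:

```python
# Convert between gigabits and gigabytes per second, and derive bits per clock.
BITS_PER_BYTE = 8


def gbit_to_gbyte(gbit_per_s: float) -> float:
    return gbit_per_s / BITS_PER_BYTE


def gbyte_to_gbit(gbyte_per_s: float) -> float:
    return gbyte_per_s * BITS_PER_BYTE


def bits_per_clock(gbyte_per_s: float, clock_ghz: float) -> float:
    """Bits that must move each clock to sustain a given rate."""
    return gbyte_to_gbit(gbyte_per_s) / clock_ghz


print(gbit_to_gbyte(256))        # 256 Gbit/s -> 32.0 GB/s
print(gbyte_to_gbit(256))        # 256 GB/s -> 2048 Gbit/s, i.e. "2 terabits"
print(bits_per_clock(256, 2.0))  # 256 GB/s at 2 GHz -> 1024.0 bits per clock
```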

23-May-2005, 17:49   #62
Lazy8s
Senior Member

Join Date: Oct 2002
Posts: 2,833

Jaws:
Quote:
PS3 ~ 2 TFLOPS

X360 ~ 1 TFLOPS
Microsoft claimed 'more than 1 TFLOP'. The X360 GPU probably rates well over 1 TFLOP by itself by counting its fixed functionality as floats in the same way nVidia did. It appears Microsoft was counting this way and just rounded to the nice 1 TFLOP spec, and Sony outdid them in their announcement by not rounding down.

24-May-2005, 19:39   #63
j^aws
Senior Member

Join Date: Jun 2004
Posts: 1,908

Quote:
Originally Posted by Titanio
Quote:
Originally Posted by Jaws
They're not exactly identical. However the CELL PPE and XeCPU are both Power based, 12 Flops per cycle, 2-way SMT, in-order cores...
Was there confirmation of this? Specifically the in-order bit?
No official details, but the 2-way SMT and 12 flops per cycle were inferred from the 115 and 218 GFlops @ 3.2 GHz figures for XeCPU and CELL.
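Working backwards from the quoted peaks at 3.2 GHz: 115.2 / 3.2 = 36 flops per cycle, i.e. 12 per core across three cores. For CELL, 12 for the PPE plus 8 per SPU (4-wide single-precision FMADD) across 7 SPUs gives 68, or about 218 GFLOPS. A sketch of that inference; the 12-per-core figure is the thread's inference, not an official breakdown:

```python
# Relate per-cycle FLOP counts to quoted peak GFLOPS at 3.2 GHz.
CLOCK_GHZ = 3.2


def gflops(flops_per_cycle: float, clock_ghz: float = CLOCK_GHZ) -> float:
    return flops_per_cycle * clock_ghz


# XeCPU: 3 cores at 12 flops/cycle each (inferred, not official)
xecpu_per_cycle = 3 * 12
# CELL: PPE at 12 flops/cycle + 7 SPUs at 8 flops/cycle (4-wide FMADD)
cell_per_cycle = 12 + 7 * 8

print(xecpu_per_cycle, round(gflops(xecpu_per_cycle), 1))  # 36 -> 115.2 GFLOPS
print(cell_per_cycle, round(gflops(cell_per_cycle), 1))    # 68 -> 217.6 GFLOPS
```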


Quote:
Originally Posted by blakjedi
Quote:
Originally Posted by Jaws
Xenos is capable of 24 billion dot products per second. If you allocate 37.4 billion to RSX, that's a helluva increase considering they're both on 90nm, no?
I had asked in the xenos thread whether the work output spoken of related to Xenos includes the edram or is just the shader part...

"However, using Sony's claim, 7 dot products per cycle * 3.2 GHz = 22.4 billion dot products per second for the CPU. That leaves 51 - 22.4 = 28.6 billion dot products per second that are left over for the GPU. That leaves 28.6 billion dot products per second / 550 MHz = 52 GPU ALU ops per clock."
Sorry, your question and your quote don't seem related, or am I missing what you're asking here? If you're asking whether the fixed-function logic/ALUs on the EDRAM module are included, then no...it's only shader ALUs.

The fixed function stuff would be included in the 1TFLOP number of X360 though...

Your above quote is the 51 Giga dots/sec for both CELL and RSX. I took 8 dots/cycle for CELL (VMX+7 SPU)...but the above assumes 7, excluding the VMX for CELL.

This would suggest that the '52' number is 52 vec4 units contributing to the 136 shader ops per cycle for RSX, then 136-52 ~ 84 ALUs would be scalar ALUs or ones not capable of dot products on the RSX...i.e.

52 Vec4 units + 84 vec?/scalar units?

Vec4 + scalar units can be paired,


RSX

52 Vec4 + 52 Scalar + 32 vec? units?

:?
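The quoted reasoning in code: subtract CELL's share from Sony's 51 billion figure and divide the remainder by RSX's 550 MHz clock. Whether CELL contributes 7 or 8 dot products per cycle is the open question above, so both are shown. The Xenos line assumes one vec4 dot product per ALU per clock at 500 MHz, which is an assumption rather than a published breakdown:

```python
# Dot-product budget arithmetic from Sony's 51 billion dot products/s (CELL + RSX).
TOTAL_DOTS = 51.0               # billion dot products per second
CELL_GHZ, RSX_GHZ = 3.2, 0.55
XENOS_ALUS, XENOS_GHZ = 48, 0.5


def rsx_dots_per_clock(cell_dots_per_cycle: int) -> float:
    """Dot products per RSX clock left over after CELL's share."""
    cell_rate = cell_dots_per_cycle * CELL_GHZ   # billions per second
    return (TOTAL_DOTS - cell_rate) / RSX_GHZ


print(XENOS_ALUS * XENOS_GHZ)            # Xenos: 24.0 billion/s (assumed 1 dot/ALU/clock)
print(round(rsx_dots_per_clock(7), 1))   # 7 SPUs only -> 52.0 per RSX clock
print(round(rsx_dots_per_clock(8), 1))   # VMX + 7 SPUs -> 46.2 per RSX clock

# Splitting RSX's 136 shader ops/cycle on the "52 vec4 + 52 scalar" reading
vec4, scalar = 52, 52
other = 136 - vec4 - scalar              # 32 unidentified units
print(other)
```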

Quote:
Originally Posted by rwolf
http://www.extremetech.com/article2/...1818127,00.asp

Quote:
The 48 ALUs are divided into three SIMD groups of 16. When it reaches the final shader pipe, each of the 16 ALUs has the ability to write out two samples to the 10MB of EDRAM. Thus, the chip is capable of writing out a maximum of 32 samples per clock. At 500MHz, that means a peak fill rate of 16 gigasamples. Each of the ALUs can perform 5 floating-point shader operations. Thus, the peak computational power of the shader units is 240 floating-point shader ops per cycle, or 120 billion shader ops per second at 500MHz
...
8)
I agree Xenos is cool! 8)

But some of these sites are really just confusing all these numbers.

It's 48 Billion shader ops per second for Xenos in the *official* specs,

http://www.xbox.com/assets/en-us/xbo...FactSheets.zip

Also the "240 floating-point shader ops per cycle" they mention can be easily confused with single precision 240 floating-point ops per cycle (flops)! Which is not accurate as that would be 480 flops per cycle with FMADD! :P

Anyway, the numbers on the first page of this thread are accurate from the info we have...and these random sites are throwing all sorts of conflicting numbers around...
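Three counting rules give three very different Xenos numbers. A sketch at 500 MHz with 48 ALUs of vec4 + scalar (5 lanes); reading the official 48 billion as "vec4 op + scalar op = 2 shader ops per ALU per clock" is an assumption, though it matches the arithmetic:

```python
# Three ways to "count" Xenos peak throughput, per clock and per second.
ALUS = 48
CLOCK_GHZ = 0.5
LANES = 5                      # vec4 + scalar per ALU

shader_ops = ALUS * 2          # one vec4 op + one scalar op per ALU -> 96/clock
lane_ops = ALUS * LANES        # ExtremeTech's count -> 240/clock
flops = ALUS * LANES * 2       # FMADD counted as multiply + add -> 480/clock

for name, per_clock in (("shader ops", shader_ops), ("lane ops", lane_ops), ("FLOPs", flops)):
    print(f"{name}: {per_clock}/clock, {per_clock * CLOCK_GHZ:.0f} billion/s")
```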


Quote:
Originally Posted by Lazy8s
Jaws:
Quote:
PS3 ~ 2 TFLOPS

X360 ~ 1 TFLOPS
Microsoft claimed 'more than 1 TFLOP'. The X360 GPU probably rates well over 1 TFLOP by itself by counting its fixed functionality as floats in the same way nVidia did. It appears Microsoft was counting this way and just rounded to the nice 1 TFLOP spec, and Sony outdid them in their announcement by not rounding down.
IIRC, from official specs,

RSX ~ 1.8 TFlops
CELL ~ 0.218 TFlops

X360 is still quoted at system total ~ 1 TFlops
XeCPU ~ 0.115 TFlops
Xenos ~ 0.885 TFlops

Not sure why one would 'round down' and the other 'round up' given the opportunity. But it could well be that the RSX has a lot of fixed-function logic on board that counts towards that number, whilst the Xenos transistor count includes 10 MB of eDRAM, which wouldn't contribute to that number...
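Summing the component figures listed above shows where the headline numbers come from. Note the Xenos 0.885 is not a separately published figure; in this sketch it is simply the system total minus XeCPU:

```python
# Check the headline TFLOPS totals against the quoted component figures.
ps3 = {"RSX": 1.8, "CELL": 0.218}
x360_total, xecpu = 1.0, 0.115

ps3_total = sum(ps3.values())        # ~2.018 TFLOPS -> "2 TFLOPS"
xenos_implied = x360_total - xecpu   # 0.885 TFLOPS, derived by subtraction

print(f"PS3 total: {ps3_total:.3f} TFLOPS")
print(f"Xenos implied: {xenos_implied:.3f} TFLOPS")
```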

24-May-2005, 20:13   #64
blakjedi
Senior Member

Join Date: Nov 2004
Location: 20001
Posts: 2,189

How in the world does the Nvidia chip rate at 1.8 Teraflops? No matter what I've read, it just doesn't add up.

24-May-2005, 21:22   #65
ShootMyMonkey
Senior Member

Join Date: Mar 2005
Posts: 1,160

Same way Xenos rates at 900 GFLOPS... it's called misleading the consumer. For instance, RSQ could be counted as 1 FLOP, but not in marketing-land. Instead, we'll count the lookup as one FLOP, count all the FLOPs used in the Newton-Raphson (NR) refinement, and then you'd get something like 15-odd FLOPs in a single shader instruction. Or perhaps you can imagine that it does SIN/COS using the first 4/5 terms of the Maclaurin series and geometrically mirroring the results. That would amount to... what... 30 FLOPs per instruction? So all you have to do is consider how much computing power the GPU would have if you did nothing but SIN and/or COS and/or RSQ for every single instruction you'll ever execute. There are a few TFLOPs for you.
__________________
Life is veritably the exact opposite of a vacuum cleaner. Vacuums tend to suck less and less as time goes on.
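To make the counting trick concrete, a sketch that tallies the arithmetic in a software reciprocal square root (a crude power-of-two seed standing in for the lookup table, then Newton-Raphson steps) and a 5-term Maclaurin sine. The seed, iteration count and term count are assumptions for illustration, not how any particular GPU implements these instructions, and the tally depends heavily on how the series is evaluated:

```python
import math


def rsq_newton(x: float, iterations: int = 4) -> tuple[float, int]:
    """Approximate 1/sqrt(x) for x > 0; return (value, FLOPs tallied)."""
    _, e = math.frexp(x)          # x = m * 2**e with 0.5 <= m < 1
    y = 2.0 ** (-e / 2)           # power-of-two seed, stands in for a lookup table
    flops = 1                     # count the "lookup" as one FLOP
    for _ in range(iterations):
        y = y * (1.5 - 0.5 * x * y * y)  # one NR step: 4 multiplies + 1 subtract
        flops += 5
    return y, flops


def sin_maclaurin(x: float, terms: int = 5) -> tuple[float, int]:
    """Approximate sin(x) with the first `terms` Maclaurin terms (Horner form in x**2)."""
    coeffs = [(-1) ** k / math.factorial(2 * k + 1) for k in range(terms)]
    x2 = x * x
    flops = 1
    acc = coeffs[-1]
    for c in reversed(coeffs[:-1]):
        acc = acc * x2 + c        # one multiply + one add
        flops += 2
    return acc * x, flops + 1     # final multiply by x


print(rsq_newton(4.0))     # value close to 0.5, 21 FLOPs tallied
print(sin_maclaurin(0.5))  # value close to math.sin(0.5), 10 FLOPs tallied
```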

25-May-2005, 02:19   #66
AkiraX
Registered

Join Date: May 2005
Posts: 3

Quote:
Originally Posted by Nite_Hawk
Quote:
Originally Posted by Jawed
For a while people were saying that the 256GB/s figure inside the EDRAM was fictional...

It was originally described as "effective".

It would appear it's real, not effective.

Though we're still waiting to get hard facts, so I'll continue to "believe" rather than treat it as a hard fact (admittedly, hard to do).

Jawed
I think people are getting really mixed up over everything. Between 32GB/s external bus throughput to 256Gb/s external bus throughput to 256GB/s internal bus throughput. People end up mixing up terms and buses. I think most people were of the opinion that the 256GB/s external bus width was probably fictional, but anything on the edram chip internally could talk much faster. Granted, I don't know if people expected the edram "processor" to be able to do blending/aa/etc.

Nite_Hawk

"ATI: The 2-terabit (256GB/sec) number comes from within the EDRAM, that’s the kind of bandwidth inside that RAM, inside the chip, the daughter die. But between the parent and daughter die there’s a 236Gbit connection on a bus that’s running in excess of 2GHz. It has more than one bit obviously between them."

http://firingsquad.com/features/xbox...view/page3.asp


also, old diagram:
http://www.xbitlabs.com/misc/picture...bg.gif&1=1

25-May-2005, 03:16   #67
AkiraX
Registered

Join Date: May 2005
Posts: 3

Quote:
Originally Posted by Nite_Hawk
Quote:
Originally Posted by Jawed
For a while people were saying that the 256GB/s figure inside the EDRAM was fictional...

It was originally described as "effective".

It would appear it's real, not effective.

Though we're still waiting to get hard facts, so I'll continue to "believe" rather than treat it as a hard fact (admittedly, hard to do).

Jawed
I think people are getting really mixed up over everything. Between 32GB/s external bus throughput to 256Gb/s external bus throughput to 256GB/s internal bus throughput. People end up mixing up terms and buses. I think most people were of the opinion that the 256GB/s external bus width was probably fictional, but anything on the edram chip internally could talk much faster. Granted, I don't know if people expected the edram "processor" to be able to do blending/aa/etc.

Nite_Hawk
FiringSquad: What types of operations do the EDRAM's 192 processors perform?

ATI: Well they do z-compares, they do alpha blends, they do blends of samples to make a pixel. That kind of thing. They do stencil operations also. And this is the first time memory has access to something like this, right in the memory, so it never leaves the memory die. The memory and the logic is all built into one die. And it’s also a power savings by the way.

http://firingsquad.com/features/xbox...view/page3.asp

25-May-2005, 03:38   #68
Lazy8s
Senior Member

Join Date: Oct 2002
Posts: 2,833

Jaws:
Quote:
IIRC, from official specs,

RSX ~ 1.8 TFlops
CELL ~ 0.218 TFlops

X360 is still quoted at system total ~ 1 TFlops
XeCPU ~ 0.115 TFlops
Xenos ~ 0.885 TFlops
I don't think the total "targeted" FLOPS "power" of the Xenos graphics chipset has ever been disclosed. The PR rough guideline for total system performance is too vague to consider it an absolute quantity useful in deriving 885 GFLOPS for the GPUs. Considering the NV40 was already rated around 1 TFLOP by similar nVidia accounting, I suspect X360's next generation graphics chipset probably delivers something comparable and more.
Quote:
Not sure why one would 'round down' and the other 'round up' given the opportunity.
Microsoft probably felt claiming the magical TFLOP barrier would be spoiling enough, and Sony was left in the position to be more exact in order to show that there would still be some improvement in power for their system.

25-May-2005, 08:25   #69
Xmas
Off-season

Join Date: Feb 2002
Location: On the pursuit of happiness
Posts: 3,019

Quote:
Originally Posted by blakjedi
How in the world does the Nvidia chip rate at 1.8 Teraflops? No matter what I've read, it just doesn't add up.
NVidia claims 360 Gflops for NV40, counting PS, VS, texturing and blending. That figure is a bit on the high side, but probably not too far off.
If we take the 136 to 53 shader ops comparison as RSX being "2.57 times NV40", we arrive at 920 Gflops. And btw, it could very well mean RSX has 28 of 32 pixel pipelines (a parallel to Cell).
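The scaling in the post, as a sketch: it treats the 136:53 shader-op ratio as a straight multiplier on NVIDIA's NV40 total, exactly as the post does, which ignores any difference in how the non-shader parts scale:

```python
# Scale NVIDIA's NV40 figure by the RSX/NV40 shader-op ratio.
NV40_GFLOPS = 360.0           # NVIDIA's claimed figure for NV40
RSX_OPS, NV40_OPS = 136, 53   # shader ops per cycle, as compared in the post
RSX_CLAIM_GFLOPS = 1800.0     # Sony's RSX figure

ratio = RSX_OPS / NV40_OPS              # ~2.57
scaled = NV40_GFLOPS * ratio            # ~924 GFLOPS
shortfall = RSX_CLAIM_GFLOPS - scaled   # ~876 GFLOPS not accounted for

print(f"ratio {ratio:.2f}, scaled {scaled:.0f} GFLOPS, gap {shortfall:.0f} GFLOPS")
```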

25-May-2005, 08:30   #70
jvd
Naughty Boy!

Join Date: Feb 2002
Location: new jersey
Posts: 12,731

Quote:
we arrive at 920 Gflops
which is 880 Gflops less than they claim
__________________
Freexbox 360 !!!
Free Psp!

25-May-2005, 09:01   #71
nAo
Nutella Nutellae

Join Date: Feb 2002
Location: San Francisco
Posts: 4,308

Quote:
Originally Posted by Xmas
NVidia claims 360 Gflops for NV40, counting PS, VS, texturing and blending. That figure is a bit on the high side, but probably not too far off.
Do you remember where you read those numbers? some official nvidia document?
BTW, you have a PM

25-May-2005, 09:04   #72
Xmas
Off-season

Join Date: Feb 2002
Location: On the pursuit of happiness
Posts: 3,019

Quote:
Originally Posted by jvd
Quote:
we arrive at 920 Gflops
which is 880 Gflops less than they claim
That's where they counted the other parts: triangle setup, the whole Z subsystem, LOD calculation, interpolators, whatever.
Given the emphasis on HDR, they have probably doubled the capabilities of the TMUs handling FP textures, so sampling an FP16 texture is very likely single-clock. And texturing is more than 40% of that NV40 figure.

25-May-2005, 09:14   #73
Xmas
Off-season

Join Date: Feb 2002
Location: On the pursuit of happiness
Posts: 3,019

Quote:
Originally Posted by nAo
Do you remember where you read those numbers? some official nvidia document?
BTW, you have a PM
http://developer.nvidia.com/object/x...entations.html
It's in the slides pdf.
But I'm not sure these numbers are entirely correct. The counting for texture and blend flops seems a bit off.

26-May-2005, 17:32   #74
j^aws
Senior Member

Join Date: Jun 2004
Posts: 1,908

Isolating XeCPU and CELL isn't strictly a total-system, apples-to-apples comparison, but I've noticed a few peak metrics missing alongside GFlops, namely integer and scalar metrics. I haven't seen official numbers on these yet, but here are some peak numbers from what we know so far (please feel free to correct me):

-XeCPU, integer, 32bit

1 core ~ 1VMX + 1 IU ~ 4 + 1 ~ 5 integer ops per cycle

3 cores ~ 3*5 ~ 15 integer ops per cycle
15*3.2 GHz ~ 48 Billion integer ops per second

-XeCPU, scalar

1 core ~ FPU + IU ~ 2 scalar ops per cycle

3 cores ~ 3*2 ~ 6 scalar ops per cycle
6*3.2GHz ~ 19.2 Billion scalar ops per second

-XeCPU, FP, 32 bit

115 GFlops


-CELL, integer, 32 bit

PPE ~ 1VMX + 1 IU ~ 4 + 1 ~ 5 integer ops per cycle

7 SPUs ~ 7*4 ~ 28 integer ops per cycle

CELL ~ 33 integer ops per cycle
33*3.2GHz ~ 105.6 Billion integer ops per second

-CELL, scalar

PPE ~ FPU + IU ~ 2 scalar ops per cycle

7 SPUs ~ 7*1 ~ 7 scalar ops per cycle

CELL ~ 9 scalar ops per cycle
9*3.2 GHz~ 28.8 billion scalar ops per second

-CELL, FP, 32 bit

218 GFlops


CELL vs XeCPU

CELL~ 105.6 Billion integer ops per second, 32bit
XeCPU~ 48 Billion integer ops per second, 32bit

CELL~ 28.8 Billion scalar ops per second, 32bit
XeCPU ~ 19.2 Billion scalar ops per second, 32bit

CELL~ 218 GFlops, 32bit
XeCPU~ 115 GFlops, 32bit

Of course, these are peak numbers...
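The tallies above multiplied out in one place. A sketch using the post's own per-unit counts (VMX as 4 x 32-bit integer lanes, one IU, one FPU, SPU as 4 integer lanes or 1 scalar op); these are the thread's estimates, not official figures:

```python
# Peak per-cycle op counts from the post, multiplied out at 3.2 GHz.
CLOCK_GHZ = 3.2

# (integer ops/cycle, scalar ops/cycle) per unit, as tallied in the post
XECPU_CORE = (4 + 1, 1 + 1)   # VMX + IU; FPU + IU
PPE = (4 + 1, 1 + 1)
SPU = (4, 1)


def chip_totals(units):
    """Sum (int, scalar) ops per cycle over (count, (int, scalar)) pairs."""
    ints = sum(n * u[0] for n, u in units)
    scalars = sum(n * u[1] for n, u in units)
    return ints, scalars


chips = {
    "XeCPU": chip_totals([(3, XECPU_CORE)]),
    "CELL": chip_totals([(1, PPE), (7, SPU)]),
}
for name, (ints, scalars) in chips.items():
    print(f"{name}: {ints * CLOCK_GHZ:.1f} G int ops/s, {scalars * CLOCK_GHZ:.1f} G scalar ops/s")
```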

27-May-2005, 07:25   #75
PC-Engine
Naughty Boy!

Join Date: Feb 2002
Posts: 6,802

Anybody know where and when MS gave out the 115.2 GFLOPS number? It isn't in any of their official documents. :?
__________________
I've got a working quantum computer prototype in my backyard. The only problem is, it crashes at temperatures above absolute zero therefore is not very overclocker friendly.

All times are GMT +1.